How to pass a vector or data.frame from R to Oracle as a bind variable in a select statement using ROracle.

Recently I was asked how to bind an R variable to a SQL statement executed with ROracle. In this case it wasn’t a simple “select * from table where x = :b1 and y = :b2.”

You can do those quite easily with a data.frame with only one row in it using

dbGetQuery(connection, sqlstatement, data.frame)

It will also work for some bulk binding, such as a multi-row insert from a data.frame or a vector.

Here though, the developer wanted to pass in an R set as an In-List for a where clause. That is something like “select * from table where (x,y) in (:b1)” where the bind variable was a data.frame consisting of a few hundred observations of 2 variables each. Of course, the syntax shown is only pseudo-code, which further complicated the requirements. How to create the query with valid syntax and how to pass the data.frame to the query as a bind variable?

In the context of a select statement neither a data.frame nor a vector is legal as an input parameter if they have more than one row. Unfortunately that means we’ll have to take the extra step of converting the data into a legal data type to pass it through the dbGetQuery interface and then do something with that converted data within the SQL statement.

A simple and, more importantly, legal option is to convert the data into a delimited chr variable. On the Oracle side this will be interpreted as a VARCHAR2 or a CLOB depending on the size, either of which is easily parsed back into the original rows and columns as needed.

First, let’s use a simple vector with just a few elements. We’ll look up a few employees from the sample SCOTT.EMP table. We can’t use “in (:b1)” as shown above, but if we can generate a collection we can use “member of :b1.”

Since we’re going to pass a delimited chr value from R, we need to convert that text into a collection. Previously I posted how to split a clob into a nested table of varchar2 values. We’ll use that here too. For this first example the clob handling will be a bit of overkill since our text string will be small; but it will still illustrate the process.

So, first, we’ll construct a vector, then make a chr variable from the vector. Finally, pass the chr into a query using the split_clob function to create a nested table collection for use with the member of condition.

> employees_vector <- c("SCOTT","KING","ADAMS")
> employees_vector
[1] "SCOTT" "KING" "ADAMS"

> employees_chr <- paste(employees_vector,collapse=",")
> employees_chr
[1] "SCOTT,KING,ADAMS"

> employees_df <- dbGetQuery(con,"select * from emp where ename member of split_clob(:b1,',')",employees_chr)
> employees_df
  EMPNO ENAME       JOB   MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST  7566 1987-04-19 00:00:00 3000   NA     20
2  7839 KING  PRESIDENT    NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK  7788 1987-05-23 00:00:00 1100   NA     20

It is possible to use an IN clause in the query, but you must create a subquery for the condition. So, using the same chr variable we still use split_clob, but we then use the TABLE function to use the resulting collection as a data source for the subquery.

> employees_df2 <- dbGetQuery(con,"select * from emp where ename in (select * from table(split_clob(:b1,',')))",employees_chr)
> employees_df2
  EMPNO ENAME       JOB  MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST 7566 1987-04-19 00:00:00 3000   NA     20
2  7839 KING  PRESIDENT   NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK 7788 1987-05-23 00:00:00 1100   NA     20

It is also possible to use the text field directly with a simple INSTR (where instr(‘KING,SCOTT,ADAMS’,ename) > 0,) but doing so reliably is more difficult. Also, by leaving the table column untouched then indexes on the table can be used more reliably. Due to these limitations I’m not providing examples. Again, it is possible to do so, but not recommended.

The examples above are for a sets where each row only contains a single value (a vector, or a data.frame of a single column.) But what if you need multi-column checks? I.e. Something of the form “select * from table1 where (a,b) in (select x,y from table2).” Where “table2” is somehow based on our data.frame contents.

While the basic idea is the same, the use of multiple columns in the condition creates an additional challenge because we need to somehow encode the rows and columns into a chr field such that the fields are distinct but still grouped by row. Also the split_clob function will only generate one value for each row instead of reconstructing all of the individual fields.

First, on the R side, we’ll use paste function again, but twice, once with the separation delimiter and then again with the collapse delimiter.

> input_df <- data.frame(jobs=c("CLERK","CLERK","ANALYST","PRESIDENT"),depts=c("SALES","ACCOUNTING","RESEARCH","ACCOUNTING"))
> input_df
       jobs      depts
1     CLERK      SALES
2     CLERK ACCOUNTING
3   ANALYST   RESEARCH
4 PRESIDENT ACCOUNTING

> input_chr <- paste(paste(input_df$jobs,input_df$depts,sep="|"),collapse=",")
> input_chr
[1] "CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING"

Now we have a single chr field, where each row is delimited with commas and the fields within the row are delimited with pipes. Using split_clob we can separate the string into 4 fields and then parse each of those into a row of 2 fields.

Removing R for a moment, we can test the splitting and parsing to see what the subquery will return.

SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
 SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
 FROM TABLE(split_clob('CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING', ','));

JOBS         DEPTS 
------------ --------------
CLERK        SALES 
CLERK        ACCOUNTING 
ANALYST      RESEARCH 
PRESIDENT    ACCOUNTING

4 rows selected.

Now we put it all together and pull some information from the emp and dept tables about our job/department pairs.

> emp_dept_df <- dbGetQuery(con,"SELECT emp.ename, emp.job, emp.sal, dept.dname, dept.loc
+                                  FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
+                                 WHERE (job, dname) IN
+                                      (SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
+                                              SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
+                                         FROM TABLE(split_clob( :b1, ',')))
+                                ORDER BY dname, ename"
+                          ,input_chr)
> emp_dept_df
   ENAME       JOB  SAL      DNAME      LOC
1   KING PRESIDENT 5000 ACCOUNTING NEW YORK
2 MILLER     CLERK 1300 ACCOUNTING NEW YORK
3   FORD   ANALYST 3000   RESEARCH   DALLAS
4  SCOTT   ANALYST 3000   RESEARCH   DALLAS
5  JAMES     CLERK  950      SALES  CHICAGO

Obviously as the number of fields in a data.frame expands the parsing will get longer and more complicated. If you’ll be performing similar queries often, you may want to build a dedicated function within the database that combines the split_clob functionality with the parsing of the individual fields and returns a collection of user-defined types. Also, these examples used text fields and columns but could include dates or numeric values. In that case you would need to ensure consistent formatting in the string construction so the parsing can be accomplished correctly and reliably. The overall technique remains the same though.

As mentioned above, these steps do entail extra processing on both the R side as well as the Oracle side of the interface; but for most uses cases the extra resource consumption will hopefully be minor and this method provides a work around to a limitation of syntax in the ROracle/DBI functionality suite.

I hope you find it useful.

Leave a ReplyCancel reply