Recently I was asked how to bind an R variable to a SQL statement executed with ROracle. In this case it wasn’t a simple “select * from table where x = :b1 and y = :b2.”
You can do those quite easily with a data.frame with only one row in it using
dbGetQuery(connection, sqlstatement, data.frame)
It will also work for some bulk binding, such as a multi-row insert from a data.frame or a vector.
Here though, the developer wanted to pass in an R set as an In-List for a where clause. That is something like “select * from table where (x,y) in (:b1)” where the bind variable was a data.frame consisting of a few hundred observations of 2 variables each. Of course, the syntax shown is only pseudo-code, which further complicated the requirements. How to create the query with valid syntax and how to pass the data.frame to the query as a bind variable?
In the context of a select statement neither a data.frame nor a vector is legal as an input parameter if they have more than one row. Unfortunately that means we’ll have to take the extra step of converting the data into a legal data type to pass it through the dbGetQuery interface and then do something with that converted data within the SQL statement.
A simple and, more importantly, legal option is to convert the data into a delimited chr variable. On the Oracle side this will be interpreted as a VARCHAR2 or a CLOB depending on the size, either of which is easily parsed back into the original rows and columns as needed.
First, let’s use a simple vector with just a few elements. We’ll look up a few employees from the sample SCOTT.EMP table. We can’t use “in (:b1)” as shown above, but if we can generate a collection we can use “member of :b1.”
Since we’re going to pass a delimited chr value from R, we need to convert that text into a collection. Previously I posted how to split a clob into a nested table of varchar2 values. We’ll use that here too. For this first example the clob handling will be a bit of overkill since our text string will be small; but it will still illustrate the process.
So, first, we’ll construct a vector, then make a chr variable from the vector. Finally, pass the chr into a query using the split_clob function to create a nested table collection for use with the member of condition.
> employees_vector <- c("SCOTT","KING","ADAMS")
> employees_vector
[1] "SCOTT" "KING"  "ADAMS"
> employees_chr <- paste(employees_vector,collapse=",")
> employees_chr
[1] "SCOTT,KING,ADAMS"
> employees_df <- dbGetQuery(con,"select * from emp where ename member of split_clob(:b1,',')",employees_chr)
> employees_df
  EMPNO ENAME       JOB  MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST 7566 1987-04-19 00:00:00 3000   NA     20
2  7839  KING PRESIDENT   NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK 7788 1987-05-23 00:00:00 1100   NA     20
It is possible to use an IN clause in the query, but you must create a subquery for the condition. So, using the same chr variable we still use split_clob, but we then use the TABLE function to use the resulting collection as a data source for the subquery.
> employees_df2 <- dbGetQuery(con,"select * from emp where ename in (select * from table(split_clob(:b1,',')))",employees_chr)
> employees_df2
  EMPNO ENAME       JOB  MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST 7566 1987-04-19 00:00:00 3000   NA     20
2  7839  KING PRESIDENT   NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK 7788 1987-05-23 00:00:00 1100   NA     20
It is also possible to use the text field directly with a simple INSTR (where instr('KING,SCOTT,ADAMS',ename) > 0), but doing so reliably is more difficult because INSTR matches any substring of the list, not just whole elements. Also, the collection-based queries above leave the table column untouched, so indexes on that column can be used more reliably; wrapping the column in INSTR gives that up. Due to these limitations I’m not providing examples of the INSTR query. Again, it is possible to do so, but not recommended.
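To illustrate why the substring test is fragile, here is a small R-only sketch with no database involved. It mimics the INSTR check with grepl(fixed = TRUE), which is only an analogy for Oracle's INSTR, and compares it with an exact element test. The names ADAM and SCOT are hypothetical values added for the illustration; they are not in the EMP table.

```r
list_chr <- "KING,SCOTT,ADAMS"

# Candidate names: one real match, two hypothetical near-misses, one non-match.
enames <- c("KING", "ADAM", "SCOT", "FORD")

# Substring test, roughly what instr(list, ename) > 0 does:
# TRUE whenever the name appears anywhere inside the list text.
substring_hit <- vapply(
  enames,
  function(e) grepl(e, list_chr, fixed = TRUE),
  logical(1)
)

# Exact test: split the list into elements and compare whole values,
# which is what the collection-based queries do.
exact_hit <- enames %in% strsplit(list_chr, ",", fixed = TRUE)[[1]]

print(unname(substring_hit))  # TRUE TRUE TRUE FALSE
print(exact_hit)              # TRUE FALSE FALSE FALSE
```

The two near-misses pass the substring test but fail the exact one, which is the kind of false positive that makes the INSTR approach hard to trust without wrapping both sides in delimiters.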
The examples above are for sets where each row contains only a single value (a vector, or a data.frame of a single column). But what if you need multi-column checks, i.e. something of the form “select * from table1 where (a,b) in (select x,y from table2),” where “table2” is somehow based on our data.frame contents?
While the basic idea is the same, the use of multiple columns in the condition creates an additional challenge because we need to somehow encode the rows and columns into a chr field such that the fields are distinct but still grouped by row. Also the split_clob function will only generate one value for each row instead of reconstructing all of the individual fields.
First, on the R side, we’ll use the paste function again, but twice: once with the sep delimiter to join the fields within each row, and then again with the collapse delimiter to join the rows.
> input_df <- data.frame(jobs=c("CLERK","CLERK","ANALYST","PRESIDENT"),depts=c("SALES","ACCOUNTING","RESEARCH","ACCOUNTING"))
> input_df
       jobs      depts
1     CLERK      SALES
2     CLERK ACCOUNTING
3   ANALYST   RESEARCH
4 PRESIDENT ACCOUNTING
> input_chr <- paste(paste(input_df$jobs,input_df$depts,sep="|"),collapse=",")
> input_chr
[1] "CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING"
Now we have a single chr field, where each row is delimited with commas and the fields within the row are delimited with pipes. Using split_clob we can separate the string into 4 values, one per original row, and then parse each of those into a row of 2 fields.
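If the data.frame has more than two columns, the nested paste can be generalized. The following is a minimal sketch of a hypothetical helper, encode_rows, which is my own illustration and not part of ROracle or DBI. It also refuses values that contain either delimiter, since those could not be parsed back into the right fields on the Oracle side.

```r
# Hypothetical helper (not part of ROracle or DBI): encode a data.frame as one
# delimited string, with fields joined by field_sep and rows joined by row_sep.
encode_rows <- function(df, field_sep = "|", row_sep = ",") {
  # Convert every column to character so factors and numbers paste predictably.
  cols <- lapply(unname(as.list(df)), as.character)

  # Stop if any value contains a delimiter; it could not be split correctly.
  for (delim in c(field_sep, row_sep)) {
    clash <- vapply(
      cols,
      function(x) any(grepl(delim, x, fixed = TRUE)),
      logical(1)
    )
    if (any(clash)) {
      stop("a value contains the delimiter '", delim, "'")
    }
  }

  # Join the fields within each row, then join the rows.
  rows <- do.call(paste, c(cols, sep = field_sep))
  paste(rows, collapse = row_sep)
}

input_df <- data.frame(
  jobs  = c("CLERK", "CLERK", "ANALYST", "PRESIDENT"),
  depts = c("SALES", "ACCOUNTING", "RESEARCH", "ACCOUNTING")
)
input_chr <- encode_rows(input_df)
print(input_chr)
# [1] "CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING"
```

For the two-column example this produces the same string as the nested paste calls, so it can be passed to dbGetQuery as the bind value in the same way.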
Removing R for a moment, we can test the splitting and parsing to see what the subquery will return.
SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
       SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
  FROM TABLE(split_clob('CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING', ','));

JOBS         DEPTS
------------ --------------
CLERK        SALES
CLERK        ACCOUNTING
ANALYST      RESEARCH
PRESIDENT    ACCOUNTING

4 rows selected.
Now we put it all together and pull some information from the emp and dept tables about our job/department pairs.
> emp_dept_df <- dbGetQuery(con,"SELECT emp.ename, emp.job, emp.sal, dept.dname, dept.loc
+   FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
+  WHERE (job, dname) IN
+        (SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
+                SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
+           FROM TABLE(split_clob( :b1, ',')))
+  ORDER BY dname, ename"
+ ,input_chr)
> emp_dept_df
   ENAME       JOB  SAL      DNAME      LOC
1   KING PRESIDENT 5000 ACCOUNTING NEW YORK
2 MILLER     CLERK 1300 ACCOUNTING NEW YORK
3   FORD   ANALYST 3000   RESEARCH   DALLAS
4  SCOTT   ANALYST 3000   RESEARCH   DALLAS
5  JAMES     CLERK  950      SALES  CHICAGO
Obviously as the number of fields in a data.frame expands the parsing will get longer and more complicated. If you’ll be performing similar queries often, you may want to build a dedicated function within the database that combines the split_clob functionality with the parsing of the individual fields and returns a collection of user-defined types. Also, these examples used text fields and columns but could include dates or numeric values. In that case you would need to ensure consistent formatting in the string construction so the parsing can be accomplished correctly and reliably. The overall technique remains the same though.
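As a sketch of what consistent formatting could look like on the R side, the following renders a date column and a numeric column with explicit formats before building the string. The data.frame hires_df and its values are hypothetical, made up for this illustration, and the formats chosen (ISO dates, two fixed decimals) are just one reasonable assumption.

```r
# Hypothetical input mixing text, dates and numbers.
hires_df <- data.frame(
  ename    = c("SCOTT", "KING"),
  hiredate = as.Date(c("1987-04-19", "1981-11-17")),
  sal      = c(3000, 5000.5)
)

# Render each column as text with one explicit format, so the string never
# depends on how R happens to print a date or a number by default.
hires_txt <- data.frame(
  ename    = hires_df$ename,
  hiredate = format(hires_df$hiredate, "%Y-%m-%d"),
  sal      = formatC(hires_df$sal, format = "f", digits = 2)
)

# Join fields with pipes and rows with commas, as in the earlier examples.
hires_chr <- paste(
  do.call(paste, c(unname(as.list(hires_txt)), sep = "|")),
  collapse = ","
)
print(hires_chr)
# [1] "SCOTT|1987-04-19|3000.00,KING|1981-11-17|5000.50"
```

On the Oracle side the parsed date field would then be converted with a matching mask, such as TO_DATE(..., 'YYYY-MM-DD'), and the numeric field with TO_NUMBER, keeping in mind that the session's NLS settings can affect how the decimal point is interpreted.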
As mentioned above, these steps do entail extra processing on both the R side and the Oracle side of the interface, but for most use cases the extra resource consumption will hopefully be minor, and this method provides a workaround for a syntax limitation in the ROracle/DBI functionality suite.
I hope you find it useful.