Recently I’ve been working with a data source that generates a new file each day for me to import. That seemed simple enough, I created an external table to read the file and then run an insert statement to load the data into a target table. However, the source files sometimes overlap each other. That is, on day 1, I might get ids 1,2,3,4,5. Then on day 2, I might get ids 3,4,5,6,7,8,9. I could simply use a MERGE statement to update matching records and then insert the new unmatched ones. That’s what MERGE is for; but in this case I know the source data has static history. So, if I update ID 3 on the second day because I find a match from the first day, I know the update won’t actually change any data. Thus making for an inefficient process doing dummy work.
As an alternate solution, I could use insert with a unique constraint to prevent duplicates and use DBMS_ERRLOG to create a log table to catch the duplicate rows, thus allowing the insert to continue rather than aborting when a duplicate is found. This too though seems like a waste of resources.
Of course, another option would be to do a select from the source table that excludes the matching ids with an anti-join, not-exists subquery, or not-in subquery.
All of these variations would work but are less desirable in that they require extra steps or are needlessly complex.
The easiest method I found was to simply use a one-sided merge. That is, only define a WHEN NOT MATCHED condition, so the matching records simply drop out of the merge statement when they are detected without extra processing or raising exceptions.
As a test case to illustrate, I’ll use a normal table instead of an external table as the source and simulate 3 days of results to load into the target table.
CREATE TABLE target ( id NUMBER, textval VARCHAR2(10), dateval DATE, numval NUMBER ); CREATE TABLE source ( id NUMBER, textval VARCHAR2(10), dateval DATE, numval NUMBER );
On day 1 we receive 5 rows of data.
-- Source data Day-1 truncate table source; Insert into source (id, textval, dateval, numval) values (1,'one', date '2018-09-02',1); Insert into source (id, textval, dateval, numval) values (2,'two', date '2018-09-02',4); Insert into source (id, textval, dateval, numval) values (3,'three',date '2018-09-02',9); Insert into source (id, textval, dateval, numval) values (4,'four', date '2018-09-02',16); Insert into source (id, textval, dateval, numval) values (5,'five', date '2018-09-02',25); commit;
Then use merge to insert the first day’s data.
SQL> MERGE INTO target t 2 USING source s 3 ON (s.id = t.id) 4 WHEN NOT MATCHED 5 THEN 6 INSERT (id, 7 textval, 8 dateval, 9 numval) 10 VALUES (s.id, 11 s.textval, 12 s.dateval, 13 s.numval); 5 rows merged. SQL> COMMIT; Commit complete. SQL> SELECT * FROM target ORDER BY id; ID TEXTVAL DATEVAL NUMVAL ---------- ---------- --------- ---------- 1 one 02-SEP-18 1 2 two 02-SEP-18 4 3 three 02-SEP-18 9 4 four 02-SEP-18 16 5 five 02-SEP-18 25
Then on day 2 we receive 4 rows of data, 2 repeats and 2 new one ones.
-- Source data Day-2 truncate table source; Insert into source (id, textval, dateval, numval) values (4,'four', date '2018-09-02',16); Insert into source (id, textval, dateval, numval) values (5,'five', date '2018-09-02',25); Insert into source (id, textval, dateval, numval) values (6,'six', date '2018-09-03',36); Insert into source (id, textval, dateval, numval) values (7,'seven',date '2018-09-03',49); commit;
Then merge again. Note only the two new rows are merged.
SQL> MERGE INTO target t 2 USING source s 3 ON (s.id = t.id) 4 WHEN NOT MATCHED 5 THEN 6 INSERT (id, 7 textval, 8 dateval, 9 numval) 10 VALUES (s.id, 11 s.textval, 12 s.dateval, 13 s.numval); 2 rows merged. SQL> COMMIT; Commit complete. SQL> SELECT * FROM target ORDER BY id; ID TEXTVAL DATEVAL NUMVAL ---------- ---------- --------- ---------- 1 one 02-SEP-18 1 2 two 02-SEP-18 4 3 three 02-SEP-18 9 4 four 02-SEP-18 16 5 five 02-SEP-18 25 6 six 03-SEP-18 36 7 seven 03-SEP-18 49
On day 3 we receive 4 more rows, this time with 1 duplicates and 3 new ones.
-- Source data Day-3 truncate table source; Insert into source (id, textval, dateval, numval) values (7, 'seven',date '2018-09-03',49); Insert into source (id, textval, dateval, numval) values (8, 'eight',date '2018-09-04',64); Insert into source (id, textval, dateval, numval) values (9, 'nine', date '2018-09-04',81); Insert into source (id, textval, dateval, numval) values (10,'ten', date '2018-09-04',100); commit;
Merge one more time and we have 3 new rows merged and only the 10 distinct copies from all 3 days in the target table.
SQL> MERGE INTO target t 2 USING source s 3 ON (s.id = t.id) 4 WHEN NOT MATCHED 5 THEN 6 INSERT (id, 7 textval, 8 dateval, 9 numval) 10 VALUES (s.id, 11 s.textval, 12 s.dateval, 13 s.numval); 3 rows merged. SQL> COMMIT; Commit complete. SQL> SELECT * FROM target ORDER BY id; ID TEXTVAL DATEVAL NUMVAL ---------- ---------- --------- ---------- 1 one 02-SEP-18 1 2 two 02-SEP-18 4 3 three 02-SEP-18 9 4 four 02-SEP-18 16 5 five 02-SEP-18 25 6 six 03-SEP-18 36 7 seven 03-SEP-18 49 8 eight 04-SEP-18 64 9 nine 04-SEP-18 81 10 ten 04-SEP-18 100