Close

Using MERGE to INSERT overlapping source data

Recently I’ve been working with a data source that generates a new file each day for me to import. That seemed simple enough, I created an external table to read the file and then run an insert statement to load the data into a target table. However, the source files sometimes overlap each other. That is, on day 1, I might get ids 1,2,3,4,5. Then on day 2, I might get ids 3,4,5,6,7,8,9. I could simply use a MERGE statement to update matching records and then insert the new unmatched ones. That’s what MERGE is for; but in this case I know the source data has static history. So, if I update ID 3 on the second day because I find a match from the first day, I know the update won’t actually change any data. Thus making for an inefficient process doing dummy work.

As an alternate solution, I could use insert with a unique constraint to prevent duplicates and use DBMS_ERRLOG to create a log table to catch the duplicate rows, thus allowing the insert to continue rather than aborting when a duplicate is found. This too though seems like a waste of resources.

Of course, another option would be to do a select from the source table that excludes the matching ids with an anti-join, not-exists subquery, or not-in subquery.

All of these variations would work but are less desirable in that they require extra steps or are needlessly complex.

The easiest method I found was to simply use a one-sided merge. That is, only define a WHEN NOT MATCHED condition, so the matching records simply drop out of the merge statement when they are detected without extra processing or raising exceptions.

As a test case to illustrate, I’ll use a normal table instead of an external table as the source and simulate 3 days of results to load into the target table.

CREATE TABLE target
(
    id         NUMBER,
    textval    VARCHAR2(10),
    dateval    DATE,
    numval     NUMBER
);

CREATE TABLE source
(
    id         NUMBER,
    textval    VARCHAR2(10),
    dateval    DATE,
    numval     NUMBER
);

On day 1 we receive 5 rows of data.

-- Source data Day-1   
truncate table source;            
Insert into source (id, textval, dateval, numval) values (1,'one',  date '2018-09-02',1);
Insert into source (id, textval, dateval, numval) values (2,'two',  date '2018-09-02',4);
Insert into source (id, textval, dateval, numval) values (3,'three',date '2018-09-02',9);
Insert into source (id, textval, dateval, numval) values (4,'four', date '2018-09-02',16);
Insert into source (id, textval, dateval, numval) values (5,'five', date '2018-09-02',25);
commit;

Then use merge to insert the first day’s data.

SQL> MERGE INTO target t
  2       USING source s
  3          ON (s.id = t.id)
  4  WHEN NOT MATCHED
  5  THEN
  6      INSERT     (id,
  7                  textval,
  8                  dateval,
  9                  numval)
 10          VALUES (s.id,
 11                  s.textval,
 12                  s.dateval,
 13                  s.numval);

5 rows merged.

SQL> COMMIT;

Commit complete.

SQL> SELECT * FROM target ORDER BY id;

        ID TEXTVAL    DATEVAL       NUMVAL
---------- ---------- --------- ----------
         1 one        02-SEP-18          1
         2 two        02-SEP-18          4
         3 three      02-SEP-18          9
         4 four       02-SEP-18         16
         5 five       02-SEP-18         25

Then on day 2 we receive 4 rows of data, 2 repeats and 2 new one ones.

-- Source data Day-2 
truncate table source;  
Insert into source (id, textval, dateval, numval) values (4,'four', date '2018-09-02',16);
Insert into source (id, textval, dateval, numval) values (5,'five', date '2018-09-02',25);
Insert into source (id, textval, dateval, numval) values (6,'six',  date '2018-09-03',36);
Insert into source (id, textval, dateval, numval) values (7,'seven',date '2018-09-03',49);
commit;

Then merge again. Note only the two new rows are merged.

SQL> MERGE INTO target t
  2       USING source s
  3          ON (s.id = t.id)
  4  WHEN NOT MATCHED
  5  THEN
  6      INSERT     (id,
  7                  textval,
  8                  dateval,
  9                  numval)
 10          VALUES (s.id,
 11                  s.textval,
 12                  s.dateval,
 13                  s.numval);

2 rows merged.

SQL> COMMIT;

Commit complete.

SQL> SELECT * FROM target ORDER BY id;

        ID TEXTVAL    DATEVAL       NUMVAL
---------- ---------- --------- ----------
         1 one        02-SEP-18          1
         2 two        02-SEP-18          4
         3 three      02-SEP-18          9
         4 four       02-SEP-18         16
         5 five       02-SEP-18         25
         6 six        03-SEP-18         36
         7 seven      03-SEP-18         49

On day 3 we receive 4 more rows, this time with 1 duplicates and 3 new ones.

-- Source data Day-3
truncate table source;
Insert into source (id, textval, dateval, numval) values (7, 'seven',date '2018-09-03',49);
Insert into source (id, textval, dateval, numval) values (8, 'eight',date '2018-09-04',64);
Insert into source (id, textval, dateval, numval) values (9, 'nine', date '2018-09-04',81);
Insert into source (id, textval, dateval, numval) values (10,'ten',  date '2018-09-04',100);
commit;

Merge one more time and we have 3 new rows merged and only the 10 distinct copies from all 3 days in the target table.

SQL> MERGE INTO target t
  2       USING source s
  3          ON (s.id = t.id)
  4  WHEN NOT MATCHED
  5  THEN
  6      INSERT     (id,
  7                  textval,
  8                  dateval,
  9                  numval)
 10          VALUES (s.id,
 11                  s.textval,
 12                  s.dateval,
 13                  s.numval);

3 rows merged.

SQL> COMMIT;

Commit complete.

SQL> SELECT * FROM target ORDER BY id;

        ID TEXTVAL    DATEVAL       NUMVAL
---------- ---------- --------- ----------
         1 one        02-SEP-18          1
         2 two        02-SEP-18          4
         3 three      02-SEP-18          9
         4 four       02-SEP-18         16
         5 five       02-SEP-18         25
         6 six        03-SEP-18         36
         7 seven      03-SEP-18         49
         8 eight      04-SEP-18         64
         9 nine       04-SEP-18         81
        10 ten        04-SEP-18        100