One of the most common operation performed in any Database is “TO INSERT RECORDS IN A TABLE ONLY IF IT IS A NEW RECORD ELSE UPDATE THE EXISTING RECORD“.
There can be different possible solution for it depending on the RDBMS you are using like MERGE or UPSERT etc. Some even suggests using some inbuilt functions specific to some Databases. However I thought of sharing a generic method which can be easily applied to any database and without any dependency.
The easiest way to implement INSERT RECORDS ONLY IF IT IS NEW ELSE UPDATE is by adding new column name FLAG which will help in determining whether we need to insert record or update it. Now this may come with overhead of having an extra table but in some RDBMS like REDSHIFT this is the preferred method.
SELECT SRC.* , CASE WHEN TGT.COL IS NOT NULL THEN 'UPDATE' ELSE 'INSERT' END FLAG FROM SOURCE SRC LEFT OUTER JOIN TARGET TGT ON SRC.KEYCOLUMN = TGT.KEYCOLUMN;
We will understand above query in detail below by taking example of two tables (SOURCE and TARGET) with just two columns NAME and CITY. If we receive same NAME from source which already exists then it will be UPDATE else it will INSERT.
Let’s create Source Table & Target table first.
CREATE TABLE DATABASE.SOURCE_tbl ( NAME VARCHAR(20) , CITY VARCHAR(20)) PRIMARY INDEX (NAME); CREATE TABLE DATABASE.TARGET_tbl ( NAME VARCHAR(20) , CITY VARCHAR(20)) PRIMARY INDEX (NAME);
Now situation is if we are receiving NAME from source which already exists in target then it will be UPDATE else it will be INSERT.
Source Data:
NAME CITY
NITI LUCKNOW
NITIN MAHANAGAR
ALOK MANGALORE
RAJ WAYANAD
MOHI DELHI
Target Data:
NAME CITY
NITI LKO
ALOK MYSORE
RAJ BLORE
MOHI DELHI
VINAY MANGALORE
SELECT CASE WHEN TGT.NAME IS NOT NULL THEN 'UPDATE' ELSE 'INSERT' END FLAG, TGT.NAME, TGT.CITY FROM SOURCE_tbl SRC LEFT OUTER JOIN TARGET_tbl TGT ON SRC.NAME = TGT.NAME;
Output of the above query will be :
FLAG NAME CITY
UPDATE NITI LUCKNOW
INSERT NITIN MAHANAGAR
UPDATE ALOK MANGALORE
UPDATE RAJ WAYANAD
UPDATE MOHI DELHI
Explanation of the above query :
The main part here is the CASE statement.
CASE WHEN N.NAME IS NOT NULL THEN ‘UPDATE’ ELSE ‘INSERT’ END FLAG
Here we are using Left Outer Join Source with Target, so we will get matching records from our Target and all the records from Source.However the value of Target columns will be NULL if there is no match else it will be some value.So when we do Left Outer Join between Source and Target , the NULL values in Target columns indicate that NO MATCH FOUND and following record does not exist.So in this way we can easily find out whether we want to update or insert.
This approach is also called as SCD-1 or Slowly Changing Dimension -1. If you wish you can read more about it in detail in these posts: SCD-1
If you want more clarity or you have any doubt regarding above query feel free to comment here.