In SCD-1, we overwrite the existing information in the table with the most recent info received from Source. This is very common SCD type used in the data warehouse and is good only for situations when you don't want to keep historical data for that entity. Let's quickly jump to technical example to understand it:
Create Target Table and load it with sample data:
drop table customer; create table customer ( c_mobile bigint, c_name varchar(30), c_dob date, c_email varchar(100), c_insert_dt date, c_update_dt date ) primary index(c_mobile); insert into customer values (111111111,'Jon Groff','1980-03-10',,date,date); insert into customer values (222222222,'Kenneth Jarrett','1986-08-24',,date,date); insert into customer values (333333333,'Daniel Jensen','1972-08-22',,date,date); insert into customer values (444444444,'Jeffrey Kramer','1987-04-15',,date,date); insert into customer values (555555555,'Pete Bartels','1989-11-16',,date,date);
The table data will look like
C_MOBILE | C_NAME | C_DOB | C_EMAIL | C_INSERT_DT | C_UPDATE_DT |
---|---|---|---|---|---|
111111111 | Jon Groff | 1980-03-10 | NULL | 2019-05-24 | 2019-05-24 |
222222222 | Kenneth Jarrett | 1986-08-24 | NULL | 2019-05-24 | 2019-05-24 |
333333333 | Daniel Jensen | 1972-08-22 | NULL | 2019-05-24 | 2019-05-24 |
444444444 | Jeffrey Kramer | 1987-04-15 | NULL | 2019-05-24 | 2019-05-24 |
555555555 | Pete Bartels | 1989-11-16 | NULL | 2019-05-24 | 2019-05-24 |
Now let's create a staging table which holds most recent info shared from Source.
create table stg_customer ( c_mobile bigint, c_name varchar(30), c_dob date, c_email varchar(100) ) primary index(c_mobile); insert into stg_customer values (111111111,'Jon Groff','1980-03-10','jon.groff@gmail.com'); insert into stg_customer values (222222222,'Kenneth Jarrett','1986-10-24',NULL); insert into stg_customer values (333333333,'Daniel Craig','1972-08-22',NULL); insert into stg_customer values (666666666,'John Snow','1982-02-11',NULL); insert into stg_customer values (777777777,'Arya Stark','1998-05-16',NULL);
The data in staging table looks like
C_MOBILE | C_NAME | C_DOB | C_EMAIL |
---|---|---|---|
111111111 | Jon Groff | 1980-03-10 | jon.groff@gmail.com |
222222222 | Kenneth Jarrett | 1986-10-24 | NULL |
333333333 | Daniel Craig | 1972-08-22 | NULL |
666666666 | John Snow | 1982-02-11 | NULL |
777777777 | Arya Stark | 1998-05-16 | NULL |
Now to implement SCD-1 in Teradata , the easiest method is to run a MERGE statement on Target table "customer" using Staging table "stg_customer"
MERGE customer using stg_customer on customer.c_mobile = stg_customer.c_mobile WHEN MATCHED THEN UPDATE SET c_name = stg_customer.c_name, c_dob = stg_customer.c_dob, c_email = stg_customer.c_email, c_update_dt = current_date WHEN NOT MATCHED THEN INSERT values ( stg_customer.c_mobile, stg_customer.c_name, stg_customer.c_dob, stg_customer.c_email, current_date, current_date );
Now the data in Target table will look like
C_MOBILE | C_NAME | C_DOB | C_EMAIL | C_INSERT_DT | C_UPDATE_DT | COMMENT |
---|---|---|---|---|---|---|
111111111 | Jon Groff | 1980-03-10 | jon.groff@gmail.com | 2019-05-24 | 2019-05-25 | Email Update |
222222222 | Kenneth Jarrett | 1986-10-24 | NULL | 2019-05-24 | 2019-05-25 | DOB Update |
333333333 | Daniel Craig | 1972-08-22 | NULL | 2019-05-24 | 2019-05-25 | Name Update |
444444444 | Jeffrey Kramer | 1987-04-15 | NULL | 2019-05-24 | 2019-05-24 | No Change |
555555555 | Pete Bartels | 1989-11-16 | NULL | 2019-05-24 | 2019-05-24 | No Change |
666666666 | John Snow | 1982-02-11 | NULL | 2019-05-25 | 2019-05-25 | New Record |
777777777 | Arya Stark | 1998-05-16 | NULL | 2019-05-25 | 2019-05-25 | New Record |
*Comment is not the actual column of the table. It is just for understanding purpose I have added above.
You can also run separate UPDATE+INSERT to replicate same functionality.Now let's talk about advantage and disadvantage of SCD-1.
Advantage: Easy to implement, Table Size stays in control, Unique records in the table, Table can be accessed directly without worrying about recent or old record.
Disadvantage: No history is maintained. May not be adequate for data analysis.