In the previous post, we saw that you can create a table in Redshift by specifying just 3 minimum details: table name, column names & column datatypes. Redshift automatically adds encoding & a distribution style to the table if nothing is specified explicitly. In this post, we will see 4 ways in which you can create a table in Redshift.
1) CREATE TABLE by specifying DDL in Redshift
This is the most common way of creating a table in Redshift: by supplying the DDL. Here the user can specify the table name, column names, column datatypes, distribution style, distribution key & sort key. We will also mark a few columns as NOT NULL in the table structure; otherwise the default for a column is NULL.
CREATE TABLE IF NOT EXISTS h_part(
    p_partkey     BIGINT        NOT NULL,
    p_name        VARCHAR(40),
    p_brand       VARCHAR(10)   NOT NULL,
    p_retailprice DECIMAL(10,2),
    p_active      DATE          NOT NULL,
    p_inactive    DATE          NOT NULL
)
diststyle key
distkey (p_partkey)
compound sortkey (p_active, p_brand);
Let's look at the table components once it is created.
select cast(tablename as varchar(30)) as tbl, cast("column" as varchar(30)) as col, "type", "encoding", "distkey", "sortkey", "notnull" from pg_table_def where tablename in ('h_part') order by 1;
tbl | col | type | encoding | distkey | sortkey | notnull |
---|---|---|---|---|---|---|
h_part | p_partkey | bigint | az64 | true | 0 | true |
h_part | p_name | character varying(40) | lzo | false | 0 | false |
h_part | p_brand | character varying(10) | none | false | 2 | true |
h_part | p_retailprice | numeric(10,2) | az64 | false | 0 | false |
h_part | p_active | date | none | false | 1 | true |
h_part | p_inactive | date | az64 | false | 0 | true |
So we see the default AZ64 encoding is applied to the NUMERIC & DATE columns, and LZO compression is applied to the STRING columns. Sort key columns are not encoded and are kept as RAW only. Why? Because encoding sort key columns may result in extra overhead while scanning and filtering on them. More details on this in a future post. We can also see the proper distkey, sortkey & NOT NULL columns in the output.
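If you do not want to rely on these defaults, the encoding can also be spelled out per column in the DDL. Below is a minimal sketch, assuming a hypothetical table name h_part_enc; it simply makes the defaults shown above explicit.

-- hypothetical example: same table, but with the column encodings written out explicitly
CREATE TABLE IF NOT EXISTS h_part_enc(
    p_partkey     BIGINT        ENCODE az64 NOT NULL,
    p_name        VARCHAR(40)   ENCODE lzo,
    p_brand       VARCHAR(10)   ENCODE raw NOT NULL,   -- sort key column kept RAW
    p_retailprice DECIMAL(10,2) ENCODE az64,
    p_active      DATE          ENCODE raw NOT NULL,   -- sort key column kept RAW
    p_inactive    DATE          ENCODE az64 NOT NULL
)
diststyle key
distkey (p_partkey)
compound sortkey (p_active, p_brand);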
2) CREATE TABLE LIKE in Redshift
We can create a new table from an existing table in Redshift by using the LIKE clause. It is a one-line command which copies most of the properties of the source table into the new target table. Let's look at the example below:
create table h_part_like (LIKE h_part);
Let's see the table components after it is created.
tbl | col | type | encoding | distkey | sortkey | notnull |
---|---|---|---|---|---|---|
h_part_like | p_partkey | bigint | az64 | true | 0 | true |
h_part_like | p_name | character varying(40) | lzo | false | 0 | false |
h_part_like | p_brand | character varying(10) | none | false | 2 | true |
h_part_like | p_retailprice | numeric(10,2) | az64 | false | 0 | false |
h_part_like | p_active | date | none | false | 1 | true |
h_part_like | p_inactive | date | az64 | false | 0 | true |
The output looks exactly the same as creating the table via DDL. So if you want most of the source table's properties in the new table, then LIKE is the best choice.
Note: CREATE TABLE LIKE creates an empty table. If you want to create a back-up of any table with its data, then either run an INSERT statement once the table is created, or create the table using one of the other methods shared below.
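For example, a back-up of h_part with data could look like the two statements below (h_part_backup is just an illustrative table name):

-- create an empty copy that keeps encoding, distkey, sortkey & NOT NULL, then load it
create table h_part_backup (LIKE h_part);
insert into h_part_backup select * from h_part;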
3) CREATE TABLE AS (CTAS) in Redshift
CTAS is a common method, available in most RDBMS including Redshift, for creating a new table from an existing table. With this method you also copy the data from the source table into the target table. Let's look at the example below:
create table h_part_ctas as (select * from h_part);
Let's check the table components below:
tbl | col | type | encoding | distkey | sortkey | notnull |
---|---|---|---|---|---|---|
h_part_ctas | p_partkey | bigint | az64 | true | 0 | false |
h_part_ctas | p_active | date | none | false | 1 | false |
h_part_ctas | p_inactive | date | az64 | false | 0 | false |
h_part_ctas | p_name | character varying(40) | lzo | false | 0 | false |
h_part_ctas | p_brand | character varying(10) | none | false | 2 | false |
h_part_ctas | p_retailprice | numeric(10,2) | az64 | false | 0 | false |
The table structure created with CTAS looks identical, but column-level properties like NOT NULL are lost. So if you wish to keep column-level properties, CTAS may lose them and you may want to use LIKE + INSERT instead.
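As a side note, if you prefer CTAS but do not want to rely on the inherited keys, Redshift also lets you state the table attributes explicitly in the CTAS statement itself. A small sketch with a hypothetical table name h_part_ctas2:

-- diststyle, distkey & sortkey declared up front; column constraints like NOT NULL are still not copied
create table h_part_ctas2
diststyle key
distkey (p_partkey)
compound sortkey (p_active, p_brand)
as (select * from h_part);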
4) SELECT INTO new Table in Redshift
You can also create a new table using the SELECT INTO statement, which works the same way as in many other RDBMS. With SELECT INTO you create a new table with the same table structure & with data. Let's see the example below:
select * INTO h_part_select from h_part;
tbl | col | type | encoding | distkey | sortkey | notnull |
---|---|---|---|---|---|---|
h_part_select | p_partkey | bigint | az64 | true | 0 | false |
h_part_select | p_active | date | none | false | 1 | false |
h_part_select | p_inactive | date | az64 | false | 0 | false |
h_part_select | p_name | character varying(40) | lzo | false | 0 | false |
h_part_select | p_brand | character varying(10) | none | false | 2 | false |
h_part_select | p_retailprice | numeric(10,2) | az64 | false | 0 | false |
This is the same output as we saw with CTAS. Column-level properties like NOT NULL are lost, but we do get a table with the same distkey & sortkey and with the data using this approach.
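Since SELECT INTO accepts any SELECT, you can also use it to create a filtered or narrowed copy of a table. A small sketch, with an illustrative predicate and a hypothetical table name h_part_subset:

-- only the selected columns and matching rows end up in the new table
select p_partkey, p_name, p_brand INTO h_part_subset from h_part where p_brand = 'Brand#23';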
You can pick any of these 4 ways of creating a new table in Redshift depending on your requirement. For fresh tables, it is generally the first approach of creating the table via DDL, and for ETL you may want to use the other approaches like LIKE (without data) and CTAS (with data).