4 ways to Create Table in Redshift

Redshift

In the previous post, we saw that you can create table in redshift by specifying 3 minimum details : Table Name, Column Name & Column Datatype. Redshift automatically adds encoding & distribution style to the table if nothing is specified explicitly. In this post, we will see 4 ways in which can create table in Redshift.

1) CREATE Table by specifying DDL in Redshift

This is the most common way of creating table in redshift by supplying DDL. In this user can specify Table Name, Column Name, Column Datatype, Distribution style , Distribution Key & Sort Key. Also we will add few columns as NOT NULL in table structure else default is NULL for columns.

CREATE TABLE IF NOT EXISTS h_part(
p_partkey BIGINT NOT NULL,
p_name VARCHAR(40) ,
p_brand VARCHAR(10) NOT NULL,
p_retailprice DECIMAL(10,2) ,
p_active date NOT NULL,
p_inactive date NOT NULL
)
diststyle key
distkey (p_partkey)
compound sortkey (p_active,p_brand);

Let's look at the table components once it is created.

select
	cast(tablename as varchar(30)) as tbl,
	cast("column" as varchar(30)) as col,
	"type",
	"encoding",
	"distkey",
	"sortkey",
	"notnull"
from
	pg_table_def
where
	tablename in ('h_part')
order by
	1;

tblcoltypeencodingdistkeysortkeynotnull
h_partp_partkeybigintaz64true0true
h_partp_namecharacter varying(40)lzofalse0false
h_partp_brandcharacter varying(10)nonefalse2true
h_partp_retailpricenumeric(10,2)az64false0false
h_partp_activedatenonefalse1true
h_partp_inactivedateaz64false0true

So we see default encoding of AZ64 is applied to NUMERIC & DATE columns, LZO compression is applied to STRING columns. Sort Key columns are not encoded and are kept as RAW only. Why ? Because encoding sort key columns may result in overhead while computing. More details on this in future post. So we can see proper distkey , sortkey & NOT NULL columns in the output.

2) CREATE TABLE LIKE in Redshift

We can create new table from existing table in redshift by using LIKE command. It is a one-line command which will copy most of the properties of Source table into new Target table. Let's look at the example below:

create table h_part_like (LIKE h_part);

Let's see the table components after it is created.

tblcoltypeencodingdistkeysortkeynotnull
h_part_likep_partkeybigintaz64true0true
h_part_likep_namecharacter varying(40)lzofalse0false
h_part_likep_brandcharacter varying(10)nonefalse2true
h_part_likep_retailpricenumeric(10,2)az64false0false
h_part_likep_activedatenonefalse1true
h_part_likep_inactivedateaz64false0true

The output looks exactly same as creating table via DDL. So if you want most of the table properties in the new table then LIKE is the best choice.

Note: CREATE TABLE LIKE creates empty table. If you want to create a back-up of any table with data then either you run INSERT statement once the table is created or create table using other method which we have shared below.

3) CREATE TABLE AS (CTAS) in Redshift

CTAS is a common method available in most of the RDBMS including Redshift to create a new table from existing table. With this method you can also copy data from Source to Target table. Let's look at the example below:

 create table  h_part_ctas as  (select * from h_part); 

Let's check the table components below:

tblcoltypeencodingdistkeysortkeynotnull
h_part_ctasp_partkeybigintaz64true0false
h_part_ctasp_activedatenonefalse1false
h_part_ctasp_inactivedateaz64false0false
h_part_ctasp_namecharacter varying(40)lzofalse0false
h_part_ctasp_brandcharacter varying(10)nonefalse2false
h_part_ctasp_retailpricenumeric(10,2)az64false0false

The table structure with CTAS looks identical but the column property like NOT NULL is lost. So if you wish to keep column level properties then CTAS may lose it and you may want to use LIKE + INSERT instead.

4) SELECT INTO new Table in Redshift

You can also create new table using SELECT INTO statement which is same as we have in many other RDBMS. With SELECT INTO you can create a new table with same table structure & with data. Let's see the example below:

select * INTO h_part_select from h_part;
tblcoltypeencodingdistkeysortkeynotnull
h_part_selectp_partkeybigintaz64true0false
h_part_selectp_activedatenonefalse1false
h_part_selectp_inactivedateaz64false0false
h_part_selectp_namecharacter varying(40)lzofalse0false
h_part_selectp_brandcharacter varying(10)nonefalse2false
h_part_selectp_retailpricenumeric(10,2)az64false0false

This is same output as we saw in CTAS. Column level properties like NOT NULL is lost. We do get table with same distkey & sortkey and with Data using this approach.

You can pick any of the 4 way of creating new table in Redshift depending on your requirement. For fresh tables, it is generally the first approach of CREATING tables via DDL and for ETL you may want to use other approaches like LIKE (without data) and CTAS (with data).

Leave a Reply

Your email address will not be published. Required fields are marked *