In this post, we will see a very simple example in which we will create a Redshift table with basic structure and then we will see what all additional properties Redshift will add to it by default. Then we will quickly discuss about those properties and in subsequent posts we will see how these properties impact the overall query performance of these tables. I will be referring to TPC-H tables and queries in Redshift related posts.
Let's create a sample table now.
CREATE TABLE IF NOT EXISTS h_part( p_partkey BIGINT , p_name VARCHAR(55) , p_mfgr VARCHAR(25) , p_brand VARCHAR(10) , p_type VARCHAR(25) , p_size DECIMAL(38,0) , p_container VARCHAR(10) , p_retailprice DECIMAL(38,18) , p_comment VARCHAR(23) );
So I am creating "h_part" table with few columns and I have just specified the datatype for all the columns. This is minimum table level properties you must specify in order to create any table in Redshift. Now let's check the table definition in Redshift.
CREATE TABLE IF NOT EXISTS public.h_part ( " p_partkey BIGINT ENCODE az64" " ,p_name VARCHAR(55) ENCODE lzo" " ,p_mfgr VARCHAR(25) ENCODE lzo" " ,p_brand VARCHAR(10) ENCODE lzo" " ,p_type VARCHAR(25) ENCODE lzo" " ,p_size NUMERIC(38,0) ENCODE az64" " ,p_container VARCHAR(10) ENCODE lzo" " ,p_retailprice NUMERIC(38,18) ENCODE az64" " ,p_comment VARCHAR(23) ENCODE lzo" ) DISTSTYLE AUTO ;
Two important points to notice here:
1) ENCODE: Appropriate encoding (compression technique) is added to each column. Since Redshift is columnar database, it leverages advantage of having specific compression algorithm for each column as per datatype rather than uniform compression for entire table.
2) DISTSTYLE: Distribution style of "AUTO" is added to the table which will work as "ALL" when table is small and will switch to "EVEN" as the table size will grow.
We will not change anything for now. We will keep table structure as-is and will proceed to data loading in the next post. We will talk more about these properties in details in later posts.