ETL SQL

Learn Extract Transform Load using SQL & PySpark

Teradata Create Table (Permanent/Volatile) Example- SET/Multiset/CTAS

Tables are two-dimensional representation of data in rows & columns. Tables are the core of any RDBMS environment. In TERADATA, table creation is not as straightforward as it looks as it comes with so many options.

teradata create table

You can create Tables with minimal options as we have in any ANSI standard. However creating a table in Teradata with proper options can help in saving resources and tuning the queries running on the table .

The generic syntax for CREATING TABLE in TERADATA is :

CREATE [SET/MULTISET] TABLE DATABASENAME.TABLENAME,
[NO] FALLBACK,
[NO] BEFORE JOURNAL,
[NO] AFTER JOURNAL,
(
COLUMN1 DATATYPE,
COLUMN2 DATATYPE,
.
.
.
COLUMNZ DATATYPE
)
[UNIQUE] PRIMARY INDEX [INDEXNAME] (COLUMN1,COLUMN2...COLUMNZ)
[UNIQUE] INDEX [INDEXNAME] (COLUMN1.COLUMN2...COLUMNZ)

Now let us see what few of the common options actually mean:

SET and MULTISET tables in teradata

SET tables discard complete duplicate records however MULTISET allows multiple instances of same row in the table.

If you know that you will be getting distinct rows , it is advisable to go with MULTISET table as it will not check for duplicate rows. Hence, saving time and enhancing performance.

If you know that you would be getting duplicate records however you need to process only one of the duplicate rows then create SET tables. TERADATA default is SET however ANSI default is MULTISET.

teradata Fallback and no Fallback

FALLBACK is TERADATA mechanism to save table data in case of AMP Failures. If you define any table with FALLBACK option , a duplicate copy of table data is maintained in some other AMP. So in case of AMP Failure, the FALLBACK AMP is used for reading the data.

Fallback is used to create highly available tables in Teradata

For critical tables, it is recommended to use FALLBACK option. This will require double the Storage Space so should be used only for critical tables.

In TERADATA, default is NO FALLBACK. This means table data is not copied in back-up AMP.

journal in teradata

There are four types of Permanent Journals in teradata.

BEFORE Journal – The BEFORE JOURNAL holds the image of impacted rows before any changes are made.

AFTER Journal – AFTER JOURNAL holds the image of affected rows after changes are done.

DUAL BEFORE Journal – In DUAL BEFORE Journal, two images are taken and are stored in two different AMP’s before any changes are made to data in table.

DUAL AFTER Journal – In DUAL AFTER Journal, two images are taken and are stored in two different AMP’s after changes to rows are done in the table.

As it was also discussed earlier that PERMANENT JOURNALS use PERMANENT SPACE so if these JOURNALS are not required any more, it is advisable to drop JOURNALS to save some space.

Also, there are several methods of creating tables in TERADATA. The most basic method is shown above. However if you want to CREATE TABLE using some other table then you can use below command:

CREATE TABLE TABLE_1 AS (
SELECT * FROM TABLE_2
) 
WITH [DATA/NO DATA] 
AND [STATS/NO STATS];

teradata create table example

Now we understand the theory part so let us quickly look into few examples to further strengthen our understanding.

  1. Check/create/SET current database
  2. CREATE table
  3. SHOW table
  4. SET/MULTISET example
  5. INSERT RECORDS
  6. CTAS with/without data

Check Current Database in Teradata

select database;

 *** Query completed. One row found. One column returned.
 *** Total elapsed time was 1 second.

Database
---------------------------------------------------------------------------
DBC

create a database with 5GB of space. DBC is the default database in the Teradata and is considered as “root” in the Database tree.

Create new database in Teradata

create database db_emp FROM DBC
AS
PERM = 5000000000;

 *** Database has been created.
 *** Total elapsed time was 1 second.

Set new database as the default database for this session.

database db_emp;

 *** New default database accepted.
 *** Total elapsed time was 1 second.

Create a table with minimal options in Teradata

CREATE TABLE emp (
  emp_no integer,
  birth_date date,
  first_name varchar(14),
  last_name varchar(16),
  gender char(1),
  hire_date date,
  phone_no varchar(10),
  alt_phone_no varchar(10)
);

 *** Table has been created.
 *** Total elapsed time was 1 second.

To check Table DDL in Teradata we use SHOW command

SHOW table in Teradata

show table emp;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
CREATE SET TABLE DB_EMP.emp ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      emp_no INTEGER,
      birth_date DATE FORMAT 'YY/MM/DD',
      first_name VARCHAR(14) CHARACTER SET LATIN NOT CASESPECIFIC,
      last_name VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      gender CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      hire_date DATE FORMAT 'YY/MM/DD',
      phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      alt_phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( emp_no );

Few points to notice here:

  1. SET table is the default table type.
  2. Table is defined as FALLBACK so copy of the data is stored in another AMP.
  3. DATE columns have default format i.e. ‘YY/MM/DD’.
  4. String columns VARCHAR & CHAR are defined as NOT CASESPECIFIC.
  5. First column is picked as PRIMARY INDEX if not explicitly mentioned in table definition and unique, primary key is also absent.

Let’s create few more tables with some of the params for table/columns explicitly defined.

create set table teradata

CREATE SET TABLE emp_set (
  emp_no integer NOT NULL,
  birth_date date format 'yyyy-mm-dd',
  first_name varchar(14),
  last_name varchar(16),
  gender char(1) NOT NULL default 'M',
  hire_date date format 'y4-mm-dd',
  phone_no varchar(10),
  alt_phone_no varchar(10)
)
PRIMARY INDEX (emp_no);

 *** Table has been created.
 *** Total elapsed time was 1 second.

create multiset table teradata

CREATE MULTISET TABLE emp_multiset (
  emp_no integer NOT NULL,
  birth_date date format 'yyyy-mm-dd',
  first_name varchar(14),
  last_name varchar(16),
  gender char(1) NOT NULL default 'M',
  hire_date date format 'y4-mm-dd',
  phone_no varchar(10),
  alt_phone_no varchar(10)
)
PRIMARY INDEX (emp_no);

 *** Table has been created.
 *** Total elapsed time was 1 second.

Now we have tables defined in Teradata, let us insert some records into the table too.

insert into set table teradata

INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10001,'1978-09-02','Georgi','Facello','M','2006-06-26','9234728034','9234728034');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10002,'1989-06-02','Bezalel','Simmel','F','2005-11-21','9564989814','9564989814');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10003,'1984-12-03','Parto','Bamford','M','2006-08-28','9622254431','9520937883');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10004,'1979-05-01','Chirstian','Koblick','M','2006-12-01','9737926154','9126817150');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10005,'1980-01-21','Kyoichi','Maliniak','M','2009-09-12','9420307319','9721085175');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

insert into multiset table teradata

INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10001,'1978-09-02','Georgi','Facello','M','2006-06-26','9234728034','9234728034');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10002,'1989-06-02','Bezalel','Simmel','F','2005-11-21','9564989814','9564989814');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
 INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10003,'1984-12-03','Parto','Bamford','M','2006-08-28','9622254431','9520937883');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10004,'1979-05-01','Chirstian','Koblick','M','2006-12-01','9737926154','9126817150');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

 BTEQ -- Enter your SQL request or BTEQ command:
INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10005,'1980-01-21','Kyoichi','Maliniak','M','2009-09-12','9420307319','9721085175');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

SET Table load same record

INSERT INTO emp_set (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10001,'1978-09-02','Georgi','Facello','M','2006-06-26','9234728034','9234728034');

 *** Failure 2802 Duplicate row error in db_emp.emp_set.
                Statement# 1, Info =0
 *** Total elapsed time was 1 second.

SET table gave the error when we tried to insert same record again. If it would have been in same transaction then SET table would have quietly ignored the record. But in this case, it is different transaction hence it gave the error. Now let’s check for the MULTISET table.

MULTISET Table load same record

INSERT INTO emp_multiset (emp_no,birth_date,first_name,last_name,gender,hire_date,phone_no,alt_phone_no) VALUES (10001,'1978-09-02','Georgi','Facello','M','2006-06-26','9234728034','9234728034');

 *** Insert completed. One row added.
 *** Total elapsed time was 1 second.

So multiset table allow duplicate records where as SET table throws error if same record comes in different transaction and discard if in same transaction.

check multiset table records

select * from emp_multiset;

 *** Query completed. 6 rows found. 8 columns returned.
 *** Total elapsed time was 1 second.

     emp_no  birth_date  first_name      last_name         gender   hire_date  phone_no    alt_phone_no
-----------  ----------  --------------  ----------------  ------  ----------  ----------  ------------
      10003  1984-12-03  Parto           Bamford           M       2006-08-28  9622254431  9520937883
      10002  1989-06-02  Bezalel         Simmel            F       2005-11-21  9564989814  9564989814
      10005  1980-01-21  Kyoichi         Maliniak          M       2009-09-12  9420307319  9721085175
      10001  1978-09-02  Georgi          Facello           M       2006-06-26  9234728034  9234728034
      10001  1978-09-02  Georgi          Facello           M       2006-06-26  9234728034  9234728034
      10004  1979-05-01  Chirstian       Koblick           M       2006-12-01  9737926154  9126817150

create table as select teradata

In teradata you can create table as select query output. This is also called as CTAS in short form. You can create table as select with data or without data in teradata.

Create table as select from another table with data

create table emp_ctas1 as (select * from emp_multiset) with data;

 *** Table has been created.
 *** Total elapsed time was 1 second.

Check rows of CTAS table just created

select * from emp_ctas1;

 *** Query completed. 5 rows found. 8 columns returned.
 *** Total elapsed time was 1 second.

     emp_no  birth_date  first_name      last_name         gender  hire_date  phone_no    alt_phone_no
-----------  ----------  --------------  ----------------  ------  ---------  ----------  ------------
      10003    84/12/03  Parto           Bamford           M        06/08/28  9622254431  9520937883
      10002    89/06/02  Bezalel         Simmel            F        05/11/21  9564989814  9564989814
      10005    80/01/21  Kyoichi         Maliniak          M        09/09/12  9420307319  9721085175
      10001    78/09/02  Georgi          Facello           M        06/06/26  9234728034  9234728034
      10004    79/05/01  Chirstian       Koblick           M        06/12/01  9737926154  9126817150

Any guesses why there are just 5 rows where as source table has 6 rows ? Yes, you guessed it right. Since default table type is SET , the new table discards the duplicate records. Also let’s check the table structure.

show table emp_ctas1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.


CREATE SET TABLE DB_EMP.emp_ctas1 ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      emp_no INTEGER,
      birth_date DATE FORMAT 'YY/MM/DD',
      first_name VARCHAR(14) CHARACTER SET LATIN NOT CASESPECIFIC,
      last_name VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      gender CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      hire_date DATE FORMAT 'YY/MM/DD',
      phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      alt_phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( emp_no );

So you can see all the table/column options we have set earlier are gone. Like table is converted to SET date formats are converted to ‘YY/MM/DD’ in place of ‘YYYY-MM-DD’. Also default value for column gender is lost too. So be careful while creating a table from another table.

create table from another table with no data

create table emp_ctas2 as (select * from emp_multiset) with no data;

 *** Table has been created.
 *** Total elapsed time was 1 second.

teradata create table as select with data primary index

create table emp_ctas3 as (select * from emp_multiset) with data primary index (phone_no);

 *** Table has been created.
 *** Total elapsed time was 1 second.

check DDL for recently created table

show table emp_ctas3;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.


CREATE SET TABLE DB_EMP.emp_ctas3 ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      emp_no INTEGER,
      birth_date DATE FORMAT 'YY/MM/DD',
      first_name VARCHAR(14) CHARACTER SET LATIN NOT CASESPECIFIC,
      last_name VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      gender CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      hire_date DATE FORMAT 'YY/MM/DD',
      phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      alt_phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( phone_no );

There is one way of creating table from another table and keeping most of the table/column properties.

create table emp_ctas4 as emp_multiset with data primary index (phone_no);

 *** Table has been created.
 *** Total elapsed time was 1 second.

Let’s see the table structure for this new table

show table emp_ctas4;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

CREATE MULTISET TABLE DB_EMP.emp_ctas4 ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      emp_no INTEGER NOT NULL,
      birth_date DATE FORMAT 'yyyy-mm-dd',
      first_name VARCHAR(14) CHARACTER SET LATIN NOT CASESPECIFIC,
      last_name VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC,
      gender CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT 'M',
      hire_date DATE FORMAT 'y4-mm-dd',
      phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      alt_phone_no VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( phone_no );

So this manner of creating table retains most of the table/column options like Multiset table, DATE FORMAT, GENDER column default value. However it may not be best way to create table if you just want subset of columns in the new table.

That’s it. I think this is sufficient info to get you started on CREATING tables in Teradata. Check out the video below if you prefer watching than reading


Teradata Volatile Table Syntax with Example

volatile table in teradata

Volatile tables are session-specific tables which stores intermediate data and once the session is over, all the data and table definition is removed. Volatile Tables uses SPOOL SPACE. In Teradata, use keyword “VOLATILE” in CREATE statement to create a volatile table.

Teradata VOLATILE syntax

CREATE VOLATILE TABLE TABLE_NAME
 (
 COLUMN_1 DATATYPE,
 COLUMN_2 DATATYPE
 ) PRIMARY INDEX (COLUMN_1)
 ON COMMIT [PRESERVE/DELETE] ROWS;

Options/Settings for Volatile Tables:

LOG/NO LOG:

If Volatile table is created with LOG option, then it will use Transient Journal to save “BEFORE IMAGES” of all the transactions. Since the data is intermediate and can be generated again in case of failures I strongly recommend to go ahead with NO LOG option. This is one simple thing you can do to improve performance of volatile tables. Default is LOG option.

ON COMMIT [PRESERVE/DELETE] ROWS:

This is very important option and most of the time you will be using PRESERVE option. If PRESERVE option is kept then Volatile Table will store the data after each transaction however if the DELETE option is kept then it will lose the data once the transaction is completed. Default is “ON COMMIT DELETE ROWS” hence by default volatile table retains data at transaction level and not session level. So you must make it “ON COMMIT PRESERVE ROWS”

Options NOT available for VOLATILE Tables:

  1. Permanent Journals
  2. Foreign Key
  3. Compression at Column Level
  4. Default Values
  5. Name of Indexes

Volatile Table Example

CREATE VOLATILE TABLE SALARY_STATS ,
NO LOG
 (
 EMP_ID INTEGER,
 HIGHEST_SAL INTEGER,
 AVG_SAL INTEGER,
 LOWEST_SAL INTEGER
 )
 PRIMARY INDEX (HIGHEST_SAL)
 ON COMMIT PRESERVE ROWS;

Once the session is over, all the data and table definition is lost and is never saved in Data Dictionary. Volatile Tables used Spool Space and once session is over , the occupied Spool space is again available for some other transactions.

VOLATILE table with DATA

In most of the ETL use-cases you will create a volatile table from another existing table. If you are creating a table with another table and you want to get records then you have to use “WITH DATA” clause in volatile table CREATE statement. Also if you want to specify different primary index then you can mention it after “with data” clause.

teradata create volatile table as select

CREATE VOLATILE TABLE SALARY_STATS 
 AS 
 (
 SELECT EMP_ID ,
 max(salary) AS HIGHEST_SAL ,
 avg(salary) AS AVG_SAL ,
 min(salary) AS LOWEST_SAL 
 FROM employee
 GROUP BY emp_id
 )
 WITH DATA 
 PRIMARY INDEX (HIGHEST_SAL)
 ON COMMIT PRESERVE ROWS;

To check all the volatile table created in current session use below command:

HELP VOLATILE TABLE;

Teradata Volatile Table vs Global Temporary Table

  1. Both the Volatile & Global Temporary table retains session-specific data only. Once session completes data is removed from table.
  2. Global Temporary table retains table definition permanently. However for volatile DDL is removed too. So you can use Temporary table in another session without the need to create it all the time.
  3. Volatile table use SPOOL space where as Temporary table use TEMPORARY space (basically un-used Permanent Space).
  4. In earlier version , collect stats on Volatile was not supported but now it is for both kind of tables. For Temporary table, use keyword “TEMPORARY” before tablename in collect stats.

Leave a Reply

Your email address will not be published. Required fields are marked *