We receive many requests to cover Teradata utilities on this blog. I have mostly held off because other blogs already cover them well. For one utility in particular, though, I find the resources available online hard to follow: Teradata TPT, or Teradata Parallel Transporter. This post is my attempt to make it simple and easy to understand.
What is Teradata TPT?
TPT is short for Teradata Parallel Transporter. It is a single utility that internally calls the other Teradata utilities, depending on the operators you specify. So you no longer have to write separate scripts for different utilities for data ingestion or data export; one utility replaces them all.
If you want to bulk load data into an empty table, you can write a TPT script that invokes FASTLOAD internally and loads the data. Similarly, TPT uses MULTILOAD, TPUMP and of course BTEQ under the covers for other tasks. In TPT, every activity has an associated operator. You can think of an operator as the role a component plays in the job; different operators play different roles.
TPT Operators in Teradata
- BTEQ – DDL Operator
- FASTLOAD – LOAD Operator
- MULTILOAD – UPDATE Operator
- FASTEXPORT – EXPORT Operator
- TPUMP – STREAM Operator
A Producer operator reads from a source, e.g. reading rows from a data file.
A Consumer operator writes to a target, e.g. writing extracted data into a data file or loading rows into a table.
Job Steps are the execution blocks of the script; each step applies one or more operators to producers or consumers.
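To see how these pieces fit together, here is about the smallest job you can write: one DDL operator and one step that runs a CREATE TABLE. The system name, credentials and object names below are made-up placeholders for illustration; the file-to-table example that follows adds a producer and the full APPLY ... SELECT FROM OPERATOR pattern.
DEFINE JOB HELLO_TPT
DESCRIPTION 'Smallest possible TPT job: a single DDL step'
(
  DEFINE OPERATOR ddl_op
  TYPE DDL
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'mytdpid',      /* placeholder Teradata system name */
    VARCHAR UserName     = 'myuser',       /* placeholder credentials          */
    VARCHAR UserPassword = 'mypassword'
  );
  STEP stCreate_Table
  (
    APPLY ('CREATE TABLE demo_tbl (col1 VARCHAR(10));')
    TO OPERATOR (ddl_op);
  );
);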
So let's start with a simple example: we will read data from a file and load it into an empty target table. Before TPT was introduced we would have written a FASTLOAD script for this scenario; today we will write a TPT script instead. So we have a data file that we want to load and an empty target table.
Teradata TPT Script to load from file to table
Step 1: Define FILE Schema and Producer Operator to read files
DEFINE SCHEMA SCHEMA_EMP_NAME
(
  NAME VARCHAR(50),
  AGE VARCHAR(2)
);
DEFINE OPERATOR op_EMP_NAME
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_EMP_NAME
ATTRIBUTES
(
  VARCHAR DirectoryPath = '',
  VARCHAR FileName = 'NAME.CSV',
  VARCHAR Format = 'Delimited',
  VARCHAR OpenMode = 'Read',
  VARCHAR TextDelimiter = '|'
);
Perfect. We have defined the file schema, which consists of only two columns, Name and Age. We have also defined a PRODUCER operator specifying the file details: directory path, file name, format, delimiter and so on. More attributes are available, but to keep it simple we will stick to these few here.
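As one example of those extra options, if your file has a header row you can ask the DataConnector to skip it. Whether the SkipRows attribute is available depends on your TPT version, so treat the snippet below as an illustration rather than a definitive list:
DEFINE OPERATOR op_EMP_NAME
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_EMP_NAME
ATTRIBUTES
(
  VARCHAR DirectoryPath = '',
  VARCHAR FileName = 'NAME.CSV',
  VARCHAR Format = 'Delimited',
  VARCHAR OpenMode = 'Read',
  VARCHAR TextDelimiter = '|',
  INTEGER SkipRows = 1               /* skip the header row, if the file has one */
);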
Step 2: Define a DDL operator to drop/create Error tables and Target Table.
DEFINE OPERATOR od_EMP_NAME
TYPE DDL
ATTRIBUTES
(
  VARCHAR PrivateLogName = '',
  VARCHAR LogonMech = '',
  VARCHAR TdpId = '',
  VARCHAR UserName = '',
  VARCHAR UserPassword = '',
  VARCHAR ErrorList = '3807'
);
We have now defined a DDL operator which will drop the error tables if they already exist and re-create the target table. In our case the target table must be empty, so we simply recreate it every time. Error code 3807 (object does not exist) is listed in ErrorList so the step does not fail when a DROP is issued for a table that is not there yet.
Note: Fill in the details for LogonMech/TDPID/USERNAME/PASSWORD.
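Purely for illustration, a filled-in version of the DDL operator might look like the snippet below. The logon mechanism, TDPID and credentials are made-up placeholders; use whatever your site provides.
DEFINE OPERATOR od_EMP_NAME
TYPE DDL
ATTRIBUTES
(
  VARCHAR PrivateLogName = 'ddl_log',
  VARCHAR LogonMech = 'LDAP',            /* or TD2, depending on your site   */
  VARCHAR TdpId = 'tdprod',              /* placeholder system name / TDPID  */
  VARCHAR UserName = 'etl_user',         /* placeholder credentials          */
  VARCHAR UserPassword = 'etl_password',
  VARCHAR ErrorList = '3807'
);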
Step 3: Define a LOAD operator to load the data file into target table.
DEFINE OPERATOR ol_EMP_NAME
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
  VARCHAR LogonMech = '',
  VARCHAR TdpId = '',
  VARCHAR UserName = '',
  VARCHAR UserPassword = '',
  VARCHAR LogTable = 'EMP_NAME_LG',
  VARCHAR ErrorTable1 = 'EMP_NAME_ET',
  VARCHAR ErrorTable2 = 'EMP_NAME_UV',
  VARCHAR TargetTable = 'EMP_NAME'
);
We have now defined the LOAD operator and specified the target table, the log table and the two error tables.
Note: Fill in the details for LogonMech/TDPID/USERNAME/PASSWORD.
Step 4: Define Job Step to execute DDL Operator
STEP stSetup_Tables
(
  APPLY
  ('DROP TABLE EMP_NAME_LG;'),
  ('DROP TABLE EMP_NAME_ET;'),
  ('DROP TABLE EMP_NAME_UV;'),
  ('DROP TABLE EMP_NAME;'),
  ('CREATE TABLE EMP_NAME(NAME VARCHAR(50), AGE VARCHAR(2));')
  TO OPERATOR (od_EMP_NAME);
);
Step 5: Define Job Step to execute LOAD operator
STEP stLOAD_FILE_NAME
(
  APPLY
  ('INSERT INTO EMP_NAME (Name,Age) VALUES (:Name,:Age);')
  TO OPERATOR (ol_EMP_NAME)
  SELECT * FROM OPERATOR (op_EMP_NAME);
);
Steps Taken to Create a TPT script
- Define FILE Schema and Producer Operator to read files.
- Define a DDL operator to drop/create Error tables and Target Table.
- Define a LOAD operator to load the data file into target table.
- Define Job Step to execute DDL Operator
- Define Job Step to execute LOAD operator
Putting it all together and giving the job a name:
DEFINE JOB SAMPLE_FILE_LOAD
DESCRIPTION 'Load a Teradata table from a file'
(
  DEFINE SCHEMA SCHEMA_EMP_NAME
  (
    EMP_NAME VARCHAR(50),
    AGE VARCHAR(2)
  );
  DEFINE OPERATOR op_EMP_NAME
  TYPE DATACONNECTOR PRODUCER
  SCHEMA SCHEMA_EMP_NAME
  ATTRIBUTES
  (
    VARCHAR DirectoryPath = '',
    VARCHAR FileName = 'EMP_NAME.CSV',
    VARCHAR Format = 'Delimited',
    VARCHAR OpenMode = 'Read',
    VARCHAR TextDelimiter = '|'
  );
  DEFINE OPERATOR od_EMP_NAME
  TYPE DDL
  ATTRIBUTES
  (
    VARCHAR LogonMech = '',
    VARCHAR TdpId = '',
    VARCHAR UserName = '',
    VARCHAR UserPassword = '',
    VARCHAR ErrorList = '3807'
  );
  DEFINE OPERATOR ol_EMP_NAME
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR LogonMech = '',
    VARCHAR PrivateLogName = 'load_log',
    VARCHAR TdpId = '',
    VARCHAR UserName = '',
    VARCHAR UserPassword = '',
    VARCHAR LogTable = 'EMP_NAME_LG',
    VARCHAR ErrorTable1 = 'EMP_NAME_ET',
    VARCHAR ErrorTable2 = 'EMP_NAME_UV',
    VARCHAR TargetTable = 'EMP_NAME'
  );
  STEP stSetup_Tables
  (
    APPLY
    ('DROP TABLE EMP_NAME_LG;'),
    ('DROP TABLE EMP_NAME_ET;'),
    ('DROP TABLE EMP_NAME_UV;'),
    ('DROP TABLE EMP_NAME;'),
    ('CREATE TABLE EMP_NAME(EMP_NAME VARCHAR(50), AGE VARCHAR(2));')
    TO OPERATOR (od_EMP_NAME);
  );
  STEP stLOAD_FILE_NAME
  (
    APPLY
    ('INSERT INTO EMP_NAME (EMP_NAME,Age) VALUES (:EMP_NAME,:Age);')
    TO OPERATOR (ol_EMP_NAME)
    SELECT * FROM OPERATOR (op_EMP_NAME);
  );
);
How do I run a Teradata TPT script in UNIX?
You run a TPT script by invoking the tbuild command and passing the script file to it as a parameter:
tbuild -f TPT_Scriptname.tpt
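A couple of other tbuild options are handy (exact behaviour can vary a little across TPT versions, so take these as a sketch): -j gives the job a name, and -u passes job variables from the command line, which your script can then reference as @UserName, @UserPassword and so on. The user name and password below are placeholders.
tbuild -f TPT_Scriptname.tpt -j EMP_NAME_LOAD
tbuild -f TPT_Scriptname.tpt -u "UserName='etl_user', UserPassword='etl_password'"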
Whenever you execute a TPT job, a .out file is created, which acts as the log file for the run. It is a binary file, so you cannot read its contents with cat, more or any other UNIX command, nor in Notepad or any other text editor.
How to view the TPT log file?
To view the TPT log file, use the tlogview command and pass the log file as the input parameter:
tlogview -l TPT_Script.out
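If you know the job name/ID that tbuild printed when the job started, tlogview can also pull the logs by that ID. The flags below may differ slightly between TPT versions, so treat this as an illustration (JOB_NAME-nn stands for that job ID):
tlogview -j JOB_NAME-nn -f "*"      # show all logs captured for that job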
So we have created a basic TPT script and successfully executed it to load data from a file into a target table. I have tried to keep this post simple, so I have explained many things in my own words, which may differ from other books or materials on TPT. However, if you understand the concepts here, you should be able to relate them to what others have written about TPT.
How can I use TPT with mini batches, as my tables are already populated?
Also, can you please write a post about Batch Directory Scan?
What if the input file is a .txt file and the data looks like this:
"Key"|"Region"|"RegionDescription"
"1000000"|"ASIA"|"Asia"
"1000001"|"EU"|"Europe"
How do I specify the delimiter?
What if the input file is a .txt file and the data looks like this:
"Record_Key"|"Level_1"|"Level_1_Description"
"1000000"|"ASIA"|"Asia"
"1000001"|"EU"|"Europe"
How do I specify the delimiter?
You have to clean the data: run a UNIX shell script to eliminate the " at the beginning and end of each field, and after that use "|" as the delimiter.
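For example, a simple sed one-liner can strip the quotes before the load (the file names here are just placeholders):
# remove every double quote from the extract, leaving a plain pipe-delimited file
sed 's/"//g' region_raw.txt > region_clean.txt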
What do you think will be the fastest way to load almost a trillion rows from multiple csv.gz files, about 3000 files in S3? I've tried tbuild, but it seems that the files from S3 are being downloaded sequentially. I also noticed there is only one executor process. Is there a way to get more parallelism? It took 29 minutes to load 225,036,274 rows.
The information provided in this post regarding TPT is very helpful. Thanks to the team.
Can you please share info on how to use TPT to copy data from a table on one TD server to a table on a different TD server?
Nice tutorial. The Teradata tutorial was helpful for me. Keep sharing tutorials.
Dear Author,
Could you please help me to get a script where the data source is an Oracle table:
That is, I have to extract data from an Oracle database table and load it into a Teradata database table without landing it in a file.
Much appreciated for your support.
Thanks & Regards
AK
Hi,
I am also looking for a similar script to load a Teradata table using an ODBC connection as the source. Please let me know if you already have a working solution for it.
Thanks for such a good article. Simple scripting style.
Glad you liked it Ganesh.
Cheers
Raj
Hi,
I need help with a TPT script for loading data from a flat file into a Teradata table.
Eg:
STEP load_data
(
APPLY
$INSERT
TO OPERATOR (LOAD_OPERATOR () [1] )
SELECT * FROM OPERATOR (EXPORT_OPERATOR() [1] );
);
How do I pass the SELECT * FROM statement as a parameter?
eg:
STEP load_data
(
APPLY
$INSERT
TO OPERATOR (LOAD_OPERATOR () [1] )
@SELECTSQL OPERATOR (EXPORT_OPERATOR() [1] );
);
I am using this command to execute the script:
tbuild -f FF_TGT_TBL_LOAD.txt -u "TgtDatabase = 'DEV_XXX_XXX', TgtTable = 'FF_ABC_XYZ', ,SELECTSQL='SELECT * FROM ' "
But it keeps failing. Could anyone please help me fix this?
Hello, this was extremely helpful. Can you please also provide a sample TPT script where, instead of one file (as shown above), we use multiple (let's say 3) data files to load data into one empty table? I have searched the internet but there's no example as easy to follow as yours. Greatly appreciate your help!
Hi Zoya.
Please check our new post:
https://etl-sql.com/load-multiple-files-one-single-table-using-tpt-teradata/
This should help.
Cheers
Raj
Hi Kiran,
Could you please give an example of how we can export data from Teradata to an HDFS (Hadoop Distributed File System) location?
Thanks,
Ranjit
Can you give an example of a TPT upsert, using ODBC as the source?
Hi Raj
I am having issues running the script you shared.
I am getting this error:
TPT_INFRA: Syntax error at or near line 6 of Job Script File 'TPTScript.tpt':
TPT_INFRA: At "NAME" missing RPAREN_ in Rule: Explicit Schema Element List
TPT_INFRA: Syntax error at or near line 8 of Job Script File 'TPTScript.tpt':
My script is as below:
DEFINE JOB SAMPLE_FILE_LOAD
DESCRIPTION 'Load a Teradata table from a file'
(
DEFINE SCHEMA SCHEMA_EMP_NAME
(
NAME VARCHAR(50),
AGE VARCHAR(2)
);
DEFINE OPERATOR op_EMP_NAME
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_EMP_NAME
ATTRIBUTES
(
VARCHAR DirectoryPath= '/TPT/',
VARCHAR FileName = 'NAME.CSV*',
VARCHAR Format = 'Delimited',
VARCHAR OpenMode = 'Read',
VARCHAR TextDelimiter ='|'
);
DEFINE OPERATOR od_EMP_NAME
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'ddl_log',
VARCHAR LogonMech = 'LDAP',
VARCHAR TdpId = '',
VARCHAR UserName = '',
VARCHAR UserPassword = '',
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR ol_EMP_NAME
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
VARCHAR LogonMech = 'LDAP',
VARCHAR TdpId = '',
VARCHAR UserName = '',
VARCHAR UserPassword = '',
VARCHAR LogTable = 'EMP_NAME_LG',
VARCHAR ErrorTable1 = 'EMP_NAME_ET',
VARCHAR ErrorTable2 = 'EMP_NAME_UV',
VARCHAR TargetTable = 'EMP_NAME'
);
STEP stSetup_Tables
(
APPLY
('DROP TABLE EMP_NAME_LG;'),
('DROP TABLE EMP_NAME_ET;'),
('DROP TABLE EMP_NAME_UV;'),
('DROP TABLE EMP_NAME;'),
('CREATE TABLE SCHEMA_EMP_NAME.EMP_NAME(NAME VARCHAR(50), AGE VARCHAR(2));')
TO OPERATOR (od_EMP_NAME);
);
STEP stLOAD_FILE_NAME
(
APPLY
('INSERT INTO SCHEMA_EMP_NAME.EMP_NAME
(Name,Age)
VALUES
(:Name,:Age);'
)
TO OPERATOR (ol_EMP_NAME)
SELECT * FROM OPERATOR(op_EMP_NAME);
);
);
Please advise what changes need to be made to the shared script for it to work, as the script itself is creating the table but the schema name is not defined.
Hi Kshitij
I executed a similar TPT script in different TPT versions and got the error in v15.00.
Below is the complete Error Description.
"Teradata Parallel Transporter Version 15.00.00.00
TPT_INFRA: Syntax error at or near line 6 of Job Script File 'TPT_Load.tpt':
TPT_INFRA: At "NAME" missing RPAREN_ in Rule: Explicit Schema Element List
TPT_INFRA: Syntax error at or near line 8 of Job Script File 'TPT_Load.tpt':
TPT_INFRA: TPT03020: Rule: DEFINE SCHEMA
Compilation failed due to errors. Execution Plan was not generated.
Job script compilation failed.
Job terminated with status 8."
Looks like you are also using the same TPT version. To overcome this error, all I did was change NAME to EMP_NAME in the DEFINE SCHEMA section. So the new schema definition looks like this:
DEFINE SCHEMA SCHEMA_EMP_NAME
(
EMP_NAME VARCHAR(50),
AGE VARCHAR(2)
);
After this change the TPT script executed successfully.
Let me know the results once you try this change.
Cheers
Raj
This is fantastic. It is very true that there are not many examples on TPT online. This explanation of yours above, of the whole process is concise, accurate and what more?!, you actually have given me a dead on example of what I was looking for, a file to import into a CSV. It really takes passion to write articles to help others out. Thank you for your hard work and keep up the good work!
Correction…….I meant an import from a csv.
Hi Karan
Glad you like the post. Let us know if you are looking for any specific topic in Teradata. We will try to cover it in our future posts.
Cheers
Nitin
Hi Raj,
I do have a request (took you up on it very quickly there :D). I'm trying to assign the output value of an OS command to a variable so that I can insert it into a table. Essentially, all I'm trying to do is count the number of lines in a CSV, load the file, and then check whether all the records made it into the database. I could always use the error tables to judge that, but it is a standard where we work, so I'm trying to count the lines in the CSV using PowerShell, and once I have that count, I want to be able to insert it into an audit table.
Thanks,
Karan
Hi Karan
I added the two steps below to the TPT script to get the data file row count:
DEFINE OPERATOR OS_COMMAND_OPERATOR
TYPE OS COMMAND
ATTRIBUTES
(
VARCHAR PrivateLogName = 'oscommand_log',
VARCHAR OsCmd = 'wc -l NAME.CSV|cut -d" " -f1>count.txt',
VARCHAR IgnoreError = 'YES'
);
STEP Count_Row_Data_File
(
APPLY TO OPERATOR (OS_COMMAND_OPERATOR);
);
Now you can read count.txt to check the number of records in the file. You cannot load that value into a table directly from this step, because the OS Command operator is not allowed to perform Teradata operations.
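One possible work-around (an untested sketch, with made-up table and operator names) is to treat count.txt itself as a one-row data file: read it back with another DataConnector producer and insert the value into your audit table through the SQL Inserter operator in a later step, something like this:
DEFINE SCHEMA SCHEMA_ROW_COUNT
(
  ROW_COUNT VARCHAR(20)
);
DEFINE OPERATOR op_ROW_COUNT
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_ROW_COUNT
ATTRIBUTES
(
  VARCHAR DirectoryPath = '',
  VARCHAR FileName = 'count.txt',
  VARCHAR Format = 'Delimited',
  VARCHAR OpenMode = 'Read',
  VARCHAR TextDelimiter = '|'
);
DEFINE OPERATOR oi_AUDIT
TYPE INSERTER                        /* SQL Inserter: plain SQL sessions, no load slot */
ATTRIBUTES
(
  VARCHAR TdpId = '',
  VARCHAR UserName = '',
  VARCHAR UserPassword = ''
);
STEP stLoad_Audit
(
  APPLY ('INSERT INTO file_audit (file_name, row_count) VALUES (''NAME.CSV'', :ROW_COUNT);')   /* file_audit is a made-up audit table */
  TO OPERATOR (oi_AUDIT)
  SELECT * FROM OPERATOR (op_ROW_COUNT);
);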
Let me know if you find any other work around to achieve this.
Thanks
Raj