What,Why and How of Encryption in Teradata – Basics and Examples with Syntax

teradata

What is encryption ? Why we need encryption ? How do we implement encryption ? One must clearly know answer to these 3 questions before actually implementing encryption in any datawarehouse environment.
Encryption is the process by which we change the actual meaningful value to some other meaningless value so that any person who reads it cannot understand it. For example if I say “VGTCFCVC” it won’t make any sense to you however if I share actual value “TERADATA” then you’ll immediately understand it. We may use any algorithm to convert one value to other ranging from very simple to extreme complex. In this case I just moved each alphabet to 2 places so
T becomes V (T,U,V)
E becomes G (E,F,G)
R becomes T (R,S,T)
A becomes C (A,B,C)
D becomes F (D,E,F)
hence TERADATA becomes VGTCFCVC.
In any Datawarehouse environment there are so many columns which store sensitive or personal information which must be hidden from developers or testers working on building/maintaining the data warehouse. Also in case of data hacks, since the personal information is encrypted it is very challenging for the hacker to get the real value. The information can be anything like a customer credit card information or pin to his Internet banking portal etc. Such information must be kept in encrypted form in order to avoid huge losses to company or person in case of data hacks.
There are many ways to implement Encryption in Teradata. We are not talking about encrypting the login password which can be done using Teradata Wallet. Here we are limiting Encryption to the actual DATA values stored in Teradata tables.
Creating Encryption Decryption functions
This is one of the most common method to implement Encryption in Teradata. It requires a User Defined Function (typically written in C,C++ or even JAVA in new TD versions) which is called to encrypt the sensitive values. All the SQL operations are done on encrypted values only in data warehouse. There is Decryption function available too. The Decryption is generally done in Teradata while extracting data in order to share it with downstream systems which expect clear values. A UDF can be SQL based or not. It depends on the requirement. Let us quickly look at a simple example to get better understanding of how encryption/decryption works in Teradata.

Total 4 components involved:
1) Encryption Function
2) Decryption Function
3) Staging Table with Clear values
4) Target Table with encrypted values

We will start with creating 2 new USER DEFINED FUNCTIONS: One for Encryption & Second for Decryption.

REPLACE FUNCTION encrypt_salary (varvalue BIGINT)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN ((varvalue* 20)/2 + 85)
;

The above encryption function will take employee salary as input and will convert it into some other value by using simple algorithm. Similarly we will create another function to decrypt the salary to produce actual salary of any employee.

REPLACE FUNCTION decrypt_salary (varvalue BIGINT)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN ((varvalue – 85)*2) /20
;

In the above function we have just done the reverse of the algorithm which is used for encryption in order to get actual values.

Note:
Encryption function must always be used over Actual Values.
Decryption function must always be used over Encrypted values.

Let us create 2 tables now :

CREATE TABLE stg_emp_details
(
emp_id SMALLINT,
emp_name VARCHAR(30),
emp_mobile VARCHAR(10),
emp_salary INTEGER
)
UNIQUE PRIMARY INDEX(emp_id);

CREATE TABLE tgt_emp_details
(
emp_id SMALLINT,
emp_name VARCHAR(30),
emp_mobile VARCHAR(10),
emp_salary INTEGER
)
UNIQUE PRIMARY INDEX(emp_id);

Now insert some data into the table:

INSERT INTO stg_emp_details SEL 10001,’Richard’,’878787987′,1000;
INSERT INTO stg_emp_details SEL 10002,’Jason’,’77777777′,1500;
INSERT INTO stg_emp_details SEL 10003,’Rony’,’8888888′,2000;
INSERT INTO stg_emp_details SEL 10004,’Rob’,’9999999′,2300;

Check the data once:

SEL emp_id, emp_name,emp_mobile,emp_salary FROM stg_emp_details ORDER BY 1;

emp_idemp_nameemp_mobileemp_salary
10001Richard8787879871000
10002Jason777777771500
10003Rony88888882000
10004Rob99999992300

Now while inserting data from Staging to Target table, encrypt the salary of the employee using the encryption function:

INSERT INTO tgt_emp_details
SEL emp_id, emp_name,emp_mobile,encrypt_salary(emp_salary) AS emp_salary
FROM stg_emp_details;

Check the data once:

SEL emp_id, emp_name,emp_mobile,emp_salary FROM tgt_emp_details ORDER BY 1;

emp_idemp_nameemp_mobileemp_salary
10001Richard87878798710085
10002Jason7777777715085
10003Rony888888820085
10004Rob999999923085

You can see that salary in TGT table is encrypted while in STG table it is clear. If the requirement is to send the data to downstream and that too real values then just use the decryption function while extracting the data.

SEL emp_id, emp_name,emp_mobile,decrypt_salary(emp_salary) AS emp_salary FROM tgt_emp_details ORDER BY 1;

This is very simple example to understand how we can implement encryption/decryption in Teradata tables. This approach is called ‘in-database’ approach for encryption.

Advantages of the “in database” encryption approach:
1) Data resides in Teradata environment only so it is more secure.
2) Very Simple approach to execute and process the data.

Disadvantages of the “in database” encryption approach:
1) Additional CPU resources are used for encryption/decryption.
2) Performance impact cannot be measured for all the process using the functions prior to actual run.
3) Development of User Defined Function will require some effort.
4) Testing of User Defined Function will require some effort.

7 thoughts on “What,Why and How of Encryption in Teradata – Basics and Examples with Syntax”

    1. Hi Raja

      I can think of few options in this case.
      1) You can definitely go for custom encrypt/decrypt UDF and apply on the decimal columns.
      2) If you want some users to see proper values and others not then I will recommend to introduce a view which depending on the user/group/role allows the user to see clear or encrypted values.

      If it is handful columns then you can manually do it else you may want to join with dbc.columns and get datatype and apply functions.


      Raj

  1. Hi – For example, what if a user enters a query to list all the tables fromt eh db – SELECT DatabaseName,
    TableName,
    CreateTimeStamp,
    LastAlterTimeStamp
    FROM DBC.TablesV
    WHERE TableKind = ‘T’
    and DatabaseName = ‘DBC’
    ORDER BY TableName;

    will it list both tables – staging & target table? do we have to maintain 2 tables for all the sensitive parameters we are trying to protect?

  2. Hi – For example, what if a user enters a query to list all the tables fromt eh db – SELECT DatabaseName,
    TableName,
    CreateTimeStamp,
    LastAlterTimeStamp
    FROM DBC.TablesV
    WHERE TableKind = ‘T’
    and DatabaseName = ‘DBC’
    ORDER BY TableName;

    will it list both tables – staging & target table? do we have to maintain 2 tables for all the sensitive parameters we are trying to protect?

  3. We are planning to use disk-level encryption and block level compression. Will the % of compression change when applied to encrypted data?
    Thanks

    1. Yes. It will read encrypted data. In such cases make sure if you are joining on encrypted columns then both the table participating in join should have encrypted column.

Leave a Reply

Your email address will not be published. Required fields are marked *