What is encryption ? Why we need encryption ? How do we implement encryption ? One must clearly know answer to these 3 questions before actually implementing encryption in any datawarehouse environment.
Encryption is the process by which we change the actual meaningful value to some other meaningless value so that any person who reads it cannot understand it. For example if I say "VGTCFCVC" it won't make any sense to you however if I share actual value "TERADATA" then you'll immediately understand it. We may use any algorithm to convert one value to other ranging from very simple to extreme complex. In this case I just moved each alphabet to 2 places so
T becomes V (T,U,V)
E becomes G (E,F,G)
R becomes T (R,S,T)
A becomes C (A,B,C)
D becomes F (D,E,F)
hence TERADATA becomes VGTCFCVC.
In any Datawarehouse environment there are so many columns which store sensitive or personal information which must be hidden from developers or testers working on building/maintaining the data warehouse. Also in case of data hacks, since the personal information is encrypted it is very challenging for the hacker to get the real value. The information can be anything like a customer credit card information or pin to his Internet banking portal etc. Such information must be kept in encrypted form in order to avoid huge losses to company or person in case of data hacks.
There are many ways to implement Encryption in Teradata. We are not talking about encrypting the login password which can be done using Teradata Wallet. Here we are limiting Encryption to the actual DATA values stored in Teradata tables.
Creating Encryption Decryption functions
This is one of the most common method to implement Encryption in Teradata. It requires a User Defined Function (typically written in C,C++ or even JAVA in new TD versions) which is called to encrypt the sensitive values. All the SQL operations are done on encrypted values only in data warehouse. There is Decryption function available too. The Decryption is generally done in Teradata while extracting data in order to share it with downstream systems which expect clear values. A UDF can be SQL based or not. It depends on the requirement. Let us quickly look at a simple example to get better understanding of how encryption/decryption works in Teradata.
Total 4 components involved:
1) Encryption Function
2) Decryption Function
3) Staging Table with Clear values
4) Target Table with encrypted values
We will start with creating 2 new USER DEFINED FUNCTIONS: One for Encryption & Second for Decryption.
REPLACE FUNCTION encrypt_salary (varvalue BIGINT)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN ((varvalue* 20)/2 + 85)
;
The above encryption function will take employee salary as input and will convert it into some other value by using simple algorithm. Similarly we will create another function to decrypt the salary to produce actual salary of any employee.
REPLACE FUNCTION decrypt_salary (varvalue BIGINT)
RETURNS INT
LANGUAGE SQL
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN ((varvalue – 85)*2) /20
;
In the above function we have just done the reverse of the algorithm which is used for encryption in order to get actual values.
Note:
Encryption function must always be used over Actual Values.
Decryption function must always be used over Encrypted values.
Let us create 2 tables now :
CREATE TABLE stg_emp_details
(
emp_id SMALLINT,
emp_name VARCHAR(30),
emp_mobile VARCHAR(10),
emp_salary INTEGER
)
UNIQUE PRIMARY INDEX(emp_id);CREATE TABLE tgt_emp_details
(
emp_id SMALLINT,
emp_name VARCHAR(30),
emp_mobile VARCHAR(10),
emp_salary INTEGER
)
UNIQUE PRIMARY INDEX(emp_id);
Now insert some data into the table:
INSERT INTO stg_emp_details SEL 10001,'Richard','878787987',1000;
INSERT INTO stg_emp_details SEL 10002,'Jason','77777777',1500;
INSERT INTO stg_emp_details SEL 10003,'Rony','8888888',2000;
INSERT INTO stg_emp_details SEL 10004,'Rob','9999999',2300;
Check the data once:
SEL emp_id, emp_name,emp_mobile,emp_salary FROM stg_emp_details ORDER BY 1;
emp_id | emp_name | emp_mobile | emp_salary |
10001 | Richard | 878787987 | 1000 |
10002 | Jason | 77777777 | 1500 |
10003 | Rony | 8888888 | 2000 |
10004 | Rob | 9999999 | 2300 |
Now while inserting data from Staging to Target table, encrypt the salary of the employee using the encryption function:
INSERT INTO tgt_emp_details
SEL emp_id, emp_name,emp_mobile,encrypt_salary(emp_salary) AS emp_salary
FROM stg_emp_details;
Check the data once:
SEL emp_id, emp_name,emp_mobile,emp_salary FROM tgt_emp_details ORDER BY 1;
emp_id | emp_name | emp_mobile | emp_salary |
10001 | Richard | 878787987 | 10085 |
10002 | Jason | 77777777 | 15085 |
10003 | Rony | 8888888 | 20085 |
10004 | Rob | 9999999 | 23085 |
You can see that salary in TGT table is encrypted while in STG table it is clear. If the requirement is to send the data to downstream and that too real values then just use the decryption function while extracting the data.
SEL emp_id, emp_name,emp_mobile,decrypt_salary(emp_salary) AS emp_salary FROM tgt_emp_details ORDER BY 1;
This is very simple example to understand how we can implement encryption/decryption in Teradata tables. This approach is called 'in-database' approach for encryption.
Advantages of the “in database” encryption approach:
1) Data resides in Teradata environment only so it is more secure.
2) Very Simple approach to execute and process the data.
Disadvantages of the “in database” encryption approach:
1) Additional CPU resources are used for encryption/decryption.
2) Performance impact cannot be measured for all the process using the functions prior to actual run.
3) Development of User Defined Function will require some effort.
4) Testing of User Defined Function will require some effort.
7 thoughts on “What,Why and How of Encryption in Teradata – Basics and Examples with Syntax”
As part of encryption, Is it Possible to Mask the Decimal values to *** or convert them to HASH value and decrypt as needed ?
Hi Raja
I can think of few options in this case.
1) You can definitely go for custom encrypt/decrypt UDF and apply on the decimal columns.
2) If you want some users to see proper values and others not then I will recommend to introduce a view which depending on the user/group/role allows the user to see clear or encrypted values.
If it is handful columns then you can manually do it else you may want to join with dbc.columns and get datatype and apply functions.
—
Raj
Hi – For example, what if a user enters a query to list all the tables fromt eh db – SELECT DatabaseName,
TableName,
CreateTimeStamp,
LastAlterTimeStamp
FROM DBC.TablesV
WHERE TableKind = 'T'
and DatabaseName = 'DBC'
ORDER BY TableName;
will it list both tables – staging & target table? do we have to maintain 2 tables for all the sensitive parameters we are trying to protect?
Hi – For example, what if a user enters a query to list all the tables fromt eh db – SELECT DatabaseName,
TableName,
CreateTimeStamp,
LastAlterTimeStamp
FROM DBC.TablesV
WHERE TableKind = 'T'
and DatabaseName = 'DBC'
ORDER BY TableName;
will it list both tables – staging & target table? do we have to maintain 2 tables for all the sensitive parameters we are trying to protect?
We are planning to use disk-level encryption and block level compression. Will the % of compression change when applied to encrypted data?
Thanks
what if we don't decrypt it in ETL process it will get read without any decryption?/
Yes. It will read encrypted data. In such cases make sure if you are joining on encrypted columns then both the table participating in join should have encrypted column.