Teradata UNION vs UNION ALL

teradata

Teradata SQL UNION and UNION ALL are SET operators used to retrieve data from multiple SELECT blocks and merge it into a single final resultset. The main difference between UNION vs UNION ALL is UNION operator removes duplicate records where as UNION ALL allows redundant values in the output. Performance wise UNION ALL is faster because it avoids the check to remove duplicate values in the final result.

Let us look at the example below.

Create 2 dummy tables and insert 5 records in each table.

create volatile table vt_emp_perm
(
emp_name varchar(6),
emp_age smallint
)
on commit preserve rows;

create volatile table vt_emp_temp
(
emp_name varchar(6),
emp_age smallint
)
on commit preserve rows;
;

insert into vt_emp_perm values ('Mark',30);
insert into vt_emp_perm values ('Roger',35);
insert into vt_emp_perm values ('Sophia',23);
insert into vt_emp_perm values ('Mia',40);
insert into vt_emp_perm values ('Emma',25);

insert into vt_emp_temp values ('Mark',30);
insert into vt_emp_temp values ('George',35);
insert into vt_emp_temp values ('Noah',23);
insert into vt_emp_temp values ('Oscar',40);
insert into vt_emp_temp values ('Emma',28);

Teradata UNION ALL example

UNION ALL fetches data from more than 1 SELECT blocks and merge them without removing duplicates in the final result set. UNION ALL is really helpful when you want to load same Target table with multiple SELECT statements. So rather than loading same Target table multiple times with different SELECT statements , UNION ALL may result in better performance. If you expect duplicate data in the output and you don’t want it in Target table then you may create Target as SET table or use UNION in place of UNION ALL in the query.

select emp_name,emp_age 
from vt_emp_perm
union all
select emp_name,emp_age
from vt_emp_temp;
emp_nameemp_age
Emma25
Emma28
George35
Mark30
Mark30
Mia40
Noah23
Oscar40
Roger35
Sophia23

The above query will retrieve all the data from first SQL SELECT query and also retrieve the data from second SQL SELECT query and merge the data and display the resultset. Hence 5 records from first select block and 5 from seconds. Thus output has 10 records. This will not remove duplicate records. For example – in the output “MARK 30” has come twice.

Teradata UNION example

SQL UNION will remove the duplicate entries and will display only one row if duplicate rows are present in final resultset. This is beneficial if you want to avoid loading same table multiple times by different SELECT blocks. You can use UNION operator to merge the individual output and load target table if you don’t want duplicate in target.

select emp_name,emp_age 
from vt_emp_perm
union 	
select emp_name,emp_age
from vt_emp_temp;
emp_nameemp_age
Emma25
Emma28
George35
Mark30
Mia40
Noah23
Oscar40
Roger35
Sophia23

In the above output you can only see 9 records. The duplicate row “MARK 30” has come only once in output. We cannot consider “EMMA” as duplicate because value fro age is different in both the select block. Since there is additional check for removing duplicates in the output UNION is logically slower than UNION ALL.

Tips for SQL UNION & UNION ALL queries

Teradata union corresponding select-list expressions are incompatible

The SELECT statements used in UNION/UNION ALL must have equal number of columns and the columns datatypes should be of same type across all SELECT statements else it may result in error.

In the below example , I have swapped position of emp_name & emp_age column in the second SELECT block. So now the first column is varchar and when it encounters smallint from second SELECT block then SQL query throws error.

select emp_name,emp_age 
from vt_emp_perm
union 	
select emp_age,emp_name
from vt_emp_temp;

OUTPUT: [Error 3654] Corresponding select-list expressions are incompatible.

All select-lists do not contain the same number of expressions.

In the example below I have removed one column and kept on single column in second SELECT block.

select emp_name,emp_age 
from vt_emp_perm
union 	
select emp_name
from vt_emp_temp;

OUTPUT: [Error 3653] All select-lists do not contain the same number of expressions.

UNION/UNION ALL truncating data

The UNION and UNION ALL operator will take technical metadata of the First SELECT statement for complete resultset. So you should make sure that all the columns in first SELECT clause should have highest column width in order to avoid truncation of column values in final resultset.

In the below example I have created another temp table and loaded some dummy records to it.

create volatile table vt_emp_temp2
(
emp_name varchar(10),
emp_age smallint
)
on commit preserve rows;
;

insert into vt_emp_temp2 values ('Katherine',29);
insert into vt_emp_temp2 values ('Alexandra',29);

Let’s use this third table as well in the UNION operator.

select emp_name,emp_age 
from vt_emp_perm
union 	
select emp_name,emp_age
from vt_emp_temp
union 
select emp_name,emp_age
from vt_emp_temp2;
emp_nameemp_age
Alexan29
Emma25
Emma28
George35
Kather29
Mark30
Mia40
Noah23
Oscar40
Roger35
Sophia23

In the output we can see values like “ALEXANDRA” is truncated to “ALEXAN” and the reason is the first select statement has emp_name as VARCHAR(6). So overall the metadata for first column in the output is varchar(6) hence any value that is more than that is truncate to 6 characters only. To overcome this, we have to possible solution.

Manually cast the first select statement columns to max values

select cast(emp_name as varchar(10)) as emp_name,emp_age 
from vt_emp_perm
union 	
select emp_name,emp_age
from vt_emp_temp
union 
select emp_name,emp_age
from vt_emp_temp2;
emp_nameemp_age
Alexandra29
Emma25
Emma28
George35
Katherine29
Mark30
Mia40
Noah23
Oscar40
Roger35
Sophia23

So now we can see no truncation of value. The other method is to keep the select block with maximum column-width as the first one in the UNION/UNION operator. So in example below , we have placed first the select block with emp_name as varchar(10) in place of other blocks where it is varchar(6).

select emp_name,emp_age
from vt_emp_temp2
union
select emp_name,emp_age 
from vt_emp_perm
union 	
select emp_name,emp_age
from vt_emp_temp
;
emp_nameemp_age
Alexandra29
Emma25
Emma28
George35
Katherine29
Mark30
Mia40
Noah23
Oscar40
Roger35
Sophia23

In this post, we saw few examples of UNION , UNION ALL. The difference between two SET operators – UNION removes duplicate records where as UNION ALL does not. UNION ALL is faster than UNION because of missing duplicate check. We also saw few examples of common errors in UNION/UNION ALL. We also see how the output metadata is determined in UNION/UNION ALL which may result in data truncation in output. Also we saw what are the possible solutions to data/string truncation error in UNION/UNION ALL output.

1 thought on “Teradata UNION vs UNION ALL”

Leave a Reply

Your email address will not be published. Required fields are marked *