ETL SQL

Learn Extract Transform Load using SQL & PySpark

Teradata combine multiple rows into one row

We have couple of functions available which can help in combining rows into a single line. In this post, we will discuss about 2 such UDF which can give single value for multiple rows.

First, XMLAGG: With Teradata supporting various operations on XML datatype now, we can leverage this function to combine multiple rows. Let’s quickly look into examples with syntax:

select w_state,w_warehouse_sq_ft from warehouse;
w_statew_warehouse_sq_ft
TNNULL
TN294242
TN977787
TN621234
TN138504

Now I want to combine all the warehouse area into a single value separated by comma for each state.

select w_state,
xmlagg(trim(coalesce(w_warehouse_sq_ft,-1)||',') ) (varchar(50)) 
as w_warehouse_sq_ft_list
 from warehouse group by w_state;
w_statew_warehouse_sq_ft_list
TN-1, 294242, 977787, 621234, 138504,

This looks wonderful. However I can see “,” at the end which I don’t want. So I can either use “SUBSTR” function or “TRIM” function to remove it. Let’s see the syntax:

select w_state,
xmlagg(trim(coalesce(w_warehouse_sq_ft,-1)||',') ) (varchar(50)) 
as w_warehouse_sq_ft_list
 from warehouse group by w_state;
w_statew_warehouse_sq_ft_list
TN-1, 294242, 977787, 621234, 138504

The output looks good now but the query is little confusing. So let’s make it little more clean.

select
	w_state,
	substr(w_warehouse_sq_ft,
	1,
	length(w_warehouse_sq_ft)-2) as w_warehouse_sq_ft_list
from
	(
	select
		w_state,
		xmlagg(trim(coalesce(w_warehouse_sq_ft,-1)|| ',') )(varchar(50)) as w_warehouse_sq_ft
	from
		warehouse
	group by
		w_state )tb1;
w_statew_warehouse_sq_ft_list
TN-1, 294242, 977787, 621234, 138504

We can also use TRIM “trailing” function to remove last “,” from the output.

select
	w_state,
	trim(trailing ',' from ( xmlagg(trim(coalesce(w_warehouse_sq_ft,-1)|| ',') ) (varchar(50)) )) as w_warehouse_sq_ft_list
from
	warehouse
group by
	w_state;
w_statew_warehouse_sq_ft_list
TN-1, 294242, 977787, 621234, 138504

Now this looks perfect as per my requirement. So you can use SUBSTR or TRIM , either of the function to remove last “,” from output. Now let’s quickly look into the other method of combining rows using different UDF.

Second, tdstats.udfconcat. This is not the common UDF and you may not have access to it. So in that case you can either ask your DBA to grant privilege to this UDF or create replica of this UDF which you can use. Let’s look at the syntax for this UDF now.

select
	w_state,
	tdstats.udfconcat(trim(coalesce(w_warehouse_sq_ft,-1))) as w_warehouse_sq_ft_list
from
	warehouse
group by
	w_state;
w_statew_warehouse_sq_ft_list
TN-1,”294242″,”977787″,”621234″,”138504″

This looks good and there is no “,” at the end too. But I don’t want double quotes around values. So let’s remove it using regexp_replace.

select
	w_state,
	regexp_replace( tdstats.udfconcat(trim(coalesce(w_warehouse_sq_ft,-1))) ,
	'"',
	'',
	1,
	0,
	'i') as w_warehouse_sq_ft_list
from
	warehouse
group by
	w_state;
w_statew_warehouse_sq_ft_list
TN-1, 294242, 977787, 621234, 138504

Perfect. We have coalesce in all the queries because there is one value which is NULL and we wanted to keep it as “-1”. Hence used coalesce.

Let me know if it helps you or if you have any query, feel free to leave a comment.

Leave a Reply

Your email address will not be published. Required fields are marked *