In Data-warehouse, Grouping means aggregating values to calculate some meaningful measures. In EDW, we have fact tables which holds measures and calculate some aggregates like SUM, MAX, MIN, AVG etc. In most of the cases, we will use GROUP BY clause however in some cases you may want to calculate measures over different axis. It can be single column or multiple columns combined. This is very useful when you are creating data cubes where each dimension will show you different information and you can drill down to get more info.
In Teradata, GROUP BY GROUPING SETS allows you to do exactly same. In a single query you can calculate multiple aggregates over the same data. If you are familiar with GROUP BY but not sure about GROUPING SETS then I will ask you to remember it like multiple GROUP BY in a single query. Let’s see this with a simple example:
I have a table which keeps store sales data. Now I want to calculate how much is total sales of entire store for SPORTS category items only. Also I want to calculate under Sports Category which class or sub-category items exists and what is total sales of individual sub category. Now if you will try this Query then you may have to create 2 separate SELECT statement. First one to calculate total sales for Sports Category and Second to calculate total sales for Sports sub-categories. With GROUPING SETS you can achieve this in a single query and with perhaps better performance than 2 separate GROUP BY queries.
select i_category
,i_class as sub_category
,sum(ss_sales_price) as tot_sales
from date_dim dt
inner join
store_sales
on dt.d_date_sk = store_sales.ss_sold_date_sk
inner join item
on
store_sales.ss_item_sk = item.i_item_sk
where d_date between ‘2000-01-01’ and ‘2000-12-31’
and i_category in (‘Sports’) and i_class is not null
group by grouping sets ((i_category,i_class),(i_category))
row id | i_category | sub_category | tot_sales |
1 | Sports | ? | 1995374.18 |
2 | Sports | archery | 106090.43 |
3 | Sports | athletic shoes | 147960.17 |
4 | Sports | baseball | 114923.36 |
5 | Sports | basketball | 108937.63 |
6 | Sports | camping | 133417.13 |
7 | Sports | fishing | 132074.22 |
8 | Sports | fitness | 135305.84 |
9 | Sports | football | 111123.26 |
10 | Sports | golf | 152849.03 |
11 | Sports | guns | 130239.52 |
12 | Sports | hockey | 106000.67 |
13 | Sports | optics | 143145.48 |
14 | Sports | outdoor | 145811.97 |
15 | Sports | pools | 137725.11 |
16 | Sports | sailing | 97474.83 |
17 | Sports | tennis | 92295.53 |
In the above output, rowid:1 is total sales for Sports Category items. The NULL in sub-category tells that it is aggregated as Category level only. The remaining rows corresponds to second GROUP SET which is category + class.
Hope this helps.