Redshift SELECT Distinct is used to fetch unique rows from the table. DISTINCT is applied to all the columns mentioned in the SELECT clause and not just the first column.
The example fetches DISTINCT values for one column in the table
select distinct catgroup from category;
catgroup |
---|
Sports |
Shows |
Concerts |
redshift select distinct multiple columns
You can apply DISTINCT on multiple columns to fetch unique rows considering all the columns and not just the first column in SELECT list.
select distinct catid,catgroup from category;
catid | catgroup |
---|---|
1 | Sports |
2 | Sports |
3 | Sports |
4 | Sports |
5 | Sports |
6 | Shows |
7 | Shows |
8 | Shows |
9 | Concerts |
10 | Concerts |
11 | Concerts |
redshift select distinct on is not supported
Redshift does not support DISTINCT ON clause. In other databases you can specify which row to pick if duplicate exists using DISTINCT ON clause. In redshift, select distinct on is not supported.
select distinct on (catgroup) from category order by catgroup,catid; --ERROR: SELECT DISTINCT ON is not supported
redshift select distinct count
In Redshift you can calculate count of distinct values for any column. Count Distinct is supported for only one column in Redshift. If you will list more than one column in DISTINCT in COUNT function then redshift will throw error.
select count(distinct catid) from category; --11 select count(distinct catgroup) from category; --3 select count(distinct catgroup,catid) from category; --ERROR: function count(character varying, smallint) does not exist Hint: No function matches the given name and argument types. You may need to add explicit type casts.
redshift select distinct vs group by
Is DISTINCT slower than GROUP BY ? There is no factual numbers which can support than DISTINCT is slower than GROUP BY in Redshift. If you will compare explain plan for GROUP BY and DISTINCT then the result will be absolutely same in most of the cases. In some databases, explain plan may differ and Group By may be the more preferred one.
But with new databases like Redshift the SQL execution plan is created and most optimised one is picked. In Redshift it does not make much difference.
One rule which I follow is if I am aware than there are less records and more unique values I may be ok with DISTINCT clause else I will prefer writing Group By clause.