Redshift Select

In this post, I will show you how you can use SELECT statements in Redshift to query data from tables. SELECT is used to fetch one or more columns directly from the table. The SELECT is one of the most simple and yet challenging at times statement in Redshift. SELECT can become complex while creating derived columns sometimes.

Redshift SELECT statement to fetch one column

The example fetches only one column as-is from existing table in Redshift.

select catdesc from category;
OUTPUT
catdesc
Major League Baseball
National Football League
Musical theatre
All opera and light opera
All symphony, concerto, and choir concerts
National Hockey League
National Basketball Association
Major League Soccer
All non-musical theatre
All rock and pop music concerts
All jazz singers and bands

Redshift SELECT statement to fetch multiple columns

You can also specify multiple columns in the SELECT statement in Redshift to fetch the values from existing table.

select catid,
    catgroup,
    catname,
    catdesc
from category;
OUTPUT
catidcatgroupcatnamecatdesc
1SportsMLBMajor League Baseball
2SportsNHLNational Hockey League
3SportsNFLNational Football League
4SportsNBANational Basketball Association
5SportsMLSMajor League Soccer
6ShowsMusicalsMusical theatre
7ShowsPlaysAll non-musical theatre
8ShowsOperaAll opera and light opera
9ConcertsPopAll rock and pop music concerts
10ConcertsJazzAll jazz singers and bands
11ConcertsClassicalAll symphony, concerto, and choir concerts

Redshift SELECT statement to fetch all columns

You can also fetch all the columns in a table by using "*" along with SELECT or by specifying the column list separated by comma.

select * from category;
OUTPUT
catidcatgroupcatnamecatdesc
1SportsMLBMajor League Baseball
2SportsNHLNational Hockey League
3SportsNFLNational Football League
4SportsNBANational Basketball Association
5SportsMLSMajor League Soccer
6ShowsMusicalsMusical theatre
7ShowsPlaysAll non-musical theatre
8ShowsOperaAll opera and light opera
9ConcertsPopAll rock and pop music concerts
10ConcertsJazzAll jazz singers and bands
11ConcertsClassicalAll symphony, concerto, and choir concerts

Redshift SELECT statement with DERIVED columns

select
	catid,
	catdesc,
	case
		when catdesc like '%Baseball%' then 'Baseball'
		when catdesc like '%Basketball%' then 'Basketball'
		else 'Other Sports'
	end as Sports
from
	category
where
	catgroup = 'Sports';
OUTPUT
catidcatdescsports
1Major League BaseballBaseball
2National Hockey LeagueOther Sports
3National Football LeagueOther Sports
4National Basketball AssociationBasketball
5Major League SoccerOther Sports

Redshift SELECT statement without FROM clause

It is not mandatory to have a FROM clause in the SELECT statement in Redshift. If you are not fetching any data from any table and doing just some operation on STATIC data or HARDCODED values then you can skip FROM clause in Redshift.

SELECT
	20 + 3 as col1,
	'Hello from Redshift' as col2;
OUTPUT
col1col2
23Hello from Redshift

Note: Although it may seem comfortable to use * in place of Column list it is not the recommended approach. Two main reasons for not using it are:
1) Failure in future if DDL changes
2) SQL Query Performance

The reason is if tomorrow the table structure is modified like new column added or existing column removed select * may result in updated row structure in result-set. If any downstream application is using select * then it may end up having extra or less number of columns than expected. Thereby resulting in failure.

Specially with Redshift which is a columnar database specifying column list is a much better approach than mentioning select *. Redshift can speed up the performance by selecting only the required columns in the given query and not reading entire row if not required.

Leave a Reply

Your email address will not be published. Required fields are marked *