In this post, I will show you how you can use SELECT statements in Redshift to query data from tables. SELECT is used to fetch one or more columns directly from the table. The SELECT is one of the most simple and yet challenging at times statement in Redshift. SELECT can become complex while creating derived columns sometimes.
Redshift SELECT statement to fetch one column
The example fetches only one column as-is from existing table in Redshift.
select catdesc from category;
OUTPUT |
---|
catdesc |
Major League Baseball |
National Football League |
Musical theatre |
All opera and light opera |
All symphony, concerto, and choir concerts |
National Hockey League |
National Basketball Association |
Major League Soccer |
All non-musical theatre |
All rock and pop music concerts |
All jazz singers and bands |
Redshift SELECT statement to fetch multiple columns
You can also specify multiple columns in the SELECT statement in Redshift to fetch the values from existing table.
select catid, catgroup, catname, catdesc from category;
OUTPUT | |||
---|---|---|---|
catid | catgroup | catname | catdesc |
1 | Sports | MLB | Major League Baseball |
2 | Sports | NHL | National Hockey League |
3 | Sports | NFL | National Football League |
4 | Sports | NBA | National Basketball Association |
5 | Sports | MLS | Major League Soccer |
6 | Shows | Musicals | Musical theatre |
7 | Shows | Plays | All non-musical theatre |
8 | Shows | Opera | All opera and light opera |
9 | Concerts | Pop | All rock and pop music concerts |
10 | Concerts | Jazz | All jazz singers and bands |
11 | Concerts | Classical | All symphony, concerto, and choir concerts |
Redshift SELECT statement to fetch all columns
You can also fetch all the columns in a table by using "*" along with SELECT or by specifying the column list separated by comma.
select * from category;
OUTPUT | |||
---|---|---|---|
catid | catgroup | catname | catdesc |
1 | Sports | MLB | Major League Baseball |
2 | Sports | NHL | National Hockey League |
3 | Sports | NFL | National Football League |
4 | Sports | NBA | National Basketball Association |
5 | Sports | MLS | Major League Soccer |
6 | Shows | Musicals | Musical theatre |
7 | Shows | Plays | All non-musical theatre |
8 | Shows | Opera | All opera and light opera |
9 | Concerts | Pop | All rock and pop music concerts |
10 | Concerts | Jazz | All jazz singers and bands |
11 | Concerts | Classical | All symphony, concerto, and choir concerts |
Redshift SELECT statement with DERIVED columns
select catid, catdesc, case when catdesc like '%Baseball%' then 'Baseball' when catdesc like '%Basketball%' then 'Basketball' else 'Other Sports' end as Sports from category where catgroup = 'Sports';
OUTPUT | ||
---|---|---|
catid | catdesc | sports |
1 | Major League Baseball | Baseball |
2 | National Hockey League | Other Sports |
3 | National Football League | Other Sports |
4 | National Basketball Association | Basketball |
5 | Major League Soccer | Other Sports |
Redshift SELECT statement without FROM clause
It is not mandatory to have a FROM clause in the SELECT statement in Redshift. If you are not fetching any data from any table and doing just some operation on STATIC data or HARDCODED values then you can skip FROM clause in Redshift.
SELECT 20 + 3 as col1, 'Hello from Redshift' as col2;
OUTPUT | |
---|---|
col1 | col2 |
23 | Hello from Redshift |
Note: Although it may seem comfortable to use * in place of Column list it is not the recommended approach. Two main reasons for not using it are:
1) Failure in future if DDL changes
2) SQL Query Performance
The reason is if tomorrow the table structure is modified like new column added or existing column removed select * may result in updated row structure in result-set. If any downstream application is using select * then it may end up having extra or less number of columns than expected. Thereby resulting in failure.
Specially with Redshift which is a columnar database specifying column list is a much better approach than mentioning select *. Redshift can speed up the performance by selecting only the required columns in the given query and not reading entire row if not required.