In this post I have shared the key differences between star schema and snowflake schema.
Star schema is the most commonly used approach to design & develop data warehouse. It is simple to implement in which we have centralised one or more FACT Tables surrounded by multiple dimension tables.
The fact table source keys from respective dimension tables.
In the diagram , there is centralised FACT table sourcing keys from different dimension tables. I want you to focus on employee dimension table and the columns in that table. When you check the sample rows in the table, you can identify redundant or repeated pattern of data for the rows.
Let's look into the Snowflake schema now.
The snowflake schema normalises the Dimension tables to create sub-dimension tables. Thereby reducing redundant data. In this case, you can divide employee dimension table into a sub-dimension table office which will store office related information.
The key column from sub dimension table will be sourced into employee dimension table.
You can see that the redundant column groups are removed from employee dimension and replaced by a single office id column.
PROS of Snowflake Schema:
- Reduced data redundancy thereby saving storage space and strengthening data integrity
- The normalised structure lets you organised data into more efficient design
- UPDATE queries may become faster as less number of rows require update now
CONS of Snowflake Schema:
- Extra sub-dimension table requires additional maintenance script to refresh data
- More complicated than STAR schema to implement
- SELECT queries may become slower because of extra join condition to include sub-dimension tables