How Teradata distributes rows
It is very important to understand how Teradata actually distributes rows among AMPs. We all know that the PRIMARY INDEX is used for data distribution. Every AMP in a Teradata system maintains a portion of each table, and this forms the basis of Teradata's parallel, shared-nothing architecture. We will see exactly how this happens and which hash functions are used to determine the AMP that will hold each row of a table.
There are three main functions:
- HASHROW: The function returns a 4-byte row hash for each input value. The primary index value is passed through HASHROW, and its output is in turn passed to HASHBUCKET. For example, if MARK is the primary index value, HASHROW('MARK') returns the 4-byte row hash for MARK.
- HASHBUCKET: The function returns the hash bucket number (BUCKET#) for the row. Its input is the 4-byte HASHROW output, and the bucket number it returns is the entry in the HASHMAP that records which AMP owns the row.
- HASHAMP: The function returns the number of the AMP that will store the table row on its vdisk. Each AMP is responsible for the portion of the table's rows stored on its own vdisk. HASHAMP takes the bucket number (BUCKET#) as input and returns the AMP number.
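The chain described above (value → row hash → bucket → AMP) can be sketched in Python. This is only a conceptual model under several assumptions: Teradata's real hashing algorithm is internal, so `zlib.crc32` stands in for HASHROW; the bucket number is assumed to be the high-order 16 bits of the 32-bit row hash (a 65,536-bucket map; some systems use 20-bit buckets instead); and the hash map is a hypothetical round-robin assignment of buckets to four AMPs. The function names mirror the Teradata functions but are hypothetical Python helpers, not a Teradata API.

```python
import zlib
from collections import Counter

# Hypothetical configuration for this sketch only.
NUM_AMPS = 4
BUCKET_BITS = 16                 # assumption: 16-bit hash buckets
NUM_BUCKETS = 1 << BUCKET_BITS   # 65,536 buckets

# Hypothetical hash map: index = bucket number, value = AMP number.
# Teradata builds its real hash map internally; round-robin is a stand-in.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def hashrow(value: str) -> bytes:
    """Return a 4-byte row hash. CRC32 is a stand-in, NOT Teradata's algorithm."""
    return zlib.crc32(value.encode("utf-8")).to_bytes(4, "big")


def hashbucket(row_hash: bytes) -> int:
    """Return the bucket number: the high-order BUCKET_BITS bits of the row hash."""
    return int.from_bytes(row_hash, "big") >> (32 - BUCKET_BITS)


def hashamp(bucket: int) -> int:
    """Look up the AMP that owns this bucket in the (hypothetical) hash map."""
    return HASH_MAP[bucket]


def amp_for(pi_value: str) -> int:
    """Chain the three steps: PI value -> row hash -> bucket -> AMP."""
    return hashamp(hashbucket(hashrow(pi_value)))


if __name__ == "__main__":
    row_hash = hashrow("MARK")
    bucket = hashbucket(row_hash)
    print(f"HASHROW('MARK') = {row_hash.hex().upper()}")
    print(f"HASHBUCKET(...) = {bucket}")
    print(f"HASHAMP(...)    = {hashamp(bucket)}")

    # Distribute a small batch of primary index values and count rows per AMP.
    names = ["MARK", "JOHN", "ANNA", "RAVI", "LUCY", "OMAR", "MEI", "SARA"]
    print(Counter(amp_for(name) for name in names))
```

Because every step is deterministic, the same primary index value always lands on the same AMP, which is why a primary index with many duplicate values can skew rows onto a few AMPs.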
All three Teradata HASH functions work together to determine which AMP stores which portion of a table's data. I strongly recommend reading the other post, which explains how Teradata distributes rows, to fully understand how important these HASH functions are in Teradata.