Teradata uses a HASHING algorithm on PRIMARY INDEX to distribute rows among various AMP’s. The process of rows distribution is the core reason behind the parallel architecture of TERADATA.
TERADATA uses indexes to determine the distribution of rows. Teradata uses a hashing algorithm which processes the index and gives the HASH VALUE. Depending on the HASH VALUE , it refers to HASH MAP to decide the HASH BUCKET and hence HASH AMP. That particular AMP will store that record. Similary there are other AMP’s also receiving their share of records. So each record is stored in specific AMP depending on the HASH VALUE. This is the reason it is suggested that the columns with more unique values and used in joins etc are preferable index columns. So whenever set of records are received , index columns are processed and are stored in respective AMP’s. Since the work is distributed between AMP’s, TERADATA is so swift. So we can say TERADATA is as fast as its slowest AMP.
TERADATA is unique because of it’s parallel acrhitecture. TERADATA rows distribution is important to understand as it will help us in realizing better index columns. One must also check for AMP skewness. In case of improper index column selection , uneven distribution of rows may occur; which will eventually lead to more work for some specific AMP while other AMP’s would be free. Hence increasing turnaround time. AMP skewness can be check by using simple HASH functions available in TERADATA. To understand HASH Functions and ROWS Distributions in Teradata refer to below query which try to process a record with index column value as ‘NITIN’:
SELECT HASHROW ('NITIN') AS "ROW HASH" ,HASHBUCKET (HASHROW ('NITIN')) AS "HASH BUCKET" ,HASHAMP (HASHBUCKET (HASHROW ('NITIN'))) AS "HASH AMP";
Teradata Data distribution directly impacts performance of queries executed on any table. So selecting a good Primary Index is of utmost importance in any Teradata environment.