ETL SQL

Learn Extract Transform Load using SQL & PySpark

Teradata SQL Query Performance Check & Measures

One very common question people ask is about the possible reasons for long-running queries in Teradata. The first thing to understand is that Teradata is very good at handling huge volumes of data; processing several million or even a billion records should not break it. If your query is taking an unusually long time, it needs tuning. Below are the most common reasons for long-running queries in Teradata:

  1. Bad Query: This is the most common reason for a long-running query. The query is not using any kind of index: Primary Index, Secondary Index or Partitioned Primary Index. The tables are being redistributed, and for big tables that redistribution takes a long time.
  2. Delay Time: Check the delay time for your query in DBQL. It may happen that the query execution time is only seconds, yet the query sat in the delay queue for hours.
  3. Blocking: Your query may be blocked by some other query. Also watch out for TDM jobs; such jobs generally apply exclusive locks on the table, thereby blocking all other queries requesting the same table. You can check this in the Teradata Viewpoint monitoring portlet.
  4. Server State: Teradata server status is generally classified as Healthy, Degraded, Critical or Down. As the number of concurrent users increases, the load on the server also increases. With more load on the server and the AMPs doing more work, your query may take longer than usual.
  5. Skewness: The source table is highly skewed because the Primary Index chosen is not good. Consider changing the PI for the table if the skew factor is high.
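A quick way to check skew (point 5 above) is to compare average and maximum per-AMP space usage from the DBC.TableSizeV view. The sketch below uses one widely used skew-factor convention, not an official Teradata metric, and 'YOUR_DB' is a placeholder database name:

SELECT
    DatabaseName,
    TableName,
    SUM(CurrentPerm) AS TotalPerm,
    (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSizeV
WHERE DatabaseName = 'YOUR_DB'   -- placeholder: your database name here
GROUP BY 1, 2
ORDER BY SkewFactor DESC;

A skew factor close to 0 means rows are spread evenly across the AMPs; a high value suggests the Primary Index is concentrating rows on few AMPs and is worth revisiting.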

One tool that can help you understand everything about your query is Teradata Viewpoint, so use it as much as possible; it should answer most of your questions.

Before we actually start with SQL performance tuning, we should know how to measure the performance of a SQL query. The most common answer is that the query which takes the most time is the worst and should be optimized first.

This does not hold good in all cases; there can be various reasons for long-running queries. So if it is not the real time in seconds, minutes or hours, what should be the deciding criteria for determining whether a query is eligible for SQL optimization?

The first thing you should check is AMPCPUTime. This is the time taken by the AMPs working on your query, measured in CPU seconds, which is different from real-life time units. The higher the AMPCPUTime, the more the AMPs have worked on processing the query and the more CPU seconds they have used.

The other parameter you can check is TotalIOCount. This is the total I/O used by the query. If you read a lot of data from disk, this count will be high; one possible reason is that you are not retrieving data efficiently from the disk. Now we know the parameters which can help in identifying suitable candidates for SQL query optimization.

The next question is where to find these parameters. For this you have to query the DBQLogTbl & DBQLSqlTbl tables in DBC. You can use the query below to get the result:

SELECT
    TB1.AMPCPUTime,
    TB1.SpoolUsage,
    TB1.TotalIOCount,
    SUBSTR(TB2.SqlTextInfo, 1, 1000) AS SqlTextInfo
FROM DBC.DBQLogTbl TB1
INNER JOIN DBC.DBQLSqlTbl TB2
    ON  TB1.QueryID = TB2.QueryID
    AND TB1.ProcID = TB2.ProcID
    AND TB1.SessionID = (SELECT SESSION);
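If you want to scan for tuning candidates beyond your own session, a small variation of the same query can rank today's queries by AMPCPUTime. This is a sketch; it assumes DBQL logging is enabled on your system and that your user has SELECT access to the DBC views:

SELECT TOP 10
    TB1.UserName,
    TB1.AMPCPUTime,
    TB1.TotalIOCount,
    TB1.SpoolUsage,
    SUBSTR(TB2.SqlTextInfo, 1, 1000) AS SqlTextInfo
FROM DBC.DBQLogTbl TB1
INNER JOIN DBC.DBQLSqlTbl TB2
    ON  TB1.QueryID = TB2.QueryID
    AND TB1.ProcID = TB2.ProcID
WHERE CAST(TB1.StartTime AS DATE) = CURRENT_DATE
ORDER BY TB1.AMPCPUTime DESC;

The queries at the top of this list are usually the best candidates for SQL optimization.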

The query may take some time to show results. It is also advisable to check with your DBA team to understand the DBQL cache flushing mechanism used on your Teradata system: it may take up to 10 minutes for your query to reflect in the query log, as the flush interval is generally kept at 10 minutes.
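On recent Teradata releases, a user with the right privileges can force the DBQL cache to flush instead of waiting for the interval. Check with your DBA whether this statement is available and permitted on your system before relying on it:

FLUSH QUERY LOGGING WITH ALL;

After the flush, your recent queries should be visible in DBC.DBQLogTbl without the usual delay.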

Calculate hourly resource consumption of Teradata SQL query

Recently Raghav reached out to us on our Facebook page asking for a Teradata SQL query to identify the highest AMPCPU, I/O and spool usage for each hour during a given time period. He was working on an activity to find the resource consumption of the Teradata box on an hourly basis; the purpose was to find the time ranges when Teradata is most heavily loaded and when it is relatively free, for any given day or range of days. I shared the query below with him, and he was happy with the output as it was exactly what he was looking for.

SELECT
    CAST(TB1.StartTime AS VARCHAR(13)) AS Date_Hour,
    MAX(TB1.AMPCPUTime),
    MAX(TB1.SpoolUsage),
    MAX(TB1.TotalIOCount)
FROM DBC.DBQLogTbl TB1
WHERE TB1.StartTime BETWEEN '2017-06-12 00:00:00' AND '2017-06-14 00:00:00'
GROUP BY 1
ORDER BY 1;

Note that no join to DBC.DBQLSqlTbl is needed here, since the query does not use the SQL text; joining it would also duplicate rows for queries whose text spans multiple DBQLSqlTbl rows.

Sample Output:

Date Hour       AMPCPU   Spool       I/O
2017-06-12 09   0.02     18432       373
2017-06-12 10   0.43     18432       1784
2017-06-12 11   1.12     134617088   3519
2017-06-12 12   2.85     134617088   3543
2017-06-12 13   0.15     617472      1083
2017-06-12 14   0.14     773120      5221
2017-06-12 15   0.14     656896      1149
2017-06-12 16   0.48     9576448     4252
2017-06-12 17   1.85     204800      33051
2017-06-12 18   0.04     617472      1066

I thought of sharing this with other readers too, in case they come across a similar requirement and can use this query. His requirement was to find the highest AMPCPU, spool and I/O consumption; however, if you want to see the actual total resource consumption, just replace the MAX function with the SUM function in the query above. This should give you the total AMPCPU, spool and I/O consumed on an hourly basis.
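For reference, the SUM variant would look like the sketch below. One caution: sum over DBQLogTbl alone, without joining DBQLSqlTbl, because a query whose SQL text is split across multiple DBQLSqlTbl rows would otherwise be counted more than once and inflate the totals:

SELECT
    CAST(TB1.StartTime AS VARCHAR(13)) AS Date_Hour,
    SUM(TB1.AMPCPUTime),
    SUM(TB1.SpoolUsage),
    SUM(TB1.TotalIOCount)
FROM DBC.DBQLogTbl TB1
WHERE TB1.StartTime BETWEEN '2017-06-12 00:00:00' AND '2017-06-14 00:00:00'
GROUP BY 1
ORDER BY 1;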

Sample Output:

Date Hour       AMPCPU   Spool        I/O
2017-06-12 09   4.04     7709696      122626
2017-06-12 10   4.47     692288       122957
2017-06-12 11   10.67    858554880    142473
2017-06-12 12   22.32    1962375168   190284
2017-06-12 13   3.54     8031744      91908
2017-06-12 14   2.36     6888448      71100
2017-06-12 15   3.67     745024       102353
2017-06-12 16   5.68     406583808    111934
2017-06-12 17   4.88     5433344      115486
2017-06-12 18   3.17     16286720     95226

How do you calculate the hourly consumption of Teradata resources? Will this query work for you? Share your feedback with us.

