How Primary Index or Secondary Index affects the query response ?

We all know the difference between Primary Index and Secondary Index. However let us see how using either or none in the query can make difference.

Primary Index helps in data distribution and data retrieval from the table. Secondary Index provides an alternate path to fetch the data and helps in avoiding full table scan.
Primary Index is 1 AMP operation where as Unique Secondary Index is 2 AMP operation.
Non-Unique Secondary index is an ALL AMP operation.
Also how collect stats can help while using Non Unique Secondary Index.

Let us see this with a very simple example.

Create a table first.

CREATE MULTISET VOLATILE TABLE vt_student , NO LOG
(
student_id INTEGER NOT NULL,
student_name VARCHAR(50),
student_age INTEGER,
student_country VARCHAR(20)
)
UNIQUE PRIMARY INDEX(student_id)
UNIQUE INDEX (student_name)
INDEX(student_age)
ON COMMIT PRESERVE ROWS;

Insert some dummy records in it.

INSERT INTO VT_STUDENT SEL 1,'WILLIAM',12,'USA';
INSERT INTO VT_STUDENT SEL 2,'MIKE',13,'INDIA';
INSERT INTO VT_STUDENT SEL 3,'ZACH',14,'CANADA';
INSERT INTO VT_STUDENT SEL 4,'ANDY',15,'SPAIN';

Now run the query using UNIQUE PRIMARY INDEX
EXPLAIN SEL * FROM VT_STUDENT WHERE STUDENT_ID=1;

1) FIRST, we DO a single-AMP RETRIEVE step FROM VT_STUDENT
BY way OF the UNIQUE PRIMARY INDEX "VT_STUDENT.student_id
= 1" WITH NO residual conditions. The estimated TIME FOR this
step IS 0.00 seconds.
-> The ROW IS sent directly back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.00 seconds.

We have used UNIQUE PRIMARY INDEX in the query. This results in single-AMP fetch from table.

Now run the query using UNIQUE SECONDARY INDEX
EXPLAIN SEL * FROM VT_STUDENT WHERE student_name='WILLIAM';

1) FIRST, we DO a two-AMP RETRIEVE step FROM VT_STUDENT BY
way OF UNIQUE INDEX # 4 "VT_STUDENT.student_name =
'WILLIAM'" WITH NO residual conditions. The estimated TIME FOR
this step IS 0.00 seconds.
-> The ROW IS sent directly back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.00 seconds.

We have used UNIQUE SECONDARY INDEX in the query. This results in two-AMP fetch from table.

Now run the query using NON UNIQUE SECONDARY INDEX
EXPLAIN SEL * FROM VT_STUDENT WHERE student_age=12;

1) FIRST, we DO an ALL-AMPs RETRIEVE step FROM VT_STUDENT BY
way OF an ALL-ROWS scan WITH a CONDITION OF (
"VT_STUDENT.student_age = 12") INTO SPOOL 1 (group_amps),
which IS built locally ON the AMPs. The SIZE OF SPOOL 1 IS
estimated WITH LOW confidence TO be 5 ROWS (235 BYTES). The
estimated TIME FOR this step IS 0.03 seconds.
2) Finally, we send OUT an END TRANSACTION step TO ALL AMPs involved
IN processing the REQUEST.
-> The contents OF SPOOL 1 are sent back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.03 seconds.

We have used SECONDARY INDEX in the query. This results in ALL-AMP fetch from table and the confidence level is LOW.
Now do collect stats on NUSI column and see what difference it makes.

COLLECT STATS ON VT_STUDENT COLUMN(student_age);

Now run the same query using NUSI
EXPLAIN SEL * FROM VT_STUDENT WHERE student_age=12;

1) FIRST, we DO an ALL-AMPs RETRIEVE step FROM VT_STUDENT BY
way OF an ALL-ROWS scan WITH a CONDITION OF (
"VT_STUDENT.student_age = 12") INTO SPOOL 1 (group_amps),
which IS built locally ON the AMPs. The SIZE OF SPOOL 1 IS
estimated WITH HIGH confidence TO be 2 ROWS (94 BYTES). The
estimated TIME FOR this step IS 0.03 seconds.
2) Finally, we send OUT an END TRANSACTION step TO ALL AMPs involved
IN processing the REQUEST.
-> The contents OF SPOOL 1 are sent back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.03 seconds.

We have used SECONDARY INDEX in the query. This results in ALL-AMP fetch from table and the confidence level changes to HIGH after stats collection.

Now run the query using column which has no index defined on it.
EXPLAIN SEL * FROM VT_STUDENT WHERE STUDENT_COUNTRY='USA';

1) FIRST, we DO an ALL-AMPs RETRIEVE step FROM VT_STUDENT BY
way OF an ALL-ROWS scan WITH a CONDITION OF (
"VT_STUDENT.student_country = 'USA'") INTO SPOOL 1 (group_amps),
which IS built locally ON the AMPs. The SIZE OF SPOOL 1 IS
estimated WITH NO confidence TO be 1 ROW (55 BYTES).
The estimated TIME FOR this step IS 0.03 seconds.
2) Finally, we send OUT an END TRANSACTION step TO ALL AMPs involved
IN processing the REQUEST.
-> The contents OF SPOOL 1 are sent back TO the USER AS the RESULT OF
STATEMENT 1. The total estimated TIME IS 0.03 seconds.

We have used column which is not any index in the query. This results in ALL-AMP fetch from table and there is NO confidence level.

We can see it in above examples how explain plan changes with changing index columns in queries.

Leave a Reply

Your email address will not be published. Required fields are marked *