Teradata interview questions: Explain Secondary Index

There are up to 32 unique and non-unique secondary indexes can be defined for a table. Comparing to primary indexes, Secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored in all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following;

RowIDs of the subtable rows
Base table index column values
RowIDs of the base table rows (points)

As shown in the following figure, the secondary index subtable on each AMP is associated with the base table by the rowID.

Secondary index are optional. Unlike the primary index, a secondary index can be added or dropped without recreating the table. There can be one or more secondary indexes in the CREATE TABLE statement, or add them to an existing table using the CREATE INDEX statement or ALTER TABLE statement. DROP INDEX can be used to dropping a named or unnamed secondary index. Since secondary indexes require subtables, these subtables require additional disk space and, therefore, may require additional I/Os for INSERTs, DELETEs, and UPDATEs.

If a Teradata SQL request uses secondary index values in a WHERE constraint, the optimizer may use the rowID in a secondary index subtable to access the qualifying rows in the data table. If a secondary index is used only periodically by certain applications and is not routinely used by most applications, disk space can be saved by creating the index when it is needed and dropping it immediately after use.

A unique secondary index is very efficient, it typically allows access of only two AMPs, requires no spool file, and has one row per value, therefore, when a unique secondary index is used to access a row, two AMPs are involved. Unique secondary indexes can thus improve performance by avoiding the overhead of scanning all AMPs. For example, if a unique secondary index is defined on the department_name column of the Customer_service.department table (assuming that no two departments have the same name), then the following query is processed using two AMPs:

SELECT department_number FROM customer_service.department
        WHERE department_name = 'Education';

In this example, the request is sent to AMP n, which contains the rowID for the secondary index value "Education", this AMP, in turn, sends the request to AMP m, where the data row containing that value is stored. Note that the rowID and the data row may reside on the same AMP, in which case only one AMP is involved.

A non-unique secondary index (NUSI) may have multiple rows per value. As a general rule, the NUSI should not be defined if the maximum number of rows per value exceeds the number of data blocks in the table. A NUSI is efficient only if the number of rows accessed is a small percentage of the total number of data rows in the table. It can be useful for complex conditional expressions or processing aggregates. For example, if the contact_name column is defined as a secondary index for the customer_service.contact table, the following statement can be processed by secondary index:

    SELECT * FROM customer_service.contact
        WHERE contact_name = 'Mike';

After request is submitted, the optimizer first will determine if it is faster to do a full-table scan of the base table rows or a full-table scan of the secondary index subtable to get the rowIDs of the qualifying base table rows; then place those rowIDs into a spool file; finally use the resulting rowIDs to access the base table rows.
Non-unique secondary indexed accessed is used only for request processing when it is less costly than a complete table search.

Teradata allows the use of Primary and Secondary Indexes, however, there are few differences between the two. In Teradata, Primary Indexes are required and Secondary Indexes are optional.

Each table must have a Primary Index for distributing the rows among the VPROCs. Each table can have only one Primary Index, but it can have up to 32 Secondary Indexes. Both Primary and Secondary Indexes may contain up to 16 columns, and they both may be unique or non-unique.

Row distribution among VPROCs is only affected by the Primary Index not the Secondary Index. Secondary Indexes may be created and dropped at any time, but if the Primary Index must be changed or dropped, the table must be dropped and recreated.   Again, both the Primary and Secondary Indexes will affect the table performance, so poorly chosen indexes can cause some VPROCs to work harder than others.

Teradata interview questions

Tuesday, 18 October 2011

Explain Secondary Index

1 comment: