Database, SQL and PL/SQL

Rapid Retrieval of Rows in Small Data Sets

Part 8 in a second series on the basics of the relational database and SQL

By Melanie Caffrey

March/April 2017

This article is the eighth in a series that helps you build on the fundamentals you learned in the 12-part SQL 101 series in Oracle Magazine. The previous Beyond SQL 101 article, “Sequential Additions and Different Points of View,” illustrated more about data definition language (DDL), including how to drop and rename tables and how to recover tables by using the recycle bin. You also learned how purging syntax affects table recovery and found out about the difference between truncating and dropping a table. You discovered how to use virtual columns to help simplify query writing. You also discovered how sequences and IDENTITY columns can be used for generating surrogate key values. Finally, you were introduced to views and how they can assist with query writing and data hiding.

In this article, you’ll learn the following about indexes:

How to create a balanced tree (B-tree) index
The difference between individual indexes and composite indexes
The relationship between indexes and constraints
The relationship between indexes and NULL values
How to create a function-based index

You’ll also be introduced to altering and dropping indexes. Last, you’ll discover how you can employ various individually tailored indexing strategies for your different application needs.

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Enterprise Edition 12c Release 1 (12.1.0.2.0). If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_201 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for the SYS and SYSTEM users and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_201 schemas used for this article’s examples. (View the script in a text editor for execution instructions.)

Boosting Your Performance

As your tables grow with each additional row inserted into them, your users’ ability to quickly retrieve information can degrade, particularly if users perform changes (inserts, updates, deletes) on your tables frequently, as often happens with highly transactional tables. To help speed up data retrieval, Oracle Database sometimes uses indexes. Oracle indexes work in a manner similar to the index you might find in the back of a book. Just as you use a book index to determine the fastest way to obtain a particular piece of information, you can employ an Oracle index to help your users quickly obtain the relevant row(s) containing their desired query results. If an index that can assist with quickly selecting requested rows does not exist on a table, Oracle Database must examine every row to determine its eligibility for inclusion in the returned result set.

The act of examining every row is called a full table scan. If your query warrants the return of many row values from your table, it might be more efficient to scan the entire table to return the row values than to look up and obtain row values from an index. For each change you issue against a table that changes a value in an indexed column, the corresponding index must also be changed. Therefore, your indexing strategies will require some planning and analysis, because it can take a significant amount of time and storage space to build and maintain each index.

The Oracle Optimizer feature of Oracle Database uses statistics to determine whether to use a full table scan or an index for queries where an index is available. The next article in this series will introduce you to the Oracle Optimizer and statistics, but these two topics are beyond the scope of the current article.

Striking a Balance

The most common type of Oracle index is the balanced tree, or B-tree, index. It is often used to increase performance in situations where there are many distinct values in a single column or set of columns. Columns that have many distinct values are referred to as having high selectivity. Columns that contain primary key values and dates are examples of columns that have many distinct values and have, therefore, high selectivity. You can use B-tree indexes when you want to perform an exact match to retrieve a particular row value or when you want to perform a range search (for example, you’d like to retrieve all employees hired between 01-JANUARY-2010 and 15-JULY-2011). The structure of a B-tree index looks similar to Figure 1.

The structure looks like an inverted tree and consists of two types of blocks: root/branch blocks and leaf blocks. Root and/or branch blocks (parent blocks) store the index key (such as a range of date values), along with a pointer to a leaf block (child block) containing the key value (such as a date value) along with the ROWID for any row that matches the key value. Every row has a ROWID. A ROWID is a unique address to a particular row that indicates the row’s physical storage location in a table. A ROWID is a pseudocolumn—much like the ROWNUM or SYSDATE pseudocolumns—in that it is not an actual column but can be referenced like one.

The query in Listing 1 illustrates ROWID values returned in a query from the EMPLOYEE table. Note how each ROWID value returned in the result set is distinct from every other ROWID value. A ROWID is always unique, and using it is the fastest way to access a row. You can also update or delete a row by using the row’s ROWID in the WHERE clause, thereby avoiding having to make Oracle Database search an index or perform a full table scan.

Code Listing 1: Selecting ROWID values from the EMPLOYEE table

SQL> set lines 10000
SQL> select ROWID, first_name||' '|| last_name employee_name
  2    from employee
  3  order by last_name, first_name;
ROWID              EMPLOYEE_NAME
—————————————————— ——————————————
AAAXFUAAJAAAEB9AAK Lori Dovichi
AAAXFUAAJAAAEB9AAA Emily Eckhardt
AAAXFUAAJAAAEB9AAE Roger Friedli
AAAXFUAAJAAAEB9AAF Betsy James
AAAXFUAAJAAAEB9AAI Thomas Jeffrey
AAAXFUAAJAAAEB8AAB Sasha Meyer
AAAXFUAAJAAAEB9AAD Matthew Michaels
AAAXFUAAJAAAEB9AAC Donald Newton
AAAXFUAAJAAAEB9AAB Frances Newton
AAAXFUAAJAAAEB9AAN Don Rose
AAAXFUAAJAAAEB8AAC Gerald Sowell
ROWID              EMPLOYEE_NAME
—————————————————— ——————————————
AAAXFUAAJAAAEB9AAL Mary Streicher
AAAXFUAAJAAAEB8AAA Marcy Tamra
AAAXFUAAJAAAEB9AAJ Theresa Wong
AAAXFUAAJAAAEB9AAH Mark Leblanc
AAAXFUAAJAAAEB9AAG Michael Peterson
16 rows selected.

o27sql201-f1

Figure 1: Structure of a B-tree index

Using Figure 1 as an example, suppose you want to query the EMPLOYEE table for those employees hired between 01-JANUARY-2010 and 15-JULY-2011 and you have created a B-tree index on the HIRE_DATE column. Your search through the B-tree index would begin with the root block. A WHERE clause similar to

WHERE HIRE_DATE >= TO_DATE('01-JAN-2010', 'DD-MON-YYYY')
  AND HIRE_DATE <  TO_DATE('16-JULY-2011', 'DD-MON-YYYY')

would yield search input values that would be compared with the rightmost root block, which has pointers to 2010–2016. The next step would be to search the leftmost branch block, which has pointers to 2010–2011. Finally, the leaf blocks would be scanned to find the exact HIRE_DATE index values—and their associated ROWIDs—that satisfy the search criteria. Leaf blocks also contain links to the next and previous leaf blocks, which enable the index to be scanned for ranges. (Note that the WHERE clause in this example searches inclusively for all dates between 12:00 a.m., January 1, 2010, and 11:59 p.m., July 15, 2011.)

Planting Trees

The following SQL

SQL> CREATE INDEX emp_hire_date_i
  2  ON employee(hire_date);
Index created.

illustrates the simplest method for creating a B-tree index on the HIRE_DATE column of the EMPLOYEE table. A query to find every employee hired on, for example, May 4, 2016, can take advantage of the EMP_HIRE_DATE_I index, because Oracle Database can look up the value in the index.

The query in Listing 2 shows how many employees were hired on May 4, 2016. Only two employees, Sasha Meyer and Marcy Tamra, are included in the returned result set. With so few rows returned, using an index to retrieve the relevant rows is faster than reading every row in the table, especially if the table has many rows.

Code Listing 2: Using a B-tree index to speed up the retrieval of a few rows

SQL> select first_name||' '||last_name employee_name
  2    from employee
  3   where hire_date >= to_date('04-MAY-2016', 'DD-MON-YYYY')
  4     and hire_date <  to_date('05-MAY-2016', 'DD-MON-YYYY')
  5  order by last_name;
EMPLOYEE
——————————————————————————————————————————————————————————————
Sasha Meyer
Marcy Tamra
2 rows selected.

It may sometimes be useful to build indexes using multiple columns, particularly on individual columns with low selectivity. An index composed of multiple columns is referred to as a composite index or a concatenated index. The combination of one or more columns may make an index more selective than it would otherwise be with single-column B-tree indexes on individual columns.

The following SQL

SQL> CREATE INDEX dep_wage_increase_i
  2  ON employee (department_id, wage_increase_worthiness);
Index created.

demonstrates a simple method for creating a composite B-tree index on the DEPARTMENT_ID and WAGE_INCREASE_WORTHINESS columns of the EMPLOYEE table. It also illustrates how virtual columns, such as WAGE_INCREASE_WORTHINESS, can be used in indexes.

And Listing 3 shows a query that selects all employees who work in the accounting department (DEPARTMENT_ID = 10) and are eligible for a cost-of-living pay raise (WAGE_INCREASE_WORTHINESS = 'Cost of Living Increase Eligible'). Columns used together frequently in a WHERE clause, combined with the AND operator, are often good candidates for use in a composite index.

Code Listing 3: Using the DEP_WAGE_INCREASE_I index to retrieve rows

SQL> select emp_full_name employee_name
  2    from employee
  3   where department_id = 10
  4     and wage_increase_worthiness = 'Cost of Living Increase Eligible'
  5  order by last_name;
EMPLOYEE
————————————————————————————————————————————————————————————————————————
Roger Friedli
Betsy James
Matthew Michaels
Frances Newton
Donald Newton
5 rows selected.

The order of the listed columns in a composite index can affect performance. When creating composite indexes, it is a good idea to list the column(s) you use with equality searches in the WHERE clause first and the column(s) you use with range searches (such as those that use the LIKE operator or > and < operators, for example) second. For instance, if you find that your application uses queries with a WHERE clause similar to the following:

WHERE salary    = 70000
  AND hire_date > TO_DATE('01-JAN-2010', 'DD-MON-YYYY')

then an index on EMPLOYEE (SALARY, HIRE_DATE) would be beneficial. If you tend to use equality searches on all columns, the order in which you list the columns in the composite index is not relevant, unless you plan to compress the index, in which case you should list the columns in the index in order of lowest to highest selectivity. The topic of index compression will be covered in subsequent articles in this series. Meanwhile, if you want to learn about index compression, refer to the index compression documentation.

A Constrained Relationship

Recall that in the Beyond SQL 101 article “Defining, Constraining, and Manipulating Your Entities,” you learned about different types of table and column constraints. Whenever you create a primary key constraint or a unique key constraint, Oracle Database creates a corresponding index automatically, unless an index for the column(s) of the constraint already exists. Listing 4 demonstrates how adding a primary key constraint, EMPLOYEE_PK, to the EMPLOYEE table results in the automatic creation of the corresponding EMPLOYEE_PK index.

Code Listing 4: Adding a primary key constraint, which automatically creates an index

SQL> select index_name
  2    from user_indexes
  3   where index_name = 'EMPLOYEE_PK';
no rows selected
SQL> alter table employee
  2  add constraint employee_pk primary key (employee_id);
Table altered.
SQL> select index_name
  2    from user_indexes
  3   where index_name = 'EMPLOYEE_PK';
INDEX_NAME
—————————————————————————————————————————————————————————
EMPLOYEE_PK
1 row selected.
SQL> select column_name
  2    from user_ind_columns
  3   where index_name = 'EMPLOYEE_PK';
COLUMN_NAME
——————————————————————————————————————————————————————————
EMPLOYEE_ID
1 row selected.

If columns upon which you have foreign key constraints defined are used frequently in WHERE clauses as access methods, they should be indexed. Access methods will be covered more fully in subsequent articles in this series. Meanwhile, to learn more about when it makes the most sense to index foreign-key-constrained columns, refer to the documentation for foreign key constraints; begin with the subsection named “Indexes and Foreign Keys.” Additionally, if your database application code allows primary keys referenced by foreign keys to be updated or rows in the parent table that includes the primary keys to be deleted, the entire child table that includes the unindexed foreign key columns will be locked while such an update or delete action takes place. This means that any subsequent insertions, updates, or deletions on the child table will be prevented until the action being performed on the parent table is completed.

For this update/delete example, note that without foreign key indexes in place, for each row in the parent table that is modified and also referenced by foreign key values in the child table, a full table scan of the child table will be performed. Therefore, the update/delete action could take a very long time, due to one or more full table scans’ being performed against the likely much larger child table. Listing 5 illustrates an alternative query for obtaining the same information as that obtained in Listing 3. The main differences are that an index is created for the foreign-key-constrained DEPARTMENT_ID column of the EMPLOYEE table and a subquery is performed against the DEPARTMENT table to obtain the DEPARTMENT_ID associated with the accounting department.

Code Listing 5: Using a foreign key index and a subquery to retrieve information from a parent table

SQL> create index emp_dept_fk
  2  on employee (department_id);
Index created.
SQL> select emp_full_name employee
  2    from employee
  3   where department_id = (select department_id from department where name = 'Accounting')
  4    and wage_increase_worthiness = 'Cost of Living Increase Eligible'
  5  order by last_name;
EMPLOYEE
————————————————————————————————————————————————————————————————————————
Roger Friedli
Betsy James
Matthew Michaels
Frances Newton
Donald Newton
5 rows selected.

When Your WHERE Won’t Work (and Other Considerations)

There are several circumstances in which Oracle Database might not use an index, even though you’ve defined an index on the column used in your WHERE clause. Entirely NULL index key values are not stored in a B-tree index, but if any of the columns in a composite B-tree index has a non-NULL value, the index key will be stored. However, if all the index key columns for a given row are NULL, such an index key will not be stored. The following SQL

SQL> select emp_full_name employee_name
  2    from employee
  3   where hire_date IS NULL;
no rows selected

outlines a query on the EMPLOYEE table that will not use an index. Although there is an index, EMP_HIRE_DATE_I, defined on the HIRE_DATE column of the EMPLOYEE table, it does not include NULL HIRE_DATE values. Therefore, the query must perform a full table scan to determine that no rows in the EMPLOYEE table have a NULL value in the HIRE_DATE column.

Additionally, be aware that if you use a function on a column of a WHERE clause, you’ll be altering it in such a way that an exact match value cannot be compared with or retrieved from the index. The query in Listing 6 illustrates the type of query that would not be able to use an index for function-changed WHERE clause columns. Although there is an index, DEP_WAGE_INCREASE_I, defined on both the DEPARTMENT_ID and WAGE_INCREASE_WORTHINESS columns of the EMPLOYEE table, only the EMP_DEP_FK index can be used, because the UPPER function is applied to the WAGE_INCREASE_WORTHINESS column in the WHERE clause.

Code Listing 6: Inability of WHERE clause columns with functions to make use of indexes

SQL> select emp_full_name employee_name
  2    from employee
  3   where department_id = (select department_id from department where name = 'Accounting')
  4     and UPPER(wage_increase_worthiness) = 'COST OF LIVING INCREASE ELIGIBLE'
  5  order by last_name;
EMPLOYEE_NAME
————————————————————————————————————————————————————————————————————————
Roger Friedli
Betsy James
Matthew Michaels
Frances Newton
Donald Newton
5 rows selected.

If you plan to apply a function to a particular column of a WHERE clause regularly, you can create a function-based index, as demonstrated in

SQL> CREATE INDEX dep_name_upper_i
  2  ON department (UPPER(name));
Index created.

However, be aware that virtual columns, which are function-derived, are not allowed in functional index expressions. However, virtual columns can be B-tree-indexed.

The following SQL

SQL> ALTER INDEX dep_name_upper_i
  2  RENAME TO dep_upper_name_i;
Index altered.

demonstrates how you can rename an existing index by using the ALTER INDEX command. You can also use the ALTER INDEX command to change the storage characteristics of an index, rebuild it (a topic to be introduced in subsequent articles in this series), or mark the index as visible or invisible. If an index is marked as invisible, the program that determines the fastest way to retrieve data (the Oracle Optimizer, to be more fully covered in subsequent articles in this series) will ignore the index unless a special session setting is enabled. This feature might come in handy, for example, when you are considering removing a possibly unused index but would like to mark it as invisible before doing so, to ensure that there is no negative impact on your application.

The following SQL

SQL> ALTER INDEX dep_upper_name_i INVISIBLE;
Index altered.

shows how you can mark an index as invisible. Once you’ve marked an index as invisible, it will not be used to access rows and you can determine whether application performance suffers or remains the same as before you marked the index as invisible.

Marking indexes as invisible before making your final decision to remove them can save you the work of having to re-create them if necessary. The following SQL

SQL> DROP INDEX dep_upper_name_i;
Index dropped.

demonstrates how to remove (drop) an index.

Should You or Shouldn’t You?

It’s important to look at the different ways Oracle Database can help you achieve good performance for both data retrieval and data changes. You should consider indexing columns that are frequently used in SQL statement WHERE clauses and in foreign key column access methods. A B-tree index may be useful if

Your column contains highly selective data, is accessed frequently, and is accessed by queries that typically return a few rows
Your columns contain many NULL values and are accessed frequently by queries that typically return a few rows, and you are searching for non-NULL values

A B-tree index will most likely not be useful if

A full table scan is determined by the Oracle Optimizer to be a faster method of access for row retrieval
Your indexed column is included in a WHERE clause that applies a function to it (In such cases, a function-based index might be more suitable for your SQL statement needs.)

Be aware that adding indexes may increase the time required for INSERT, UPDATE, and DELETE operations to complete. However, if you query the table more than you change its data, the disadvantage of the additional time required may be offset by the advantage gained in increased query performance.
Summary

This article introduced you to B-tree indexes, both single-column and composite. You learned how they can be used to increase performance when retrieving rows. You discovered how to create them and what kind of relationship they have with constraints. You also discovered how to alter them, how to drop them, and when they cannot be used. Last, you got a few guidelines for when B-tree indexes are most useful, when they are not useful, and when a function-based index can be helpful. The next article in this series will introduce you to bitmap indexes, statistics, and the cost-based Oracle Optimizer.

Next Steps

Share

Rapid Retrieval of Rows in Small Data Sets

Part 8 in a second series on the basics of the relational database and SQL