Skip to Main Content
  • Questions
  • Guide lines to write effective (great performance) Indexes and Keys

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Francisco.

Asked: January 31, 2022 - 8:30 am UTC

Last updated: February 03, 2022 - 3:13 am UTC

Version: 11

Viewed 1000+ times

You Asked

Hi
From time to time some developer come and said: we need to change the column order in the index because it will perform better.
I have searched but never able to find a good explanation regarding how to order the columns.

Our situation is something like this

COMPANY_CODE NVARCHAR(10)
DOCUMENT_TYPE NVARCHAR(10)
DOCUMENT_CODE NVARCHAR(10)
DOCUMENT_RELEASE INT
...


Some details about the column content
COMPANY_CODE is used because application is multitenant but 99% of times we have just ONE COMPANY CODE
DOCUMENT_TYPE has a domain with no more that 20 different values
DOCUMENT_CODE is a code generated by the application according to rules chosen by our customer
DOCUMENT_RELEASE normally has just one value -> 0

Time ago we have created the key with columns in this order:
COMPANY_CODE,DOCUMENT_TYPE,DOCUMENT_CODE,DOCUMENT_RELEASE

Some developer said that the column that has more different values DOCUMENT_CODE must be the first in the index, then
he proposal is
DOCUMENT_CODE,DOCUMENT_TYPE,COMPANY_CODE,DOCUMENT_RELEASE

May be I'm sounding "dumb", but I would like to have some good theoretical evidence.

IMHO performance results can be SQL Optimizer implementation dependent, and can vary according DBMS type, DBMS version.

Probably one answer can be analizing the Execution Plan.

Do you have any tip, link to documentation?


thanks a lot.









and Connor said...

Some developer said that the column that has more different values DOCUMENT_CODE must be the first in the index


This is total fiction, and has been a myth in the oracle community for decades. The number of distinct values in the column plays no role in where that column sits in the index. (I'll come back to this point shortly)

You choose the order of index columns based on the benefit they will have the application SQL code. For example, if your app does queries like:

where a = 
and b = 
and c = 

where b = 
and c = 

where c = 


Then I would add an index with columns as (c,b,a) because than all three queries can potentially get some benefit, whereas an index (a,b,c) only helps with the first, (b,c,a) only helps with 2 of them etc. Choose the column order based on the best overall benefit.

Now..what if your application only does:


where a = 
and b = 


The question then is: should it be (a,b) or (b,a).

Typically the better option will be to choose the column with the LEAST number of distinct values to go first, because then you can use the (free) basic index compression, eg, if 'a' has the most repeated values (ie, least distinct values) then

create index ... on .... (a,b) compress 1

yields a smaller, more dense index which makes it more efficient to use.




Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Columns order

Stefano, February 02, 2022 - 9:08 am UTC

Hi Connor,

Let's imagine those fields are the PK of the table instead. And let's imagine that serveral "child" tables are using those fields as part of their key.

In a head/details scenario, THEAD would have as PK:
COMPANY_CODE
DOCUMENT_TYPE
DOCUMENT_CODE
DOCUMENT_RELEASE

and TDETAILS would have as PK:
COMPANY_CODE
DOCUMENT_TYPE
DOCUMENT_CODE
DOCUMENT_RELEASE
DOCUMENT_ROW

In a query to extract head and details, thus on THEAD joining TDETAILS on the first 4 fields, wouldn't PK columns order matter for the search of TDETAILS, once THEAD rows are "identified" in the first step?

e.g.:
SELECT *
FROM THEAD, TDETAILS
WHERE THEAD.COMPANY_CODE = TDETAILS.COMPANY_CODE 
AND THEAD.DOCUMENT_TYPE =TDETAILS.DOCUMENT_TYPE
AND THEAD.DOCUMENT_CODE =TDETAILS.DOCUMENT_CODE
AND THEAD.DOCUMENT_RELEASE =TDETAILS.DOCUMENT_RELEASE
AND THEAD.DATE BETWEEN ... AND ...


Connor McDonald
February 03, 2022 - 3:04 am UTC

No difference there to my original answer. For the details table, you in-effect are running a query

WHERE TDETAILS.COMPANY_CODE = ...
AND TDETAILS.DOCUMENT_TYPE = ...
AND TDETAILS.DOCUMENT_CODE = ...
AND TDETAILS.DOCUMENT_RELEASE = ...

If that is the *only* query that is used, then as long as those 4 columns come first in the primary key index, no matter order they are in. Thus you would probably choose the order based on the best compression factor (and naturally the PK for the HEAD table would be the same)

However, if there were *other* queries, because lets face it, no matter how good a data model is, someone will always find some bizarre requirement to query based on the "3rd" logical column for some reason, even those its only a partial key :-) then you would take those other query requirements into account as well. eg Lets say there's a query that is also

WHERE TDETAILS.DOCUMENT_TYPE = ...
AND TDETAILS.DOCUMENT_CODE = ...

You *might* opt for a second index, or you *might* be able to get away with just one that is serves both the PK *and* this requirement, ie

DOCUMENT_TYPE
DOCUMENT_CODE
COMPANY_CODE
DOCUMENT_RELEASE


Columns order

Rajeshwaran Jeyabal, February 02, 2022 - 12:02 pm UTC

given this query
SELECT *
FROM THEAD, TDETAILS
WHERE THEAD.COMPANY_CODE = TDETAILS.COMPANY_CODE 
AND THEAD.DOCUMENT_TYPE =TDETAILS.DOCUMENT_TYPE
AND THEAD.DOCUMENT_CODE =TDETAILS.DOCUMENT_CODE
AND THEAD.DOCUMENT_RELEASE =TDETAILS.DOCUMENT_RELEASE
AND THEAD.DATE BETWEEN ... AND ...


this is your filter ..AND THEAD.DATE BETWEEN ... AND ..
and if this filter is selective, then we would end-up with index on THEAD.DATE,
and do a nice range scan to pick those handfull of records from THEAD table and join with TDETAILS on the keycolumns.
if FK on TDETAILS was indexed then we would end-up with nice range scan on FK index.


To know more about the order of columns in the index structure
read through them in detail @ https://www.amazon.com/Expert-One-Oracle-Thomas-2005-04-20/dp/B01MT3NDX4
Page no# 394 and read this topic "Myth: Most Discriminating Elements Should be First"
Connor McDonald
February 03, 2022 - 3:13 am UTC

Agreed.

One thing I forgot to note in my previous review, is that index columns can be a superset of constraint columns to get more "bang for your buck" out an index

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.