Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Praveen.

Asked: January 28, 2018 - 6:50 am UTC

Last updated: January 31, 2018 - 12:03 am UTC

Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

Viewed 1000+ times

You Asked

Hi Tom,

I have below query

select a.order_number from  xxdm.XXARX_INV_SOURCE_TBL_28_jan1 a,XXARX_INV_CNV_TBL_28th_jan1 b
where a.trx_number = b.trx_number 
and a.customer_number_source = b.customer_number_source
and a.trx_number = :trx_number
and a.creation_date > sysdate-1;


When I get the explain plan I get below information, so my question is how to decide which column should I index and how do I create the index ( index on single column or multiple columns).

Plan hash value: 1823763723

 
---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                              | 41378 |  2384K|  2431   (1)| 00:00:01 |
|*  1 |  HASH JOIN         |                              | 41378 |  2384K|  2431   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| XXARX_INV_CNV_TBL_28TH_JAN1  | 31764 |   527K|  1641   (1)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| XXARX_INV_SOURCE_TBL_28_JAN1 | 41670 |  1709K|   789   (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."TRX_NUMBER"="B"."TRX_NUMBER" AND 
              "A"."CUSTOMER_NUMBER_SOURCE"="B"."CUSTOMER_NUMBER_SOURCE")
   2 - filter("B"."TRX_NUMBER" IS NOT NULL AND "B"."CUSTOMER_NUMBER_SOURCE" IS NOT NULL)
   3 - filter("A"."CREATION_DATE">SYSDATE@!-1 AND "A"."TRX_NUMBER" IS NOT NULL)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Regards

and Connor said...

This is how I would approach it.

When I look at:

and a.trx_number = :trx_number
and a.creation_date > sysdate-1;

things I would be considering are (and only *you* can truly answer these questions because I don't have the data):

- How many distinct values of trx_number do I have ? That helps me decide on whether there is any real benefit in having this column in an index

- From the names of the columns, it sounds like they are fixed (ie, won't be updated over time), so an index will likely only impact insert not subsequent updates.

- What is the data distribution of creation_date ? Are *all* the values more than sysdate-1 ? (because the names of the table suggest it might be just a daily extract). What percentage of rows fall into this bucket.

In the *absence* of that information, I can't really given an informed answer, but I'd be inclined to experiment with something like:

create index ix on t ( trx_number, creation_date ) compress 1;

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What if

Moris, January 29, 2018 - 7:49 am UTC

Hi Connor
You have bravely suggest an index even thought don't have all informations.
Have a question . What if the table was range partitioned on creation date. Do you maintain this index or modify order cols (also the type local or global suggestions are welcomed)
Connor McDonald
January 30, 2018 - 2:43 am UTC

"Maybe".

What if the partitioning scheme is weekly, and 99% of the data is more recent than sysdate-1. Then partitions give me nothiing (for *this* query).

What if only 1% of the data is in the last day. Then partitioning might be good, but only if the partitioning scheme is daily or perhaps weekly. Monthly or yearly might be a different story.

Without a full picture...you can never give a good recommendation.

Additional info

Praveen Kumar, January 29, 2018 - 2:52 pm UTC

Hi,

Many thanks for your response. The trx_number is distinct and for creation_date , yes, *all* the values more than sysdate-1.

When I created the indexes as suggested by you on both the tables the cost came down to 5 from 2427.
So when we look at the explain plan , do we just go ahead and create indexes on the *filter* columns only or is there anything else we need to take care of while creating the indexes.

I mean I want to know the pre requisites to keep in mind when I am creating the Indexes on the custom table by looking at the explain plan.

I would be very grateful to you if you could respond on this query of mine.
Regards
Connor McDonald
January 30, 2018 - 2:40 am UTC

I dont look at the explain plan (generally).

I look at the data, its distribution and ideally, the business usage and business requirement.

If TRX_NUMBER is *unique* (which is not the same as 'distinct' - I said the *number* of distinct values), then an index on just that column would suffice.

Try spend less time on stats and metrics, and more on the data, if that makes sense.

Elaborate

Moris, January 30, 2018 - 7:16 am UTC

If TRX_NUMBER is *unique* (which is not the same as 'distinct' - I said the *number* of distinct values), then an index on just that column would suffice. 

What do you mean by that? Trx_number could be actually unique but there is no such constraint on this col. So if this col contain say a sequence hence unique but there is no constraint on it -say for some reason whatever, would you still suggest a one key index on it?
Connor McDonald
January 31, 2018 - 12:03 am UTC

Lets say the table contains 1,000,000 rows, and my query predicates are:

where trx_number = :1
and created_date > :2

and we're deciding on index options of:

(trx_number,created_date) versus (created_date)

If TRX_NUMBER contains 1 distinct value, then I'd probably not use it in the index *but* if all that I am *selecting* is trx_number and created_date, then i *might* do so, not to speed up index access, but to remove the table access.

If TRX_NUMBER contains 10 distinct values, then I *probably* wont put an index on it, but I *might* depending on clustering factor and skew, and if I did, I would probably compress the leading column

If TRX_NUMBER contains 100 distinct values, then that probability of its usefulness in an index increases but *still* the caveat on clustering factor and skew is there.

If TRX_NUMBER contains 1000 distinct values, ie, it is unique, then that probability of its usefulness in an index makes it a near certainty.



More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.