Skip to Main Content
  • Questions
  • How to Maximize Performance for Date Logic Queries when Base Tables Only Contain Year and Month Columns

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, William.

Asked: October 22, 2016 - 3:02 am UTC

Last updated: October 24, 2016 - 10:23 am UTC

Version: 12g

Viewed 1000+ times

You Asked

Hi Tom,

It's my first question! Also, I love this forum--as a newbie to Oracle, Ask Tom is invaluable :)

In our organization, we have many base tables that are built with YEAR and MONTH number type columns that serve as part of a composite PK. For our needs, many of our quality control reports are based off month-on-month comparisons between records.

I am currently handling this type of logic by creating views of our tables that build a DATE column (alias PERIOD) using the YEAR and MONTH columns of the base tables. If I then need to compare month-on-month differences, I will join a view back to the original table by using the PERIOD in the join clause:

<SQL>

/* Example 1: if i am comparing a view with a base table */
select
A.period,
A.year,
A.month,
A.ID,
A.state,
A.item,
A.val,
B.val as val_prior
from A
inner join B
on A.period = add_months(to_date(B.month || '/01/' || B.Year, 'MM/DD/YYYY'), 1)
and A.ID = B.ID
and A.state = B.state
and A.item = B.item
;

/* Example 2: if i am comparing two views where I've created a date field in both */
select
A.period,
A.year,
A.month,
A.ID,
A.state,
A.item,
A.val,
B.val as val_prior
from A
inner join B
on A.period = add_months(B.period, 1)
and A.ID = B.ID
and A.state = B.state
and A.item = B.item
;


</SQL>

Those SQL statements above usually are part of WITH blocks that eventually get joined to several lookup tables to pull in descriptions and also tables with additional data for further calculations. This is where the performance really starts to suffer. None of our tables use date fields.

I will later filter out the couple of years of data that I need to run analysis on. I've started to notice some performance degradation and I'm guessing it's because I'm using a function to operate my joins, which means I lose the index functionality from the PK. Our tables are small by today's standards. The largest table is not over 1,000,000 rows (yet).

With this in mind, what is the best solution for query performance? Do I:

A) build a physical column onto our base tables with a date field? I feel like I still may get performance degradation with my queries since I am using the add_months() function in my SQL join logic.

B) build one or more function-based indices on the physical tables? I'm guessing one would be:
to_date(B.month || '/01/' || B.Year, 'MM/DD/YYYY')
while the other would be:
add_months(to_date(B.month || '/01/' || B.Year, 'MM/DD/YYYY'), 1)

C) incorporate some other solution that I'm unaware of because I'm new? I almost feel like I would might need a combination of A and B, where I create a physical date column and then create the add_months() function-based index...

As a little background, a lot of the quality control reports that I am building, I prefer to build as a complex view and then make the view available to other analysts that will then filter the view using MS Access and ODBC. This way, the analysts can either leverage on-the-fly Access queries, or they can build interesting visualizations while connecting to the views via MSExcel and ODBC. As much as I love building visualizations, that's something many people can do, whereas I'm really the only one around diving deeper into learning SQL and PL/SQL.

Also--do the order of the ON/AND clauses matter for performance when issuing a JOIN?

Many thanks, in advance!!!

and Chris said...

So what you want to do is generate columns so you can compare values from one row with another?

If so, sounds like what you really want to do is pivot!

This means you only access your table once and avoid the join. For example:

create table t (
  month_num int,
  year_num  int,
  val       int,
  primary key ( year_num, month_num )
);

insert into t values (1, 2016, 1);
insert into t values (2, 2016, 2);

select * from t
pivot (
  min(val) 
  for (month_num, year_num) 
  in (
    (1, 2016) prior_val, (2, 2016) val
  )
);

PRIOR_VAL  VAL  
1          2


This has a lot of powerful options. If you want to know more about pivoting, read:

https://blogs.oracle.com/sql/entry/how_to_convert_rows_to

In any case, I'd look at creating virtual columns to "store" the period and perhaps the previous period.

Oracle will then be able to calculate stats on the period. It doesn't do this for columns in views. These extra stats may be enough to nudge the optimizer in the right direction.

If a combination of pivoting and virtual columns isn't helping, further indexes *might* help. It's hard to say without seeing your query and schema though.

If you need further help, please post an execution (not explain!) plan for one of your queries. For instructions on how to do this, see:

https://blogs.oracle.com/sql/entry/how_to_create_an_execution

do the order of the ON/AND clauses matter for performance when issuing a JOIN?

It can. But the optimizer can rewrite your query, changing the order!

https://jonathanlewis.wordpress.com/2016/10/20/conjuctive-normal-form/


Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.