Skip to Main Content
  • Questions
  • optimistic search for most recent records

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Alexander.

Asked: November 08, 2017 - 7:38 pm UTC

Last updated: November 10, 2017 - 1:43 am UTC

Version: 12c

Viewed 1000+ times

You Asked

Hi,

I have very large table which constantly grows. The search is executed by ID column, which is part of PK.
create table TEST (
 ID         varchar2(20) primary key,
 VALUE      varchar2(20),
 CREATED_TS timestamp default := systimestamp
);

Same time most of select queries are looking for relatively recent records (i.e. few months). To avoid full index scan all the time I can try to execute the query in two steps - first, I search with additional predicate for time frame, i.e.

select * from TEST where ID='xyz' and CREATED_TS > systimestamp-100


an only if record is not found I will execute query for all dates -

select * from TEST where ID='xyz'


99% of time that will be limited to only one query. Fine, especially if we add an index on (CREATED_TS, ID). But such approach is problematic when I need to search for list of ID values, i.e. using
"ID in ('val1', 'val2', 'val3'...)"
syntax. In such case I need narrow the list for the 2nd step by values not found on 1st step. That's is very inconvenient.

Can the same logic be implemented using built-in Oracle functionality? For example I was thinking about partitioning of the table by CREATED_TS ranges. If somehow it is possible to let Oracle know that it should start search in the most recent partition then the challenge could be solved. But I don't know optimization hints for ordered partition scanning.

I don't think this is unusual situation, so expect some already existing solution from Oracle.

Thank you,
Alexander

and Connor said...

With regard to

"select * from TEST where ID='xyz' and CREATED_TS > systimestamp-100
an only if record is not found I will execute query for all dates -
select * from TEST where ID='xyz'
"


if you had an index on ID,CREATED_TS then you would not need the first query - there is either data for ID or not. The same logic could then be applied to multiple ID's. The optimizer should process them iteratively to take advantage of that index, or worst case, you could phrase them as a UNION ALL.

However, if your test case is a simplification of the more general case, ie, users give totally adhoc search criteria, and you want to search recent data *first*, then in this case, partitioning on the appropriate date/timestamp column is an effective strategy because a full scan on just the relevant partition(s) gives improved query performance. Partitioning also lends itself to helping with the requirement of: "check recent first, otherwise go back farther" because you could phrase your queries as:
query1: where [criteria] and ts >= [starting point]
query2: where [criteria] and ts < [starting point]

which avoids double-scanning the most recent partitions.

In either case, good application instrumentation can be used to identify "popular" queries over time, and specialised indexing can be used to handle the most common queries to give a good user experience.

Obviously, there are other factors that might come into play here. For example, if you had an Exadata system, then storage indexes might mitigate the need for partitioning etc.

Hope this helps.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

on full scans

Rajeshwaran Jeyabal, November 13, 2017 - 1:54 pm UTC

Given this.

I have very large table which constantly grows. The search is executed by ID column, which is part of PK.

create table TEST (
 ID         varchar2(20) primary key,
 VALUE      varchar2(20),
 CREATED_TS timestamp default := systimestamp
);


then you say this.

Same time most of select queries are looking for relatively recent records (i.e. few months).
To avoid full index scan all the time I can try to execute the query in two steps - first,
I search with additional predicate for time frame, i.e.


select * from TEST where ID='xyz' and CREATED_TS > systimestamp-100


the ID is the primary key of this table and when we have an equi filter on this ID column - dont we endup with a Index unique scan rather than index full scan?

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.