Search by timestamp(6) on big tables (several approaches)

Question and Answer

Chris Saxon

Thanks for the question, Jens.

Asked: January 22, 2019 - 11:54 pm UTC

Last updated: January 23, 2019 - 3:15 pm UTC

Version: 12.2.0.1

Viewed 10K+ times

You Asked

Hello Tom,

It's always instructive to read your opinions on different topics around Oracle databases.

We have a big table with roughly 200 million records added per month. Against this table there are selects doing calculations per day, and once a month some scripts run over the whole month.

Database: 12.2.0.1

After some discussions (and searching the internet) we found several approaches to make the selects fast:

Solution one: index on a timestamp(6) column

In this solution we create a regular B-tree index on the timestamp(6) column. In prepared statements we always have to bind timestamps; if we bind dates, Oracle will not use the index.

create table solution1 (
    ts timestamp(6) not null
);
create index idx_solution1 ON solution1 (ts);

-- example select
select * from solution1
where ts >= TIMESTAMP '2019-01-01 00:00:00' and ts < TIMESTAMP '2019-02-01 00:00:00'
order by ts;


Solution two: functional index to_number

This solution uses a function-based index which converts the timestamp to an integer. In prepared statements we have to convert the bound date in the way shown below.

create table solution2 (
    ts timestamp(6) not null
);
create index idx_solution2 ON solution2 (to_number(to_char(ts,'YYYYMMDD')));

-- example select
select * from solution2
where to_number(to_char(ts,'YYYYMMDD'))
      between to_number(to_char(DATE '2019-01-01','YYYYMMDD'))
          and to_number(to_char(DATE '2019-01-31','YYYYMMDD'));


Solution three: functional index trunc

In some examples on the net we found a function-based index with trunc(). In prepared statements we have to apply trunc() if we pass timestamps; if we pass dates, we do not need to convert them.

create table solution3 (
    ts timestamp(6) not null
);
create index idx_solution3 ON solution3 (trunc(ts));

-- example select
select * from solution3
where trunc(ts)
      between trunc(TIMESTAMP '2019-01-01 00:00:00')
          and trunc(TIMESTAMP '2019-01-31 00:00:00');


Solution four: use additional number field

In this solution we add an additional column to the table which contains the date of the ts field as an integer in the format YYYYMMDD, so we can build an index over a number(8,0).

create table solution4 (
    ts timestamp(6) not null,
    ts_yyyymmdd number(8,0) not null
);
create index idx_solution4 ON solution4 (ts_yyyymmdd);

-- example select
select * from solution4
where ts_yyyymmdd >= 20190101 and ts_yyyymmdd <= 20190131;


Conclusion

I ran some performance tests, but did not find a solution which is always much faster than the others. In the documentation I could also not find any hint that an index on a timestamp(6) column is a bad idea, even for big tables, so I assume it is not a problem (though some people claim otherwise).
I have searched the internet and found one solution or another (in different contexts), but nowhere did anyone compare the different approaches. So I end up with my question at your desk.

Questions

a) Which of these solutions would you recommend and why (or is there a solution we did not mention)?
b) Some people say that solution one will not perform well on very big tables, because the index on the timestamp(6) column will get very big and will probably not fit into memory (if we select a month with 20,000,000 rows). They suggest going with solution two, which will create a much smaller index and be faster on big tables. What do you think about this?

Regards,

Jens Mühlenhoff


with LiveSQL Test Case:

and Chris said...

a) The first approach - a plain B-tree - is my preferred method.

The performance difference between these will be minimal. After all, they're all reading the same number of index entries and accessing the same rows. The size of the index might come into effect - but more on that below.

So why a regular index?

It's the most general, so it will support more queries in the future. Say you want to search a smaller time range, e.g. the rows within a given hour. The regular index will "just work".

The other methods may fall back to a full table scan. At the very least, they'll read all the index entries for a given day. You'll then have to go to the table to filter out those rows you don't need.
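To make this concrete, here is a sketch (not part of the original answer) using the solution1 and solution3 tables defined in the question:

```sql
-- Hour-range query against solution one: the plain index on ts supports
-- an index range scan covering exactly the hour requested.
select *
from   solution1
where  ts >= timestamp '2019-01-15 09:00:00'
and    ts <  timestamp '2019-01-15 10:00:00';

-- The same search against solution three: the index stores trunc(ts),
-- so at best it narrows the scan to the whole day. The hour filter can
-- only be applied after visiting the table rows.
select *
from   solution3
where  trunc(ts) = date '2019-01-15'
and    ts >= timestamp '2019-01-15 09:00:00'
and    ts <  timestamp '2019-01-15 10:00:00';
```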

That said, I would consider partitioning the table on this timestamp. This may also help you if you want to put in place data archiving strategies.

You'll need to review what your other queries against the table are first though. And what effect partitioning will have on your other indexes.
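A monthly interval-partitioned version might look like the following sketch (the table and partition names here are made up for illustration):

```sql
create table solution_partitioned (
    ts timestamp(6) not null
)
partition by range (ts)
interval ( numtoyminterval(1, 'MONTH') ) (
    -- seed partition; Oracle creates further monthly partitions automatically
    partition p_before_2019 values less than (timestamp '2019-01-01 00:00:00')
);
```

The monthly scripts then read only that month's partition (partition pruning), and old months can be archived or dropped as whole partitions.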

b) Who says? What evidence do they have to support that argument?

Let's compare!

So we'll create the table, stick a row in, and compare the sizes of the columns:

create table solution4 ( 
    ts timestamp(6) not null, 
    ts_yyyymmdd number(8,0) not null 
);

insert into solution4 values ( 
  systimestamp, to_number(to_char(systimestamp,'YYYYMMDD'))
);

select vsize ( ts ), vsize ( ts_yyyymmdd ) 
from   solution4;

VSIZE(TS)   VSIZE(TS_YYYYMMDD)   
         11                    5 


So for this row, the timestamp is 11 bytes vs. 5 for the numeric representation. So ~2x bigger.

What happens when we load it up with data?

Let's stick another 1 million rows in and create the indexes:

insert into solution4
  with rws as (
    select systimestamp + numtodsinterval ( level, 'second' ) ts
    from   dual
    connect by level <= 1000
  )
    select r1.ts,  
           to_number(to_char(r1.ts,'YYYYMMDD'))
    from   rws r1
    cross  join rws;
  
commit;

create index idx_solution4 ON solution4 (ts_yyyymmdd);
create index idx_solution4ts ON solution4 (ts);

exec dbms_stats.gather_table_stats ( user, 'solution4' ) ;
  
select count(*) from solution4;

COUNT(*)   
   1000001


So how much bigger is the timestamp index?

select index_name, blevel, leaf_blocks 
from   user_indexes
where  table_name = 'SOLUTION4';

INDEX_NAME        BLEVEL   LEAF_BLOCKS   
IDX_SOLUTION4            2          2370 
IDX_SOLUTION4TS          2          3206 

select segment_name, bytes / 1024 / 1024 
from   dba_segments
where  segment_name in (
  select index_name from user_indexes
  where  table_name = 'SOLUTION4'
);

SEGMENT_NAME      BYTES/1024/1024   
IDX_SOLUTION4                    19 
IDX_SOLUTION4TS                  26


So in terms of leaf blocks and segment sizes, it's only ~1.3 times bigger.

OK, it is larger. But ~7 MB larger per million rows? Do you really care?

Obviously, if you really care, test and validate on your own data. But I suspect you'll get much bigger benefit from tuning other parts of your system first.

