Thanks for the question, Krzysztof.

Asked: April 15, 2009 - 3:25 am UTC

Last updated: January 23, 2023 - 6:51 am UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

Hi Tom

It is my first time, hope You won't be disappointed by my question :)

Recently I tried to optimize quite complex query and the problem was witch simple between condition on varchar2 column. When variable was from the beginning of column values range the query perform really nice. When we move the value to the end number of LIO get very high and time needed to perform query becomes unacceptable.

TestCase:

create table t1(
n number,
otype varchar2(20),
nr_start varchar2(20),
nr_stop varchar2(20)
);

truncate table t1;
insert into t1 (select object_id, object_type, lpad(to_char(object_id),10,'0'), lpad(to_char(object_id+50),10,'0') from all_objects );

create index t1_typ_nr_ss on t1(otype,nr_start, nr_stop);



ANALYZE  table t1 compute statistics;
ANALYZE  index t1_typ_nr_ss  compute statistics;

TEST@kwt > select LEAF_BLOCKS,DISTINCT_KEYS,NUM_ROWS, CLUSTERING_FACTOR from user_indexes where lower(index_name)='t1_typ_nr_ss';

LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
----------- ------------- ---------- -----------------
        271         46515      46515               802

Asking for beginning of the range (we have ~46000 records in 10gr2):

TEST@kwt > select otype,nr_start,nr_stop from t1 where otype='SYNONYM' and '0000000099' between nr_start and  nr_stop;

no rows returned ...

----------------------------------------------------------------------
| Id  | Operation        | Name         | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT |              |     3 |    84 |     2   (0)|
|*  1 |  INDEX RANGE SCAN| T1_TYP_NR_SS |     3 |    84 |     2   (0)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OTYPE"='SYNONYM' AND "NR_STOP">='0000000099' AND
              "NR_START"<='0000000099')
       filter("NR_STOP">='0000000099')


Stats
----------------------------------------------------------
          2  consistent gets

Very nice - 2 LIO.
Moving to the middle of range.

TEST@kwt > select otype,nr_start,nr_stop from t1 where otype='SYNONYM' and '0000021199' between nr_start and  nr_stop;

OTYPE                NR_START             NR_STOP
-------------------- -------------------- --------------------
SYNONYM              0000021150           0000021200
SYNONYM              0000021152           0000021202
SYNONYM              0000021154           0000021204
SYNONYM              0000021156           0000021206
SYNONYM              0000021158           0000021208
SYNONYM              0000021160           0000021210
SYNONYM              0000021162           0000021212
SYNONYM              0000021164           0000021214
SYNONYM              0000021166           0000021216
SYNONYM              0000021168           0000021218
SYNONYM              0000021170           0000021220
SYNONYM              0000021172           0000021222
SYNONYM              0000021174           0000021224
SYNONYM              0000021176           0000021226
SYNONYM              0000021178           0000021228
SYNONYM              0000021180           0000021230
SYNONYM              0000021182           0000021232
SYNONYM              0000021184           0000021234
SYNONYM              0000021186           0000021236
SYNONYM              0000021188           0000021238
SYNONYM              0000021190           0000021240
SYNONYM              0000021192           0000021242
SYNONYM              0000021194           0000021244
SYNONYM              0000021196           0000021246
SYNONYM              0000021198           0000021248

25 wierszy zosta¿o wybranych.


Plan wykonywania
----------------------------------------------------------
Plan hash value: 2565012117

----------------------------------------------------------------------
| Id  | Operation        | Name         | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT |              |   399 | 11172 |     5   (0)|
|*  1 |  INDEX RANGE SCAN| T1_TYP_NR_SS |   399 | 11172 |     5   (0)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OTYPE"='SYNONYM' AND "NR_STOP">='0000021199' AND
              "NR_START"<='0000021199')
       filter("NR_STOP">='0000021199')


Stats
----------------------------------------------------------
         49  consistent gets

LIO gets up.

Moving to the end of range.

TEST@kwt > select otype,nr_start,nr_stop from t1 where otype='SYNONYM' and '0000046199' between nr_start and  nr_stop;

OTYPE                NR_START             NR_STOP
-------------------- -------------------- --------------------
SYNONYM              0000046152           0000046202
SYNONYM              0000046153           0000046203
SYNONYM              0000046154           0000046204
SYNONYM              0000046155           0000046205
SYNONYM              0000046156           0000046206
SYNONYM              0000046157           0000046207
SYNONYM              0000046158           0000046208
SYNONYM              0000046159           0000046209
SYNONYM              0000046160           0000046210
SYNONYM              0000046161           0000046211
SYNONYM              0000046162           0000046212
SYNONYM              0000046163           0000046213
SYNONYM              0000046164           0000046214
SYNONYM              0000046165           0000046215
SYNONYM              0000046166           0000046216
SYNONYM              0000046167           0000046217
SYNONYM              0000046168           0000046218
SYNONYM              0000046169           0000046219
SYNONYM              0000046170           0000046220
SYNONYM              0000046171           0000046221
SYNONYM              0000046176           0000046226

21 wierszy zosta¿o wybranych.


Plan wykonywania
----------------------------------------------------------
Plan hash value: 2565012117

----------------------------------------------------------------------
| Id  | Operation        | Name         | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT |              |   186 |  5208 |    10   (0)|
|*  1 |  INDEX RANGE SCAN| T1_TYP_NR_SS |   186 |  5208 |    10   (0)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OTYPE"='SYNONYM' AND "NR_STOP">='0000046199' AND
              "NR_START"<='0000046199')
       filter("NR_STOP">='0000046199')


Stats
----------------------------------------------------------
        110  consistent gets

110 LIO for 21 record with simple condition?

After some googling found some workaround - but works only for limited number of rows:

TEST@kwt > select <b>/*+ INDEX_DESC(t1 t1_typ_nr_ss) */</b>
  2         otype,nr_start,nr_stop from t1 where otype='SYNONYM' and '0000046199' between nr_start and  nr_stop <b>and ROWNUM<10</b>
  3  ;

OTYPE                NR_START             NR_STOP
-------------------- -------------------- --------------------
SYNONYM              0000046176           0000046226
SYNONYM              0000046171           0000046221
SYNONYM              0000046170           0000046220
SYNONYM              0000046169           0000046219
SYNONYM              0000046168           0000046218
SYNONYM              0000046167           0000046217
SYNONYM              0000046166           0000046216
SYNONYM              0000046165           0000046215
SYNONYM              0000046164           0000046214

9 wierszy zosta¿o wybranych.


Plan wykonywania
----------------------------------------------------------
Plan hash value: 2702594328

--------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes |Cost 
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     9 |   252 |   2  
|*  1 |  COUNT STOPKEY               |              |       |       |
|*  2 |   INDEX RANGE SCAN DESCENDING| T1_TYP_NR_SS |    10 |   280 |   2   
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   2 - access("OTYPE"='SYNONYM' AND "NR_STOP">='0000046199' AND
              "NR_START"<='0000046199')
       filter("NR_STOP">='0000046199')


Statystyki
----------------------------------------------------------
          <b>3</b>  consistent gets

It was not acceptable for me, because I need all rows meeting this condition.

So, when asking for end of range, Oracle read all index entries for given otype.

TEST@kwt > select count(*) from t1 where otype='SYNONYM';

  COUNT(*)
----------
     20092


Plan wykonywania
----------------------------------------------------------
Plan hash value: 1921597393

-----------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| -----------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     1 |     8 |    11   (0)| 
|   1 |  SORT AGGREGATE   |              |     1 |     8 |            |
|*  2 |   INDEX RANGE SCAN| T1_TYP_NR_SS |  1661 | 13288 |    11   (0)|
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OTYPE"='SYNONYM')


Statystyki
----------------------------------------------------------
        <b>116</b>  consistent gets

TEST@kwt > select otype,nr_start,nr_stop from t1 where otype='SYNONYM' and '<b>0000099199</b>' between nr_start and  nr_stop;

nie wybrano ¿adnych wierszy


Plan wykonywania
----------------------------------------------------------
Plan hash value: 2565012117

----------------------------------------------------------------------
| Id  | Operation        | Name         | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------
|   0 | SELECT STATEMENT |              |     1 |    28 |    11   (0)|
|*  1 |  INDEX RANGE SCAN| T1_TYP_NR_SS |     1 |    28 |    11   (0)|
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OTYPE"='SYNONYM' AND "NR_STOP">='0000099199' AND
              "NR_START"<='0000099199')
       filter("NR_STOP">='0000099199')


Statystyki
----------------------------------------------------------
        <b>116</b>  consistent gets

Can You please explain, why Oracle have to do this ?
Adding additional condition "and nr_stop and nr_stop>=nr_start" change only filtering part, optimizer do not use this knowledge of correlation between columns nr_stop and nr_start.
Is this possible to make optimizer to do something like this :
1) look for otype (as currently)
2) look for the first value of nr_start > :X (index desc?)
3) go down until nr_stop < :X
4) records found between 2 and 3 steps are our result

Does it make sense, or I miss something?

What are other ways to tune query like this ?

greetings from Poland
Your admirer
Krzysztof Wlasiuk

and Tom said...

Ok, you have an index on:

create index t1_typ_nr_ss on t1(otype,nr_start, nr_stop);

and you have a where clause:

"OTYPE"='SYNONYM'
AND
"NR_STOP">='0000099199'
AND
"NR_START"<='0000099199'

so, the otype = 'value' is "easy", we know we can use that. It is in the beginning of the index - we can start our keyed reads using that.

The second bit in the index is NR_START, we can do

where otype='SYNONYM' and nr_start <='0000099199' through the index range scan. The last bit - "NR_STOP">='0000099199' - can be EVALUATED using the index - but doesn't fit into the range scanning - it cannot 'stop' the range scan.

So, when you "otype='value' and nr_start <= 'something small'", the range scan stops almost immediately - we know we don't have to look any further, once the first value of nr_start that exceeds that value - we stop.

however, when nr_start <= 'something BIG', we have to look at all of the values in the index whereby nr_start <= 'something big' - and that is thousands of entries (the value of nr_stop can be retrieved from the index, but nr_stop is NOT SORTED in the index, it is sorted WITHIN the nr_start values, nr_start is nicely sorted - NR_STOP --- it is *not* sorted in the range we are looking at)

So, your four steps at the end of your question - they cannot possibly apply.

I don't understand how you think "nr_stop>=nr_start" implies a correlation?? It is a filter you added, it doesn't change the nature of the data at all - the fact is that NR_STOP is *not* sorted in the index when you consider "otype='value' and nr_start <= 'something-big'". The values of NR_STOP (in general) could be ALL OVER the place.

You cannot range on NR_STOP in this index if you are range scanning on NR_START, think about it in general....

Rating

(6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excelent answer!

Krzysztof Wlasiuk, April 16, 2009 - 2:53 am UTC

Oh - I get it now :)

I create to simple testcase, where nr_stop is always nr_start+50. With those data I can just simplify query condition not using between and nr_stop column.

In real data there can be any value in nr_stop for given nr_start so Oracle have to look for every entry "where otype='SYNONYM' and nr_start <='0000099199'".

Thank You Tom very much!

PS. Damn- Am I the only one who doesn't get it at first time?

Between Condition

Rupali, October 21, 2009 - 3:46 pm UTC

Sir,

We have similar condition, wherein between or range query is taking lot n lot of time.

Situation : In our source system we have monster table (CR_BANK_TRANS) having 300 million records. This table holds transactions for each customer. And for each customer there are on an average 250 entries.
Only those customers are picked from this table for whom communication is done by company. We have a table LKP_CUST at target end which behaves as a lookup to pick up entries from CR_BANK_TRANS. In other words we equi join CR_BANK_TRANS and LKP_CUST to select records.
Average number of records in LKP_CUST is 100 K.

We are using Informatic for fetching the records and to perform faster operation we have implemented Parallelism feature of Informatica. We have 4 queries at session level to fetch and insert records form CR_BANK_TRANS.
Query is

SELECT *
FROM CR_BANK_TRANS cr, LKP_CUST lk
WHERE lk.cust_id = cr.cust_id
and cust_id >=
(
SELECT MIN (cust_id)
FROM (SELECT cust_id, NTILE (4) OVER (ORDER BY cust_id) bucket FROM lkp_cust) GROUP BY bucket
having bucket = 1
)
and cust_id <=
(
SELECT MAX (cust_id)
FROM (SELECT cust_id, NTILE (4) OVER (ORDER BY cust_id) bucket FROM lkp_cust) GROUP BY bucket
having bucket = 1
)

Similary there are three more queries for bucket 2, 3 and 4.
This query brings around 7500000 records.

Index Information :- Idx_CustId is independent index
Idx_Pk is primary key index with Cust_id as a leading column.

Problem :- In each bucket we have approx 25000 records. Still query is taking 4 hrs. So even after applying parallism complete task is getting over in 6 hrs.

I tried all the hints and also with index (Idx_Pk) hint, but still it is taking much much time.

We have also tried with hardcoded range values (I mean subquery for NTILE replaced by hardcoded cust id's), still it is taking lot of time. With hardcoding execution plan looks like

============================================================
SELECT STATEMENT REMOTE ALL_ROWSCost: 79,277 Bytes: 55,456,074 Cardinality: 1,087,374

4 TABLE ACCESS BY INDEX ROWID TABLE CR_BANK_TRANS Cost: 79,273 Bytes: 55,456,074 Cardinality: 1,087,374

3 INDEX RANGE SCAN INDEX IDX_PK Cost: 5,619 Cardinality: 2,003,130
============================================================

Even though it is hitting right index, still it is taking lot of time (4 hrs). We are using Oracle 10g and configuration wise server is quite rich.

Can you please help us suggest approach on how to go about it and also what should be our benchmarking time for this.

Thanks so much Sir.

October 23, 2009 - 12:58 pm UTC

... having
300 million records. ...

that is a meaningless number.

that is like saying I have 300 million things. If your things were 50 pound weights, you would have a lot of weight. If the weights were made of plutonium, you would need less room than if they were made of Styrofoam - but at least we'd know the weight (but not the room needed to hold them all)

300 million records - first, that is pretty small these days and second it tells us nothing about the volume of data.

... We are using Informatic for fetching the records and to perform faster
operation we have implemented Parallelism feature of Informatica....

if you wanted to go faster, you would use a tool that did things in set operations instead of slow by slow by slow like that one does.

It looks like you are spending a heck of a lot of time to 'partition' the data.

That probably should not be using an index - do you have a hint in there you are not telling me about? You are most likely spending a lot of time doing single block physical reads all over the place because of that index.

why the range???

why not

select JUST THE STUFF YOU NEED DO NOT USE *
  from cr
 where cust_id in (select cust_id 
                     from (select cust_id,row_number() 
                                       over (order by cust_id) rn 
                             from lk) 
                    where mod(rn,4) = 0);

(and fingers crossed for your full scan)

Interval queries

Carsten, October 23, 2009 - 3:31 pm UTC

@Krzysztof:

Actually nr_start and nr_end should not be seen as independent columns but rather part of a complex datatype interval (nr_start,nr_stop).

the true query is then
select * from t1 where interval overlaps ('0000046199','0000046199')

This is a very common problem and there are different kind of queries to support this. Oracle spatial (extra cost option for EE) includes R-Trees and Quadtrees to solve this problem in general.

If your biggest interval size is very small compared to the overall data range, you might get accepatable performance with indexing the center of your interval and do a range scan on searchvalue -maxinterval <= center <= searchvalue+maxinterval and additional specify the overlap filtering.

A Solution using MySQL and RTree

Marcus Minervino, January 19, 2023 - 9:29 pm UTC

DROP TABLE BINRANGE;

CREATE TABLE BINRANGE (ID_BINRANGE int NOT NULL PRIMARY KEY,
BIN_LOW long NOT NULL,
BIN_HIGH long NOT NULL,
G LineString SRID 0 NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO BINRANGE (ID_BINRANGE,BIN_LOW,BIN_HIGH,G)
WITH RECURSIVE DUMMY (LEVEL,BIN_LOW,BIN_HIGH) AS (SELECT 1,100,199
UNION ALL
SELECT LEVEL+1,BIN_LOW +100,BIN_LOW+199
FROM DUMMY
WHERE LEVEL < 100)
SELECT LEVEL,BIN_LOW,BIN_HIGH,ST_SRID(LineString (Point(-1, BIN_LOW-1),Point(1, BIN_LOW+1)), 0)
FROM DUMMY;

CREATE SPATIAL INDEX IX_BINRANGE_G ON BINRANGE (G);

SELECT * FROM BINRANGE;

1 100 199 ...
2 200 299 ...
3 300 399 ...
4 400 499 ...
5 500 599 ...

EXPLAIN ANALYZE
SELECT *
FROM BINRANGE
WHERE MBRContains(G, Point(0,100));

#-> Filter: mbrcontains(BINRANGE.G,<cache>(point(0,100))) (cost=0.71 rows=1) (actual time=0.047..0.049 rows=1 loops=1)
# -> Index range scan on BINRANGE using IX_BINRANGE_G over (G unprintable_geometry_value) (cost=0.71 rows=1) (actual time=0.032.....

SELECT *
FROM BINRANGE
WHERE MBRContains(G, Point(0,100));

# ID_BINRANGE, BIN_LOW, BIN_HIGH, G
'1', '100', '199', ?

January 23, 2023 - 6:50 am UTC

thanks for all of these great examples

A Solution using MySQL and RTree

Marcus Minervino, January 19, 2023 - 9:58 pm UTC

The inspiration came first from this forum, and then from here: https://stackoverflow.com/questions/5744314/mysql-select-between-two-columns-works-too-slowly/75178343#75178343 :-) ... I didn't have I still have the opportunity to test the solution with Oracle Spatial, but I'll try as soon as possible!

January 23, 2023 - 6:50 am UTC

thanks for all of these great examples

The same solution, now in Oracle Spatial

Marcus Minervino, January 20, 2023 - 7:26 pm UTC

CREATE TABLE BIN_RANGE_SPATIAL (
ID_BINRANGE NUMBER NOT NULL,
BIN_INICIAL NUMBER NOT NULL,
BIN_FINAL NUMBER NOT NULL,
SHAPE SDO_GEOMETRY) TABLESPACE USERS;

INSERT INTO BIN_RANGE_SPATIAL VALUES(1,100,199,SDO_GEOMETRY (2002, NULL, NULL, SDO_ELEM_INFO_ARRAY (1,2,1), SDO_ORDINATE_ARRAY (99,0,200,0)));
INSERT INTO BIN_RANGE_SPATIAL VALUES(1,200,299,SDO_GEOMETRY (2002, NULL, NULL, SDO_ELEM_INFO_ARRAY (1,2,1), SDO_ORDINATE_ARRAY (199,0,300,0)));

INSERT INTO user_sdo_geom_metadata (TABLE_NAME,COLUMN_NAME,DIMINFO,SRID) VALUES ('BIN_RANGE_SPATIAL', 'SHAPE',
SDO_DIM_ARRAY(
SDO_DIM_ELEMENT('X', 0, 500, 0.005),
SDO_DIM_ELEMENT('Y', 0, 500, 0.005)),
NULL -- SRID
);

CREATE INDEX IDX_BIN_RANGE_SPATIAL ON BIN_RANGE_SPATIAL(SHAPE) INDEXTYPE IS MDSYS.SPATIAL_INDEX;

BEGIN
DBMS_STATS.gather_table_stats('XYZ','BIN_RANGE_SPATIAL');
END;

EXPLAIN PLAN FOR
SELECT *
FROM BIN_RANGE_SPATIAL lines
WHERE SDO_CONTAINS(shape,SDO_GEOMETRY (2001, NULL, NULL, SDO_ELEM_INFO_ARRAY (1,1,1), SDO_ORDINATE_ARRAY (100,0))) = 'TRUE';

Operation Node Cost Object
------------------------------------ --------- ---------------------
SELECT STATEMENT 0,0 %
TABLE ACCESS (BY INDEX ROWID) 0,0 % BIN_RANGE_SPATIAL
DOMAIN INDEX ((SEL: 0.000139 %)) 100,0 % IDX_BIN_RANGE_SPATIAL

SELECT ID_RANGE,BIN_INICIAL,BIN_FINAL
FROM BIN_RANGE_SPATIAL lines
WHERE SDO_CONTAINS(shape,SDO_GEOMETRY (2001, NULL, NULL, SDO_ELEM_INFO_ARRAY (1,1,1), SDO_ORDINATE_ARRAY (100,0))) = 'TRUE';

ID_BINRANGE BIN_INICIAL BIN_FINAL
----------- ----------- ---------
1 100 199

January 23, 2023 - 6:51 am UTC

awesome stuff

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Excelent answer!

Between Condition

Interval queries

A Solution using MySQL and RTree

A Solution using MySQL and RTree

The same solution, now in Oracle Spatial

More to Explore

Performance

query performance with between

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Excelent answer!

Between Condition

Interval queries

A Solution using MySQL and RTree

A Solution using MySQL and RTree

The same solution, now in Oracle Spatial

More to Explore

Performance