Skip to Main Content
  • Questions
  • Help needed with more specific prefix matching

Breadcrumb

Easter

Question and Answer

Tom Kyte

Thanks for the question, Ben.

Asked: December 25, 2011 - 10:34 am UTC

Answered by: Tom Kyte - Last updated: July 19, 2012 - 9:58 am UTC

Category: Database - Version: 11gR2

Viewed 1000+ times

You Asked

Summary :

I have a large table (15k+ rows) of prefixes showing country level dialling prefixes (i.e. E164 country code + local prefix... eg. "1403" for Alberta/Canada).

The contents of the table range from the more specific (e.g. "180979555" for Santo Domingo in the Dominican Republic, to simply "1" for USA).

I'm racking my brains to find an efficient way to write an oracle function that could quickly identify the most specific match when given a full length number in E164 format (e.g. taking the alberta number "1403" plus the rest of the person's phone number).

The only way I can think of to do it is with a loop of some sort, but that could turn nasty very quickly and not scale well at all (not that I'm expecting much scale…. but I you know the 5 P's Tom … Prior Planning Prevents P**** Poor Performance !).

Database Tables

create table destinations(prefix number(9,0),desc varchar2(200));
insert into destinations (prefix,desc) values ('1','USA');
insert into destinations (prefix,desc) values ('1403','Alberta/Canada');
insert into destinations (prefix,desc) values ('1809','Dominican Republic');
insert into destinations (prefix,desc) values ('180979555','Dominican Republic - Santa Dom.');

So, in the above, given a number 180979555123456 I would expect my function to match the more specific prefix of Santa Domingo, and not any of the less specific options ("Dominican Republic" or "USA").


Thank you as always for your wise words Tom and your continued Oracle evangelism !

and we said...

you might consider this, use an index organized table for destinations (i filled it up with some test data in addition to your stuff)

ops$tkyte%ORA11GR2> create table destinations(prefix number(9,0),descrip varchar2(200), primary key(prefix) ) organization index;

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into destinations (prefix,descrip) values ('1','USA');

1 row created.

ops$tkyte%ORA11GR2> insert into destinations (prefix,descrip) values ('1403','Alberta/Canada');

1 row created.

ops$tkyte%ORA11GR2> insert into destinations (prefix,descrip) values ('1809','Dominican Republic');

1 row created.

ops$tkyte%ORA11GR2> insert into destinations (prefix,descrip) values ('180979555','Dominican Republic - Santa Dom.');

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_errlog.create_error_log('DESTINATIONS');

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into destinations
  2  select dbms_random.value( 10, 999999999 ), dbms_random.string( 'a', 25 )
  3    from dual
  4   connect by level <= 100000
  5   log errors reject limit unlimited;

99998 rows created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'DESTINATIONS', cascade=>true );

PL/SQL procedure successfully completed.




and then we'll take your phone number input, parse it up into shorter and shorter prefixes - starting with the entire length of the string down to just the first character - and for each of those - lookup the prefix in the destinations table and stop after we find the first one:

ops$tkyte%ORA11GR2> variable x varchar2(30)
ops$tkyte%ORA11GR2> exec :x := '180979555123456'

PL/SQL procedure successfully completed.


ops$tkyte%ORA11GR2> select phone, prefix, len, (select descrip from destinations where prefix = x.prefix) descript
  2    from (
  3  select :x phone, to_number(substr( :x, 1, length(:x)-level+1 )) prefix, length(:x)-level+1 len
  4    from dual
  5  connect by level <= length(:x)
  6   order by level
  7         ) x
  8   where rownum = 1
  9     and (select descrip from destinations where prefix = x.prefix) is not null
 10  /

PHONE                                                   PREFIX
-------------------------------- -----------------------------
                          LEN
-----------------------------
DESCRIPT
-------------------------------------------------------------------------------
180979555123456                                      180979555
                            9
Dominican Republic - Santa Dom.




the tkprof for something like that would look pretty good:


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         19          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         19          0           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 623
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  INDEX UNIQUE SCAN SYS_IOT_TOP_104844 (cr=3 pr=0 pw=0 time=9 us cost=2 size=33 card=1)(object id 104845)
         1          1          1  COUNT STOPKEY (cr=16 pr=0 pw=0 time=269 us)
         1          1          1   FILTER  (cr=16 pr=0 pw=0 time=261 us)
         7          7          7    VIEW  (cr=0 pr=0 pw=0 time=145 us cost=3 size=44 card=1)
         7          7          7     SORT ORDER BY (cr=0 pr=0 pw=0 time=128 us cost=3 size=0 card=1)
        15         15         15      CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=101 us)
         1          1          1       FAST DUAL  (cr=0 pr=0 pw=0 time=2 us cost=2 size=0 card=1)
         1          1          1    INDEX UNIQUE SCAN SYS_IOT_TOP_104844 (cr=16 pr=0 pw=0 time=90 us cost=2 size=33 card=1)(object id 104845)




worst case, using an input like:


ops$tkyte%ORA11GR2> exec :x := '120979555123456'

PL/SQL procedure successfully completed.



which resulted in:

ops$tkyte%ORA11GR2> select phone, prefix, len, (select descrip from destinations where prefix = x.prefix) descript
  2    from (
  3  select :x phone, to_number(substr( :x, 1, length(:x)-level+1 )) prefix, length(:x)-level+1 len
  4    from dual
  5  connect by level <= length(:x)
  6   order by level
  7         ) x
  8   where rownum = 1
  9     and (select descrip from destinations where prefix = x.prefix) is not null
 10  /

PHONE                                                   PREFIX
-------------------------------- -----------------------------
                          LEN
-----------------------------
DESCRIPT
-------------------------------------------------------------------------------
120979555123456                                              1
                            1
USA




meaning it had to do the most number of probes, it is still pretty good:


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         35          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         35          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 623  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  INDEX UNIQUE SCAN SYS_IOT_TOP_104847 (cr=3 pr=0 pw=0 time=7 us cost=2 size=33 card=1)(object id 104848)
         1          1          1  COUNT STOPKEY (cr=32 pr=0 pw=0 time=519 us)
         1          1          1   FILTER  (cr=32 pr=0 pw=0 time=505 us)
        15         15         15    VIEW  (cr=0 pr=0 pw=0 time=199 us cost=3 size=44 card=1) 
        15         15         15     SORT ORDER BY (cr=0 pr=0 pw=0 time=166 us cost=3 size=0 card=1)
        15         15         15      CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=114 us)
         1          1          1       FAST DUAL  (cr=0 pr=0 pw=0 time=4 us cost=2 size=0 card=1)
         1          1          1    INDEX UNIQUE SCAN SYS_IOT_TOP_104847 (cr=32 pr=0 pw=0 time=165 us cost=2 size=33 card=1)(object id 104848)

and you rated our response

  (27 ratings)

We're not taking reviews currently, so please try again later if you want to add a review.

Reviews

Wow, wow and wow....

December 25, 2011 - 5:03 pm UTC

Reviewer: Ben

Tom,

I just don't know what to say other than if anyone dares ask me why I prefer to use Oracle over MySQL or Postgres, I'll just point them to your fine answer to my question as to why Oracle is such an awesome and powerful pice of software !

Thank you again, and seasons greetings and a happy new year to you and your family Sir.

Ben

How about this query

December 26, 2011 - 4:51 am UTC

Reviewer: Venkat from India

Hi Tom,

How about this query -

--********************************************
select descp from
(
select descp,length(prefix) len_prefix, max(length(prefix)) over() max_len_prefix
from destinations
where instr('180979555123456',prefix)>0
) where len_prefix = max_len_prefix
--********************************************

I am not Oracle Expert(yet), So not sure, how much this query is worse than the solution you gave. But just curious to know about your comments on this.

Thanks
Tom Kyte

Followup  

December 26, 2011 - 10:43 am UTC

That is the great thing about tracing, you can see and evaluate for yourself. For example, I just ran your query using sql_trace and tkprof and you can see for yourself:


select descrip from
(
select descrip,length(prefix) len_prefix, max(length(prefix)) over() max_len_prefix
from destinations
where instr('180979555123456',prefix)>0
) where len_prefix = max_len_prefix

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0       1082          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0       1082          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 623  
Number of plan statistics captured: 1

Row Source Operation
---------------------------------------------------
VIEW  (cr=1082 pr=0 pw=0 time=44857 us cost=276 size=640000 
 WINDOW BUFFER (cr=1082 pr=0 pw=0 time=44846 us cost=276 
  INDEX FAST FULL SCAN SYS_IOT_TOP_104847 (cr=1082 pr=0 pw=0 time=296 



by using "instr( :x, prefix )" you preclude the use of any index to find the first row as fast as possible.

Read my explanation above and you can see how the query sort of matches the approach I wanted to take..

build a list of shorter and shorter prefixes - sorted by length from biggest to smallest.

then row by row - find if a description exists, STOP when you find the first one.


if you need to join to a list of numbers

December 27, 2011 - 8:55 am UTC

Reviewer: Matteo from Italy

Hi Tom,
I've been struggling about this problem for a long, long while.
I found a similar solution, but the very next question was:
I have a table with all the phone numbers and for each I want the country name. Of course I wanted to avoid incapsulating the logic into a pl/sql function to be called from sql; the best approach I found was using cast (multiset (query splitting my phone number into substrings)).

This is my example:
create table list_of_numbers (phone varchar2(40) primary key) organization index;

insert into list_of_numbers
select rpad(prefix,20,'0')
from destinations;

insert into list_of_numbers
select rpad(prefix,20,'1')
from destinations;

begin dbms_stats.gather_table_stats( user, 'LIST_OF_NUMBERS', cascade=>true ); end;

create type mat_vc as table of varchar2(20);

select (select (select descrip from destinations where prefix = column_value)
from list_of_numbers lon2,
table(cast(multiset(select substr(lon2.phone, 1, length(lon2.phone) - level + 1) prefix
from dual
connect by level <= length(lon2.phone)
order by level
)
as mat_vc
)
)
where rownum = 1
and (select dd.descrip from destinations dd where dd.prefix = column_value) is not null
and lon2.phone = lon1.phone
) as dest_descr,
lon1.phone
from list_of_numbers lon1;


the reason I'm using list_of_numbers table twice is that the inner query is 2 levels of nesting deep and cannot see the phone of the outermost table.

The part
where rownum = 1
and (select dd.descrip from destinations dd where dd.prefix = column_value) is not null

is yours, I've just copied it!

Can you share your comments on this approach?
Thanks
Regards
Matteo

Tom Kyte

Followup  

December 27, 2011 - 9:42 am UTC

from list_of_numbers lon2,
table(cast(multiset(select substr(lon2.phone, 1,
length(lon2.phone) - level + 1) prefix
from dual
connect by level <= length(lon2.phone)
order by level
)
as mat_vc
)
)
where rownum = 1,


that bit worries me a little. After a join, the order by doesn't have to be there - it isn't a 'top n' query anymore.


Since we need that inline view, and we cannot push down the correlation variable two levels, this might call for a plsql wrapper function for this one.




I've thought of another optimization. If the prefix has a known maximum length, we could start our substringing at that length instead of the length of the phone number. For example - if the maximum prefix length is '9' we could change:

5 connect by level <= length(:x)

into

5 connect by level <= 9

my original solution was

December 28, 2011 - 4:53 am UTC

Reviewer: A reader

If I remove the order by I'm not able to get the most specific prefix (best matching). Are you saying that since there is a (collateral?) join the order is no more guaranteed?
My original solution was to use row_number to identify the best matching prefix:

create or replace type mat_vc as table of varchar2(200);

select *
from (
select phone, column_value as dest_descr
from list_of_numbers m,
table(cast(multiset (select row_number() over (order by length(to_char(prefix)) desc)||'-'||descrip
from destinations
,(select level le from dual connect by level <= 20 ) c
where prefix = substr(phone, 1, least(c.le, length(phone)))
order by length(to_char(prefix)) desc
)
as mat_vc
)
)
)
where
dest_descr like '1-%';

Then it is easy to strip the row number out.
I changed my solution because yours is more elegant and much easier to read.

Do you see any issues on this version of the query?
With the subquery that decides how many iterations to try it is possible to apply the optimisation you suggested.

thanks
Matteo


Tom Kyte

Followup  

December 29, 2011 - 10:53 am UTC

Are you saying that since there is a (collateral?) join the order is
no more guaranteed?


correct. Absolutely. I cannot say "the defined behavior assures the ordering will be preserved". In fact I can say "we do not assure the ordering will be preserved in general"

You are relying on a side effect.

As does:

                              from destinations
                              ,(select level le from dual connect by level <= 
20 ) c
                             where prefix = substr(phone, 1, least(c.le, 
length(phone)))
                             order by length(to_char(prefix))



that order by happens BEFORE THE JOIN, it is not relevant, it does not have to be preserved after joining.

A simple change in plans will not return what you want.

I suggest the plsql function in this case for multiple rows.

Re: Max Prefix Length Optimizations

December 28, 2011 - 6:18 am UTC

Reviewer: Ben

Re:
For example - if the maximum prefix length is '9' we could change:
5 connect by level <= length(:x)
into
5 connect by level <= 9


Unfortunatley E164 (a key ITU international standard for telephone numbering wizardry) only defines the maximum length of a country code (1-3 digits). It does not, unfortunately, define the maximum length of a "National Destination Code" (and infects designates that element as optional, only the country code and subscriber number are mandatory).

I like your thinking though !

reasonable limits

December 28, 2011 - 10:07 am UTC

Reviewer: Matteo from Italy

My solution starts the match from the left up to the full length of the number or the maximum you specify on the connect by.
I just tried with a max of 50 and it took 144 seconds to scan the 200k rows of this test case. It is 1400 rec/s in my test machine, not bad.
Also, the number of matchabe prefixes will be up to the number of digits of the phone, nothing to be worried about in the sorting phase.

I've never seen an E164 number longer than 20 digits after left trimming the "E164:0000" part.

Matteo

Ben

December 28, 2011 - 1:45 pm UTC

Reviewer: Ben

I've never seen an E164 number longer than 20 digits after left trimming the "E164:0000" part.

That's because E164 defines the maximum length of a number as 15 (country code + national destination code (optional) + subscriber number). ;-)

Another way that uses one index lookup

January 02, 2012 - 11:29 pm UTC

Reviewer: David from Melbourne, Australia

I've had to solve a very similar telephone prefix problem about 18 years ago. My solution treats these as strings, not numbers.

Index the prefix padded to the maximum length with a character that collates after '9', then select the row with the smallest padded value >= the target number.

This does an index min/max lookup then gets the target row via that index. It can easily be used inside a more complex query.

Assuming ASCII collation, which makes '~' greater than '9':

create table destinations(prefix number(15,0),descrip varchar2(200), primary key(prefix) ) ;

create unique index destinations_I1 on destinations( rpad(to_char(prefix), 15, '~') );

create view destinations_padded_v as
select prefix,
descrip,
rpad(to_char(prefix), 15, '~') as prefix_padded
from destinations;


select prefix, descrip
from destinations_padded_v
where prefix_padded= (select min(prefix_padded) from destinations_padded_v where prefix_padded > :x)
and :x like prefix || '%';

The last line checks the prefix really does match, in case there is a number that does not match any prefix.

For the data loaded by Tom above, this gives a SQLtrace result of just 5 consistent gets:


PREFIX DESCRIP
---------- ----------------------------------------
180979555 Dominican Republic - Santa Dom.

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2366095060

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 33 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | DESTINATIONS | 1 | 33 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | DESTINATIONS_I1 | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 7 | | |
| 4 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN (MIN/MAX)| DESTINATIONS_I1 | 1 | 7 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(:X LIKE TO_CHAR("PREFIX")||'%')
2 - access(RPAD(TO_CHAR("PREFIX"),15,'~')= (SELECT MIN(RPAD(TO_CHAR("PREFIX"),15,'~'))
FROM TAT_PROD."DESTINATIONS" "DESTINATIONS" WHERE RPAD(TO_CHAR("PREFIX"),15,'~')>:X))
5 - access(RPAD(TO_CHAR("PREFIX"),15,'~')>:X)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size


Tom Kyte

Followup  

January 03, 2012 - 11:30 am UTC

very nice, I like that. I will have to remember this one

oops

January 03, 2012 - 5:15 pm UTC

Reviewer: David

Oops. My solution is nice, but it doesn't work with the overlapping prefixes of the problem. It does work with non-overlapping, variable length prefixes.

With the prefixes given, 1, 1403, 1809 my query works on 1810 but fails to look up 1808. There are ways to work around this, but Tom's PL/SQL ends up more consistently efficient.
Tom Kyte

Followup  

January 04, 2012 - 9:11 am UTC

ok, i'll remove my book mark then :)

I didn't fully review it, took your word for it...

thanks so much for following up.

order by only on analytics

January 04, 2012 - 7:00 am UTC

Reviewer: Matteo from Italy

Tom,
the use of row_number guarantees that its order by is executed after the join. I can remove the second order by, it is not useful in my query:

select *
from (
select phone, column_value as dest_descr
from list_of_numbers m,
table(cast(multiset (select row_number() over (order by length(to_char(prefix)) desc)||'-'||descrip
from destinations
,(select level le from dual connect by level <= 20 ) c
where prefix = substr(phone, 1, least(c.le, length(phone)))
)
as mat_vc
)
)
)
where
dest_descr like '1-%';


Correct?

Thanks
Matteo

Tom Kyte

Followup  

January 04, 2012 - 9:33 am UTC

incorrect.

The only way to have data ordered is to use an order by!!!!! by itself. at the end of the query.

at the end of the query.

Please do not be empirical here - nowhere in the definition of analytics does it ever say "the entire result set will therefore be sorted by these attributes".


This is not a safe construct.

only subquery sorted

January 05, 2012 - 2:20 am UTC

Reviewer: Matteo from Italy

My goal is to have the most specific matching prefix, not the entire resultset ordered. Given the list of matching prefixes, row_number() assigns value = 1 to the most specific because it sorts by length of prefix *after* the subquery join.
This piece of the query

select row_number() over (order by length(to_char(prefix)) desc)||'-'||descrip
from destinations
,(select level le from dual connect by level <= 20 ) c
where prefix = substr(phone, 1, least(c.le, length(phone)))

returns all the possible matching prefixes for phone fields (that belongs to the list_of_numbers outer table) and row_number() assigns value 1 to the longest matching prefix.
This resultset is joined to the outer table.
Then I keep only the matching pairs (phone/prefix) that have an assigned row_number() == 1.

I will get an unordered result set with the best matching prefixes only.

I still believe this is deterministic.
Tom Kyte

Followup  

January 05, 2012 - 9:43 am UTC

if you want something sorted, you need an order by on it, period.

otherwise, it might not be sorted someday.


I still believe this is deterministic.


this is not a faith based initiative here - where in the documentation does it say anywhere that "if you have an order by in an analytic, the resulting result set will be similarly ordered"?


In fact, I can point you to documentation that states the OPPOSITE is true:

http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm#CIHCEIGA

Analytic functions always operate on rows in the order specified in the order_by_clause of the function. However, the order_by_clause of the function does not guarantee the order of the result. Use the order_by_clause of the query to guarantee the final result ordering.


I have nothing more to say on this topic.

@David -- I think I still like your approach...

January 05, 2012 - 2:52 pm UTC

Reviewer: Matthew McPeak from Secaucus, NJ

Tom,

I think David's self-discredited approach still has value.  After all, doesn't the following:

<code>

WITH n AS (--SELECT '1808123456' n FROM DUAL
           --SELECT '1810123456' n from dual
           --SELECT '1809123456'  n from dual
--           SELECT '180979555123456' n FROM DUAL
--           SELECT '120979555123456' n FROM DUAL
           SELECT '140379555123456' n FROM DUAL
           )
SELECT * from  (SELECT n, p.*
FROM   destinations_padded_v p, n
WHERE  prefix_padded >= RPAD (n, 15, '~')
ORDER BY prefix_padded)
WHERE 1=1
AND    n LIKE prefix || '%'
AND    ROWNUM <= 1;


...essentially do what the initial answer does, with shorter and simpler code? Namely: probe an index for the largest prefix that could be the answer and contine probing smaller and smaller until it gets the answer?

The order by may be necessary... but isn't Oracle smart enough to skip that step when it sees that the rows will already be ordered thanks to the index range scan?

Actually, it's even better than the initial approach, since the initial approach was a separate index probe for each try. David's rpad view & index mean subsequent tries will be just the next entry in the index (not traversing the whole B*tree each time).

Thoughts? Can I bookmark David's approach again for future reference?




</code>
Tom Kyte

Followup  

January 05, 2012 - 3:01 pm UTC

it does it better than the initial answer. I do "n index range scans" , this does one.

Bookmark away....

David.....

January 05, 2012 - 4:08 pm UTC

Reviewer: Ben

But David only self-discredited his response because he spotted a bug in his implementation, and not because it was a poorer quality thought process than Tom's ?

Or are we now saying the bug David thought was there is non-existent ? Seems Tom can't see much wrong with it judging by his last follow-up ?
Tom Kyte

Followup  

January 05, 2012 - 4:38 pm UTC

Davids bug existed - if you run the query with

ops$tkyte%ORA11GR2> variable x varchar2(30)
ops$tkyte%ORA11GR2> exec :x := '180979555123456'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select prefix, descript
  2  from destinations_padded_v
  3  where prefix_padded= (select min(prefix_padded) from destinations_padded_v where prefix_padded >
  4  :x)
  5  and :x like prefix || '%'
  6  /

    PREFIX DESCRIPT
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 180979555 Dominican Republic - Santa Dom.



it gets the right answer, but if you change 1809 to 1808 in the :x bind - you get this:

ops$tkyte%ORA11GR2> exec :x := '180879555123456'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> /

no rows selected

ops$tkyte%ORA11GR2> 


You should be getting USA.


The modified query:

ps$tkyte%ORA11GR2> SELECT *
  2    from  (SELECT p.*
  3             FROM   destinations_padded_v p
  4            WHERE  prefix_padded >= RPAD (:x, 15, '~')
  5            ORDER BY prefix_padded)
  6   WHERE 1=1
  7   AND    :x LIKE prefix || '%'
  8   AND    ROWNUM <= 1;

    PREFIX DESCRIPT                                                                                                                                                                                                 PREFIX_PADDED
---------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------
         1 USA                                                                                                                                                                                                      1~~~~~~~~~~~~~~

ops$tkyte%ORA11GR2> 


finds it.



it works by taking this sorted data:

ops$tkyte%ORA11GR2> select prefix_padded from destinations_padded_v order by 1;

PREFIX_PADDED
---------------
1403~~~~~~~~~~~
180979555~~~~~~
1809~~~~~~~~~~~
1~~~~~~~~~~~~~~


it works by starting in the index (the prefix padded index) at the point where the index key is >= the string we are looking for.

With 1809 - that is at 180979555 and above.
With 1808 - that is also 180979555 and above. The ABOVE being *key*

ops$tkyte%ORA11GR2> l
  1* select * from destinations_padded_v where prefix_padded >= rpad(:x,15,'~')
ops$tkyte%ORA11GR2> /

    PREFIX DESCRIPT                                           PREFIX_PADDED
---------- -------------------------------------------------- ---------------
 180979555 Dominican Republic - Santa Dom.                    180979555~~~~~~
      1809 Dominican Republic                                 1809~~~~~~~~~~~
         1 USA                                                1~~~~~~~~~~~~~~




it returns the 1 USA row - which is what we ultimately need. This new query is allowed to go on from the MINIMUM to find something that matches.


What that means is....

it works great if you get it on the first hit.

it'll take longer and longer as you have to search through more to find the "first row that is like it"

sort of like mine - it works best when it finds a very specific match, it works "less well" when it has to find a less specific match.

With the test data (small set, just the original four rows)


ops$tkyte%ORA11GR2> set autotrace on
ops$tkyte%ORA11GR2> SELECT *
  2    from  (SELECT p.*
  3             FROM   destinations_padded_v p
  4            WHERE  prefix_padded >= RPAD (:x, 15, '~')
  5            ORDER BY prefix_padded)
  6   WHERE 1=1
  7   AND    :x LIKE prefix || '%'
  8   AND    ROWNUM <= 1;

    PREFIX DESCRIPT                                           PREFIX_PADDED
---------- -------------------------------------------------- ---------------
         1 USA                                                1~~~~~~~~~~~~~~


Execution Plan
----------------------------------------------------------
Plan hash value: 1568579591

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   124 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                 |       |       |            |          |
|   2 |   VIEW                        |                 |     1 |   124 |     2   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| DESTINATIONS    |     1 |   124 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | DESTINATIONS_I1 |     4 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(:X LIKE TO_CHAR("PREFIX")||'%')
   4 - access(RPAD(TO_CHAR("PREFIX"),15,'~')>=RPAD(:X,15,'~'))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        569  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> set autotrace off


it works great, but if we put lots of data that matches a bit of our string (lots of destinations that for example start with 188), we'll see something start to happen:

ops$tkyte%ORA11GR2> insert into destinations
  2  select '188' || rownum, rownum
  3    from all_objects;

72249 rows created.

ops$tkyte%ORA11GR2> commit;

Commit complete.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set termout off
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace on
ops$tkyte%ORA11GR2> SELECT *
  2    from  (SELECT p.*
  3             FROM   destinations_padded_v p
  4            WHERE  prefix_padded >= RPAD (:x, 15, '~')
  5            ORDER BY prefix_padded)
  6   WHERE 1=1
  7   AND    :x LIKE prefix || '%'
  8   AND    ROWNUM <= 1;

    PREFIX DESCRIPT                                           PREFIX_PADDED
---------- -------------------------------------------------- ---------------
         1 USA                                                1~~~~~~~~~~~~~~


Execution Plan
----------------------------------------------------------
Plan hash value: 1568579591

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   124 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                 |       |       |            |          |
|   2 |   VIEW                        |                 |   177 | 21948 |     2   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| DESTINATIONS    |   177 | 21948 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | DESTINATIONS_I1 |   637 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(:X LIKE TO_CHAR("PREFIX")||'%')
   4 - access(RPAD(TO_CHAR("PREFIX"),15,'~')>=RPAD(:X,15,'~'))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      14277  consistent gets
          0  physical reads
          0  redo size
        569  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> set autotrace off



So, depending on the nature of the data - this will not scale very well.


After going back and forth on this... I'll probably stay with my "sql but procedural" approach :)

Thanks Tom !

January 05, 2012 - 5:04 pm UTC

Reviewer: Ben

If I get a chance at some point over the next couple of days, I'll grab a copy of the full table (currently 54858 rows) and try out david and yours side by side and see how the query plans compare.... if anything it'll be good for my ongoing Oracle education !
Tom Kyte

Followup  

January 05, 2012 - 5:37 pm UTC

don't compare the plans

run actual queries


The plans will be our plans - but it depends on the nature of your data as to how they will perform.

If your matches are very specific - the "rownum" query will be very good.
If the matches have lots of padded things to search through - the original "index probe till you find it" will be good.

Time to re-read your book !

January 05, 2012 - 6:07 pm UTC

Reviewer: Ben

Guess I should have seen that follow-up coming !

Sounds like it's time for me to brush the dust off Effective Oracle by Design and re-read it.

Looks like the nature of the data errs firmly on the more specific side....

select avg_col_len from user_tab_columns where table_name='FS_PREFIX_TABLE' and column_name='FS_PREFIX';

AVG_COL_LEN
-----------
6

select count(*) from FS_PREFIX_TABLE where length(FS_PREFIX)>6

COUNT(*)
----------
45465

select count(*) from FS_PREFIX_TABLE where length(FS_PREFIX)<6

COUNT(*)
----------
4389

Tom Kyte

Followup  

January 06, 2012 - 4:46 am UTC

but it depends on the numbers contained therein - it has nothing to do with length, perhaps the word "specific" is a bad choice - if there are lots of potential matches to look through - we have a problem.

Suppose you have 45,465 prefixes that all start with

188xxxxx

they are length eight.

It took over 14,000 IOs to find that. It is the *nature* (the values) of the data, not the length.

ops$tkyte%ORA11GR2> select count( case when length(prefix)<=6 then 1 end ) lt6, count( case when length(prefix) > 6 then 1 end) gt6 from destinations;

       LT6        GT6
---------- ----------
      1002      71251




In my example, almost all of the values were longer than 6 - but....

Bad word indeed

January 06, 2012 - 6:16 am UTC

Reviewer: Ben

Aah I'm with you now, "specific" is indeed a bad word !

So basically we're saying your initial answer is generically optimised to be reasonably efficient, but to achieve true optimisation I would need to analyse the prefix list first and then look at query performance against the most commonly occurring least specific prefixes.

Sounds like something to keep me occupied on a rainy day !

Also a note to self that I must remember your count syntax for the future.

January 06, 2012 - 8:16 am UTC

Reviewer: A reader

Well, thanks to another historical Asktom thread, turns out analysing my table was less painful than anticipated and the quantity drops off rapidly for 5 digits plus....so it looks like numbers starting 449 are the ones for me to test SQL performance with... or am I barking up the wrong tree yet again ?

SQL> select * from (select substr(FS_PREFIX,1,3),count(*),rank() over(order by count(*) desc) r from FS_PREFIX_TABLE group by substr(FS_PREFIX,1,3)) where r=1;

SUBSTR(FS_PREFIX   COUNT(*)          R
------------ ---------- ----------
449                8802          1

For 4…..

SUBSTR(FS_PREFIX   COUNT(*)          R
---------------- ---------- ----------
4490                   6346          1


For 5….

SUBSTR(FSRATES_PREFI   COUNT(*)          R
-------------------- ---------- ----------
44905                       919          1


For 6…


SUBSTR(FS_PREFIX,1,   COUNT(*)          R
------------------------ ---------- ----------
448440                          104          1


Tom Kyte

Followup  

January 10, 2012 - 9:38 pm UTC

this is the right track. Basically when you search for 449xxxxxxxx numbers - you might have to search through a lot of 449xxxxx prefixes to find the right one.


In my example - my answer was "1" - the prefix was "1", but we had to search through most everything that started with a prefix of "1" to discover none of them matched - and "1" was the most specific match.

with the semi procedural approach (doing an index probe, the original answer), you have a bounded response time. the this function based index - you could have some that return immediately and others that take thousands of IO's


I feel good with the original solution after all of this.

similar select with row_number() and join

January 09, 2012 - 8:50 am UTC

Reviewer: falco from Italy

select phone
, d.prefix
, len
, d.descrip
from ( select :x phone
, substr( :x, 1, length( :x ) - level + 1 ) prefix
, length( :x ) - level + 1 len
, row_number() over (order by level) rn
from dual
connect by level <= length( :x )
) x
, destinations d
where d.prefix = x.prefix
and rn = 1
/

January 11, 2012 - 2:23 am UTC

Reviewer: Ben

>I feel good with the original solution after all of this.

So do I. Infact I've moved onto another Oracle based project for the time being, and might return to optimising this one on a rainy day !

Thank you for all your help.

January 18, 2012 - 5:44 am UTC

Reviewer: Amalendu Pramanik from London, UK

How about this:

Create Table Destinations(
Prefix Number(9,0)
,Description VarChar2(200)
,Primary Key(Prefix)
) Organization Index;

insert into Destinations values ('1403','Alberta/Canada');
insert into Destinations values ('1809','Dominican Republic');
insert into Destinations values ('180979555','Dominican Republic - Santa Dom.');

Exec DBMS_Stats.Gather_Table_Stats(User,'DESTINATIONS',Cascade=>True);


Select Distinct
:X X
,First_Value(Prefix) Over(Order By Length(Prefix) Desc) Prefix
,First_Value(Description) Over(Order By Length(Prefix) Desc) Description
From Test.Destinations
Where RegExp_Like(:X,'^'||Prefix)
/


X PREFIX
--------------- ----------
DESCRIPTION
--------------------------------------------------------------------------------
180979555123456 180979555
Dominican Republic - Santa Dom.



Execution Plan
----------------------------------------------------------
Plan hash value: 2576117026

--------------------------------------------------------------------------------
-------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim
e |

--------------------------------------------------------------------------------
-------

| 0 | SELECT STATEMENT | | 1 | 27 | 3 (67)| 00:
00:01 |

| 1 | HASH UNIQUE | | 1 | 27 | 3 (67)| 00:
00:01 |

| 2 | WINDOW SORT | | 1 | 27 | 3 (67)| 00:
00:01 |

|* 3 | INDEX FULL SCAN| SYS_IOT_TOP_70466 | 1 | 27 | 1 (0)| 00:
00:01 |

--------------------------------------------------------------------------------
-------


Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter( REGEXP_LIKE ('180979555123456','^'||TO_CHAR("PREFIX")))


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
592 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
Tom Kyte

Followup  

January 18, 2012 - 7:39 am UTC

benchmark it with more than, on I don't know, three rows....

an index full scan with a regular expression - ouch, that'll hurt big time on more than three rows.

January 19, 2012 - 7:21 am UTC

Reviewer: Amalendu Pramanik from London, UK

Thanks...
Inserted some records:
Select Count(*) From Destinations;
258915 Rows

SQL used:

Select Distinct
:X X
,First_Value(Prefix) Over(Order By Length(Prefix) Desc) Prefix
,First_Value(Description) Over(Order By Length(Prefix) Desc) Description
From Test.Destinations
Where Prefix=SubStr(:X,1,Length(Prefix))
/


X
--------------------------------------------------------------------------------
PREFIX
----------
DESCRIPTION
--------------------------------------------------------------------------------
180979555123456
180979555
Dominican Republic - Santa Dom.



Execution Plan
----------------------------------------------------------
Plan hash value: 4137395842

--------------------------------------------------------------------------------
------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)
| Time |

--------------------------------------------------------------------------------
------------

| 0 | SELECT STATEMENT | | 1 | 31 | 674 (3)
| 00:00:09 |

| 1 | HASH UNIQUE | | 1 | 31 | 674 (3)
| 00:00:09 |

| 2 | WINDOW SORT | | 1 | 31 | 674 (3)
| 00:00:09 |

|* 3 | INDEX FAST FULL SCAN| SYS_IOT_TOP_70466 | 1 | 31 | 672 (3)
| 00:00:09 |

--------------------------------------------------------------------------------
------------


Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("PREFIX"=TO_NUMBER(SUBSTR(:X,1,LENGTH(TO_CHAR("PREFIX")))))


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2469 consistent gets
0 physical reads
0 redo size
592 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed


Tom Kyte

Followup  

January 19, 2012 - 12:29 pm UTC

see.

now run a tkprof on it and compare to


call     count       cpu    elapsed       disk      query    current        
rows
------- ------  -------- ---------- ---------- ---------- ----------  
----------
Parse        1      0.00       0.00          0          0          0           
0
Execute      1      0.00       0.00          0          0          0           
0
Fetch        2      0.00       0.00          0         35          0           
1
------- ------  -------- ---------- ---------- ---------- ----------  
----------
total        4      0.00       0.00          0         35          0           
1



I'll bet, in addition to yours doing 3 orders of magnitude more IO, it uses quite a bit more CPU


benchmarking, it is where it is at

January 20, 2012 - 4:54 am UTC

Reviewer: Amalendu Pramanik from London, UK

TKPROF output:


Select Distinct
:X X
,First_Value(Prefix) Over(Order By Length(Prefix) Desc) Prefix
,First_Value(Description) Over(Order By Length(Prefix) Desc) Description
From Test.Destinations
Where Prefix=SubStr(:X,1,Length(Prefix))

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.32 0.32 0 2469 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.32 0.32 0 2469 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 HASH UNIQUE (cr=2469 pr=0 pw=0 time=0 us cost=674 size=31 card=1)
2 WINDOW SORT (cr=2469 pr=0 pw=0 time=7 us cost=674 size=31 card=1)
2 INDEX FAST FULL SCAN SYS_IOT_TOP_70466 (cr=2469 pr=0 pw=0 time=23267 us cost=672 size=31 card=1)(object id 70467)

Tom Kyte

Followup  

January 20, 2012 - 9:56 am UTC

you are just providing evidence that the approach suggested by you was - well - not a good idea.

I didn't really want you to post it, just wanted you to investigate it on your own - to learn the tools to use before proposing things, so you can see how much work something does. So that in the future, you can test your ideas before suggesting them...

It'll make you look a lot better as you'll tend to only suggest things that are better than the prior suggestions, or say nothing at all :)

Unexpectedly good bitmap index plan?

January 24, 2012 - 10:58 am UTC

Reviewer: Matthew McPeak from Secaucus, NJ

I'm still preoccupied with this for some reason.

Given the fact that there is a maximum length of 15 digits (or was it 9?) and the list of prefixes is relatively static, I began wondering about how a bitmap index might be useful here.

<code>
drop table destinations;

create table destinations(prefix varchar2(9),descr varchar2(200)); 
insert into destinations (prefix,descr) values ('1','USA'); 
insert into destinations (prefix,descr) values ('1403','Alberta/Canada'); 
insert into destinations (prefix,descr) values ('1809','Dominican Republic'); 
insert into destinations (prefix,descr) values ('180979555','Dominican Republic - Santa Dom.');

insert into destinations
  select '188' || rownum, rownum
   from all_objects
   where rownum <= 75000;
   
create bitmap index destinations_n99 on destinations ( prefix ); 

exec dbms_stats.gather_table_stats( user, 'DESTINATIONS', cascade=>true );


Then, I ran the following query:

select * from (
SELECT :n n /*18822379555123456*/, destinations.* from destinations
where prefix = substr(:n,1,1)
or prefix = substr(:n,1,2)
or prefix = substr(:n,1,3)
or prefix = substr(:n,1,4)
or prefix = substr(:n,1,5)
or prefix = substr(:n,1,6)
or prefix = substr(:n,1,7)
or prefix = substr(:n,1,8)
or prefix = substr(:n,1,9)
or prefix = substr(:n,1,10)
or prefix = substr(:n,1,11)
or prefix = substr(:n,1,12)
or prefix = substr(:n,1,13)
or prefix = substr(:n,1,14)
or prefix = substr(:n,1,15)
order by length(prefix) desc
)
where rownum = 1


... which has the following results...

------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 12 | | | |
|* 1 | COUNT STOPKEY | | 1 | | 1 |00:00:00.01 | 12 | | | |
| 2 | VIEW | | 1 | 15 | 1 |00:00:00.01 | 12 | | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 15 | 1 |00:00:00.01 | 12 | 2048 | 2048 | 2048 (0)|
| 4 | INLIST ITERATOR | | 1 | | 2 |00:00:00.01 | 12 | | | |
| 5 | TABLE ACCESS BY INDEX ROWID | DESTINATIONS | 9 | 15 | 2 |00:00:00.01 | 12 | | | |
| 6 | BITMAP CONVERSION TO ROWIDS| | 9 | | 2 |00:00:00.01 | 11 | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | DESTINATIONS_N99 | 9 | | 2 |00:00:00.01 | 11 | | | |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM=1)
3 - filter(ROWNUM=1)
7 - access(("PREFIX"=SUBSTR(:N,1,1) OR "PREFIX"=SUBSTR(:N,1,2) OR "PREFIX"=SUBSTR(:N,1,3) OR "PREFIX"=SUBSTR(:N,1,4) OR
"PREFIX"=SUBSTR(:N,1,5) OR "PREFIX"=SUBSTR(:N,1,6) OR "PREFIX"=SUBSTR(:N,1,7) OR "PREFIX"=SUBSTR(:N,1,8) OR
"PREFIX"=SUBSTR(:N,1,9) OR "PREFIX"=SUBSTR(:N,1,10) OR "PREFIX"=SUBSTR(:N,1,11) OR "PREFIX"=SUBSTR(:N,1,12) OR
"PREFIX"=SUBSTR(:N,1,13) OR "PREFIX"=SUBSTR(:N,1,14) OR "PREFIX"=SUBSTR(:N,1,15)))

What surprised me (because I don't really understand, I guess) is that the bitmap index could be accessed in one step with such a complex predicate.

Would you mind explaining exactly how that works? Is it just multiple ORs that can be combined into a single access or can ANDs be too?

Thanks in advance!
Matt</code>
Tom Kyte

Followup  

January 24, 2012 - 1:44 pm UTC

I began wondering about how a bitmap
index might be useful here.


as useful as a b*tree :) a little less so perhaps because of the extra work the bitmap would have to do (to convert a bitmap into a rowid)


What surprised me (because I don't really understand, I guess) is that the bitmap index could be accessed in one step with such a complex predicate.


so can a b*tree - step 4 is an inlist iterator, it is just looping over your OR values.


you can get a similar plan with a 'regular' index too - there is just a loop in there.


....
ops$tkyte%ORA11GR2> create index destinations_n99 on destinations ( prefix );
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'DESTINATIONS', cascade=>true );

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> variable n varchar2(20)
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select * from (
  2  SELECT :n n /*18822379555123456*/, destinations.* from destinations
  3  where prefix = substr(:n,1,1)
  4  or prefix = substr(:n,1,2)
  5  or prefix = substr(:n,1,3)
  6  or prefix = substr(:n,1,4)
  7  or prefix = substr(:n,1,5)
  8  or prefix = substr(:n,1,6)
  9  or prefix = substr(:n,1,7)
 10  or prefix = substr(:n,1,8)
 11  or prefix = substr(:n,1,9)
 12  or prefix = substr(:n,1,10)
 13  or prefix = substr(:n,1,11)
 14  or prefix = substr(:n,1,12)
 15  or prefix = substr(:n,1,13)
 16  or prefix = substr(:n,1,14)
 17  or prefix = substr(:n,1,15)
 18  order by length(prefix) desc
 19  )
 20  where rownum = 1 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1679893729

----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                  |     1 |   126 |    19   (6)| 00:00:01 |
|*  1 |  COUNT STOPKEY                  |                  |       |       |            |          |
|   2 |   VIEW                          |                  |    15 |  1890 |    19   (6)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY        |                  |    15 |   225 |    19   (6)| 00:00:01 |
|   4 |     INLIST ITERATOR             |                  |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| DESTINATIONS     |    15 |   225 |    18   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | DESTINATIONS_N99 |    15 |       |    15   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter(ROWNUM=1)
   6 - access("PREFIX"=SUBSTR(:N,1,1) OR "PREFIX"=SUBSTR(:N,1,2) OR "PREFIX"=SUBSTR(:N,1,3)
              OR "PREFIX"=SUBSTR(:N,1,4) OR "PREFIX"=SUBSTR(:N,1,5) OR "PREFIX"=SUBSTR(:N,1,6) OR
              "PREFIX"=SUBSTR(:N,1,7) OR "PREFIX"=SUBSTR(:N,1,8) OR "PREFIX"=SUBSTR(:N,1,9) OR
              "PREFIX"=SUBSTR(:N,1,10) OR "PREFIX"=SUBSTR(:N,1,11) OR "PREFIX"=SUBSTR(:N,1,12) OR
              "PREFIX"=SUBSTR(:N,1,13) OR "PREFIX"=SUBSTR(:N,1,14) OR "PREFIX"=SUBSTR(:N,1,15))

ops$tkyte%ORA11GR2> set autotrace off



problem with that is that it'll always do the MAX number of index probes, then sort, then kick out the first.

Original plan above is still so far the best plan

Ah...

January 24, 2012 - 2:09 pm UTC

Reviewer: Matthew McPeak from Secaucus, NJ

Yup. I just have to let it go. ;)

Thanks for your help.

Enhanced Prefix Match

July 18, 2012 - 9:56 am UTC

Reviewer: Martin from Vienna, Austria

Hi,
i have a similar problem, i don't know how(if possible) to adapt your solution:
create table x_destinations(carrier number(5), quality varchar2(6), telnumber varchar2(20), dest_name varchar2(100)
, valid_from date, valid_until date, time_from varchar2(5), time_until varchar2(5), rate number(10,4));
insert into x_destinations values (1, 'SILVER', '1', 'USA', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),'00:00', '23:59', 1);
insert into x_destinations values (1, 'GOLD', '1', 'USA', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),'00:00', '23:59', 2);
insert into x_destinations values (1, 'SILVER', '1', 'USA', to_date('20120501', 'RRRRMMDD'), to_date('20120531', 'RRRRMMDD'),'00:00', '23:59', 1.2);
insert into x_destinations values (1, 'GOLD', '1', 'USA', to_date('20120501', 'RRRRMMDD'), to_date('20120531', 'RRRRMMDD'),'00:00', '23:59', 2.3);
insert into x_destinations values (1, 'SILVER', '1403', 'Alberta/Canada', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),'00:00', '23:59',1.2);
insert into x_destinations values (1, 'SILVER', '1809', 'Dominican Republic', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),'00:00', '23:59',2.5);
insert into x_destinations values (1, 'SILVER', '180979555', 'Dominican Republic - Santa-Dom.', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),'00:00', '23:59',3);
create index x_dest_idx on x_destinations
(carrier, quality, telnumber, valid_from, valid_until, time_from, time_until);

create table x_tickets(carrier number(5), quality varchar2(6), telnumber varchar2(20), date_start date, time_start varchar2(5), duration number(4));
insert into x_tickets values (1, 'SILVER', '123456789', to_date('20120411', 'RRRRMMDD'), '11:14',30);
insert into x_tickets values (1, 'SILVER', '123456789', to_date('20120511', 'RRRRMMDD'), '11:14',30);
insert into x_tickets values (1, 'GOLD', '123456789', to_date('20120411', 'RRRRMMDD'), '11:14',40);
insert into x_tickets values (1, 'GOLD', '123456789', to_date('20120411', 'RRRRMMDD'), '11:14',40);
insert into x_tickets values (1, 'SILVER', '18093456789', to_date('20120411', 'RRRRMMDD'), '11:14',45);
insert into x_tickets values (1, 'SILVER', '180979556789', to_date('20120411', 'RRRRMMDD'), '11:14',168);
insert into x_tickets values (1, 'SILVER', '1809795556789', to_date('20120411', 'RRRRMMDD'), '11:14',200);
insert into x_tickets values (1, 'SILVER', '1809795556789', to_date('20120511', 'RRRRMMDD'), '11:14',15);

create table x_tickets_new(carrier number(5), quality varchar2(6), telnumber varchar2(20), date_start date, time_start varchar2(5), duration number(4), rate number(10,4), call_sum number(15,4));

DECLARE
lv_rate NUMBER (10,4) := 0;
lv_call_sum NUMBER (15,4) := 0;

CURSOR c_get_tickets IS
SELECT *
FROM x_tickets;

TYPE tick_tab_typ IS TABLE OF c_get_tickets%ROWTYPE;
tick_tab tick_tab_typ;

BEGIN
OPEN c_get_tickets;
LOOP
FETCH c_get_tickets BULK COLLECT INTO tick_tab LIMIT 1000;
EXIT WHEN tick_tab.COUNT = 0;

FOR i IN tick_tab.FIRST .. tick_tab.LAST LOOP
select rate into lv_rate from
(select cp.rate as rate
from x_destinations cp
where cp.carrier = tick_tab(i).carrier
and cp.quality = tick_tab(i).quality
and cp.telnumber = substr(tick_tab(i).telnumber, 1, LENGTH(cp.telnumber))
and tick_tab(i).date_start between cp.valid_from and cp.valid_until
and tick_tab(i).time_start between cp.time_from and cp.time_until
order by LENGTH(cp.telnumber) desc)
where rownum = 1;

lv_call_sum := (tick_tab(i).duration * lv_rate) / 60;
insert into x_tickets_new values (tick_tab(i).carrier, tick_tab(i).quality, tick_tab(i).telnumber
, tick_tab(i).date_start, tick_tab(i).time_start, tick_tab(i).duration
, lv_rate, lv_call_sum);
END LOOP;
COMMIT;
EXIT WHEN c_get_tickets%NOTFOUND;
END LOOP;
CLOSE c_get_tickets;
END;

The x_destinations table has over 8.000.000 entries.
It takes 3 minutes to process 10.000 tickets - way too long. I tried partitioning on carrier, quality - still over 2 minutes....and i have 6Mio tickets a day....

thanks
Tom Kyte

Followup  

July 18, 2012 - 11:33 am UTC

if you actually expect me to reverse engineer your code and figure out what your algorithm is and from that derive a clear concise specification for the code that needs to be developed and then develop it....

you are not correct.

Now, if you would like to post, in the form of a clear concise specification, what the exact, detailed, correct programming requirements are - we might be able to take a look.

Enhanced Prefix Match

July 19, 2012 - 2:27 am UTC

Reviewer: Martin from Vienna, Austria

Hi,
you are right, should have been in my first post.

database tables:
1) table x_destinations: Holds the numbering plan and prices of several telephone-carriers. For instance Swisscom, British Telecom - and they all have their own numbering plan and of course different prices. Most of the rates are valid monthly and there also could be a difference in prices during peak and offpeak times.

2) table x_tickets: Those are tickets(telephone calls) generated daily by the switch. There i can identify the carrier, the quality of the line, startday/-time of the call and the duration. There are 6 million telephone calls each day.

3) table x_tickets_new: The result table with the correct rate and sum for each call.


specification:
The goal is to find the "best match"(beginning at the length of the number and getting smaller) for each ticket in the telnumber-plans in x_destinations to get the correct rate.
Of course the carrier, quality and date/time have to match too.


my solution so far:
So i came up with a select to find the best match - unfortunatly i had to use rownum and therefor did not find a way to solve it purely in sql. So i took the tickets in a bulk load and get for each of them the rate. Partitioning helps but not to the extent where it's enough. I also tried bulk insert with forall, but it's also not the problem(time consuming) here. Then i tried to run this parallel(5 times) - that's possible but still over 6 hours overall.....

So i am out of ideas now, in fact i need a very different solution to solve this performance problem.

thanks
Martin
test environment:
Oracle 11.2.0.1.0
AIX with 2 Power7

code:
create table x_destinations(carrier number(5), quality varchar2(6), telnumber varchar2(20), dest_name varchar2(100)
                        , valid_from date, valid_until date
                        , time_from varchar2(5), time_until varchar2(5), rate number(10,4));
insert into x_destinations values (1, 'SILVER', '1', 'USA', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),                         
                                    '00:00', '23:59', 1);
insert into x_destinations values (1, 'GOLD', '1', 'USA', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),                         
                                    '00:00', '23:59', 2);
insert into x_destinations values (1, 'SILVER', '1', 'USA', to_date('20120501', 'RRRRMMDD'), to_date('20120531', 'RRRRMMDD'),                         
                                    '00:00', '23:59', 1.2);
insert into x_destinations values (1, 'GOLD', '1', 'USA', to_date('20120501', 'RRRRMMDD'), to_date('20120531', 'RRRRMMDD'),                         
                                    '00:00', '23:59', 2.3);
insert into x_destinations values (1, 'SILVER', '1403', 'Alberta/Canada', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),                         
                                    '00:00', '23:59',1.2);
insert into x_destinations values (1, 'SILVER', '1809', 'Dominican Republic', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),                         
                                    '00:00', '23:59',2.5);
insert into x_destinations values (1, 'SILVER', '180979555', 'Dominican Republic - Santa-Dom.', to_date('20120401', 'RRRRMMDD'), to_date('20120430', 'RRRRMMDD'),                         
                                    '00:00', '23:59',3);
create index x_dest_idx on x_destinations
(carrier, quality, telnumber, valid_from, valid_until, time_from, time_until);
create table x_tickets(carrier number(5), quality varchar2(6), telnumber varchar2(20)
                        , date_start date, time_start varchar2(5), duration number(4)); 
insert into x_tickets values (1, 'SILVER', '123456789', to_date('20120411', 'RRRRMMDD'), '11:14',30);
insert into x_tickets values (1, 'SILVER', '123456789', to_date('20120511', 'RRRRMMDD'), '11:14',30);
insert into x_tickets values (1, 'GOLD', '123456789', to_date('20120411', 'RRRRMMDD'), '11:14',40);
insert into x_tickets values (1, 'GOLD', '123456789', to_date('20120511', 'RRRRMMDD'), '11:14',40);
insert into x_tickets values (1, 'SILVER', '18093456789', to_date('20120411', 'RRRRMMDD'), '11:14',45);
insert into x_tickets values (1, 'SILVER', '180979556789', to_date('20120411', 'RRRRMMDD'), '11:14',168);
insert into x_tickets values (1, 'SILVER', '1809795556789', to_date('20120411', 'RRRRMMDD'), '11:14',200);
insert into x_tickets values (1, 'SILVER', '1809795556789', to_date('20120511', 'RRRRMMDD'), '11:14',15);
commit;
create table x_tickets_new(carrier number(5), quality varchar2(6), telnumber varchar2(20)
                        , date_start date, time_start varchar2(5), duration number(4), rate number(10,4), call_sum number(15,4)); 


DECLARE
  lv_rate                        NUMBER (10,4) := 0;
  lv_call_sum                    NUMBER (15,4) := 0;

  CURSOR c_get_tickets IS
      SELECT *
        FROM x_tickets;

  TYPE tick_tab_typ IS TABLE OF c_get_tickets%ROWTYPE;
  tick_tab tick_tab_typ;  

        
BEGIN
  OPEN c_get_tickets;
  LOOP
    FETCH c_get_tickets BULK COLLECT INTO tick_tab LIMIT 1000;
    EXIT WHEN tick_tab.COUNT = 0;

    FOR i IN tick_tab.FIRST .. tick_tab.LAST LOOP
        select rate into lv_rate from 
            (select cp.rate as rate
            from x_destinations cp
            where cp.carrier = tick_tab(i).carrier
              and cp.quality  = tick_tab(i).quality
              and cp.telnumber = substr(tick_tab(i).telnumber, 1, LENGTH(cp.telnumber))
              and tick_tab(i).date_start between cp.valid_from and cp.valid_until
              and tick_tab(i).time_start between cp.time_from and cp.time_until
            order by LENGTH(cp.telnumber) desc)
        where  rownum = 1;

        lv_call_sum := (tick_tab(i).duration * lv_rate) / 60;            
        insert into x_tickets_new values (tick_tab(i).carrier, tick_tab(i).quality, tick_tab(i).telnumber
                                        , tick_tab(i).date_start, tick_tab(i).time_start, tick_tab(i).duration
                                        , lv_rate, lv_call_sum);
    END LOOP; 
    COMMIT;    
    EXIT WHEN c_get_tickets%NOTFOUND;
  END LOOP;
  CLOSE c_get_tickets;
END;

Tom Kyte

Followup  

July 19, 2012 - 9:58 am UTC

ops$tkyte%ORA11GR2> select carrier, quality, telnumber, date_start, time_start, duration, rate, call_sum
  2    from (
  3  select t.rowid rid, t.carrier, t.quality, t.telnumber, t.date_start, t.time_start, t.duration, d.rate, (t.duration * d.rate / 60) call_sum,
  4         row_number() over (partition by t.rowid order by length(d.telnumber) DESC) rn
  5    from x_tickets T, x_destinations D
  6   where t.carrier = d.carrier
  7     and t.quality = d.quality
  8     and t.telnumber like d.telnumber || '%'
  9     and t.date_start between d.valid_from and d.valid_until
 10     and t.time_start between d.time_from and d.time_until
 11         )
 12   where rn = 1
 13  /

   CARRIER QUALIT TELNUMBER            DATE_STAR TIME_   DURATION       RATE   CALL_SUM
---------- ------ -------------------- --------- ----- ---------- ---------- ----------
         1 SILVER 123456789            11-APR-12 11:14         30          1         .5
         1 SILVER 123456789            11-MAY-12 11:14         30        1.2         .6
         1 GOLD   123456789            11-APR-12 11:14         40          2 1.33333333
         1 GOLD   123456789            11-MAY-12 11:14         40        2.3 1.53333333
         1 SILVER 18093456789          11-APR-12 11:14         45        2.5      1.875
         1 SILVER 180979556789         11-APR-12 11:14        168        2.5          7
         1 SILVER 1809795556789        11-APR-12 11:14        200          3         10
         1 SILVER 1809795556789        11-MAY-12 11:14         15        1.2         .3

8 rows selected.


Just make sure that not a single index is even considered for this query - you want two full scans and a hash join.


hints for the future for plsql coding:

a) if you are bulk collecting, why not bulk inserting??? code should have been:

open c;
loop
   fetch c bulk collect into array limit N;
   for i in 1 .. array.count
   loop
       whatever
   end loop;

   forall i in 1 .. array.count
       insert into t (....) values ( array(i).x, array(i).y ... );

   exit when c%notfound;
end loop;
close c;



b) don't commit in that loop, that just makes things go slower


david approach reviewed

July 10, 2013 - 10:24 am UTC

Reviewer: Matteo from Italy

Hi Tom,
time has come for me to do a best match on a prefix list in a realtime env,
so I'm reviewing this question and the proposed approaches.
As you stated your solution is response time bounded by the number of iterations (length of the telephone number). David approach however was so linear that I tried it again
changing the logic a little. Instead of searching the MIN
on (padded) prefixes greater than my number I search for
the MAX on (original) prefixes smaller than my number.
I also believe that David original sin was that he wrote the like condition outside the MIN subquery.

test case:
create table destinations6 (prefix varchar2(20) primary key ,descrip varchar2(200) ) organization index ;

insert into destinations6 (prefix,descrip) values ('1','USA');
insert into destinations6 (prefix,descrip) values ('1403','Alberta/Canada');
insert into destinations6 (prefix,descrip) values ('1809','Dominican Republic');
insert into destinations6 (prefix,descrip) values ('180979555','Dominican Republic - Santa Dom.');


begin dbms_errlog.create_error_log('DESTINATIONS6'); end;

insert into destinations6
select dbms_random.value( 10, 999999999 ), dbms_random.string( 'a', 25 )
from dual
connect by level <= 100000
log errors reject limit unlimited;


insert into destinations6
select '180' || rownum, rownum
from all_objects
log errors reject limit unlimited;

insert into destinations6
select '188' || rownum, rownum
from all_objects
log errors reject limit unlimited;

commit;

begin dbms_stats.gather_table_stats( user, 'DESTINATIONS6', cascade=>true ); end;
/

my query:
select prefix, descrip
from destinations6
where prefix = (select MAX(prefix) from destinations6 where prefix < :x
and :x like prefix || '%');

tried these values:

exec :x := '180979555123456' -- 4 consistent gets
exec :x := '1808123456' -- 4 consistent gets
exec :x := '120979555123456' -- 4 consistent gets
exec :x := '18123456' -- 270 consistent gets
exec :x := '18923456' -- 573 consistent gets
exec :x := '1891234567890123456' -- 573 consistent gets


your query:
select prefix, (select descrip from destinations6 where prefix = x.prefix) descript
from (
select :x phone, substr( :x, 1, length(:x)-level+1 ) prefix,
length(:x)-level+1 len
from dual
connect by level <= length(:x)
order by level
) x
where rownum = 1
and (select descrip from destinations6 where prefix = x.prefix) is not null;

tried these values:
exec :x := '180979555123456' -- 11 consistent gets
exec :x := '1808123456' -- 8 consistent gets
exec :x := '120979555123456' -- 11 consistent gets
exec :x := '18123456' -- 13 consistent gets
exec :x := '18923456' -- 13 consistent gets
exec :x := '18912345678901234567' -- 13 consistent gets


The biggest difference is when the searched number is greater than most of the prefixes; then the LIKE condition has a lot of work to do.

In your case the number of gets are bounded to the
number of iterations to do based on number length.

I'll go for the iterative one.
Many thanks

Matteo

More to Explore

DBMS_ERRLOG

More on PL/SQL routine DBMS_ERRLOG here