Arun Kumar -- Thanks for the question regarding "SQL Query - Best match", version 10.2.0.3
Submitted on 16-Jan-2008 1:42 Central time zone
Last updated 29-Jan-2008 7:00
You Asked
Hi Tom,
create table id_test (id number, cola number, colb number, colc number, cold number, cole number);
insert into id_test values (1,6495,0,1,1,null);
insert into id_test values (2,6495,0,null,1,2);
SQL> select * from id_test;
id colA colB colC colD colE
---- ----- ----- ---- ----- -----
1 6495 0 1 1
2 6495 0 1 2
Could please let me know how to do the following:-
Based on input parameters, the record which matches most columns should return the id.
For e.g.,
a) If my input parameter is, Para1=6495, Para2=0, Para3=1, Para4=1, Para5=2 it should return id 1
b) If my input parameter is, Para1=6495, Para2=0, Para4=1, Para5=2 it should return id 2
Thanks
Arun
and we said...
Now, this ISN'T going to be fast of course, because you want to match on 1 to five columns.
In short, it would resemble this (you might have to modify the predicate depending on how you feel NULLS should be treated, I'm treating them the way sql says to - null is not equal (nor the converse) to null..
SQL> create table id_test (id number, cola number, colb number, colc number, cold number,
2 cole number);
Table created.
SQL>
SQL> insert into id_test values (1,6495,0,1,1,null);
1 row created.
SQL> insert into id_test values (2,6495,0,null,1,2);
1 row created.
SQL>
SQL>
SQL> create or replace function get_best
2 ( p_1 in number, p_2 in number, p_3 in number, p_4 in number, p_5 in number )
3 return sys_refcursor
4 is
5 l_cursor sys_refcursor;
6 begin
7 open l_cursor for
8 select *
9 from (
10 select *
11 from id_test
12 where cola = p_1 or colb = p_2 or colc = p_3 or cold = p_4 or cole = p_5
13 order by case when cola=p_1 then 1 else 0 end +
14 case when colb=p_2 then 1 else 0 end +
15 case when colc=p_3 then 1 else 0 end +
16 case when cold=p_4 then 1 else 0 end +
17 case when cole=p_5 then 1 else 0 end DESC
18 )
19 where rownum = 1;
20 return l_cursor;
21 end;
22 /
Function created.
SQL>
SQL> variable x refcursor
SQL> exec :x := get_best( 6495, 0, 1, 1, 2 );
PL/SQL procedure successfully completed.
SQL> print x
ID COLA COLB COLC COLD COLE
---------- ---------- ---------- ---------- ---------- ----------
1 6495 0 1 1
SQL> exec :x := get_best( 6495, 0, null, 1, 2 );
PL/SQL procedure successfully completed.
SQL> print x
ID COLA COLB COLC COLD COLE
---------- ---------- ---------- ---------- ---------- ----------
2 6495 0 1 2
SQL Query - Best match
January 16, 2008 - 10am Central time zone
Reviewer: A reader
Hi Tom,
Thanks a lot. Fantastic logic and it works like a charm.
(my actual query has to match against 10 columns, might
have performance issues)
Thanks & Regards
R.Arun Kumar
deterministic best matches ... nondeterministic best match
January 16, 2008 - 10am Central time zone
Reviewer: nameless
The where rownum = 1 makes the solution nondeterministic.
The custom ranking expression in the order by clause does not yield unique values.
Followup January 16, 2008 - 4pm Central time zone:
correct, hopefully also - obvious?
Determinism and a natural extension
January 16, 2008 - 1pm Central time zone
Reviewer: Dan Blum from Cambridge, MA
No, the solution is not deterministic, because the problem is not - it is not stated what should
happen if more than one record matches the same maximum number of columns. If deterministic results
are required, either all best matches need to be returned, or the query needs to do a secondary
sort on a unique column such as ID.
I've had occasion several times now to use a natural extension of this solution, which handles the
case where the desired record is the one that matches the most columns, but where there is a column
preference order to handle ties. (Of course it is still necessary to include a unique column to get
a completely deterministic result.)
To do this, the ranking expression needs to change slightly:
order by case when cola=p_1 then 1.08 else 0 end +
case when colb=p_2 then 1.04 else 0 end +
case when colc=p_3 then 1.02 else 0 end +
case when cold=p_4 then 1.01 else 0 end +
case when cole=p_5 then 1 else 0 end DESC
As shown, ColA has the highest priority, followed by ColB, etc.
This works for up to 7 columns - beyond that, you need to use values 1, 1.001, 1.002, etc. (the sum
of N match values must be less than N+1).
Re: Dan
January 16, 2008 - 3pm Central time zone
Reviewer: nameless
the case where the desired record is the one that matches the most columns, but where there is a column preference order to handle ties
Should you want to dispense of those special fractions and have a generic solution ... then just vector the columns in the desired order:
order by /* the most column matches */
case when cola=p_1 then 1 else 0 end +
case when colb=p_2 then 1 else 0 end +
case when colc=p_3 then 1 else 0 end +
case when cold=p_4 then 1 else 0 end +
case when cole=p_5 then 1 else 0 end desc
,/* the preference vector for ties */
bin_to_num(case when cola=p_1 then 1 else 0 end,
case when colb=p_2 then 1 else 0 end,
case when colc=p_3 then 1 else 0 end,
case when cold=p_4 then 1 else 0 end,
case when cole=p_5 then 1 else 0 end) desc
Followup January 16, 2008 - 4pm Central time zone:
nice idea
Best Match
January 26, 2008 - 8am Central time zone
Reviewer: Arun
Hi Tom,
We have requirement to match 23 columns and ofcourse its not going to be fast as mentioned by you.
We thought of having all 23 columns (all are numeric) into single aggregate key of the format
(0:1021:4:6931:0:1:3:1:1:16875:2:0:0:0:0:0:0:0:0:0:0:0:1)
(0:1021:4:6931:0:1:3:1:4:16959:2:0:0:0:0:0:0:0:0:0:0:0:1)where 0 represents it can be ANY.
Could you please help how we can query this to get maximum matching records?
Thanks & Regards
Arun
Followup January 29, 2008 - 2am Central time zone:
you gave me sort of an idea....
ops$tkyte%ORA11GR1> create table t ( x varchar2(50) );
Table created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into t values ( '1 12345 987' );
1 row created.
ops$tkyte%ORA11GR1> insert into t values ( '1 12345 987 567' );
1 row created.
ops$tkyte%ORA11GR1> create index t_idx on t(x) indextype is ctxsys.context;
Index created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select x, score(0)
2 from t
3 where contains( x, '1 or 12345 or 567', 0 ) > 0
4 order by 2 desc
5 /
X SCORE(0)
-------------------------------------------------- ----------
1 12345 987 567 4
1 12345 987 3
you might be able to use a text index and the contains operator - play around with that - lots of examples on this site, documentation for text exists on otn.oracle.com. You might be indexing a function that strings together your numbers - then contains and score can be used.
Thanks
January 29, 2008 - 3am Central time zone
Reviewer: Arun Kumar Ramachandran
Hi Tom,
Thanks a lot for your help.
Thanks
R.Arun
Best match
January 29, 2008 - 3am Central time zone
Reviewer: Arun
Hi Tom,
Tried the following.
create table t (sel_key varchar2(500));
create index t_idx on t(sel_key) indextype is ctxsys.context;
insert into t values ('0:1035:1:7032:1:0:1106:1:10:0:1000:1001:7:0:0:0:0:0:0:0:0:0:0');
insert into t values ('0:1035:0:0:-1:0:0:0:0:0:0:0:2:0:0:0:0:0:0:0:0:0:0');
select * from t
SEL_KEY
0:1035:1:7032:1:0:1106:1:10:0:1000:1001:7:0:0:0:0:0:0:0:0:0:0
0:1035:0:0:-1:0:0:0:0:0:0:0:2:0:0:0:0:0:0:0:0:0:0
"0' represents ANY value.
If my input is '0:1035:0:7032', it should return me first row, but not getting any record.
select sel_key, score(0)
from t where contains( sel_key, '0:1035:0:7032', 0 ) > 0
order by 2 desc
Thanks
Arun
Followup January 29, 2008 - 7am Central time zone:
did you read about contains.
do you understand what it does.
do you know what it does.
do you know how it does it.
please read a bit before using it. I gave you an example, you modified it very much, into something that doesn't make sense with a text index.
Regarding the text index ...
January 30, 2008 - 12am Central time zone
Reviewer: nameless
Hopefully I'm not pointing the obvious again.
To avoid false positives the columns need to be encoded in the text somehow.
SQL> insert into t values ( 'c1:1 c2:12345 c3:987' );
1 row created.
SQL>
SQL> insert into t values ( 'c1:1 c2:12345 c3:987 c4:567' );
1 row created.
SQL>
SQL> create index t_idx on t(x) indextype is ctxsys.context;
Index created.
SQL>
SQL> select x, score(0)
2 from t where contains( x, 'c1:1 or c2:12345 or c4:567', 0 ) > 0
3 order by 2 desc;
X SCORE(0)
-------------------------------------------------- ----------
c1:1 c2:12345 c3:987 c4:567 4
c1:1 c2:12345 c3:987 3
SQL>
SQL> select x, score(0)
2 from t where contains( x, 'c1:567 or c2:1 or c4:12345', 0 ) > 0
3 order by 2 desc;
no rows selected
SQL>
Arun ...
January 30, 2008 - 4pm Central time zone
Reviewer: nameless
We have requirement to match 23 columns and of course its not going to be fast as mentioned by you.
Why "of course "?
Did you try Tom's original solution or you just took his word that is not going to be fast? How much data do you have? Indeed there will be a full table scan every time but that may still be, performance wise, acceptable.
For this problem, it is highly unlikely that there is a clever solution ... so you'll have to rely on Oracle's muscle one way or another.
If you're looking for a solution with index access replacing that full scan then you'll have to make compromises somewhere else ... like extra space for a fast refresh on commit MV, or a text index ... plus the re-shaping of your transaction (you'll have to commit to see your modifications in those additional structures).
In my own testing, for a 125K table, Tom's solution clocked some 0.65 seconds ... a MV-based solution clocked some 0.2 seconds. Is that worth it?
February 20, 2008 - 9pm Central time zone
Reviewer: A reader