Skip to Main Content
  • Questions
  • Query for exact matches and next best matches

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Arun.

Asked: August 26, 2004 - 12:16 pm UTC

Last updated: August 02, 2013 - 4:59 pm UTC

Version: 9.2.0.1.0

Viewed 10K+ times! This question is

You Asked

Tom,

I'm running a report that needs to match records from an old system with a new one. However, I'm not guaranteed an exact match for each record in the old system. In the case where an exact match is not found, I am to get the "next best" match where it matches on two given fields. Otherwise, no match is found. Below is an example given in a simplified fashion:

amathur> create table old_table(old_meas_id varchar2(20),
2 field1 number(11),
3 field2 number(11),
4 field3 number(11));

Table created.

amathur> create table new_table(new_meas_id varchar2(20),
2 field1 number(11),
3 field2 number(11),
4 field3 number(11));

Table created.

amathur> insert into old_table values ('A_OLD',1,2,3);

1 row created.

amathur> insert into old_table values ('B_OLD',4,5,6);

1 row created.

amathur> insert into old_table values ('C_OLD',7,8,9);


amathur> insert into new_table values ('A_NEW',1,2,3);

1 row created.

amathur> insert into new_table values ('B_NEW',4,5,6);

1 row created.

amathur> insert into new_table values ('C_NEW',7,8,10);

1 row created.

amathur> commit;


amathur> select * from old_table;

OLD_MEAS_ID FIELD1 FIELD2 FIELD3
-------------------- ---------- ---------- ----------
A_OLD 1 2 3
B_OLD 4 5 6
C_OLD 7 8 9

amathur> select * from new_table;

NEW_MEAS_ID FIELD1 FIELD2 FIELD3
-------------------- ---------- ---------- ----------
A_NEW 1 2 3
B_NEW 4 5 6
C_NEW 7 8 10

If an exact match is found (field1,field2,and field3 match), report it. Let's say the "next best" match is decided to be when field1 and field2 match, but not field3. Report the next best match as well.

So, based on the given data, my report would exactly match A_OLD with A_NEW, likewise B_OLD and B_NEW. C_OLD doesn't exactly match any records from new_table, but it does find the next best match since field1 and field2 matches:

amathur> l
1 select old.old_meas_id,new.new_meas_id
2 from old_table old,new_table new
3 where old.field1=new.field1
4 and old.field2=new.field2
5 and old.field3=new.field3
6 or
7 (old.field1=new.field1
8 and old.field2=new.field2
9 and (select count(*) from new_table
10 where field1=old.field1
11 and field2=old.field2
12 and field3=old.field3)=0
13* )
amathur> /

OLD_MEAS_ID NEW_MEAS_ID
-------------------- ------------------
C_OLD C_NEW
A_OLD A_NEW
B_OLD B_NEW


In my case here, the "old_table" is really 2-3G with several more columns, as with the "new_table". I queried these tables using the same approach as above. After two days, it was halfway through fetching the rows. The TKPROF trace shows me lots of I/O waits, temp work, etc.., so I can see why this is happening. But, I was wondering if you had any thoughts regarding my select statement?

Thanks Tom, as always.

Arun






and Tom said...

that correlated subquery could be killing you and an "or'ed join" like that is really expensive.

Assuming index on old_table(field1,field2) -- the following should do much better -- looking for a nice big full scan of new and old, hash joined together and run the scalar subquery only for "no matches" using the index.

ops$tkyte@ORA9IR2> select nt.new_meas_id,
2 nvl( ot.old_meas_id,
3 (select old_meas_id
4 from old_table
5 where old_table.field1 = nt.field1
6 and old_table.field2 = nt.field2
7 and rownum = 1 ) ) old_meas_id
8 from new_table nt left join old_table ot on
9 ( nt.field1 = ot.field1
10 and
11 nt.field2 = ot.field2
12 and nt.field3 = ot.field3 );

NEW_MEAS_ID OLD_MEAS_ID
-------------------- --------------------
A_NEW A_OLD
B_NEW B_OLD
C_NEW C_OLD

Or, maybe:

ops$tkyte@ORA9IR2> insert into old_table values ('XXXXX',1,2,44);

1 row created.


ops$tkyte@ORA9IR2> select *
2 from (
3 select nt.new_meas_id, ot.old_meas_id,
4 count(*) over (partition by nt.field1, nt.field2) cnt1,
5 count(decode(nt.field3,ot.field3,1)) over (partition by nt.field1, nt.field2) cnt2,
6 nt.field3 nt_f3,
7 ot.field3 ot_f3
8 from new_table nt left join old_table ot on
9 ( nt.field1 = ot.field1
10 and
11 nt.field2 = ot.field2 )
12 )
13 where cnt1 = 1
14 or (cnt1 > 1 and ((cnt2 > 0 and nt_f3 = ot_f3) or (cnt2 = 0)))
15 /

NEW_MEAS_ID OLD_MEAS_ID CNT1 CNT2 NT_F3 OT_F3
-------------------- -------------------- ---------- ---------- ---------- ----------
A_NEW A_OLD 2 1 3 3
B_NEW B_OLD 1 1 6 6
C_NEW C_OLD 1 0 10 9

here, we join by f1 and f2 -- and....

a) count how many rows for that f1/f2 pair. if that cnt = 1, we know we want that row
b) if that cnt > 1 (we got some partials matches) we keep the one where f3 = f3 if cnt2 is greater than 1 (meaning there was an exact match) -- or keep all of the partials if cnt2 = 0 (meaning many partial matches but no exact)



Rating

  (24 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

nvl?

A reader, August 26, 2004 - 2:41 pm UTC

For the first method, use nvl to short circuit or decode?

Tom Kyte
August 26, 2004 - 3:31 pm UTC

excellent point

use

decode( c, null, (scalar subquery), c )

intead of NVL -- or case

case when c is null then (scalar subquery) else c end


nvl does not short circut, it would run the scalar subquery over and over.

distance

Alberto Dell'Era, August 26, 2004 - 7:08 pm UTC

What about this variant of your 2nd analytical approach:

SQL> insert into new_table values ('NEW'  ,99,99,99);
SQL> insert into old_table values ('XXXXX',1 ,2 ,44);

SQL> l
  1  select y.*
  2    from (
  3  select x.*, min(distance) over (partition by nt_f1, nt_f2) min_dist
  4    from (
  5  select nt.new_meas_id, ot.old_meas_id,
  6         nt.field1 nt_f1, nt.field2 nt_f2, nt.field3 nt_f3,
  7         ot.field1 ot_f1, ot.field2 ot_f2, ot.field3 ot_f3,
  8         abs (nt.field3-ot.field3) distance
  9    from new_table nt left join old_table ot on
 10           ( nt.field1 = ot.field1
 11             and
 12             nt.field2 = ot.field2
 13           )
 14         ) x
 15         ) y
 16  where (min_dist = 0 and distance = 0) -- exact match
 17     or min_dist is null -- no match
 18*    or min_dist > 0 -- partial match
SQL> /

NEW_M OLD_M NT_F1 NT_F2 NT_F3 OT_F1 OT_F2 OT_F3 DISTANCE MIN_DIST
----- ----- ----- ----- ----- ----- ----- ----- -------- --------
A_NEW A_OLD     1     2     3     1     2     3        0        0
B_NEW B_OLD     4     5     6     4     5     6        0        0
C_NEW C_OLD     7     8    10     7     8     9        1        1
NEW            99    99    99

The only requirements for "distance" are
1) return 0 for exact match
2) return > 0 for partial match
2) return null if no match
so, by changing just one line, we could handle non-numerical fields, more complex partial-matching rules, and so on.

It could easily answer a requirement such as "give me only the three best matches" (dense_rank ..) also. 

Tom Kyte
August 26, 2004 - 7:18 pm UTC

less procedural, more set based

we have a commercial over here for beer, goes like this:

"tastes great, less filling"

that describes this approach :) i like it, didn't fully "regression test it", but the concept looks sound.

Why couldn't it be simpler

Tony, August 26, 2004 - 7:55 pm UTC

Since you don't care about whether field3 is matched or not, (of course if field3 matches, it's the best. if not, you're looking for the next best when field1 and field2 are both matched). So simply querying on feild1 and feild2 will solve your problem. The query result won't tell you which is the best match and which is the next best match anyway.

select old.old_meas_id,new.new_meas_id
from old_table old,new_table new
where old.field1=new.field1
and old.field2=new.field2

Tom Kyte
August 26, 2004 - 8:30 pm UTC

because it doesn't give the "best match"?

i thought that was part of the problem, to find the best match.

With another column to Tony's query

Anindya, August 27, 2004 - 7:38 am UTC

select old.old_meas_id,new.new_meas_id,
decode(old.field3,new.field3,'BEST','NEXT BEST') MATCH
from old_table old,new_table new
where old.field1=new.field1
and old.field2=new.field2


typical ?!?!

Gabe, August 27, 2004 - 2:14 pm UTC

Nah …

This is as good an example of apparently trivial but totally incomplete requirement as one can get … the test case is totally flaky … there just is a myriad of interpretations and assumptions in here.

1. Where does it say (field1, filed2, field3) is unique?

2. Where does it say all these columns are mandatory? How is _matching_ defined when nulls are involved?

3. My own interpretation is that Arun wants to _drive_ the matching from the OLD side (his own query certainly does this) … the solutions I see use NEW as the reference point.

4. There is no such thing as “next best” match … there can be “an arbitrary next best” match (see Tom’s first implementation) … there can be “many next best” matches (see Tom’s second implementation or the ones using inner joins)

5. As a corollary to #4, the two solutions from Tom cannot possibly be equivalent simply because one is deterministic (the one with analytics) and the other is not (the one with the scalar subquery and “rownum = 1” condition).

6. A solution like one from Tony (inner join on filed1 and field2) cannot possibly be correct since the query is _symmetrical_ (interchange OLD with NEW and you have the same query) … when, in fact, the result of the matching is very much subject to the point of reference (matching OLD records against NEW records rarely is the same to matching NEW records against OLD records); more, the inner join on just field1 and filed2 will bring back both the “exact” and “all next best” matches … at least on this detail, the requirement is clear: bring the “next best” only if an “exact” match is not found.

Leaving aside the nulls and assuming (field1, filed2, field3) is unique … a _reasonable_ data set will exhibit how _weak_ the requirement is and how each _solution_ provides a different answer:

insert into old_table values ('o123',1,2,3);
insert into old_table values ('o784',7,8,4);
insert into old_table values ('o785',7,8,5);
insert into old_table values ('o222',2,2,2);
insert into old_table values ('o224',2,2,4);

insert into new_table values ('n123',1,2,3);
insert into new_table values ('n789',7,8,9);
insert into new_table values ('n111',1,1,1);
insert into new_table values ('n222',2,2,2);
insert into new_table values ('n223',2,2,3);

Arun’s query adjusted to drive from NEW (in order to match what the others did):

flip@flop> select nt.new_meas_id,ot.old_meas_id
2 from old_table ot,new_table nt
3 where nt.field1=ot.field1
4 and nt.field2=ot.field2
5 and nt.field3=ot.field3 or ( nt.field1=ot.field1
6 and nt.field2=ot.field2
7 and (select count(*) from old_table
8 where field1=nt.field1
9 and field2=nt.field2
10 and field3=nt.field3)=0
11 )
12 order by nt.new_meas_id, ot.old_meas_id
13 ;

NEW_MEAS_ID OLD_MEAS_ID
-------------------- --------------------
n123 o123
n222 o222
n223 o222
n223 o224
n789 o784
n789 o785

Tom’s… using scalar the subquery:

NEW_MEAS_ID OLD_MEAS_ID
-------------------- --------------------
n111
n123 o123
n222 o222
n223 o222
n789 o784

Tom’s … using analytics:

NEW_MEAS_ID OLD_MEAS_ID
-------------------- --------------------
n111
n123 o123
n222 o222
n789 o784
n789 o785

Tony+Anyndia … inner join on field1 and field2 only:

NEW_MEAS_ID OLD_MEAS_ID MATCH
-------------------- -------------------- ---------------
n123 o123 BEST
n222 o222 BEST
n222 o224 NEXT BEST
n223 o222 NEXT BEST
n223 o224 NEXT BEST
n789 o784 NEXT BEST
n789 o785 NEXT BEST

<quote>that correlated subquery could be killing you and an "or'ed join" like that is really expensive.</quote>
Could be … what about (or in addition to that) an _explosion_ of data? … two 2-3GB tables with lots of matching (not exactly an “exception” report, is it?) and a query which apparently does a cartesian product on the “next best” matched records … maybe the sheer volume of generated data has something to do with the two days elapsed time!

In the end is all about having a well-rounded problem … systems crumble like houses of cards exactly because so many are built on incomplete requirements conducive to conflicting interpretations.

Thanks.


Tom Kyte
August 27, 2004 - 2:48 pm UTC

:)

you should see many of the ones i don't publish.

hmm .. post too long !?

Gabe, August 27, 2004 - 2:47 pm UTC

NB. My previous post was as short as I could muster to "well-define" my point :-)

I needed that

Arun Mathur, August 27, 2004 - 4:10 pm UTC

Very good point Gabe. My cue to get better requirements. Tom, as always, I enjoyed reading your followup.

Take care,
Arun

not my intention ...

Gabe, August 27, 2004 - 4:49 pm UTC

Hi Arun,

I truly hope I didn't offend you ... my critique was mostly a reflection to what I see day to day in my workplace. I guess I jumped in simply because all the techniques being proposed did not have any chance of being correct ... the _problem_ just wasn't all there. I hope you got enough technical hints from Tom and the others to help you solve your problem.

Regards.

No worries Gabe

Arun Mathur, August 27, 2004 - 11:33 pm UTC

Thanks for the post. Hope all is well.

Arun


Very useful discussion

A reader, August 30, 2004 - 10:59 pm UTC

Thanks for the discussion and to Gabe for raising some of the underlying requirements deficiencies.

Too often us tech geeks jump to come up with a neat solution before stepping back to see if the requirements are complete and more importantly, make sense!

Same problem but different?

Tony, February 17, 2005 - 8:48 am UTC

Tom,

I have a similar problem that I can't seem to crack in a performance friendly way :)

Anyway, this is what I basically got (table along with some sample data):

CREATE TABLE DESCRIPTIONS
(
MASTER_CODE VARCHAR2(8 BYTE) NOT NULL,
DIV_CODE VARCHAR2(8 BYTE) NOT NULL,
PART_CODE VARCHAR2(64 BYTE) NOT NULL,
SUB_CODE VARCHAR2(8 BYTE) NOT NULL,
DESCRIPTION NVARCHAR2(256) NOT NULL
);

ALTER TABLE DESCRIPTIONS ADD (
PRIMARY KEY (MASTER_CODE, DIV_CODE, PART_CODE, SUB_CODE ));

insert into descriptions values
('INPROD','A100','B100','C100','SMALL CAR WHEEL');
insert into descriptions values
('INPROD','A100','B100','C110','BIG CAR WHEEL');
insert into descriptions values
('INPROD','A100','B100','ANY','CAR WHEEL');
insert into descriptions values
('INPROD','A100','ANY','ANY','CAR PARTS');
insert into descriptions values
('INPROD','ANY','ANY','ANY','VEHICLE PARTS');
insert into descriptions values
('INPROD','ANY','ANY','C110','BIG WHEEL');

So I have the master_code (always) and some of the codes for div, part and sub when I make the query for receiving the description. DIV_CODE, PART_CODE, SUB_CODE are unique for a master_code. The 'ANY' value in a code is a wildcard for matching.

My problem is:
1. I have only the sub code = C100 or C120 and need to select the best match.
2. Or I have Div_Code = A100 or A110 and need to select the best match.

Do you have any ideas on how to do this best?

Thanks,
/Tony


Tom Kyte
February 17, 2005 - 10:07 am UTC

I've no idea what your concept of "best match" is though...

Same problem but different?

Tony, February 17, 2005 - 8:52 am UTC

In my example above I'll always have:
... WHERE master_code = 'INPROD'
(or actually ... WHERE master_code = :a1 :-)

/Tony

Tony, February 17, 2005 - 10:20 am UTC

Ok, say that I have these DIV_CODE PART_CODE SUB_CODE
in the DB (among other records):
ANY ANY ANY
ANY B100 C110
ANY ANY C110

And then I'll ask for the description for the values
PART_CODE: B100 and SUB_CODE: C110
Then I'd at first glance get hits for all three records - but the one that I'd want is "ANY B100 C110" since this one has the best match - but if the "ANY B100 C110" record was "A100 B100 C110" instead, I'd want to select the "ANY ANY C110" records - and if the "ANY ANY C110" record also was different, e.g: "ANY ANY C130" I'd want to select the "ANY ANY ANY" record, and if also the "ANY ANY ANY" record did not exist I'd want to select nothing.

Thanks ;-)
/Tony


Tom Kyte
February 17, 2005 - 1:42 pm UTC

and what if there were a ANY B100 ANY or are the ANY's always on the leading edge.

if leading edge, I suppose:

ops$tkyte@ORA9IR2> variable pc varchar2(20)
ops$tkyte@ORA9IR2> variable sc varchar2(20)
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec :pc := 'B100'; :sc := 'C110'
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace view v
  2  as
  3  select div_code, part_code, sub_code from descriptions;
 
View created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from v;
 
DIV_CODE PART_CODE  SUB_CODE
-------- ---------- --------
A100     B100       C100
A100     B100       C110
A100     B100       ANY
A100     ANY        ANY
ANY      ANY        ANY
ANY      ANY        C110
 
6 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select *
  2   from (
  3  select 1 oc, v.* from v where div_code = 'ANY' and part_code = :pc and sub_code = :sc
  4  union all
  5  select 2 oc, v.* from v where div_code = 'ANY' and part_code = 'ANY' and sub_code = :sc
  6  union all
  7  select 3 oc, v.* from v where div_code = 'ANY' and part_code = 'ANY' and sub_code = 'ANY'
  8  order by oc
  9        )
 10   where rownum =1;
 
        OC DIV_CODE PART_CODE  SUB_CODE
---------- -------- ---------- --------
         2 ANY      ANY        C110
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update v set div_code = 'ANY' where part_code = :pc and sub_code = :sc;
 
1 row updated.
 
ops$tkyte@ORA9IR2> select *
  2   from (
  3  select 1 oc, v.* from v where div_code = 'ANY' and part_code = :pc and sub_code = :sc
  4  union all
  5  select 2 oc, v.* from v where div_code = 'ANY' and part_code = 'ANY' and sub_code = :sc
  6  union all
  7  select 3 oc, v.* from v where div_code = 'ANY' and part_code = 'ANY' and sub_code = 'ANY'
  8  order by oc
  9        )
 10   where rownum =1;
 
        OC DIV_CODE PART_CODE  SUB_CODE
---------- -------- ---------- --------
         1 ANY      B100       C110
 

How about ...

Venkat, February 17, 2005 - 2:21 pm UTC

select * from
(select d.*, :div_code, :part_code, :sub_code
from descriptions d
where master_code = 'INPROD'
and (part_code = :part_code or part_code = 'ANY')
and (sub_code = :sub_code or sub_code = 'ANY')
and (div_code = :div_code or div_code = 'ANY')
order by (case when div_code = :div_code then 1 else 0 end +
case when part_code = :part_code then 1 else 0 end +
case when sub_code = :sub_code then 1 else 0 end) desc
) where rownum = 1

The results are as follows with varying inputs (the last 3 columns are the inputs)

INPROD A100 B100 C100 SMALL CAR WHEEL A100 B100 C100
INPROD A100 B100 C110 BIG CAR WHEEL A100 B100 C110
INPROD ANY ANY C110 BIG WHEEL A110 B110 C110
INPROD ANY ANY C110 BIG WHEEL B100 C110
INPROD ANY ANY C110 BIG WHEEL C110
INPROD A100 B100 ANY CAR WHEEL A100 B100
INPROD ANY ANY ANY VEHICLE PARTS C100
INPROD A100 B100 ANY CAR WHEEL A100 B100 C120

Tom Kyte
February 17, 2005 - 2:44 pm UTC

compare the plans......

I liked the union all for its simplicity and absolute ability to use an index on the three columns readily.

Plans

Venkat, February 17, 2005 - 7:05 pm UTC

Tom, I added about 120000 rows to the table. The first plan is for my query and the second for yours. What is the difference (or will there be a difference) between the 3 index scans + table access by index rowid (in the second plan) Vs the inlist iterator in the first plan?

Thank You!

Operation Object Name Rows Bytes Cost

SELECT STATEMENT Hint=CHOOSE 1 4
COUNT STOPKEY
VIEW 1 70 4
SORT ORDER BY STOPKEY 1 11 4
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID DESCRIPTIONS 1 11 2
INDEX RANGE SCAN SYS_C00101396 1 1


Operation Object Name Rows Bytes Cost

SELECT STATEMENT Hint=CHOOSE 1 6
COUNT STOPKEY
VIEW 3 201 6
SORT ORDER BY STOPKEY 3 33 6
UNION-ALL
TABLE ACCESS BY INDEX ROWID DESCRIPTIONS 1 11 2
INDEX SKIP SCAN SYS_C00101396 1 3
TABLE ACCESS BY INDEX ROWID DESCRIPTIONS 1 11 2
INDEX SKIP SCAN SYS_C00101396 1 3
TABLE ACCESS BY INDEX ROWID DESCRIPTIONS 1 11 2
INDEX SKIP SCAN SYS_C00101396 1 3



Tom Kyte
February 17, 2005 - 7:41 pm UTC

skip scan? hmm, what is the index on here?

The primary key index?

Vladimir Andreev, February 18, 2005 - 4:18 am UTC

This should be the index in question, no?

<quote>
ALTER TABLE DESCRIPTIONS ADD (
PRIMARY KEY (MASTER_CODE, DIV_CODE, PART_CODE, SUB_CODE ));
</quote>

You are skip-scanning because you ignored this:

<quote>
In my example above I'll always have:
... WHERE master_code = 'INPROD'
(or actually ... WHERE master_code = :a1 :-)
</quote>

Hope that helps.

Flado

Tom Kyte
February 18, 2005 - 8:34 am UTC

i just took the data I was given to work with in the last example....

you'll need to look at the indexing and use the most appropriate scheme.

but if you pushed that predicate right down into my query, it should be three index scans, no skip scans.

response of February 17, 2005

A reader, May 05, 2009 - 8:15 am UTC

Hi Tom

Your solution posted on February 17, 2005 was extremely useful for me and using that I was able to meet one of my deadines for deliverables .the solution you have given , i probably could never have thought of ( I did write a procedureal code which was running too long) .

Thank you

Matching of columns

anupam pandey, August 19, 2009 - 5:28 am UTC

Hi Tom ,
Suppose I have following four tables
create table T1
(a number,
b number,
c number,
d number
) ;

create table T2
(a number,
b number,
c number,
d number,
e number
) ;

create table T3
(a number,
b number,
c number
) ;


create table T4
(b number,
c number,
e number
);


Now out of all these tables I want those tables which contain column ('B','C','E')
So I executed following query :-

select distinct table_name
from user_tab_columns
where column_name in ('B','C','E')

O/P-->
TABLE_NAME
T2
T1
T4
T3

It gave all four tables as output which in not true .
Is there any way by which I can get this information .

Thanks And Regards,
Anupam Pandey
Tom Kyte
August 24, 2009 - 4:58 pm UTC

select table_name
from user_tab_columns
where column_name in ('B','C','E')
group by table_name
having count(*) = 3;

Response to my above comment

anupam pandey, August 19, 2009 - 8:07 am UTC

select table_name
from user_tables a
where not exists
(
(select 'B'
from dual
union
select 'C'
from dual
union
select 'E'
from dual)
minus
(
select column_name
from all_tab_columns
where table_name = a.table_name
)
)

Is this a good solution or there is something else more optimized for this purpose.
Tom Kyte
August 24, 2009 - 4:59 pm UTC

see above...

Best Match for Numbers

Snehasish Das, October 21, 2011 - 5:59 am UTC

Hi Tom,

Hope you are doing well.

Suppose we have a table T_LOOKUP with data as below.

NUMBER_BLOCK START_DATE END_DATE A1141_CODE
------------------------------------------------------------
09903XXXXXX 10/10/2011 10/31/2011 X1
0990348XXXX 10/10/2011 10/31/2011 X2
0990347XXXX 10/10/2011 10/31/2011 X3
09903485XXX 10/10/2011 10/31/2011 X4


As you can see number_block column has some numbers along with XXXX which can be anything. Now say we have a number 09903485478 (which is a column value for a bigger table) and we need to match this with lookup and get the value X4 as its the best possible match here.

Can you please suggest anything here.

Best regards,
Snehasish Das.
Tom Kyte
October 21, 2011 - 8:31 pm UTC

what if nothing in the table starts with 0?

what if in the table all you have is

0123xxxxx
0345xxxxx

which one matches "best"?

Number Matching

Sneshasish Das, October 25, 2011 - 1:45 am UTC

Hi Tom,

Apologies for late reply.

There will be always some matching pattern (as agreed with data model), our problem is to find the best match.
One way ahead was to replace the X with 0 and then get a absolute value of the substraction of the Number and the Pattern (X replaced with 0) and then take the Minimum value.
But this way we have to join with each row and get a cartesian result before we calculate the minimum.

Can you suggest any other way out.

Thanks and Regards,
Snehasish Das.
Tom Kyte
October 25, 2011 - 7:02 am UTC

here is an approach -
ops$tkyte%ORA11GR2> column str format a15
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> variable str varchar2(30)
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec :str := '09903485478';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table t_lookup
  2  as
  3  select '09903XXXXXX'  number_block,'X1' a1141_code from dual union all
  4  select '0990348XXXX'  ,'X2' a1141_code from dual union all
  5  select '0990347XXXX'  ,'X3' a1141_code from dual union all
  6  select '09903485XXX'  ,'X4' a1141_code from dual
  7  /

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> alter table t_lookup add constraint t_pk primary key(number_block);

Table altered.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T_LOOKUP', numrows => 10000, numblks => 100 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set serveroutput off
ops$tkyte%ORA11GR2> with data
  2  as
  3  (select level l, substr( :str, 1, level ) || rpad( 'X', length(:str)-level, 'X') str
  4     from dual
  5  connect by level <= length(:str) )
  6  select *
  7    from (
  8  select t_lookup.* , data.*, :str
  9    from t_lookup, data
 10   where data.str = t_lookup.number_block
 11   order by l desc
 12         )
 13   where rownum = 1;

NUMBER_BLOC A1          L STR             :STR
----------- -- ---------- --------------- --------------------------------
09903485XXX X4          8 09903485XXX     09903485478

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  6an8q6cfw9bv0, child number 0
-------------------------------------
with data as (select level l, substr( :str, 1, level ) || rpad( 'X',
length(:str)-level, 'X') str    from dual connect by level <=
length(:str) ) select *   from ( select t_lookup.* , data.*, :str
from t_lookup, data  where data.str = t_lookup.number_block  order by l
desc        )  where rownum = 1

Plan hash value: 1799846801

--------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes | Cost (%C
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |          |       |       |     4 (1
|*  1 |  COUNT STOPKEY                     |          |       |       |
|   2 |   VIEW                             |          |   100 |   200K|     4  (
|*  3 |    SORT ORDER BY STOPKEY           |          |   100 |   198K|     4  (
|   4 |     NESTED LOOPS                   |          |       |       |
|   5 |      NESTED LOOPS                  |          |   100 |   198K|     3
|   6 |       VIEW                         |          |     1 |  2015 |     2
|   7 |        CONNECT BY WITHOUT FILTERING|          |       |       |
|   8 |         FAST DUAL                  |          |     1 |       |     2
|*  9 |       INDEX UNIQUE SCAN            | T_PK     |     1 |       |     0
|  10 |      TABLE ACCESS BY INDEX ROWID   | T_LOOKUP |   100 |  1700 |     1
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter(ROWNUM=1)
   9 - access("DATA"."STR"="T_LOOKUP"."NUMBER_BLOCK")


33 rows selected.



my assumption is my question above wasn't relevant really. I wrote:

what if in the table all you have is

0123xxxxx
0345xxxxx

which one matches "best"?


but now I think - if I was searching for 09999999999 - it wouldn't match EITHER of them - would it?

So, does this do the trick for you?

Thanks

Snehasish Das, November 03, 2011 - 12:32 am UTC

Hi Tom,
Thanks a lot for the idea.

Regards,
Snehasish Das

Snehasish Das, July 24, 2013 - 9:16 am UTC

Hi Tom,

I have a similar query to previous one.

We have a customer address table which has a structure as below.
"NEURO_NEW_ID","NEURO_STREET_NUMBER","NEURO_STREET_NAME","NEURO_STREET_TYPE","NEORO_SUBURB","NEURO_TOWN","NEURO_CITY","NEURO_POSTAL_CODE","NEURO_PROVINCE","NAD_STR_NUMBER","NAD_STR_NAME","NAD_STR_TYPE","NAD_OLD_STREET","NAD_SUBURB","NAD_TOWN","NAD_POSTCODE","NAD_PROVINCE","COMB_MATCH","PERM_MATCH"
"941","3","8th","Avenue","",""," Summerstrand","6001","Eastern Cape","9","KAMDEBO","STREET","","ALGOA PARK","PORT ELIZABETH","6001","EASTERN CAPE","",""
"941","3","8th","Avenue","",""," Summerstrand","6001","Eastern Cape","3","KAREE","STREET","","ALGOA PARK","PORT ELIZABETH","6001","EASTERN CAPE","",""
"941","3","8th","Avenue","",""," Summerstrand","6001","Eastern Cape","6","KAREE","STREET","","ALGOA PARK","PORT ELIZABETH","6001","EASTERN CAPE","",""


We also have a National address database with similar structure.
The customer address may have some of the fields incorrect,null or spelling errors. We need to match the customer address to the National address database to find the best match. Any of the fields can be null and are prone to errors so we can't join on 2-3 fields and then calculate the match.

Can you please guide me in doing this match.

Regards,
Snehasish Das.
Tom Kyte
July 31, 2013 - 6:53 pm UTC

no creates, no inserts, no look

Creates and Insertes.

Snehasish Das, August 02, 2013 - 2:08 pm UTC

Hi Tom,

Apologies,below are the creates and inserts.
create table EAA_NAD
(
ag_nad_id NUMBER,
confidence NUMBER,
type VARCHAR2(255),
str_number VARCHAR2(255),
str_name VARCHAR2(1000),
str_type VARCHAR2(255),
street VARCHAR2(1000),
old_street VARCHAR2(1000),
x NUMBER,
y NUMBER,
ag_sub_id NUMBER,
ag_sub_code VARCHAR2(255),
suburb VARCHAR2(1000),
postcode NUMBER,
ag_sgtn_id NUMBER,
sgtown VARCHAR2(1000),
ag_town_id NUMBER,
town VARCHAR2(1000),
ag_mun_id NUMBER,
mun_code VARCHAR2(255),
s12_name VARCHAR2(1000),
munic VARCHAR2(1000),
ag_prov_id NUMBER,
province VARCHAR2(255),
source VARCHAR2(1000),
updated VARCHAR2(255)
);


create table NEUROCOM_ADDR_VERIFICATION
(
neuro_new_id VARCHAR2(128),
neuro_street_number VARCHAR2(128),
neuro_street_name VARCHAR2(128),
neuro_street_type VARCHAR2(128),
neoro_suburb VARCHAR2(128),
neuro_town VARCHAR2(128),
neuro_city VARCHAR2(128),
neuro_postal_code VARCHAR2(128),
neuro_province VARCHAR2(128),
nad_str_number VARCHAR2(255),
nad_str_name VARCHAR2(1000),
nad_str_type VARCHAR2(255),
nad_old_street VARCHAR2(128),
nad_suburb VARCHAR2(1000),
nad_town VARCHAR2(1000),
nad_postcode NUMBER,
nad_province VARCHAR2(255),
comb_match NUMBER(10,2),
perm_match NUMBER(10,2)
);


Below is the data for the Address in customer table which is NEUROCOM_ADDR_VERIFICATION.
insert into NEUROCOM_ADDR_VERIFICATION (NEURO_NEW_ID, NEURO_STREET_NUMBER, NEURO_STREET_NAME, NEURO_STREET_TYPE, NEORO_SUBURB, NEURO_TOWN, NEURO_CITY, NEURO_POSTAL_CODE, NEURO_PROVINCE, NAD_STR_NUMBER, NAD_STR_NAME, NAD_STR_TYPE, NAD_OLD_STREET, NAD_SUBURB, NAD_TOWN, NAD_POSTCODE, NAD_PROVINCE, COMB_MATCH, PERM_MATCH)
values ('865', '2', 'Osmond', 'Street', 'Wilsonia', null, 'East London', '5247', 'Eastern Cape', '39', 'KINGSTON', 'PLACE', null, 'AMALINDA', 'EAST LONDON', 5247, 'EASTERN CAPE', null, null);

insert into NEUROCOM_ADDR_VERIFICATION (NEURO_NEW_ID, NEURO_STREET_NUMBER, NEURO_STREET_NAME, NEURO_STREET_TYPE, NEORO_SUBURB, NEURO_TOWN, NEURO_CITY, NEURO_POSTAL_CODE, NEURO_PROVINCE, NAD_STR_NUMBER, NAD_STR_NAME, NAD_STR_TYPE, NAD_OLD_STREET, NAD_SUBURB, NAD_TOWN, NAD_POSTCODE, NAD_PROVINCE, COMB_MATCH, PERM_MATCH)
values ('865', '56', null, 'Street', 'Wilsonia', null, 'PORT ELIZABETH', '5247', 'EASTERN CAPE', '4', 'KINGSTON', 'PLACE', null, 'AMALINDA', 'EAST LONDON', 5247, 'EASTERN CAPE', null, null);

insert into NEUROCOM_ADDR_VERIFICATION (NEURO_NEW_ID, NEURO_STREET_NUMBER, NEURO_STREET_NAME, NEURO_STREET_TYPE, NEORO_SUBURB, NEURO_TOWN, NEURO_CITY, NEURO_POSTAL_CODE, NEURO_PROVINCE, NAD_STR_NUMBER, NAD_STR_NAME, NAD_STR_TYPE, NAD_OLD_STREET, NAD_SUBURB, NAD_TOWN, NAD_POSTCODE, NAD_PROVINCE, COMB_MATCH, PERM_MATCH)
values ('865', '13', 'ALUZA', 'Street', null, null, 'PORT ELIZABETH', '5247', null, '14', null, 'ROAD', null, 'AMALINDA', 'EAST LONDON', 5247, 'EASTERN CAPE', null, null);

insert into NEUROCOM_ADDR_VERIFICATION (NEURO_NEW_ID, NEURO_STREET_NUMBER, NEURO_STREET_NAME, NEURO_STREET_TYPE, NEORO_SUBURB, NEURO_TOWN, NEURO_CITY, NEURO_POSTAL_CODE, NEURO_PROVINCE, NAD_STR_NUMBER, NAD_STR_NAME, NAD_STR_TYPE, NAD_OLD_STREET, NAD_SUBURB, NAD_TOWN, NAD_POSTCODE, NAD_PROVINCE, COMB_MATCH, PERM_MATCH)
values ('865', '58', 'AGGREY ROAD', 'Street', 'NEW BRIGHTON 1', null, 'PORT ELIZABETH', '5247', 'EASTERN CAPE', '36', 'AGGREY', 'PLACE', null, 'AMALINDA', 'EAST LONDON', 5247, 'EASTERN CAPE', null, null);

Commit;

Below is sample address from National address database.
insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612201, 1, null, '54', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.589714, -33.900755, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612202, 1, null, '56', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.589778, -33.900807, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612203, 1, null, '58', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.589824, -33.900877, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612204, 1, null, '60', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.589897, -33.900959, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612205, 1, null, '62', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.589952, -33.901046, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612206, 1, null, '64', 'AGGREY', 'ROAD', 'AGGREY ROAD', null, 25.590022, -33.901011, 29519, '000263-029519', 'NEW BRIGHTON 1', 6200, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612207, 1, null, '10610', 'AJI', 'ROAD', 'AJI ROAD', null, 25.598912, -33.887063, 29512, '000263-029512', 'KWAZAKELE 1', 6205, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612208, 1, null, '10611', 'AJI', 'ROAD', 'AJI ROAD', null, 25.599022, -33.887008, 29512, '000263-029512', 'KWAZAKELE 1', 6205, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612209, 1, null, '10612', 'AJI', 'ROAD', 'AJI ROAD', null, 25.599156, -33.886994, 29512, '000263-029512', 'KWAZAKELE 1', 6205, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612210, 1, null, '10613', 'AJI', 'ROAD', 'AJI ROAD', null, 25.599292, -33.886979, 29512, '000263-029512', 'KWAZAKELE 1', 6205, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612211, 1, null, '45251', 'AJI', 'ROAD', 'AJI ROAD', null, 25.599426, -33.886966, 29512, '000263-029512', 'KWAZAKELE 1', 6205, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612212, 1, null, '1', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572435, -33.864464, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612213, 1, null, '10', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572629, -33.864433, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612214, 1, null, '11', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572571, -33.86467, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612215, 1, null, '12', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572663, -33.864478, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612216, 1, null, '13', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572611, -33.864715, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612217, 1, null, '14', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572698, -33.864515, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4612218, 1, null, '15', 'ALUZA', 'STREET', 'ALUZA STREET', null, 25.572649, -33.864777, 29504, '000263-029504', 'SOWETO ON SEA', 6201, 10564, 'IBHAYI', 263, 'PORT ELIZABETH', 227, 'NMA', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY', 'PORT ELIZABETH', 1, 'EASTERN CAPE', 'NELSON MANDELA BAY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);
insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (5742384, 1, null, '3A', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.82532, -32.986006, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562725, 1, null, '11', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.824107, -32.97394, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562726, 1, null, '13', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.826172, -32.983173, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562727, 1, null, '15', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.825395, -32.984498, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562728, 1, null, '17', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.824433, -32.985233, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562729, 1, null, '2', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.824982, -32.987044, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562730, 1, null, '3', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.824507, -32.985759, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562731, 1, null, '5', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.827851, -32.980201, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

insert into eaa_nad (AG_NAD_ID, CONFIDENCE, TYPE, STR_NUMBER, STR_NAME, STR_TYPE, STREET, OLD_STREET, X, Y, AG_SUB_ID, AG_SUB_CODE, SUBURB, POSTCODE, AG_SGTN_ID, SGTOWN, AG_TOWN_ID, TOWN, AG_MUN_ID, MUN_CODE, S12_NAME, MUNIC, AG_PROV_ID, PROVINCE, SOURCE, UPDATED)
values (4562732, 1, null, '6', 'OSMOND', 'STREET', 'OSMOND STREET', null, 27.825971, -32.977687, 16822, '000175-016822', 'WILSONIA', 5247, 11356, 'EAST LONDON', 175, 'EAST LONDON', 28, 'BUF', 'BUFFALO CITY METROPOLITAN MUNICIPALITY', 'EAST LONDON', 1, 'EASTERN CAPE', 'BUFFALO CITY METROPOLITAN MUNICIPALITY SPATIAL DATA', null);

We are trying to match the 2 tables, but problem is in customer table any of the columns can have null value,special characters, trimmed value. But we want to match the tables to get the best possible addresses with may be match percentage.

Regards,
Snehasish Das
Tom Kyte
August 02, 2013 - 4:59 pm UTC

you wrote:

... We need to match the customer address to the National address database to find the best match. Any of the fields can be null and are prone to errors so we can't join on 2-3 fields and then calculate the match. ..

... but problem is in customer table any of
the columns can have null value,special characters, trimmed value. ...


... But we want
to match the tables to get the best possible addresses with may be match
percentage. ...


Ok, so now - define the logic to get the best possible match percentage. What are your rules?

you haven't even told me what columns are to be compared to what columns?

what "special characters" are or means. and what logic is to be applied to them when comparing (maybe you want to translate all 'special' characters into something not so special - i don't know?)

what trimmed values are - do you mean truncated? trimmed to me means "no trailing blanks" - if that is, just rtrim stuff.

be very very very detailed (you'll find the code writes itself when you have a detailed specification to work from!)


and I'm very sure that for your test case, you only need a few columns in each table - the rest of the columns are just in the way - make the example as small as possible - get rid of anything not relevant to the join.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.