Skip to Main Content
  • Questions
  • Fuzzy Name Match Stored Procedure Optimization

Breadcrumb

more

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Chris Saxon

Thanks for the question, Richard.

Asked: July 07, 2016 - 8:46 am UTC

Answered by: Chris Saxon - Last updated: July 08, 2016 - 3:35 am UTC

Category: Developer - Version: 11g

Viewed 1000+ times

Whilst you are here, check out some content from the AskTom team: Those pesky LONG columns

You Asked

Hello,

Have written PL/SQL stored proc 'FuzzyNameMatch' that interrogates first, middle, last names from a single column in two distinct tables, ie source and compare columns. The algo parses shorter strings through longer and increments counter variable when applicable match conditions are satisfied. User input params are desired % of match and algo type. With this stated, I ran stored proc successfully against a 60 X 30 matrix of names in 1.2 seconds. Next tried 3K X 70K matix and at the time of this posting, my procedure is still running 1.5 hours deep. This will not satisfy my colleagues and looking for advice/best practice for either splitting source/compare name tables into multiple tables or triggering some kind of iterative routine that spreads procedures over multiple calls. Looked through this thread from 2003 and got the sense that it was not same kind of issue at hand: 'Spawn Jobs from a Procedure that run in parallel'

Many thanks for your tutelage.

RO

and we said...

Sounds like what you need is Oracle Text!

If you create a text index on the name, you can use it to do fuzzy searching. For example:

create table t (
  username varchar2(20)
);

insert into t values ('Chris R Saxon');
insert into t values ('Christopher J Sexton');
insert into t values ('Christina Sascon');
insert into t values ('Arnold J Rimmer');
insert into t values ('Dave Lister');
commit;

create index text_i on t ( username )
indextype is ctxsys.context
order by username 
parameters ('sync (on commit)')
/

select score(1), t.* from t
where  contains(username, 'fuzzy(Chris, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Chris R Saxon       
        10 Christina Sascon    
         4 Christopher J Sexton

select score(1), t.* from t
where  contains(username, 'fuzzy(Saxon, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Chris R Saxon       
        24 Christina Sascon    
        19 Christopher J Sexton

select score(1), t.* from t
where  contains(username, 'fuzzy(J, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Arnold J Rimmer     
        42 Christopher J Sexton

Load the table up with data and the database can still find the what you're looking for quickly:

insert into t 
  with rws as (
    select rownum x from dual connect by level <= 100000
  )
  select initcap(dbms_random.string('l', 5) || ' ' || 
         dbms_random.string('u', 1) || ' ' || 
         dbms_random.string('u', 8))
  from rws;
  
commit;

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(Chris, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
       100 Chris R Saxon       
        71 Chjrs F Qdljlkvx    
        63 Chsti Q Lqjefwpy    
        48 Igqlv Z Chredsiv    
        48 Chmys H Zwywezht    

Elapsed: 00:00:00.279

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(Saxon, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
       100 Chris R Saxon       
        90 Soxon B Qhwsdkpd    
        83 Sabon J Yaiauupa    
        75 Christina Sascon    
        71 Saojn B Mkbqkibc    

Elapsed: 00:00:00.274

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(J, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
        72 Christopher J Sexton
        72 Arnold J Rimmer     
        72 Uvkxc J Ghcltdnq    
        72 Yrwtn J Yytzfzuh    
        72 Dnvjb J Clpvgpxu    

Elapsed: 00:00:00.261


For more details on how fuzzy searching works, see:

https://docs.oracle.com/cd/E11882_01/text.112/e24436/cqoper.htm#CCREF0307

and you rated our response

  (1 rating)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Fuzzy name match optimization

July 07, 2016 - 6:25 pm UTC

Reviewer: Richard Oths from Lafayette, CO

Thank you kindly. Will employ this technique with fuzzy operator.
Chris Saxon

Followup  

July 08, 2016 - 3:35 am UTC

Glad we could help

More to Explore

DBMS_RANDOM

More on PL/SQL routine DBMS_RANDOM here