Skip to Main Content
  • Questions
  • Fuzzy Name Match Stored Procedure Optimization

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Richard.

Asked: July 07, 2016 - 8:46 am UTC

Last updated: July 08, 2016 - 3:35 am UTC

Version: 11g

Viewed 1000+ times

You Asked

Hello,

Have written PL/SQL stored proc 'FuzzyNameMatch' that interrogates first, middle, last names from a single column in two distinct tables, ie source and compare columns. The algo parses shorter strings through longer and increments counter variable when applicable match conditions are satisfied. User input params are desired % of match and algo type. With this stated, I ran stored proc successfully against a 60 X 30 matrix of names in 1.2 seconds. Next tried 3K X 70K matix and at the time of this posting, my procedure is still running 1.5 hours deep. This will not satisfy my colleagues and looking for advice/best practice for either splitting source/compare name tables into multiple tables or triggering some kind of iterative routine that spreads procedures over multiple calls. Looked through this thread from 2003 and got the sense that it was not same kind of issue at hand: 'Spawn Jobs from a Procedure that run in parallel'

Many thanks for your tutelage.

RO

and Chris said...

Sounds like what you need is Oracle Text!

If you create a text index on the name, you can use it to do fuzzy searching. For example:

create table t (
  username varchar2(20)
);

insert into t values ('Chris R Saxon');
insert into t values ('Christopher J Sexton');
insert into t values ('Christina Sascon');
insert into t values ('Arnold J Rimmer');
insert into t values ('Dave Lister');
commit;

create index text_i on t ( username )
indextype is ctxsys.context
order by username 
parameters ('sync (on commit)')
/

select score(1), t.* from t
where  contains(username, 'fuzzy(Chris, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Chris R Saxon       
        10 Christina Sascon    
         4 Christopher J Sexton

select score(1), t.* from t
where  contains(username, 'fuzzy(Saxon, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Chris R Saxon       
        24 Christina Sascon    
        19 Christopher J Sexton

select score(1), t.* from t
where  contains(username, 'fuzzy(J, 40, 100, w)', 1 ) > 0
order  by score(1) desc;

  SCORE(1) USERNAME           
---------- --------------------
        42 Arnold J Rimmer     
        42 Christopher J Sexton

Load the table up with data and the database can still find the what you're looking for quickly:

insert into t 
  with rws as (
    select rownum x from dual connect by level <= 100000
  )
  select initcap(dbms_random.string('l', 5) || ' ' || 
         dbms_random.string('u', 1) || ' ' || 
         dbms_random.string('u', 8))
  from rws;
  
commit;

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(Chris, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
       100 Chris R Saxon       
        71 Chjrs F Qdljlkvx    
        63 Chsti Q Lqjefwpy    
        48 Igqlv Z Chredsiv    
        48 Chmys H Zwywezht    

Elapsed: 00:00:00.279

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(Saxon, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
       100 Chris R Saxon       
        90 Soxon B Qhwsdkpd    
        83 Sabon J Yaiauupa    
        75 Christina Sascon    
        71 Saojn B Mkbqkibc    

Elapsed: 00:00:00.274

select * from (
select score(1), t.* from t
where  contains(username, 'fuzzy(J, 40, 100, w)', 1 ) > 0
order  by score(1) desc
)
where  rownum <= 5;

  SCORE(1) USERNAME           
---------- --------------------
        72 Christopher J Sexton
        72 Arnold J Rimmer     
        72 Uvkxc J Ghcltdnq    
        72 Yrwtn J Yytzfzuh    
        72 Dnvjb J Clpvgpxu    

Elapsed: 00:00:00.261


For more details on how fuzzy searching works, see:

https://docs.oracle.com/cd/E11882_01/text.112/e24436/cqoper.htm#CCREF0307

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Fuzzy name match optimization

Richard Oths, July 07, 2016 - 6:25 pm UTC

Thank you kindly. Will employ this technique with fuzzy operator.
Chris Saxon
July 08, 2016 - 3:35 am UTC

Glad we could help

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library