## Question and Answer

## You Asked

Hi TOM,

Is there a way to make my code faster ?

I'm trying to find the longest common sub-string (short: lcs) using dynamic programming; in this respect, i'm building a 2D array with x rows and Y columns, where X - length of string1, Y = length of string2.

First, i initialize the array with 0; (can this be avoided ?)

Then, i start from top-left, one character (from string 1) at a time and each time there's a match with a character from string2, i put +1 in that cell

Here's a visual description:

https://imgur.com/a/gz8JTms

I'm already using PL/SQL native compilation.

Following this lcs function, i'm using

Thanks!

Is there a way to make my code faster ?

I'm trying to find the longest common sub-string (short: lcs) using dynamic programming; in this respect, i'm building a 2D array with x rows and Y columns, where X - length of string1, Y = length of string2.

First, i initialize the array with 0; (can this be avoided ?)

Then, i start from top-left, one character (from string 1) at a time and each time there's a match with a character from string2, i put +1 in that cell

Here's a visual description:

https://imgur.com/a/gz8JTms

I'm already using PL/SQL native compilation.

Following this lcs function, i'm using

**Ratcliff-Obershelp**similarity algorithm and i'm relly stucked at how to build a function iteratively that would use this "lcs" function; any help would be greatly appreciated.Thanks!

with LiveSQL Test Case:

## and we said...

I'm not familiar with the Ratcliff-Obershelp similarity algorithm, but here are some thoughts:

Yes. You can cut the loops to initialize the matrix at the start.

Within the loop to test the length, if neither index variable is 1 (the else branch of i = 1 OR j = 1), get the previous index value if it exists. Otherwise set the current position to one.

Which looks something like this:

You should be able to build in logic to break out of the loops earlier too. If the length of current position + length of remaining string <= max length found so far, the remaining characters can't make a string any longer than the one you've already found. So you can exit the loop at this point.

Exactly how much this saves you depends on the input strings. Worst case there's no matching characters, so you're still looping through every element of both for O ( n*m ) runtime.

You could short-circuit by checking there are at least some matching characters before the loops.

One way to do this is with translate. Pass the input strings as the first two arguments and an out-of-bounds character as the third (e.g. #). Replace this out-of-bounds character in the result.

If the length of the final string equals the length of the first input, you know there are no matching characters. So can bypass the loops altogether.

I'm sure there are other optimizations; others may have ideas that can help.

*First, i initialize the array with 0; (can this be avoided ?)*Yes. You can cut the loops to initialize the matrix at the start.

Within the loop to test the length, if neither index variable is 1 (the else branch of i = 1 OR j = 1), get the previous index value if it exists. Otherwise set the current position to one.

Which looks something like this:

if p_matrix.exists ( i-1 ) and p_matrix ( i-1 ).exists ( j-1 ) then p_matrix ( i )( j ) := p_matrix ( i-1 )( j-1 ) + 1; else p_matrix ( i )( j ) := 1; end if;

You should be able to build in logic to break out of the loops earlier too. If the length of current position + length of remaining string <= max length found so far, the remaining characters can't make a string any longer than the one you've already found. So you can exit the loop at this point.

Exactly how much this saves you depends on the input strings. Worst case there's no matching characters, so you're still looping through every element of both for O ( n*m ) runtime.

You could short-circuit by checking there are at least some matching characters before the loops.

One way to do this is with translate. Pass the input strings as the first two arguments and an out-of-bounds character as the third (e.g. #). Replace this out-of-bounds character in the result.

select translate ( 'WIKIMEDIA', 'REMEDYA', '#' ) tr1, replace ( translate ('WIKIMEDIA', 'REMEDYA', '#'), '#' ) reptr1, translate ( 'WIKIMEDIA', 'NOSUB', '#' ) tr2, replace ( translate ('WIKIMEDIA', 'NOSUB', '#' ), '#' ) reptr2 from dual; TR1 REPTR1 TR2 REPTR2 WIKII WIKII WIKIMEDIA WIKIMEDIA

If the length of the final string equals the length of the first input, you know there are no matching characters. So can bypass the loops altogether.

I'm sure there are other optimizations; others may have ideas that can help.

## and you rated our response

# Reviews

Thanks Chris for the ideas to improve the lcs (longest common sub-string) algorithm, which is a per-requisite for the mentioned algorithm

Any thoughts on how to apply the function recursively ?

i.e. start from the input strings :

next we need to check the remaining chunks on either sides:

and

which finally leads to:

So a loop that exists whenever lcs() = '', but the tricky part is that the strings split in smaller strings, much like a nuclear fission reaction.

Thanks,

Any thoughts on how to apply the function recursively ?

i.e. start from the input strings :

lcs('WIKIMEDIA', 'REMEDYA') = 'MED'

next we need to check the remaining chunks on either sides:

lcs('WIKI', 'RE') = ''

and

lcs ('IA', 'YA') = 'A'

which finally leads to:

lcs('I', 'Y') = ''

So a loop that exists whenever lcs() = '', but the tricky part is that the strings split in smaller strings, much like a nuclear fission reaction.

Thanks,

I won't have time until after the weekend, but I'm going to attempt to do this in pure SQL in the MODEL clause ;-)

@Chris: Yes, i need to do that (apply

Count how many chars in

Then the string similarity score of the 2 initial strings is :

More info here:

https://en.wikipedia.org/wiki/Gestalt_Pattern_Matching

**lcs**) recursively, as this is what**Ratcliff-Obershelp**(aka Gestalt pattern matching) algorithm is all about.Count how many chars in

**lcs**until**lcs**returns for all strings.Then the string similarity score of the 2 initial strings is :

2 * (lcs_1 + lcs_2 + .. + lcs_n)/ (sum of characters in the 2 strings combined)

More info here:

https://en.wikipedia.org/wiki/Gestalt_Pattern_Matching

Followup

I see...Once you've found the LCS, you can get the precediing/following strings with a bit of SUBSTR/INSTR magic:

with strs as ( select 'WIKIMEDIA' s from dual ) select substr ( s, 1, instr ( s, 'MED' ) - 1 ) st, substr ( s, instr ( s, 'MED' ) + length ( 'MED' ) ) en from strs; ST EN WIKI IA

And use these to call the LCS algorithm again on each starting/ending pair.

This should be faster when there is a long common substring and slower when there is no match at all. It is better when "B" is the shorter string. "STRT" and "LEN" give you a head start for breaking off the unmatched bits from "B".

with input(a, b) as ( select 'antidisestablishmentarianism', 'disentanglement' from dual ) select strt, len, substr(b,strt,len) lcs from input i, lateral(select length(i.b) + 1 - level len from dual connect by length(i.b) + 1 - level > 0) s, lateral(select level strt from dual connect by level + s.len - 1 <= length(i.b)) where instr(a, substr(b,strt,len)) > 0 and rownum = 1; LCS ---- dise

Actually, this solution calculates for both strings the starting position of the LCS, so I should have provided both. This should make it easier to extract the extra bits before and after the LCS.

with input(a, b) as ( select 'antidisestablishmentarianism', 'disentanglement' from dual ) select instr(a, substr(b,b_strt,len)) a_strt, b_strt, len, substr(b, b_strt, len) lcs from input i, lateral(select length(i.b) + 1 - level len from dual connect by length(i.b) + 1 - level > 0) s, lateral(select level b_strt from dual connect by level + s.len - 1 <= length(i.b)) where instr(a, substr(b,b_strt,len)) > 0 and rownum = 1; A_STRT B_STRT LEN LCS ------ ------ --- ---- 5 1 4 dise

@Stew: nice use of lateral!

@Chris:

Coming back to your suggestion,

which outputs:

Now to the part of making the recursive loop: i was thinking of calling

and

You guys have a better idea ?

@Chris:

Coming back to your suggestion,

with strs as ( select 'WIKIMEDIA' s from dual UNION select 'REMEDIYA' from dual ) select substr ( s, 1, instr ( s, 'MED' ) - 1 ) left_Str, substr ( s, instr ( s, 'MED' ) + length ( 'MED' ) ) right_Str from strs;

which outputs:

LEFT_STR RIGHT_STR RE IYA WIKI IA

Now to the part of making the recursive loop: i was thinking of calling

lcs((1, 1), (2,1))

and

lcs((1,2), (2,2))

You guys have a better idea ?

Something like this:

The challenge is to make this recursion in (pl)sql..

TYPE left_string is TABLE OF VARCHAR2 INDEX BY PLS_INTEGER; TYPE right_string is TABLE OF VARCHAR2 INDEX BY PLS_INTEGER; left_string(1) := 'WIKIMEDIA'; --string length: 9 right_string(1) := 'REMEDIYA'; --string length: 8 lcs(left_string(1), right_string(1)) := 'MEDI'; left_string(2) := 'WIKI'; right_string(2) := 'RE'; left_string(3) := 'IA'; right_string(3) := 'YA'; lcs(left_string(2), right_string(2)) := ''; --dead end lcs(left_string(3), right_string(3)) := 'A'; left_string(4) := 'I'; left_string(4) := 'Y'; left_string(5) := ''; --right side of 'A' from 'IA', dead end, stop computing right_string(5). lcs(left_string(5), right_string(5)) := ''; --last string chunks to compare, exit loop --We now compute the length of all lcs: 4 + 1 = 5 and considering the length of the initial strings 17 (9 + 8), --this makes the Ratcliff-Obershelp similarity score: 2 * 5 / 17 = 0.588

The challenge is to make this recursion in (pl)sql..

create or replace function gestalt( l in varchar2, r in varchar2 ) return number is function lcs ( l in varchar2, r in varchar2 ) return number is ll varchar2(4000); rl varchar2(4000); lr varchar2(4000); rr varchar2(4000); begin for rec in ( select instr(l, substr(r,r_strt,len)) l_strt, r_strt, len from dual, lateral(select length(r) + 1 - level len from dual connect by length(r) + 1 - level > 0) s, lateral(select level r_strt from dual connect by level + s.len - 1 <= length(r)) e where instr(l, substr(r,e.r_strt,len)) > 0 and rownum <= 1 ) loop ll := substr(l, 1, rec.l_strt - 1); rl := substr(r, 1, rec.r_strt - 1); lr := substr(l, rec.l_strt + rec.len); rr := substr(r, rec.r_strt + rec.len); return rec.len + lcs(ll, rl) + lcs(lr, rr); end loop; return 0; end lcs; begin return 2 * lcs(l, r) / ( length(l) + length(r) ); end gestalt; / select round(gestalt('WIKIMEDIA', 'REMEDIYA'),3) from dual; 0.588

WITH strings (string1, string2) as (select 'antidisestablishmentarianism', 'disentanglement' from dual) ,string1_data(stringcount, stringlen, start_pos_counter, start_pos ) AS ( SELECT (LENGTH(string1)/2)*(LENGTH(string1)+1) r, LENGTH(string1), LENGTH(string1), 1 FROM strings t UNION ALL SELECT stringcount-1 , stringlen , case when start_pos >= least(stringlen,start_pos_counter) then start_pos_counter - 1 else start_pos_counter end , case when start_pos >= least(stringlen,start_pos_counter) then 1 else start_pos + 1 end FROM string1_data WHERE stringcount > 1 ),string1_occurences (substring) as ( SELECT substr(string1,start_pos,stringlen - start_pos_counter + 1) FROM string1_data cross join strings ),string2_data(stringcount, stringlen, start_pos_counter, start_pos ) AS ( SELECT (LENGTH(string2)/2)*(LENGTH(string2)+1) r, LENGTH(string2), LENGTH(string2), 1 FROM strings t UNION ALL SELECT stringcount-1 , stringlen , case when start_pos >= least(stringlen,start_pos_counter) then start_pos_counter - 1 else start_pos_counter end , case when start_pos >= least(stringlen,start_pos_counter) then 1 else start_pos + 1 end FROM string2_data WHERE stringcount > 1 ),string2_occurences (substring) as ( SELECT substr(string2,start_pos,stringlen - start_pos_counter + 1) FROM string2_data cross join strings ) select str1.substring from string1_occurences str1 join string2_occurences str2 on str1.substring = str2.substring order by length(str1.substring) desc fetch first 1 rows only;

@Stew: you're

I was struggling with building a strings array with all lcs (iteratively, divide&conquer style):

https://livesql.oracle.com/apex/livesql/s/kj0kjr5p5o47fh3pohybch14w

But your

Thank you very much, fantastic community!

__amazing__!I was struggling with building a strings array with all lcs (iteratively, divide&conquer style):

https://livesql.oracle.com/apex/livesql/s/kj0kjr5p5o47fh3pohybch14w

But your

**gestalt**function made it look trivial!Thank you very much, fantastic community!

Can i have one last wish ?:)

Using Stew's SQL approach/function i get to process 70K strings in ~20 sec on my box.

I am assuming this is due to PL/SQL - SQL context switch.

Ideally, the results should come in 1-3 seconds for 100K strings.

I'm thinking of a PL/SQL approach for the (recursive) lcs function could perform better.

Any thoughts ?

Using Stew's SQL approach/function i get to process 70K strings in ~20 sec on my box.

I am assuming this is due to PL/SQL - SQL context switch.

Ideally, the results should come in 1-3 seconds for 100K strings.

I'm thinking of a PL/SQL approach for the (recursive) lcs function could perform better.

Any thoughts ?

I'm comparing the PL/SQL and SQL version to compute

Scenario:

min_threshold: 0.65 (empiric value), testing string similarity for 'PRIVIL' against 105314 rows (all_objects * 2).

(1) PL/SQL version

https://livesql.oracle.com/apex/livesql/s/kj3h8q55suglbbyy0edrnmieq

Result:

2 strings selected out of pool of [105314] in

(2) SQL version

https://livesql.oracle.com/apex/livesql/s/kj3pcfcl4hlqh6db9cfr45mjf

Result:

Can i make the PL/SQL version drop to

**Ratcliff-Obershelp**string similarity.Scenario:

min_threshold: 0.65 (empiric value), testing string similarity for 'PRIVIL' against 105314 rows (all_objects * 2).

(1) PL/SQL version

https://livesql.oracle.com/apex/livesql/s/kj3h8q55suglbbyy0edrnmieq

Result:

2 strings selected out of pool of [105314] in

**13.42 seconds**.(2) SQL version

https://livesql.oracle.com/apex/livesql/s/kj3pcfcl4hlqh6db9cfr45mjf

Result:

**ORA-00040: active time limit exceeded**Can i make the PL/SQL version drop to

**max 3 seconds**for same workload (100K records) ? Or the SQL version (that is currently dropped by the resource manager before finishing) ?
Borrowing the so elegant first step of Stew Ashton, here is another "divide and conquer" solution,

just for your fun.

Still using a single SELECT, you can extend it to process several pairs by replacing the INPUT query.

Here the output shows the recursion step by step, for the final result you should aggregate to get a single row

for each input entry.

Cheers :)

just for your fun.

WITH FUNCTION f_lcs (p_str1 IN VARCHAR2, p_str2 IN VARCHAR2) RETURN VARCHAR2 AS l_lcs VARCHAR2(2000); BEGIN -- if p_str1 is null or p_str2 is null then return null; end if; -- select substr(i.b, strt.b_strt, slen.len) lcs into l_lcs from ( select p_str1 a, p_str2 b from dual ) i, lateral(select length(i.b) + 1 - level len from dual connect by length(i.b) + 1 - level > 0 ) slen, lateral(select level b_strt from dual connect by level + slen.len - 1 <= length(i.b) ) strt where instr(i.a, substr(i.b,strt.b_strt,slen.len)) > 0 order by slen.len desc fetch first 1 row only; -- return l_lcs; EXCEPTION WHEN NO_DATA_FOUND THEN RETURN NULL; END; -- input(str1, str2) as ( select 'antidisestablishmentarianism', 'disentanglement' from dual ), -- recurs(str1, str2, lcs, lvl) as ( select str1, str2, f_lcs(str1, str2), 1 from input union all select case d.rn when 1 then substr(r.str1, 1, instr(r.str1, r.lcs) - 1) else substr(r.str1, instr(r.str1, r.lcs) + length(r.lcs)) end str1 , case d.rn when 1 then substr(r.str2, 1, instr(r.str2, r.lcs) - 1) else substr(r.str2, instr(r.str2, r.lcs) + length(r.lcs)) end str2 , f_lcs( case d.rn when 1 then substr(r.str1, 1, instr(r.str1, r.lcs) - 1) else substr(r.str1, instr(r.str1, r.lcs) + length(r.lcs)) end, case d.rn when 1 then substr(r.str2, 1, instr(r.str2, r.lcs) - 1) else substr(r.str2, instr(r.str2, r.lcs) + length(r.lcs)) end ) lcs , r.lvl + 1 from recurs r, ( select rownum rn from dual connect by level <= 2 ) d where r.lcs is not null ) -- select * from recurs / STR1 STR2 LCS LVL ---------------------------------------------------------- antidisestablishmentarianism disentanglement dise 1 anti - - 2 stablishmentarianism ntanglement ment 2 stablish ntangle ta 3 arianism - - 3 s n - 4 blish ngle l 4 b ng - 5 ish e - 5 9 rows selected.

Still using a single SELECT, you can extend it to process several pairs by replacing the INPUT query.

Here the output shows the recursion step by step, for the final result you should aggregate to get a single row

for each input entry.

Cheers :)

SQL> create table t as select object_name from all_objects; Table T created. SQL> ALTER SESSION SET PLSQL_CODE_TYPE='NATIVE'; Session altered. SQL> create or replace function gestalt2( 2 l in varchar2, 3 r in varchar2 4 ) return number is 5 function lcs ( 6 l in varchar2, 7 r in varchar2 8 ) return number is 9 l_start pls_integer := 0; 10 r_start pls_integer := 0; 11 lr_length pls_integer := least(length(l),length(r)); 12 ll varchar2(128); 13 rl varchar2(128); 14 lr varchar2(128); 15 rr varchar2(128); 16 begin 17 if l is null or r is null then 18 return 0; 19 end if; 20 <<len_loop>> 21 loop 22 l_start := 1; 23 loop 24 r_start := instr(r,substr(l,l_start,lr_length)); 25 exit len_loop when r_start > 0; 26 exit when l_start = length(l) - lr_length + 1; 27 l_start := l_start + 1; 28 end loop; 29 exit when lr_length = 1; 30 lr_length := lr_length - 1; 31 end loop; 32 if r_start = 0 then 33 return 0; 34 else 35 ll := substr(l, 1, l_start - 1); 36 rl := substr(r, 1, r_start - 1); 37 lr := substr(l, l_start + lr_length); 38 rr := substr(r, r_start + lr_length); 39 return lr_length + lcs(ll, rl) + lcs(lr, rr); 40 end if; 41 end lcs; 42 begin 43 return 2 * lcs(l, r) / ( length(l) + length(r) ); 44 end gestalt2; 45 / Function GESTALT2 compiled SQL> set timing on SQL> declare 2 l_gestalt number; 3 begin 4 for rec in (select * from t) loop 5 l_gestalt := gestalt2('PRIVIL', rec.object_name); 6 end loop; 7 end; 8 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.709Note: T has 80306 rows

@Stew: Chapeau!

It's ugly and slow, but here it is anyway ;-)

For one word-pair:

For about a thousand:

But takes a couple seconds for 988 word pairs for me, so definitely not better than Stew's solution.

And this is also a lot uglier and hard to understand - but was fun to make ;-)

For one word-pair:

select w1, w2, 2 * lmx / (length(w1) + length(w2)) ro_score from ( select * from ( select 'WIKIMEDIA' as w1, 'REMEDIYA' as w2 from dual ) model partition by (w1, w2) dimension by (0 as i, 0 as p1, 0 as p2) measures (upper(w1) as m1, upper(w2) as m2, cast(null as varchar2(1)) as ch, 0 as mx, 0 as lmx, 0 as l1, 0 as l2) ignore nav rules sequential order iterate(10) until iteration_number >= l2[-1, 0, 0] ( ch[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, 0] = substr(m1[cv(i),0,0], cv(p1), 1) , ch[iteration_number, 0, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = substr(m2[cv(i),0,0], cv(p2), 1) , mx[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = case ch[cv(i), cv(p1), 0] when ch[cv(i), 0, cv(p2)] then mx[cv(i), cv(p1)-1, cv(p2)-1] + 1 end , lmx[iteration_number, 0, 0] = max(mx)[cv(i), any, any] , l1[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p1) end , l2[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p2) end , l1[iteration_number, 0, 0] = min(l1)[cv(i), any, any] , l2[iteration_number, 0, 0] = min(l2)[cv(i), any, any] , lmx[-1, 0, 0] = lmx[-1, 0, 0] + lmx[iteration_number, 0, 0] , m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], 1, l1[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end , m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], 1, l2[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end , l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end , m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], l1[iteration_number, 0, 0] + 1) end , m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], l2[iteration_number, 0, 0] + 1) end , l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end ) ) where (i, p1, p2) = ((-1,0,0)) order by ro_score desc nulls last;

For about a thousand:

select w1, w2, 2 * lmx / (length(w1) + length(w2)) ro_score from ( select * from ( select distinct 'PRIVIL' as w1, object_name as w2 from all_objects where rownum <= 1000 ) model partition by (w1, w2) dimension by (0 as i, 0 as p1, 0 as p2) measures (upper(w1) as m1, upper(w2) as m2, cast(null as varchar2(1)) as ch, 0 as mx, 0 as lmx, 0 as l1, 0 as l2) ignore nav rules sequential order iterate(10) until iteration_number >= l2[-1, 0, 0] ( ch[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, 0] = substr(m1[cv(i),0,0], cv(p1), 1) , ch[iteration_number, 0, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = substr(m2[cv(i),0,0], cv(p2), 1) , mx[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = case ch[cv(i), cv(p1), 0] when ch[cv(i), 0, cv(p2)] then mx[cv(i), cv(p1)-1, cv(p2)-1] + 1 end , lmx[iteration_number, 0, 0] = max(mx)[cv(i), any, any] , l1[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p1) end , l2[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p2) end , l1[iteration_number, 0, 0] = min(l1)[cv(i), any, any] , l2[iteration_number, 0, 0] = min(l2)[cv(i), any, any] , lmx[-1, 0, 0] = lmx[-1, 0, 0] + lmx[iteration_number, 0, 0] , m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], 1, l1[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end , m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], 1, l2[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end , l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end , m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], l1[iteration_number, 0, 0] + 1) end , m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], l2[iteration_number, 0, 0] + 1) end , l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end ) ) where (i, p1, p2) = ((-1,0,0)) order by ro_score desc nulls last;

But takes a couple seconds for 988 word pairs for me, so definitely not better than Stew's solution.

And this is also a lot uglier and hard to understand - but was fun to make ;-)

I enjoyed spending an "Office Hour" on this topic with Chris and several attendees. While explaining the "pure PL/SQL" solution, I was thinking to myself "why is the looping logic so complicated???". Afterwards, I realized that plain old FOR loops could do the job just as well!

Anyway, here is a solution that puts the function in a WITH clause (as Iudith Wentzel did previously). The function is easier to understand. I do apologize for inflicting some unnecessary complications.

Anyway, here is a solution that puts the function in a WITH clause (as Iudith Wentzel did previously). The function is easier to understand. I do apologize for inflicting some unnecessary complications.

with function gestalt4(l in varchar2, r in varchar2) return number is function lcs (l in varchar2, r in varchar2) return number is l_start pls_integer; ll varchar2(128); rl varchar2(128); lr varchar2(128); rr varchar2(128); begin if l is null or r is null then return 0; end if; for lr_length in reverse 1..least(length(l),length(r)) loop for r_start in 1..length(r) - lr_length + 1 loop l_start := instr(l,substr(r,r_start,lr_length)); if l_start > 0 then dbms_output.put_line(substr(r,r_start,lr_length)); ll := substr(l, 1, l_start - 1); rl := substr(r, 1, r_start - 1); lr := substr(l, l_start + lr_length); rr := substr(r, r_start + lr_length); return lr_length + lcs(ll, rl) + lcs(lr, rr); end if; end loop; end loop; return 0; end lcs; begin return 2 * lcs(l, r) / ( nvl(length(l),0) + nvl(length(r),0) ); end gestalt4; select gestalt4('gestalt practice', 'gestalt pattern matching') diff from dual; / DIFF ---------- 0.6 gestalt p a t e