• Questions
• # Longest common substring - Ratcliff-Obershelp similarity algorithm Thanks for the question.

Asked: August 18, 2020 - 11:00 am UTC

Last updated: October 27, 2020 - 1:33 am UTC

Version: 11.2.0.4

Viewed 1000+ times

Hi TOM,

Is there a way to make my code faster ?
I'm trying to find the longest common sub-string (short: lcs) using dynamic programming; in this respect, i'm building a 2D array with x rows and Y columns, where X - length of string1, Y = length of string2.
First, i initialize the array with 0; (can this be avoided ?)
Then, i start from top-left, one character (from string 1) at a time and each time there's a match with a character from string2, i put +1 in that cell
Here's a visual description:

https://imgur.com/a/gz8JTms

I'm already using PL/SQL native compilation.

Following this lcs function, i'm using Ratcliff-Obershelp similarity algorithm and i'm relly stucked at how to build a function iteratively that would use this "lcs" function; any help would be greatly appreciated.

Thanks!

with LiveSQL Test Case:

## and we said...

I'm not familiar with the Ratcliff-Obershelp similarity algorithm, but here are some thoughts:

First, i initialize the array with 0; (can this be avoided ?)

Yes. You can cut the loops to initialize the matrix at the start.

Within the loop to test the length, if neither index variable is 1 (the else branch of i = 1 OR j = 1), get the previous index value if it exists. Otherwise set the current position to one.

Which looks something like this:

```if p_matrix.exists ( i-1 ) and p_matrix ( i-1 ).exists ( j-1 ) then
p_matrix ( i )( j ) := p_matrix ( i-1 )( j-1 ) + 1;
else
p_matrix ( i )( j ) := 1;
end if;```

You should be able to build in logic to break out of the loops earlier too. If the length of current position + length of remaining string <= max length found so far, the remaining characters can't make a string any longer than the one you've already found. So you can exit the loop at this point.

Exactly how much this saves you depends on the input strings. Worst case there's no matching characters, so you're still looping through every element of both for O ( n*m ) runtime.

You could short-circuit by checking there are at least some matching characters before the loops.

One way to do this is with translate. Pass the input strings as the first two arguments and an out-of-bounds character as the third (e.g. #). Replace this out-of-bounds character in the result.

```select
translate ( 'WIKIMEDIA', 'REMEDYA', '#' ) tr1,
replace ( translate ('WIKIMEDIA', 'REMEDYA', '#'), '#' ) reptr1,
translate ( 'WIKIMEDIA', 'NOSUB', '#' ) tr2,
replace ( translate ('WIKIMEDIA', 'NOSUB', '#' ), '#' ) reptr2
from dual;

TR1     REPTR1   TR2         REPTR2
WIKII   WIKII    WIKIMEDIA   WIKIMEDIA   ```

If the length of the final string equals the length of the first input, you know there are no matching characters. So can bypass the loops altogether.

I'm sure there are other optimizations; others may have ideas that can help.

## Rating

(17 ratings)

Is this answer out of date? If it is, please let us know via a Comment

### Ratcliff-Obershelp

Alex, August 21, 2020 - 9:39 am UTC

Thanks Chris for the ideas to improve the lcs (longest common sub-string) algorithm, which is a per-requisite for the mentioned algorithm
Any thoughts on how to apply the function recursively ?
i.e. start from the input strings :
`lcs('WIKIMEDIA', 'REMEDYA') = 'MED'`

next we need to check the remaining chunks on either sides:
`lcs('WIKI', 'RE') = ''`

and
`lcs ('IA', 'YA') = 'A'`

`lcs('I', 'Y') = ''`

So a loop that exists whenever lcs() = '', but the tricky part is that the strings split in smaller strings, much like a nuclear fission reaction.
Thanks, August 21, 2020 - 10:59 am UTC

I'm not sure exactly what you're getting at - why do you want to do this recursively? Please clarify!

### Setting myself a little challenge ;-)

Kim Berg Hansen, August 21, 2020 - 9:57 am UTC

I won't have time until after the weekend, but I'm going to attempt to do this in pure SQL in the MODEL clause ;-) August 21, 2020 - 11:00 am UTC

Go Kim! I had a quick try with match_recognize, maybe you'll have more luck :)

### Follow-up

Alex, August 21, 2020 - 1:00 pm UTC

@Chris: Yes, i need to do that (apply lcs) recursively, as this is what Ratcliff-Obershelp (aka Gestalt pattern matching) algorithm is all about.
Count how many chars in lcsuntil lcs returns for all strings.
Then the string similarity score of the 2 initial strings is :
`2 * (lcs_1 + lcs_2 + .. + lcs_n)/ (sum of characters in the 2 strings combined) `

https://en.wikipedia.org/wiki/Gestalt_Pattern_Matching August 21, 2020 - 2:08 pm UTC

I see...

Once you've found the LCS, you can get the precediing/following strings with a bit of SUBSTR/INSTR magic:

```with strs as (
select 'WIKIMEDIA' s from dual
)
select substr ( s, 1, instr ( s, 'MED' ) - 1 ) st,
substr ( s, instr ( s, 'MED' ) + length ( 'MED' ) ) en
from   strs;

ST     EN
WIKI   IA  ```

And use these to call the LCS algorithm again on each starting/ending pair.

### Using recursive subquery factoring for the LCS

Stew Ashton, August 21, 2020 - 4:15 pm UTC

This should be faster when there is a long common substring and slower when there is no match at all. It is better when "B" is the shorter string. "STRT" and "LEN" give you a head start for breaking off the unmatched bits from "B".

```with input(a, b) as (
select 'antidisestablishmentarianism', 'disentanglement' from dual
)
select strt, len, substr(b,strt,len) lcs
from input i,
lateral(select length(i.b) + 1 - level len from dual connect by length(i.b) + 1 - level > 0) s,
lateral(select level strt from dual connect by level + s.len - 1 <= length(i.b))
where instr(a, substr(b,strt,len)) > 0
and rownum = 1;

LCS
----
dise```

### Slight improvement

Stew Ashton, August 21, 2020 - 7:17 pm UTC

Actually, this solution calculates for both strings the starting position of the LCS, so I should have provided both. This should make it easier to extract the extra bits before and after the LCS.
```with input(a, b) as (
select 'antidisestablishmentarianism', 'disentanglement' from dual
)
select instr(a, substr(b,b_strt,len)) a_strt, b_strt, len, substr(b, b_strt, len) lcs
from input i,
lateral(select length(i.b) + 1 - level len from dual connect by length(i.b) + 1 - level > 0) s,
lateral(select level b_strt from dual connect by level + s.len - 1 <= length(i.b))
where instr(a, substr(b,b_strt,len)) > 0
and rownum = 1;

A_STRT  B_STRT  LEN  LCS
------  ------  ---  ----
5       1       4    dise```

### Follow-up

Alex, August 22, 2020 - 6:22 am UTC

@Stew: nice use of lateral!
@Chris:
```with strs as (
select 'WIKIMEDIA' s from dual
UNION
select 'REMEDIYA' from dual
)
select substr ( s, 1, instr ( s, 'MED' ) - 1 ) left_Str,
substr ( s, instr ( s, 'MED' ) + length ( 'MED' ) ) right_Str
from   strs;```

which outputs:

```LEFT_STR RIGHT_STR
RE IYA
WIKI IA
```

Now to the part of making the recursive loop: i was thinking of calling
`lcs((1, 1), (2,1))`

and
`lcs((1,2), (2,2))`

You guys have a better idea ?

### follow-up

Alex, August 22, 2020 - 9:22 am UTC

Something like this:
```TYPE left_string is TABLE OF VARCHAR2 INDEX BY PLS_INTEGER;
TYPE right_string is TABLE OF VARCHAR2 INDEX BY PLS_INTEGER;
left_string(1) := 'WIKIMEDIA'; --string length: 9
right_string(1) := 'REMEDIYA'; --string length: 8
lcs(left_string(1), right_string(1)) := 'MEDI';

left_string(2) := 'WIKI';
right_string(2) := 'RE';

left_string(3) := 'IA';
right_string(3) := 'YA';

lcs(left_string(2), right_string(2)) := ''; --dead end
lcs(left_string(3), right_string(3)) := 'A';

left_string(4) := 'I';
left_string(4) := 'Y';

left_string(5) := ''; --right side of 'A' from 'IA', dead end, stop computing right_string(5).

lcs(left_string(5), right_string(5)) := ''; --last string chunks to compare, exit loop

--We now compute the length of all lcs: 4 + 1 = 5 and considering the length of the initial strings 17 (9 + 8),
--this makes the Ratcliff-Obershelp similarity score:
2 * 5 / 17 = 0.588
```

The challenge is to make this recursion in (pl)sql..

### Attempt at full solution

Stew Ashton, August 22, 2020 - 12:31 pm UTC

```create or replace function gestalt(
l in varchar2,
r in varchar2
) return number is
function lcs (
l in varchar2,
r in varchar2
) return number is
ll varchar2(4000);
rl varchar2(4000);
lr varchar2(4000);
rr varchar2(4000);
begin
for rec in (
select instr(l, substr(r,r_strt,len)) l_strt, r_strt, len
from dual,
lateral(select length(r) + 1 - level len from dual connect by length(r) + 1 - level > 0) s,
lateral(select level r_strt from dual connect by level + s.len - 1 <= length(r)) e
where instr(l, substr(r,e.r_strt,len)) > 0
and rownum <= 1
) loop
ll := substr(l, 1, rec.l_strt - 1);
rl := substr(r, 1, rec.r_strt - 1);
lr := substr(l, rec.l_strt + rec.len);
rr := substr(r, rec.r_strt + rec.len);
return rec.len + lcs(ll, rl) + lcs(lr, rr);
end loop;
return 0;
end lcs;
begin
return 2 * lcs(l, r) / ( length(l) + length(r) );
end gestalt;
/
select round(gestalt('WIKIMEDIA', 'REMEDIYA'),3) from dual;

0.588```

### old fashion way using recursive sql

Koen Lostrie, August 22, 2020 - 3:46 pm UTC

```WITH strings (string1, string2) as
(select 'antidisestablishmentarianism', 'disentanglement' from dual)
,string1_data(stringcount, stringlen, start_pos_counter, start_pos ) AS
(
SELECT (LENGTH(string1)/2)*(LENGTH(string1)+1) r, LENGTH(string1), LENGTH(string1), 1  FROM strings t
UNION ALL
SELECT stringcount-1
, stringlen
, case when start_pos >= least(stringlen,start_pos_counter) then start_pos_counter - 1 else start_pos_counter end
, case when start_pos >= least(stringlen,start_pos_counter) then 1 else start_pos + 1 end
FROM string1_data WHERE stringcount > 1
),string1_occurences (substring) as
(
SELECT substr(string1,start_pos,stringlen - start_pos_counter + 1)  FROM string1_data cross join strings
),string2_data(stringcount, stringlen, start_pos_counter, start_pos ) AS
(
SELECT (LENGTH(string2)/2)*(LENGTH(string2)+1) r, LENGTH(string2), LENGTH(string2), 1  FROM strings t
UNION ALL
SELECT stringcount-1
, stringlen
, case when start_pos >= least(stringlen,start_pos_counter) then start_pos_counter - 1 else start_pos_counter end
, case when start_pos >= least(stringlen,start_pos_counter) then 1 else start_pos + 1 end
FROM string2_data WHERE stringcount > 1
),string2_occurences (substring) as
(
SELECT substr(string2,start_pos,stringlen - start_pos_counter + 1)  FROM string2_data cross join strings
)
select str1.substring
from string1_occurences str1
join string2_occurences str2 on str1.substring = str2.substring order by length(str1.substring) desc fetch first 1 rows only; ``` August 24, 2020 - 2:13 pm UTC

Good stuff Koen :)

### Follow-up: many thanks

Alex, August 22, 2020 - 4:14 pm UTC

@Stew: you're amazing!
I was struggling with building a strings array with all lcs (iteratively, divide&conquer style):
https://livesql.oracle.com/apex/livesql/s/kj0kjr5p5o47fh3pohybch14w

Thank you very much, fantastic community!

### Follow-up - performance

Alex, August 22, 2020 - 6:03 pm UTC

Can i have one last wish ?:)
Using Stew's SQL approach/function i get to process 70K strings in ~20 sec on my box.
I am assuming this is due to PL/SQL - SQL context switch.
Ideally, the results should come in 1-3 seconds for 100K strings.
I'm thinking of a PL/SQL approach for the (recursive) lcs function could perform better.
Any thoughts ?

### Progress reporting

Alex, August 23, 2020 - 4:43 am UTC

I'm comparing the PL/SQL and SQL version to compute Ratcliff-Obershelp string similarity.

Scenario:

min_threshold: 0.65 (empiric value), testing string similarity for 'PRIVIL' against 105314 rows (all_objects * 2).

(1) PL/SQL version
https://livesql.oracle.com/apex/livesql/s/kj3h8q55suglbbyy0edrnmieq
Result:
2 strings selected out of pool of  in 13.42 seconds.

(2) SQL version
https://livesql.oracle.com/apex/livesql/s/kj3pcfcl4hlqh6db9cfr45mjf
Result:
ORA-00040: active time limit exceeded

Can i make the PL/SQL version drop to max 3 seconds for same workload (100K records) ? Or the SQL version (that is currently dropped by the resource manager before finishing) ?

### Another recursive variant

Iudith Mentzel, August 23, 2020 - 10:53 am UTC

Borrowing the so elegant first step of Stew Ashton, here is another "divide and conquer" solution,

```WITH
FUNCTION f_lcs (p_str1 IN VARCHAR2, p_str2 IN VARCHAR2) RETURN VARCHAR2
AS
l_lcs  VARCHAR2(2000);
BEGIN
--
if p_str1 is null or p_str2 is null then
return null;
end if;
--
select substr(i.b, strt.b_strt, slen.len) lcs
into l_lcs
from
( select p_str1 a, p_str2 b from dual ) i,
lateral(select length(i.b) + 1 - level len
from dual
connect by length(i.b) + 1 - level > 0
) slen,
lateral(select level b_strt
from dual
connect by level + slen.len - 1 <= length(i.b)
) strt
where instr(i.a, substr(i.b,strt.b_strt,slen.len)) > 0
order by slen.len desc
fetch first 1 row only;
--
return l_lcs;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
--
input(str1, str2) as (
select 'antidisestablishmentarianism', 'disentanglement' from dual
),
--
recurs(str1, str2, lcs, lvl) as (
select str1, str2, f_lcs(str1, str2), 1
from input
union all
select case d.rn
when 1 then substr(r.str1, 1, instr(r.str1, r.lcs) - 1)
else substr(r.str1, instr(r.str1, r.lcs) + length(r.lcs))
end str1
, case d.rn
when 1 then substr(r.str2, 1, instr(r.str2, r.lcs) - 1)
else substr(r.str2, instr(r.str2, r.lcs) + length(r.lcs))
end str2
, f_lcs(
case d.rn
when 1 then substr(r.str1, 1, instr(r.str1, r.lcs) - 1)
else substr(r.str1, instr(r.str1, r.lcs) + length(r.lcs))
end,
case d.rn
when 1 then substr(r.str2, 1, instr(r.str2, r.lcs) - 1)
else substr(r.str2, instr(r.str2, r.lcs) + length(r.lcs))
end
)  lcs
, r.lvl + 1
from
recurs r,
( select rownum rn from dual connect by level <= 2 )  d
where
r.lcs is not null
)
--
select *
from recurs
/

STR1    STR2  LCS LVL
----------------------------------------------------------
antidisestablishmentarianism disentanglement dise 1
anti     -    -  2
stablishmentarianism  ntanglement ment 2
stablish   ntangle  ta 3
arianism    -    -  3
s    n   -  4
blish    ngle  l 4
b    ng   -  5
ish    e   -  5

9 rows selected.

```

Still using a single SELECT, you can extend it to process several pairs by replacing the INPUT query.

Here the output shows the recursion step by step, for the final result you should aggregate to get a single row
for each input entry.
Cheers :) August 24, 2020 - 2:12 pm UTC

Nice stuff Iudith

### Solution with fewer context switches

Stew Ashton, August 23, 2020 - 2:56 pm UTC

```SQL> create table t as select object_name from all_objects;

Table T created.

SQL> ALTER SESSION SET PLSQL_CODE_TYPE='NATIVE';

Session altered.

SQL> create or replace function gestalt2(
2    l in varchar2,
3    r in varchar2
4  ) return number is
5    function lcs (
6      l in varchar2,
7      r in varchar2
8    ) return number is
9      l_start pls_integer := 0;
10      r_start pls_integer := 0;
11      lr_length pls_integer := least(length(l),length(r));
12      ll varchar2(128);
13      rl varchar2(128);
14      lr varchar2(128);
15      rr varchar2(128);
16    begin
17      if l is null or r is null then
18        return 0;
19      end if;
20      <<len_loop>>
21      loop
22        l_start := 1;
23        loop
24          r_start := instr(r,substr(l,l_start,lr_length));
25          exit len_loop when r_start > 0;
26          exit when l_start = length(l) - lr_length + 1;
27          l_start :=  l_start + 1;
28        end loop;
29        exit when lr_length = 1;
30        lr_length := lr_length - 1;
31      end loop;
32      if r_start = 0 then
33        return 0;
34      else
35        ll := substr(l, 1, l_start - 1);
36        rl := substr(r, 1, r_start - 1);
37        lr := substr(l, l_start + lr_length);
38        rr := substr(r, r_start + lr_length);
39        return lr_length + lcs(ll, rl) + lcs(lr, rr);
40      end if;
41    end lcs;
42  begin
43    return 2 * lcs(l, r) / ( length(l) + length(r) );
44  end gestalt2;
45  /

Function GESTALT2 compiled

SQL> set timing on
SQL> declare
2    l_gestalt number;
3  begin
4    for rec in (select * from t) loop
5      l_gestalt := gestalt2('PRIVIL', rec.object_name);
6    end loop;
7  end;
8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.709
```
Note: T has 80306 rows August 24, 2020 - 2:12 pm UTC

Great work Stew!

### Follow-up

Alex, August 23, 2020 - 5:23 pm UTC

@Stew: Chapeau!

### Because I promised a MODEL solution

Kim Berg Hansen, August 24, 2020 - 2:56 pm UTC

It's ugly and slow, but here it is anyway ;-)

For one word-pair:

```select
w1, w2, 2 * lmx / (length(w1) + length(w2)) ro_score
from (
select *
from (
select 'WIKIMEDIA' as w1, 'REMEDIYA' as w2 from dual
)
model
partition by (w1, w2)
dimension by (0 as i, 0 as p1, 0 as p2)
measures (upper(w1) as m1, upper(w2) as m2, cast(null as varchar2(1)) as ch, 0 as mx, 0 as lmx, 0 as l1, 0 as l2)
ignore nav
rules sequential order iterate(10) until iteration_number >= l2[-1, 0, 0] (
ch[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, 0] = substr(m1[cv(i),0,0], cv(p1), 1)
, ch[iteration_number, 0, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = substr(m2[cv(i),0,0], cv(p2), 1)
, mx[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1]
= case ch[cv(i), cv(p1), 0] when ch[cv(i), 0, cv(p2)] then mx[cv(i), cv(p1)-1, cv(p2)-1] + 1 end
, lmx[iteration_number, 0, 0] = max(mx)[cv(i), any, any]
, l1[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p1) end
, l2[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p2) end
, l1[iteration_number, 0, 0] = min(l1)[cv(i), any, any]
, l2[iteration_number, 0, 0] = min(l2)[cv(i), any, any]
, lmx[-1, 0, 0] = lmx[-1, 0, 0] + lmx[iteration_number, 0, 0]
, m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], 1, l1[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end
, m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], 1, l2[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end
, l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end
, m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], l1[iteration_number, 0, 0] + 1) end
, m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], l2[iteration_number, 0, 0] + 1) end
, l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end
)
)
where (i, p1, p2) = ((-1,0,0))
order by ro_score desc nulls last;
```

```select
w1, w2, 2 * lmx / (length(w1) + length(w2)) ro_score
from (
select *
from (
select distinct 'PRIVIL' as w1, object_name as w2 from all_objects
where rownum <= 1000
)
model
partition by (w1, w2)
dimension by (0 as i, 0 as p1, 0 as p2)
measures (upper(w1) as m1, upper(w2) as m2, cast(null as varchar2(1)) as ch, 0 as mx, 0 as lmx, 0 as l1, 0 as l2)
ignore nav
rules sequential order iterate(10) until iteration_number >= l2[-1, 0, 0] (
ch[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, 0] = substr(m1[cv(i),0,0], cv(p1), 1)
, ch[iteration_number, 0, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1] = substr(m2[cv(i),0,0], cv(p2), 1)
, mx[iteration_number, for p1 from 1 to length(m1[iteration_number,0,0]) increment 1, for p2 from 1 to length(m2[iteration_number,0,0]) increment 1]
= case ch[cv(i), cv(p1), 0] when ch[cv(i), 0, cv(p2)] then mx[cv(i), cv(p1)-1, cv(p2)-1] + 1 end
, lmx[iteration_number, 0, 0] = max(mx)[cv(i), any, any]
, l1[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p1) end
, l2[iteration_number, any, any] = case mx[cv(), cv(), cv()] when lmx[cv(i), 0 ,0] then cv(p2) end
, l1[iteration_number, 0, 0] = min(l1)[cv(i), any, any]
, l2[iteration_number, 0, 0] = min(l2)[cv(i), any, any]
, lmx[-1, 0, 0] = lmx[-1, 0, 0] + lmx[iteration_number, 0, 0]
, m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], 1, l1[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end
, m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], 1, l2[iteration_number, 0, 0] - lmx[iteration_number, 0, 0]) end
, l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end
, m1[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m1[iteration_number, 0, 0], l1[iteration_number, 0, 0] + 1) end
, m2[l2[-1, 0, 0]+1, 0, 0] = case when lmx[iteration_number, 0, 0] > 0 then substr(m2[iteration_number, 0, 0], l2[iteration_number, 0, 0] + 1) end
, l2[-1, 0, 0] = l2[-1, 0, 0] + case when m1[l2[-1, 0, 0]+1, 0, 0] is not null and m2[l2[-1, 0, 0]+1, 0, 0] is not null then 1 else 0 end
)
)
where (i, p1, p2) = ((-1,0,0))
order by ro_score desc nulls last;
```

But takes a couple seconds for 988 word pairs for me, so definitely not better than Stew's solution.
And this is also a lot uglier and hard to understand - but was fun to make ;-) August 24, 2020 - 5:28 pm UTC

### Apologies

Stew Ashton, October 22, 2020 - 4:16 pm UTC

I enjoyed spending an "Office Hour" on this topic with Chris and several attendees. While explaining the "pure PL/SQL" solution, I was thinking to myself "why is the looping logic so complicated???". Afterwards, I realized that plain old FOR loops could do the job just as well!

Anyway, here is a solution that puts the function in a WITH clause (as Iudith Wentzel did previously). The function is easier to understand. I do apologize for inflicting some unnecessary complications.
```with function gestalt4(l in varchar2, r in varchar2) return number is

function lcs (l in varchar2, r in varchar2) return number is
l_start pls_integer;
ll varchar2(128);
rl varchar2(128);
lr varchar2(128);
rr varchar2(128);
begin
if l is null or r is null then
return 0;
end if;
for lr_length in reverse 1..least(length(l),length(r)) loop
for r_start in 1..length(r) - lr_length + 1 loop
l_start := instr(l,substr(r,r_start,lr_length));
if l_start > 0 then
dbms_output.put_line(substr(r,r_start,lr_length));
ll := substr(l, 1, l_start - 1);
rl := substr(r, 1, r_start - 1);
lr := substr(l, l_start + lr_length);
rr := substr(r, r_start + lr_length);
return lr_length + lcs(ll, rl) + lcs(lr, rr);
end if;
end loop;
end loop;
return 0;
end lcs;

begin
return 2 * lcs(l, r) / ( nvl(length(l),0) + nvl(length(r),0) );
end gestalt4;

select gestalt4('gestalt practice', 'gestalt pattern matching') diff from dual;
/

DIFF
----------
0.6

gestalt p
a
t
e``` October 27, 2020 - 1:33 am UTC

thanks for the update

# More to Explore

##### SQL

The Oracle documentation contains a complete SQL reference.