I have some hierarchical data (potentially ~80 million rows, if that's important):
drop table test;
create table test (id number,pid number,t varchar2(20));
insert into test values (1,NULL,'Animalia');
insert into test values (2,1,'Mammalia');
insert into test values (3,2,'Soricidae');
insert into test values (4,3,'Sorex cinereus');
insert into test values (5,3,'Sorex ugyunak');
insert into test values (6,1,'Insecta');
insert into test values (7,6,'Danaus');
I need to return the full hierarchy; I'll always start with parent is null:
SELECT LPAD(' ', 2 * LEVEL - 1) || t
FROM test
START WITH pid is null
CONNECT BY PRIOR id = pid;
Animalia
Mammalia
Soricidae
Sorex cinereus
Sorex ugyunak
Insecta
Danaus
But I need to filter for, example, shrews:
SELECT LPAD(' ', 2 * LEVEL - 1) || t
FROM test
START WITH pid is null
CONNECT BY PRIOR id = pid
some_sort_of_magic LIKE 'Sorex%';
should return
Animalia
Mammalia
Soricidae
Sorex cinereus
Sorex ugyunak
I cannot find a value for "some_sort_of_magic" which both includes the entire hierarchy and orders terms from "biggest" to "smallest."
Here's one way to do it:
Use sys_connect_by_path to create a comma separated string of the values and their position in the hierarchy. Filter the results leaving you with just the rows that match the values you're interested in:
create table test (id number,pid number,t varchar2(20));
insert into test values (1,NULL,'Animalia');
insert into test values (2,1,'Mammalia');
insert into test values (3,2,'Soricidae');
insert into test values (4,3,'Sorex cinereus');
insert into test values (5,3,'Sorex ugyunak');
insert into test values (6,1,'Insecta');
insert into test values (7,6,'Danaus');
select sys_connect_by_path(t || '##' || level , ',') || ',' pth
from test
where t like 'Sorex%'
start with pid is null
connect by prior id = pid;
PTH
,Animalia##1,Mammalia##2,Soricidae##3,Sorex cinereus##4,
,Animalia##1,Mammalia##2,Soricidae##3,Sorex ugyunak##4,
Now you have this, you can convert the comma separate strings to rows with your favourite method. Throw in a bit of string manipulation to get the value and its position in the hierarchy and you can get your tree back:
with rws as (
SELECT SYS_CONNECT_BY_PATH(t || '##' || level , ',') || ',' pth
FROM test
where t like 'Sorex%'
START WITH pid is null
CONNECT BY PRIOR id = pid
), vals as (
select
substr(pth,
instr(pth, '##', 1, column_value) + 2,
( instr(pth, ',', 1, column_value + 1) - instr(pth, '##', 1, column_value) - 2 )
) - 1 lev,
substr(pth,
instr(pth, ',', 1, column_value) + 1,
( instr(pth, '##', 1, column_value) - instr(pth, ',', 1, column_value) - 1 )
) val
from rws, table ( cast ( multiset (
select level l
from dual
connect by level <= length(pth) - length(replace(pth, ','))
) as sys.odcinumberlist)) t
)
select distinct lpad(' ', lev * 2) || val, lev
from vals
where val is not null
order by lev;
LPAD('',LEV*2)||VAL LEV
Animalia 0
Mammalia 1
Soricidae 2
Sorex cinereus 3
Sorex ugyunak 3