Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: December 13, 2016 - 12:30 am UTC

Last updated: December 14, 2016 - 2:44 pm UTC

Version: 11.2.0.1.0

Viewed 1000+ times

You Asked

I have some hierarchical data (potentially ~80 million rows, if that's important):

drop table test;
create table test (id number,pid number,t varchar2(20));
insert into test values (1,NULL,'Animalia');
insert into test values (2,1,'Mammalia');
insert into test values (3,2,'Soricidae');
insert into test values (4,3,'Sorex cinereus');
insert into test values (5,3,'Sorex ugyunak');
insert into test values (6,1,'Insecta');
insert into test values (7,6,'Danaus');


I need to return the full hierarchy; I'll always start with parent is null:

SELECT  LPAD(' ', 2 * LEVEL - 1) || t 
FROM test  
 START WITH pid is null 
 CONNECT BY PRIOR id = pid;

 Animalia
   Mammalia
     Soricidae
       Sorex cinereus
       Sorex ugyunak
   Insecta
     Danaus


But I need to filter for, example, shrews:

SELECT  LPAD(' ', 2 * LEVEL - 1) || t 
FROM test  
 START WITH pid is null 
 CONNECT BY PRIOR id = pid
 some_sort_of_magic LIKE 'Sorex%';


should return

 Animalia
   Mammalia
     Soricidae
       Sorex cinereus
       Sorex ugyunak


I cannot find a value for "some_sort_of_magic" which both includes the entire hierarchy and orders terms from "biggest" to "smallest."

and Chris said...

Here's one way to do it:

Use sys_connect_by_path to create a comma separated string of the values and their position in the hierarchy. Filter the results leaving you with just the rows that match the values you're interested in:

create table test (id number,pid number,t varchar2(20));
insert into test values (1,NULL,'Animalia');
insert into test values (2,1,'Mammalia');
insert into test values (3,2,'Soricidae');
insert into test values (4,3,'Sorex cinereus');
insert into test values (5,3,'Sorex ugyunak');
insert into test values (6,1,'Insecta');
insert into test values (7,6,'Danaus');

select sys_connect_by_path(t || '##' || level , ',') || ',' pth
from   test  
where  t like 'Sorex%'
start with pid is null 
connect by prior id = pid;

PTH                                                       
,Animalia##1,Mammalia##2,Soricidae##3,Sorex cinereus##4,  
,Animalia##1,Mammalia##2,Soricidae##3,Sorex ugyunak##4, 


Now you have this, you can convert the comma separate strings to rows with your favourite method. Throw in a bit of string manipulation to get the value and its position in the hierarchy and you can get your tree back:

with rws as (
SELECT SYS_CONNECT_BY_PATH(t || '##' || level , ',') || ',' pth
FROM test  
where t like 'Sorex%'
START WITH pid is null 
CONNECT BY PRIOR id = pid
), vals as (
  select 
  substr(pth, 
    instr(pth, '##', 1, column_value) + 2,
    ( instr(pth, ',', 1, column_value + 1) - instr(pth, '##', 1, column_value) - 2 )
  ) - 1 lev, 
  substr(pth, 
    instr(pth, ',', 1, column_value) + 1,
    ( instr(pth, '##', 1, column_value) - instr(pth, ',', 1, column_value) - 1 )
  ) val
  from rws, table ( cast ( multiset (
    select level l
    from   dual
    connect by level <= length(pth) - length(replace(pth, ','))
  ) as sys.odcinumberlist)) t
)
  select distinct lpad(' ', lev * 2) || val, lev
  from   vals
  where  val is not null
  order  by lev;

LPAD('',LEV*2)||VAL   LEV  
Animalia              0    
  Mammalia            1    
    Soricidae         2    
      Sorex cinereus  3    
      Sorex ugyunak   3

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Not what I wanted to hear, but...

A reader, December 13, 2016 - 5:37 pm UTC

...thank you for the answer!
Chris Saxon
December 13, 2016 - 5:44 pm UTC

Thanks! But... what did you want to hear?

Something like this?

Duke Ganote, December 13, 2016 - 6:02 pm UTC

Caveat: not a general solution, but I'd guess hoping for something like this:

 WITH test (id ,pid, t) AS (
 SELECT 1,NULL,'Animalia' FROM DUAL UNION ALL
 SELECT 2,1,'Mammalia' FROM DUAL UNION ALL
 SELECT 3,2,'Soricidae' FROM DUAL UNION ALL
 SELECT 4,3,'Sorex cinereus' FROM DUAL UNION ALL
 SELECT 5,3,'Sorex ugyunak' FROM DUAL UNION ALL
 SELECT 6,1,'Insecta' FROM DUAL UNION ALL
 SELECT 7,6,'Danaus' FROM DUAL
)
 SELECT unique LPAD(' ', 2 * 5 - LEVEL) || t  AS resultSet
   FROM test
  START WITH t like 'Sorex%'
  CONNECT BY PRIOR pid = id
  ORDER BY 1 desc;

RESULTSET
-------------------------------------
  Animalia
   Mammalia
    Soricidae
     Sorex ugyunak
     Sorex cinereus

Chris Saxon
December 14, 2016 - 9:07 am UTC

Yes. This only works if the string your searching for appears in one place in the hierarchy:

 WITH test (id ,pid, t) AS (
 SELECT 1,NULL,'Animalia' FROM DUAL UNION ALL
 SELECT 2,1,'Mammalia' FROM DUAL UNION ALL
 SELECT 3,2,'Soricidae' FROM DUAL UNION ALL
 SELECT 4,3,'Sorex cinereus' FROM DUAL UNION ALL
 SELECT 5,3,'Sorex ugyunak' FROM DUAL UNION ALL
 SELECT 6,1,'Insecta' FROM DUAL UNION ALL
 SELECT 7,6,'Danaus' FROM DUAL UNION ALL
 SELECT 8,1,'Sorex testus' FROM DUAL
)
 SELECT unique LPAD(' ', 2 * 5 - LEVEL) || t  AS resultSet
   FROM test
  START WITH t like 'Sorex%'
  CONNECT BY PRIOR pid = id
  ORDER BY 1 desc;

RESULTSET                
      Animalia           
       Mammalia          
        Soricidae        
        Animalia         
         Sorex ugyunak   
         Sorex testus    
         Sorex cinereus  

unicorns

A reader, December 13, 2016 - 6:08 pm UTC

I wanted something that doesn't seem to exist: a SQL solution that the front end can execute. I think your approach wrapped in a pipelined function will substitute, it's just more infrastructure than I'd like to maintain. And I'm a bit concerned about performance (hopefully for no good reason) in a production environment, where the function (or whatever it ends up being) will have to deal with queries which return thousands of "rows" each containing potentially dozens of "columns."
Chris Saxon
December 14, 2016 - 9:10 am UTC

It's just a SQL statement, so anything that can execute SQL can run it! Duke has another example in the comment below you could try if you want.

When it comes to performance, test on your data. If you're finding it's too slow and you want help get the execution plan for the statement share it with us.

If you're not sure how to do this, read:

https://blogs.oracle.com/sql/entry/how_to_create_an_execution

I think this is general

Duke Ganote, December 13, 2016 - 7:13 pm UTC

Performant? that's another issue on ~80 million rows:

WITH 
  test (id ,pid, t) AS (
SELECT 1,NULL,'Animalia' FROM DUAL UNION ALL
SELECT 2,1,'Mammalia' FROM DUAL UNION ALL
SELECT 3,2,'Soricidae' FROM DUAL UNION ALL
SELECT 4,3,'Sorex cinereus' FROM DUAL UNION ALL
SELECT 5,3,'Sorex ugyunak' FROM DUAL UNION ALL
SELECT 6,1,'Insecta' FROM DUAL UNION ALL
SELECT 7,6,'Danaus' FROM DUAL)
, use_only AS (
SELECT sys_connect_by_path(id,',')||',' x
  FROM test
 START WITH t like 'Sor%'
 CONNECT BY PRIOR pid = id
)
, filtered AS (
select unique id ,pid, t
  from  test
  join use_only
    on use_only.X LIKE '%,'||test.id||',%'
)
SELECT  LPAD(' ', 2 * LEVEL - 1) || t AS resultset
  FROM filtered
 START WITH pid is null
 CONNECT BY PRIOR id = pid;


RESULTSET
---------------------
 Animalia
   Mammalia
     Soricidae
       Sorex cinereus
       Sorex ugyunak

Chris Saxon
December 14, 2016 - 9:14 am UTC

Yes, the classic walk up the tree and back down again :)

Solves the "string appearing and multiple points in the tree" problem:

WITH 
  test (id ,pid, t) AS (
SELECT 1,NULL,'Animalia' FROM DUAL UNION ALL
SELECT 2,1,'Mammalia' FROM DUAL UNION ALL
SELECT 3,2,'Soricidae' FROM DUAL UNION ALL
SELECT 4,3,'Sorex cinereus' FROM DUAL UNION ALL
SELECT 5,3,'Sorex ugyunak' FROM DUAL UNION ALL
SELECT 6,1,'Insecta' FROM DUAL UNION ALL
SELECT 7,6,'Danaus' FROM DUAL UNION ALL
 SELECT 8,1,'Sorex testus' FROM DUAL)
, use_only AS (
SELECT sys_connect_by_path(id,',')||',' x
  FROM test
 START WITH t like 'Sor%'
 CONNECT BY PRIOR pid = id
)
, filtered AS (
select unique id ,pid, t
  from  test
  join use_only
    on use_only.X LIKE '%,'||test.id||',%'
)
SELECT  LPAD(' ', 2 * LEVEL - 1) || t AS resultset
  FROM filtered
 START WITH pid is null
 CONNECT BY PRIOR id = pid;

RESULTSET              
 Animalia              
   Mammalia            
     Soricidae         
       Sorex cinereus  
       Sorex ugyunak   
   Sorex testus  

Using indexes

Stew Ashton, December 14, 2016 - 9:38 am UTC

Here is a variant that avoids a full table scan, provided there is an index on ID and an index on T:
WITH test (id ,pid, t) AS (
  SELECT 1,NULL,'Animalia' FROM DUAL UNION ALL
  SELECT 2,1,'Mammalia' FROM DUAL UNION ALL
  SELECT 3,2,'Soricidae' FROM DUAL UNION ALL
  SELECT 4,3,'Sorex cinereus' FROM DUAL UNION ALL
  SELECT 5,3,'Sorex ugyunak' FROM DUAL UNION ALL
  SELECT 6,1,'Insecta' FROM DUAL UNION ALL
  SELECT 7,6,'Danaus' FROM DUAL UNION ALL
   SELECT 8,1,'Sorex testus' FROM DUAL
)
, filtered AS (
  select distinct * from test a
  start with t like 'Sorex%'
  connect by id = prior pid
)
SELECT LPAD(' ', 2 * LEVEL - 1) || t AS resultset
FROM filtered
START WITH pid is null
CONNECT BY PRIOR id = pid;

RESULTSET
---------
 Animalia
   Mammalia
     Soricidae
       Sorex cinereus
       Sorex ugyunak
   Sorex testus

Chris Saxon
December 14, 2016 - 10:17 am UTC

... and your search condition is selective enough to use the index ;)

But thanks for sharing.

@unicorns

GJ, December 14, 2016 - 2:05 pm UTC

Since you mentioned about having a front end that would be using this hierarchial data, perhaps you can have a look at usign the /*+RESULT_CACHE*/ hint in the hierarchial query. This would mean in the web interface the cached data is used for similar kind of query clause.
Chris Saxon
December 14, 2016 - 2:44 pm UTC

That's only helpful if the underlying data remains the same and you're executing the same query over and over.

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9532815400346593859