Skip to Main Content
  • Questions
  • Find all the Fully Connected Subgraphs

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Siva .

Asked: July 08, 2019 - 11:27 am UTC

Last updated: July 22, 2019 - 10:05 am UTC

Version: all versions

Viewed 1000+ times

You Asked

Hi Team ,

with   data ( p1 , p2 ) as 
( select 'a' ,'b' from dual union all
  select 'a' ,'c' from dual union all
  select 'a' ,'d' from dual union all

  select 'b' ,'c' from dual union all
  select 'b' ,'d' from dual union all

  select 'c' ,'d' from dual union all
  select 'c' ,'e' from dual union all
  select 'c' ,'f' from dual union all

  select 'd' ,'e' from dual union all

  select 'e' ,'f' from dual 
)
select * from data ;


my data is like above .

Assume those columns data is nothing but persons info .relation between these two columns is knows each other ( a knows b and b knows a i.e) bi-directional ).

with this data i want to find out the groups based on below conditions .
1) each group must contain more than 2 persons.
2) one person should know , remaining persons in the same group.


expected o/p is :
-----------------

(a,b,c)
(a,b,c,d)
(b,c,d)
(c,d,e)

i missed to add below o/p in previous post

(c,e,f)
(a,c,d)
(a,b,d)


and Chris said...

An interesting problem!

I reached out on Twitter for solutions to this. You can find the community answers in the following Live SQL script:

https://livesql.oracle.com/apex/livesql/file/content_IOHVI2K4AYK6GOMOAVG93YBF6.html

For brevity here, I'll just discuss Stew Ashton's optimized solution.

This works by:

- Using recursive with to traverse the graph
- For each new vertex, it:
- Adds it to a nested table array of vertices
- Builds a table with an edge from the new vertex to all the existing vertices in the subgraph
- For these edges, checks there are NO edges which are NOT in the existing edge set. i.e. every generated edge is in the original graph.

So this is saying, for each new vertex:

Does this have an edge to all the existing vertices in the subgraph?

If no, this does NOT form a fully connected subgraph. If yes, it does!

In SQL, this looks something like:

create table relations (  
  p1 varchar2(1), p2 varchar2(1), 
  primary key ( p1, p2 ), 
  check ( p2 > p1 ) 
);

insert into relations   
  with rws ( p1 , p2 ) as (   
    select 'a' ,'b' from dual union all  
    select 'a' ,'c' from dual union all  
    select 'a' ,'d' from dual union all  
    
    select 'b' ,'c' from dual union all  
    select 'b' ,'d' from dual union all  
    
    select 'c' ,'d' from dual union all  
    select 'c' ,'e' from dual union all  
    select 'c' ,'f' from dual union all  
    
    select 'd' ,'e' from dual union all  
    
    select 'e' ,'f' from dual   
  )  
    select * from rws;

create or replace type ntt as table of varchar2(1);
/

with data(p, p1, p2, coll) as (   
  select a.p1, b.p1, b.p2, ntt(a.p1, a.p2, b.p2)   
  from   relations a    
  join   relations b   
  on     a.p2 = b.p1   
  and    (a.p1, b.p2) in (select * from relations)   
  union all   
  select a.p1, b.p1, b.p2, a.coll multiset union ntt(b.p2)   
  from   data a    
  join   relations b   
  on     a.p2 = b.p1   
  and (   
    select count(*)   
    from   table ( a.coll )   
    where  ( column_value, b.p2 ) not in (   
      select * from relations   
    )   
  ) = 0   
) cycle p, p1, p2 set is_cycle to '1' default '0'   
  select ps   
  from   data, lateral(   
    select '(' || listagg ( column_value, ',' )   
             within group ( order by column_value ) || ')' ps   
    from   table(coll)   
 );
 
PS          
(a,b,c)      
(a,b,d)      
(a,c,d)      
(b,c,d)      
(c,d,e)      
(c,e,f)      
(a,b,c,d)  


Note: for this problem, the worst-case number of subgraphs is 2^number of vertices. So for a trivial, 100 vertex graph the number of possible subgraphs is a number with thirty digits!

This could be exceptionally slow!

To resolve this, you may want to calculate the subgraphs on write. See the package & trigger in the Live SQL script for this method. Note this moves the calculation to insert/delete time. Which brings its own issues...

Or, it's worth asking: do you really need all the fully connected subgraphs?

I'm guessing no. In which case you may be looking for strongly connected components (or similar). This is more feasible to run on large graphs :)

You can use our graph technologies to do this. Hans Viehmann covered these in our SQL Office Hours session on 16th July, where we discussed this problem.

The recording for this will be available soon. When it's live you can access it at:

https://asktom.oracle.com/pls/apex/f?p=100:551:::NO:551:P551_CLASS_ID,P551_STUDENT_ID:6204&cs=11921B2B179A7F0F930F234DE1BF1DCFC

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Great work

A reader, July 17, 2019 - 2:06 pm UTC

Hi Chris ,

Thanks for the solution .
Thanks for the people who helped to achieve the desired result.

I was not able to attend the #AskTOMOfficeHours . Sure i will check it offline .

Really you guys( Great Minds ) did a fantastic job .

I would like to see more answers from readers/experts in other ways ( by using MODEL clause , Match_Recognize ...) to show how good ORACLE is .


I need to add one more point here .
If (a,b,c,d) exists , then no need to show subgroups (a,b,c) ,(a,b,d), (a,c,d) and (b,c,d).



Chris Saxon
July 22, 2019 - 10:05 am UTC

This is starting to sound more like a "find all the strongly connected components" problem. In which case you may be better off using graph technologies.

Iudith's submitted a couple of solutions below; I'm sure others will too in time.

Additional requirement

Iudith Mentzel, July 19, 2019 - 5:52 pm UTC

Hello All,

You can extend the previous solution as follows:

create or replace view v_graph as
with data(p, p1, p2, coll) as (
select a.p1, b.p1, b.p2, ntt(a.p1, a.p2, b.p2)
from relations a
join relations b
on a.p2 = b.p1
and (a.p1, b.p2) in (select * from relations)
union all
select a.p1, b.p1, b.p2, a.coll multiset union ntt(b.p2)
from data a
join relations b
on a.p2 = b.p1
and (
select count(*)
from table ( a.coll )
where ( column_value, b.p2 ) not in (
select * from relations
)
) = 0
) cycle p, p1, p2 set is_cycle to '1' default '0'
--
select * from data
/

View created.

select ps
from v_graph d,
lateral(
select '(' || listagg ( column_value, ',' )
within group ( order by column_value ) || ')' ps
from table(coll)
)
where not exists (
select 'x' from v_graph d2
where d.coll submultiset of d2.coll
and d.coll <> d2.coll
)
/

PS
-------------------
(c,d,e)
(c,e,f)
(a,b,c,d)

3 rows selected.


But, it is interesting to note that the CREATE VIEW step is essential here !

If we just try to add another WITH clause, with the additional condition,
we get the following error:

with data(p, p1, p2, coll) as (
select a.p1, b.p1, b.p2, ntt(a.p1, a.p2, b.p2)
from relations a
join relations b
on a.p2 = b.p1
and (a.p1, b.p2) in (select * from relations)
union all
select a.p1, b.p1, b.p2, a.coll multiset union ntt(b.p2)
from data a
join relations b
on a.p2 = b.p1
and (
select count(*)
from table ( a.coll )
where ( column_value, b.p2 ) not in (
select * from relations
)
) = 0
) cycle p, p1, p2 set is_cycle to '1' default '0'
, max_data as (
select * from data
where not exists (
select 'x' from data d2
where data.coll submultiset of d2.coll
and data.coll <> d2.coll
)
)
--
select ps
from max_data, lateral(
select '(' || listagg ( column_value, ',' )
within group ( order by column_value ) || ')' ps
from table(coll)
)
/

ORA-32036: unsupported case for inlining of query name in WITH clause


And, if I try to force materializing the recursive query result, I get an even stranger error:

with data(p, p1, p2, coll) as (
select a.p1, b.p1, b.p2, ntt(a.p1, a.p2, b.p2)
from relations a
join relations b
on a.p2 = b.p1
and (a.p1, b.p2) in (select * from relations)
union all
select a.p1, b.p1, b.p2, a.coll multiset union ntt(b.p2)
from data a
join relations b
on a.p2 = b.p1
and (
select count(*)
from table ( a.coll )
where ( column_value, b.p2 ) not in (
select * from relations
)
) = 0
) cycle p, p1, p2 set is_cycle to '1' default '0'
, mat_data (p, p1, p2, coll) as (
select /*+ MATERIALIZE */ p, p1, p2, coll
from data
)
, max_data (p, p1, p2, coll) as (
select p, p1, p2, coll
from mat_data d
where not exists (
select 'x' from mat_data d2
where d.coll submultiset of d2.coll
and d.coll <> d2.coll
)
)
--
select * from max_data
/

ORA-32039: recursive WITH clause must have column alias list


The so-called "naive" solution does NOT have any of these problems.


I think that it is a little simplistic to expect ANY Oracle feature to be "easily throwable"
at ANY problem, be it MATCH_RECOGNIZE, or MODEL clause or anything else,
just to show "how good Oracle is" !

As Chris underlined, there exists a separate product, probably more fitted for solving
such kind of problems, so Oracle DOES offer a solution !

If I'm allowed a personal opinion, when anybody submits a question to the AskTOM team,
I would like to see the poster presenting his own endeavors in solving the problem,
rather than just expecting from others to supply the largest variety of imaginable solutions ...

Thanks a lot & Best Regards,
Iudith














Another variant for the maximal graph

Iudith Mentzel, July 22, 2019 - 8:49 am UTC

Hello All,

Here is another variant for finding the maximal fully connected subgraphs,
in which is recursive WITH query behaves "more compliantly" versus using it in subsequent conditions, without requiring the view creation as a mid-step to avoid further errors:

with nodes as (
select p1 p from relations
union
select p2 from relations
),
data(p, p1, p2, coll) as (
select a.p1, b.p1, b.p2, ntt(a.p1, a.p2, b.p2)
from relations a
join relations b
on a.p2 = b.p1
and (a.p1, b.p2) in (select * from relations)
union all
select a.p1, b.p1, b.p2, a.coll multiset union ntt(b.p2)
from data a
join relations b
on a.p2 = b.p1
and ( select count(*)
from relations r
where r.p1 member of a.coll
and r.p2 = b.p2 ) = cardinality( a.coll )
) cycle p, p1, p2 set is_cycle to '1' default '0'
--
select ps
from data, lateral(
select '(' || listagg ( column_value, ',' )
within group ( order by column_value ) || ')' ps
from table(coll)
)
where not exists (
select 'x' from nodes n
where n.p not member of data.coll
and ( select count(*)
from table( data.coll ) g,
relations r
where r.p1 = least(n.p, g.column_value)
and r.p2 = greatest(n.p, g.column_value)
) = cardinality( data.coll )
)
/

PS
---------------
(c,d,e)
(c,e,f)
(a,b,c,d)

3 rows selected.


Looks like using SUBMULTISET OF conditions on data.coll columns is "too restrictive",
probably requiring internally a join of the DATA result set to itself,
while using just conditions like "MEMBER OF data.coll" and unnesting with TABLE (data.coll)
are "lighter" to apply without encountering errors.

If anyone could find a way to "enforce materialization" of the recursive query
in a single SELECT (without using a VIEW), it would be nice if he/she could post it here.

Thanks a lot & Best Regards,
Iudith

Chris Saxon
July 22, 2019 - 9:55 am UTC

Nice solutions Iudith!

I've not found a way to avoid the ORA-32039 issue in your previous solution.

This method is coming out faster for me though, so it's probably best sticking with this ;)

Though any NOT EXISTS solutions will further slow processing down. I'm not sure there's a way to avoid this in a single statement. Pre-computing the result or using graph technologies may be the way to solve this problem.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.