Skip to Main Content
  • Questions
  • Automatic Selection of Quality Limited Data (row) for Export

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Cem.

Asked: October 20, 2016 - 5:15 am UTC

Last updated: October 24, 2016 - 12:34 pm UTC

Version: 10g 10.2.0

Viewed 1000+ times

You Asked

I would like to use an existing functionality or function or by what other means for automatic selection of quality data for all tables in the tablespace/schema that follows through the foreign keys relation until the end of the chain (the most complete ones rather than incomplete rows) as I will be only exporting a limited number of rows (e.g. 100).

For example, I have 3 tables:

Table 1 Employees: Emp_Id, Emp_Name (with rows: 1, Tom Brown | 2, Mike Lane | 3, Jo Goodwing)

Table 2 Orders: Order_Id, Order_Desc, Emp_Id (with rows: 1, 'Printer', 2 | 2, 'USB Cable', 3)

Table 3 Ship: Ship_Id, Order_Id (with rows: 1, 3)

For argument sake, I want to export only 1 row of quality data (as described above), then the automatic selection would choose:

Row 3 from Employees table

Row 2 from Orders table

Row 1 from Ship table

..as the chain is complete (or it would select the rows that have the most complete chain in the whole schema/tablespace)

Any suggestions how I could export 100 rows of each table in the tablespace/schema that automatically selects only the rows that have the most complete relational chain?

and Chris said...

So you want to extract rows from a parent table and its children where a row exists in every possible child table?

For example, if you have the following three tables, you only want to extract the rows with the value 1. Because that's the only value from T1 which has a child all the way down to T3:

create table t1 (
  t1_id int primary key
);
create table t2 (
  t2_id int unique,
  t1_fk int references t1(t1_id)
);
create table t3 (
  t3_id int primary key,
  t2_fk int references t2(t2_id)
);

insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);

insert into t2 values (1, 1);
insert into t2 values (2, 2);

insert into t3 values (1, 1);


If so, I'm not aware of a way to do this "automatically". You'll need to work through the FK relationships, building up queries as you go. The easiest way to do this is start with the last child and work back towards the parent.

Each time, you want to add a subquery, finding all the rows in the previous child that exist the current table.

This will give you a series of queries like:

select * from T3 where rownum <= 10;
select * from T2 where T2_ID in ( select T2_FK from T3 ) and rownum <= 10;
select * from T1 where T1_ID in ( select T1_FK from T2 where T2_ID in (select T2_FK from T3) ) and rownum <= 10;


The starting point for this could be:

declare
  sql_stmt    varchar2(4000);
  sub_sql_stmt varchar2(4000);
  child_table user_tables.table_name%type;
  child_cols  varchar2(100);
begin
  select p.table_name, ucc.column_name
  into   child_table, child_cols 
  from   user_constraints p
  join   user_cons_columns ucc
  on     p.constraint_name = ucc.constraint_name
  where  p.table_name like 'T_'
  and    p.constraint_type in ('P', 'U')
  -- tables with no children
  and    not exists (
    select * from user_constraints c
    where  p.constraint_name = c.r_constraint_name
    and    p.owner = c.r_owner
  )
  -- that do have a parent
  and    exists (
    select * from user_constraints gp
    where  p.table_name = gp.table_name
    and    gp.constraint_type = 'R'
  );
  
  for tabs in (
    select t.table_name, pc.column_name par_col, cc.column_name child_col, level l
    from   user_tables t
    join   user_constraints p
    on     t.table_name = p.table_name 
    and    p.constraint_type in ('U', 'P')
    join   user_cons_columns pc
    on     p.constraint_name = pc.constraint_name
    left join user_constraints c
    on     t.table_name = c.table_name 
    and    c.constraint_type='R'
    left join user_cons_columns cc
    on     c.constraint_name = cc.constraint_name
    start with t.table_name = child_table
    connect by p.constraint_name = prior c.r_constraint_name
  ) loop

    child_cols := tabs.child_col;
    if tabs.l = 1 then 
      sql_stmt := 'select * from ' || tabs.table_name || ' where rownum <= 10';
      sub_sql_stmt := 'select ' || child_cols || ' from ' || tabs.table_name;
    else
      sql_stmt := 'select * from ' || tabs.table_name || 
      ' where ' || tabs.par_col || ' in ( ' || sub_sql_stmt || ' ) and rownum <= 10';
      sub_sql_stmt := 'select ' || child_cols || ' from ' || tabs.table_name || 
        ' where ' || tabs.par_col || ' in (' || sub_sql_stmt || ')';
    end if;
  
    dbms_output.put_line (sql_stmt || ';');

  end loop;
end;
/

select * from T3 where rownum <= 10;
select * from T2 where T2_ID in ( select T2_FK from T3 ) and rownum <= 10;
select * from T1 where T1_ID in ( select T1_FK from T2 where T2_ID in (select T2_FK from T3) ) and rownum <= 10;


The general solution to this requires a lot more work though. For example, say you add another table, T4, which is a child of T1:

create table t4 (
  t4_id int primary key,
  t1_fk int references t1(t1_id)
);

insert into t4 values (2, 2);


There are no rows in T3 & T4 with a common parent in T1. So what does "most complete" mean in this context? Are the chains T3 -> T2 -> T1 and T4 -> T1 both complete or neither?

In real world schemas your trees will be far messier that this. So it gets painful to validate if they all trees need a common parent. Say you have three trees, one 9 deep, one 7 deep and the last 5 deep. How do you ensure all have the same (great great great ... ) grandparent?

For a general solution you'll also need to deal with:

- Multi-column primary and foreign keys
- Self-referencing foreign keys (e.g. the classic manager_id -> emp_id FK on employees)
- Larger cycles in FKs
- Multiple PK and UCs on the same table, both with children
- ... and probably some other stuff I've forgotten

So you're probably better off manually crafting the queries you want!

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks

Cem Sultan, October 24, 2016 - 12:18 pm UTC

Thank you very much for your detailed response - very useful. I also thought that it would be easier to do it manually, but I asked the question just in case.
Chris Saxon
October 24, 2016 - 12:34 pm UTC

Thanks, happy to help.

More to Explore

DBMS_OUTPUT

More on PL/SQL routine DBMS_OUTPUT here