I would like to use an existing functionality or function or by what other means for automatic selection of quality data for all tables in the tablespace/schema that follows through the foreign keys relation until the end of the chain (the most complete ones rather than incomplete rows) as I will be only exporting a limited number of rows (e.g. 100).
For example, I have 3 tables:
Table 1 Employees: Emp_Id, Emp_Name (with rows: 1, Tom Brown | 2, Mike Lane | 3, Jo Goodwing)
Table 2 Orders: Order_Id, Order_Desc, Emp_Id (with rows: 1, 'Printer', 2 | 2, 'USB Cable', 3)
Table 3 Ship: Ship_Id, Order_Id (with rows: 1, 3)
For argument sake, I want to export only 1 row of quality data (as described above), then the automatic selection would choose:
Row 3 from Employees table
Row 2 from Orders table
Row 1 from Ship table
..as the chain is complete (or it would select the rows that have the most complete chain in the whole schema/tablespace)
Any suggestions how I could export 100 rows of each table in the tablespace/schema that automatically selects only the rows that have the most complete relational chain?
So you want to extract rows from a parent table and its children where a row exists in every possible child table?
For example, if you have the following three tables, you only want to extract the rows with the value 1. Because that's the only value from T1 which has a child all the way down to T3:
create table t1 (
t1_id int primary key
);
create table t2 (
t2_id int unique,
t1_fk int references t1(t1_id)
);
create table t3 (
t3_id int primary key,
t2_fk int references t2(t2_id)
);
insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);
insert into t2 values (1, 1);
insert into t2 values (2, 2);
insert into t3 values (1, 1);
If so, I'm not aware of a way to do this "automatically". You'll need to work through the FK relationships, building up queries as you go. The easiest way to do this is start with the last child and work back towards the parent.
Each time, you want to add a subquery, finding all the rows in the previous child that exist the current table.
This will give you a series of queries like:
select * from T3 where rownum <= 10;
select * from T2 where T2_ID in ( select T2_FK from T3 ) and rownum <= 10;
select * from T1 where T1_ID in ( select T1_FK from T2 where T2_ID in (select T2_FK from T3) ) and rownum <= 10;
The starting point for this could be:
declare
sql_stmt varchar2(4000);
sub_sql_stmt varchar2(4000);
child_table user_tables.table_name%type;
child_cols varchar2(100);
begin
select p.table_name, ucc.column_name
into child_table, child_cols
from user_constraints p
join user_cons_columns ucc
on p.constraint_name = ucc.constraint_name
where p.table_name like 'T_'
and p.constraint_type in ('P', 'U')
-- tables with no children
and not exists (
select * from user_constraints c
where p.constraint_name = c.r_constraint_name
and p.owner = c.r_owner
)
-- that do have a parent
and exists (
select * from user_constraints gp
where p.table_name = gp.table_name
and gp.constraint_type = 'R'
);
for tabs in (
select t.table_name, pc.column_name par_col, cc.column_name child_col, level l
from user_tables t
join user_constraints p
on t.table_name = p.table_name
and p.constraint_type in ('U', 'P')
join user_cons_columns pc
on p.constraint_name = pc.constraint_name
left join user_constraints c
on t.table_name = c.table_name
and c.constraint_type='R'
left join user_cons_columns cc
on c.constraint_name = cc.constraint_name
start with t.table_name = child_table
connect by p.constraint_name = prior c.r_constraint_name
) loop
child_cols := tabs.child_col;
if tabs.l = 1 then
sql_stmt := 'select * from ' || tabs.table_name || ' where rownum <= 10';
sub_sql_stmt := 'select ' || child_cols || ' from ' || tabs.table_name;
else
sql_stmt := 'select * from ' || tabs.table_name ||
' where ' || tabs.par_col || ' in ( ' || sub_sql_stmt || ' ) and rownum <= 10';
sub_sql_stmt := 'select ' || child_cols || ' from ' || tabs.table_name ||
' where ' || tabs.par_col || ' in (' || sub_sql_stmt || ')';
end if;
dbms_output.put_line (sql_stmt || ';');
end loop;
end;
/
select * from T3 where rownum <= 10;
select * from T2 where T2_ID in ( select T2_FK from T3 ) and rownum <= 10;
select * from T1 where T1_ID in ( select T1_FK from T2 where T2_ID in (select T2_FK from T3) ) and rownum <= 10;
The general solution to this requires a lot more work though. For example, say you add another table, T4, which is a child of T1:
create table t4 (
t4_id int primary key,
t1_fk int references t1(t1_id)
);
insert into t4 values (2, 2);
There are no rows in T3 & T4 with a common parent in T1. So what does "most complete" mean in this context? Are the chains T3 -> T2 -> T1 and T4 -> T1 both complete or neither?
In real world schemas your trees will be far messier that this. So it gets painful to validate if they all trees need a common parent. Say you have three trees, one 9 deep, one 7 deep and the last 5 deep. How do you ensure all have the same (great great great ... ) grandparent?
For a general solution you'll also need to deal with:
- Multi-column primary and foreign keys
- Self-referencing foreign keys (e.g. the classic manager_id -> emp_id FK on employees)
- Larger cycles in FKs
- Multiple PK and UCs on the same table, both with children
- ... and probably some other stuff I've forgotten
So you're probably better off manually crafting the queries you want!