Skip to Main Content
  • Questions
  • Doing joins on denormalized data in tables

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Sagar.

Asked: August 20, 2018 - 5:05 am UTC

Last updated: August 22, 2018 - 3:50 am UTC

Version: 18.3.0.0.0

Viewed 1000+ times

You Asked

Hi Team

RDBMS version: 12.2
OEL: 7.5

Trying to evaluate new table structure to query data, which I've heard is very widely used these days in e-commerce companies at scale. Though most of those companies might be using NoSQL solutions for the same.

Brief desc about tables:
Tab1: Data for storing Orgs
Tab2: Data for storing all entities (Eg:- User, Role) in Orgs
Tab3: Data for storing all columns for all entities (Eg:- User) in Orgs
Tab4: Data for storing no of instances for all entities in Orgs (Eg:- How many users in Org)
Tab5: Data for storing values of all columns for all entities (Eg:- User) in Orgs

In normalized scheme of things, you would have individual tables for the respective data in TAB2.NAME column with:
- TAB3.NAME containing the corresponding column names for those tables
- TAB5.VALUE containing the corresponding values for those table columns

LiveSQL script contains the re-producible use case I am trying to run.
Issue: Not able to perform join operation using two entities to get desired result.

Thanks in advance..

with LiveSQL Test Case:

and Connor said...

Wow.... I cannot tell you how much I *hate* those kind of models. You're basically going to be doomed here. I know that sounds dramatic, but history is littered with these disasters, eg

https://www.red-gate.com/simple-talk/opinion/opinion-pieces/bad-carma/

https://asktom.oracle.com/pls/asktom/asktom.search?tag=query-on-design

https://asktom.oracle.com/pls/asktom/asktom.search?tag=dynamic-query-containing-blob-which-contains-different-field

The easiest proof here is the very question you've asked - it should be a simple requirement, yet you already are into effectively a 10 table join ( 2 x 5) and you still don't have exactly the answer you are after.

I can't stress enough - stop here. That data model won't succeed. Period.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent and job-saving advice!

Stew Ashton, August 22, 2018 - 10:53 am UTC

I would just add that even your horrible data model is not perfect: you put the User's name in the Role>Assigned-to field instead of putting the Id.

Also, in your query you don't even need the Role, since you have the User's name as input in the first place. I realize you were trying to come up with a less simple query. How about starting from the Role Id, not the assigned_to name?

I am going to try to come up with a comprehensible answer to your question, only because I think it gives you a stronger argument to give up this approach. If you just say "I don't know how to do this" the deciders will blame you and not the approach.

To simplify, I'm going to create views on each table to keep the columns of interest and avoid duplicate column names.
create or replace view v1 as
select id tab1id, name name1 from tab1;
create or replace view v2 as
select id tab2id, name name2, tab1id from tab2;
create or replace view v3 as
select id tab3id, tab1id, name name3, type type3, tab2id from tab3;
create or replace view v4 as
select id tab4id, tab1id, tab2id from tab4;
create or replace view v5 as
select TAB1ID, TAB2ID, TAB3ID, TAB4ID, id tab5id, value value5 from tab5;
Now here's a first approach that just joins all the views, filters to get only the data needed, then follows the path from the Role Id to the managername.
with joined_data as (
  select NAME2, NAME3, TAB4ID, VALUE5
  from v1 join v2 using(tab1id)
  join v3 using(tab1id, tab2id)
  join v4 using(tab1id, tab2id)
  join v5 using(tab1id, tab2id, tab3id, tab4id)
  where name1 = 'Org1'
  and (name2, name3) in (
    ('User','name'), ('User','managername'), ('Role','id'), ('Role','assigned_to')
  )
)
select value5 from joined_data
where (name2, name3) = (('User','managername'))
and tab4id = (
  select tab4id from joined_data
  where (name2, name3) = (('User','name'))
    and value5 = (
      select value5 from joined_data
      where (name2, name3) = (('Role','assigned_to'))
        and tab4id = (
          select tab4id from joined_data
          where (name2, name3) = (('Role','id'))
          and value5 = '1'
        )
    )
);

VALUE5
------------
ManagerTest1
Now here is an alternative approach based on views that "normalize" the data.
create or replace view v_orgs as
select id org_id, name org_name from tab1;

create or replace view v_users as
select tab1id org_id,
  to_number(id) id,
  name,
  to_date(begindate, 'dd-Mon-rr') begindate,
  managername
from (
  select TAB1ID, TAB4ID, NAME3, VALUE5
  from v2
  join v3 using(tab1id, tab2id)
  join v4 using(tab1id, tab2id)
  join v5 using(tab1id, tab2id, tab3id, tab4id)
  where name2 = 'User'
)
pivot(max(value5) for name3 in (
  'id' id,'name' name,'begindate' begindate,'managername' managername)
);

create or replace view v_roles as
select tab1id org_id,
  to_number(id) id,
  rolename,
  assigned_to,
  to_date(startdate, 'dd-Mon-rr') startdate  
from (
  select TAB1ID, TAB4ID, NAME3, VALUE5
  from v2
  join v3 using(tab1id, tab2id)
  join v4 using(tab1id, tab2id)
  join v5 using(tab1id, tab2id, tab3id, tab4id)
  where name2 = 'Role'
)
pivot(max(value5) for name3 in (
  'id' id,'rolename' rolename,'assigned_to' assigned_to,'startdate' startdate)
);

select ORG_ID, ROLENAME, ASSIGNED_TO, MANAGERNAME
from v_roles r
join v_users u using(org_id)
where r.assigned_to = u.name;

ORG_ID   ROLENAME    ASSIGNED_TO   MANAGERNAME    
1        TestRole1   TestUser1     ManagerTest1   
1        TestRole2   TestUser2     ManagerTest2

What you should do is create tables instead of views, then get rid of the old tables and use the new ones. Your query becomes just four lines of code.

Best regards,
Stew Ashton

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.