Skip to Main Content
  • Questions
  • Group multiple rows from Oracle Table where two rows have few matching column values along with having similar related rows

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Sudhee.

Asked: August 28, 2023 - 3:02 pm UTC

Last updated: September 12, 2023 - 1:02 pm UTC

Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

Viewed 1000+ times

You Asked

Hi All,
I am a new developer to Oracle SQL and PL/SQL. I have the below scenario, if you could please help me find the most optimal solution for this I would be very grateful. If there is any other thread already which shows the solution to similar scenario please let me know. Thanks in advance!

I have a Table with the below set of Rows (Link for Create table and Insert data into table present in the LiveSQL link).

Post Insertion table will look as shown below:
----------------------------------------------
Asset_ID Item_Name Category Vendor  Location_ID Unit_Price Quantity Parent_Or_Child
-------- --------- -------- ------  ----------- ---------- -------- ----------------
100.000  CPU  Hardware 10001  12345  1000  1  Parent
100.100  Mouse  Hardware 10001  12345  50  2  Child
100.100  Keyboard Hardware 10001  12345  100  1  Child 
100.100  OS  Software 10001  12345  200  1  Child
100.100  Assemble Install  10001  12345  150  1  Child
100.100  Shipping Freight  10001  12345  100  1  Child

101.000  Laptop  Hardware 10002  14567  800  1    Parent 
101.101  Mouse  Hardware 10002  14567  50  2  Child
101.101  Keyboard Hardware 10002  14567  100  1  Child
101.101  OS  Software 10002  14567  200  1  Child

102.000  CPU  Hardware 10001  12345  1000  1  Parent
102.102  Mouse  Hardware 10001  12345  50  2  Child 
102.102  Keyboard Hardware 10001  12345  100  1  Child
102.102  OS  Software 10001  12345  200  1  Child
102.102  Assemble Install  10001  12345  150  1  Child
102.102  Shipping Freight  10001  12345  100  1  Child

103.000  Laptop  Hardware 10001  12345  1000  1  Parent
103.103  Mouse  Hardware 10001  12345  50  2  Child
103.103  Keyboard Hardware 10001  12345  100  1  Child
103.103  OS  Software 10001  12345  200  1  Child
103.103  Assemble Install  10001  12345  150  1  Child
103.103  Shipping Freight  10001  12345  100  1  Child


Any row having an Asset ID ending with .000 is a Parent and items ending with .100, .101 etc. are the child records. So in the above data set, 100.000 is the Parent and 100.100 are its Child records. There is also another column 'Parent_Or_Child' which identifies whether the row is a Parent or Child. The Asset_Id column indicates the row is a child of which parent (ex 100.100 is a child of 100.000, 101.101 is a child of of 101.000, etc.). All this setup is done by another system and we get the table with these details filled.

There will be only one level of Parent - Child, i.e. a child wont have another Child record of its own.

The requirement is to combine multiple Parent and Child rows into a single Parent and its corresponding Child lines where
1. The Parents have same values of:
Item_Name
Category
Vendor
Location_ID
Unit_Price

2. The Parents should have the same number of Child records.

3. The Child record under one Parent should have a corresponding Child record in the other Parent having the same values of the below columns:
Item_Name
Category
Vendor
Location_ID
Unit_Price

The Quantity of such matching rows should be summed up when being combined.

So in the above data set, Asset_ID 100.000 and 102.000 satisfy Point # 1,2 and 3. They should be considered a single combined row as shown in the below final data set.
The Quantity is summed up when combining the matching Asset lines.

I tried to do an Inner Join on the table and identify the matching records, but after this I wasnt able to figure out how can I combine the matching records, sum up the quantities and find a way to link the Parent and Child of the newly combined rows.

Could you please let me know how Point#1, 2 and 3 can be done either through SQL or PL/SQL? Even if the final data needs to be inserted into another fresh table for the solution that would be fine.


Final Data Set Required:-
===========================

Asset_ID Item_Name Category Vendor Location_ID Unit_Price Quantity Parent_Or_Child
--------- --------- -------- ------- ----------- ---------- -------- ---------------
--Below is how the combined row of Asset_ID 100.000 and 102.000 along with its Child rows should come up in the final output since these 2 sets of rows satisfy the criteria given in Point # 1 and 2. I have given a new Asset_ID for this combined row that can be used to link the Parent with the Child
200.000 CPU Hardware 10001 12345 1000 2 Parent

200.200 Mouse Hardware 10001 12345 50 4 Child
200.200 Keyboard Hardware 10001 12345 100 2 Child
200.200 OS Software 10001 12345 200 2 Child
200.200 Assemble Install 10001 12345 150 2 Child
200.200 Shipping Freight 10001 12345 100 2 Child


--Below is how the rest of the rows that werent able to be combined should come up in the final output since they dont match the criteria and need to be shown/fetched in their original state
101.000 Laptop Hardware 10002 14567 800 1 Parent
101.101 Mouse Hardware 10002 14567 50 2 Child
101.101 Keyboard Hardware 10002 14567 100 1 Child
101.101 OS Software 10002 14567 200 1 Child

103.000 Laptop Hardware 10001 12345 1000 1 Parent
103.103 Mouse Hardware 10001 12345 50 2 Child
103.103 Keyboard Hardware 10001 12345 100 1 Child
103.103 OS Software 10001 12345 200 1 Child
103.103 Assemble Install 10001 12345 150 1 Child
103.103 Shipping Freight 10001 12345 100 1 Child


Insert statements for the Final Data Set Required:
--------------------------------------------------
I am adding the Insert statements for the above shown Final Data set for reference if it helps visualize better by selecting from the table as I wasnt able to add it to the LiveSQL Link.

DELETE FROM ASSETS;

COMMIT;

insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.000', 'CPU', 'Hardware', 10001, 12345, 1000, 2,'Parent');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.200', 'Mouse', 'Hardware', 10001, 12345, 50, 4, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.200', 'Keyboard', 'Hardware', 10001, 12345, 100, 2, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.200', 'OS', 'Software', 10001, 12345, 200, 2, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.200', 'Assemble', 'Install', 10001, 12345, 150, 2, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('200.200', 'Shipping', 'Freight', 10001, 12345, 100, 2, 'Child');


insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.000', 'Laptop', 'Hardware', 10002, 14567, 800, 1,'Parent');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'Mouse', 'Hardware', 10002, 14567, 50, 2, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'Keyboard', 'Hardware', 10002, 14567, 100, 1, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'OS', 'Software', 10002, 14567, 200, 1, 'Child');


insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.000', 'Laptop', 'Hardware', 10001, 12345, 1000, 1,'Parent');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Mouse', 'Hardware', 10001, 12345, 50, 2, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Keyboard', 'Hardware', 10001, 12345, 100, 1, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'OS', 'Software', 10001, 12345, 200, 1, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Assemble', 'Install', 10001, 12345, 150, 1, 'Child');
insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Shipping', 'Freight', 10001, 12345, 100, 1, 'Child');

Commit;

SELECT * FROM ASSETS;


with LiveSQL Test Case:

and Chris said...

So you want to:

- Find the asset id groups that are identical
- Combine the identical groups into one set of rows
- If there's more than one group for an asset set, return the total quantity for each item and assign new asset ids

Correct?

If so, here's one way to approach it:

- Use a recursive query to group together rows by asset
- Combine the rows for each asset into one JSON array, with each item as a JSON object within it
- Match identical groups by group by these JSON arrays

Recursive query

Start each group with the parent row. Then build the tree by checking if:

The parent asset ID equals the number after the dot for each child. Assuming these are always in XXX.YYY format, you can use this formula:

( asset_id - trunc ( asset_id ) ) * 1000

If not, you change this as needed.

Add the parent asset ID to every row in each group. For example:

select connect_by_root asset_id rt, a.*
from   assets a
start with parent_or_child = 'Parent'
connect by ( asset_id - trunc ( asset_id ) ) * 1000 = prior asset_id


JSON array

Group the result of this query by the parent asset and use JSON_arrayagg to combine the rows into one array:

with tree as (
  select connect_by_root asset_id rt, a.*
  from   assets a
  start with parent_or_child = 'Parent'
  connect by ( asset_id - trunc ( asset_id ) ) * 1000 = prior asset_id
)
  select rt, json_arrayagg (
           json_object ( item_name, category, vendor, location_id, unit_price )
           order by item_name, category, vendor, location_id, unit_price
         ) grp
  from   tree
  group  by rt


Combine identical groups

Join the generated tree to the generated groups on the parent asset ID. Combine rows from identical sets by grouping by the generated JSON array and the column values.

Sum up the quantities.

I've generated a new asset ID by summing the distinct parent IDs. So you'll need to do some work to get these back in XXX.YYY format; hopefully there's enough here to get you going:

with tree as (
  select connect_by_root asset_id rt, a.*
  from   assets a
  start with parent_or_child = 'Parent'
  connect by ( asset_id - trunc ( asset_id ) ) * 1000 = prior asset_id
), grps as (
  select rt, json_arrayagg (
           json_object ( item_name, category, vendor, location_id, unit_price )
           order by item_name, category, vendor, location_id, unit_price
         ) grp
  from   tree
  group  by rt
)
  select sum ( distinct t.rt ),
         item_name, category, vendor, location_id, unit_price,
         sum ( quantity ) quantity
  from   grps g
  join   tree t
  on     t.rt = g.rt
  group  by grp, item_name, category, vendor, location_id, unit_price
  order  by min ( t.rt );
  
SUM(DISTINCTT.RT)    ITEM_NAME    CATEGORY       VENDOR    LOCATION_ID    UNIT_PRICE    QUANTITY    
                 202 OS           Software        10001          12345           200           2 
                 202 CPU          Hardware        10001          12345          1000           2 
                 202 Mouse        Hardware        10001          12345            50           4 
                 202 Assemble     Install         10001          12345           150           2 
                 202 Keyboard     Hardware        10001          12345           100           2 
                 202 Shipping     Freight         10001          12345           100           2 
                 
                 101 Laptop       Hardware        10002          14567           800           1 
                 101 Keyboard     Hardware        10002          14567           100           1 
                 101 Mouse        Hardware        10002          14567            50           2 
                 101 OS           Software        10002          14567           200           1 
                 
                 103 Keyboard     Hardware        10001          12345           100           1 
                 103 Shipping     Freight         10001          12345           100           1 
                 103 Assemble     Install         10001          12345           150           1 
                 103 Laptop       Hardware        10001          12345          1000           1 
                 103 Mouse        Hardware        10001          12345            50           2 
                 103 OS           Software        10001          12345           200           1 


This does make some assumptions - most importantly that the JSON array for each asset group will be small enough to fit in a VARCHAR2 (4,000 bytes by default). You can't group by LOBs, so if they're bigger than this you'll need another way.

Table and sample data copied from Live SQL:

create table assets (
  asset_id        varchar2 (30),
  item_name       varchar2 (240),
  category        varchar2 (30),
  vendor          number,
  location_id     number,
  unit_price      number,
  quantity        number,
  parent_or_child varchar2 (30)
);

begin
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.000', 'CPU', 'Hardware', 10001, 12345, 1000, 1,'Parent');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.100', 'Mouse', 'Hardware', 10001, 12345, 50, 2, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.100', 'Keyboard', 'Hardware', 10001, 12345, 100, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.100', 'OS', 'Software', 10001, 12345, 200, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.100', 'Assemble', 'Install', 10001, 12345, 150, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('100.100', 'Shipping', 'Freight', 10001, 12345, 100, 1, 'Child');

 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.000', 'Laptop', 'Hardware', 10002, 14567, 800, 1,'Parent');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'Mouse', 'Hardware', 10002, 14567, 50, 2, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'Keyboard', 'Hardware', 10002, 14567, 100, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('101.101', 'OS', 'Software', 10002, 14567, 200, 1, 'Child');

 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.000', 'CPU', 'Hardware', 10001, 12345, 1000, 1,'Parent');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.102', 'Mouse', 'Hardware', 10001, 12345, 50, 2, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.102', 'Keyboard', 'Hardware', 10001, 12345, 100, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.102', 'OS', 'Software', 10001, 12345, 200, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.102', 'Assemble', 'Install', 10001, 12345, 150, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('102.102', 'Shipping', 'Freight', 10001, 12345, 100, 1, 'Child');

 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.000', 'Laptop', 'Hardware', 10001, 12345, 1000, 1,'Parent');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Mouse', 'Hardware', 10001, 12345, 50, 2, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Keyboard', 'Hardware', 10001, 12345, 100, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'OS', 'Software', 10001, 12345, 200, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Assemble', 'Install', 10001, 12345, 150, 1, 'Child');
 insert into Assets (Asset_ID, Item_Name, Category, Vendor, Location_ID, Unit_Price, Quantity, Parent_Or_Child) values ('103.103', 'Shipping', 'Freight', 10001, 12345, 100, 1, 'Child');
end;
/

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Alternate solution for data more than 4000 characters

Sudhee, August 31, 2023 - 2:34 pm UTC

Hi Tom,
Thanks for your response and also for understanding the problem so succintly!

I will check on getting the Asset IDs back in XXX.YYY format.

Is there any alternate solution or can the current solution be modified to handle the 4000 Character limitation of the JSON Array? Performance or space is not an issue and if there is any solution through PLSQL etc. which might take longer or needs to store data in a temp table is also ok.

Also in the current solution that you had provided, I couldn't identify which is the parent line in the final combined set of rows. So I modified your query to add the Parent_Or_Child Column as shown below, please let me know if thats fine.

with tree as (
  select connect_by_root asset_id rt, a.*
  from   assets a
  start with parent_or_child = 'Parent'
  connect by ( asset_id - trunc ( asset_id ) ) * 1000 = prior asset_id
), grps as (
  select rt, json_arrayagg (
           json_object ( item_name, category, vendor, location_id, unit_price, parent_or_child )
           order by item_name, category, vendor, location_id, unit_price, parent_or_child
         ) grp
  from   tree
  group  by rt
)
  select sum ( distinct t.rt ),
         item_name, category, vendor, location_id, unit_price,
         sum ( quantity ) quantity, parent_or_child
  from   grps g
  join   tree t
  on     t.rt = g.rt
  group  by grp, item_name, category, vendor, location_id, unit_price, parent_or_child
  order  by min ( t.rt ); 

Chris Saxon
September 01, 2023 - 12:53 pm UTC

I'm Chris, but thanks anyway! ;)

If you have enabled extended data types the max string length is 32,767; as long as there's relatively few rows in each group this should be plenty. If not you'll need another solution!

The answer is just an example to get you started; they're your requirements so update however you need to get your desired output!

Thanks!

A reader, September 11, 2023 - 3:15 pm UTC

Sorry about that Chris, that was a brain fart! :)

Thanks for your detailed inputs, appreciate it!

We dont have Extended Data Types enabled, I will try to verify what size data would we get in the transactions and if it can go above the limit. I will also try to check for some alternate solutions too if available.
Chris Saxon
September 12, 2023 - 1:02 pm UTC

This is a form of relational division - searching for that will throw up alternatives. Many of these solutions use subqueries => reading the same rows many times => slower.

It's worth looking into extended data types. Besides enabling you to have longer VARCHAR2, this is a prerequisite for column-based collation. Use this to make all comparisons on a column case-insensitive by default.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.