Question and Answer

Chris Saxon

Thanks for the question, Richard.

Asked: August 05, 2020 - 12:34 pm UTC

Last updated: August 06, 2020 - 4:40 pm UTC

Version: 19c

Viewed 1000+ times

You Asked

I have 1000 cubes weighing between 360 and 430 grams each. I need to choose a set of cubes that weighs 50,000 grams in total. Do you have any idea how to do this in SQL or PL/SQL?

CREATE TABLE CUBES
(
  ID      VARCHAR2(100 CHAR),
  WEIGHT  NUMBER(10,3)
);

BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO cubes VALUES (i, round(dbms_random.value( 360, 430), 3));
  END LOOP;
  COMMIT;
END;

and Chris said...

So you want to find all sets of rows where the sum of the weights is 50 thousand?

If so, a few things to bear in mind:

- There's no guarantee such a group exists in your data! (The quick sanity check below shows roughly what one would have to look like.)
- This is a form of the bin packing problem, which is NP-complete. That means there are no known algorithms that are both fast and guaranteed to find the best answer.
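
As a quick sanity check on that first point (this query isn't in the original answer, it's just an illustration): every cube weighs between 360 and 430 grams, so any group totalling 50,000 grams must contain between ceil(50000/430) = 117 and floor(50000/360) = 138 cubes.

-- Bounds implied by the data: the fewest and most cubes a 50,000 g group could use
select count (*)                         as num_cubes,
       min ( weight )                    as min_weight,
       max ( weight )                    as max_weight,
       ceil  ( 50000 / max ( weight ) )  as min_cubes_needed,
       floor ( 50000 / min ( weight ) )  as max_cubes_possible
from   cubes;

(The closest group found further down uses 127 cubes, comfortably inside that range.)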

If you're willing to take some shortcuts, you can use match_recognize to find close values quickly and easily. With one pass through the data, you can get reasonably close:

select *
from   cubes match_recognize (
  measures 
    sum ( weight ) as total,          -- total weight of the matched group
    match_number () as grp            -- which match (group) the rows belong to
  pattern ( total+ )                  -- one or more consecutive rows...
  define 
    total as sum ( weight ) <= 50000  -- ...while the running total stays <= 50,000
)
order  by total desc;

TOTAL           GRP   
   49943.371      6 
   49929.287      3 
    49878.49      5 
   49785.969      4 
    49770.08      1 
   49763.964      2 
   49636.947      7 
    47082.75      8


Close, but still 50+ grams short of the goal. Let's try to get closer. It'd also be nice to see which cubes make up these totals.

We can address both of these by adding these clauses:

all rows per match
after match skip to next row


This means we'll see every matched row in the output. It also means the database revisits each row many times, so the same cube can appear in many groups.
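
To see where these clauses sit, here's the first query again with just those two additions (an intermediate version that isn't shown in the original answer; output omitted). Note the measure becomes final sum ( weight ), as in the combined query below, so every row reports the whole group's total rather than the running sum:

select *
from   cubes match_recognize (
  measures 
    final sum ( weight ) as total,  -- final: the full match total on every row
    match_number () as grp
  all rows per match                -- return every matched row, not one row per group
  after match skip to next row      -- restart matching at the next row, so a cube can
                                    -- appear in several groups
  pattern ( total+ )
  define 
    total as sum ( weight ) <= 50000
);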

To find the cubes in the group with the total weight closest to 50,000, include:

max ( grp ) keep (
 dense_rank first order by total desc
) over () max_grp


Put this in a subquery, then in the outer query filter for the rows whose group equals this max. Giving:

with rws as ( 
  select grps.*,
         max ( grp ) keep (
           dense_rank first order by total desc
         ) over () max_grp
  from   cubes match_recognize (
    measures 
      final sum ( weight ) as total,
      match_number () as grp
    all rows per match
    after match skip to next row
    pattern ( total+ )
    define 
      total as sum ( weight ) <= 50000
  ) grps
)
  select * from rws
  where  grp = max_grp;
  
TOTAL           GRP ID         WEIGHT    MAX_GRP   
   49999.505    541 35        369.375        541 
   49999.505    541 36        367.181        541 
   ...
   49999.505    541 160       389.137        541 
   49999.505    541 161       373.928        541 

127 rows selected. 


Now we're much closer. And the query is still fast. But we're walking through the rows in the same order every time.

We can try to get (a bit) closer by first sorting the table in a random order, then re-running the query a few times. For example:

with crand as (
  select * from cubes
  order  by dbms_random.value
), rws as ( 
  select ...
  from   crand match_recognize (
  ...
)
  select * from rws
  where  grp = max_grp;
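
Spelled out in full, that could look like the following; nothing here is new, it just reuses the measures, pattern and define from the previous query on top of the randomly ordered rows:

with crand as (
  select * from cubes
  order  by dbms_random.value
), rws as ( 
  select grps.*,
         max ( grp ) keep (
           dense_rank first order by total desc
         ) over () max_grp
  from   crand match_recognize (
    measures 
      final sum ( weight ) as total,
      match_number () as grp
    all rows per match
    after match skip to next row
    pattern ( total+ )
    define 
      total as sum ( weight ) <= 50000
  ) grps
)
  select * from rws
  where  grp = max_grp;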


This is still quick. But there's no guarantee you'll find a group totalling exactly 50k, or the closest possible group if no exact match exists.
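
One way to automate those re-runs is a short PL/SQL loop that repeats the randomised query and remembers the best total it has seen. This is only a sketch: the view name cube_groups_random is invented for the example, and it wraps the one-row-per-match form of the query, so each run returns just the group totals (putting the match_recognize inside a view also keeps it out of the PL/SQL block itself). To capture the actual cube IDs of the best group you'd switch to the all-rows-per-match form and store its rows:

-- Hypothetical helper view: the first query, but over randomly ordered rows,
-- so every query against it sees the cubes in a new order
create or replace view cube_groups_random as
  with crand as (
    select * from cubes
    order  by dbms_random.value
  )
  select *
  from   crand match_recognize (
    measures 
      sum ( weight ) as total,
      match_number () as grp
    pattern ( total+ )
    define 
      total as sum ( weight ) <= 50000
  );

declare
  l_total number;
  l_best  number := 0;
begin
  for i in 1 .. 20 loop
    -- each iteration sees a different random ordering, so the groups differ
    select max ( total )
    into   l_total
    from   cube_groups_random;

    if l_total > l_best then
      l_best := l_total;
      dbms_output.put_line ( 'Run ' || i || ': best total so far ' || l_best );
    end if;
  end loop;
end;

Run it with serveroutput on to see the progress lines; twenty iterations is arbitrary, and there's still no guarantee of hitting 50,000 exactly.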

I don't know of a way to find the closest groups in a reasonable time frame. The brute force approach has exponential complexity: even with a small data set of 1,000 rows there are 2^1000 possible subsets to check, which is incredibly SLLLOOOOOOOOOOOOOOOOOOOOOOOOOOW!

Others may know better algorithms you can use to get more accurate results.
