## Question and Answer

## You Asked

I have 1,000 cubes weighing between 360 and 430 grams each. I need to choose a set of cubes which will weigh 50,000 grams in total. Do you have any idea how to do it in SQL or PL/SQL?

```sql
CREATE TABLE CUBES (
  ID     VARCHAR2(100 CHAR),
  WEIGHT NUMBER(10,3)
);

BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO cubes VALUES (i, round(dbms_random.value( 360, 430), 3));
  END LOOP;
  COMMIT;
END;
/
```

## and we said...

So you want to find all sets of rows where the sum of the weights is 50 thousand?


If so, a few things to bear in mind:

- There's no guarantee there is such a group in your data!

- This is a form of the subset-sum problem (a relative of bin packing), which is NP-complete. That means there's no known algorithm that is both fast and guaranteed to find the optimal answer.

If you're willing to take some shortcuts, you can use match_recognize to find close values quickly and easily. With one pass through the data, you can get reasonably close:

```sql
select * from cubes
match_recognize (
  measures
    sum ( weight ) as total,
    match_number () as grp
  pattern ( total+ )
  define
    total as sum ( weight ) <= 50000
)
order by total desc;

    TOTAL   GRP
49943.371     6
49929.287     3
 49878.49     5
49785.969     4
 49770.08     1
49763.964     2
49636.947     7
 47082.75     8
```

Close, but still more than 50 grams short of the goal. Let's try to get closer. It'd also be nice to see which cubes make up this total.

We can address both of these by adding these clauses:

```sql
all rows per match
after match skip to next row
```

This means we'll see every matched row in the output. It also means each row is revisited, so the same cube can appear in many groups.

To find the cubes in the group with the total weight closest to 50,000, include:

```sql
max ( grp ) keep (
  dense_rank first order by total desc
) over () max_grp
```

Put this in a subquery, then filter the rows with the same group as this max in the outer query. Giving:

```sql
with rws as (
  select grps.*,
         max ( grp ) keep (
           dense_rank first order by total desc
         ) over () max_grp
  from   cubes
  match_recognize (
    measures
      final sum ( weight ) as total,
      match_number () as grp
    all rows per match
    after match skip to next row
    pattern ( total+ )
    define
      total as sum ( weight ) <= 50000
  ) grps
)
select * from rws
where  grp = max_grp;

    TOTAL   GRP    ID    WEIGHT   MAX_GRP
49999.505   541    35   369.375       541
49999.505   541    36   367.181       541
...
49999.505   541   160   389.137       541
49999.505   541   161   373.928       541

127 rows selected.
```

Now we're much closer. And the query is still fast. But we're walking through the rows in the same order every time.

We can see if we can get (a bit) closer by first sorting the table by a random value, then re-running the query a few times. For example:

```sql
with crand as (
  select * from cubes
  order  by dbms_random.value
), rws as (
  select ...
  from   crand
  match_recognize ( ... )
)
select * from rws
where  grp = max_grp;
```
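
Written out in full - filling in the elided parts with the measures, pattern, and define from the previous query - the randomised version looks something like this:

```sql
with crand as (
  -- shuffle the cubes so each run walks them in a different order
  select * from cubes
  order  by dbms_random.value
), rws as (
  select grps.*,
         max ( grp ) keep (
           dense_rank first order by total desc
         ) over () max_grp
  from   crand
  match_recognize (
    measures
      final sum ( weight ) as total,
      match_number () as grp
    all rows per match
    after match skip to next row
    pattern ( total+ )
    define
      total as sum ( weight ) <= 50000
  ) grps
)
select * from rws
where  grp = max_grp;
```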

This is still quick. But there's no guarantee you'll get a group totalling exactly 50k, or find the closest possible group if there isn't one.

I don't know of a way to find the closest group in a reasonable time frame. The brute force approach has exponential complexity - with 1,000 rows there are 2^1,000 (around 10^301) possible subsets - so even on this small data set it's incredibly SLLLOOOOOOOOOOOOOOOOOOOOOOOOOOW!

Others may know better algorithms you can use to get more accurate results.
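
One simple heuristic to try (just a sketch, with no optimality guarantee) is a greedy pass in PL/SQL: walk the cubes in random order and keep any cube that still fits under the target. Unlike the match_recognize pattern above, it keeps scanning after a cube would overshoot, so lighter cubes later in the order can fill the remaining gap:

```sql
-- Greedy sketch (not part of the match_recognize approach above):
-- keep any cube that still fits under the 50,000 g target, skip the rest.
-- set serveroutput on to see the picked cubes.
declare
  l_target constant number := 50000;
  l_total  number := 0;
begin
  for r in (
    select id, weight
    from   cubes
    order  by dbms_random.value
  ) loop
    if l_total + r.weight <= l_target then
      l_total := l_total + r.weight;
      dbms_output.put_line ( 'picked cube ' || r.id || ': ' || r.weight || ' g' );
    end if;
  end loop;

  dbms_output.put_line ( 'total: ' || l_total || ' g' );
end;
/
```

Re-running this a few times and keeping the best total is the same idea as the random re-sort above - quick, but still a heuristic.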