Given a manual business process when customers accept a minimum dollar value for shipments, I need to write a query that will consolidate the order lines for the customer such that we show the date their order lines accumulate the minimum shipment value. If the customer has order lines that do not meet the minimum amount, show the date as 30 days after the last ship date for the lines. Following is a description of the manual business process using sample data.
In the sample data on LiveSQL, for example, Cust A has 3 orders with 5 lines and their minimum shipment value is 350. Their total order value is 5360. Cust B has 3 orders with 5 lines and their minimum shipment value is 750. Their total order value is 1875.
Cust A, when reviewed manually, should have the following orders consolidated:
for the first shipment:
Order 1, lines 1 and 2, total value 350, ship date 2017-02-11
second shipment:
order 1 line 3, order 2 line 1, total value 1010, ship date 2017-02-21
third shipment:
order 3 line 1, total value 4000, ship date 2017-03-31
Cust A has no un-consolidated lines.
Cust B, when reviewed manually, should have the following orders consolidated:
for first shipment:
order 4 lines 1 and 2, order 5 line 1, total value 775, ship date 2017-02-11
second shipment:
order 5 line 2, total value 1000, ship date 2017-02-21
Cust B has one un-consolidated line, order 6 line 1, value 100, ship date 2017-04-30.
This is a case for match recognize!
What you're looking for is any number of orders until the total exceeds the min value for the customer, right?
So zero or more under the min, followed by at most one over. Which in regular expressions is:
under_min* over_min{0,1}
Note you need zero or one matches for over_min so you get unconsolidated lines at the end.
Ok so how do you define these variables?
Well under_min is the sum of the lines less than the min, over_min is the sum of those greater than this!
define
under_min as sum(line_amt) < min_ship_value,
over_min as sum(line_amt) >= min_ship_value
In this case sum works like an analytic, giving you the running total.
You want to split the rows up by customer and sort by shipment date. So put these in the partition by and order by clauses respectively.
All that's left is to figure out your columns.
To get the group, use match_number(). This assigns a number to all the rows in the same group. It increments for each new group and resets to 1 for each new customer.
To get the date of the last shipment in a group, you need the "final last" shipment date for over_min. This gives your shipment date of the last matching row for this variable. For unconsolidated orders there is no over_min shipment date. So this is null. So nvling this with last shipment date + 30 gives you the desired result!
Put this all together and you have:
alter session set nls_date_format = 'DD-MON-YYYY';
create table cust (
cust_id number not null primary key, cust_name varchar2(30) not null,
min_ship_value number not null
);
create table cust_orders (
cust_id number not null, order_no number not null, order_line number not null,
line_amt number not null, ship_date date not null,
constraint fk_orders_cust_id foreign key (cust_id) references cust,
constraint pk_cust_orders primary key (order_no, order_line)
);
insert into cust values (1,'Cust A',350);
insert into cust values (2,'Cust B',750);
insert into cust_orders values (1,1,1,100,date'2017-01-31');
insert into cust_orders values (1,1,2,250,date'2017-02-11');
insert into cust_orders values (1,1,3,10,date'2017-02-10');
insert into cust_orders values (1,2,1,1000,date'2017-02-21');
insert into cust_orders values (1,3,1,4000,date'2017-03-31');
insert into cust_orders values (2,4,1,175,date'2017-01-31');
insert into cust_orders values (2,4,2,500,date'2017-02-11');
insert into cust_orders values (2,5,1,100,date'2017-02-10');
insert into cust_orders values (2,5,2,1000,date'2017-02-21');
insert into cust_orders values (2,6,1,100,date'2017-03-31');
commit;
select * from (
select co.*, c.min_ship_value from cust c
join cust_orders co
on c.cust_id = co.cust_id
)
match_recognize (
partition by cust_id
order by ship_date, order_line
measures
match_number() as mno,
classifier() as cls,
sum(line_amt) as tot,
nvl(final last(over_min.ship_date), last(ship_date)+30) as last_ship_date
all rows per match
pattern (under_min* over_min{0,1})
define
under_min as sum(line_amt) < min_ship_value,
over_min as sum(line_amt) >= min_ship_value
);
CUST_ID SHIP_DATE ORDER_LINE MNO CLS TOT LAST_SHIP_DATE ORDER_NO LINE_AMT MIN_SHIP_VALUE
1 31-JAN-2017 1 1 UNDER_MIN 100 11-FEB-2017 1 100 350
1 10-FEB-2017 3 1 UNDER_MIN 110 11-FEB-2017 1 10 350
1 11-FEB-2017 2 1 OVER_MIN 360 11-FEB-2017 1 250 350
1 21-FEB-2017 1 2 OVER_MIN 1,000 21-FEB-2017 2 1,000 350
1 31-MAR-2017 1 3 OVER_MIN 4,000 31-MAR-2017 3 4,000 350
2 31-JAN-2017 1 1 UNDER_MIN 175 11-FEB-2017 4 175 750
2 10-FEB-2017 1 1 UNDER_MIN 275 11-FEB-2017 5 100 750
2 11-FEB-2017 2 1 OVER_MIN 775 11-FEB-2017 4 500 750
2 21-FEB-2017 2 2 OVER_MIN 1,000 21-FEB-2017 5 1,000 750
2 31-MAR-2017 1 3 UNDER_MIN 100 30-APR-2017 6 100 750
If you want to know more about pattern matching, I recommend reading Keith Laker's deep dive series on this. You can find the first article at:
http://oracle-big-data.blogspot.co.uk/2016/03/sql-pattern-matching-deep-dive-part-1.html PS - Thanks for providing a clear test case. Though I don't understand why you're consolidating order lines 1 & 2 for customer 1. Three ships before two, so surely you'd consolidate these three?
In any case, to change how these are combined, adjust the order by in match_recognize appropriately.