Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Prakash.

Asked: August 23, 2016 - 3:09 pm UTC

Last updated: September 02, 2016 - 3:10 pm UTC

Version: 4.0.1

Viewed 1000+ times

You Asked

Tool : SQL Developer
Data

CustId TransactionDate Transaction
C001 01-JAN-2016 Inquiry
C001 10-JAN-2016 Sales
C001 21-JAN-2016 Repeat Inquiry
C001 10-FEB-2016 Repeat Inquiry
C001 01-MAY-2016 Sales
C001 06-MAY-2016 Sales
C001 10-JUN-2016 Inquiry 1
C001 01-SEP-2016 Inquiry 2
C002 01-FEB-2016 Inquiry 1
C002 25-FEB-2016 Inquiry 2
C002 04-MAR-2016 Inquiry 4
C002 03-MAY-2016 Inquiry 3
C003 01-FEB-2016 Inquiry 2
C003 10-FEB-2016 Sales
C003 10-MAR-2016 Inquiry 2
C004 15-APR-2016 Sales
C005 01-JUN-2016 Inquiry 1

Background
Each customer is able to perform various transactions as above. I am trying to create a journey for customer sales. So as soon as there is Sales transaction, sequencing is triggered and it will end after 40 days.Sequencing has to go backward with negative numbers for past 10 days before Sales transaction. If a customer does not perform Sales transaction, Sequence will be 0.

Expected Output


CustId TransactionDate Transaction Seq
C001 01-JAN-2016 Inquiry -1
C001 10-JAN-2016 Sales 1
C001 21-JAN-2016 Repeat Inquiry 2
C001 10-FEB-2016 Repeat Inquiry 3
C001 01-MAY-2016 Sales 1
C001 06-MAY-2016 Sales 2
C001 10-JUN-2016 Inquiry 1 3
C001 01-SEP-2016 Inquiry 2 0
C002 01-FEB-2016 Inquiry 1 0
C002 25-FEB-2016 Inquiry 2 0
C002 04-MAR-2016 Inquiry 4 0
C002 03-MAY-2016 Inquiry 3 0
C003 01-FEB-2016 Inquiry 2 -1
C003 10-FEB-2016 Sales 1
C003 10-MAR-2016 Inquiry 2 2
C004 15-APR-2016 Sales 1
C005 01-JUN-2016 Inquiry 1 0

Sorry: I posted this question earlier but was unable to make changes so posting again. I apologize.
Let me know if you need more information.
Thank you

and Chris said...

Interesting question! I'm going to use the model clause to calculate this. But I'm sure there's other ways.

Ready? Here goes:

There are three cases to consider:

1. The current transaction is Sales. And there are no other sales transactions within the 40 previous days. This starts the sequence at 1.
2. There is a sales transaction in the previous 40 days. Increment the sequence counter.
3. Neither 1 or 2 apply. And there's a sales transaction in the 10 following days.

If all these fail the sequence is zero.

First up, we need to setup the model. The calculations are per customer. So you need to partition by this. And you want to sort by transactiondate. So let's calculate a row number based on this.

  partition by (custid)
  dimension by (
    row_number() over (partition by custid order by transactiondate ) rn
  )


The dimension by enables us to reference the rows.

In the output, as well as the custid, you want to return the sequence, transactiondate and type. So let's add these to the measures clause:

  measures (0 seq, transactiondate, trans)


Now we need to supply the calculations for seq. These come from the steps above. We'll put these in a case statement. This will assign values to all the rows. So the rules clause starts with:

  seq[any] = case ...


We then place the calculations above in the when clauses.

The first part of 1 is easy. Simply check:

  trans[cv()] = 'Sales'


i.e. the current value (cv()) of the transaction type is Sales. But we only want to consider the first sales transaction in a 40 day period. So we need to exclude any after in this timeframe. We can use the first_value analytic to do this:

first_value(
  case when trans = 'Sales' then 'Sales' end) ignore nulls over (
    order by transactiondate range between 40 preceding and 1 preceding 
)


The window clause "range between 40 preceding and 1 preceding" considers transactions in the previous 40 days. Note its range, not rows!

We only care about sales transactions. We can ignore the others by mapping them to null and using the "ignore nulls" clause. This what "case when trans = 'Sales' then 'Sales' end) ignore nulls" does.

Then we just need to check that this value isn't sales! This gives the start of each sales sequence. So the first clause is:

when trans[cv()] = 'Sales' and 
     coalesce(
       first_value(
         case when trans = 'Sales' then 'Sales' end) ignore nulls over (
           order by transactiondate range between 40 preceding and 1 preceding 
        ), 'not sales'
     ) != 'Sales' then 
  1


For the second step, we can reuse the last_value calculation above. If this is sales, then we want to increment the sequence counter:

seq[cv()-1]+1


This finds the value of seq for the previous row. Then adds one!

So the next clause is:

when first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
  order by transactiondate range between 40 preceding and 1 preceding
) = 'Sales' then
  seq[cv()-1]+1


Now we need to find the transactions in the 10 days before a sale. Again, we can use first_value. But instead of the "40 preceding and 1 preceding", we want to look at the rows from the current onward. This is "range between current row and 10 following".

Now we can start counting down. We can do this by looking at the value for seq in the next row. Then subtract 1 from it.

The row immediately before the sales transaction will see a "next seq" of 1. Take one from this and we get zero. But you want to skip this. So if the next value is 1, we need to map it to -1.

This gives:

when first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
  order by transactiondate range between current row and 10 following
) = 'Sales' then
  case when seq[cv()+1] = 1 then -1 else seq[cv()+1]-1 end


Note to look forward like this you need to state "rules automatic order". Otherwise you'll get an "ORA-32637: Self cyclic rule in sequential order MODEL" error.

Finally, we need the "nothing matched" case. This is simple an else clause that returns zero.

Put it all together and you get:

drop table t purge;
CREATE TABLE t (CustId varchar2(4), TransactionDate date, Trans varchar2(14) );

alter session set nls_date_format = 'yyyy-mm-dd';
    
INSERT INTO t VALUES ('C001', '2016-01-01', 'Inquiry');
INSERT INTO t VALUES ('C001', '2016-01-10', 'Sales');
INSERT INTO t VALUES ('C001', '2016-01-21', 'Repeat Inquiry');
INSERT INTO t VALUES ('C001', '2016-02-10', 'Repeat Inquiry');
INSERT INTO t VALUES ('C001', '2016-05-01', 'Sales');
INSERT INTO t VALUES ('C001', '2016-05-06', 'Sales');
INSERT INTO t VALUES ('C001', '2016-06-10', 'Inquiry 1');
INSERT INTO t VALUES ('C001', '2016-09-01', 'Inquiry 2');
INSERT INTO t VALUES ('C002', '2016-02-01', 'Inquiry 1');
INSERT INTO t VALUES ('C002', '2016-02-25', 'Inquiry 2');
INSERT INTO t VALUES ('C002', '2016-03-04', 'Inquiry 4');
INSERT INTO t VALUES ('C002', '2016-05-03', 'Inquiry 3');
INSERT INTO t VALUES ('C003', '2016-02-01', 'Inquiry 2');
INSERT INTO t VALUES ('C003', '2016-02-10', 'Sales');
INSERT INTO t VALUES ('C003', '2016-03-10', 'Inquiry 2');
INSERT INTO t VALUES ('C004', '2016-04-15', 'Sales');
INSERT INTO t VALUES ('C005', '2016-06-01', 'Inquiry 1');

select custid, transactiondate, trans, seq from t
model 
  partition by (custid)
  dimension by (
    row_number() over (partition by custid order by transactiondate ) rn
  ) 
  measures (0 seq, transactiondate, trans)
  rules automatic order (
    seq[any] = case
      when trans[cv()] = 'Sales' and 
           coalesce(
             first_value(
               case when trans = 'Sales' then 'Sales' end) ignore nulls over (
                 order by transactiondate range between 40 preceding and 1 preceding 
              ), 'not sales'
           ) != 'Sales' then 
        1 
      when first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
        order by transactiondate range between 40 preceding and 1 preceding
      ) = 'Sales' then
        seq[cv()-1]+1
      when 
        first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
          order by transactiondate range between current row and 10 following
      ) = 'Sales' then
        case when seq[cv()+1] = 1 then -1 else seq[cv()+1]-1 end
      else 0
    end
  )
order by 1, 2;

CUSTID  TRANSACTIONDATE  TRANS           SEQ  
C001    2016-01-01       Inquiry         -1   
C001    2016-01-10       Sales           1    
C001    2016-01-21       Repeat Inquiry  2    
C001    2016-02-10       Repeat Inquiry  3    
C001    2016-05-01       Sales           1    
C001    2016-05-06       Sales           2    
C001    2016-06-10       Inquiry 1       3    
C001    2016-09-01       Inquiry 2       0    
C002    2016-02-01       Inquiry 1       0    
C002    2016-02-25       Inquiry 2       0    
C002    2016-03-04       Inquiry 4       0    
C002    2016-05-03       Inquiry 3       0    
C003    2016-02-01       Inquiry 2       -1   
C003    2016-02-10       Sales           1    
C003    2016-03-10       Inquiry 2       2    
C004    2016-04-15       Sales           1    
C005    2016-06-01       Inquiry 1       0

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Same Date Issue

Prakash Panchal, August 31, 2016 - 12:57 am UTC

Thank you Chris,


Never had chance to use model before. It is nice to see below example. Only one challenge i see is if i have one than one Sale on same day they both are assigned "1" sequence. It should pick the first one as 1 and second 2 and so on.


Connor McDonald
August 31, 2016 - 9:41 am UTC

I had a feeling you would ask that! ;)

To avoid this problem check whether the previous row is a sales transaction in cases 1 & 2. You can do this with lag:

lag(trans, 1, 'not sales') over (order by rn)


In case 1, you want to check that this value is not equal to sales. For case 2, you want the rule to apply if the previous row is sales or there has been an earlier transaction.

This gives:

drop table t purge;
CREATE TABLE t (CustId varchar2(4), TransactionDate date, Trans varchar2(14) );

alter session set nls_date_format = 'yyyy-mm-dd';
    
INSERT INTO t VALUES ('C004', '2016-04-15', 'Sales');
INSERT INTO t VALUES ('C004', '2016-04-15', 'Sales');
INSERT INTO t VALUES ('C004', '2016-04-15', 'Sales');

select custid, transactiondate, trans, seq from t
model 
  partition by (custid)
  dimension by (
    row_number() over (partition by custid order by transactiondate ) rn
  ) 
  measures (0 seq, transactiondate, trans)
  rules automatic order (
    seq[any] = case
      when trans[cv()] = 'Sales' and 
           coalesce(
             first_value(
               case when trans = 'Sales' then 'Sales' end) ignore nulls over (
                 order by transactiondate range between 40 preceding and 1 preceding 
              ), 'not sales'
           ) != 'Sales' 
           and  lag(trans, 1, 'not sales') over (order by rn) <> 'Sales' 
           then 
        1 
      when first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
        order by transactiondate range between 40 preceding and 1 preceding
      ) = 'Sales' 
        or  lag(trans, 1, 'not sales') over (order by rn) = 'Sales' then
        seq[cv()-1]+1
      when 
        first_value(case when trans = 'Sales' then 'Sales' end) ignore nulls over (
          order by transactiondate range between current row and 10 following
      ) = 'Sales' then
        case when seq[cv()+1] = 1 then -1 else seq[cv()+1]-1 end
      else 0
    end
  )
order by 1, rn ,2;

CUSTID  TRANSACTIONDATE  TRANS  SEQ  
C004    2016-04-15       Sales  1    
C004    2016-04-15       Sales  2    
C004    2016-04-15       Sales  3

Worked Perfect - Thank you

Prakash Panchal, August 31, 2016 - 1:00 pm UTC

Chris,

Not just you provided solution to my query but you have opened door of model for me to explore.

Really appreciate your help.

Thank you

Prakash
Chris Saxon
August 31, 2016 - 4:03 pm UTC

Thanks :)

Power of model

Rajeshwaran Jeyabal, September 02, 2016 - 2:42 am UTC

Chris,

Thanks for model clause and the step by step explanation to it.

Thats the flexibilty of model, procedural processing in a non-procedural language.
I hope this can be achieved using "Pipelined Table function" in plsql.

But on a huge datasets, this model clause face tremendous performance issues ( https://community.oracle.com/ideas/13061 )

Any other way to handle this apart from Model ?

Any help/support to improve the “Model” clause in future database release will help much.
Connor McDonald
September 02, 2016 - 9:34 am UTC

I'm sure it's possible to do with standard analytics. All the clauses to identify the different case in the model are analytics. So you could use similar to do something like:

- Find all the rows in the 10 days before and 40 days after a sales transaction
- Assign an identifier for each of these groups so you can separate them
- Use row_number() partitioning by the identifier above, ordered date asc to count up from the first sales transaction in the group
- Use row_number() partitioning by the identifier above, ordered date desc to count down from the transaction before the sale in each group. And return the minus of it.

Easy. Um. Perhaps not. Figuring out exactly how makes my brain hurt too much! ;)

standard analytics

Rajeshwaran Jeyabal, September 02, 2016 - 10:17 am UTC

Yes Chris, Agree with you.

But, multiple layers of analytic can be easily solved by using rules in Model clause. but model doesn't scale up in huge data sets.

so what steps do we need to take up, to help to improve this model clause?
Chris Saxon
September 02, 2016 - 3:10 pm UTC

If you can put together a solid business explaining why you need to use the model clause and what business benefit you'll get by improving its performance will go a long way.

i.e. something more concrete than "I think the model clause is cool and want it to be faster"

To Rajeshwaran Jeyabal

Stew Ashton, September 04, 2016 - 7:42 am UTC

You asked "Any other way to handle this apart from Model ?". Yes there is, in version 12c.
select * from t
match_recognize(
  partition by custid order by transactiondate, trans
  measures
    case
      when classifier() = 'PREV_INQ'
        and transactiondate >= final first(first_sale.transactiondate) - 10
      then count(prev_inq.*) - final count(prev_inq.*) - 1
      else count(in_seq.*)
    end seq
  all rows per match
  pattern ( prev_inq* first_sale{0,1} (other_sale|other_inq)* )
  subset in_seq = (first_sale, other_sale, other_inq)
  define prev_inq as trans != 'Sales',
    first_sale as trans = 'Sales',
    other_sale as trans = 'Sales' and transactiondate <= first_sale.transactiondate + 40,
    other_inq as trans != 'Sales' and transactiondate <= first_sale.transactiondate + 40
);

The hard parts were:

- properly defining the pattern (zero or more previous inquiries, followed by zero or one sale, followed by zero or more transactions within 40 days of the sale)

- using FINAL to "look ahead" from the previous inquiries to the first sale in order to do the negative sequencing.

> and a belated Happy Birthday to askTeam :-)

Oops: simplifying previous post

Stew Ashton, September 04, 2016 - 8:51 am UTC

I just realized that within the 40 day window I don't need to distinguish between 'Sales' and other transaction types.
select * from t
match_recognize(
  partition by custid order by transactiondate, trans
  measures
    case
      when classifier() = 'PREV_INQ'
        and transactiondate >= final first(first_sale.transactiondate) - 10
      then count(prev_inq.*) - final count(prev_inq.*) - 1
      else count(*) - count(prev_inq.*)
    end seq
  all rows per match
  pattern ( prev_inq* first_sale{0,1} more_trans* )
  define prev_inq as trans != 'Sales',
    first_sale as trans = 'Sales',
    more_trans as transactiondate <= first_sale.transactiondate + 40
);

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.