Skip to Main Content
  • Questions
  • Logic to get the first row with one column value of the last duplicate row among all the duplicates

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question.

Asked: November 22, 2016 - 3:38 pm UTC

Last updated: November 23, 2016 - 5:20 am UTC

Version: 11g

Viewed 1000+ times

You Asked

I have a table A with following data.

ID Acc_no start_date verify_date completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 31-OCT-16
9Z2123456789 Account1 30-OCT-16 31-OCT-16 01-NOV-16
9Z3123456789 Account1 31-OCT-16 01-NOV-16 02-NOV-16

In the above table last 3 rows should be considered as duplicates [because 2nd row verify_date,completed_date are equal to 3rd row start_date and verify_date respectively and 3rd row verify_date,completed_date are equal to 4th row start_date and verify_date respectively] and hence I should consider as one transaction only for last 3rows and need to get the 2nd row(1st row among the duplicates) only but with completed_date of 4th row(last duplicate entry among the duplicates). So final out put should be as..

ID Acc_no final_start_date verify_date final_completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 02-NOV-16

Please give me logic for this as soon as possible

and Chris said...

Here's one way to do it:

- Compare the start_date to the previous verify_date and verify_date to the previous complete_date using lag()
- If these aren't the same, return the start_date
- "Fill down" the missing start_date values using last_value() ignore nulls
- Group by this calculation to get the min start and verify dates and max completion:

create table t (
  account_id int,
  start_date date,
  verify_date date,
  complete_date date
);

insert into t values (1, date'2016-10-24', date'2016-10-25', date'2016-10-26');
insert into t values (1, date'2016-10-26', date'2016-10-30', date'2016-10-31');
insert into t values (1, date'2016-10-30', date'2016-10-31', date'2016-11-01');
insert into t values (1, date'2016-10-31', date'2016-11-01', date'2016-11-02');
commit;

with rws as (
  select start_date, verify_date,  complete_date, account_id,
       case 
         when not (
           start_date = lag(verify_date, 1, date'1900-01-01') over (order by start_date) and 
           verify_date = lag(complete_date, 1, date'1900-01-01') over (order by start_date)
         )
         then
           start_date
       end grp
from   t
), grps as (
  select start_date, verify_date, complete_date, account_id, 
         last_value(grp) ignore nulls over (order by start_date) lv
  from   rws
)
  select account_id, 
         min(start_date), min(verify_date), max(complete_date)
  from   grps
  group  by account_id, lv
  order  by 1;

ACCOUNT_ID  MIN(START_DATE)       MIN(VERIFY_DATE)      MAX(COMPLETE_DATE)    
1           24-OCT-2016 00:00:00  25-OCT-2016 00:00:00  26-OCT-2016 00:00:00  
1           26-OCT-2016 00:00:00  30-OCT-2016 00:00:00  02-NOV-2016 00:00:00 


In your complete solution I suspect you'll need to partition by account_id in the analytic clauses too.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, November 22, 2016 - 10:43 pm UTC

It is working for the current account hopefully it works for the rest. Thank you for the quick response.
Connor McDonald
November 23, 2016 - 5:20 am UTC

glad we could help

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.