Skip to Main Content
  • Questions
  • Logic to get the first row with one column value of the last duplicate row among all the duplicates

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question.

Asked: November 22, 2016 - 3:38 pm UTC

Last updated: November 23, 2016 - 5:20 am UTC

Version: 11g

Viewed 1000+ times

You Asked

I have a table A with following data.

ID Acc_no start_date verify_date completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 31-OCT-16
9Z2123456789 Account1 30-OCT-16 31-OCT-16 01-NOV-16
9Z3123456789 Account1 31-OCT-16 01-NOV-16 02-NOV-16

In the above table last 3 rows should be considered as duplicates [because 2nd row verify_date,completed_date are equal to 3rd row start_date and verify_date respectively and 3rd row verify_date,completed_date are equal to 4th row start_date and verify_date respectively] and hence I should consider as one transaction only for last 3rows and need to get the 2nd row(1st row among the duplicates) only but with completed_date of 4th row(last duplicate entry among the duplicates). So final out put should be as..

ID Acc_no final_start_date verify_date final_completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 02-NOV-16

Please give me logic for this as soon as possible

and Chris said...

Here's one way to do it:

- Compare the start_date to the previous verify_date and verify_date to the previous complete_date using lag()
- If these aren't the same, return the start_date
- "Fill down" the missing start_date values using last_value() ignore nulls
- Group by this calculation to get the min start and verify dates and max completion:

create table t (
  account_id int,
  start_date date,
  verify_date date,
  complete_date date
);

insert into t values (1, date'2016-10-24', date'2016-10-25', date'2016-10-26');
insert into t values (1, date'2016-10-26', date'2016-10-30', date'2016-10-31');
insert into t values (1, date'2016-10-30', date'2016-10-31', date'2016-11-01');
insert into t values (1, date'2016-10-31', date'2016-11-01', date'2016-11-02');
commit;

with rws as (
  select start_date, verify_date,  complete_date, account_id,
       case 
         when not (
           start_date = lag(verify_date, 1, date'1900-01-01') over (order by start_date) and 
           verify_date = lag(complete_date, 1, date'1900-01-01') over (order by start_date)
         )
         then
           start_date
       end grp
from   t
), grps as (
  select start_date, verify_date, complete_date, account_id, 
         last_value(grp) ignore nulls over (order by start_date) lv
  from   rws
)
  select account_id, 
         min(start_date), min(verify_date), max(complete_date)
  from   grps
  group  by account_id, lv
  order  by 1;

ACCOUNT_ID  MIN(START_DATE)       MIN(VERIFY_DATE)      MAX(COMPLETE_DATE)    
1           24-OCT-2016 00:00:00  25-OCT-2016 00:00:00  26-OCT-2016 00:00:00  
1           26-OCT-2016 00:00:00  30-OCT-2016 00:00:00  02-NOV-2016 00:00:00 


In your complete solution I suspect you'll need to partition by account_id in the analytic clauses too.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, November 22, 2016 - 10:43 pm UTC

It is working for the current account hopefully it works for the rest. Thank you for the quick response.
Connor McDonald
November 23, 2016 - 5:20 am UTC

glad we could help

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.