I have a table A with following data.
ID Acc_no start_date verify_date completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 31-OCT-16
9Z2123456789 Account1 30-OCT-16 31-OCT-16 01-NOV-16
9Z3123456789 Account1 31-OCT-16 01-NOV-16 02-NOV-16
In the above table last 3 rows should be considered as duplicates [because 2nd row verify_date,completed_date are equal to 3rd row start_date and verify_date respectively and 3rd row verify_date,completed_date are equal to 4th row start_date and verify_date respectively] and hence I should consider as one transaction only for last 3rows and need to get the 2nd row(1st row among the duplicates) only but with completed_date of 4th row(last duplicate entry among the duplicates). So final out put should be as..
ID Acc_no final_start_date verify_date final_completed_date
=====================================================================================================================================================
8ZA123456789 Account1 24-OCT-16 25-OCT-16 26-OCT-16
8ZD123456789 Account1 26-OCT-16 30-OCT-16 02-NOV-16
Please give me logic for this as soon as possible
Here's one way to do it:
- Compare the start_date to the previous verify_date and verify_date to the previous complete_date using lag()
- If these aren't the same, return the start_date
- "Fill down" the missing start_date values using last_value() ignore nulls
- Group by this calculation to get the min start and verify dates and max completion:
create table t (
account_id int,
start_date date,
verify_date date,
complete_date date
);
insert into t values (1, date'2016-10-24', date'2016-10-25', date'2016-10-26');
insert into t values (1, date'2016-10-26', date'2016-10-30', date'2016-10-31');
insert into t values (1, date'2016-10-30', date'2016-10-31', date'2016-11-01');
insert into t values (1, date'2016-10-31', date'2016-11-01', date'2016-11-02');
commit;
with rws as (
select start_date, verify_date, complete_date, account_id,
case
when not (
start_date = lag(verify_date, 1, date'1900-01-01') over (order by start_date) and
verify_date = lag(complete_date, 1, date'1900-01-01') over (order by start_date)
)
then
start_date
end grp
from t
), grps as (
select start_date, verify_date, complete_date, account_id,
last_value(grp) ignore nulls over (order by start_date) lv
from rws
)
select account_id,
min(start_date), min(verify_date), max(complete_date)
from grps
group by account_id, lv
order by 1;
ACCOUNT_ID MIN(START_DATE) MIN(VERIFY_DATE) MAX(COMPLETE_DATE)
1 24-OCT-2016 00:00:00 25-OCT-2016 00:00:00 26-OCT-2016 00:00:00
1 26-OCT-2016 00:00:00 30-OCT-2016 00:00:00 02-NOV-2016 00:00:00
In your complete solution I suspect you'll need to partition by account_id in the analytic clauses too.