Validate sets of "consecutive" data entry sets (no overlaps)

Tom Kyte

Thanks for the question, Matt.

Asked: February 06, 2004 - 6:09 pm UTC

Last updated: February 21, 2008 - 7:11 am UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

This sprang out of the following post:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:474221407101

you asked for more info and it started to look like an entirely new question so I have posted it as such. Here we go...


[My first Post]
I need to load lots of data very quickly from an external table the data looks
like:

id,value,from_date,to_date

I need to load this data subject to the following requirements:

1) There can be no overlaps for a particular id (so one record's to_date is equal
to the next record's from_date).
2) There must be a full set of data from a given historic date (roughly
three years in the past).
3) All the data for each id must span the period from the historic date to as
close to the current date as possible (given the data available).

For example:

If the historic date is 01/01/2003 (DD/MM/YYYY), this is the data that I want to
load (I need to discard data that does not marry the previous record's to_date to
the next record's from_date):

id value from_date to_date
1 10 01/01/2003 01/02/2003
1 12 01/02/2003 21/05/2003
1 19 21/05/2003 01/06/2003
1 25 01/06/2003 01/12/2003
1 25 01/12/2003 01/02/2004

How would one generate a select to pull out data that marries like this?

Also, I need to constrain the schema so that only the data that satisfies these
rules will be accepted (similar to the original post). Does the method described
in this post still apply to 9.2 or is there a better way?

Best Regards,
[END]

[Your Response]
tell me, what if the inputs do have overlaps. what logic decides what rows are
actually kept (and why)

what does "there must be a full set of data" mean actually?
[END]

[My final post, with some additional information provided to the original]

(I'll answer your Q's in a sec).

I'm working at this problem from two ends.

1) a query to pull out the valid data from the external table subject to my
requirements in order to create a table

2) a loader to go through the data that was not retrieved by the query. The
loader will bulk load these values and constraints will identify what the
problem was with the data (so it can be investigated and fixed)

So, I think I need

a) A suitable query to gather the data according to the requirements (more
detail will follow).

b) A mechanism to constrain the target table so that 'bad' data cannot get in
(and a constraint or trigger will raise an error that may be caught and logged)

The logic for keeping records is this:

a) The record with the earliest from_date for a particular id must have a
from_date greater than or equal to 01/01/2003 (DD/MM/YYYY) (the historic date in the example above).
b) The set of records for a particular id must completely span the period (with
no gaps - ie: it fishtails so that current from_date = previous to_date) that
starts on 01/01/2003 (the historic date) and continues up to the present.
However, provided that the data fishtails throughout the period, it is
allowable for the most recent record to finish on a date before the current date.
c) There can be no duplicate data (even if the rows are otherwise the same, eg: two records
with the same from and to dates).
d) No to_date can be strictly greater than the current date.

So, you can see (I hope) that the logic requires me to validate "sets" of rows
(defined by the id) such that the following "sets" are accepted:

HistDate Now
| |
ID: 1 |-----|
ID: 1 |-----------|
ID: 1 |------|
ID: 1 |----------|
ID: 2 |---------|
ID: 2 |---|
ID: 2 |-----|
ID: 2 |---------|
ID: 3 |---------------------------------|
ID: 4 |-----------------------------------|
ID: 5 |------------------|
ID: 5 |-|


and the following sets would be rejected:

HistDate Now
| |
ID: 6 |-----| <- before HistDate
ID: 7 |-----------| <- a gap
ID: 7 |------|
ID: 7 |------------|
ID: 8 |---------|
ID: 8 |-----| <- overlap
ID: 8 |-----|
ID: 8 |---------|
ID: 9 |-------------------------------------| <- greater than current date
ID: 9 |---------|
ID: 9 |--------------------|
ID: 9 |--------------------| <- duplicate
ID: 9 |----|

A "full set of data" means a consistent set of data for the id.

When loading the data I could order the loads by id and from_date to simplify
the validation, in that the data would be loaded in a (from_date) time-ordered
fashion, which would satisfy the logic described above.


Best Regards,
[END]

"Is there only one possible "covering set" per id?"

-->Yes, there must be only one such covering set.

"what I mean -- is for id 1 -- does that represent ALL of the rows in the input set or did we have to filter out some/many of the rows with id=1 in order to find this nice covering set?"

-->Exactly. For id 1 you see the results of filtering out rows for id 1 that do not satisfy the described constraints. The result is the covering set.

and Tom said...

this is pretty easy with LAG and LEAD functionality. The following demonstrates the approach:

ops$tkyte@ORA920PC> select id, d1, d2, max(what) over (partition by id)
2 from (
3 select id, d1, d2, last_d1, last_d2, next_d1, next_d2,
4 case when d1 < to_date( '01-jan-2003' )
5 then 'Before Hist Date'
6 when d2 > trunc(sysdate)
7 then 'Greater than current date'
8 when max_d2 != trunc(sysdate)
9 then 'Not upto End date'
10 when d2 < next_d1
11 then 'Gap'
12 when d2 != next_d1
13 then 'Overlap'
14 end what
15 from ( select id, d1, d2,
16 lag(d1) over ( partition by id order by d1) last_d1,
17 lag(d2) over ( partition by id order by d2) last_d2,
18 lead(d1) over ( partition by id order by d1) next_d1,
19 lead(d2) over ( partition by id order by d2) next_d2,
20 max(d2) over (partition by id ) max_d2
21 from t
22 )
23 )
24 order by 1, 2, 3
25 /

        ID D1        D2        MAX(WHAT)OVER(PARTITIONBY
---------- --------- --------- -------------------------
         1 01-JAN-03 15-JAN-03
           15-JAN-03 29-JAN-03
           29-JAN-03 29-JAN-04
           29-JAN-04 09-FEB-04

         6 31-DEC-02 15-JAN-03 Before Hist Date
           15-JAN-03 09-FEB-04 Before Hist Date

         7 01-JAN-03 14-JAN-03 Gap
           15-JAN-03 29-JAN-03 Gap
           29-JAN-03 29-JAN-04 Gap
           29-JAN-04 09-FEB-04 Gap

         8 01-JAN-03 15-JAN-03 Overlap
           15-JAN-03 29-JAN-03 Overlap
           20-JAN-03 29-JAN-04 Overlap
           29-JAN-04 09-FEB-04 Overlap

         9 01-JAN-03 15-JAN-03 Not upto End date
           15-JAN-03 29-JAN-03 Not upto End date
           29-JAN-03 29-JAN-04 Not upto End date
           29-JAN-04 10-FEB-04 Not upto End date

        10 01-JAN-03 15-JAN-03 Overlap
           15-JAN-03 29-JAN-03 Overlap
           29-JAN-03 29-JAN-04 Overlap
           29-JAN-03 29-JAN-04 Overlap
           29-JAN-04 09-FEB-04 Overlap

        11 01-JAN-03 15-JAN-03 Not upto End date
           15-JAN-03 29-JAN-03 Not upto End date
           29-JAN-03 29-JAN-04 Not upto End date
           29-JAN-04 08-FEB-04 Not upto End date


27 rows selected.



I have not 100% completed the CASE statement -- you'll need to fine tune that -- but you see the idea...
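
Tom never posted the setup for that session, but a minimal sketch of it can be reconstructed from the query and the output (the rows below are id 1's data; note the session was evidently run on 09-FEB-2004, which matters because the query compares max(d2) with trunc(sysdate)):

create table t ( id number, d1 date, d2 date );

insert into t values ( 1, date '2003-01-01', date '2003-01-15' );
insert into t values ( 1, date '2003-01-15', date '2003-01-29' );
insert into t values ( 1, date '2003-01-29', date '2004-01-29' );
insert into t values ( 1, date '2004-01-29', date '2004-02-09' );
commit;

Run on any other day, even this clean set would classify as 'Not upto End date'.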

Comments

I have the idea..

Matt, February 09, 2004 - 5:02 pm UTC

The query looks good, thanks - I think this does what I need (or at least gets me 90% of the way there). Are you still considering writing that book about analytics?

I also need to constrain the schema so that only the data that satisfies these rules will be accepted into the table. Is the approach in 9.2 the same as in http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:474221407101 ?


Tom Kyte
February 09, 2004 - 8:30 pm UTC

yes it would be similar -- you remember what IDs are put in and run this "query" against that id looking for a non-null "what"




Getting ORA-04091: table X.Y is mutating, trigger/function may not see it

Matt, March 02, 2004 - 6:18 pm UTC

So, I implemented the query as a "before insert on <table> for each row" trigger. Basically, before inserting the data into the table the trigger carries out the select (similar to above) to validate the data. If a non-null value is returned, I raise an exception.

I was surprised to encounter ORA-04091. Reading,

http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg13trg.htm#786

I understand that this is caused by the table inserts occurring in parallel whilst the query in the trigger is running (my inserts occur in bulk).

I also read "Avoiding Mutating Tables":

http://asktom.oracle.com/~tkyte/Mutate/index.html

It seems to me that I fall into "Case 1 - you only need to access the :new values". However, I am only concerned with one table and so I am a little confused as to how I get this approach to work.

Thinking out loud... I can't see a different way to implement my requirement. I need to query the table into which I am inserting in order to verify that the :new values will not break my validation 'rules'. I need to validate the :new data before it gets into the table, ie: a before insert trigger... I also can't see how I can use the approach in "Avoiding Mutating Tables".

Can you please summarise how this could work....?

Below is the trigger code:

create or replace trigger mre_validate_fishtail_trg
before insert on grms_mig_meter_reading_engy
for each row
declare
v_classification varchar2(50);
v_historic_start_date constant date := globals.get_hsd;
v_extract_date constant date := globals.get_extract_date;
begin

select classification
into v_classification
from ( select max(record_classification) over (partition by mirn) classification
-- ,mirn
-- ,mirn_chksum
-- ,reading_type
-- ,start_date
-- ,end_date
-- ,energy_value
from ( select mirn
-- ,mirn_chksum
-- ,reading_type
-- ,start_date
-- ,end_date
-- ,energy_value
,case when start_date < trunc(v_historic_start_date) then 'Start Date Before Historical Start Date'
when end_date > trunc(v_extract_date) then 'End Date Greater than Extract Date' -- extract date
when max_ed > trunc(v_extract_date) then 'Max End Date Greater than Extract Date' -- extract date
when end_date < next_sd then 'Data Set has Gap'
when end_date != next_sd then 'Data Set has Overlap'
end record_classification
from ( select mirn
-- ,mirn_chksum
-- ,reading_type
,start_date
,end_date
-- ,energy_value
-- ,lag(start_date) over ( partition by id order by start_date) last_sd
-- ,lag(end_date) over ( partition by id order by end_date) last_ed
,lead(start_date) over ( partition by mirn order by start_date) next_sd
-- ,lead(end_date) over ( partition by id order by end_date) next_ed
,max(end_date) over (partition by mirn ) max_ed
from ( select mirn
-- ,mirn_chksum
-- ,reading_type
,start_date
,end_date
-- ,energy_value
from grms_mig_meter_reading_engy --extern_meter_reading_engy
where mirn = :new.mirn
union all
select :new.mirn
-- ,:new.mirn_chksum
-- ,:new.reading_type
,:new.start_date
,:new.end_date
-- ,:new.energy_value
from dual
)
)
)
) where rownum = 1;

if v_classification = 'Start Date Before Historical Start Date' then
-- raise an exception
exception_pkg.throw(exception_pkg.startdate_too_early);
elsif v_classification = 'End Date Greater than Extract Date' then
-- raise an exception
exception_pkg.throw(exception_pkg.enddate_too_late);
elsif v_classification = 'Max End Date Greater than Extract Date' then
-- raise an exception
exception_pkg.throw(exception_pkg.max_enddate_too_late);
elsif v_classification = 'Data Set has Gap' then
-- raise an exception
exception_pkg.throw(exception_pkg.reading_makes_gap);
elsif v_classification = 'Data Set has Overlap' then
-- raise an exception
exception_pkg.throw(exception_pkg.reading_makes_overlap);
elsif v_classification is null then
-- record is to be accepted
null;
else
-- nothing else can be returned
null;
end if;

exception
when others then
raise;
end mre_validate_fishtail_trg;


Tom Kyte
March 02, 2004 - 7:31 pm UTC

well, it would not work row by row at all -- think about it.

How many GAPS would you see "artificially" caused because the data comes in "some order"? Unless the data came in perfectly (something you really cannot control), you would see gaps.

the mutating table error is there to protect you really -- it is saying "hey, the table is totally inconsistent and violates RI at this point, you cannot see it"

it is not until all rows have been inserted by the insert that you can begin to validate.


Suggest you log the "mirn" value in a plsql table in the row trigger (as shown in the workaround) and in an AFTER trigger, validate the MIRNs either in one big sql or mirn by mirn.
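
In outline, that workaround looks something like the sketch below. The names are invented here and the validation body is elided; only the pattern matters: a package variable carries state from the row trigger to the statement trigger.

create or replace package state_pkg
as
    type mirn_tab is table of grms_mig_meter_reading_engy.mirn%type
         index by binary_integer;
    g_mirns mirn_tab;
end state_pkg;
/

create or replace trigger mre_reset_bi
before insert on grms_mig_meter_reading_engy
begin
    state_pkg.g_mirns.delete;   -- start each statement with a clean slate
end;
/

create or replace trigger mre_remember_bifer
before insert on grms_mig_meter_reading_engy
for each row
begin
    -- just record the key; do NOT query the (mutating) table here
    state_pkg.g_mirns( state_pkg.g_mirns.count + 1 ) := :new.mirn;
end;
/

create or replace trigger mre_validate_ai
after insert on grms_mig_meter_reading_engy
begin
    -- the table is consistent again at this point, so it is safe to read
    for i in 1 .. state_pkg.g_mirns.count
    loop
        null;  -- run the analytic validation query for g_mirns(i) here and
               -- raise_application_error on a non-null classification
    end loop;
end;
/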

Response

Matt, March 02, 2004 - 8:02 pm UTC

I understand what you are saying - but as the data is loaded in bulk from a file (via an external table) I can 'sort' the input and control how the data arrives to avoid errors. There will be no ongoing requirement to keep the data consistent once this load has occurred (I am loading into an intermediate, "temporary" area that will not be maintained after the data has been loaded and validated - once validated the data is moved elsewhere).

So, in terms of the mutating table problem and your solution, let me clarify. You suggest:

1) create a "before insert or update on <tab> for each row trigger" to log the mirn (and the rest of the records data) into a temp or plsql table.
2) create an "after insert or update on <tab>" statement level trigger to validate the records in the table that were created by step 1)

So, if I understand correctly the bulk insert into my table will instead insert into another "temp" table (in the before trigger) before doing all the validation in the after trigger. This "temp" table would need to be a copy of the target table excluding check constraints etc (any constraints on the target table would be verified when the after trigger tries to insert the data).

Is the above correct?

I am concerned that if the after trigger is to process the data in bulk then it will need to handle constraint violation errors raised by the bulk insert and this may cause rows to be lost (and as I also log the errors and the data that caused the error, I may lose these too).

Is this a valid concern?

Thanks.



Tom Kyte
March 03, 2004 - 7:56 am UTC

1) and 2) are correct but the conclusion is wrong.

You will insert into the real table. Instead of trying to validate row by row, you'll just remember the mirn's you wish to validate.

AFTER all rows have been inserted, you'll process the mirn's against the live data (the AFTER "not for each row" trigger can read the table, the data is consistent)

By doing it in an AFTER trigger, you'll still have ACID properties (all rows loaded, or no rows loaded).

What about the errors?

Matt, March 03, 2004 - 4:43 pm UTC

Please bear with me. I just about have this...

1) create a "before insert or update on <tab> for each row trigger" to log the mirn (and the rest of the records data) into a temp or plsql table.
2) create an "after insert or update on <tab>" statement level trigger to validate the records in the table that were created ( in the temp or plsql table by step 1)

When the "after insert or update on <tab>" trigger has fired, the rows will be in the "real" table and I will be able to validate them. I will identify the records to validate using my "temp or pl/sql table" which will store the info I need to identify the data I need to validate.

I assume I must store all the data records I need to validate in my plsql table (since there will be no rowids to log from the before trigger)?

My after trigger must then validate the data I have loaded and either reject the whole set of data or individual rows (according to how I decide to implement this). The after trigger can then delete any rows that fail my validation and raise an appropriate exception.

I think this is correct. I'm just struggling a bit conceptualising the order that triggers fire and what can be "seen" from each trigger.

Tom Kyte
March 03, 2004 - 4:55 pm UTC

1) just mirn, that is all you need

2) mirn by mirn, just as you did in the row by row attempt


You just need to store the unique MIRNs, as that is what you were validating -- all records associated with a MIRN.

The after trigger, if it "fails" -- raises an exception -- will automagically roll back the work done by the insert. you need do NOTHING other than

a) succeed (data stays)
b) fail (data goes away)




Same scenario

Muhammad Ibrahim, March 04, 2004 - 4:03 am UTC

Dear Tom,

My table is

Sql> Create Table t ( Block_Id number, Master_Id number, Start_Date date, End_Date date );

INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
1, 1, TO_Date( '07/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '12/31/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
1, 1, TO_Date( '01/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '12/31/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
2, 1, TO_Date( '01/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '12/31/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
3, 1, TO_Date( '04/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '03/31/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
4, 1, TO_Date( '01/01/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '12/31/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO T ( MASTER_ID, BLOCK_ID, START_DATE, END_DATE ) VALUES (
1, 1, TO_Date( '09/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '12/31/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
COMMIT;


My result set should look like below (simply, I need to split the intervals for a
given Block_Id and get all the intervals):

Start_Date  End_Date

01/01/2003  31/03/2003  -- first interval: Master_Id 1's first start_date; Master_Id 3
                        -- starts on 01/04/2003, so the end date is one day before that
01/04/2003  30/06/2003  -- second interval: Master_Id 3's first start_date
01/07/2003  31/08/2003  -- third interval: Master_Id 1's second start_date
01/09/2003  31/12/2003  -- fourth interval: Master_Id 1's third start_date
01/01/2004  31/03/2004  -- fifth interval: Master_Id 4's first start_date; Master_Id 3
                        -- ends on 31/03/2004, so the next interval starts on 01/04/2004
01/04/2004  31/12/2004  -- no other start or end date falls inside this interval;
                        -- if one did, I would need to split again

I tried to use LEAD but I didn't get the exact result.

below is my query

Select Dates.Start_Date,
Lead( Dates.Start_Date ) Over ( Partition By Dates.Block_Id Order By Dates.Start_Date ) - 1 End_Date
From ( Select Distinct Start_Date Start_Date, Block_Id
From t
Union
Select Distinct End_Date Start_Date, Block_Id
From t
Order By 1 ) Dates
Order By 1;

the output is

1/1/2003    3/31/2003
1/4/2003    30/6/2003
1/7/2003    31/8/2003
1/9/2003    30/12/2003   -- this is wrong; instead I need 01/09/2003 31/12/2003
31/12/2003  31/12/2003   -- I don't need this row
1/1/2004    30/3/2004    -- this is wrong; I need 01/01/2004 31/3/2004

and then I need the final interval:

1/4/2004    31/12/2004

Please let me know how I can achieve this.

Thanks & Regards,
Ibrahim.







Tom Kyte
March 04, 2004 - 11:12 am UTC

Not really following (UNION does a sort distinct -- you should drop the distinct in inline views with a union and the order by inside of there isn't relevant either)

but -- i don't understand what you are doing with the union in the first place.

"Simply i need to split the intervals for a given Block_Id and get all the intervals"

might be "simply" to you -- but it doesn't make sense to me.
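
For what it's worth, the intervals Ibrahim expects do fall out of a small twist on his own approach (a sketch, not part of the original answer): treat every Start_Date, and every End_Date + 1 (not End_Date itself), as an interval boundary, then pair each boundary with the next one and drop the dangling last boundary.

select block_id, start_date, end_date
  from ( select block_id,
                d as start_date,
                lead(d) over (partition by block_id order by d) - 1 as end_date
           from ( -- UNION already removes duplicates, so no DISTINCT is needed
                  select block_id, start_date d from t
                  union
                  select block_id, end_date + 1 from t ) )
 where end_date is not null    -- the last boundary has no successor
 order by block_id, start_date;

Against the INSERT statements above, that returns exactly the six intervals listed.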

Exact Scenarios

Muhammad Ibrahim, March 04, 2004 - 10:19 pm UTC

Well, I can give a real-world example. This is for the reinsurance retention split
of all the affected policies in a given block (for block accumulation, ie: to
control the treaty capacity).

Here is a fine example:

In our data model each insurance policy has a Contract_Id, and an endorsement
creates the next version of that particular Contract_Id.

For each fire policy there is a location, which in turn is attached to a block
identified by Block_Id.

I have a table to store the reinsurance information situation-wise. Here is an
example.

Ri_Table:
---------

Product_Id Number; -- Part of Primary key
Block_Id Number; -- Part of Primary key
Contract_Id Number; -- Part of Primary key
Start_Date Date; -- Part of Primary key
End_Date Date; -- Part of Primary key
Status Varchar2(1); --


1) Suppose Policy1 (Contract_Id 1) enters block A first (assume there are no other policies in the block), and assume the inception date of Policy1 is 01/01/2003 and it ends on 31/12/2003.

the data will be :

Product_Id Block_Id Contract_Id Start_Date End_Date

3 A 1 01/01/2003 31/12/2003


2) Then a second policy enters the same block 'A', eg: Policy2 (Contract_Id 2), incepting on
01/04/2003 and ending on 31/03/2004.

Now the situation will be split as

Product_Id Block_Id Contract_Id Start_Date End_Date Status

3 A 1 01/01/2003 31/12/2003 D
-- the old entry becomes invalid!

3 A 1 01/01/2003 31/03/2003 A
-- Only Contract_Id = 1 is active


3 A 1 01/04/2003 31/12/2003 A
3 A 2 01/04/2003 31/12/2003 A
-- both Contract_Id 1 and 2 are active in this
-- situation, because a second policy has entered
-- the same block

3 A 2 01/01/2004 31/03/2004 A
-- Only Contract_Id = 2 is active



3) Now, there might be back-dated policies, meaning there is a possibility that
another policy enters the same block with a back-dated inception date; for
example it might incept on 01/08/2002 and end on 31/07/2003, say Contract_Id 3.


Then the split should take place accordingly.

Product_Id Block_Id Contract_Id Start_Date End_Date Status

3 A 3 01/08/2002 31/12/2002 A
-- Only Contract_Id 3 is active at this time.

3 A 1 01/01/2003 31/03/2003 A
3 A 3 01/01/2003 31/03/2003 A
-- Splits one day before Contract_Id 2 starts!
-- only Contract_Id 1 and 3 are active

3 A 1 01/04/2003 31/07/2003 A
3 A 2 01/04/2003 31/07/2003 A
3 A 3 01/04/2003 31/07/2003 A
-- Contract_Ids 1, 2 and 3 are all active in this situation.
-- Compared with the corresponding link in point (2) the end date
-- is different, because Contract_Id 3 ends on 31/07/2003! So I
-- need to invalidate the old links from point (2) and then create
-- this new situation. The below is the second split.

3 A 1 01/08/2003 31/12/2003 A
3 A 2 01/08/2003 31/12/2003 A
-- Only Contract_Id 1 and 2 are active!

3 A 2 01/01/2004 31/03/2004 A
-- Only Contract_Id = 2 is active because first policy
-- expires.

Before I do the above splitting I need to invalidate the old entries for the same block and the period in which the split falls.


4) The same might happen for future-dated policies entering the same block,
eg: with a start date of 25/03/2004 and an end date of 24/03/2005. Then the above split should take place accordingly.




Thanks a lot for your kind help. I hope I have at last made somebody understand
my situation! Please give me a good solution. I know only people like Tom can give a good
solution to this kind of complicated situation in Oracle.




Great stuff

Arian Stijf, March 12, 2004 - 1:13 pm UTC

Hi Tom,

Thanks to this and some other posts, I could answer a requirement without using PL/SQL (my first idea).
For social security, we need to know everybody who has been on sick-leave for 13 weeks. In calculating the 13 weeks, a recovery of <4 weeks is still counted as ill. So 5 weeks ill, 1 week recovered, 7 weeks ill would be considered 13 weeks of sick-leave.

I solved it by using a GTT, inserting:

insert into test_absence
select person_id
     , date_end
     , case when last_end > date_start - 28
            then last_end + 1
            else date_start + 0
       end date_start
  from ( select person_id
              , date_start
              , date_end
              , lag(date_end) over (partition by person_id order by date_end) last_end
           from per_absence_attendances paa1 )

and finally selecting:

select distinct person_id
     , (select min(date_start)
          from test_absence paa1
         start with paa1.person_id = paa.person_id
               and paa1.date_start = paa.date_start
       connect by prior date_start - 1 = date_end
               and person_id = paa.person_id) st
     , (select max(date_end)
          from test_absence paa2
         start with paa2.person_id = paa.person_id
               and paa2.date_start = paa.date_start
       connect by prior date_end + 1 = date_start
               and person_id = paa.person_id) en   -- "end" is a reserved word, so a different alias is needed
  from test_absence paa
 where person_id = 2385
 order by 1

However, can you give any clues to more efficient methods?
And do you know how to avoid errors, on overlapping periods?

Thanks,

Arian

Tom Kyte
March 12, 2004 - 7:42 pm UTC

well, without ripping your data model apart and trying to reverse engineer it -- I'd suggest looking into range windows and just counting? Not sure though -- lots of complex things happening there (could a dozen LAGs in the same row tell you what you need to know? you can lag back a row, 2 rows, N rows, whatever)
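
As a very rough sketch of the "range windows and just counting" idea (untested against the real model -- the table and column names are simply taken from the query above, and 13 weeks is read as 91 days):

select person_id, date_start, date_end,
       sum( date_end - date_start + 1 )
         over ( partition by person_id
                order by date_end
                range 91 preceding ) days_ill_last_13_weeks
  from per_absence_attendances;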



Regarding three trigger implementation to avoid mutating table

Matt, March 31, 2004 - 6:18 pm UTC

Looking more closely at the implementation described earlier in this post...

Can you please explain what happens when BI, BIFER, and AI triggers are used to avoid the mutating table error and a bulk insert takes place? I am trying to understand what happens in two cases:

1) When a bulk insert is successful
2) When a bulk insert fails for each row.

Imagine we have an array size of 5 for the bulk insert.

In case 1) I understand the following order of events:

* the BI trigger fires once only
* All 5 rows are inserted into the table
* the AIFER trigger fires 5 times
* the AI trigger fires, validates each of the 5 rows, and the data is loaded

In case 2) I understand the following order of events:

* the BI trigger fires once only
* All 5 rows are inserted into the table
* the AIFER trigger fires 5 times
* the AI trigger fires, validates row 1, and raises an exception
* Oracle rolls back the entire 5-row insert
* The last 4 rows are re-inserted
* the BI trigger fires once only
* The remaining 4 rows are inserted into the table
* the AIFER trigger fires 4 times
* the AI trigger fires, validates row 1, and raises an exception
* Oracle rolls back the entire 4-row insert
...Oracle continues with the remaining 3, then 2 rows, etc., until there is no more data to process.

Is my understanding correct? Can you please clarify what does actually happen in these circumstances?

Thanks again,



Tom Kyte
March 31, 2004 - 6:27 pm UTC

ops$tkyte@ORA9IR2> create table t ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2> create or replace trigger t_bi before insert on t
  2  begin
  3          dbms_output.put_line( 'before' );
  4  end;
  5  /
 
Trigger created.
 
ops$tkyte@ORA9IR2> create or replace trigger t_bifer before insert on t for each row
  2  begin
  3          dbms_output.put_line( 'before for each row' );
  4  end;
  5  /
 
Trigger created.
 
ops$tkyte@ORA9IR2> create or replace trigger t_ai after insert on t
  2  begin
  3          dbms_output.put_line( 'after' );
  4          raise_application_error( -20001 , 'you lose' );
  5  end;
  6  /
 
Trigger created.
 
ops$tkyte@ORA9IR2> declare
  2          type array is table of number;
  3          l_data array := array(1,2,3,4,5);
  4  begin
  5          forall i in 1 .. l_data.count
  6                  insert into t values ( l_data(i) );
  7  end;
  8  /
before
before for each row
before for each row
before for each row
before for each row
before for each row
after
declare
*
ERROR at line 1:
ORA-20001: you lose
ORA-06512: at "OPS$TKYTE.T_AI", line 3
ORA-04088: error during execution of trigger 'OPS$TKYTE.T_AI'
ORA-06512: at line 5
 
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t;
 
no rows selected
 
it is atomic, you either succeed or fail.
 

This is not what I am seeing happening

Matt, March 31, 2004 - 6:57 pm UTC

I am seeing the inserts restart, as I described. I think that this may be a bug in my code. Does the following pseudo-code make sense?

I am trying to catch errors and then log them in a table. However, I think that I may have misunderstood how sql%rowcount is populated in the case of errors. I seem to be re-starting the insert at the point just after the failure occurred, ie: I am re-processing data that I should have thrown out.

The code does a bulk fetch into arrays that then feeds into the following.

loop
    begin
        --New Template
        forall i in l_start .. l_source_colid1.count
            insert into t values ( ... );

        dbms_output.put_line( 'This insert processed rows = ' || to_char(sql%rowcount));
        exit;
    exception
        when others then
            -- dbms_output.put_line(to_char(l_start));
            dbms_output.put_line('');
            dbms_output.put_line( 'This insert error number = ' || to_char(l_start));
            dbms_output.put_line( 'This insert rows processed = ' || to_char(sql%rowcount));
            dbms_output.put_line( 'This insert bad row index = ' || (l_start+sql%rowcount));

            dbms_output.put_line( 'Bad row index = ' || (l_start+sql%rowcount) || ' ' || sqlerrm );

            l_start := l_start + sql%rowcount + 1;
            dbms_output.put_line( 'This insert restarting at ' || to_char(l_start) || ', finishing at ' || to_char(l_source_colid1.count));
            -- log errors here
    end;
end loop;
dbms_output.put_line('Outer loop');
l_start := 1;
l_fetch_count := l_fetch_count + 1;
exit when c%notfound;
end loop;

Tom Kyte
March 31, 2004 - 7:10 pm UTC

give me a concrete, concise, yet 100% complete example to work with please (like I give you :)

A concrete example

Matt, March 31, 2004 - 7:20 pm UTC

Here you go... using your test above. I have added comments to identify what happens in the real code (a loop that fetches data and stops when there is no more data to process). The output is as I described.

declare
    -- a cursor defined here
    type array is table of number;
    l_data  array  := array(1,2,3,4,5);
    l_start number := 1;
begin
    -- loop  -- a loop to keep getting data from a cursor
        -- a fetch in here
        loop
            begin
                forall i in l_start .. l_data.count
                    insert into t values ( l_data(i) );
                exit;
            exception
                when others then
                    l_start := l_start + sql%rowcount + 1;
                    -- log errors here
            end;
        end loop;
        l_start := 1;
        -- l_fetch_count := l_fetch_count + 1;  -- keep count of how many fetches
        -- exit when c%notfound;                -- to exit when fetch returns no rows
    -- end loop;  -- a loop to keep getting data from the cursor (until no more data)
end;
/

before
before for each row
before for each row
before for each row
before for each row
before for each row
after
before
before for each row
before for each row
before for each row
before for each row
after
before
before for each row
before for each row
before for each row
after
before
before for each row
before for each row
after
before
before for each row
after



Tom Kyte
March 31, 2004 - 7:33 pm UTC

ahh, here you are getting an error and sql%rowcount is really "zero" so you are just saying "first row is in error" -- when in fact it could be ANY of the rows since it is a trigger erroring out.

So, what you might do is:

exception
    when others then
        if (sqlcode = -20001)  -- or whatever you raise in your trigger
        then
            log the batch, they are not going in if the after trigger failed
        else
            l_start := l_start + sql%rowcount + 1;
......
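
Spelled out, Matt's inner loop with that fix folded in might look like the following sketch (the actual logging is left as comments):

loop
    begin
        forall i in l_start .. l_data.count
            insert into t values ( l_data(i) );
        exit;                       -- the whole batch went in: done
    exception
        when others then
            if sqlcode = -20001     -- or whatever the AFTER trigger raises
            then
                -- the statement trigger failed, so the ENTIRE batch was
                -- rolled back: log the batch here and stop retrying
                exit;
            else
                -- a genuine row error: sql%rowcount rows went in before the
                -- bad one, so log it, skip it, and resume from the next row
                l_start := l_start + sql%rowcount + 1;
            end if;
    end;
end loop;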

More questions...

Matt, April 01, 2004 - 12:59 am UTC

I just re-read this in light of the recent posts:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:15457509245121#16128347343143

The behaviour I require is the following:

1) Data (from a bulk fetch) is bulk inserted into the table. This data may contain many "mirn" records.
2) I want to validate the data for each "mirn" and throw out only the data for the mirn that fails the validation.

For example, if the first "mirn" in the array insert is considered "inconsistent" then it should be thrown out, but the second "mirn" in the array insert may be valid, and so should be accepted into the table.

I have just come to realise (after the most recent posts) that in the example above both sets of "mirn" data would be thrown out by the trigger (since all data is thrown out after the first exception is raised).

If my bulk insert contains lots of mirns then any failure may mean that valid data is discarded. Worse still, all the data will be thrown out based on the same exception (some data might fail for a different reason, but that reason would be lost).

Do I understand all of this correctly?

If so, is there a better way to meet my requirements?

Thanks.


Tom Kyte
April 01, 2004 - 10:02 am UTC

instead of having the AFTER trigger "fail", have the after trigger "delete" the bad data and log it.
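
Fleshing that out against the earlier skeleton (a sketch only -- bad_mirn_log is a hypothetical logging table, and the CASE is abbreviated to the gap/overlap checks), the AFTER trigger might become:

create or replace trigger mre_validate_ai
after insert on grms_mig_meter_reading_engy
declare
    v_classification varchar2(50);
begin
    for i in 1 .. state_pkg.g_mirns.count
    loop
        select max( case when end_date < next_sd then 'Data Set has Gap'
                         when end_date > next_sd then 'Data Set has Overlap'
                    end )
          into v_classification
          from ( select end_date,
                        lead(start_date) over (order by start_date) next_sd
                   from grms_mig_meter_reading_engy
                  where mirn = state_pkg.g_mirns(i) );

        if v_classification is not null
        then
            -- remember what was rejected and why...
            insert into bad_mirn_log ( mirn, reason )
            values ( state_pkg.g_mirns(i), v_classification );
            -- ...then quietly remove the bad set instead of failing
            delete from grms_mig_meter_reading_engy
             where mirn = state_pkg.g_mirns(i);
        end if;
    end loop;
    state_pkg.g_mirns.delete;
end;
/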

Another problem...

Matt, April 04, 2004 - 6:35 pm UTC

I am seeing a row get into my state_pkg pl/sql table which should not be there. The row in question should be failing on an FK constraint violation. However, it appears in this table (despite not existing in the heap table into which I am loading). As a result I get a NO_DATA_FOUND when I carry out any validation for the row.

I am trying to get together a stand alone test case to identify the problem.

In the meantime, any ideas?

Thanks.

Tom Kyte
April 05, 2004 - 9:27 am UTC

lots of debug code :)

Similar problem

Sanjay, February 20, 2008 - 8:57 pm UTC

Tom,
I have a similar problem and am struggling to come up with a solution.

My source data looks like this:
ID START_DT END_DT
1 01/10/2004 01/12/2004
1 01/11/2004 01/11/2004
1 01/13/2004 01/15/2004
1 01/14/2004 01/18/2004
1 03/01/2004 03/05/2004
1 12/10/2006 12/11/2006
2 05/03/2005 05/03/2005
2 05/04/2005 05/06/2005
2 11/01/2007 11/10/2007
3 09/08/2006 09/10/2006
3 02/04/2008 02/05/2008

Within a given ID if the START_DT is less than or equal to the prior record's END_DT+1 then it is considered a continuous period and I need the min START_DT and max END_DT for the continuous period. So the result set should look as below (see MIN_START_DT and MAX_END_DT fields that need to be calculated):

ID START_DT END_DT MIN_START_DT MAX_END_DT
1 01/10/2004 01/12/2004 01/10/2004 01/18/2004
1 01/11/2004 01/11/2004 01/10/2004 01/18/2004
1 01/13/2004 01/15/2004 01/10/2004 01/18/2004
1 01/14/2004 01/18/2004 01/10/2004 01/18/2004
1 03/01/2004 03/05/2004 03/01/2004 03/05/2004
1 12/10/2006 12/11/2006 12/10/2006 12/11/2006
2 05/03/2005 05/03/2005 05/03/2005 05/06/2005
2 05/04/2005 05/06/2005 05/03/2005 05/06/2005
2 11/01/2007 11/10/2007 11/01/2007 11/10/2007
3 09/08/2006 09/10/2006 09/08/2006 09/10/2006
3 02/04/2008 02/05/2008 02/04/2008 02/05/2008

Is there a way to do this in SQL without resorting to PL/SQL? I am sure there is a solution using LEAD and LAG but just can't figure out how. Thanks for your valuable time.

Sunny
Tom Kyte
February 20, 2008 - 9:02 pm UTC

https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html

analytics to the rescue

that exact same technique can be employed by you. I would have shown you, but I don't have your table or data - and you didn't share it with me in the form of create table/insert.

but you have the technique, you can do this - just understand what I was doing there and this will be easy.
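
The technique in that article boils down to three steps: flag each row that starts a new continuous period (its START_DT is more than one day past the running maximum END_DT seen so far), turn the flags into a group number with a running sum, then take min/max per group. A sketch against Sanjay's listing (src stands in for his table -- he did not post DDL):

select id, start_dt, end_dt,
       min(start_dt) over (partition by id, grp) min_start_dt,
       max(end_dt)   over (partition by id, grp) max_end_dt
  from ( select id, start_dt, end_dt,
                sum(new_grp) over (partition by id
                                   order by start_dt, end_dt) grp
           from ( select id, start_dt, end_dt,
                         case when start_dt <= max(end_dt)
                                     over (partition by id
                                           order by start_dt, end_dt
                                           rows between unbounded preceding
                                                    and 1 preceding) + 1
                              then 0
                              else 1
                         end new_grp
                    from src ) )
 order by id, start_dt, end_dt;

The running MAX (rather than LAG) matters: in the sample data the 01/13/2004 row connects to its group through the 01/12/2004 end date of an earlier row, not through the immediately preceding row.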

Sunny, February 20, 2008 - 11:35 pm UTC

Thank you Tom.

I tried that approach but ran into some snags. Here is the create table and insert script.

create table dt_test(
id number,
subid number,
strt_dt date,
end_dt date,
sub varchar2(10)
);

alter session set nls_date_format = 'yyyy-mm-dd;

insert into dt_test values(1,1,'2008-01-01','2008-01-10','S1');
insert into dt_test values(1,2,'2008-01-01','2008-01-10','S1');
insert into dt_test values(1,3,'2008-01-01','2008-01-10','S1');
insert into dt_test values(2,1,'2008-01-09','2008-01-20','S1');
insert into dt_test values(2,2,'2008-01-09','2008-01-20','S1');
insert into dt_test values(2,3,'2008-01-09','2008-01-20','S1');
insert into dt_test values(2,4,'2008-01-09','2008-01-20','S1');
insert into dt_test values(4,1,'2008-01-21','2008-01-25','S1');
insert into dt_test values(4,2,'2008-01-21','2008-01-25','S1');
insert into dt_test values(3,1,'2008-02-01','2008-03-31','S1');
insert into dt_test values(3,2,'2008-02-01','2008-03-31','S1');
insert into dt_test values(5,1,'2008-04-01','2008-05-31','S1');
insert into dt_test values(5,2,'2008-04-01','2008-05-31','S1');
insert into dt_test values(6,1,'2008-04-01','2008-05-31','S2');
insert into dt_test values(6,2,'2008-04-01','2008-05-31','S2');
insert into dt_test values(6,3,'2008-04-01','2008-05-31','S2');

Now I need to calculate the min STRT_DT and max END_DT for each continuous range. So the result set
needs to be as below - see the MIN_STRT_DT and MAX_END_DT fields that are to be calculated. Basically, for each SUB I need to find continuous ranges (defined as START_DT - 1 being less than or equal to the prior END_DT).


ID SUBID STRT_DT END_DT SUB MIN_STRT_DT MAX_END_DT
1 1 2008-01-01 2008-01-10 S1 2008-01-01 2008-01-25
1 2 2008-01-01 2008-01-10 S1 2008-01-01 2008-01-25
1 3 2008-01-01 2008-01-10 S1 2008-01-01 2008-01-25
2 1 2008-01-09 2008-01-20 S1 2008-01-01 2008-01-25
2 2 2008-01-09 2008-01-20 S1 2008-01-01 2008-01-25
2 3 2008-01-09 2008-01-20 S1 2008-01-01 2008-01-25
2 4 2008-01-09 2008-01-20 S1 2008-01-01 2008-01-25
4 1 2008-01-21 2008-01-25 S1 2008-01-01 2008-01-25
4 2 2008-01-21 2008-01-25 S1 2008-01-01 2008-01-25
3 1 2008-02-01 2008-03-31 S1 2008-02-01 2008-05-31
3 2 2008-02-01 2008-03-31 S1 2008-02-01 2008-05-31
5 1 2008-04-01 2008-05-31 S1 2008-02-01 2008-05-31
5 2 2008-04-01 2008-05-31 S1 2008-02-01 2008-05-31
6 1 2008-04-01 2008-05-31 S2 2008-04-01 2008-05-31
6 2 2008-04-01 2008-05-31 S2 2008-04-01 2008-05-31
6 3 2008-04-01 2008-05-31 S2 2008-04-01 2008-05-31

Thanks.
Tom Kyte
February 21, 2008 - 7:11 am UTC

elaborate on this:

I tried that approach but ran into some snags.

what snags did you run into.


your test data stinks, look at it. Not only does the example not run (missing quote), but the DATES DO NOT CHANGE IN A GROUP.

start playing with this query

ops$tkyte%ORA11GR1> select id, subid, strt_dt, end_dt, sub,
  2         lag(end_dt) over (partition by id order by strt_dt) last_end_dt,
  3             case when strt_dt-1 <= lag(end_dt) over (partition by id order by strt_dt) then '<<<====' end flag
  4    from dt_test
  5  /


If you understand the article you read and you get some good data - you can answer this yourself.

If you do not understand the article/technique - play with it until you do.

SQL as a language

Robert, February 20, 2008 - 11:38 pm UTC

Queries like these demonstrate why SQL is such a beautiful and powerful language! :)

Once you've developed in SQL, it's almost impossible to go back to just a 3GL language.
