Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, John.

Asked: November 23, 2016 - 2:34 pm UTC

Last updated: November 26, 2016 - 3:03 am UTC

Version: 11.2.0.4

Viewed 1000+ times

You Asked

Hi,
I have a question on data warehousing and am looking for guidance how this can be best achieved using features of the database.
Assuming we have a very large table in one instance and are looking to retrieve all the new records from it each evening.

Create Table xxdemo As
  Select 'Record ' || To_Char(level, 'fm9999') myrecord, Sysdate -1 last_update_date
    From dual
  Connect By Level < 10000;
  
Insert Into xxdemo(myrecord, last_update_date)
  Select 'Rec ' || To_Char(level, 'fm9999') myrecord, Sysdate last_update_date
    From dual
  Connect By Level < 10000;
  
Create Index xxdemo_n1 On xxdemo(last_update_date);



I often see processes which do this:

Select * 
  From xxdemo
 Where last_update_date >= :last_extract_date


So periodically (possibly each evening) the process runs and retrieves all records which were updated/created since the last run. At that point the last_extract_date is set to the current date in readiness for the next run.

My issue with this is that it can lead to loss of data if we have transactions which have started prior to the "extract" query being run and commit afterwards.

For example, assume the following dummy timestamps:

14:20:00. Begin Transaction A
14:20:01. Begin extract routine
14:20:05. Complete extract routine and set last extract date to 14:20:01
14:20:10. Commit Transaction A

So the following night I extract records from 14:20:01 onwards, resulting in me missing Transaction A.

A lot of the implementations I see (including Oracle's own BI Applications routines) use a concept of "Prune Days" where they offset the date by X number of days, so you're always retrieving at least X days worth of data more than you realistically need. The insert insert is then:

Insert Into xxdemo(...)
Values (...)
Where Not Exist (Select 'Already Exists' From xxdemo);


However to me that seems a bit erm... flaky. And like there is a lot of overhead processing done every night for something that is realistically likely to happen once in a blue moon.

So... short of locking the entire table during the update etc, is there any particular method or approach you would recommend for achieving such functionality? I've heard MV Logs mentioned in the past by someone as a possible solution however I'm not sure if/how that would work?

Any suggestions or pointers greatly appreciated.

Thanks,
John

and Connor said...

Dont forget you don't need to lock the table during the update, only to get a consistent set of records.

So you can code something like this:

- lock table T in exclusive mode wait 10;
- open cursor for select * from T where last_udpate > ...;
- commit;
- now do your fetching

So the lock is only held momentarily.

Alternatively, if the "last_update" is a long time in the past (eg, you're getting "yesterdays" records, and hence the likelihood of an open transaction is very low, you could query v$transaction as a double check to make sure no open transactions are that old.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Locking table

A reader, November 24, 2016 - 11:04 am UTC

Hi,
Thanks for the suggestions. In reality the extracts aren't a select from a single table, they join off to attribute tables instead, so locking the table could potentially last longer than I'd want. There's also the issue that if a transaction started prior to the extract run but lasted longer than the wait, the extract process would fail.
I guess I was looking for a process which would still allow new transactions to come in that spanned the extract routine in one way or another, however would then capture those in the next run if we knew they were uncommitted at the point the previous run started.
Chris Saxon
November 24, 2016 - 12:45 pm UTC

You could look into using flashback "as of scn" queries. Get the SCN at the time the extract starts. Then you can fetch all the new rows between the last and current with queries like:

select * from t versions between scn :st_scn and :en_scn
where  versions_operation = 'I';

Understanding Multi-version Read consistency

Rajeshwaran, Jeyabal, November 25, 2016 - 8:11 am UTC

In the Session#1 i do this.

demo@ORA11G>
demo@ORA11G> create table oltp as
  2  select a.*,sysdate as last_updated
  3  from all_users a
  4  where rownum <=5;

Table created.

demo@ORA11G>
demo@ORA11G> create or replace trigger last_update_trigger
  2  before insert or update
  3  on oltp for each row
  4  begin
  5     :new.last_updated := sysdate;
  6  end;
  7  /

Trigger created.

demo@ORA11G>
demo@ORA11G> exec dbms_lock.sleep(5);

PL/SQL procedure successfully completed.

demo@ORA11G>
demo@ORA11G> update oltp set username='ABC' where rownum = 1;

1 row updated.

demo@ORA11G>
demo@ORA11G> /* *******************************************************************
demo@ORA11G>  Keep this Transaction un-commited and run dw2.sql in session#2
demo@ORA11G> ******************************************************************* */
demo@ORA11G>
demo@ORA11G>


with un-committed transaction in session#1 - I do this in session#2 (data refresh scenario)


demo@ORA11G>
demo@ORA11G> alter session set nls_date_format='dd-mon-yyyy hh12:mi:ss am';

Session altered.

demo@ORA11G>
demo@ORA11G> column first_refresh new_val fr;
demo@ORA11G> select sysdate as first_refresh from dual;

FIRST_REFRESH
-----------------------
25-nov-2016 01:07:13 pm

1 row selected.

demo@ORA11G> create table copy as select * from oltp;

Table created.

demo@ORA11G> exec dbms_lock.sleep(5);

PL/SQL procedure successfully completed.

demo@ORA11G> column next_refresh new_val nr;
demo@ORA11G> select sysdate as next_refresh from dual;

NEXT_REFRESH
-----------------------
25-nov-2016 01:07:29 pm

1 row selected.

demo@ORA11G>
demo@ORA11G>
demo@ORA11G> select * from oltp
  2  where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am');
old   2: where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am')
new   2: where last_updated >= to_date('25-nov-2016 01:07:13 pm','dd-mon-yyyy hh12:mi:ss am')

no rows selected

demo@ORA11G> merge into copy t1
  2  using ( select * from oltp
  3  where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am') ) t2
  4  on ( t1.user_id = t2.user_id )
  5  when matched then
  6     update SET t1.username = t2.username,
  7            t1.last_updated = t2.last_updated
  8  when not matched then
  9     insert (username,user_id,created) values( t2.username,t2.user_id,t2.created)
 10  /
old   3: where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am') ) t2
new   3: where last_updated >= to_date('25-nov-2016 01:07:13 pm','dd-mon-yyyy hh12:mi:ss am') ) t2

0 rows merged.

demo@ORA11G>
demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

no rows selected

demo@ORA11G>
demo@ORA11G> /* *******************************************************************
demo@ORA11G>    go back to session#1 and do a Commit and come back here
demo@ORA11G>  ******************************************************************* */
demo@ORA11G>


Before creating the table, recorded SYSDATE as "first_refresh", created the table, waited for 5sec and recorded SYSDATE as "next_refresh"

Then check if stuff available to refresh since the first refresh - since Oracle don't read un-committed data - that un-committed transaction from Session#1 don't show up now.

no changes, so nothing to refresh.

Now, let me switch back to Session#1 and do COMMIT;

demo@ORA11G> commit;

Commit complete.

demo@ORA11G>


Now come back to Session#2 and see if any mismatch between the two sources.

demo@ORA11G>
demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

'OLT USERNAME                          USER_ID CREATED                 LAST_UPDATED
---- ------------------------------ ---------- ----------------------- -----------------------
COPY SYS                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:00:12 pm
OLTP ABC                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:00:18 pm

2 rows selected.


Yes available, so let me do a refresh now using the "next_refresh_dt"

demo@ORA11G> select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
  2     to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
  3  from dual;
old   1: select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
new   1: select to_date('25-nov-2016 01:07:13 pm','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
old   2:        to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
new   2:        to_date('25-nov-2016 01:07:29 pm','dd-mon-yyyy hh12:mi:ss am') as next_refresh

FIRST_REFRESH           NEXT_REFRESH
----------------------- -----------------------
25-nov-2016 01:07:13 pm 25-nov-2016 01:07:29 pm

1 row selected.

demo@ORA11G> merge into copy t1
  2  using ( select * from oltp
  3  where last_updated >= to_date('&nr','dd-mon-yyyy hh12:mi:ss am') ) t2
  4  on ( t1.user_id = t2.user_id )
  5  when matched then
  6     update SET t1.username = t2.username,
  7            t1.last_updated = t2.last_updated
  8  when not matched then
  9     insert (username,user_id,created) values( t2.username,t2.user_id,t2.created)
 10  /
old   3: where last_updated >= to_date('&nr','dd-mon-yyyy hh12:mi:ss am') ) t2
new   3: where last_updated >= to_date('25-nov-2016 01:07:29 pm','dd-mon-yyyy hh12:mi:ss am') ) t2

0 rows merged.


Now, nothing is there to refresh - because the changes took place before the "next_refresh_dt" - which is not available to us.

at the end even after the refresh, there were some mismatches.

demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

'OLT USERNAME                          USER_ID CREATED                 LAST_UPDATED
---- ------------------------------ ---------- ----------------------- -----------------------
COPY SYS                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:00:12 pm
OLTP ABC                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:00:18 pm

2 rows selected.

demo@ORA11G> select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
  2     to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
  3  from dual;
old   1: select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
new   1: select to_date('25-nov-2016 01:07:13 pm','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
old   2:        to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
new   2:        to_date('25-nov-2016 01:07:29 pm','dd-mon-yyyy hh12:mi:ss am') as next_refresh

FIRST_REFRESH           NEXT_REFRESH
----------------------- -----------------------
25-nov-2016 01:07:13 pm 25-nov-2016 01:07:29 pm

1 row selected.

demo@ORA11G>



The solution would be rely on the least transaction from v$transaction rather than just SYSDATE.

something like this ( will start from the scratch again)

from session-1 ( did this )


demo@ORA11G> conn demo/demo@ora11g
Connected.
demo@ORA11G> drop table oltp;

Table dropped.

demo@ORA11G>
demo@ORA11G> create table oltp as
  2  select a.*,sysdate as last_updated
  3  from all_users a
  4  where rownum <=5;

Table created.

demo@ORA11G>
demo@ORA11G> create or replace trigger last_update_trigger
  2  before insert or update
  3  on oltp for each row
  4  begin
  5     :new.last_updated := sysdate;
  6  end;
  7  /

Trigger created.

demo@ORA11G>
demo@ORA11G> exec dbms_lock.sleep(5);

PL/SQL procedure successfully completed.

demo@ORA11G>
demo@ORA11G> update oltp set username='ABC' where rownum = 1;

1 row updated.

demo@ORA11G>
demo@ORA11G> /* *******************************************************************
demo@ORA11G>  Keep this Transaction un-commited and run dw2.sql in session#2
demo@ORA11G> ******************************************************************* */
demo@ORA11G>
demo@ORA11G>


From the session-2, now we did this.

demo@ORA11G> conn demo/demo@ora11g
Connected.
demo@ORA11G> drop table copy;

Table dropped.

demo@ORA11G>
demo@ORA11G> alter session set nls_date_format='dd-mon-yyyy hh12:mi:ss am';

Session altered.

demo@ORA11G>
demo@ORA11G> column first_refresh new_val fr;
demo@ORA11G> select sysdate, min(to_date(start_time,'mm/dd/rr hh24:mi:ss')) start_time ,
  2      nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss'))
  3          , sysdate) as first_refresh
  4  from v$transaction;

SYSDATE                 START_TIME              FIRST_REFRESH
----------------------- ----------------------- -----------------------
25-nov-2016 01:31:22 pm 25-nov-2016 01:23:10 pm 25-nov-2016 01:23:10 pm

1 row selected.

demo@ORA11G> create table copy as select * from oltp;

Table created.

demo@ORA11G>
demo@ORA11G> exec dbms_lock.sleep(5);

PL/SQL procedure successfully completed.

demo@ORA11G> column next_refresh new_val nr;
demo@ORA11G> select sysdate, min(to_date(start_time,'mm/dd/rr hh24:mi:ss')) start_time ,
  2      nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss'))
  3          , sysdate) as next_refresh
  4  from v$transaction;

SYSDATE                 START_TIME              NEXT_REFRESH
----------------------- ----------------------- -----------------------
25-nov-2016 01:32:00 pm 25-nov-2016 01:23:10 pm 25-nov-2016 01:23:10 pm

1 row selected.

demo@ORA11G> select * from oltp
  2  where last_updated >=  to_date('&fr','dd-mon-yyyy hh12:mi:ss am');
old   2: where last_updated >=  to_date('&fr','dd-mon-yyyy hh12:mi:ss am')
new   2: where last_updated >=  to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am')

no rows selected

demo@ORA11G>
demo@ORA11G> merge into copy t1
  2  using ( select * from oltp
  3  where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am') ) t2
  4  on ( t1.user_id = t2.user_id )
  5  when matched then
  6     update SET t1.username = t2.username,
  7            t1.last_updated = t2.last_updated
  8  when not matched then
  9     insert (username,user_id,created) values( t2.username,t2.user_id,t2.created)
 10  /
old   3: where last_updated >= to_date('&fr','dd-mon-yyyy hh12:mi:ss am') ) t2
new   3: where last_updated >= to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') ) t2

0 rows merged.

demo@ORA11G>
demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

no rows selected

demo@ORA11G>
demo@ORA11G> /* *******************************************************************
demo@ORA11G>    go back to session#1 and do a Commit and come back here
demo@ORA11G>  ******************************************************************* */
demo@ORA11G>


Switch back into session#1 and do commit

demo@ORA11G>
demo@ORA11G> commit;

Commit complete.

demo@ORA11G>


came back to session#2 and did this.

demo@ORA11G>
demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

'OLT USERNAME                          USER_ID CREATED                 LAST_UPDATED
---- ------------------------------ ---------- ----------------------- -----------------------
COPY SYS                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:23:05 pm
OLTP ABC                                     0 09-oct-2013 06:23:45 pm 25-nov-2016 01:23:10 pm

2 rows selected.

demo@ORA11G>
demo@ORA11G> select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
  2     to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
  3  from dual;
old   1: select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
new   1: select to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
old   2:        to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
new   2:        to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') as next_refresh

FIRST_REFRESH           NEXT_REFRESH
----------------------- -----------------------
25-nov-2016 01:23:10 pm 25-nov-2016 01:23:10 pm

1 row selected.

demo@ORA11G>
demo@ORA11G> merge into copy t1
  2  using ( select * from oltp
  3  where last_updated >= to_date('&nr','dd-mon-yyyy hh12:mi:ss am') ) t2
  4  on ( t1.user_id = t2.user_id )
  5  when matched then
  6     update SET t1.username = t2.username,
  7            t1.last_updated = t2.last_updated
  8  when not matched then
  9     insert (username,user_id,created) values( t2.username,t2.user_id,t2.created)
 10  /
old   3: where last_updated >= to_date('&nr','dd-mon-yyyy hh12:mi:ss am') ) t2
new   3: where last_updated >= to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') ) t2

1 row merged.

demo@ORA11G>
demo@ORA11G>
demo@ORA11G> select 'OLTP',oltp.* from oltp minus select 'OLTP',copy.* from copy
  2  union all
  3  select 'COPY',copy.* from copy minus select 'COPY',oltp.* from oltp;

no rows selected

demo@ORA11G> select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
  2     to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
  3  from dual;
old   1: select to_date('&fr','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
new   1: select to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') as first_refresh,
old   2:        to_date('&nr','dd-mon-yyyy hh12:mi:ss am') as next_refresh
new   2:        to_date('25-nov-2016 01:23:10 pm','dd-mon-yyyy hh12:mi:ss am') as next_refresh

FIRST_REFRESH           NEXT_REFRESH
----------------------- -----------------------
25-nov-2016 01:23:10 pm 25-nov-2016 01:23:10 pm

1 row selected.

demo@ORA11G>


so now after the refresh, no mismatch between the two sources.

long story in short, rather than working with sysdate, check for any outstanding transactions from v$transaction - then proceed with min(start_time) - incase, no outstanding transaction go with sysdate, and that is what we have done with this below sql to capture the refresh time.
select sysdate, min(to_date(start_time,'mm/dd/rr hh24:mi:ss')) start_time ,
    nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss'))
        , sysdate) as next_refresh
from v$transaction;

Connor McDonald
November 26, 2016 - 3:03 am UTC

nice demo