Selecting rows that have changed via timestamp problematic with non-blocking reads

Question and Answer

Tom Kyte

Thanks for the question, Graham.

Asked: March 31, 2004 - 2:26 pm UTC

Answered by: Tom Kyte - Last updated: September 23, 2013 - 5:51 pm UTC

Category: Database - Version: 8.1.7 and up

Viewed 10K+ times.

You Asked

I have a problem related to non-blocking reads with Oracle. Our users want to periodically run queries (via our app) that get changes since the last time they ran the query. We add a last_modified timestamp column to each table, and use triggers to set that to the current system time whenever the row is inserted or updated (we separately track deletes, but let's ignore that for now). For a DBMS with blocking reads, we can just do:

SELECT ... WHERE last_modified >= last_checked_by_this_user

However, with Oracle's non-blocking reads the query above can miss changes: a row's last_modified time can be earlier than last_checked_by_this_user even though the change was not committed until after that time. For example (where TXN1 is some other transaction and TXN2 and TXN3 are consecutive runs of our application):

Time  Event
----  -----------------------------------------------------------------
T0    row R1 has value X (and last_modified < last_checked_by_this_user)
T1    TXN1 begins
T2    TXN1 updates R1 from X to Y, setting last_modified to T2
T3    TXN2 begins (and notes T3 for next last_checked_by_this_user)
T4    TXN2 does SELECT ... WHERE last_modified >= last_checked_by_this_user
      -> Correctly finds no changes for R1
T5    TXN2 COMMITs
T6    TXN1 COMMITs
T7    TXN3 begins (and notes T7 for next last_checked_by_this_user)
T8    TXN3 does SELECT ... WHERE last_modified >= T3
      -> Incorrectly finds no changes for R1

I effectively need to set last_modified to the commit time (T6) for something like the above to work with Oracle's non-blocking reads. I have to do it with triggers, because we don't have control over the other transactions running against the database. There are also many users, so we can't simply flag rows as changed or not. I could set the next last_checked_by_this_user time to be the start time of the oldest open transaction (via V$TRANSACTION), but then I could get some of the same changes over again the next time. Note that I only want to fetch the most recent value if there has been a change; I don't need to see each change.
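The race in the timeline above can be sketched as a toy model (plain Python; all names are illustrative, nothing here is an Oracle API):

```python
# Toy model of the race: under non-blocking (snapshot) reads, a row whose
# last_modified stamp predates the reader's cutoff can commit *after* that
# cutoff was noted, so no run of the query ever returns it.

committed = {"R1": {"value": "X", "last_modified": 0}}  # T0: committed state
pending = {"R1": {"value": "Y", "last_modified": 2}}    # T2: TXN1's uncommitted update

def query(cutoff):
    """One run of the app's SELECT: sees only committed rows stamped at/after cutoff."""
    return {k: row for k, row in committed.items() if row["last_modified"] >= cutoff}

# T4: TXN2 runs; TXN1's change is invisible because it is not yet committed.
assert query(cutoff=1) == {}

committed.update(pending)   # T6: TXN1 commits -- but the stamp stays at T2

# T8: TXN3 runs with cutoff=3 (noted when TXN2 began at T3). The change to R1
# is stamped 2 < 3, so it is missed now and on every later run.
assert query(cutoff=3) == {}
```

With a commit-time stamp (or a read that blocked until TXN1 committed), the T8 query would have returned R1.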

I think this must be a common problem for anyone who periodically wants to run reports, without getting the same data repeated in subsequent reports, but I haven't found or come up with a solution.

Perhaps I need to use SCNs instead of timestamps, but I'd prefer to use timestamps if possible (so users can understand them if need be).

Any suggestions?

In case you are wondering if it would work at all with a blocking database, here's some more clarification.

It does work in databases with blocking reads, because:
- the last_checked time is set at the beginning of the transaction.
- if another transaction updates a row after it was scanned, then it will have a newer (or equal) timestamp. Note that the query used "greater than or equal to".
- if another transaction updates a row before (or at the same time) it is scanned, then the read blocks until the update is committed, and the scan will pick up the change.

So no changes are missed with blocking reads.

You could get the same change twice only if you repeated the scan within the timestamp resolution. Theoretically that is more of a problem with Oracle's 1 second precision (other DBMSes we use have millisecond or microsecond timestamp precision), but please assume that our business rules don't allow subsequent queries to be that close together. (In fact, we can handle duplicates if they arise, but saying it can happen if a query is re-run within a second is much better than saying it can happen if the query is re-run within the time of the longest transaction, which could be arbitrarily long depending on what else our customer is doing with the database.)

Is there a way to query for changes since the last query with Oracle's non-blocking reads?

It seems like userenv('commitscn') does something like what I'm after with SCNs, but that's undocumented and unsupported, and I'd need a different unique transaction identifier to put on each row (which was linked to the commitscn), since the commitscn can only be in one row. Is there a built-in identifier that I could use (that wouldn't wrap for the lifetime of the database)?

and we said...

Well, I beg to differ with your analysis of the blocking read. You are assuming ALL reads will be done via an index on the affected column. If for any reason that index is not used, the scenario I describe below is not only 100% possible -- but 100% PROBABLE. All that would have to happen is for a row to be inserted on a block you've already processed. Committed read in blocking databases = wrong answers; this is one of the definite cases.

You've always been subjected to this UNLESS you are 100% sure you are ALWAYS using an index on this timestamp field.

with blocking reads you have the same opportunity to "miss".


time    event
----    -----------------------------------------------------------------
t0      you start reading data looking for everything past NOON (last read time)
t1      you read along; you've processed block 1, block 2, block 3...
t2      another transaction comes along and inserts data. data goes onto block 1.

well, you just missed that row, didn't you? you won't re-read block one again.

That assumes you set your timestamp for the next query to be the time AT THE END of the query (again, you could be missing a second of modifications in there as well since the getting of the timestamp and the getting of the last row are not atomic operations)

If we assume your timestamp for the next query is to be the time AT THE BEGINNING of this query -- then you definitely re-read data twice.

Anyway -- the approach in Oracle would be as follows. The share mode lock will allow concurrent reads -- but will prohibit modifications, and it will wait for outstanding modifications to complete. So the reads will block writes instead of the other way around. We use an autonomous transaction so we can give up the lock almost immediately after getting it -- since we use multi-versioning, the result set associated with a cursor is pre-ordained at the point in time we opened the cursor (no, we don't copy the data anywhere; we use versioning to provide a read-consistent, correct result set as of the point in time you opened the cursor as we fetch the data). The commit will not affect any in-flight transactions your session has.


This will allow you to use your timestamp safely (in 9i and above, use TIMESTAMP -- it has fractional-second precision; not available in 8i).


ops$tkyte@ORA9IR2> create table t ( x int, last_updated date );

Table created.

ops$tkyte@ORA9IR2> insert into t values ( 1, sysdate );

1 row created.

ops$tkyte@ORA9IR2> commit;

Commit complete.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace package my_pkg
2 as
3 type rc is ref cursor;
4 procedure get_result_set_as_of( p_time in date, p_result_set out rc );
5 end;
6 /

Package created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace package body my_pkg
2 as
3 procedure get_result_set_as_of( p_time in date, p_result_set out rc )
4 is
5 pragma autonomous_transaction;
6 begin
7 lock table t in share mode;
8 open p_result_set for
9 select * from t where last_updated >= p_time;
10 commit;
11 end;
12 end;
13 /

Package body created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x refcursor
ops$tkyte@ORA9IR2> exec my_pkg.get_result_set_as_of( sysdate, :x );

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2> print x

X LAST_UPDA
---------- ---------
1 01-APR-04
Reviews

April 01, 2004 - 11:06 am UTC

Reviewer: Graham from Cambridge, ON Canada

I accept your point about missing data with a last_read timestamp from the *end* of a query. But I tried to be clear that we don't do that.

My example showed that our next last_checked timestamp is noted at the *beginning* of the query's transaction, not the end. I also repeated that later as a bullet. I think you'll agree that we don't miss any changes with blocking read databases, but you're right that we can get some duplication between successive queries. I forgot to mention that duplication can occur in the following case:

- Changes with a timestamp the same as the last_checked timestamp, that were committed before the last query ran. Those will have been fetched in the last running of the query as well as in the current one.

As I said, we can deal with duplicates but it would be better if the duplication window were as small as possible (i.e. 1 second with Oracle 8i and 1 microsecond with Oracle 9i and above). However the best solution I had previously (of using the start time of the oldest open transaction, via V$TRANSACTION) had a duplication window of the longest transaction.

I disagree that I assumed indexed reads. That's irrelevant. Note that the duplication I was talking about was between successive runs of the query by the same user. Phantom rows will of course be possible unless serializable isolation is used regardless of database type. (I was assuming Oracle's default of read committed isolation.)

Thanks for your suggestion of using a share mode lock on the table and autonomous_transaction. If I understand correctly, that gets the duplication window down to the timestamp precision, as it is by default with blocking-read databases. FYI, when a share mode lock is waiting "for outstanding modifications to complete" that effectively turns the read with that lock into a blocking read, so your suggestion is basically to use blocking reads.

Note that your suggestion of locking the whole table is more severe than the row or page level locking that a blocking-read database will do to accomplish the same thing. However, putting it in an autonomous transaction ameliorates that somewhat.

So it seems to be a tradeoff between minimizing the duplication window and performance issues from adding share mode table locking.

(Sorry to nitpick some more, but do you not consider the RBS a copy of the data?)

Thanks again for your help!

Tom Kyte

Followup  

April 01, 2004 - 11:21 am UTC

RBS is not a copy of the data - RBS serves a purpose.

In your other databases -- they put the RBS in the redo; we just keep it separate. That helps us perform better (we only write redo, we do not read redo during normal operation -- redo is a serious point of serialization in databases, so anything we do to avoid ever having to read it is "a good thing (tm)"). Rollback is not a point of serialization.




Why the headache?

April 01, 2004 - 1:13 pm UTC

Reviewer: Gabriel from close to ... Cambridge, ON Canada

<quote>Our users want to periodically run queries (via our app) that get changes since the last time they ran the query.</quote>

My reasoning …
This is an application with named users … you have to persist the last_checked_by_this_user date … when you do that, rather than saving just the DATE, save the result set of the query … next time round do the following:
1. run the query to get the latest result set
2. compare the new result set against the previous one
3. if the same then return nothing
if different then delete the old result set, store the new result set and return the new result set

In essence you have to return “<Latest ResultSet> Minus <Saved ResultSet>” and “Merge the <Latest ResultSet> over <Saved ResultSet>”.
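The save-and-diff idea above can be sketched like this (a hypothetical helper; result sets are modeled as key-to-value dicts):

```python
def changes_since_last_check(latest, saved):
    """Return (rows to show, new saved copy): '<Latest> Minus <Saved>' by value."""
    delta = {k: v for k, v in latest.items() if saved.get(k) != v}
    return delta, dict(latest)

saved = {"R1": "X", "R2": "A"}
latest = {"R1": "Y", "R2": "A", "R3": "B"}   # R1 was updated, R3 was inserted
delta, saved = changes_since_last_check(latest, saved)
assert delta == {"R1": "Y", "R3": "B"}       # only new/changed rows are returned
delta, saved = changes_since_last_check(latest, saved)
assert delta == {}                           # nothing new the second time round
```

No timestamps or SCNs are involved, which is why this cannot miss data; the cost, as the review says, is persisting a copy of each user's last result set.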

Going after v$transaction, SCN, RBS, non-blocking reads, redo or whatever is like trying to split the atom … only thinking about them when building a business application should raise a big red flag … I would argue that, for the said requirement, there is no solution short of some sort of data versioning (but not going down into the guts of Oracle … go the other way and snap some copies).

Hope I didn’t oversimplify.


Are you taking shortcuts here?

April 02, 2004 - 8:17 am UTC

Reviewer: Mike from Cleveland, OH USA

I agree that this seems to be getting too deep into the internals. The business requirement should be more precisely stated.

This is more than a 'convenience' requirement to avoid forcing the users to see results they had seen previously. If it was just a 'convenience', then you would not be getting complaints about a few rows being displayed more than once.

Your requirement sounds like: log every row that is 'seen' by this user. Do not show the row again unless it has a change. This makes the viewing of the data a business transaction.

If this is the case, you should design to that requirement. It sounds like you are trying to take shortcuts here.

You could:
- Have a child table of the viewed data, keyed by the key of the viewed data plus the username
- When you view the data, exclude any that have a child for this user. Insert a child row for every row that is displayed.
- When your trigger updates the timestamp, delete all of its children.

You might also ask questions like:
What if the user requests the data, but never scrolls all the way to the end?
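The child-table scheme above can be modeled in a few lines (illustrative names; in the real design the "seen" set would be a table keyed by the viewed row's key plus the username, maintained by the application and the trigger):

```python
seen = {}   # (user, row_key) -> marker that a "child row" exists

def view(user, rows):
    """Show rows with no child for this user, then insert a child per shown row."""
    shown = {k: v for k, v in rows.items() if (user, k) not in seen}
    for k in shown:
        seen[(user, k)] = True
    return shown

def on_update(row_key):
    """The trigger's job: delete all of a changed row's children."""
    for key in [k for k in seen if k[1] == row_key]:
        del seen[key]

rows = {"R1": "X"}
assert view("graham", rows) == {"R1": "X"}   # first view shows the row
assert view("graham", rows) == {}            # unchanged row is not shown again
rows["R1"] = "Y"; on_update("R1")            # an update fires the trigger
assert view("graham", rows) == {"R1": "Y"}   # changed row reappears
```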

Performance matters

April 12, 2004 - 7:41 am UTC

Reviewer: Graham from Cambridge, ON Canada

Sorry if I confused others about the nature of our application. I just said I thought the problem would *also* be faced by report generation applications (not that our application generates reports). I thought it would be a common problem since an implementation that works with other DBMSes can miss data with Oracle's non-blocking reads.

There are two reasons we use timestamps instead of result sets: speed and storage! Performance is of paramount importance for our application. In many deployments our application is used simultaneously by thousands of users, so concurrent performance for the query is essential. We often need to address that at very low levels for the DBMSes that we support.

FYI, Tom was right that duplication with our scheme is possible with other DBMSes, but with those the window for duplication is only 1 millisecond or 1 microsecond (since those are the timestamp precisions with the other databases). For our application that level of duplication is fine, but missing data is not. More duplication is okay for correctness, but undesirable for performance.

To see how much effect shared table locking has on the performance of our application, I have benchmarked it using our benchmark framework. I tested with 1000 users running the query, with a maximum of 5 active at a time (the rest queued up). Five scenarios were timed, with all 1000 clients doing the same thing:

1. select 0 rows
2. select 1000 rows
3. select 0 rows followed by update of 1000 rows
4. select 0 rows followed by delete of 1000 rows
5. select 0 rows followed by insert of 1000 rows

Moving the SELECT into a stored procedure made no significant difference in the timings.

Making that SP an autonomous transaction and adding a shared table lock had the following effect on the total time for all 1000 clients:

1. less than 1% difference
2. 10% longer
3. 10% longer
4. 27% longer
5. 58% longer

So the shared table locking, even in a short transaction, has a significant negative effect on concurrent performance, particularly when inserts or deletes are happening at the same time.

Our alternative of saving the oldest start time of the open transactions does not have concurrency issues, just the larger window for duplication.

I think an SCN-based solution would be best. I'm thinking of using a trigger to mark each row with a transaction id, and maintaining a mapping between transaction id and commit SCN. Two questions:

1. If I have an SCN number, can I determine the corresponding system TIMESTAMP (for 9.0.1 and up) or DATE (for 8.1.7)? Or the reverse (SCN corresponding to a timestamp)? If it makes a difference, I have in mind the SCN from the post-commit value of userenv('commitscn').

2. To uniquely identify transactions, without chance of wrapping for the lifetime of the database, should I use xidusn,xidslot,xidsqn from V$TRANSACTION (for 8.1.7) or dbms_transaction.local_transaction_id (for 9.0.1 and up)?

Recall that the overall problem is selecting the latest values for any rows that have changed since the last time the user checked.

Tom Kyte

Followup  

April 12, 2004 - 8:21 am UTC

as I said -- this is a most *UNUSUAL* request, in fact, one I've never encountered as a request ever.

I've never had someone ask for "give me the data added since the last time I was there". That would mean you are doing a ton of work in the client. If you have messages you want to process, we use AQ (queuing technology) to deliver messages to 1 or more consumers.

And -- have you measured the effects of blocking reads in the other databases -- all we've done is introduce blocking reads here.

The blocking reads, even in a short transaction, have a significant negative effect on concurrent performance, particularly when inserts or deletes are happening at the same time!


1) in 8i, no. in 9i -- within plus or minus 5 minutes (and only within the last 5 days). in 10g -- within plus or minus 3 seconds and only for the last 5 days.

You must be aware that userenv('commitscn') is totally undocumented and its behavior subject to change at our whim. The use of that feature could lead to unpredictable behavior in the future.

2) dbms_transaction is easier by far.




In Oracle, the correct implementation would use message queuing.

Never say never ;-)

April 12, 2004 - 3:08 pm UTC

Reviewer: Graham from Cambridge, ON Canada

>as I said -- this is a most *UNUSUAL* request, in fact, one I've never
>encountered as a request ever.
>
>I've never had some one ask for "give me the data added since the last time I
>was there".

I must have missed where you said that. Maybe it's not as common as I assumed for reports, but it is for anything that involves data synchronization. Haven't you ever used a web forum that had a "view new posts since I was last here" function? That's pretty universal in other web forums I've used (though not this one). Another example is source code control, where you typically want to get checked-in changes since the last time you updated. Windows Briefcase works the same way - when you sync you get the latest version of any file that has changed. In everyday life I'm often concerned with "what's new since I last checked?"

It doesn't seem that unreasonable to want to do that type of query on an Oracle database. It works well (including good concurrent performance) with other DBMSes. I'm surprised you suggested switching to a messaging architecture to retrieve changes from a database!

(Of course it is often further filtered than just "what's new" - just think of that as additional WHERE clauses for the SELECT which I've left out for simplicity.)

>And -- have you measured the effects of blocking reads in the other databases --
>all we've done is introduce blocking reads here.

Without the shared table lock performance for Oracle 9.2.0.1.0 is about the same as with the best performing DBMSes that use blocking reads (using the same read committed isolation). Adding the shared table lock with Oracle lowers the performance to be significantly below those. The other DBMSes use row or page-level locks which apparently offer better concurrency than the table level locking that you suggested for Oracle.

FYI, I tried LOCK TABLE t IN ROW SHARE MODE, because I wasn't sure from the documentation description if that would block reads in the same way. Turns out that doesn't block reads, so it doesn't solve the problem in the same way as LOCK TABLE t IN SHARE MODE.

Note that blocking reads wouldn't be needed if one had access to a timestamp, or some other sequential marker, corresponding to the time of commit.

>1) in 8i, no. in 9i -- within plus or minus 5 minutes (and only within the last
>5 days). in 10g -- within plus or minus 3 seconds and only for the last 5 days.

How (for 9i and 10g)?

>You must be aware that userenv('commitscn') is totally undocumented and its
>behavior subject to change at our whim. The use of that feature could lead to
>unpredicable behavior in the future.

Yes, I mentioned that drawback in my original post. You have said that Oracle uses commitscn for Oracle replication. That is a similar problem, though our clients are not Oracle databases, and we don't care about each transaction, just the current value when we run our query.

>2) dbms_transaction is easier by far.

So wrapping isn't a concern with either method?

Tom Kyte

Followup  

April 12, 2004 - 7:07 pm UTC

sorry -- i thought i had put that in there at some point (the unusual nature), guess I didn't.

....
I'm surprised you suggested switching to a messaging architecture to
retrieve changes from a database!
.......

why? very common approach in my experience. Rather than poll for what's new, subscribe to new data.


The viewing of new posts since I was last here is universally "almost always wrong" in my experience. I always see something I already saw, or am missing something I didn't see (cause they use lots of databases that do dirty reads or have NO concurrency controls whatsoever). I don't have it here cause I don't want to make people log in; people get so darn antsy when I set a cookie (some freak out, as funny as that is -- they don't get it). And it is simple enough for them to look at the posts (with timestamps) backwards.

Also -- given that transactions should take much less than a second (especially on the type of forum you are talking about) and we don't block much (if ever) -- the transactions will all be way sub-second, so the 'missing of a message' doesn't come into play (just back the timestamp off by 1 second if you are truly paranoid).

Guess instead of reads blocking writes -- you could just serialize the modifications, that would solve the issue pretty much as well.


Source code control works on "check in check out" -- very easy to accommodate in Oracle -- source code control works on "one person owns a piece of code at a time, only one modifier" (needs strict concurrency controls).

I won't even comment on windows stuff. No concurrency controls whatsoever going on there (good luck with that)



As for 9i and 10g -- in 10g there are functions to convert an SCN to a timestamp and a timestamp to an SCN. in 9i, you would query the table SMON_SCN_TIME to go either way.



Arghh

April 13, 2004 - 12:27 am UTC

Reviewer: Graham from Cambridge, ON Canada

I guess that all this runaround means you agree that it is indeed problematic to query "What's new since I last checked?" via SQL with Oracle's non-blocking reads. Your only SQL-based suggestion was to use blocking reads, but (not surprisingly) Oracle doesn't do that as well as the DBMSes that rely on blocking reads instead of snapshot-based isolation. (BTW, writes-blocking-reads is the requirement, not reads-blocking-writes.)

As you point out for seeing what's new in your forum, it's only natural to use the timestamps for posts. Fair enough that you don't want to require logins or cookies. However that's no reason to dismiss in general the validity of wanting to be able to query "what's new since I last checked" via timestamps, since that is exactly what you are expecting your readers to do manually.

(FYI our application is not a web forum or a source control system. Those were just examples of common applications where one typically wants to get changes since the last time one checked.)

The reason I was surprised about your messaging suggestion, is that you usually seem to advocate doing things in the database if possible, using SQL or PL/SQL.

I'm similarly surprised that you suggest serialization, because (like me) you usually seem obsessed with good concurrent performance! (To be fair, you were suggesting it for a web forum like yours.)

It's off topic, but I guess you've never used CVS or Perforce source control systems. Both allow concurrent "checkouts", and both are very widely, and successfully, used.

Back on topic - when does dbms_transaction.local_transaction_id wrap?

The same question was asked (about recycling transaction_ids) before, but I didn't see that part answered:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:8913646338893#8918465772722



Tom Kyte

Followup  

April 13, 2004 - 7:46 am UTC

And as I said -- to query what is new since I last checked -- all that takes is a TIMESTAMP. All of the transactions on this system are way sub-second. You would miss *nothing*.

Messaging IS A DATABASE SOLUTION. 100%! AQ is in the database itself. It is an RDBMS feature (well, in Oracle it sure is, since a message queue requires a stable store; all messaging implementations have a database at their core -- IBM MQSeries -> DB2 is behind it, for example. We just choose to call a horse a horse and expose it as a database feature, not as a separate piece of software).

yes, xid's can roll -- it takes a really utterly long time as they are a function of three things -- the rollback segment id (USN), a transaction slot and a sqn number.

As we go round and round your rbs's, we'll reuse slots, and each slot has a 32bit (believe it is 32 bits) signed integer as its sequence, which means it'll roll at 32k.

Now, on my system -- i have about 10 rbs's (varies with auto undo). each has 48 slots. So, I'd need to have 10*48*32k transactions to see a possible roll around.



Always compromises

April 13, 2004 - 10:41 am UTC

Reviewer: Graham from Cambridge, ON Canada

Sorry if I haven't been clear. I want to solve the problem of "what's new since I last checked" for *any* Oracle database (*not* this forum). A requirement is that it be done via a SQL SELECT statement that creates a result set with the latest values for the new or changed rows.

So far I've got two timestamp-based solutions for Oracle that won't miss data:

1. Set the next "last checked" time to be the oldest start time of the open transactions.
- Scales well.
- Max window for duplication is longest transaction. Could be excessive duplication, affecting performance, if there are long transactions.

2. Use shared table lock to enable blocking reads.
- Duplication window minimized (max is timestamp precision).
- Scales poorly.

An SCN-based solution might be possible.
- Have to use own transaction id since can't rely on having unique dbms_transaction.local_transaction_id for lifetime of database.
- Only way to determine commit order of transactions is via undocumented and unsupported commitscn feature.

So our existing solution (#1) seems best, unless minimized duplication is more important than concurrent throughput (in which case #2 might be better).

Thanks Tom for suggesting #2. Too bad there isn't a better performing option.

Is there a limit on transaction duration? (UNDO_RETENTION seems to just be the time limit for a query from the start of the transaction. I'm not sure if inserts, updates and deletes have time limits.)

Tom Kyte

Followup  

April 13, 2004 - 10:56 am UTC

there is no limit on transaction duration.

Transaction limits?

April 13, 2004 - 12:02 pm UTC

Reviewer: Graham from Cambridge, ON Canada

>there is no limit on transaction duration.

Maybe not a time limit per se, but I think the maximum undo tablespace size imposes some limit.

If one has a finite undo tablespace, and a long running transaction such that there is an unable to extend error, which transaction gets the error? The long running one or a subsequent one?

I'm guessing the long running one, otherwise I think all subsequent transactions would fail too.

Tom Kyte

Followup  

April 13, 2004 - 6:16 pm UTC

no, not really. insert a row and walk away.

you didn't say the transaction has to be LARGE, just long running.

the transaction that fails to extend will get an error and have to decide whether to rollback or retry.

(this is assuming you do not have resumable statements enabled which changes everything....)

Correction to transaction id wrap?

April 13, 2004 - 9:25 pm UTC

Reviewer: Graham from Cambridge, ON Canada

You said:
-----------
yes, xid's can roll -- it takes a really utterly long time as they are a function of three things -- the rollback segment id (USN), a transaction slot and a sqn number.

As we go round and round your rbs's, we'll reuse slots, and each slot has a 32bit (believe it is 32 bits) signed integer as its sequence, which means it'll roll at 32k.

Now, on my system -- i have about 10 rbs's (varies with auto undo). each has 48 slots. So, I'd need to have 10*48*32k transactions to see a possible roll around.
-----------

Sorry I didn't check the math earlier. 32k is 2^15. 2^31 is 2G.

10*48*32k is around 15 million, which isn't an outrageous number of transactions for a busy database.

However if it's actually 10*48*2^31 (about 10^12) then I agree that's much less likely to wrap in the lifetime of a database. (10 years if there were ~3,300 transactions per second, if my math is right.)
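The arithmetic is easy to check (assuming the same 10 rollback segments with 48 slots each):

```python
rbs, slots = 10, 48

# Reading the sequence limit as "32k" (2^15): only ~15 million transactions.
assert rbs * slots * 2**15 == 15_728_640

# Reading it as 2^31: about 10^12 transactions before any xid could repeat.
capacity = rbs * slots * 2**31
assert capacity == 1_030_792_151_040

# At ~3,300 transactions per second, that is roughly a decade of work.
years = capacity / (3300 * 60 * 60 * 24 * 365)
assert 9 < years < 11
```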

Is the slot sequence rollover at 32k or 2^31?

Tom Kyte

Followup  

April 14, 2004 - 7:37 am UTC

doh, you are right -- not 32k; 2^31, 2 billion.


I just had a thought about "show me what's changed". Do you need to support UPDATED information as well as just newly added?

On the edge of my seat....

April 14, 2004 - 9:53 am UTC

Reviewer: Graham from Cambridge, ON Canada

>I just had a thought about "show me whats changed". Do you need to support
>UPDATED information as well as just newly added?

Yes! I did say that in the first sentence of the original post, but that was a while ago.

Got a new idea? I'm all ears!

Tom Kyte

Followup  

April 14, 2004 - 10:56 am UTC

I'm being the devil's advocate here.

then what happens in the "other" databases when


you start reading "all new stuff"

while you are in "the middle", someone updates the first row (takes $50 away from savings) and puts it into the last row (adds $50 to checking).

you'll return a set of new rows that is totally inconsistent -- as you'll return the last row, but not the first.

On the second get (next time back in) you'll return the first row for the first time and last row again. What the heck would a client do with that? Totally inconsistent results.

See, I'm still trying to fathom the general purpose usage of this in the real world -- what does your application really do here.





Oops

April 14, 2004 - 9:56 am UTC

Reviewer: Graham from Cambridge, ON Canada

Meant to say "paragraph" instead of "sentence" in above review.

Sigh.

April 14, 2004 - 3:14 pm UTC

Reviewer: Graham from Cambridge, ON Canada

That's an example of the standard repeatable read problem. You can get non-repeatable reads with Oracle's default isolation too (though admittedly not for a single query like you can with a blocking-read database using read committed isolation).

You're right about the inconsistency after the first query, but wrong about the final consistency. Remember, we are selecting *values* for the rows, not replicating transactions or deltas. In terms of your example, after the first query the client will have the old value for the 1st row and the new value for the 2nd row. After the second query, the client will have the new value for both rows. So after the second query the client will have the same values as the database for both rows. That's perfectly consistent.

Yes the new value of the 2nd row will have been fetched twice, but as I've said, we can handle that kind of duplication. The window for duplication is the length of the query's transaction (or the timestamp precision, whichever is longer).

A customer can choose repeatable read or serializable isolation if they don't want to have that temporary inconsistency. (In Oracle one would have to use a read-only transaction or serializable isolation to get repeatable reads and avoid phantoms if one was querying multiple tables or the same table more than once.)

Note that the standard database issues of non-repeatable reads and phantoms are not peculiar to our application (or which DBMS one is using)!

I'm glad you're thinking about this, but I hope you can switch back to thinking about the best way to select "what's new since I last checked" with Oracle! ;-)

FYI, the real world problem is data synchronization (but the clients aren't Oracle databases so Oracle replication isn't an option).

Tom Kyte

Followup  

April 14, 2004 - 3:47 pm UTC

I'd like to know how "values" can be useful -- really -- I'm very curious.

what is your (not hypothetical) application of this, what do you actually USE it for?

if you tell me that -- perhaps I can answer the question.

data synchronization screams to me "streams" (which is yet another messaging architecture, pub/sub) or CDC -- change data capture, both of which are features of the database.

Would ORA_ROWSCN help here

April 14, 2004 - 6:17 pm UTC

Reviewer: Justin Cave from Boise, ID

It seems to me that the new 10g ORA_ROWSCN feature would help here. From the 10g SQL Reference: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10759/pseudocolumns006.htm#sthref706

"For each row, ORA_ROWSCN returns the conservative upper bound system change number (SCN) of the most recent change to the row. This pseudocolumn is useful for determining approximately when a row was last updated. It is not absolutely precise, because Oracle tracks SCNs by transaction committed for the block in which the row resides. You can obtain a more fine-grained approximation of the SCN by creating your tables with row-level dependency tracking."

Tom Kyte

Followup  

April 15, 2004 - 8:12 am UTC

only if you build/rebuild all tables with "row dependencies" -- rebuild, not alter -- on.

It is a possibility in 10g.

Here is a demo I have, we'll copy EMP:


scott@ORA10G> create table temp as select * from emp;

Table created.

scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;

ENAME ORA_ROWSCN
---------- ----------
SCOTT 3476986

by default the ora_rowscn is tracked loosely, at the block level, so we update all of the rows on that block EXCEPT the one of interest:

scott@ORA10G>
scott@ORA10G> begin
2 for x in ( select empno from temp where empno <> 7788 )
3 loop
4 update temp set ename=ename where empno=x.empno;
5 commit;
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;

ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477030

and its row SCN floated up with everyone else's. Rebuild the table with row dependencies (used by replication as an optimization, to limit the changes that need to be propagated):

scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> drop table temp;

Table dropped.

scott@ORA10G> create table temp rowdependencies as select * from emp ;

Table created.

scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;

ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477044

scott@ORA10G>
scott@ORA10G> begin
2 for x in ( select empno from temp where empno <> 7788 )
3 loop
4 update temp set ename=ename where empno=x.empno;
5 commit;
6 dbms_lock.sleep(1);
7 end loop;
8 end;
9 /

PL/SQL procedure successfully completed.

scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;

ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477044

and we can see the scn does not (in fact, will not) change for that row unless that row is modified. the reason for the sleep is to show the +/- 3 second rule:


scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select empno, ora_rowscn, scn_to_timestamp(ora_rowscn)
2 from temp
3 order by empno
4 /

EMPNO ORA_ROWSCN SCN_TO_TIMESTAMP(ORA_ROWSCN)
---------- ---------- ---------------------------------------------------------------------------
7369 3477052 15-APR-04 08.09.37.000000000 AM
7499 3477055 15-APR-04 08.09.37.000000000 AM
7521 3477058 15-APR-04 08.09.37.000000000 AM
7566 3477061 15-APR-04 08.09.37.000000000 AM
7654 3477064 15-APR-04 08.09.40.000000000 AM
7698 3477067 15-APR-04 08.09.40.000000000 AM
7782 3477070 15-APR-04 08.09.40.000000000 AM
7788 3477044 15-APR-04 08.09.34.000000000 AM
7839 3477073 15-APR-04 08.09.43.000000000 AM
7844 3477076 15-APR-04 08.09.43.000000000 AM
7876 3477079 15-APR-04 08.09.43.000000000 AM
7900 3477082 15-APR-04 08.09.46.000000000 AM
7902 3477085 15-APR-04 08.09.46.000000000 AM
7934 3477088 15-APR-04 08.09.46.000000000 AM

14 rows selected.


Thanks Justin!

April 15, 2004 - 10:44 am UTC

Reviewer: Graham from Cambridge, ON Canada

Thanks Justin - I hadn't looked at 10g new features yet but that's exactly the kind of thing we could use, especially with rowdependencies enabled for the table. It's useful for our data synchronization for exactly the same reason that it is for Oracle replication - to get only the rows that have changed. Kudos to Oracle for adding that to 10g! Clearly my need wasn't that unusual, since Oracle built in a solution in 10g.

BTW Tom, I don't think any database is useful unless it has "values" in it! :-) I meant the values for the columns in the rows. Sorry if that wasn't clear.

Thanks Tom for pointing out the 3 sec resolution and 5 day limit for SCN to timestamp conversion. I don't see either of those mentioned in 10g documentation for SCN_TO_TIMESTAMP or TIMESTAMP_TO_SCN. Can you please point me to where those limits are documented?

I hope better resolution and range will be in a future release.

For 8i and 9i I think one could get the equivalent to ora_rowscn with row dependency as follows (using the undocumented commitscn):
- add a column to each table to hold the transaction id.
- add a table (eg. called tx_scn) to hold transaction ids, the start SCN, and userenv('commitscn').
- add an update and insert trigger for each table that sets the transaction id column and, only once per transaction, inserts a row in the tx_scn table.

Then you could select using the commit transaction id in the where clause by joining each table with the tx_scn table.
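A minimal sketch of that scheme might look like this. All names are illustrative, userenv('commitscn') is (as noted) undocumented, and the start-SCN column is omitted for brevity:

```sql
-- Hypothetical sketch; T is a synchronized table with an added TX_ID column.
create table tx_scn (
  tx_id      varchar2(30) primary key,
  commit_scn number            -- rewritten by Oracle at commit time
);

create or replace trigger t_track_tx
before insert or update on t
for each row
begin
  :new.tx_id := dbms_transaction.local_transaction_id;
  -- userenv('commitscn') is undocumented: the inserted value is silently
  -- replaced with the real commit SCN when the transaction commits.
  insert into tx_scn values ( :new.tx_id, userenv('commitscn') );
exception
  when dup_val_on_index then null;   -- only one tx_scn row per transaction
end;
/

-- Delta query: join each synchronized table through tx_scn.
select t.*
  from t, tx_scn s
 where s.tx_id = t.tx_id
   and s.commit_scn > :last_synced_scn;
```
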

Does making a table row-dependent have much of a performance hit? (I didn't see that mentioned in the CREATE TABLE docs, and am not yet set up with 10g to try it out.)

Tom Kyte

Followup  

April 15, 2004 - 11:09 am UTC

the change wasn't added for what you speak of (that we call STREAMS and CDC)

it was added for lost update detection in a stateless web environment.


My point about values is -- the data you "synchronize" to is wholly inconsistent -- if you deal with VALUES and not transactionally inconsistent results. So I sync up and get "garbage". Sync again "get good stuff". Sync again get garbage.


You cannot look at "values" you have to look at transactionally consistent SETS.


there probably will never be a better resolution in the future (else you would need an exhaustive list of scn -> timestamp for every transaction. imagine the size of that table eh?)

I will still encourage you to look at streams (a message based pub/sub approach). Over and over I will. You still give no real world use case. I'm interested, I'm curious. What is the specific problem you are trying to solve here.


<quote> sql reference:

SCN_TO_TIMESTAMP takes as an argument a number that evaluates to a system change number (SCN), and returns the approximate timestamp associated with that SCN. The returned value is of TIMESTAMP datatype. This function is useful any time you want to know the timestamp associated with an SCN. For example, it can be used in conjunction with the ORA_ROWSCN pseudocolumn to associate a timestamp with the most recent change to a row.





they punt by using approximate timestamp.

if you try one that is too old:


sys@ORA10G> select scn_to_timestamp( 14004 ) from dual;
select scn_to_timestamp( 14004 ) from dual
*
ERROR at line 1:
ORA-08181: specified number is not a valid system change number
ORA-06512: at "SYS.SCN_TO_TIMESTAMP", line 1
ORA-06512: at line 1





Oops again!

April 15, 2004 - 10:58 am UTC

Reviewer: Graham from Cambridge, ON Canada

(Should review more closely before clicking Submit!)

The second last sentence in my last review should have been:

Then you could select using the commit SCN in the where clause by joining each table with the tx_scn table.

I suspect the performance hit of making a table row-dependent in 10g is less than the hit of adding a trigger like the one I described.

Oracle streams not an option for us

April 15, 2004 - 1:46 pm UTC

Reviewer: Graham from Cambridge, ON Canada

You said:
Rebuild the table with row dependencies (used by replication as an optimization, to limit the changes that need to be propagated)

And from that I said:
It's useful for our data synchronization for exactly the same reason that it is for Oracle replication - to get only the rows that have changed.

I was just taking your word that it was used, and is useful, for Oracle replication!

I believe that with Oracle we can be guaranteed a transaction-consistent set of values iff we select in a read-only transaction (or use table locks or serializable isolation). But with multiple tables and default Oracle isolation there's no guarantee of consistency. It seems that you think that the chance of inconsistent values is *never* acceptable. If so you must also think that Oracle should only offer read-only or serializable transactions! I'm sure you don't, so I think you are being inconsistent. ;-)

Your complaint about inconsistent values from a select, as I keep trying to tell you, is a standard database problem for which isolation levels were invented. Sure that's an issue, but it can only be completely solved by using serializable isolation. Snapshot isolation has advantages, but it introduces the version skew problem too. You probably know that MySQL has snapshot isolation and Microsoft has added it for Yukon (but both of those have row versioning instead of Oracle's block versioning). You can (and probably will) argue against other DBMS implementations for handling isolation, but that's irrelevant for how our application works with Oracle.

Since you asked, our application is a server for synchronizing data between a central database and multiple remote (non-Oracle) databases. It works with 5 flavours of central database, including Oracle and the other leading DBMSes. It predates the stream and change data capture features of Oracle - the first release worked with Oracle 8, which was the then-current version.

Oracle streams might be an option if we were starting from scratch and only concerned with Oracle. But we're not.

AFAIK with streams each subscriber will get all changes (for what they have subscribed to). Ours isn't a server push model, for scalability it is client-pull (by default). If there are 1000 updates to a row between a client's syncs, the second sync only gets one new value for the row. I think with Oracle streams or messaging the client would get 1000 values.

FYI, for flexibility the statements which select data from the central database are under the customer's control. Most choose to use timestamp-based synchronization for most of their tables to just get data that has changed since the last synchronization. I'm looking for a better way to do that for an Oracle central database (without rearchitecting our application or abandoning other types of central database). I think it's in both of our interests to have our application get optimal performance with an Oracle database as the central database. After all, our customers that use Oracle central databases are your customers too.

Note that our current solution for timestamp-based synchronization with Oracle works quite well, and is used successfully by our mutual customers. The only drawback is possibly reduced performance from the greater window for duplication.

Has anyone tested the performance hit in 10g of enabling row dependencies for a table? (I'm sorry I can't yet.)

Tom Kyte

Followup  

April 15, 2004 - 2:26 pm UTC

not to nitpick, but I was referring to:

"Clearly my need wasn't that unusual, since Oracle built in a solution in 10g."

row dependencies was added in 9iR1 -- the new feature in 10g is ora_rowscn. The new feature added in 10g wasn't in support of what you are after; rather, it is for lost update detection. Polling a table using this technique by many users (or just one user) is going to be as expensive as it gets. If your product were let loose on my database -- looking for "what's different" -- I'd get upset. I'd rather tell IT what's different -- back to streams, back to CDC.


I'm saying the results of a table refresh (as you appear to be doing) should be consistent at the very very very least. And yes, when doing multi-table refreshes (in our vernacular they are called snapshot or mv refresh groups) they would be done transactionally consistent -- we do that.


Oracle stores a change vector in UNDO. This is what we version with. It is not a before image of a block, it is a small set of changed bytes.

There are lots of replication products on the market today that accomplish what you are trying to code. Have you considered the use of an off-the-shelf solution? (They generally use logs, as streams does, so as not to kill the database by asking "hey, what's new?" over and over and over.)




Alternative solution

April 15, 2004 - 9:26 pm UTC

Reviewer: Gary from Sydney, Australia

Since the root cause of the issue appears to be the interval between a timestamp recorded on a record at insert/update and it becoming 'visible' after a commit, wouldn't a solution be a routine that updates the timestamp post-commit?

User_a updates table_1 at 10:15 (trigger sets flag).
User_b updates table_1 at 10:16 (trigger sets flag).
User_b commits change at 10:17.
User_a commits change at 10:19.

Background routine runs every minute (or however frequently required) and sets the timestamp for flagged records and unflags the record.
So at 10:18 it will set the timestamp for user_b's changes and at 10:20 it sets the timestamp for user_a's changes.

You don't get any missing entries or duplicates.

[Obviously the trigger setting the flag should not fire when updating the timestamp, only the 'real' fields of the table.]
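A sketch of this scheme, with assumed names (changed_flag, last_modified, and the "real" columns col1 and col2), might be:

```sql
-- Trigger fires only on changes to the 'real' columns, never on the
-- timestamp/flag update itself, so the background job doesn't re-flag rows.
create or replace trigger t_set_flag
before insert or update of col1, col2 on t
for each row
begin
  :new.changed_flag := 'Y';
end;
/

-- Background job, scheduled every minute or so: stamp and unflag in one
-- transaction, so only committed changes ever receive a visible timestamp.
update t
   set last_modified = sysdate,
       changed_flag  = 'N'
 where changed_flag = 'Y';
commit;
```
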

Interesting idea!

April 16, 2004 - 1:09 pm UTC

Reviewer: Graham from Cambridge, ON Canada

Thanks Gary. I appreciate you getting back to the fundamental problem.

Interesting idea, but I think it just shifts the problem. You can still get missing rows if the background process transaction overlaps with the start of the query.

However a tweak fixes that: change the query to pick up flagged rows as well as the timestamp criterion. That introduces duplication, but the duplication window is controllable; it's the time between runs of the background process. I think that solution would only be preferred if you were prepared to run the background process with a shorter inter-run interval than the longest transaction time. Otherwise our current solution has a smaller duplication window.
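Assuming a changed_flag column maintained as Gary describes, the tweaked delta query might be:

```sql
-- Pick up both stamped rows and rows the background job has not yet
-- stamped; the latter are the possible duplicates.
select *
  from t
 where last_modified >= :last_checked
    or changed_flag = 'Y';
```
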

Now back to Tom's concerns...

>row dependencies was added in 9iR1 -- the new feature in 10g is ora_rowscn -- the new feature added in 10g wasn't in support of what you are after, rather it is for lost update detection.

Okay I'll rephrase:

Clearly my need wasn't that unusual, since Oracle added a mechanism to track when rows changed in 9i and a change in 10g exposed it for user queries.

Apparently Oracle replication sends a whole block when just one row has changed unless the table was created with row dependencies. Yuck.

>polling a table using this technique by many users (or just one user) is going to be as expensive as it gets.  If your product was let loose on my database -- looking for "what's different", I'd get upset.  I'd rather tell IT what's different - back to streams, back to CDC.

It doesn't constantly poll. It only queries in response to a sync request from the remote (which can optionally be initiated by the sync server). Rather than generating extra work when the database is most busy (as replication, streams and CDC would) you can choose when these sync requests happen. And if the database is changing the same rows over and over, you only move the latest values, not each intermediate value (unlike replication, streams and CDC).

It's not intended for tight coupling of a few databases. Replication is better for that. It's intended for lots of remote databases that synchronize less often (eg. every few minutes, hourly, daily). It supports syncing different subsets, so if you need to sync a few tables frequently but others less frequently, you can do that and put less load on your database when it is busy. You also only select what is appropriate for that remote. If data is common for many remotes, you can use a single query for all of them. It's really very flexible and scalable. A recent deployment involves over 20,000 remote users synchronizing with a single central Oracle database. Apparently having that many queries run doesn't "kill the database." (FYI, that customer first tried using an Oracle solution, but switched to ours after extensive comparison benchmarking. As you say in your book, that's the only way to determine which is better for one's particular needs.)

>And yes, when doing multi-table refreshes (in our venecular they are called snapshot or mv refresh groups) they would be done transactionally consistent -- we do that.

You DON'T do that by default (for multiple selects)!

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create table tbl1
  2  ( pk int primary key,
  3  str varchar(20)
  4  );

Table created.

SQL> create table tbl2
  2  ( pk int primary key,
  3  str varchar(20)
  4  );

Table created.

SQL> insert into tbl1 ( pk, str ) values ( 1, 'consistent' );

1 row created.

SQL> insert into tbl2 ( pk, str ) values ( 1, 'consistent' );

1 row created.

SQL> commit;

Commit complete.

SQL> select * from tbl2;

        PK STR
---------- --------------------
         1 consistent

SQL> select * from tbl1;

        PK STR
---------- --------------------
         1 consistent

SQL> declare
  2  pragma autonomous_transaction;
  3  begin
  4  update tbl2 set str = 'inconsistent' where pk = 1;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select * from tbl2;

        PK STR
---------- --------------------
         1 inconsistent



So read-consistency for a whole transaction is NOT guaranteed by default for multiple selects! You have to lock tables or use read only or serializable transactions. The consistency issue you keep bringing up is INHERENT TO THE TYPE OF CENTRAL DATABASE AND THE USER'S CHOICE OF ISOLATION; IT'S INDEPENDENT OF OUR APPLICATION! Please stop flogging that dead horse when trying to criticize our application.

(Yes I did get access to a 10g setup since I last wrote.)

>Oracle stored in UNDO a change vector.  This is what we version with.  It is not a before image of a block, it is a small set of changed bytes.

Okay, but unless I'm mistaken the set of bytes corresponds to the changes for a block rather than a row.

Have fun in Toronto! Maybe we'll discuss this stuff in person there. 

Tom Kyte

Followup  

April 16, 2004 - 3:14 pm UTC

no, row dependencies is not for that at all!  It is for parallel propagation, so we can determine that these N transactions can all be propagated *in parallel*.

Nothing but nothing but nothing is done "block level wise" in replication -- NOTHING.

rowdependencies is for parallel propagation -- to allow us to propagate two unrelated transactions that took place in the source database in a different order (perhaps) than they actually took place -- in parallel -- to increase the replication speed.


So, not "yuck" -- but "good", "better".

Replication propagates only the changes (column by column, row by row) that took place.

Streams is non-intrusive, it works on *redo* to publish data *elsewhere*.  Polling the database in response to a sync is *expensive*.

Streams = little to no impact on source system
Polling for a sync = lots of work on source system.


Streams can do the last change stuff as well, streams is non-intrusive -- you might want to give the streams guide a quick glance -- it is fairly short.


<quote>
It's not intended for tight coupling of a few databases. Replication is better 
for that. It's intended for lots of remote databases that synchronize less often 
(eg. every few minutes, hourly, daily).
</quote>

and so unless you use SERIALIZABLE or REPEATABLE READ, there is a high probability that the "synced" data set in read lock databases is inconsistent (that update issue).

Also, that paragraph very succinctly describes the streams architecture as well.


We do snapshot groups when told to.  Yes it is a choice.


In oracle -- see what happens to concurrency with serializable.
Then, describe what happens to concurrency with serializable in others.

Tell me what would happen to that update in a read lock database vs oracle.


I'm not trying to criticize your application, I'm saying "read consistency to me is best, read locks are not".


-------------------------- added after driving home...... ------------------

something I just thought of....


Your logic to sync must be something like:


T0 - log into source database;
T1 - 
T2 - get NEW_DATE/TIME to be used for next sync;
T3 - get all rows in table(s) where timestamp >= DATE/TIME from last sync;
T4 - logout


Ok, you are worried about missing rows in Oracle from in-flight transactions (modifications taking place during T1, basically)

They would have a timestamp T1, but you would have a timestamp T2 for the next retrieval and hence would "miss them"


If you change T2 to:

select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate) 
  from v$transaction;

what you'll have is the date time of either

a) right now if no one is doing a transaction (no chance of a missed record) or
b) the timestamp of the last time no one WAS doing an in flight transaction


sure -- it opens the window for dups (but you have that already -- the dup window is not a nanosecond or even a second, it is the time it takes for T3 to complete, which could be measurable) but it prevents you from missing anything and leaves your code intact.




ops$tkyte@ORA9IR2> select sysdate,
  2         nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate),
  3             count(*)
  4    from V$transaction;
 
SYSDATE              NVL(MIN(TO_DATE(STAR   COUNT(*)
-------------------- -------------------- ----------
16-apr-2004 18:12:00 16-apr-2004 18:12:00          0

<b>No transactions -> sysdate is the timestamp</b>
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set owner = owner where rownum = 1;
 
1 row updated.
 
ops$tkyte@ORA9IR2> exec dbms_lock.sleep(5);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select sysdate,
  2         nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate),
  3             count(*)
  4    from V$transaction;
 
SYSDATE              NVL(MIN(TO_DATE(STAR   COUNT(*)
-------------------- -------------------- ----------
16-apr-2004 18:12:05 16-apr-2004 18:11:57          1

<b>active transactions -> their start time is the timestamp</b>


 

So throw out the database and just use a serial log file?

April 16, 2004 - 7:14 pm UTC

Reviewer: Graham from Cambridge, ON Canada

My turn to play Devil's advocate.

<da mode>
So streams are what you would choose to get 20,000 *different* (but overlapping) sets of data from an Oracle database?

If that's the best way, and as low-impact as you assert, why keep all those database blocks around. All you'd need is the log!
</da mode>

Databases, including Oracle's, are very good at locating and retrieving data that meets specified criteria. Why not use the database for what it is intended to do? When properly indexed, an empty selection is extremely fast. If selecting the same stuff over and over, caches work wonders.

I can't believe I have to argue the merits of using a database with you!

Querying a database and scanning a log are two different ways to get data. One isn't always better than the other.

AFAIK, Oracle doesn't publish its log format, and the streams interface to the log file didn't exist when our application was designed. (I was wrong before - it first worked with Oracle 7, not 8.)

If you want to keep comparing isolation implementations, don't ignore write skew and related consistency problems peculiar to snapshot isolation. I hope your talk on Tuesday covers those.

Added after reading your addendum:
That's exactly the solution we already use (and have talked about here several times)! As I've said, it scales very well (we only select the min start_time from V$TRANSACTION once per sync) and the window for duplication is the longest transaction.

That's still the best solution we have so far with timestamps, though Gary's idea (with tweak) might be better if very long transactions were common.

I think an SCN based solution could be better. Maybe you can look back at those ideas if you are bored while travelling. (The tx_scn table could be trimmed of "orphan" rows occasionally if its size became an issue. That way it could never be bigger than the total number of rows for the synchronized tables, and would likely be much, much less.)

BTW, like most servers we use connection pooling instead of making a new connection to the database each sync request.

Tom Kyte

Followup  

April 16, 2004 - 7:34 pm UTC

well, then I guess you always had the optimal answer from the get go.

Great thread...

April 17, 2004 - 5:34 am UTC

Reviewer: Jon from Sydney

Very interesting, fun almost. Like Shakespeare and Redo, things always go full circle ;-)

Tom Kyte

Followup  

April 17, 2004 - 9:58 am UTC

yes, this got so long, and ran so long -- that I keep forgetting what was already visited.

What about this...

April 17, 2004 - 6:48 am UTC

Reviewer: Jon from Sydney

Graham,

Have you tried this approach.

Earlier, Gabriel posted that "In essence you have to return “<Latest ResultSet> Minus <Saved ResultSet>”".

You expressed concerns about performance and storage. Well the DB stores the Latest and Previous in the RBS (whether via change vectors or not is not important), which we can access via flashback queries.

Why not record the current SCN at each sync request, instead of a timestamp (dbms_flashback.get_system_change_number will do that for you), and then use a flashback query to access your previous table state. I mocked up an example with 100,000 rows, and it worked just fine:

select id, to_char(change_ts,'HH24:MI:SS.FF') change_ts, change_scn
from x
minus
select id, to_char(change_ts,'HH24:MI:SS.FF') change_ts, change_scn
from x as of scn 661860;

ID CHANGE_TS CHANGE_SCN
---------- ------------------ ----------
100001 20:18:47.383000 661834

If you're worried about performance, then possibly you could try recording the timestamp (systimestamp) as well as the SCN, and include:

where change_ts >= (timestamp - time of longest running txn - safety factor)

in each of the queries to reduce the result set to a reasonable size before the exact minus on SCN.

Haven't tried that bit, just thinking out loud - but seems it would work.
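Filling in that thought, the pre-filtered minus might look like this, where the 10-minute interval stands in for the assumed "longest running txn plus safety factor":

```sql
-- Cheap timestamp predicate on both sides shrinks the sets before
-- the exact MINUS on SCN does the precise work.
select id, change_ts, change_scn
  from x
 where change_ts >= systimestamp - interval '10' minute
minus
select id, change_ts, change_scn
  from x as of scn :last_scn
 where change_ts >= systimestamp - interval '10' minute;
```
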

An after thought...

April 17, 2004 - 7:06 am UTC

Reviewer: Jon from Sydney

So, to ensure no overlaps or missed data, your sync queries could look like:

create or replace procedure get_x_delta(
p_old_scn in number,
p_new_scn out number,
p_rc out sys_refcursor)
is
begin
p_new_scn := dbms_flashback.get_system_change_number;
open p_rc for
select * from x as of scn p_new_scn
minus
select * from x as of scn p_old_scn+1;
end;

plus a few tweaks for performance.

This way, you allow the read-consistency model to work for you.

Tom Kyte

Followup  

April 17, 2004 - 10:07 am UTC

I will say that Graham has (had) the right answer from the get go.

His code already has to handle dups (no avoiding that).

The time of the oldest transaction is going to be near "now".

They never had "just a millisecond of overlap dup window" -- it's always been much larger than that (due to the fact that you have to get timestamp AND THEN sync -- everything you sync, EVERYTHING, could be duplicated on the next sync)

So, using the nvl(min(start_time),sysdate) from v$transaction is the most efficient method for the simple reason that I have to assume most of his queries are using existing indexes to identify subsets (eg: he is not having 20,000 people FULL SCAN the database on a recurring basis, they are pulling a small slice of the data -- massive horizontal slicing and dicing going on there)


That would, to me, rule out anything more sophisticated -- the min() is more than good enough (nothing is going to let you see data you cannot see, you are not going to get the in-flight data this time around no matter what, and if they used a read only transaction, heck -- they would get very very very few dups indeed -- if any).


The flashback query would be something I would not want to suggest/rely on. For one, it is expensive to flash an entire table back like that (and anything expensive done 20,000 times is truly expensive) and then minus it. Also, in 9i undo_retention is a REQUEST, not a demand (so you might not be able to flash back in all cases; in 10g you can make it a demand). But even so, the further in the past you have to flash back, the more expensive it becomes, so the people that sync every day would be truly "expensive".




So, I still think v$transaction and start_time is more than sufficient as transactions are typically pretty short. If you wanted to truly maximize the start_time (get as big of one as possible), they could join v$transaction and v$lock to see what the start_time for an object they are *interested* in is. Eg: if you are syncing t1 and t2 but no one is using t1/t2 -- sysdate is your time. V$LOCK would give you that.
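One way to code that suggestion is a sketch against v$locked_object (a view closely related to v$lock that exposes the transaction id columns directly); only in-flight transactions touching the synchronized tables (here assumed to be T1 and T2) pull the watermark back:

```sql
-- If no one is in-flight against T1/T2, sysdate is the watermark;
-- otherwise the oldest relevant transaction's start time is.
select nvl( min(to_date(t.start_time,'mm/dd/rr hh24:mi:ss')), sysdate )
  from v$transaction t
 where (t.xidusn, t.xidslot, t.xidsqn) in
       ( select lo.xidusn, lo.xidslot, lo.xidsqn
           from v$locked_object lo, dba_objects o
          where o.object_id  = lo.object_id
            and o.object_name in ('T1','T2') );
```
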





Hope this has helped others as well!

April 17, 2004 - 2:24 pm UTC

Reviewer: Graham from Cambridge, ON Canada

Thanks Tom and Jon! Interesting idea about joining with V$LOCK. Some customers might want to do that if they have long transactions on tables that are not synchronized.

Tom is right that polling the database does put a load on it. In cases where that extra load is a problem, it might well be best for our mutual customers to use streams or CDC to populate a staging database, and have our sync server connect to that staging database instead. My guess is that streams would be better (since CDC's change capture triggers will be firing most when the database is most busy).

He's also right that good performance usually requires that the sync'd tables be indexed properly for the "what's changed?" queries. We're no different than any other database application in that regard - our app works better when the queries it issues run quickly.

In a couple of my earlier posts I misstated the "window" for duplication (though I think I got it right last time). Here's what I think it is for read-blocking databases and the best-so-far Oracle solution:

- For a read-blocking DBMS, it is the greater of the transaction time for the "what's changed" queries or the timestamp resolution.

- For Oracle, using nvl(min(start_time),sysdate) from v$transaction in a read-only transaction, the window is that_value_for_sysdate - min(start_time), and the largest possible value for that is longest transaction time. So the window is the greater of the longest transaction time or the timestamp resolution.

- Without using a read-only transaction in Oracle, as Tom pointed out, the window is larger (assuming multiple tables). It is the longest transaction time plus the transaction time for the "what's changed" queries, or the timestamp resolution if that is greater.
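In outline, the read-only-transaction variant looks like this (schema and bind-variable names are hypothetical; the essentials are the single consistent snapshot and the oldest-active-transaction checkpoint):

```sql
-- Outline of one sync cycle (assumed schema: each table has an indexed
-- last_modified column; :last_checkpoint was saved by the previous cycle).
set transaction read only;   -- every query below sees one consistent snapshot

-- Candidate checkpoint for NEXT time: the oldest active transaction's
-- start time, or sysdate if none (start_time is a VARCHAR2 in 8i/9i).
select nvl(min(to_date(start_time, 'MM/DD/YY HH24:MI:SS')), sysdate)
  from v$transaction;

-- The "what's changed?" queries, one per synchronized table
select * from t1 where last_modified >= :last_checkpoint;
select * from t2 where last_modified >= :last_checkpoint;

commit;   -- ends the read-only transaction
-- Anything committed between the two checkpoints may be picked up twice;
-- the client's upsert logic turns such duplicates into redundant updates.
```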

FYI, the only way that everything would be duplicated is if it were all inserted/updated in the duplication window. Our application could handle that, but it's very unlikely to happen.

Using SCNs instead of timestamps could probably eliminate any duplication. I'm loath to recommend that our customers use an undocumented feature like userenv('commitscn'), so I'll stick with recommending our current timestamp solution (with read-only transactions if possible) for 8i and 9i. However, in 10g the combination of ora_rowscn and row dependencies might remove the need for timestamp-maintaining triggers, but I'd have to investigate how "approximate" the SCN from ora_rowscn is and confirm that the ora_rowscn pseudocolumn can be indexed. (No access to 10g right now.) Not needing triggers would be simpler and should reduce load on the central database.

Unfortunately recent family commitments prevent me from going to the IOUG just down the highway in Toronto. FYI, there will be a lot of people glued to TV sets in Toronto on Sunday night watching hockey, and you'll see a lot of celebrating if the Toronto Maple Leafs win the series then or in a 7th game midweek!

Tom Kyte

Followup  

April 17, 2004 - 2:46 pm UTC

...
FYI, the only way that everything would be duplicated is if it were all
inserted/updated in the duplication window. Our application could handle that,
but it's very unlikely to happen.
.....

:) concur (i was wondering if you were going to point out the "low probability of occurrence" of that case).....


The SCN with rowdependencies is accurate -- it is the resolution of the timestamp that is approximate.

I concur 100% with your decision on userenv('commitscn').

keep in mind the difference between snapshot and event

April 17, 2004 - 6:55 pm UTC

Reviewer: Jaromir D.B. Nemec from Austria

Hi Tom,

I agree with all your argumentation in the follow-ups above, except for one point – Graham is definitely not alone.
Let me put it into the right context. From Oracle's point of view, Streams and CDC are "state of the art". But – at least in the projects I have an overview of – the typical database is only now slowly migrating to 9i, and the mentality of the application (i.e. the crucial features in use) is somewhere between 7.3 and 8.0.
So it is no wonder that such "any change, please?" solutions exist.
But I agree 100% with you that this approach is definitely not optimal. So people using it
a) must be happy with it, or
b) if they have problems, must accept that the cause is not the database or anything else, but this very approach.

I’ll try to demonstrate this by pointing out one aspect that I missed in the discussion above.

There are two types of data stored in a database: state data (e.g. a customer table) and transaction data (e.g. booking entries).
The inherent problem of the "what's new?" approach is accessing the state data to get the transaction information. Of course you may try to select the customer table's balance information and check if something has changed since the last check. But the customer table is definitely a bad target. To trace the customer account movement, the right target is the account table (i.e. a transaction table with the booking entries).
You may more or less optimise the snapshot table (an index on the change-date column etc.) to derive the transaction information (yes, it is a true derivation), but it is always a workaround.
The proper approach to get the transaction data is to USE the transaction data. How the transaction data is implemented – whether you use a database table (with the application responsible for filling it) or some DB feature such as triggers, Streams or CDC – is only an implementation decision.

The task of getting an event out of a snapshot table will always be hard work. In other words, you can push or pull events as you like, but that is not comparable with polling for events out of a snapshot.

A short comment on the topic of "1000 updates and one sync only": there are two extreme situations. If you have a very small snapshot with tons of changes, it could be acceptable to sync by refreshing the total snapshot, to limit the volume of data transfer.
If the snapshot is very big with lots of changes, you can filter or aggregate the events to limit the communication.


At the end of my “dispute”, a question on CDC.
As far as I know, CDC is based on the redo log, not on triggers as mentioned above. The same concept, I assume, is used in Streams. What are the criteria for choosing CDC or Streams to capture the events?


Thx

Jaromir D.B. Nemec
http://www.db-nemec.com


Tom Kyte

Followup  

April 17, 2004 - 6:59 pm UTC


CDC in 9i is trigger based.

ASYNC CDC in 10g is streams based. SYNC CDC in 10g is trigger based.

(i would always go for streams based, background, non-intrusive, async)

April 18, 2004 - 1:19 am UTC

Reviewer: Jon from Sydney

I take your point about flashback being a suggestion - but considering the Oracle documentation suggests all sorts of application uses for flashback queries (reports, replacing manual history tables in some situations, etc...), one would hope that it is a very strong suggestion.  In practice, I have never come across a situation where a request hasn't been satisfied within the UNDO retention period.

Nevertheless, you missed my point about performance - I was not suggesting full scans at all, rather augmenting this with a timestamp (which would be approximate, and include a safety factor) to limit the rows being flashed back.  I tested this out using a large table.  From the "syncing" session, I get:

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production

SQL> set timing on
SQL> select count(*) from big_table;

  COUNT(*)
----------
   2759801

Elapsed: 00:00:01.04
SQL> select *
  2  from big_table as of scn 4303140821778
  3  where change_ts >= to_date('18/04/2004 14:59:07','DD/MM/YYYY HH24:MI:SS')-1/60/24
  4  minus
  5  select *
  6  from big_table as of scn 4303140821621
  7  where change_ts >= to_date('18/04/2004 14:59:07','DD/MM/YYYY HH24:MI:SS')-1/60/24;

 OBJECT_ID OBJECT_NAME                    CHANGE_TS
---------- ------------------------------ ---------
        -1 Test                           18/APR/04

Elapsed: 00:00:00.01

That seems efficient to me, elegant, and, unless my logic is off somewhere, 100% no dups.  If I'm wrong, please show me, but I can't see how that wouldn't work well.  And with no duplicates, no work is required at the client side.

Tom Kyte

Followup  

April 18, 2004 - 9:29 am UTC

refreshes are minutes, hours or ===>> day(s) <<===

My major qualms are:

o the undo is not guaranteed and you cannot make it so. It might be there in 5 minutes, 5 hours or 5 days from now. Then again -- it might not. (It just takes one whopping big transaction to make it go "poof", with no way to get it back again.) This very item renders this solution not reasonable in 9i.

o flashing back gets more expensive every minute. Flashing back for point in time reports is great (basically what READ ONLY did/does -- but you get to pick the point in time at which to run the report as of). Every transaction however makes flashing further and further back in time more expensive.

I didn't not say it would not work.





Interesting

April 18, 2004 - 9:50 am UTC

Reviewer: Jon from Sydney

Interesting, in my experience with using flash back, going back 7 days doesn't have any noticeable performance impact when using keyed/index reads, even on a busy system. A 1000 row delta isn't huge.

As for undo, doesn't correctly sizing (or autoextending) the undo tablespace mitigate this risk?

...
I didn't not say it would not work.
...

Triple negatives. Hmmm, I'm still trying to decode that (it's late at night in Sydney - not too sharp at the moment)

But what you did say was:

...
His code already has to handle dups (no avoid that).
...

Well, I think this is a no-dups solution. Less efficient in the DB, but in totality (depending on the client-side de-duping work) *maybe* better.

Tom Kyte

Followup  

April 18, 2004 - 10:27 am UTC

but you cannot flash back 7 days. 5 is the limit (unless you shut down for 2 of the days -- it is 5 days of uptime -- or you manually updated a dictionary table)

and to have 7 days of undo *online*? I'm not aware of any systems personally that have that.

A 1,000 row delta on a single row will typically incur 1,000 LIOs instead of one (unless you are lucky and the version of the block you need is in the buffer cache, but not likely). Now, do that for 100 changed rows, 20,000 times.


the triple negative was wrong "i did not say it would not work" :)

Heavy on concepts ... but intriguing nonetheless!

April 19, 2004 - 9:38 am UTC

Reviewer: Gabriel

Graham,

One question … you said a few times “we can handle duplicates (at the client, I assume) if they arise”. Out of curiosity … how are they handled?

Cheers.


Handling duplicates

April 19, 2004 - 4:10 pm UTC

Reviewer: Graham from Cambridge, ON Canada

Duplicates are indeed handled by the clients, which are databases. We use "upserts" to apply the values (i.e. insert if PK doesn't exist, otherwise update). So a duplicate becomes a redundant update.

FYI, the remote databases we support have built-in upsert support (as does Oracle 9i via MERGE) and they either don't have triggers or won't fire a row-level update trigger if the row is unchanged by the update (unlike Oracle AFAIK, but please let's not launch into a big discussion of that here).
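For what it's worth, the Oracle 9i flavour of the "upsert" Graham describes can be sketched with MERGE (`local_copy` and its columns are hypothetical names):

```sql
-- Sketch: apply one replicated row to the client database.  If the PK
-- already exists this becomes an update -- so a duplicate delivered by
-- the sync is just a redundant update; otherwise it is an insert.
merge into local_copy l
using (select :pk as pk, :val as val from dual) src
   on (l.pk = src.pk)
 when matched then
      update set l.val = src.val
 when not matched then
      insert (pk, val) values (src.pk, src.val);
```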

Thanks! From concept to simple example on the trailing edge...

April 21, 2004 - 11:13 pm UTC

Reviewer: Duke Ganote from beautiful Warsaw, Indiana USA (Orthopaedic Capitol of the World!)

Tom-- Graham had some excellent questions, and this provoked me to construct a trailing-edge (8.1.7.4-based), simple example. I make no claims about scalability; this is more for conceptual understanding -- suppose there's a table T with some data and we want to track what has changed on T since our last query. Of course, T's LAST_MODIFIED datetime doesn't show the "effective" or transaction commitment timestamp for a row change, rather it's the "clock time" when the user made the row change during the transaction. For brevity, I first show the script setting up the example, then example sessions alternating between 2 users: DGANOTE who makes changes, and EOM who reads them.

In retrospect, I suppose I could add an "approximate SCN timestamp" column to TRANS_INFO and use a background job to populate that column in order to map SCN to datetime.

***** SCN.SQL ****
CREATE TABLE t (
   some_data     varchar2(10),
   last_modified date default sysdate,
   trans_id      varchar2(30)
) tablespace dw1_data_small
/
CREATE TABLE trans_info (
   trans_id         VARCHAR2(25) NOT NULL,
   trans_scn        NUMBER,
   trans_user_id    VARCHAR2(30),
   trans_inittn_tms DATE,
   CONSTRAINT pk_trans_info PRIMARY KEY (trans_id) USING INDEX
)
/
CREATE OR REPLACE PACKAGE pkge_trans_info
IS
   PROCEDURE sp_set_trans_info (curr_trans_id VARCHAR2);
END pkge_trans_info;
/
CREATE OR REPLACE PACKAGE BODY pkge_trans_info
IS
   lcl_trans_id VARCHAR2 (25); -- indicates SCN already called once for transaction
   lcl_user_id  VARCHAR2 (30); -- caches USER in PGA once per session

   PROCEDURE sp_set_trans_info (curr_trans_id VARCHAR2) AS
   BEGIN
      -- only one call to USERENV ('commitscn') per transaction
      IF lcl_trans_id != curr_trans_id OR lcl_trans_id IS NULL
      THEN
         INSERT INTO trans_info
         VALUES (curr_trans_id, USERENV ('commitscn'), lcl_user_id, SYSDATE);
         lcl_trans_id := curr_trans_id;
      END IF;
   END sp_set_trans_info;
BEGIN
   -- who is the user?
   lcl_user_id := USER;
END pkge_trans_info;
/
CREATE OR REPLACE TRIGGER tuidar_t
BEFORE INSERT OR UPDATE OR DELETE ON t
FOR EACH ROW
BEGIN
   :new.trans_id := DBMS_TRANSACTION.local_transaction_id;
   pkge_trans_info.sp_set_trans_info (:new.trans_id);
END tuidar_t;
/
GRANT SELECT ON T TO EOM
/
GRANT SELECT ON TRANS_INFO TO EOM
/

***** THE SESSIONS ****
DW1D\dganote> @scn
DW1D\dganote> insert into t ( some_data ) values ( 'Hello' );

1 row created.

DW1D\dganote> insert into t ( some_data ) values ( 'World' );

1 row created.

DW1D\dganote> column trans_scn format 99999999999999999999;
DW1D\dganote> ed
Wrote file afiedt.buf

1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\dganote> /

SOME_DATA  LAST_MODI TRANS_ID                       TMS      TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello      21-APR-04 6.92.276048                    04:19:38         7728900791301 -- "provisional" SCN!
World      21-APR-04 6.92.276048                    04:19:44         7728900791301

DW1D\dganote> commit;

Commit complete.

DW1D\dganote> ed
Wrote file afiedt.buf

1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\dganote> /

SOME_DATA  LAST_MODI TRANS_ID                       TMS      TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello      21-APR-04 6.92.276048                    04:19:38         7728900791303 -- true SCN! -----------
World      21-APR-04 6.92.276048                    04:19:44         7728900791303

DW1D\dganote> @connect eom@dw1d;
DW1D\eom> column trans_scn format 99999999999999999999;
DW1D\eom> ed
Wrote file afiedt.buf

1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\eom> /

SOME_DATA  LAST_MODI TRANS_ID                       TMS      TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello      21-APR-04 6.92.276048                    04:19:38         7728900791303 -- others see only true/committed SCN
World      21-APR-04 6.92.276048                    04:19:44         7728900791303

DW1D\eom> @connect dganote@dw1d;
DW1D\dganote> insert into t ( some_data ) values ( 'Goodbye' );

1 row created.

DW1D\dganote> commit;

Commit complete.

DW1D\dganote> insert into t ( some_data ) values ( 'Y''all' );

1 row created.

DW1D\dganote> commit;

Commit complete.

DW1D\dganote> @connect eom@dw1d;
DW1D\eom> ed
Wrote file afiedt.buf

1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id and ti.trans_scn > 7728900791303 -- what changed since the last commit?
DW1D\eom> /

SOME_DATA  LAST_MODI TRANS_ID                       TMS      TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Goodbye    21-APR-04 9.64.268092                    04:24:41         7728900791767
Y'all      21-APR-04 10.42.262796                   04:24:53         7728900791770


Tom Kyte

Followup  

April 22, 2004 - 7:28 am UTC

we discussed mapping the transaction id to the single UNDOCUMENTED, TOTALLY "magic happening here" commitscn and tagging each record with the transaction id.

rejected for the good concern that commitscn is not documented.... that and this is "hard".


using the time from v$transaction is so simple and the probability of a dup is so near zilch (and their code handles it already) -- seems to me to be the only thing to even consider.

from trailing edge to mainstream...

April 22, 2004 - 11:26 pm UTC

Reviewer: Duke Ganote from beautiful Warsaw, Indiana USA (Orthopaedic Capitol of the World)

Tom-- Ah, but after upgrading from 8i to 9i, the demo still seems to work by swapping the undocumented USERENV('commitscn') function for the documented DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER.

Tom Kyte

Followup  

April 23, 2004 - 11:19 am UTC

no, it won't.

that gets the SCN as of *right now*.  Not the ending transaction SCN (commit scn) which is what you need.

ops$tkyte@ORA9IR2> create table t ( x number );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( userenv('commitscn') );
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
 
         X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
  24758134                 24758134
 
ops$tkyte@ORA9IR2> exec dbms_lock.sleep(10);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> commit;
 
Commit complete.
 
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
 
         X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
  24758136                 24758138



userenv('commitscn') is pure magic.  it is a value that is magically updated with YOUR transaction's commit scn upon commit.

Pre-commitment and post magic

April 23, 2004 - 12:29 pm UTC

Reviewer: Duke Ganote from beautiful Warsaw, Indiana USA (Orthopaedic Capitol of the World!)

Tom-- Thank you! So just as T's LAST_MODIFIED datetime shows the "clock time" when the user made the row change during the transaction, DBMS_FLASHBACK.get_system_change_number shows the current SCN at the time the DML is performed.

Hmm. One mechanism that roughly approximates USERENV('commitscn') would be to leave TRANS_SCN null during the insert, and have a background job running like this:
DBMS_JOB.SUBMIT(job#,'UPDATE trans_info SET trans_scn = test_seq.nextval WHERE trans_scn IS NULL;',sysdate,'sysdate+1/24/60/60')
which would "see" and periodically update only committed transactions. It only approximates USERENV('commitscn') because multiple transactions could be tagged with the same approximate SCN.

PS, that is a nice spellcheck for the reviews!

dbms_flashback.get_system_change_number vs. commitscn

April 23, 2004 - 12:34 pm UTC

Reviewer: Graham from Cambridge, ON Canada

Thanks Duke, you implemented basically what I was thinking for an SCN based scheme using userenv('commitscn'). Doesn't solve the issue of precisely mapping timestamps to SCNs, but could be used instead of timestamps if one was prepared to use the undocumented feature.

As Tom points out, dbms_flashback.get_system_change_number() does not have the same magic behaviour as userenv('commitscn') - it's just the SCN that is current at the time you call it. However, 10g's ora_rowscn (with rowdependencies enabled) does have the commit-time SCN for that row. (Without rowdependencies I think it has the commit-time SCN of the most-recently-committed row in that block.)

You can see the above points with the following script:

create table t (
   tx_id         varchar(25) primary key,
   commit_scn    number,
   flashback_scn number
) rowdependencies;

insert into t ( tx_id, commit_scn, flashback_scn )
values ( dbms_transaction.local_transaction_id,
         userenv('commitscn'),
         dbms_flashback.get_system_change_number() );

select t.*, ora_rowscn from t;

declare
   pragma autonomous_transaction;
begin
   insert into t ( tx_id, commit_scn, flashback_scn )
   values ( dbms_transaction.local_transaction_id,
            userenv('commitscn'),
            dbms_flashback.get_system_change_number() );
   commit;
end;
/
commit;
select t.*, ora_rowscn from t;

This gives the following output (using 10g's web version of iSQL*Plus):

Table created.

1 row created.

TX_ID      COMMIT_SCN FLASHBACK_SCN ORA_ROWSCN
---------- ---------- ------------- ----------
2.36.1100      731651        731651

PL/SQL procedure successfully completed.

Commit complete.

TX_ID      COMMIT_SCN FLASHBACK_SCN ORA_ROWSCN
---------- ---------- ------------- ----------
2.36.1100      731653        731651     731654
1.10.1105      731651        731651     731652

(Note that the magic final value for userenv('commitscn') is one less than the "true" commit scn, as others have noticed and posted elsewhere.)

Personally, I doubt that Oracle will remove or change userenv('commitscn') as long as their replication relies on it (or if any important customers rely on it). However the 10g ora_rowscn feature is much, much better if one wants an SCN-based solution instead of a timestamp-based solution.

But for most of our customers I think the oldest-transactions solution will be best.

Oracle documents commitment to SCN

April 24, 2004 - 11:50 pm UTC

Reviewer: Duke Ganote from Warsaw, Indiana USA (a.k.a "Lake City" and "Orthopaedic Capitol of the World")

Graham-- Thanks for the 10g-based example (also thanks to Justin Cave and Tom). Now I can see concisely how Oracle exposed this functionality as a documented feature.

In trying to think of sub-10g, documented-feature workarounds, I toyed with changing my background job (if using 9i) to

DBMS_JOB.SUBMIT(job#,'UPDATE trans_info SET trans_scn = dbms_flashback.get_system_change_number WHERE
trans_scn IS NULL;',sysdate,'sysdate+1/24/60/60')

in order to better approximate the SCN. I'm also exploring use of DBMS_LOGMNR in the background job, since all that's needed is the marriage between SCN and transaction_id, which V$LOGMNR_CONTENTS has. However, I'm not sure either of the latency that LOGMNR would require or of how much DBA involvement is needed. Chasing that rabbit a bit further, I was surprised to see that SQL_REDO and SQL_UNDO don't use bind variables (p. 1118 of Tom's Expert 1-on-1). And while I'm on that tangent, I confess I find the term 'bind variables' befuddling -- by contrast, the Java documentation on PreparedStatement refers to 'parametric SQL', a term that I find easier to grasp conceptually: simply put, I write SQL statements that take parameters, much like I would write procedures that take parameters. Voila!

And I also get to explore Streams per Tom's suggestion. Ah well, the adventure continues, to learn more of the capabilities that this DBMS has to offer! :))

Really?

April 25, 2004 - 9:19 am UTC

Reviewer: Jon from Sydney

Been a busy week; I haven't had a chance to reply to your last comment.  I don't want to flog a dead horse, but I think there is life in this one yet.

You said:
<Quote>
but you cannot flash back 7days.  5 is the limit (unless you shutdown for 2 of 
the days -- it is 5 days of uptime, or you manually updated a dictionary table)
</Quote>

Well, I think you're mistaken.  Of course you can flash back more than 5 days - you just can't flash back using a timestamp more than 5 days.  Using an SCN, which is what I'm doing, you can flash back as far as you want.  All that's needed is to keep a record of the SCN at the time of the event that interests you.  For Graham's problem, that is the moment of each sync request.

Using this approach removes the whole problem of timestamp => SCN conversion.  There is no need for userenv('commitscn') or similar.  You change the query from "give me everything from time a to time b" - which is a continuous, approximate measure by definition - to "give me everything from SCN A+1 to SCN B".  This uses the precise, quanta-like properties of the SCN - the DB's heartbeat, if you like.

<Quote>
and to have 7 days of undo *online*? I'm not aware of any systems personally that have that. 
</Quote>

Well, now you do! :-)  We frequently go back more than seven days.  As you have said many times yourself, disk is cheap.  What's 10G of undo?

<Quote>
A 1,000 row delta on a single row will incurr typically 1,000 lios instead of 
one (unless you are lucky and the version of the block you need is in the buffer 
cache, but not likely).  Now, do that for 100 changed rows, 20,000 times.
</Quote>

This is where things get interesting.  You maintain that to flash back 1000 changes requires 1000 LIOs.  Well, I don't think undo works that way.  My understanding is that Oracle maintains a transaction table that enables an SCN-to-block translation.  So, during normal operation, when an old block is required for read consistency, it looks up the transaction table to find out where this block is within the rollback segments.  So 1 LIO becomes 3 LIOs (initial block read, transaction table read, RBS read) - regardless of how many changes have been made.

If this were not the case, then a long-running query over a busy table would take longer to materialise blocks near the end of the query than at the beginning.  This would limit the effectiveness of multi-versioning altogether.

From the Concepts manual:
...
Among other information, a rollback entry includes block information (the file number and block ID corresponding to the data that was changed) and the data as it existed before an operation in a transaction.

For each rollback segment, Oracle maintains a transaction table--a list of all transactions that use the associated rollback segment and the rollback entries for each change performed by these transactions. Oracle uses the rollback entries in a rollback segment to perform a transaction rollback and to create read-consistent results for queries.
...

So I've done a test to prove this:

SQL> set linesize 200
SQL> 
SQL> set timing on
SQL> 
SQL> set serveroutput on
SQL> 
SQL> alter session set nls_date_format='DD-MON-RR HH24:MI:SS';

Session altered.

Elapsed: 00:00:00.00
SQL> 
SQL> column pk format 9999999999
SQL> 
SQL> column change_date format a20
SQL> 
SQL> select count(*) from big_table;

  COUNT(*)
----------
  30765000

Elapsed: 00:00:38.01

(Was supposed to be 3 million, but I got a bit carried away...)


These are the 101 rows we'll update:
SQL> 
SQL> select * from big_table
  2  where pk between 6179200 and 6179300
  3  and rownum < 10;

         PK SOME_DATA                      CHANGE_DATE          CHANGE_VERSION CHANGE_SCN
----------- ------------------------------ -------------------- -------------- ----------
    6179200 DIRECT_DEBIT_TYPE_PK           24-APR-04 15:04:03                1     700815
    6179201 DIRECT_DEBIT_TYPE_UK1          24-APR-04 15:04:03                1     700815
    6179202 EFEED                          24-APR-04 15:04:03                1     700815
    6179203 EFEED                          24-APR-04 15:04:03                1     700815
    6179204 EFEED_E01_IMPORT               24-APR-04 15:04:03                1     700815
    6179205 EFEED_EMPIRE_MAP               24-APR-04 15:04:03                1     700815
    6179206 EFEED_EMPIRE_MAP_PK            24-APR-04 15:04:03                1     700815
    6179207 EFEED_FIELD                    24-APR-04 15:04:03                1     700815
    6179208 EFEED_FIELD_PK                 24-APR-04 15:04:03                1     700815

9 rows selected.

Elapsed: 00:00:00.01

How much undo do we have at the moment:
SQL> 
SQL> select
  2     (select round(sum(u.undoblks)*8192/1024/1024,2)
  3      from v$undostat u) used_mb,
  4     (select round(sum(e.bytes)/1024/1024,2)
  5      from dba_extents e
  6      where e.tablespace_name = 'UNDOTBS1') alloc_mb
  7  from dual;

   USED_MB   ALLOC_MB
---------- ----------
    230.35     233.23

Elapsed: 00:00:00.08

Now, let's generate heaps of undo.  We'll update these 100 rows 10,000 times, and record the SCN and sysdate at the beginning, middle and end.

SQL> begin
  2      commit;
  3      dbms_output.put_line('T0: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
  4      for i in 1..5000 loop
  5          dbms_application_info.set_client_info('i='||i);
  6          update big_table
  7          set some_data = substr(some_data,2)||substr(some_data,1,1),
  8              change_date = sysdate,
  9              change_version = change_version+1,
 10              change_scn = dbms_flashback.get_system_change_number
 11          where pk between 6179200 and 6179300;
 12          commit;
 13      end loop;
 14      dbms_lock.sleep(60);
 15      dbms_output.put_line('T1: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
 16      for i in 1..5000 loop
 17          dbms_application_info.set_client_info('i='||i);
 18          update big_table
 19          set some_data = substr(some_data,2)||substr(some_data,1,1),
 20              change_date = sysdate,
 21              change_version = change_version+1,
 22              change_scn = dbms_flashback.get_system_change_number
 23          where pk between 6179200 and 6179300;
 24          commit;
 25      end loop;
 26      dbms_output.put_line('T2: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
 27  end;
 28  /

T0: sysdate=22:37:31, SCN=805637
T1: sysdate=22:39:04, SCN=811729
T2: sysdate=22:39:43, SCN=817778

PL/SQL procedure successfully completed.

Elapsed: 00:02:05.02

Let's see how much undo we have now:

SQL> select 
  2     (select round(sum(u.undoblks)*8192/1024/1024,2)
  3      from v$undostat u) used_mb,
  4     (select round(sum(e.bytes)/1024/1024,2) 
  5      from dba_extents e
  6      where e.tablespace_name = 'UNDOTBS1') alloc_mb
  7  from dual;

   USED_MB   ALLOC_MB
---------- ----------
    387.77     420.23

Elapsed: 00:00:02.04

So we've added about 200M of Undo - and since this is my laptop I know it is all mine.

Now, let's run the sync query, getting all changes between T0 and T2.  I've included a safety factor of 2 minutes in the change_date predicate, for any transactions that may have been running during the T0 sync request.

SQL> select *
  2  from big_table as of scn 817778
  3  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS')
  4  minus
  5  select *
  6  from big_table as of scn 805637
  7  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');

         PK SOME_DATA                      CHANGE_DATE          CHANGE_VERSION CHANGE_SCN
----------- ------------------------------ -------------------- -------------- ----------
    6179200 DIRECT_DEBIT_TYPE_PK           25-APR-04 22:39:43            10001     817777
    6179201 EBIT_TYPE_UK1DIRECT_D          25-APR-04 22:39:43            10001     817777
    6179202 EFEED                          25-APR-04 22:39:43            10001     817777
    6179203 EFEED                          25-APR-04 22:39:43            10001     817777
    6179204 EFEED_E01_IMPORT               25-APR-04 22:39:43            10001     817777
    6179205 EFEED_EMPIRE_MAP               25-APR-04 22:39:43            10001     817777
    6179206 _MAP_PKEFEED_EMPIRE            25-APR-04 22:39:43            10001     817777
    6179207 EED_FIELDEF                    25-APR-04 22:39:43            10001     817777
    6179208 ELD_PKEFEED_FI                 25-APR-04 22:39:43            10001     817777
    6179209 _FIELD_UK1EFEED                25-APR-04 22:39:43            10001     817777
    6179210 EED_RAW32EF                    25-APR-04 22:39:43            10001     817777
...

101 rows selected.

Elapsed: 00:00:00.08
SQL> 

And you get similar results if you query between T0 & T1 or T1 & T2 etc.

So that's 100*10,000 changes - I don't think Oracle is doing 1 million LIOs in 8/100ths of a second.

In fact if we compare the stats on the flashback queries:

SQL> set autotrace traceonly
SQL> select *
  2  from big_table as of scn 805637
  3  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=284 Card=61234 Bytes
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=284 Car
   2    1     INDEX (RANGE SCAN) OF 'BIG_TABLE_IX1' (NON-UNIQUE) (Cost

Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
         36  consistent gets
          0  physical reads
          0  redo size
       4159  bytes sent via SQL*Net to client
        565  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        101  rows processed

with the "normal" queries:

SQL> select *
  2  from big_table
  3  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');

101 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=284 Card=61234 Bytes
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=284 Car
   2    1     INDEX (RANGE SCAN) OF 'BIG_TABLE_IX1' (NON-UNIQUE) (Cost

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4159  bytes sent via SQL*Net to client
        565  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

There is only double the LIOs for the flashback query.  I would guess that because many of the updated rows would be on the same block, the queries would be getting several changes for the price of one.

However, it is clear that 18 LIOs don't become 18*10,000 LIOs when using flashback, as you were suggesting in your previous post.

So your concerns about performance don't seem correct.  Maybe my test is missing something, and if I'm wrong I'll happily capitulate - but you'll need to prove it.

But, if I'm right, well... I expect a large carton of beer to arrive at my door within 48 hrs :-)

 

Tom Kyte

Followup  

April 26, 2004 - 6:04 am UTC

the flashback by scn won't work over 5 days of uptime either!

sys@ORA9IR2> column S new_val S
sys@ORA9IR2>
sys@ORA9IR2> select min(SCN_BAS) S from SMON_SCN_TIME
2 /

S
----------
616905

sys@ORA9IR2>
sys@ORA9IR2> select * from sys.source$ as of scn &S where rownum = 1
2 /
old 1: select * from sys.source$ as of scn &S where rownum = 1
new 1: select * from sys.source$ as of scn 616905 where rownum = 1

OBJ# LINE
---------- ----------
SOURCE
------------------------------------------------------------------------
647 1
package STANDARD AUTHID CURRENT_USER is -- careful on this
line; SED edit occurs!


sys@ORA9IR2>
sys@ORA9IR2> select min(scn_bas)-1 S from smon_scn_time
2 /

S
----------
616904

sys@ORA9IR2>
sys@ORA9IR2> select * from sys.source$ as of scn &S where rownum = 1
2 /
old 1: select * from sys.source$ as of scn &S where rownum = 1
new 1: select * from sys.source$ as of scn 616904 where rownum = 1
select * from sys.source$ as of scn 616904 where rownum = 1
*
ERROR at line 1:
ORA-08181: specified number is not a valid system change number


we use that mapping table and if the scn isn't there, well, the scn isn't there and we cannot (will not) use it.

soooo, unless you manually modify that mapping table (definitely not even near the realm of recommended!!), you cannot go back further. 10g changes the entire thing as well -- that table is infinitely more complex and even less "updatable"; if you update it wrong, all bets are 100% off.


As for keeping days of undo online -- that implies a very lightweight system; many systems generate that much undo in hours. That is what I meant by "i'm not aware of any production systems with 7 days of undo online".
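To make the undo point concrete, here is a minimal sketch (assuming automatic undo management as introduced in 9i) of checking and raising the retention target -- note that the parameter is only a target, and the undo tablespace must actually be sized to hold that much undo:

```sql
-- Sketch only: undo_retention is a target in seconds, honored only
-- while the undo tablespace has room to keep that much undo.
show parameter undo_retention

-- asking for 7 days of undo (7 * 24 * 60 * 60 seconds):
alter system set undo_retention = 604800;
```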

As for the 1,000's of LIO's, that's a bit off as well. Try this:

drop table t;

create table t ( x int, y int ) tablespace users;
insert into t values (1,1);
commit;

variable a refcursor
variable b refcursor
variable c refcursor

alter session set events '10046 trace name context forever, level 12';
begin
open :a for select * from t a;
open :b for select * from t b;
open :c for select * from t c;
end;
/
print a
begin
for i in 1 .. 10000
loop
update t set x = x+1;
commit;
end loop;
end;
/
print b
print c


(part of a presentation I do to explain read/write consistency).


Now, run tkprof on the trace file. You should see:

query a: single digit LIO's
query b: thousands of LIO's
query c: single digit LIO's

it depends on what is in the cache (and the further back in time we go, the less chance that what you want is in cache!). Query C benefits from the CR version of the block that query B built and cached (we keep something like 6 block versions in cache)





Very Interesting

April 26, 2004 - 7:32 am UTC

Reviewer: Jon from Sydney

2 more questions before I RIP on this issue:

1) If you can only go back 5 days regardless, then why does the Oracle documentation read:

...
Application Developer's Guide - Fundamentals
Currently, the flashback query feature keeps track of times up to a maximum of 5 days. This period reflects server uptime, not wall-clock time. For example, if the server is down for a day during this period, then you can
specify as far back as 6 days. To query data farther back than this, you must specify an SCN rather than a date and time. You must record the SCN yourself at the time of interest, such as before doing a DELETE. 
...

It quite clearly states you can query back further using an SCN.

And why can I go back 7 or 8 days? (And no, I haven't updated the data dictionary)

2) You talk about CR blocks being constructed and brought into the cache.  Well why do I still get good performance if I restart the DB?  Nothing in the cache then.  In my example, I had 10,000 updates as well.  Still getting fast query time:

SQL> set echo on;
SQL> 
SQL> set timing on;
SQL> 
SQL> alter tablespace data_app offline;

Tablespace altered.

Elapsed: 00:00:00.08
SQL> 
SQL> alter tablespace data_app online;

Tablespace altered.

Elapsed: 00:00:01.02
SQL> 
SQL> select *
  2  from big_table as of scn 817778
  3  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS')
  4  minus
  5  select *
  6  from big_table as of scn 805637
  7  where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');

101 rows selected.

Elapsed: 00:00:00.09

Still only 9/100ths of a second... nothing to worry about.

(P.S. How's Iceland... cold?) 

Tom Kyte

Followup  

April 26, 2004 - 7:51 am UTC

1) documentation is incorrect then. i'll file a doc bug.

what's your min(scn_bas)? that'll be the line. peek at that table.


2) how many rows/block do you have, how many times do you re-read a block? you see, if you read the same block 100 times -- it won't roll it back 100 times -- just once.

also, please measure LIO's vs LIO's -- LIOs = latching. did you test this under load, where multiple people doing excessive LIO's are banging into each other?

i'm not measuring time, it's all about "what work am i doing and how will this affect me in a real system"

read rollback from a day ago = PHYSICAL IO's added to system
use N days of rollback to reconstruct block = LOGICAL IO's added to system

in a real system, with lots of users, both of those are things you don't want to do unless you really have to.

don't get me wrong -- flashback is *cool*, just beware of how it physically works (run that little test with queries a, b, c -- see what happens, then multiply that by 100 users, 1,000 users, maybe 10,000 users near simultaneously).

And finally

April 26, 2004 - 8:05 am UTC

Reviewer: Jon from Sydney

Yes, that doco is misleading then.

I must say, though, many people are playing with breaking the 5-day barrier -- inserting into SMON_SCN_TIME is one approach, and JL suggested setting event 10311 to bypass the SCN => timestamp lookup.

I like your CR sample script - simple and clear. I was being slightly disingenuous with you in that last post. My 9/100ths of a sec after clearing the cache was actually 14,000 LIOs on a single-user system, dropping to 36 LIOs on the second run.

Just seeing if you were on your toes... you were ;-)

Tom Kyte

Followup  

April 26, 2004 - 9:19 am UTC

so, when does my beer arrive?

:)

Beer

April 26, 2004 - 10:02 am UTC

Reviewer: Jon from Sydney

Yes indeed. I owe you. When are you next in Australia?

Great discussion but isn't there a simpler answer?

April 26, 2004 - 1:41 pm UTC

Reviewer: JAG from ID

What about this for an alternative:

- Add a column to the query table (tab1) and populate via a sequence (seq1)
- Create a new table (tab2) that contains the max sequence number queried by each user
- Create type (type1) that is equivalent to the definition of tab1
- Create a type (type2) that is a table of types type1
- Create a function (f1) that returns type2. The function queries tab1 and stuffs the data into a variable of type2. After this, it updates tab2 with the max returned sequence number.
- Use cast etc. to allow the call to f1 to behave like a table.

Tom Kyte

Followup  

April 26, 2004 - 2:29 pm UTC

no -- because of read consistency, this suffers the same exact problem as reading by timestamp (either that, or you would be serializing updates against a single row, which is what the lock table in share mode did, in a way).

the simple answer:

a) use timestamp
b) adjust your time based on v$transaction <<== only difference from "existing code"
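A sketch of what (a) and (b) look like together, assuming a trigger-maintained last_modified column and a bookkeeping table sync_times (both names hypothetical) that stores each user's watermark:

```sql
-- 1) pull everything changed since the watermark recorded last time
select *
  from t
 where last_modified >= (select last_checked
                           from sync_times
                          where username = :user);

-- 2) record the NEXT watermark as the start of the oldest open
--    transaction (or "now" if none) -- not the current wall-clock time.
--    this is the only difference from the plain timestamp approach.
update sync_times
   set last_checked = (select nvl( min(to_date(start_time,
                                               'mm/dd/rr hh24:mi:ss')),
                                   sysdate )
                         from v$transaction)
 where username = :user;
```

The cost of this safety margin is that rows committed after the watermark may be pulled again on the next sync, so the consumer must tolerate duplicates.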





Nothing simpler so far ...

April 26, 2004 - 3:16 pm UTC

Reviewer: Gabriel

Jag,

Your sequence-based alternative is no different than the timestamp method (except for not being able to link into v$transaction).

User A inserts new record in Tab1 ... uses sequence 100 … no commit
User A updates existing record in Tab1 … uses sequence 101 … no commit
User B inserts new record in Tab1 … uses sequence 102 … commit
Sync process (User C) executes with a last max sequence number of 50 … it sees the modification 102 but not 100 and 101 … moves everything greater than 50 and sets the next max sequence number to 102.
User A commits changes 100 and 101.

Next time User C syncs it will get everything greater than 102 … hence it totally misses 100 and 101.

Graham’s method of using timestamps and v$transaction to get the smallest time window “safe enough” to include all potentially changed records seems to be the only method (not requiring serialization at source) so far. It only has one caveat … it may process the same change multiple times (this could be a problem if the target system is being modified by other processes … otherwise it is just a bit more CPU consumption … likely much less CPU than a full sync method would require).

Tom’s insight into using both v$transaction and v$lock minimizes the possibility of processing the same change more than once. I wouldn’t say the probability is “near zilch” … all it takes is a long open transaction and/or a busy system (add to that trigger-happy sync users). In the example above, if User C comes twice for sync and User A has still not closed that transaction, the record with change 102 will be processed at least 3 times (since in both cases the earliest safe timestamp will be set to the start of User A’s transaction).

Very good thread indeed.



End up in confusion

April 27, 2004 - 12:39 am UTC

Reviewer: Tony from India, Chennai

As there is a requirement for me to load changed data every night to a data warehouse database from a 24x7 OLTP system, I read this page and ended up in utter confusion.

My 24x7 OLTP system has hundreds of tables. Certainly, I do not prefer trigger-based solutions. Some of the tables do not have a timestamp column. Even if all tables had a timestamp column, I might miss some rows, as master and detail rows may have timestamps that differ (by seconds) within a transaction. It seems that we cannot use transaction-number-based queries either.

Can you please give me a reliable and simple solution to capture the changes, with an example?

Tom Kyte

Followup  

April 28, 2004 - 11:44 am UTC

streams.... read about streams.....

</code> http://otn.oracle.com/pls/db92/db92.docindex?remark=homepage#index-STR <code>

Pursuing V$LOCK suggestion

April 28, 2004 - 1:01 pm UTC

Reviewer: Graham from Cambridge, ON Canada

You suggested joining v$lock with v$transaction to get the oldest start time for "in flight" transactions with locks on the tables of interest. I've just started to look into this, and can't see a common column on which to join, or how to get the names of the affected tables.

I guess I could join via v$session, using the binary manipulations you showed here:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:839412906735 <code>

Can you please suggest a query to get the oldest start time from v$transaction where the transaction has uncommitted changes on tables called 'T1' and 'T2'?

BTW, does the db that powers this forum store previews?

Tom Kyte

Followup  

April 28, 2004 - 7:17 pm UTC

v$transaction(addr) -> v$session(taddr) gives you the sid; v$session(sid) -> v$lock(sid) gives you the TM locks, where id1 = object_id; id1 -> dba_objects(object_id) gives you the table names.


ops$tkyte@ORA9IR2> select nvl(min(start_time),to_char(sysdate,'mm/dd/yy hh24:mi:ss'))
  2    from v$transaction
  3   where addr in
  4   (select taddr
  5      from v$session
  6     where sid in
  7   (select sid
  8      from v$lock
  9     where type = 'TM'
 10       and id1 in
 11   (select object_id
 12      from dba_objects
 13     where object_name in ('T', 'T1', 'T2' )
 14       and owner = 'OPS$TKYTE')
 15   )
 16   )
 17  /
 
NVL(MIN(START_TIME),
--------------------
04/28/04 19:15:10
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> commit;
 
Commit complete.
 
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> select nvl(min(start_time),to_char(sysdate,'mm/dd/yy hh24:mi:ss'))
  2    from v$transaction
  3   where addr in
  4   (select taddr
  5      from v$session
  6     where sid in
  7   (select sid
  8      from v$lock
  9     where type = 'TM'
 10       and id1 in
 11   (select object_id
 12      from dba_objects
 13     where object_name in ('T', 'T1', 'T2' )
 14       and owner = 'OPS$TKYTE')
 15   )
 16   )
 17  /
 
NVL(MIN(START_TIME),
--------------------
04/28/04 19:18:31


should do it...


didn't understand the BTW q. 

Streaming and summarizing

April 28, 2004 - 1:02 pm UTC

Reviewer: Duke Ganote from Warsaw, Indiana USA (aka "Lake City" and "Orthopaedic Capitol of the World")

Tom-- I found an intriguing, simple example of streams at </code> http://www.oracle-base.com/articles/9i/Streams9i.php <code>
and it looks great! The article describes streams as an 'extension of a number of existing technologies including Advanced Queuing, LogMinor [sic] and Job Scheduling'. These are all 8i technologies, but it appears that 9i is mandatory because the example script includes:
GRANT EXECUTE ON DBMS_FLASHBACK TO strmadmin;
Could streams be retrofitted onto an 8i database?

Also, would it be fair to summarize the original question and 30-or-so responses as follows--
Q: How can I identify what's committed/changed since the last time I checked a table?
A: Solutions are either 'pre-commit' or 'after commit'.
(1) 'After commit' solutions looked at tying committed changes to an SCN; these were 'hard' but precise solutions.
(2) The 'pre-commit' solution uses V$TRANSACTION and V$LOCK to look for any transaction that *might* commit changes to the table. This is the recommended and simple solution.




Tom Kyte

Followup  

April 28, 2004 - 7:19 pm UTC

streams requires 9ir2 -- it cannot be used before.


the solutions were not pre/post commit -- all were POST commit. there is quite simply "no other way" in Oracle.

the techniques were to minimize the duplication (retrieving the same change two times).

the easiest would be timestamp based, assuming a timestamp on each and every table row is maintained. (if you are DIY, streams would be the preferred method)

Thanks for v$lock query!

April 29, 2004 - 12:13 am UTC

Reviewer: Graham from Cambridge, ON Canada

Thanks for the query Tom! That's a cleaner way to go through v$session than matching the transaction id (via bit manipulation) that I envisioned. A bit convoluted but only has to be done once per sync if one wants that extra level of duplicate elimination.

On the streams vs. polling front, I agree that change capture via the log will often be more efficient and less intrusive than querying for changes, but I wouldn't say that streams are always preferred. Which performs best will depend on factors such as frequency of changes vs. frequency of syncs, how similar the syncs are, and when the syncs occur. Other benefits for querying, which are important to our customer base, are portability, simplicity, and transparency. The same technique works for all mainstream SQL RDBMS products.

Sorry for going off-topic with the BTW. I just wondered if the prototype review is recorded when one presses the Preview Review button. I wasn't being paranoid, but if you record those and ever look at them one could argue that you've made use of the equivalent of dirty reads. ;-)

Tom Kyte

Followup  

April 29, 2004 - 7:28 am UTC

In general, given a system where you have to be non-intrusive (eg: no timestamps exist, no triggers can be added) there isn't really much of a choice.


The Preview Review uses a "state" table -- as all htmldb applications do. So yes, the preview is stored in a session state table, however we don't really ever look at that table - it is just variable/values for a given htmldb session.

Minor correction

April 29, 2004 - 7:58 am UTC

Reviewer: Graham from Cambridge, ON Canada

One correction to the v$lock query in case anyone else wants to use it - it compares date strings (in the hard-to-compare mm/dd/yy format) instead of dates, so I think it would break across year boundaries. Your first version did it better:

select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction
...
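For completeness, here is the earlier v$lock query with that fix folded in, so the minimum is taken over DATEs rather than 'mm/dd/yy' strings (the owner and table names are placeholders):

```sql
-- safe lower bound for the next sync: start of the oldest open
-- transaction holding a TM lock on the tables of interest, or sysdate
select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')), sysdate ) safe_time
  from v$transaction
 where addr in
       (select taddr
          from v$session
         where sid in
               (select sid
                  from v$lock
                 where type = 'TM'
                   and id1 in (select object_id
                                 from dba_objects
                                where object_name in ('T1','T2')
                                  and owner = 'SCOTT')))
/
```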

Date stringification

April 29, 2004 - 9:45 am UTC

Reviewer: Duke Ganote from Warsaw Indiana USA (aka "Lake City" and "Orthopaedic Capitol of the World")

I find an almost ISO 8601 date-string format 
http://www.w3.org/TR/NOTE-datetime
is convenient; for example:
SQL> select to_char(sysdate,'YYYY-MM-DD HH24:MI:SS') dateTime from dual;

DATETIME
-------------------
2004-04-29 08:41:06 

v$transaction.start_timestamp?

April 29, 2004 - 11:41 am UTC

Reviewer: Graham from Cambridge, ON Canada

For v$transaction, we're stuck with start_time in the anachronistic 'mm/dd/rr hh24:mi:ss' format. I too am a strong proponent of ISO 8601 (though I prefer a blank instead of a 'T' between date and time, which seems to have become a de facto standard).

I hope that someday Oracle adds a start_timestamp column to v_$transaction, that is a TIMESTAMP (with or without time zone) that has better than 1 sec resolution.

Tom Kyte

Followup  

April 29, 2004 - 11:56 am UTC

well, in 10g there is a start_DATE of type date (removes the ambiguity with the date format) but it is a DATE, not a timestamp

Transaction time

June 09, 2004 - 3:12 am UTC

Reviewer: MEHMOOD from Karachi, Pakistan

Tom:

Is there any way, that we could see the date and time for a transaction which was committed one month before. And the database in no archive log mode.

Thanks in advance

Tom Kyte

Followup  

June 09, 2004 - 8:42 am UTC

not unless you recorded it somewhere.

August 10, 2004 - 11:46 pm UTC

Reviewer: A reader

ops$tkyte@ORA9IR2> insert into t values ( userenv('commitscn') );

1 row created.

ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;

X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758134 24758134

ops$tkyte@ORA9IR2> exec dbms_lock.sleep(10);

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2> commit;

Commit complete.

ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;

X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758136 24758138


I dont get this, how can the column t.x change magically when no updates have happened on that row?

No matter what magic userenv('commitscn') does, it is physically stored in the table, so how does it change after the sleep() and commit?

Thanks

Tom Kyte

Followup  

August 11, 2004 - 9:36 am UTC

the database can do whatever magical stuff it wants to, we wrote it after all :)

userenv('commitscn') -- which can be used but once per transaction -- is a flag to the server that says "hey, remember where you just stuffed me ok, cause when we commit, you really gotta come back and update me"



A reader

October 28, 2004 - 3:40 am UTC

Reviewer: A from Aus

Hi,
We have a normal OLTP DB and a data warehouse DB. We need to pull only the data that has changed since the last time we pulled from OLTP to the warehouse. I am aware that CDC and Streams can do this, but these are available only on Enterprise Edition, and we have Standard Edition (9i). Is there any way we can do this? I can think of writing triggers and adding a timestamp column to every table to accomplish this, but I don't know if it is a good idea to have triggers on every table. Please advise.

Tom Kyte

Followup  

October 28, 2004 - 9:12 am UTC

well, if you don't have EE, you are in "DIY" mode (do it yourself). Your options are quite limited.

you have read only materialized views (snapshots).
you have "put a trigger on every table and use the technique described above".

Another DIY suggestion?

October 28, 2004 - 10:59 am UTC

Reviewer: Bob B from Albany, NY

I'm not sure if this is what Tom was suggesting way back with AQ, but couldn't the new/changed rows be inserted into a log table? A job can then come along, take a row from the log table, duplicate it into a user log table once for each user that would request the information, and then delete the row from the log table. Then it is up to you to determine the rules for when a row of the user log table is considered read, and whether to delete or update that row.

Performance wise, this puts a hit on each transaction (it has to insert into the log table) and the database takes a hit proportional to [transaction rate * user count]. If the read rule matches what the users expect, then there will be no missed changes and no duplicates. Changes may take a little longer to show up, but they will show up. I doubt this will scale well at all, but should work ok if the system can handle the combination of the transaction rate with the background job.

If the user ids are numbers or can be easily hashed to a unique number, then a bitmap on each transaction table that is to be tracked could work. On insert/update, the bitmap would be set to 000000000000... (in binary; requires at least ceiling( # users/8 ) bytes). Each time a user checks for updates, you query the tables' bitmaps. If the bit for the user is 0, display the row and update that bit to 1; otherwise skip it.

I haven't played around with bitmaps, I have no idea how that would perform (I see bad things if the user id is a sequence and has large gaps).

Tom Kyte

Followup  

October 28, 2004 - 1:53 pm UTC

that is a lot like the timestamp approach, with extra work (sort of like using read-only MVs)

Excellent explanation by Tom

November 08, 2004 - 3:46 pm UTC

Reviewer: Juan Velez from Bellevue, WA, USA

If Tom or others can answer: I still have a doubt. How is it possible, using the "oldest open transaction" approach, to get duplicates?

Tom Kyte

Followup  

November 08, 2004 - 5:42 pm UTC

you tend to "re-pull" the same row.

At 11:00 you did the initial copy -- so the next time you pull everything from 11:00


At 12:00 transaction t1 started.
At 12:01 transaction t2 started.
At 12:03 transaction t2 committed.

at 12:05 you "pull" -- you remember 12:00 as the next pull time (because of the outstanding transaction). You pull everything from 11:00 on -- that included t2 since it committed.

At 12:30 transaction t1 commits.

At 13:00 you "pull" -- everything changed since 12:00 -- you re-pull transaction T2.
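The timeline above can be expressed as a hypothetical pull procedure (sync_mark, staging, and the last_modified column are made-up names): the watermark stored at 12:05 is 12:00, t1's start time, so t2's rows still qualify at the 13:00 pull.

```sql
declare
   l_safe date;
begin
   -- the safe watermark: oldest open transaction's start, or "now"
   select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')), sysdate )
     into l_safe
     from v$transaction;

   -- pull everything changed since the PREVIOUS watermark; rows that
   -- committed after that watermark may be pulled a second time
   insert into staging
      select *
        from t
       where last_modified >= (select mark from sync_mark);

   -- move the watermark forward only as far as is provably safe
   update sync_mark set mark = l_safe;
   commit;
end;
/
```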

Changed Data Capture

March 16, 2005 - 9:37 am UTC

Reviewer: Neelz from Japan

Dear Tom

<quote>
Followup:
well, if you don't have EE, you are in "DIY" mode (do it yourself). Your options are quite limited.

you have read only materialized views (snapshots).
you have "put a trigger on every table and use the technique described above".
</quote>

I too am facing almost similar problem with Oracle9iR2 SE.

Scenario
--------
Our requirement is to replicate 3 tables at a remote country site which will be used for reporting. Time lag allowed is 15 min between production DB and remote DB. Network outages are common.

The approach we were planning
-----------------------------
1. Write triggers on each table for each insert, update and delete which will log these sql statements into a log table.
2. Every 15 min a job will be scheduled to execute these sql statements at the remote country site via dblink.
3. For each successful execution at the remote site, that record will be deleted from the log table.

Is this the correct approach? While executing the sql statements for the remote site, if network outage happen what will be the consequences and how should I take care of it?

Could you please help me on this?

Thanks and Regards
Neelz

Tom Kyte

Followup  

March 16, 2005 - 10:05 am UTC

you have read only snapshots for this - for reporting purposes, you need nothing else.


Just use a materialized view (snapshot), it is already done.
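A minimal read-only snapshot setup for that scenario might look like the following (table and database link names are made up). A useful property here is that if a 15-minute refresh fails because of a network outage, the next scheduled refresh simply catches up from the materialized view log:

```sql
-- on the production (master) site:
create materialized view log on orders;

-- on the remote reporting site, over a database link:
create materialized view orders_mv
   refresh fast
   start with sysdate next sysdate + 15/1440   -- every 15 minutes
   as select * from orders@prod_link;
```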

How about putting the timestamp in the mview itself?

October 19, 2005 - 7:21 pm UTC

Reviewer: Anonymous from NYC, USA

Here's a possible solution we came up with that works great on 9iR2 (our current system) but will not run on 10g. We decided not to use it because we'll be upgrading to 10g early next year.

The idea is to have the local systimestamp added as a column to the fast refreshed materialized view. This isn't normally allowed, but if you replace it with to_timestamp(to_char(systimestamp)) it -DOES- work (again, only on 9iR2).

Check out this example to see how only the changed rows get the new timestamp:

/*******************************************************/

/* Formatted on 2005/10/19 19:12 (Formatter Plus v4.8.0) */
SELECT *
  FROM v$version
BANNER                                                          
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
PL/SQL Release 9.2.0.5.0 - Production                           
CORE    9.2.0.6.0    Production                                       
TNS for IBM/AIX RISC System/6000: Version 9.2.0.5.0 - Production
NLSRTL Version 9.2.0.5.0 - Production                           
5 rows selected
 
 
DROP TABLE tb_test
Table dropped
 
DROP MATERIALIZED VIEW mv_test
Snapshot dropped
 
CREATE TABLE tb_test (
   a VARCHAR2(45)
   ,b NUMBER
)
Table created
 
INSERT INTO tb_test
   SELECT object_name
        , ROWNUM
     FROM all_objects
    WHERE ROWNUM < 5
4 rows inserted
 
 
COMMIT
Commit complete
 
SELECT *
  FROM tb_test
A                                              B
--------------------------------------------- --
/1005bd30_LnkdConstant                         1
/10076b23_OraCustomDatumClosur                 2
/10297c91_SAXAttrList                          3
/103a2e73_DefaultEditorKitEndP                 4
4 rows selected
 
 
CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES
Snapshot log created
 
CREATE MATERIALIZED VIEW mv_test
 REFRESH WITH ROWID FAST
 AS
   SELECT t.*
        , TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
     FROM tb_test t
Snapshot created
 
COMMIT
Commit complete
 
SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC
A                                              B TO_CHAR(ROW_LUP_TS)                  
--------------------------------------------- -- -------------------------------------
/1005bd30_LnkdConstant                         1 19-OCT-05 07.13.59.781179000 PM      
/10076b23_OraCustomDatumClosur                 2 19-OCT-05 07.13.59.781179000 PM      
/10297c91_SAXAttrList                          3 19-OCT-05 07.13.59.781179000 PM      
/103a2e73_DefaultEditorKitEndP                 4 19-OCT-05 07.13.59.781179000 PM      
4 rows selected
 
 
INSERT INTO tb_test
   SELECT 'INSERTED --> ' || object_name
        , 5 + ROWNUM
     FROM all_objects
    WHERE ROWNUM < 5
4 rows inserted
 
 
COMMIT
Commit complete
 
begin 
 DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
 
SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC
A                                              B TO_CHAR(ROW_LUP_TS)                  
--------------------------------------------- -- -------------------------------------
INSERTED --> /1005bd30_LnkdConstant            6 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /10076b23_OraCustomDatumClosur    7 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /103a2e73_DefaultEditorKitEndP    9 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /10297c91_SAXAttrList             8 19-OCT-05 07.14.00.526721000 PM      
/1005bd30_LnkdConstant                         1 19-OCT-05 07.13.59.781179000 PM      
/10076b23_OraCustomDatumClosur                 2 19-OCT-05 07.13.59.781179000 PM      
/10297c91_SAXAttrList                          3 19-OCT-05 07.13.59.781179000 PM      
/103a2e73_DefaultEditorKitEndP                 4 19-OCT-05 07.13.59.781179000 PM      
8 rows selected
 
 
UPDATE tb_test
   SET a = 'UPDATED --> ' || a
 WHERE ROWNUM = 1
1 row updated
 
 
COMMIT
Commit complete
 
begin 
 DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
 
SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC
A                                              B TO_CHAR(ROW_LUP_TS)                  
--------------------------------------------- -- -------------------------------------
UPDATED --> /1005bd30_LnkdConstant             1 19-OCT-05 07.14.00.634670000 PM      
INSERTED --> /1005bd30_LnkdConstant            6 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /10297c91_SAXAttrList             8 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /10076b23_OraCustomDatumClosur    7 19-OCT-05 07.14.00.526721000 PM      
INSERTED --> /103a2e73_DefaultEditorKitEndP    9 19-OCT-05 07.14.00.526721000 PM      
/10076b23_OraCustomDatumClosur                 2 19-OCT-05 07.13.59.781179000 PM      
/10297c91_SAXAttrList                          3 19-OCT-05 07.13.59.781179000 PM      
/103a2e73_DefaultEditorKitEndP                 4 19-OCT-05 07.13.59.781179000 PM      
8 rows selected

/*******************************************************/

/******************************************************
Here's just the sql script itself so that you don't have to cut/paste (I also commented out the drop lines):
******************************************************/
SELECT *
  FROM v$version;

--DROP TABLE tb_test;
--DROP MATERIALIZED VIEW mv_test;

CREATE TABLE tb_test (
   a VARCHAR2(100)
   ,b NUMBER
);

INSERT INTO tb_test
   SELECT object_name
        , ROWNUM
     FROM all_objects
    WHERE ROWNUM < 5;

COMMIT ;

SELECT *
  FROM tb_test;

CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_test
 REFRESH WITH ROWID FAST
 AS
   SELECT t.*
        , TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
     FROM tb_test t;

COMMIT ;

SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC;

INSERT INTO tb_test
   SELECT 'INSERTED --> ' || object_name
        , 5 + ROWNUM
     FROM all_objects
    WHERE ROWNUM < 5;

COMMIT ;

EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');

SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC;

UPDATE tb_test
   SET a = 'UPDATED --> ' || a
 WHERE ROWNUM = 1;

COMMIT ;

EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');

SELECT   a
       , b
       , TO_CHAR (row_lup_ts)
    FROM mv_test
ORDER BY row_lup_ts DESC;


/******************************************************/
On 10g the mview creation gives the same error that 9iR2 gives if you just put SYSTIMESTAMP by itself:

SELECT *
  FROM v$version
BANNER                                                          
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.2 - Prod
PL/SQL Release 10.1.0.4.2 - Production                          
CORE    10.1.0.4.0    Production                                      
TNS for Linux: Version 10.1.0.4.0 - Production                  
NLSRTL Version 10.1.0.4.2 - Production                          
5 rows selected
 
 
SQL> CREATE MATERIALIZED VIEW mv_test
  2   REFRESH WITH ROWID FAST
  3   AS
  4     SELECT t.*
  5          , TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
  6       FROM tb_test t;
        , TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
                                 *
ERROR at line 5:
ORA-12015: cannot create a fast refresh materialized view from a complex query 


/*********************************************************/

Assuming that the refreshing of the mview is being done by a single job (say every X minutes), is there any way to have a serially updated column (either a timestamp or even a sequence number) that will work in both 9iR2 and 10g? Then the process that reads from the mview can always do a greater-than check based on the max value that was in that column last time, and do its delta processing using only that chunk (and if nothing changed, it processes nothing).

Thanks...

PS: Without using Oracle Streams to capture the changes, I'm talking only about straight mviews here... 

Tom Kyte

Followup  

October 19, 2005 - 7:59 pm UTC

that means it was a bug in 9i fixed in 10g - the timestamp should never have been allowed.

So is there any legitimate way of getting only the changed records from the mview?

October 20, 2005 - 12:26 am UTC

Reviewer: Anonymous from NYC, USA

Is there a "legitimate" way of getting only the changed records for the mview? The DB itself knows what changed (it's a fast refreshable mview) so is there a way to get that info out to the application. This seems like there is a bit more leeway than just updates to a table since there is the serialized step of refreshing the mview (which is where I see some kind of monotonic value being added).

Any ideas?

Tom Kyte

Followup  

October 20, 2005 - 8:06 am UTC

no legitimate way - but the materialized view log will have the primary keys of the changed rows, and a dml type (insert/update/delete).

unless you direct path load - then you'll have rowid ranges instead of primary keys, so it can get tricky.


There is a thing called CDC (change data capture)....
There is a thing called Streams

in current software to help solve the situation you have, of wanting to know about the changes yourself
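For illustration only (Tom notes further down this thread that the MV logs are undocumented and must not be used directly), the ROWID-based log created earlier in this thread lands in a table named MLOG$_TB_TEST. The column names shown below are typical for a WITH ROWID log in this era of Oracle, but the exact layout is version-dependent and is an assumption here:

```sql
-- Illustration only: peeking at the undocumented materialized view log.
-- Column names are typical for a WITH ROWID log but may vary by version.
DESC mlog$_tb_test
--   M_ROW$$          rowid of the changed row in the master table
--   SNAPTIME$$       refresh-time bookkeeping
--   DMLTYPE$$        'I', 'U' or 'D' (insert/update/delete)
--   OLD_NEW$$        whether the logged values are old or new images
--   CHANGE_VECTOR$$  internal bitmap of which columns changed

SELECT m_row$$, dmltype$$
  FROM mlog$_tb_test;
```

Treat this strictly as a way to see what the log contains; building production change capture against these columns is exactly what the later followups warn against.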

How about using a deterministic function?

October 20, 2005 - 10:16 am UTC

Reviewer: Anonymous from NYC, USA

I changed the script to create a deterministic function that returns systimestamp and it works in both 9iR2 and 10g.

Here's the modified script:

/* =================================================== */

/* Formatted on 2005/10/20 10:05 (Formatter Plus v4.8.0) */
DROP TABLE tb_test
Table dropped

DROP MATERIALIZED VIEW mv_test
Snapshot dropped

SELECT *
FROM v$version
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.2 - Prod
PL/SQL Release 10.1.0.4.2 - Production
CORE 10.1.0.4.0 Production
TNS for Linux: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.2 - Production
5 rows selected


CREATE OR REPLACE FUNCTION fdw_tmp_ts
RETURN TIMESTAMP DETERMINISTIC
IS
BEGIN
RETURN SYSTIMESTAMP;
END;
Function created

CREATE TABLE tb_test (
a VARCHAR2(45)
,b NUMBER
)
Table created

INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted


COMMIT
Commit complete

SELECT *
FROM tb_test
A B
--------------------------------------------- --
DUAL 1
DUAL 2
SYSTEM_PRIVILEGE_MAP 3
SYSTEM_PRIVILEGE_MAP 4
4 rows selected


CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES
Snapshot log created

CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, fdw_tmp_ts row_lup_ts
FROM tb_test t
Snapshot created

COMMIT
Commit complete

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
DUAL 1 20-OCT-05 10.06.29.013737000 AM
4 rows selected


INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted


COMMIT
Commit complete

begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
INSERTED --> SYSTEM_PRIVILEGE_MAP 9 20-OCT-05 10.06.29.526768000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 8 20-OCT-05 10.06.29.526460000 AM
INSERTED --> DUAL 7 20-OCT-05 10.06.29.526100000 AM
INSERTED --> DUAL 6 20-OCT-05 10.06.29.521610000 AM
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
DUAL 1 20-OCT-05 10.06.29.013737000 AM
8 rows selected


UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1
1 row updated


COMMIT
Commit complete

begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
UPDATED --> DUAL 1 20-OCT-05 10.06.29.675079000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 9 20-OCT-05 10.06.29.526768000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 8 20-OCT-05 10.06.29.526460000 AM
INSERTED --> DUAL 7 20-OCT-05 10.06.29.526100000 AM
INSERTED --> DUAL 6 20-OCT-05 10.06.29.521610000 AM
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
8 rows selected

/* =================================================== */

Here's just the script itself (so you don't have to cut/paste):

DROP TABLE tb_test;
DROP MATERIALIZED VIEW mv_test;

SELECT *
FROM v$version;

CREATE OR REPLACE FUNCTION fdw_tmp_ts
RETURN TIMESTAMP DETERMINISTIC
IS
BEGIN
RETURN SYSTIMESTAMP;
END;
/

CREATE TABLE tb_test (
a VARCHAR2(45)
,b NUMBER
);

INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5;

COMMIT ;

SELECT *
FROM tb_test;

CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, fdw_tmp_ts row_lup_ts
FROM tb_test t;

COMMIT ;

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;

INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5;

COMMIT ;

EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;

UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1;

COMMIT ;

EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');

SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;

/* =================================================== */

How consistent do you think this would be in actual use? The function is only called on the changed records so only those timestamps are being updated (as in the example above)...

Tom Kyte

Followup  

October 20, 2005 - 4:32 pm UTC

you are lying to us, it is not deterministic.

Play with fire.
You get burned.

Think about it.

Query on the above thread..

January 22, 2007 - 1:23 pm UTC

Reviewer: A reader

Hi,
Pardon the ignorance, but would the above example not suffice if there were only conventional inserts (no direct path), no updates, AND the refresh is done once a day?


Regards,

last modified record

October 10, 2007 - 6:40 am UTC

Reviewer: D

Hi Tom,

Could you please let me know: is there any way to find the last inserted record in a table?

Thanks,
Dheeraj

v$transaction solution

October 29, 2009 - 6:23 am UTC

Reviewer: A reader

Tom,

"select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction;"

Does this solution work in a RAC config ?

I'm having problems with it on Release 11.1.0.7 with 2 RAC nodes. A user has performed an update via one node (not committed). The batch job that selects from v$transaction can run on either node; if it happens to run on the node where the interactive user performed the update, it can see a value in V$TRANSACTION, otherwise it can't.

Any idea how I can get round this problem?
Tom Kyte

Followup  

October 29, 2009 - 10:10 am UTC

use gv$transaction in RAC
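Applied to the query quoted above, the RAC-aware version is a one-word change (gv$transaction unions the v$transaction contents of every instance, adding an INST_ID column):

```sql
-- RAC-aware variant: gv$transaction shows in-flight transactions on
-- all instances, so an uncommitted update on another node is visible.
SELECT NVL( MIN( TO_DATE( start_time, 'mm/dd/rr hh24:mi:ss' ) ), SYSDATE )
  FROM gv$transaction;
```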

Which changes are kept track of by CDC

May 28, 2010 - 5:38 pm UTC

Reviewer: A reader

Hi Tom,
If I have a table A with columns C1,C2,C3,C4,C5 and I configure CDC to keep track of changes (I/U/D) for table A only for column C5, will CDC capture changes if an update is done for table A on column C3?
My understanding is that, for update statements, CDC captures a change only if a column it is set up to track is modified (in my example, CDC records an update to table A only if column C5 is modified).
Am I right?

Thanks in advance for your help.

May 04, 2011 - 2:42 pm UTC

Reviewer: A reader from UK

We have a query in our application that is -

SELECT /*+ FIRST_ROWS (100) */
a_ordered.output_sequence_no
,a_ordered.message_type
,a_ordered.message_data
FROM
(
SELECT
a.output_sequence_no,
a.message_type,
a.message_data
FROM message_audit a
WHERE a.trading_day = :trading_day
AND a.exchange_code = :exchange_code
AND a.class IN ( SELECT ucp_message_class
FROM tramp_ucp_message_class
WHERE tramp_message_class = :message_class
)
and a.firm IN ( SELECT g.firm from operator_group g
WHERE g.group_name = :operator_group_name
AND g.trading_day = :trading_day
AND g.exchange_code = :exchange_code
)
AND a.output_sequence_no > :output_sequence_no
AND a.ora_rowscn <= ( SELECT NVL( tx.earliest_scn, db.current_scn ) AS query_scn
FROM ( SELECT MIN( start_scn ) AS earliest_scn
FROM v$transaction
WHERE name = USER||'_MESSAGE_AUDIT'
) tx,
v$database db
)
order by a.trading_day, a.exchange_code, a.output_sequence_no
) a_ordered
WHERE rownum < :req_message
/


The requirements for the query are:
1. Trade transactions will be processed and committed in parallel.
2. Each transaction can produce several new message audit records (no updates).
3. The query must return audit records in output_sequence_no (OSN) order, but can't return larger OSNs if a transaction with lower OSNs is still pending.

The problem:
1. When operating right at the point where active transactions are committing, the query returns a record for a later OSN while earlier transactions are still pending. Records with earlier OSNs, committed a fraction of a second later, are then missed from the audit trail, because the later OSN is used to drive the next iteration of the query. This is the issue the V$TRANSACTION data is being used to avoid, but clearly it is not working as expected.

The design is in several parts:

1. The incoming data is pre-processed in PL/SQL, which names the transaction using the SET TRANSACTION command.
2. The name appears in the V$TRANSACTION view as soon as real transactional work is done (INSERT, UPDATE or SELECT FOR UPDATE).
3. The queries run in a loop to pick up new audit messages.
4. The queries use data from V$TRANSACTION to get the MIN(start_scn) of in-flight transactions and exclude committed rows with a higher ORA_ROWSCN.

We are making some assumptions as to how V$TRANSACTION works:
1. That all transactions will appear in V$TRANSACTION, if only briefly.
2. That the INSERT to the message_audit table will assign the new MESSAGE_ID from the sequence and add the transaction to V$TRANSACTION as an atomic step. This captures the appropriate start_scn.
3. That the entry stays in V$TRANSACTION until the commit is complete, even when synchronous DataGuard is in place.


I have looked on Metalink for any bugs around ORA_ROWSCN and can't see anything relevant to the problem we are experiencing.

Any help would be much appreciated.

Tom Kyte

Followup  

May 04, 2011 - 3:36 pm UTC

what is an OSN?

I presume you meant "output_sequence_no"?

How is that number assigned? What rule do you have in place to ensure that a higher 'OSN' means "my transaction started after yours"?

That the INSERT to the message_audit table will assign the new MESSAGE_ID
from the sequence and add the transaction to V$TRANSACTION as an atomic step.
This captures the appropriate start_scn.


That cannot happen - it is not even feasible to consider it happening that way. You will have an entry in v$transaction as part of a sequential set of steps - things that take place one after the other.


The other issue is that queries against v$ views are not read consistent - they are volatile in memory data structures.



Instead of trying to say what this query is supposed to do for you - why don't you tell us the requirements of the process you are trying to accomplish

May 05, 2011 - 2:31 am UTC

Reviewer: A reader

OK, as simply as I can explain this: we have a business requirement to send audit details back to our members.

The process is that the member sends in the last output_sequence_no they received; the output_sequence_no (OSN) is a number generated by an Oracle sequence. We then need to provide them with all their audit data since the OSN they provided. They request this in batches, which is why the rownum limit is in the query.

However, it is entirely possible that OSN 10 could be committed before OSN 9, so our requirement is to look at any in-flight transactions on the message_audit table. To cater for this, we set the transaction name on inserts into the message_audit table so we can query v$transaction for those transactions.

The problem is that once we send OSN 10 without including anything below it, that audit data will never be sent, as the member will request OSN > 10 on the next feed.

Again any help would be gratefully appreciated.

Regards

Gareth

Tom Kyte

Followup  

May 06, 2011 - 9:32 am UTC

why not just send them the last N committed transactions - marking them as "sent" as you retrieve them. Wouldn't that be trivial to accomplish, and solve the requirement?
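A minimal sketch of that suggestion, assuming a hypothetical SENT flag column added to message_audit (the column and bind names are not from the original schema):

```sql
-- Hypothetical sketch: flag rows as sent instead of relying on the
-- last OSN the member echoes back.
ALTER TABLE message_audit ADD (sent VARCHAR2(1) DEFAULT 'N');

-- The feed reads only committed, unsent rows, in OSN order...
SELECT output_sequence_no, message_type, message_data
  FROM message_audit
 WHERE sent = 'N'
 ORDER BY output_sequence_no;

-- ...and marks what it delivered in the same transaction.
UPDATE message_audit
   SET sent = 'Y'
 WHERE sent = 'N'
   AND output_sequence_no <= :last_osn_delivered;

COMMIT;
```

Because the read sees only committed rows, an in-flight transaction holding a lower OSN simply stays invisible until it commits, at which point it is still sent = 'N' and gets picked up by a later iteration - no v$transaction gymnastics required.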

Using Materialized View Log instead of CDC

September 13, 2012 - 8:47 am UTC

Reviewer: Lal from India

Tom,

I have a few queries related to change data capture.

Do you foresee any issues if I use MV logs for getting the data changes (insert, update, delete) in OLTP and then use the same for populating the DWH?

The same thing can be achieved with triggers too. Would MV logs be more efficient than triggers?

Currently we use CDC and we are facing some issues with it (already raised with Oracle support).

It seems that GoldenGate is Oracle's recommended solution for CDC going forward, but this would need extra license cost.

Is it possible to get the changes as insert/update/delete (IUD) with GoldenGate, similar to CDC?


Tom Kyte

Followup  

September 14, 2012 - 6:45 pm UTC

you cannot use the mv logs, plain and simple. they are ours, they are undocumented, do not use them.

I would recommend either

a) streams
b) golden gate


yes, golden gate has the ability to process transformations just like streams (and more)

Extract delta from MV

September 07, 2013 - 12:54 pm UTC

Reviewer: karma from USA

I have declared MV logs on 4 source tables (database A), and then I have created 4 materialized views on these tables (database B, using dblinks) with FAST refresh, which works fine. But these materialized views are refreshed every 5 minutes, and I want to extract only the recent changes, i.e. the delta, and then apply them to 4-5 different target tables. These are big tables and the target tables need to be refreshed instantly - how can I get just the delta?



It may seem obvious, but I am not able to design a reliable process/mechanism. Any help/suggestions?

Tom Kyte

Followup  

September 10, 2013 - 9:38 am UTC

materialized views are not designed to provide you that information - they are designed for us to be able to apply deltas from one set of tables to another. We do not track the "deltas" of the "other" set of tables anywhere at all.

Now, you can have materialized views of materialized views - perhaps these 4-5 other tables should be materialized views themselves - of the materialized views.
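Sticking with the thread's earlier example, the nested-mview idea could be sketched like this (mv_of_mv is a hypothetical name, and Oracle's usual nested materialized view restrictions on fast refresh apply):

```sql
-- Sketch: build a second-level materialized view on top of mv_test.
-- A log on the first-level mview lets the second level fast refresh.
CREATE MATERIALIZED VIEW LOG ON mv_test WITH ROWID INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_of_mv
 REFRESH WITH ROWID FAST
 AS
   SELECT m.*
     FROM mv_test m;

-- Refresh the chain in dependency order, master first:
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_OF_MV');
```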

Extract delta from MV

September 11, 2013 - 2:38 pm UTC

Reviewer: karma_hs from Boston, MA

OK. Is creating an MV log on a materialized view an option to get the delta and use it to feed the target tables? Any drawbacks or issues with this approach? Or is this not advisable?
Tom Kyte

Followup  

September 23, 2013 - 5:51 pm UTC

yes - you may have materialized views (mvs) of mvs.

but we do the extraction from the mv logs - you do not.