Graham, April 01, 2004 - 11:06 am UTC
I accept your point about missing data with a last_read timestamp from the *end* of a query. But I tried to be clear that we don't do that.
My example showed that our next last_checked timestamp is noted at the *beginning* of the query's transaction, not the end. I also repeated that later as a bullet. I think you'll agree that we don't miss any changes with blocking read databases, but you're right that we can get some duplication between successive queries. I forgot to mention that duplication can occur in the following case:
- Changes with a timestamp the same as the last_checked timestamp that were committed before the last query ran. Those will have been fetched in the last running of the query as well as in the current one.
As I said, we can deal with duplicates but it would be better if the duplication window were as small as possible (i.e. 1 second with Oracle 8i and 1 microsecond with Oracle 9i and above). However the best solution I had previously (of using the start time of the oldest open transaction, via V$TRANSACTION) had a duplication window of the longest transaction.
I disagree that I assumed indexed reads. That's irrelevant. Note that the duplication I was talking about was between successive runs of the query by the same user. Phantom rows will of course be possible unless serializable isolation is used regardless of database type. (I was assuming Oracle's default of read committed isolation.)
Thanks for your suggestion of using a share mode lock on the table and autonomous_transaction. If I understand correctly, that gets the duplication window down to the timestamp precision, as it is by default with blocking-read databases. FYI, when a share mode lock is waiting "for outstanding modifications to complete" that effectively turns the read with that lock into a blocking read, so your suggestion is basically to use blocking reads.
Note that your suggestion of locking the whole table is more severe than the row or page level locking that a blocking-read database will do to accomplish the same thing. However putting it in an autonomous transaction ameliorates that somewhat.
So it seems to be a tradeoff between minimizing the duplication window and performance issues from adding share mode table locking.
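For concreteness, here is roughly how I read that suggestion -- a sketch only; the table name t, its last_modified column and the procedure name are invented, and the sys_refcursor syntax assumes 9i or later:

create or replace procedure get_changes( p_since in date, p_rc out sys_refcursor )
as
    pragma autonomous_transaction;
begin
    lock table t in share mode;   -- waits for outstanding modifications to commit
    open p_rc for
        select * from t where last_modified >= p_since;
    commit;                       -- ends the autonomous transaction, releasing the lock
end;
/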
(Sorry to nitpick some more, but do you not consider the RBS a copy of the data?)
Thanks again for your help!
April 01, 2004 - 11:21 am UTC
RBS is not a copy of the data - RBS serves a purpose.
In your other databases -- they put the RBS in the redo; we just keep it separate. That helps us perform better (we only write redo, we do not read redo during normal operation -- as redo is a serious point of serialization in databases, anything we do to avoid having to ever read it is "a good thing (tm)"). Rollback is not a point of serialization.
Why the headache?
Gabriel, April 01, 2004 - 1:13 pm UTC
<quote>Our users want to periodically run queries (via our app) that get changes since the last time they
ran the query.</quote>
My reasoning: this is an application with named users, so you have to persist the last_checked_by_this_user date. When you do that, rather than saving just the DATE, save the result set of the query. Next time round, do the following:
1. run the query to get the latest result set
2. compare the new result set against the previous one
3. if the same then return nothing
if different then delete the old result set, store the new result set and return the new result set
In essence you have to return <Latest ResultSet> Minus <Saved ResultSet> and Merge the <Latest ResultSet> over <Saved ResultSet>.
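A minimal sketch of that idea (the table and column names -- latest_rs, saved_rs, pk, val -- are invented for illustration):

-- 1. rows that are new or changed for this user
select pk, val
  from latest_rs
minus
select pk, val
  from saved_rs
 where username = :p_user;

-- 2. merge the latest result set over the saved one (simplest form:
--    throw away the old copy and keep the new one)
delete from saved_rs where username = :p_user;

insert into saved_rs ( username, pk, val )
select :p_user, pk, val
  from latest_rs;

commit;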
Going after v$transaction, SCN, RBS, non-blocking reads, redo or whatever is like trying to split the atom; merely thinking about them when building a business application should raise a big red flag. I would argue that, for the said requirement, there is no solution short of some sort of data versioning (but not by going down into the guts of Oracle -- go the other way and snap some copies).
Hope I didn't oversimplify.
Are you taking shortcuts here?
Mike, April 02, 2004 - 8:17 am UTC
I agree that this seems to be getting too deep into the internals. The business requirement should be more precisely stated.
This is more than a 'convenience' requirement to avoid forcing the users to see results they had seen previously. If it was just a 'convenience', then you would not be getting complaints about a few rows being displayed more than once.
Your requirement sounds like: log every row that is 'seen' by this user. Do not show the row again unless it has a change. This makes the viewing of the data a business transaction.
If this is the case, you should design to that requirement. It sounds like you are trying to take shortcuts here.
You could:
- Have a child table of the viewed data, keyed by the key of the viewed data plus the username
- When you view the data, exclude any that have a child for this user. Insert a child row for every row that is displayed.
- When your trigger updates the timestamp, delete all of its children.
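A rough sketch of that design (all names -- data_table, viewed_rows, pk -- are invented):

-- child table recording what each user has already seen
create table viewed_rows (
    username  varchar2(30) not null,
    data_pk   number       not null,   -- key of the viewed parent row
    viewed_at date default sysdate,
    constraint viewed_rows_pk primary key (username, data_pk)
);

-- show only rows this user has not seen (or that changed since they saw them)
select d.*
  from data_table d
 where not exists ( select null
                      from viewed_rows v
                     where v.username = :p_user
                       and v.data_pk  = d.pk );

-- after displaying, record the rows as seen
insert into viewed_rows (username, data_pk)
select :p_user, d.pk
  from data_table d
 where not exists ( select null
                      from viewed_rows v
                     where v.username = :p_user
                       and v.data_pk  = d.pk );

-- the timestamp-maintaining trigger would also do something like
--   delete from viewed_rows where data_pk = :new.pk;
-- so a changed row is shown again to every user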
You might also ask questions like:
What if the user requests the data, but never scrolls all the way to the end?
Performance matters
Graham, April 12, 2004 - 7:41 am UTC
Sorry if I confused others about the nature of our application. I just said I thought the problem would *also* be faced by report generation applications (not that our application generates reports). I thought it would be a common problem since an implementation that works with other DBMSes can miss data with Oracle's non-blocking reads.
There are two reasons we use timestamps instead of result sets: speed and storage! Performance is of paramount importance for our application. In many deployments our application is used simultaneously by thousands of users, so concurrent performance for the query is essential. We often need to address that at very low levels for the DBMSes that we support.
FYI, Tom was right that duplication with our scheme is possible with other DBMSes, but with those the window for duplication is only 1 millisecond or 1 microsecond (since those are the timestamp precisions with the other databases). For our application that level of duplication is fine, but missing data is not. More duplication is okay for correctness, but undesirable for performance.
To see how much effect shared table locking has on the performance of our application, I have benchmarked it using our benchmark framework. I tested with 1000 users running the query, with a maximum of 5 active at a time (the rest queued up). Five scenarios were timed, with all 1000 clients doing the same thing:
1. select 0 rows
2. select 1000 rows
3. select 0 rows followed by update of 1000 rows
4. select 0 rows followed by delete of 1000 rows
5. select 0 rows followed by insert of 1000 rows
Moving the SELECT into a stored procedure made no significant difference in the timings.
Making that SP an autonomous transaction and adding a shared table lock had the following effect on the total time for all 1000 clients:
1. less than 1% difference
2. 10% longer
3. 10% longer
4. 27% longer
5. 58% longer
So the shared table locking, even in a short transaction, has a significant negative effect on concurrent performance, particularly when inserts or deletes are happening at the same time.
Our alternative of saving the oldest start time of the open transactions does not have concurrency issues, just the larger window for duplication.
I think an SCN-based solution would be best. I'm thinking of using a trigger to mark each row with a transaction id, and maintaining a mapping between transaction id and commit SCN. Two questions:
1. If I have an SCN number, can I determine the corresponding system TIMESTAMP (for 9.0.1 and up) or DATE (for 8.1.7)? Or the reverse (SCN corresponding to a timestamp)? If it makes a difference, I have in mind the SCN from the post-commit value of userenv('commitscn').
2. To uniquely identify transactions, without chance of wrapping for the lifetime of the database, should I use xidusn,xidslot,xidsqn from V$TRANSACTION (for 8.1.7) or dbms_transaction.local_transaction_id (for 9.0.1 and up)?
Recall that the overall problem is selecting the latest values for any rows that have changed since the last time the user checked.
April 12, 2004 - 8:21 am UTC
as I said -- this is a most *UNUSUAL* request, in fact, one I've never encountered as a request ever.
I've never had someone ask for "give me the data added since the last time I was there". That would mean you are doing a ton of work in the client. If you have messages you want to process, we use AQ (queueing technology) to deliver messages to 1 or more consumers.
And -- have you measured the effects of blocking reads in the other databases -- all we've done is introduce blocking reads here.
the blocking reads, even in a short transaction, have a significant negative effect on concurrent performance, particularly when inserts or deletes are happening at the same time!
1) in 8i, no. in 9i -- within plus or minus 5 minutes (and only within the last 5 days). in 10g -- within plus or minus 3 seconds and only for the last 5 days.
You must be aware that userenv('commitscn') is totally undocumented and its behavior subject to change at our whim. The use of that feature could lead to unpredictable behavior in the future.
2) dbms_transaction is easier by far.
In Oracle, the correct implementation would use message queuing.
Never say never ;-)
Graham, April 12, 2004 - 3:08 pm UTC
>as I said -- this is a most *UNUSUAL* request, in fact, one I've never
>encountered as a request ever.
>
>I've never had someone ask for "give me the data added since the last time I
>was there".
I must have missed where you said that. Maybe it's not as common as I assumed for reports, but it is for anything that involves data synchronization. Haven't you ever used a web forum that had a "view new posts since I was last here" function? That's pretty universal in other web forums I've used (though not this one). Another example is source code control, where you typically want to get checked-in changes since the last time you updated. Windows Briefcase works the same way - when you sync you get the latest version of any file that has changed. In everyday life I'm often concerned with "what's new since I last checked?"
Doesn't seem that unreasonable to want to do that type of query on an Oracle database. It works well (including good concurrent performance) with other DBMSes. I'm surprised you suggested switching to a messaging architecture to retrieve changes from a database!
(Of course it is often further filtered than just "what's new" - just think of that as additional WHERE clauses for the SELECT which I've left out for simplicity.)
>And -- have you measured the effects of blocking reads in the other databases --
>all we've done is introduce blocking reads here.
Without the shared table lock performance for Oracle 9.2.0.1.0 is about the same as with the best performing DBMSes that use blocking reads (using the same read committed isolation). Adding the shared table lock with Oracle lowers the performance to be significantly below those. The other DBMSes use row or page-level locks which apparently offer better concurrency than the table level locking that you suggested for Oracle.
FYI, I tried LOCK TABLE t IN ROW SHARE MODE, because I wasn't sure from the documentation description if that would block reads in the same way. Turns out that doesn't block reads, so it doesn't solve the problem in the same way as LOCK TABLE t IN SHARE MODE.
Note that blocking reads wouldn't be needed if one had access to a timestamp, or some other sequential marker, corresponding to the time of commit.
>1) in 8i, no. in 9i -- within plus or minus 5 minutes (and only within the last
>5 days). in 10g -- within plus or minus 3 seconds and only for the last 5 days.
How (for 9i and 10g)?
>You must be aware that userenv('commitscn') is totally undocumented and its
>behavior subject to change at our whim. The use of that feature could lead to
>unpredicable behavior in the future.
Yes, I mentioned that drawback in my original post. You have said that Oracle uses commitscn for Oracle replication. That is a similar problem, though our clients are not Oracle databases, and we don't care about each transaction, just the current value when we run our query.
>2) dbms_transaction is easier by far.
So wrapping isn't a concern with either method?
April 12, 2004 - 7:07 pm UTC
sorry -- i thought i had put that in there at some point (the unusual nature), guess I didn't.
....
I'm surprised you suggested switching to a messaging architecture to
retrieve changes from a database!
.......
why? very common approach in my experience. Rather than poll for whats new, subscribe to new data.
The "view new posts since I was last here" feature is universally "almost always wrong" in my experience. I always see something I already saw, or miss something I didn't see (because they use lots of databases that do dirty reads or have NO concurrency controls whatsoever). I don't have it here because I don't want to make people log in; people get so darn antsy when I set a cookie (some people freak out, as funny as that is -- they don't get it). And it is simple enough for them to look at the posts (with timestamps) backwards.
Also -- given that a transaction should be well under a second (especially on the type of forum you are talking about) and we don't block much (if ever), the transactions will all be way sub-second, so the 'missing of a message' doesn't come into play (just back the timestamp off by 1 second if you are truly paranoid).
Guess instead of reads blocking writes -- you could just serialize the modifications, that would solve the issue pretty much as well.
Source code control works on "check in, check out" -- very easy to accommodate in Oracle -- source code control works on "one person owns a piece of code at a time, only one modifier" (needs strict concurrency controls).
I won't even comment on windows stuff. No concurrency controls whatsoever going on there (good luck with that)
As for 9i and 10g -- in 10g there are functions to convert an SCN to a timestamp and a timestamp to an SCN. in 9i, you would query the table SMON_SCN_TIME to go either way.
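For example (the 10g functions are documented; SMON_SCN_TIME is an internal SYS table, so the column names below are from memory -- a sketch only):

-- 10g: documented conversion functions
select scn_to_timestamp( 3477044 ) from dual;       -- SCN value is just an example
select timestamp_to_scn( systimestamp ) from dual;

-- 9i: map through sys.smon_scn_time yourself
select time_dp,
       scn_wrp * 4294967296 + scn_bas as scn        -- wrap * 2^32 + base
  from sys.smon_scn_time
 order by time_dp;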
Arghh
Graham, April 13, 2004 - 12:27 am UTC
I guess that all this runaround means you agree that it is indeed problematic to query "What's new since I last checked?" via SQL with Oracle's non-blocking reads. Your only SQL-based suggestion was to use blocking reads, but (not surprisingly) Oracle doesn't do that as well as the DBMSes that rely on blocking reads instead of snapshot-based isolation. (BTW, writes-blocking-reads is the requirement, not reads-blocking-writes.)
As you point out for seeing what's new in your forum, it's only natural to use the timestamps for posts. Fair enough that you don't want to require logins or cookies. However that's no reason to dismiss in general the validity of wanting to be able to query "what's new since I last checked" via timestamps, since that is exactly what you are expecting your readers to do manually.
(FYI our application is not a web forum or a source control system. Those were just examples of common applications where one typically wants to get changes since the last time one checked.)
The reason I was surprised about your messaging suggestion, is that you usually seem to advocate doing things in the database if possible, using SQL or PL/SQL.
I'm similarly surprised that you suggest serialization, because (like me) you usually seem obsessed with good concurrent performance! (To be fair, you were suggesting it for a web forum like yours.)
It's off topic, but I guess you've never used CVS or Perforce source control systems. Both allow concurrent "checkouts", and both are very widely, and successfully, used.
Back on topic - when does dbms_transaction.local_transaction_id wrap?
The same question was asked (about recycling transaction_ids) before, but I didn't see that part answered:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:8913646338893#8918465772722
April 13, 2004 - 7:46 am UTC
And as I said -- to query what is new since I last checked -- all that takes is a TIMESTAMP. All of the transactions on this system are way subsecond. You would miss *nothing*
Messaging IS A DATABASE SOLUTION. 100%! AQ is in the database itself. It is an RDBMS feature (well, in Oracle it sure is, since a message queue requires a stable store; all messaging implementations have at their core a database -- IBM MQSeries -> DB2 is behind it, for example. We just choose to call a horse a horse and expose it as a database feature, not as a separate piece of software).
yes, xid's can roll -- it takes a really utterly long time as they are a function of three things -- the rollback segment id (USN), a transaction slot and a sequence number.
As we round-robin around your rbs's, we'll reuse slots, and each slot has a 32bit (believe it is 32 bits) signed integer as its sequence, which means it'll roll at 32k.
Now, on my system -- i have about 10 rbs's (varies with auto undo). each has 48 slots. So, I'd need to have 10*48*32k transactions to see a possible roll around.
Always compromises
Graham, April 13, 2004 - 10:41 am UTC
Sorry if I haven't been clear. I want to solve the problem of "what's new since I last checked" for *any* Oracle database (*not* this forum). A requirement is that it be done via a SQL SELECT statement that creates a result set with the latest values for the new or changed rows.
So far I've got two timestamp-based solutions for Oracle that won't miss data:
1. Set the next "last checked" time to be the oldest start time of the open transactions.
- Scales well.
- Max window for duplication is longest transaction. Could be excessive duplication, affecting performance, if there are long transactions.
2. Use shared table lock to enable blocking reads.
- Duplication window minimized (max is timestamp precision).
- Scales poorly.
An SCN-based solution might be possible.
- Have to use our own transaction id, since we can't rely on dbms_transaction.local_transaction_id being unique for the lifetime of the database.
- Only way to determine commit order of transactions is via undocumented and unsupported commitscn feature.
So our existing solution (#1) seems best, unless minimized duplication is more important than concurrent throughput (in which case #2 might be better).
Thanks Tom for suggesting #2. Too bad there isn't a better performing option.
Is there a limit on transaction duration? (UNDO_RETENTION seems to just be the time limit for a query from the start of the transaction. I'm not sure if inserts, updates and deletes have time limits.)
April 13, 2004 - 10:56 am UTC
there is no limit on transaction duration.
Transaction limits?
Graham, April 13, 2004 - 12:02 pm UTC
>there is no limit on transaction duration.
Maybe not a time limit per se, but I think the maximum undo tablespace size imposes some limit.
If one has a finite undo tablespace, and a long running transaction such that there is an unable to extend error, which transaction gets the error? The long running one or a subsequent one?
I'm guessing the long running one, otherwise I think all subsequent transactions would fail too.
April 13, 2004 - 6:16 pm UTC
no, not really. insert a row and walk away.
you didn't say the transaction has to be LARGE, just long running.
the transaction that fails to extend will get an error and have to decide whether to rollback or retry.
(this is assuming you do not have resumable statements enabled which changes everything....)
Correction to transaction id wrap?
Graham, April 13, 2004 - 9:25 pm UTC
You said:
-----------
yes, xid's can roll -- it takes a really utterly long time as they are a function of three things -- the rollback segment id (USN), a transaction slot and a sequence number.
As we round-robin around your rbs's, we'll reuse slots, and each slot has a 32bit (believe it is 32 bits) signed integer as its sequence, which means it'll roll at 32k.
Now, on my system -- i have about 10 rbs's (varies with auto undo). each has 48 slots. So, I'd need to have 10*48*32k transactions to see a possible roll around.
-----------
Sorry I didn't check the math earlier. 32k is 2^15. 2^31 is 2G.
10*48*32k is around 15 million, which isn't an outrageous number of transactions for a busy database.
However if it's actually 10*48*2^31 (about 10^12) then I agree that's much less likely to wrap in the lifetime of a database. (10 years if there were ~3,300 transactions per second, if my math is right.)
Is the slot sequence rollover at 32k or 2^31?
April 14, 2004 - 7:37 am UTC
doh, you are right -- not 32k but 2^31, 2 billion.
I just had a thought about "show me whats changed". Do you need to support UPDATED information as well as just newly added?
On the edge of my seat....
Graham, April 14, 2004 - 9:53 am UTC
>I just had a thought about "show me whats changed". Do you need to support >UPDATED information as well as just newly added?
Yes! I did say that in the first sentence of the original post, but that was a while ago.
Got a new idea? I'm all ears!
April 14, 2004 - 10:56 am UTC
I'm being the devil's advocate here.
Then what happens in the "other" databases when you start reading "all new stuff" and, while you are in "the middle", someone updates the first row (takes away $50 from savings) and puts it into the last row (adds $50 to checking)?
You'll return a set of new rows that is totally inconsistent -- you'll return the last row, but not the first.
On the second get (the next time back in) you'll return the first row for the first time and the last row again. What the heck would a client do with that? Totally inconsistent results.
See, I'm still trying to fathom the general purpose usage of this in the real world -- what does your application really do here.
Oops
Graham, April 14, 2004 - 9:56 am UTC
Meant to say "paragraph" instead of "sentence" in above review.
Sigh.
Graham, April 14, 2004 - 3:14 pm UTC
That's an example of the standard repeatable read problem. You can get non-repeatable reads with Oracle's default isolation too (though admittedly not for a single query like you can with a blocking-read database using read committed isolation).
You're right about the inconsistency after the first query, but wrong about the final consistency. Remember, we are selecting *values* for the rows, not replicating transactions or deltas. In terms of your example, after the first query the client will have the old value for the 1st row and the new value for the 2nd row. After the second query, the client will have the new value for both rows. So after the second query the client will have the same values as the database for both rows. That's perfectly consistent.
Yes the new value of the 2nd row will have been fetched twice, but as I've said, we can handle that kind of duplication. The window for duplication is the length of the query's transaction (or the timestamp precision, whichever is longer).
A customer can choose repeatable read or serializable isolation if they don't want to have that temporary inconsistency. (In Oracle one would have to use a read-only transaction or serializable isolation to get repeatable reads and avoid phantoms if one was querying multiple tables or the same table more than once.)
Note that the standard database issues of non-repeatable reads and phantoms are not peculiar to our application (or which DBMS one is using)!
I'm glad you're thinking about this, but I hope you can switch back to thinking about the best way to select "what's new since I last checked" with Oracle! ;-)
FYI, the real world problem is data synchronization (but the clients aren't Oracle databases so Oracle replication isn't an option).
April 14, 2004 - 3:47 pm UTC
I'd like to know how "values" can be useful -- really -- I'm very curious.
what is your (not hypothetical) application of this, what do you actually USE it for?
if you tell me that -- perhaps I can answer the question.
data synchronization screams to me "streams" (which is yet another messaging architecture, pub/sub) or CDC -- change data capture, both of which are features of the database.
Would ORA_ROWSCN help here
Justin Cave, April 14, 2004 - 6:17 pm UTC
It seems to me that the new 10g ORA_ROWSCN feature would help here. From the 10g SQL Reference:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10759/pseudocolumns006.htm#sthref706
"For each row, ORA_ROWSCN returns the conservative upper bound system change number (SCN) of the most recent change to the row. This pseudocolumn is useful for determining approximately when a row was last updated. It is not absolutely precise, because Oracle tracks SCNs by transaction committed for the block in which the row resides. You can obtain a more fine-grained approximation of the SCN by creating your tables with row-level dependency tracking."
April 15, 2004 - 8:12 am UTC
only if you build/rebuild all tables with "row dependencies" -- rebuild, not alter -- on.
It is a possibility in 10g.
Here is a demo I have, we'll copy EMP:
scott@ORA10G> create table temp as select * from emp;
Table created.
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;
ENAME ORA_ROWSCN
---------- ----------
SCOTT 3476986
by default the ora_rowscn is tracked loosely, at the block level, so we update all of the rows on that block EXCEPT the one of interest:
scott@ORA10G>
scott@ORA10G> begin
2 for x in ( select empno from temp where empno <> 7788 )
3 loop
4 update temp set ename=ename where empno=x.empno;
5 commit;
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed.
scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;
ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477030
and its row scn floated up with everyone else's. Rebuild the table with row dependencies (used by replication as an optimization, to limit the changes that need to be propagated)
scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> drop table temp;
Table dropped.
scott@ORA10G> create table temp rowdependencies as select * from emp ;
Table created.
scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;
ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477044
scott@ORA10G>
scott@ORA10G> begin
2 for x in ( select empno from temp where empno <> 7788 )
3 loop
4 update temp set ename=ename where empno=x.empno;
5 commit;
6 dbms_lock.sleep(1);
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select ename, ora_rowscn from temp where empno = 7788;
ENAME ORA_ROWSCN
---------- ----------
SCOTT 3477044
and we can see the scn does not (in fact, will not) change for that row unless that row is modified. the reason for the sleep is to show the +/- 3 second rule:
scott@ORA10G>
scott@ORA10G>
scott@ORA10G>
scott@ORA10G> select empno, ora_rowscn, scn_to_timestamp(ora_rowscn)
2 from temp
3 order by empno
4 /
EMPNO ORA_ROWSCN SCN_TO_TIMESTAMP(ORA_ROWSCN)
---------- ---------- ---------------------------------------------------------------------------
7369 3477052 15-APR-04 08.09.37.000000000 AM
7499 3477055 15-APR-04 08.09.37.000000000 AM
7521 3477058 15-APR-04 08.09.37.000000000 AM
7566 3477061 15-APR-04 08.09.37.000000000 AM
7654 3477064 15-APR-04 08.09.40.000000000 AM
7698 3477067 15-APR-04 08.09.40.000000000 AM
7782 3477070 15-APR-04 08.09.40.000000000 AM
7788 3477044 15-APR-04 08.09.34.000000000 AM
7839 3477073 15-APR-04 08.09.43.000000000 AM
7844 3477076 15-APR-04 08.09.43.000000000 AM
7876 3477079 15-APR-04 08.09.43.000000000 AM
7900 3477082 15-APR-04 08.09.46.000000000 AM
7902 3477085 15-APR-04 08.09.46.000000000 AM
7934 3477088 15-APR-04 08.09.46.000000000 AM
14 rows selected.
Thanks Justin!
Graham, April 15, 2004 - 10:44 am UTC
Thanks Justin - I hadn't looked at 10g new features yet but that's exactly the kind of thing we could use, especially with rowdependencies enabled for the table. It's useful for our data synchronization for exactly the same reason that it is for Oracle replication - to get only the rows that have changed. Kudos to Oracle for adding that to 10g! Clearly my need wasn't that unusual, since Oracle built in a solution in 10g.
BTW Tom, I don't think any database is useful unless it has "values" in it! :-) I meant the values for the columns in the rows. Sorry if that wasn't clear.
Thanks Tom for pointing out the 3 sec resolution and 5 day limit for SCN to timestamp conversion. I don't see either of those mentioned in 10g documentation for SCN_TO_TIMESTAMP or TIMESTAMP_TO_SCN. Can you please point me to where those limits are documented?
I hope better resolution and range will be in a future release.
For 8i and 9i I think one could get the equivalent to ora_rowscn with row dependency as follows (using the undocumented commitscn):
- add a column to each table to hold the transaction id.
- add a table (eg. called tx_scn) to hold transaction ids, the start SCN, and userenv('commitscn').
- add an update and insert trigger for each table that sets the transaction id column and, only once per transaction, inserts a row in the tx_scn table.
Then you could select using the commit transaction id in the where clause by joining each table with the tx_scn table.
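A sketch of what I have in mind (the table, package and trigger names are invented, and userenv('commitscn') is undocumented, so treat this as illustrative only):

create table tx_scn (
    txid       varchar2(64) primary key,
    commit_scn number                 -- Oracle patches this to the commit SCN at commit
);

alter table t add ( txid varchar2(64) );

create or replace package sync_pkg as
    g_last_txid varchar2(64);         -- last transaction this session has logged
end;
/

create or replace trigger t_track_tx
before insert or update on t
for each row
declare
    l_txid varchar2(64) := dbms_transaction.local_transaction_id;
begin
    :new.txid := l_txid;
    if sync_pkg.g_last_txid is null or sync_pkg.g_last_txid <> l_txid then
        insert into tx_scn ( txid, commit_scn )
        values ( l_txid, userenv('commitscn') );   -- undocumented: becomes the commit SCN
        sync_pkg.g_last_txid := l_txid;
    end if;
end;
/

-- sync query: latest values for rows whose transaction committed after the last-seen SCN
select t.*
  from t, tx_scn s
 where s.txid = t.txid
   and s.commit_scn > :p_last_seen_scn;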
Does making a table row-dependent have much of a performance hit? (I didn't see that mentioned in the CREATE TABLE docs, and am not yet set up with 10g to try it out.)
April 15, 2004 - 11:09 am UTC
the change wasn't added for what you speak of (that we call STREAMS and CDC)
it was added for lost update detection in a stateless web environment.
My point about values is -- the data you "synchronize" to is wholly inconsistent if you deal with VALUES and not transactionally consistent results. So I sync up and get "garbage". Sync again, get good stuff. Sync again, get garbage.
You cannot look at "values" you have to look at transactionally consistent SETS.
there probably will never be a better resolution in the future (else you would need an exhaustive list of scn -> timestamp for every transaction. imagine the size of that table eh?)
I will still encourage you to look at streams (a message based pub/sub approach). Over and over I will. You still give no real world use case. I'm interested, I'm curious. What is the specific problem you are trying to solve here.
<quote> sql reference:
SCN_TO_TIMESTAMP takes as an argument a number that evaluates to a system change number (SCN), and returns the approximate timestamp associated with that SCN. The returned value is of TIMESTAMP datatype. This function is useful any time you want to know the timestamp associated with an SCN. For example, it can be used in conjunction with the ORA_ROWSCN pseudocolumn to associate a timestamp with the most recent change to a row.
They punt by using "approximate timestamp".
if you try one that is too old:
sys@ORA10G> select scn_to_timestamp( 14004 ) from dual;
select scn_to_timestamp( 14004 ) from dual
*
ERROR at line 1:
ORA-08181: specified number is not a valid system change number
ORA-06512: at "SYS.SCN_TO_TIMESTAMP", line 1
ORA-06512: at line 1
Oops again!
Graham, April 15, 2004 - 10:58 am UTC
(Should review more closely before clicking Submit!)
The second last sentence in my last review should have been:
Then you could select using the commit SCN in the where clause by joining each table with the tx_scn table.
I suspect the performance hit of making a table row-dependent in 10g is less than the hit of adding a trigger like the one I described.
Oracle streams not an option for us
Graham, April 15, 2004 - 1:46 pm UTC
You said:
Rebuild the table with row dependencies (used by replication as an optimization, to limit the changes that need to be propagated)
And from that I said:
It's useful for our data synchronization for exactly the same reason that it is for Oracle replication - to get only the rows that have changed.
I was just taking your word that it was used, and is useful, for Oracle replication!
I believe that with Oracle we can be guaranteed a transaction-consistent set of values iff we select in a read-only transaction (or use table locks or serializable isolation). But with multiple tables and default Oracle isolation there's no guarantee of consistency. It seems that you think that the chance of inconsistent values is *never* acceptable. If so you must also think that Oracle should only offer read-only or serializable transactions! I'm sure you don't, so I think you are being inconsistent. ;-)
Your complaint about inconsistent values from a select, as I keep trying to tell you, is a standard database problem for which isolation levels were invented. Sure, that's an issue, but it can only be completely solved by using serializable isolation. Snapshot isolation has advantages, but it introduces the version skew problem too. You probably know that MySQL has snapshot isolation and Microsoft has added it for Yukon (but both of those have row versioning instead of Oracle's block versioning). You can (and probably will) argue against other DBMS implementations for handling isolation, but that's irrelevant for how our application works with Oracle.
Since you asked, our application is a server for synchronizing data between a central database and multiple remote (non-Oracle) databases. It works with 5 flavours of central database, including Oracle and the other leading DBMSes. It predates the stream and change data capture features of Oracle - the first release worked with Oracle 8, which was the then-current version.
Oracle streams might be an option if we were starting from scratch and only concerned with Oracle. But we're not.
AFAIK with streams each subscriber will get all changes (for what they have subscribed to). Ours isn't a server push model, for scalability it is client-pull (by default). If there are 1000 updates to a row between a client's syncs, the second sync only gets one new value for the row. I think with Oracle streams or messaging the client would get 1000 values.
FYI, for flexibility the statements which select data from the central database are under the customer's control. Most choose to use timestamp-based synchronization for most of their tables to just get data that has changed since the last synchronization. I'm looking for a better way to do that for an Oracle central database (without rearchitecting our application or abandoning other types of central database). I think it's in both of our interests to have our application get optimal performance with an Oracle database as the central database. After all, our customers that use Oracle central databases are your customers too.
Note that our current solution for timestamp-based synchronization with Oracle works quite well, and is used successfully by our mutual customers. The only drawback is possibly reduced performance from the greater window for duplication.
Has anyone tested the performance hit in 10g of enabling row dependencies for a table? (I'm sorry I can't yet.)
April 15, 2004 - 2:26 pm UTC
not to nitpick, but I was referring to:
"Clearly my need wasn't that unusual, since Oracle built in a solution in 10g."
row dependencies was added in 9iR1 -- the new feature in 10g is ora_rowscn -- and the 10g feature wasn't added to support what you are after, rather it is for lost update detection. Polling a table using this technique by many users (or just one user) is going to be as expensive as it gets. If your product were let loose on my database -- looking for "what's different" -- I'd get upset. I'd rather tell it what's different -- back to streams, back to CDC.
I'm saying the results of a table refresh (as you appear to be doing) should be consistent at the very very very least. And yes, when doing multi-table refreshes (in our vernacular they are called snapshot or mv refresh groups) they would be done transactionally consistent -- we do that.
Oracle stores in UNDO a change vector. This is what we version with. It is not a before image of a block, it is a small set of changed bytes.
There are lots of replication products on the market today that accomplish what you are trying to code. Have you considered the use of an off-the-shelf solution? (They generally use logs, as streams does, so as to not kill the database by asking "hey, what's new" over and over and over.)
Alternative solution
Gary, April 15, 2004 - 9:26 pm UTC
Since the root cause of the issue appears to be the interval between a timestamp recorded on a record at insert/update time and it becoming 'visible' after a commit, wouldn't a solution be a routine that updates the timestamp post-commit?
User_a updates table_1 at 10:15 (trigger sets flag).
User_b updates table_1 at 10:16 (trigger sets flag).
User_b commits change at 10:17.
User_a commits change at 10:19.
Background routine runs every minute (or however frequently required) and sets the timestamp for flagged records and unflags the record.
So at 10:18 it will set the timestamp for user_b's changes and at 10:20 it sets the timestamp for user_a's changes.
You don't get any missing entries or duplicates.
[Obviously the trigger setting the flag should not fire when updating the timestamp, only the 'real' fields of the table.]
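A sketch of how that might look (table and column names are invented; restricting the trigger to the 'real' columns keeps the background update from re-firing it):

alter table t add ( change_flag char(1), change_ts date );

-- fires only when the "real" business columns (here assumed to be pk and str) change,
-- not when change_ts / change_flag are set by the background routine
create or replace trigger t_flag_change
before insert or update of pk, str on t
for each row
begin
    :new.change_flag := 'Y';
end;
/

-- background routine, scheduled every minute or so (e.g. via dbms_job)
create or replace procedure stamp_changes as
begin
    update t
       set change_ts   = sysdate,
           change_flag = null
     where change_flag = 'Y';
    commit;
end;
/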
Interesting idea!
Graham, April 16, 2004 - 1:09 pm UTC
Thanks Gary. I appreciate you getting back to the fundamental problem.
Interesting idea, but I think it just shifts the problem. You can still get missing rows if the background process transaction overlaps with the start of the query.
However a tweak fixes that: change the query to pick up flagged rows as well as the timestamp criterion. That introduces duplication, but the duplication window is controllable; it's the time between runs of the background process. I think that solution would only be preferred if you were prepared to run the background process with a shorter inter-run interval than the longest transaction time. Otherwise our current solution has a smaller duplication window.
Now back to Tom's concerns...
>row dependencies was added in 9iR1 -- the new feature in 10g is ora_rowscn -- the new feature added in 10g wasn't in support what you are after, rather it is for lost update detection.
Okay I'll rephrase:
Clearly my need wasn't that unusual, since Oracle added a mechanism to track when rows changed in 9i and a change in 10g exposed it for user queries.
Apparently Oracle replication sends a whole block when just one row has changed unless the table was created with row dependencies. Yuck.
>polling a table using this technique by many users (or just one user) is going to be as expensive as it gets. If your product was let loose on my database -- looking for "whats different", I'd get upset. I'd rather tell IT whats different - back to streams, back to CDC.
It doesn't constantly poll. It only queries in response to a sync request from the remote (which can optionally be initiated by the sync server). Rather than generating extra work when the database is most busy (as replication, streams and CDC would) you can choose when these sync requests happen. And if the database is changing the same rows over and over, you only move the latest values, not each intermediate value (unlike replication, streams and CDC).
It's not intended for tight coupling of a few databases. Replication is better for that. It's intended for lots of remote databases that synchronize less often (eg. every few minutes, hourly, daily). It supports syncing different subsets, so if you need to sync a few tables frequently but others less frequently, you can do that and put less load on your database when it is busy. You also only select what is appropriate for that remote. If data is common for many remotes, you can use a single query for all of them. It's really very flexible and scalable. A recent deployment involves over 20,000 remote users synchronizing with a single central Oracle database. Apparently having that many queries run doesn't "kill the database." (FYI, that customer first tried using an Oracle solution, but switched to ours after extensive comparison benchmarking. As you say in your book, that's the only way to determine which is better for one's particular needs.)
>And yes, when doing multi-table refreshes (in our venecular they are called snapshot or mv refresh groups) they would be done transactionally consistent -- we do that.
You DON'T do that by default (for multiple selects)!
Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> create table tbl1
2 ( pk int primary key,
3 str varchar(20)
4 );
Table created.
SQL> create table tbl2
2 ( pk int primary key,
3 str varchar(20)
4 );
Table created.
SQL> insert into tbl1 ( pk, str ) values ( 1, 'consistent' );
1 row created.
SQL> insert into tbl2 ( pk, str ) values ( 1, 'consistent' );
1 row created.
SQL> commit;
Commit complete.
SQL> select * from tbl2;
PK STR
---------- --------------------
1 consistent
SQL> select * from tbl1;
PK STR
---------- --------------------
1 consistent
SQL> declare
2 pragma autonomous_transaction;
3 begin
4 update tbl2 set str = 'inconsistent' where pk = 1;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> select * from tbl2;
PK STR
---------- --------------------
1 inconsistent
So read-consistency for a whole transaction is NOT guaranteed by default for multiple selects! You have to lock tables or use read only or serializable transactions. The consistency issue you keep bringing up is INHERENT TO THE TYPE OF CENTRAL DATABASE AND THE USER'S CHOICE OF ISOLATION; IT'S INDEPENDENT OF OUR APPLICATION! Please stop flogging that dead horse when trying to criticize our application.
(Yes I did get access to a 10g setup since I last wrote.)
>Oracle stored in UNDO a change vector. This is what we version with. It is not a before image of a block, it is a small set of changed bytes.
Okay, but unless I'm mistaken the set of bytes corresponds to the changes for a block rather than a row.
Have fun in Toronto! Maybe we'll discuss this stuff in person there.
April 16, 2004 - 3:14 pm UTC
no, row dependencies is not for that at all! It is for parallel propagation, so we can determine that these N transactions can all be propagated *in parallel*.
Nothing but nothing but nothing is done "block level wise" in replication -- NOTHING.
rowdependencies is for parallel propagation -- to allow us to propagate two unrelated transactions that took place in the source database in a different order (perhaps) then they actually took place -- in parallel -- to increase the replication speed.
So, not "yuck" -- but "good", "better".
Replication propagates only the changes (column by column, row by row) that took place.
Streams is non-intrusive, it works on *redo* to publish data *elsewhere*. Polling the database in response to a sync is *expensive*.
Streams = little to no impact on source system
Polling for a sync = lots of work on source system.
Streams can do the last change stuff as well, streams is non-intrusive -- you might want to give the streams guide a quick glance -- it is fairly short.
<quote>
It's not intended for tight coupling of a few databases. Replication is better
for that. It's intended for lots of remote databases that synchronize less often
(eg. every few minutes, hourly, daily).
</quote>
and so unless you use SERIALIZABLE or REPEATABLE READ, there is a high probability that the "synced" data set in read lock databases is inconsistent (that update issue).
Also, that paragraph very succinctly describes the streams architecture as well.
We do snapshot groups when told to. Yes it is a choice.
In oracle -- see what happens to concurrency with serializable.
Then, describe what happens to concurrency with serializable in others.
Tell me what would happen to that update in a read lock database vs oracle.
I'm not trying to criticize your application, I'm saying "read consistency to me is best, read locks are not".
-------------------------- added after driving home...... ------------------
something I just thought of....
Your logic to sync must be something like:
T0 - log into source database;
T1 -
T2 - get NEW_DATE/TIME to be used for next sync;
T3 - get all rows in table(s) where timestamp >= DATE/TIME from last sync;
T4 - logout
Ok, you are worried about missing rows in oracle such that they were in flight transactions (modifications taking place during T1 basically)
They would have a timestamp T1, but you would have a timestamp T2 for the next retrieval and hence would "miss them"
If you change T2 to:
select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction;
what you'll have is the date time of either
a) right now if no one is doing a transaction (no chance of a missed record) or
b) the timestamp of the last time no one WAS doing an in flight transaction
sure -- it opens the window for dups (but you have that already -- the dup window is not a nanosecond or even a second, it is the time it takes for T3 to complete, which could be measurable) but it prevents you from missing anything and leaves your code intact.
ops$tkyte@ORA9IR2> select sysdate,
2 nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate),
3 count(*)
4 from V$transaction;
SYSDATE NVL(MIN(TO_DATE(STAR COUNT(*)
-------------------- -------------------- ----------
16-apr-2004 18:12:00 16-apr-2004 18:12:00 0
<b>No transactions -> sysdate is the timestamp</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set owner = owner where rownum = 1;
1 row updated.
ops$tkyte@ORA9IR2> exec dbms_lock.sleep(5);
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select sysdate,
2 nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate),
3 count(*)
4 from V$transaction;
SYSDATE NVL(MIN(TO_DATE(STAR COUNT(*)
-------------------- -------------------- ----------
16-apr-2004 18:12:05 16-apr-2004 18:11:57 1
<b>active transactions -> their start time is the timestamp</b>
So throw out the database and just use a serial log file?
Graham, April 16, 2004 - 7:14 pm UTC
My turn to play Devil's advocate.
<da mode>
So streams are what you would choose to get 20,000 *different* (but overlapping) sets of data from an Oracle database?
If that's the best way, and as low-impact as you assert, why keep all those database blocks around? All you'd need is the log!
</da mode>
Databases, including Oracle's, are very good at locating and retrieving data that meets specified criteria. Why not use the database for what it is intended to do? When properly indexed, an empty selection is extremely fast. If selecting the same stuff over and over, caches work wonders.
I can't believe I have to argue the merits of using a database with you!
Querying a database and scanning a log are two different ways to get data. One isn't always better than the other.
AFAIK, Oracle doesn't publish its log format, and the streams interface to the log file didn't exist when our application was designed. (I was wrong before - it first worked with Oracle 7, not 8.)
If you want to keep comparing isolation implementations, don't ignore write skew and related consistency problems peculiar to snapshot isolation. I hope your talk on Tuesday covers those.
Added after reading your addendum:
That's exactly the solution we already use (and have talked about here several times)! As I've said, it scales very well (we only select the min start_time from V$TRANSACTION once per sync) and the window for duplication is the longest transaction.
That's still the best solution we have so far with timestamps, though Gary's idea (with tweak) might be better if very long transactions were common.
I think an SCN-based solution could be better. Maybe you can look back at those ideas if you are bored while travelling. (The tx_scn table could be trimmed of "orphan" rows occasionally if its size became an issue. That way it could never be bigger than the total number of rows for the synchronized tables, and would likely be much, much less.)
BTW, like most servers we use connection pooling instead of making a new connection to the database each sync request.
April 16, 2004 - 7:34 pm UTC
well, then i guess you always had the optimal answer from the get go.
Great thread...
Jon, April 17, 2004 - 5:34 am UTC
Very interesting, fun almost. Like Shakespeare and Redo, things always go full circle ;-)
April 17, 2004 - 9:58 am UTC
yes, this got so long, and ran so long -- that i keep forgetting what was already visited.
What about this...
Jon, April 17, 2004 - 6:48 am UTC
Graham,
Have you tried this approach?
Earlier, Gabriel posted that "In essence you have to return <Latest ResultSet> Minus <Saved ResultSet>".
You expressed concerns about performance and storage. Well, the DB stores the Latest and Previous in the RBS (whether via change vectors or not is not important), which we can access via flashback queries.
Why not record the current SCN at each sync request, instead of a timestamp (dbms_flashback.get_system_change_number will do that for you), and then use a flashback query to access your previous table state? I mocked up an example with 100,000 rows, and it worked just fine:
select id, to_char(change_ts,'HH24:MI:SS.FF') change_ts, change_scn
from x
minus
select id, to_char(change_ts,'HH24:MI:SS.FF') change_ts, change_scn
from x as of scn 661860;
ID CHANGE_TS CHANGE_SCN
---------- ------------------ ----------
100001 20:18:47.383000 661834
If you're worried about performance, then possibly you could try recording the timestamp (systimestamp) as well as the SCN, and include:
where change_ts >= (timestamp - time of longest running txn - safety factor)
in each of the queries to reduce the result set to a reasonable size before the exact minus on SCN.
Haven't tried that bit, just thinking out loud - but seems it would work.
An after thought...
Jon, April 17, 2004 - 7:06 am UTC
So, to ensure no overlaps or missed data, your sync queries could look like:
create or replace procedure get_x_delta(
p_old_scn in number,
p_new_scn out number,
p_rc out sys_refcursor)
is
begin
p_new_scn := dbms_flashback.get_system_change_number;
open p_rc for
select * from x as of scn p_new_scn
minus
select * from x as of scn p_old_scn+1;
end;
plus a few tweaks for performance.
This way, you allow the read-consistency model to work for you.
April 17, 2004 - 10:07 am UTC
I will say that Graham has (had) the right answer from the get go.
His code already has to handle dups (no avoiding that).
The time of the oldest transaction is going to be near "now".
They never had "just a millisecond of overlap dup window" -- it's always been much larger than that (due to the fact that you have to get timestamp AND THEN sync -- everything you sync, EVERYTHING, could be duplicated on the next sync)
So, using the nvl(min(start_time),sysdate) from v$transaction is the most efficient method, for the simple reason that I have to assume most of his queries are using existing indexes to identify subsets (eg: he is not having 20,000 people FULL SCAN the database on a recurring basis; they are pulling a small slice of the data -- massive horizontal slicing and dicing going on there)
That would, to me, rule out anything more sophisticated -- the min() is more than good enough (nothing is going to let you see data you cannot see, you are not going to get the in-flight data this time around no matter what, and if they used a read only transaction, heck -- they would get very very very few dups indeed -- if any).
The flashback query would be something I would not want to suggest/rely on. For one -- it is expensive to flash an entire table back like that (and anything expensive done 20,000 times is truly expensive) and then minus it. Also, in 9i undo_retention is a REQUEST, not a demand (so you might not be able to flashback in all cases -- in 10g you can make it a demand, but even so, the further in the past you have to flashback to, the more expensive it becomes, so the people that sync every day would be truly "expensive").
So, I still think v$transaction and start_time is more than sufficient as transactions are typically pretty short. If you wanted to truly maximize the start_time (get as big of one as possible), they could join v$transaction and v$lock to see what the start_time for an object they are *interested* in is. Eg: if you are syncing t1 and t2 but no one is using t1/t2 -- sysdate is your time. V$LOCK would give you that.
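Just to sketch the shape of such a query (assuming the tables being sync'd are SCOTT.T1 and SCOTT.T2):

select nvl( min( to_date( t.start_time, 'mm/dd/rr hh24:mi:ss' ) ), sysdate )
  from v$transaction t, v$session s, v$lock l
 where t.ses_addr = s.saddr
   and l.sid      = s.sid
   and l.type     = 'TM'                -- DML (table) locks
   and l.id1 in ( select object_id
                    from dba_objects
                   where owner = 'SCOTT'
                     and object_name in ( 'T1', 'T2' ) );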
Hope this has helped others as well!
Graham, April 17, 2004 - 2:24 pm UTC
Thanks Tom and Jon! Interesting idea about joining with V$LOCK. Some customers might want to do that if they have long transactions on tables that are not synchronized.
Tom is right that polling the database does put a load on it. In cases where that extra load is a problem, it might well be best for our mutual customers to use streams or CDC to populate a staging database, and have our sync server connect to that staging database instead. My guess is that streams would be better (since CDC's change capture triggers will be firing most when the database is most busy).
He's also right that good performance usually requires that the sync'd tables be indexed properly for the "what's changed?" queries. We're no different than any other database application in that regard - our app works better when the queries it issues run quickly.
In a couple of my earlier posts I misstated the "window" for duplication (though I think I got it right last time). Here's what I think it is for read-blocking databases and the best-so-far Oracle solution:
- For a read-blocking DBMS, it is the greater of the transaction time for the "what's changed" queries or the timestamp resolution.
- For Oracle, using nvl(min(start_time),sysdate) from v$transaction in a read-only transaction, the window is that_value_for_sysdate - min(start_time), and the largest possible value for that is longest transaction time. So the window is the greater of the longest transaction time or the timestamp resolution.
- Without using a read-only transaction in Oracle, as Tom pointed out, the window is larger (assuming multiple tables). It is the longest transaction time plus the transaction time for the "what's changed" queries, or the timestamp resolution if that is greater.
FYI, the only way that everything would be duplicated is if it were all inserted/updated in the duplication window. Our application could handle that, but it's very unlikely to happen.
Using SCNs instead of timestamps could probably eliminate any duplication. I'm loath to recommend that our customers use an undocumented feature like userenv('commitscn'), so I'll stick with recommending our current timestamp solution (with read-only transactions if possible) for 8i and 9i. However in 10g the combination of ora_rowscn and row dependency might remove the need for timestamp-maintaining triggers, but I'd have to investigate how "approximate" the SCN from ora_rowscn is and confirm that the ora_rowscn pseudocolumn can be indexed. (No access to 10g right now.) Not needing triggers would be simpler and should reduce load on the central database.
Unfortunately recent family commitments prevent me from going to the IOUG just down the highway in Toronto. FYI, there will be a lot of people glued to TV sets in Toronto on Sunday night watching hockey, and you'll see a lot of celebrating if the Toronto Maple Leafs win the series then or in a 7th game midweek!
April 17, 2004 - 2:46 pm UTC
...
FYI, the only way that everything would be duplicated is if it were all
inserted/updated in the duplication window. Our application could handle that,
but it's very unlikely to happen.
.....
:) concur (I was wondering if you were going to point out the "low probability of occurrence" of that case).....
The SCN with rowdependencies is accurate -- it is the resolution of the timestamp that is approximate.
I concur 100% with your decision on userenv('commitscn').
keep in mind the difference between snapshot and event
Jaromir D.B. Nemec, April 17, 2004 - 6:55 pm UTC
Hi Tom,
I agree with all your argumentation in the follow-ups above, except for one point: Graham is definitely not alone.
Let me put it into the right context. From Oracle's point of view, streams and CDC are state of the art. But at least in the projects I have an overview of, the typical database is now slowly migrating to 9i and the mentality of the application (i.e. the crucial features in use) is somewhere between 7.3 and 8.0.
So don't wonder that such "any change, please?" solutions exist.
But I 100% agree with you that this approach is definitely not optimal. So people using it
a) must be happy with it, or
b) if they have problems, must accept that the cause is not the database or anything else, but this very approach.
I'll try to demonstrate this by pointing out one aspect that I missed in the discussion above.
There are two type of data stored in database: state data (e.g. customer table) and transaction data (e.g. booking entries).
The inherit problem of the whats new? approach is the access of the state data to get the transaction information. Of course you may try to select the customer table balance information and check if something was changed since last check. But the customer table is definitive a bad target. To trace the customer account movement the right target is the account table (i.e. a transaction table with the booking entries).
You may more or less optimise the snapshot table (index on change date column etc.) to derive the transaction information (yes, it is a true derivation) but it is always a workaround.
The proper approach to get the transaction data is to USE the transaction data. The decision how the transaction data is implemented - if you use a database table (and the application is responsible to fill it) or some DB feature as triggers, streams or CDC is only an implementation decision.
The task of getting an event out from a snapshot table will be always a hard work. In other words, you can push or pull events as you like, but it is not comparable with polling for events out of a snapshot.
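A minimal sketch of the "use the transaction data" idea (the account_movement table and its columns are assumptions for illustration; the watermark caveats discussed elsewhere in this thread still apply to the polling predicate):
create table account_movement (
  movement_id  number  primary key,   -- filled from a sequence by the application
  account_id   number  not null,
  amount       number  not null,
  movement_ts  date    default sysdate
);
-- the application (or a trigger) writes one row per booking; a consumer polls
-- the transaction table instead of diffing the customer (state) table
select *
  from account_movement
 where movement_id > :last_seen_movement_id
 order by movement_id;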
A short comment on the topic of 1000 updates and one sync only: there are two extreme situations. If you have a very small snapshot with tons of changes, it could be acceptable to sync by refreshing the whole snapshot to limit the volume of data transfer.
If the snapshot is very big with lots of changes, you can filter or aggregate the events to limit the communication.
At the end of my contribution, a question on CDC.
As far as I know, CDC is based on the redo log, not on triggers as mentioned above. The same concept, I assume, is used in Streams. What are the criteria for choosing CDC or Streams to capture the events?
Thx
Jaromir D.B. Nemec
http://www.db-nemec.com
April 17, 2004 - 6:59 pm UTC
CDC in 9i is trigger based.
ASYNC CDC in 10g is streams based. SYNC CDC in 10g is trigger based.
(i would always go for streams based, background, non-intrusive, async)
Jon, April 18, 2004 - 1:19 am UTC
I take your point about flashback being a suggestion - but considering the Oracle documentation suggests all sorts of application uses for flashback queries (reports, replacing manual history tables in some situations, etc.), one would hope that it is a very strong suggestion. In practice, I have never come across a situation where a request hasn't been satisfied within the UNDO retention period.
Nevertheless, you missed my point about performance - I was not suggesting full scans at all, rather augmenting this with a timestamp (which would be approximate, and include a safety factor) to limit the rows being flashed back. I tested this out using a large table. From the "syncing" session, I get:
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
SQL> set timing on
SQL> select count(*) from big_table;
COUNT(*)
----------
2759801
Elapsed: 00:00:01.04
SQL> select *
2 from big_table as of scn 4303140821778
3 where change_ts >= to_date('18/04/2004 14:59:07','DD/MM/YYYY HH24:MI:SS')-1/60/24
4 minus
5 select *
6 from big_table as of scn 4303140821621
7 where change_ts >= to_date('18/04/2004 14:59:07','DD/MM/YYYY HH24:MI:SS')-1/60/24;
OBJECT_ID OBJECT_NAME CHANGE_TS
---------- ------------------------------ ---------
-1 Test 18/APR/04
Elapsed: 00:00:00.01
That seems efficient to me, elegant, and, unless my logic is off somewhere, 100% free of dups. If I'm wrong, please show me, but I can't see how that wouldn't work well. And with no duplicates, no work is required at the client side.
April 18, 2004 - 9:29 am UTC
refreshes are minutes, hours or ===>> day(s) <<===
My major qualms are:
o the undo is not guaranteed and you cannot make it so. It might be there in 5 minutes, 5 hours or 5 days from now. Then again -- it might not (it just takes one whopping big transaction to make it go "poof", with no way to get it back again). This very item renders this solution not reasonable in 9i.
o flashing back gets more expensive every minute. Flashing back for point in time reports is great (basically what READ ONLY did/does -- but you get to pick the point in time at which to run the report as of). Every transaction however makes flashing further and further back in time more expensive.
I didn't not say it would not work.
Interesting
Jon, April 18, 2004 - 9:50 am UTC
Interesting, in my experience with using flash back, going back 7 days doesn't have any noticeable performance impact when using keyed/index reads, even on a busy system. A 1000 row delta isn't huge.
As for undo, doesn't correctly sizing (or autoextending) the undo tablespace mitigate this risk?
...
I didn't not say it would not work.
...
Triple negatives. Hmmm, I'm still trying to decode that (it's late at night in Sydney - not too sharp at the moment)
But what you did say was:
...
His code already has to handle dups (no avoid that).
...
Well I think this is a no dups solution. Less efficient in the DB, but in totality (depending on client work de-duping) *maybe* better.
April 18, 2004 - 10:27 am UTC
but you cannot flash back 7 days. 5 is the limit (unless you shut down for 2 of the days -- it is 5 days of uptime, or you manually update a dictionary table)
and to have 7 days of undo *online*? I'm not aware of any systems personally that have that.
A 1,000 row delta on a single row will typically incur 1,000 LIOs instead of one (unless you are lucky and the version of the block you need is in the buffer cache, but not likely). Now, do that for 100 changed rows, 20,000 times.
the triple negative was wrong "i did not say it would not work" :)
Heavy on concepts ... but intriguing nonetheless!
Gabriel, April 19, 2004 - 9:38 am UTC
Graham,
One question
you said a few times that we can handle duplicates (at the client, I assume) if they arise. Out of curiosity
how are they handled?
Cheers.
Handling duplicates
Graham, April 19, 2004 - 4:10 pm UTC
Duplicates are indeed handled by the clients, which are databases. We use "upserts" to apply the values (i.e. insert if PK doesn't exist, otherwise update). So a duplicate becomes a redundant update.
FYI, the remote databases we support have built-in upsert support (as does Oracle 9i via MERGE) and they either don't have triggers or won't fire a row-level update trigger if the row is unchanged by the update (unlike Oracle AFAIK, but please let's not launch into a big discussion of that here).
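For an Oracle 9i target, a minimal sketch of such an upsert with MERGE (the table and bind names are just placeholders for illustration):
merge into customer_copy c
using ( select :pk as pk, :name as name, :balance as balance from dual ) s
   on ( c.pk = s.pk )
 when matched then
   update set c.name = s.name, c.balance = s.balance   -- a duplicate becomes a redundant update
 when not matched then
   insert ( pk, name, balance ) values ( s.pk, s.name, s.balance );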
Thanks! From concept to simple example on the trailing edge...
Duke Ganote, April 21, 2004 - 11:13 pm UTC
Tom-- Graham had some excellent questions, and this provoked me to construct a trailing-edge (8.1.7.4-based), simple example. I make no claims about scalability; this is more for conceptual understanding -- suppose there's a table T with some data and we want to track what has changed on T since our last query. Of course, T's LAST_MODIFIED datetime doesn't show the "effective" or transaction commitment timestamp for a row change, rather it's the "clock time" when the user made the row change during the transaction. For brevity, I first show the script setting up the example, then example sessions alternating between 2 users: DGANOTE who makes changes, and EOM who reads them.
In retrospect, I suppose I could add an "approximate SCN timestamp" column to TRANS_INFO and use a background job to populate that column in order to map SCN to datetime.
***** SCN.SQL ****
CREATE TABLE t ( some_data varchar2(10), last_modified date default sysdate,
trans_id varchar2(30) ) tablespace dw1_data_small
/
CREATE TABLE trans_info
(trans_id VARCHAR2(25) NOT NULL,
trans_scn NUMBER,
trans_user_id VARCHAR2(30),
trans_inittn_tms DATE
,
CONSTRAINT PK_TRANS_INFO
PRIMARY KEY (TRANS_ID)
USING INDEX)
/
CREATE OR REPLACE PACKAGE pkge_trans_info
IS
PROCEDURE sp_set_trans_info (curr_trans_id VARCHAR2);
END pkge_trans_info;
/
CREATE OR REPLACE PACKAGE BODY pkge_trans_info
IS
lcl_trans_id VARCHAR2 (25); -- indicates SCN already called once for transaction
lcl_user_id VARCHAR2 (30); -- caches USER in PGA once per session
PROCEDURE sp_set_trans_info (curr_trans_id VARCHAR2) AS
BEGIN
-- only one call for USERENV ('commitscn') per transaction
IF lcl_trans_id != curr_trans_id
OR lcl_trans_id IS NULL
THEN
INSERT INTO trans_info
VALUES (
curr_trans_id, USERENV ('commitscn')
, lcl_user_id, SYSDATE
);
lcl_trans_id := curr_trans_id;
END IF;
END sp_set_trans_info;
BEGIN
-- who is user?
lcl_user_id := USER;
END pkge_trans_info;
/
CREATE OR REPLACE TRIGGER tuidar_t
BEFORE INSERT OR UPDATE OR DELETE ON t
FOR EACH ROW
BEGIN
:new.trans_id := DBMS_TRANSACTION.local_transaction_id;
pkge_trans_info.sp_set_trans_info (:new.trans_id);
END tuidar_t;
/
GRANT SELECT ON T TO EOM
/
GRANT SELECT ON TRANS_INFO TO EOM
/
***** THE SESSIONS ****
DW1D\dganote> @scn
DW1D\dganote> insert into t ( some_data ) values ( 'Hello' );
1 row created.
DW1D\dganote> insert into t ( some_data ) values ( 'World' );
1 row created.
DW1D\dganote> column trans_scn format 99999999999999999999;
DW1D\dganote> ed
Wrote file afiedt.buf
1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\dganote> /
SOME_DATA LAST_MODI TRANS_ID TMS TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello 21-APR-04 6.92.276048 04:19:38 7728900791301 -- "provisional" SCN!
World 21-APR-04 6.92.276048 04:19:44 7728900791301
DW1D\dganote> commit;
Commit complete.
DW1D\dganote> ed
Wrote file afiedt.buf
1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\dganote> /
SOME_DATA LAST_MODI TRANS_ID TMS TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello 21-APR-04 6.92.276048 04:19:38 7728900791303 -- true SCN! -----------
World 21-APR-04 6.92.276048 04:19:44 7728900791303
DW1D\dganote> @connect eom@dw1d;
DW1D\eom> column trans_scn format 99999999999999999999;
DW1D\eom> ed
Wrote file afiedt.buf
1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id
DW1D\eom> /
SOME_DATA LAST_MODI TRANS_ID TMS TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Hello 21-APR-04 6.92.276048 04:19:38 7728900791303 -- others see only true/committed SCN
World 21-APR-04 6.92.276048 04:19:44 7728900791303
DW1D\eom> @connect dganote@dw1d;
DW1D\dganote> insert into t ( some_data ) values ( 'Goodbye' );
1 row created.
DW1D\dganote> commit;
Commit complete.
DW1D\dganote> insert into t ( some_data ) values ( 'Y''all' );
1 row created.
DW1D\dganote> commit;
Commit complete.
DW1D\dganote> @connect eom@dw1d;
DW1D\eom> ed
Wrote file afiedt.buf
1 SELECT t.*
2 , TO_CHAR (t.last_modified, 'HH:MI:SS') tms
3 , ti.trans_scn
4 FROM dganote.t t
5 , dganote.trans_info ti
6* WHERE t.trans_id = ti.trans_id and ti.trans_scn > 7728900791303 -- what changed since the last commit?
DW1D\eom> /
SOME_DATA LAST_MODI TRANS_ID TMS TRANS_SCN
---------- --------- ------------------------------ -------- ---------------------
Goodbye 21-APR-04 9.64.268092 04:24:41 7728900791767
Y'all 21-APR-04 10.42.262796 04:24:53 7728900791770
April 22, 2004 - 7:28 am UTC
we discussed mapping the transaction id to the single UNDOCUMENTED, TOTALLY "magic happening here" commitscn and tagging each record with the transaction id.
rejected for the good concern that commitscn is not documented.... that and this is "hard".
using the time from v$transaction is so simple and the probability of a dup is so near zilch (and their code handles it already) -- seems to me to be the only thing to even consider.
from trailing edge to mainstream...
Duke Ganote, April 22, 2004 - 11:26 pm UTC
Tom-- Ah, but after upgrading from 8i to 9i, the demo still seems to work by swapping the undocumented USERENV('commitscn') function for the documented DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER.
April 23, 2004 - 11:19 am UTC
no, it won't.
that gets the SCN as of *right now*. Not the ending transaction SCN (commit scn) which is what you need.
ops$tkyte@ORA9IR2> create table t ( x number );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( userenv('commitscn') );
1 row created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758134 24758134
ops$tkyte@ORA9IR2> exec dbms_lock.sleep(10);
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758136 24758138
userenv('commitscn') is pure magic. it is a value that is magically updated with YOUR transactions commitscn upon commit.
Pre-commitment and post magic
Duke Ganote, April 23, 2004 - 12:29 pm UTC
Tom-- Thank you! So just as T's LAST_MODIFIED datetime shows the "clock time" when the user made the row change during the transaction, DBMS_FLASHBACK.get_system_change_number shows the current SCN at the time the DML is performed.
Hmm. One mechanism that roughly approximates USERENV('commitscn') would be to leave TRANS_SCN null during the insert, and have a background job running like this:
DBMS_JOB.SUBMIT(job#,'UPDATE trans_info SET trans_scn = test_seq.nextval WHERE trans_scn IS NULL;',sysdate,'sysdate+1/24/60/60')
which would "see" and periodically update only committed transactions. It only approximates USERENV('commitscn') because multiple transactions could be tagged with the same approximate SCN.
PS, that is a nice spellcheck for the reviews!
dbms_flashback.get_system_change_number vs. commitscn
Graham, April 23, 2004 - 12:34 pm UTC
Thanks Duke, you implemented basically what I was thinking for an SCN based scheme using userenv('commitscn'). Doesn't solve the issue of precisely mapping timestamps to SCNs, but could be used instead of timestamps if one was prepared to use the undocumented feature.
As Tom points out, Dbms_flashback.get_system_change_number() does not have the same magic behaviour as userenv('commitscn') - it's just the SCN that is current at the time that you call it. However 10g's ora_rowscn (with rowdependencies enabled) does have the commit-time SCN for that row. (Without rowdependencies I think it has the commit-time SCN for most-recently-committed row in that block.)
You can see the above points with the following script:
create table t (
tx_id varchar(25) primary key,
commit_scn number,
flashback_scn number
) rowdependencies;
insert into t ( tx_id, commit_scn, flashback_scn )
values ( dbms_transaction.local_transaction_id,
userenv('commitscn'),
dbms_flashback.get_system_change_number() );
select t.*, ora_rowscn from t;
declare
pragma autonomous_transaction;
begin
insert into t ( tx_id, commit_scn, flashback_scn )
values ( dbms_transaction.local_transaction_id,
userenv('commitscn'),
dbms_flashback.get_system_change_number() );
commit;
end;
/
commit;
select t.*, ora_rowscn from t;
This gives the following output (using 10g's web version of iSQL*Plus):
Table created.
1 row created.
TX_ID COMMIT_SCN FLASHBACK_SCN ORA_ROWSCN
2.36.1100 731651 731651
PL/SQL procedure successfully completed.
Commit complete.
TX_ID COMMIT_SCN FLASHBACK_SCN ORA_ROWSCN
2.36.1100 731653 731651 731654
1.10.1105 731651 731651 731652
(Note that the magic final value for userenv('commitscn') is one less than the "true" commit scn, as others have noticed and posted elsewhere.)
Personally, I doubt that Oracle will remove or change userenv('commitscn') as long as their replication relies on it (or if any important customers rely on it). However the 10g ora_rowscn feature is much, much better if one wants an SCN-based solution instead of a timestamp-based solution.
But for most of our customers I think the oldest-transactions solution will be best.
Oracle documents commitment to SCN
Duke Ganote, April 24, 2004 - 11:50 pm UTC
Graham-- Thanks for the 10g-based example (also thanks to Justin Cave and Tom). Now I can see concisely how Oracle exposed this functionality as a documented feature.
In trying to think of sub-10g, documented-feature workarounds, I toyed with changing my background job (if using 9i) to
DBMS_JOB.SUBMIT(job#,'UPDATE trans_info SET trans_scn = dbms_flashback.get_system_change_number WHERE
trans_scn IS NULL;',sysdate,'sysdate+1/24/60/60')
in order to better approximate the SCN. I'm also exploring use of DBMS_LOGMNR in the background job, since all that's needed is the marriage between SCN and transaction_id, which V$LOGMNR_CONTENTS has. However, I'm sure of neither the latency that LOGMNR would require nor how much DBA involvement it would need. Chasing that rabbit a bit further, I was surprised to see that SQL_REDO and SQL_UNDO don't use bind variables (p. 1118 of Tom's Expert 1-on-1). And while I'm on that tangent, I confess I find the term 'bind variables' befuddling -- by contrast, the Java documentation on PreparedStatement refers to 'parametric SQL', a term I find easier to grasp conceptually: simply put, I write SQL statements that take parameters, much like I would write procedures that take parameters. Voila!
And I also get to explore Streams per Tom's suggestion. Ah well, the adventure continues, to learn more of the capabilities that this DBMS has to offer! :))
Really?
Jon, April 25, 2004 - 9:19 am UTC
Been a busy week; haven't had a chance to reply to your last comment. I don't want to flog a dead horse, but I think there is life in this one yet.
You said:
<Quote>
but you cannot flash back 7days. 5 is the limit (unless you shutdown for 2 of
the days -- it is 5 days of uptime, or you manually updated a dictionary table)
</Quote>
Well I think you're mistaken. Of course you can flashback more than 5 days - you just can't flashback using a timestamp more than 5 days. Using SCN, which is what I'm doing, you can flashback as long as you want to. All that's needed is to keep a record of the SCN at the time of the event that interests you. For Graham's problem, it is at the moment of each sync request.
Using this approach removes the whole problem of timestamp => SCN conversion. There is no need for userenv('commitscn') or similar. You change the query from "give me everything from time a to time b" - which is a continuous, approximate measure by definition - to "give me everything from SCN A+1 to SCN B". This uses the precise, quanta-like properties of the SCN - the DB's heartbeat, if you like.
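A minimal sketch of "record the SCN yourself at each sync" (the bind names are assumptions; the big_table test further down uses the same pattern):
-- at each sync, capture and persist the current SCN for use as the next lower bound
select dbms_flashback.get_system_change_number sync_scn from dual;
-- next sync: the delta is everything committed between the two recorded SCNs
select * from big_table as of scn :scn_this_sync
 where change_ts >= :approximate_ts          -- optional timestamp "safety factor"
minus
select * from big_table as of scn :scn_previous_sync
 where change_ts >= :approximate_ts;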
<Quote>
and to have 7 days of undo *online*? I'm not aware of any systems personally that have that.
</Quote>
Well now you do! :-) We frequently go back more than seven days. As you have said many times yourself, disk is cheap. What's 10G of undo?
<Quote>
A 1,000 row delta on a single row will incurr typically 1,000 lios instead of
one (unless you are lucky and the version of the block you need is in the buffer
cache, but not likely). Now, do that for 100 changed rows, 20,000 times.
</Quote>
This is where things get interesting. You maintain that to flash back 1000 changes requires 1000 LIOs. Well, I don't think undo works that way. My understanding is that Oracle maintains a transaction table that enables an SCN-to-block translation. So, during normal operation, when an old block is required for read consistency, it looks up the transaction table to find out where this block is within the rollback segments. So 1 LIO becomes 3 LIOs (initial block read, transaction table read, RBS read) - regardless of how many changes have been made.
If this was not the case, then a long running query over a busy table would take longer to materialise blocks near the end of the query than at the beginning. This would limit the effectiveness of multi-versioning altogether.
From the Concepts manual:
...
Among other information, a rollback entry includes block information (the file number and block ID corresponding to the data that was changed) and the data as it existed before an operation in a transaction.
For each rollback segment, Oracle maintains a transaction table--a list of all transactions that use the associated rollback segment and the rollback entries for each change performed by these transactions. Oracle uses the rollback entries in a rollback segment to perform a transaction rollback and to create read-consistent results for queries.
...
So I've done a test to prove this:
SQL> set linesize 200
SQL>
SQL> set timing on
SQL>
SQL> set serveroutput on
SQL>
SQL> alter session set nls_date_format='DD-MON-RR HH24:MI:SS';
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> column pk format 9999999999
SQL>
SQL> column change_date format a20
SQL>
SQL> select count(*) from big_table;
COUNT(*)
----------
30765000
Elapsed: 00:00:38.01
(Was supposed to be 3 million, but I got a bit carried away...)
These are the 101 rows we'll update (showing just the first 9):
SQL>
SQL> select * from big_table
2 where pk between 6179200 and 6179300
3 and rownum < 10;
PK SOME_DATA CHANGE_DATE CHANGE_VERSION CHANGE_SCN
----------- ------------------------------ -------------------- -------------- ----------
6179200 DIRECT_DEBIT_TYPE_PK 24-APR-04 15:04:03 1 700815
6179201 DIRECT_DEBIT_TYPE_UK1 24-APR-04 15:04:03 1 700815
6179202 EFEED 24-APR-04 15:04:03 1 700815
6179203 EFEED 24-APR-04 15:04:03 1 700815
6179204 EFEED_E01_IMPORT 24-APR-04 15:04:03 1 700815
6179205 EFEED_EMPIRE_MAP 24-APR-04 15:04:03 1 700815
6179206 EFEED_EMPIRE_MAP_PK 24-APR-04 15:04:03 1 700815
6179207 EFEED_FIELD 24-APR-04 15:04:03 1 700815
6179208 EFEED_FIELD_PK 24-APR-04 15:04:03 1 700815
9 rows selected.
Elapsed: 00:00:00.01
How much undo do we have at the moment:
SQL>
SQL> select
2 (select round(sum(u.undoblks)*8192/1024/1024,2)
3 from v$undostat u) used_mb,
4 (select round(sum(e.bytes)/1024/1024,2)
5 from dba_extents e
6 where e.tablespace_name = 'UNDOTBS1') alloc_mb
7 from dual;
USED_MB ALLOC_MB
---------- ----------
230.35 233.23
Elapsed: 00:00:00.08
Now, let's generate heaps of undo. We'll update these 101 rows 10,000 times, and record the SCN and sysdate at the beginning, middle and end.
SQL> begin
2 commit;
3 dbms_output.put_line('T0: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
4 for i in 1..5000 loop
5 dbms_application_info.set_client_info('i='||i);
6 update big_table
7 set some_data = substr(some_data,2)||substr(some_data,1,1),
8 change_date = sysdate,
9 change_version = change_version+1,
10 change_scn = dbms_flashback.get_system_change_number
11 where pk between 6179200 and 6179300;
12 commit;
13 end loop;
14 dbms_lock.sleep(60);
15 dbms_output.put_line('T1: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
16 for i in 1..5000 loop
17 dbms_application_info.set_client_info('i='||i);
18 update big_table
19 set some_data = substr(some_data,2)||substr(some_data,1,1),
20 change_date = sysdate,
21 change_version = change_version+1,
22 change_scn = dbms_flashback.get_system_change_number
23 where pk between 6179200 and 6179300;
24 commit;
25 end loop;
26 dbms_output.put_line('T2: sysdate='||to_char(sysdate,'HH24:MI:SS')||', SCN='||dbms_flashback.get_system_change_number);
27 end;
28 /
T0: sysdate=22:37:31, SCN=805637
T1: sysdate=22:39:04, SCN=811729
T2: sysdate=22:39:43, SCN=817778
PL/SQL procedure successfully completed.
Elapsed: 00:02:05.02
Let's see how much undo we have now:
SQL> select
2 (select round(sum(u.undoblks)*8192/1024/1024,2)
3 from v$undostat u) used_mb,
4 (select round(sum(e.bytes)/1024/1024,2)
5 from dba_extents e
6 where e.tablespace_name = 'UNDOTBS1') alloc_mb
7 from dual;
USED_MB ALLOC_MB
---------- ----------
387.77 420.23
Elapsed: 00:00:02.04
So we've added about 200M of Undo - and since this is my laptop I know it is all mine.
Now, let's run the sync query, getting all changes between T0 and T2. I've included a safety factor of 2 minutes in the change_date predicate, for any transactions that may have been running during the T0 sync request.
SQL> select *
2 from big_table as of scn 817778
3 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS')
4 minus
5 select *
6 from big_table as of scn 805637
7 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');
PK SOME_DATA CHANGE_DATE CHANGE_VERSION CHANGE_SCN
----------- ------------------------------ -------------------- -------------- ----------
6179200 DIRECT_DEBIT_TYPE_PK 25-APR-04 22:39:43 10001 817777
6179201 EBIT_TYPE_UK1DIRECT_D 25-APR-04 22:39:43 10001 817777
6179202 EFEED 25-APR-04 22:39:43 10001 817777
6179203 EFEED 25-APR-04 22:39:43 10001 817777
6179204 EFEED_E01_IMPORT 25-APR-04 22:39:43 10001 817777
6179205 EFEED_EMPIRE_MAP 25-APR-04 22:39:43 10001 817777
6179206 _MAP_PKEFEED_EMPIRE 25-APR-04 22:39:43 10001 817777
6179207 EED_FIELDEF 25-APR-04 22:39:43 10001 817777
6179208 ELD_PKEFEED_FI 25-APR-04 22:39:43 10001 817777
6179209 _FIELD_UK1EFEED 25-APR-04 22:39:43 10001 817777
6179210 EED_RAW32EF 25-APR-04 22:39:43 10001 817777
...
101 rows selected.
Elapsed: 00:00:00.08
SQL>
And you get similar results if you query between T0 & T1 or T1 & T2 etc.
So that's 100*10,000 changes - I don't think Oracle is doing 1 million LIOs in 8/100ths of a second.
In fact if we compare the stats on the flashback queries:
SQL> set autotrace traceonly
SQL> select *
2 from big_table as of scn 805637
3 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=284 Card=61234 Bytes
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=284 Car
2 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IX1' (NON-UNIQUE) (Cost
Statistics
----------------------------------------------------------
3 recursive calls
0 db block gets
36 consistent gets
0 physical reads
0 redo size
4159 bytes sent via SQL*Net to client
565 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
101 rows processed
with the "normal" queries:
SQL> select *
2 from big_table
3 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');
101 rows selected.
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=284 Card=61234 Bytes
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=284 Car
2 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IX1' (NON-UNIQUE) (Cost
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
18 consistent gets
0 physical reads
0 redo size
4159 bytes sent via SQL*Net to client
565 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed
There is only double the LIOs for the flashback query. I would guess that because many of the updated rows would be on the same block, the queries would be getting several changes for the price of one.
However, it is clear that 18 LIOs don't become 18*10,000 LIOs when using flashback, as you were suggesting in your previous post.
So your concerns about performance don't seem correct. Maybe my test is missing something, and if I'm wrong I'll happily capitulate - but you'll need to prove it.
But, if I'm right, well... I expect a large carton of beer to arrive at my door within 48 hrs :-)
April 26, 2004 - 6:04 am UTC
the flashback by scn won't work over 5 days of uptime either!
sys@ORA9IR2> column S new_val S
sys@ORA9IR2>
sys@ORA9IR2> select min(SCN_BAS) S from SMON_SCN_TIME
2 /
S
----------
616905
sys@ORA9IR2>
sys@ORA9IR2> select * from sys.source$ as of scn &S where rownum = 1
2 /
old 1: select * from sys.source$ as of scn &S where rownum = 1
new 1: select * from sys.source$ as of scn 616905 where rownum = 1
OBJ# LINE
---------- ----------
SOURCE
------------------------------------------------------------------------
647 1
package STANDARD AUTHID CURRENT_USER is -- careful on this
line; SED edit occurs!
sys@ORA9IR2>
sys@ORA9IR2> select min(scn_bas)-1 S from smon_scn_time
2 /
S
----------
616904
sys@ORA9IR2>
sys@ORA9IR2> select * from sys.source$ as of scn &S where rownum = 1
2 /
old 1: select * from sys.source$ as of scn &S where rownum = 1
new 1: select * from sys.source$ as of scn 616904 where rownum = 1
select * from sys.source$ as of scn 616904 where rownum = 1
*
ERROR at line 1:
ORA-08181: specified number is not a valid system change number
we use that mapping table and if the scn isn't there, well, the scn isn't there and we cannot (will not) use it.
soooo, unless you manually modify that mapping table (definitely not even near the realm of recommended!!). 10g changes the entire thing as well (that table is infinitely more complex and even less "updatable" -- if you update it wrong, all bets are 100% off).
10g of undo -- that is a very lightweight system, many systems generate that in hours. That is what I meant by "i'm not aware of any production systems with 7 days of undo online".
As for the 1,000's of LIO's, that's a bit off as well. Try this:
drop table t;
create table t ( x int, y int ) tablespace users;
insert into t values (1,1);
commit;
variable a refcursor
variable b refcursor
variable c refcursor
alter session set events '10046 trace name context forever, level 12';
begin
open :a for select * from t a;
open :b for select * from t b;
open :c for select * from t c;
end;
/
print a
begin
for i in 1 .. 10000
loop
update t set x = x+1;
commit;
end loop;
end;
/
print b
print c
(part of a presentation I do to explain read/write consistency).
Now, run tkprof on the trace file. You should see:
query a: single digit LIO's
query b: thousands of LIO's
query c: single digit LIO's
it depends on what is in the cache (and the further back in time we go -- well, the less chance that what you want is in the cache!). Query C benefits from the CR version of the block that query B built and cached (we keep something like 6 block versions in the cache).
Very Interesting
Jon, April 26, 2004 - 7:32 am UTC
2 more questions before I RIP on this issue:
1) If you can only go back 5 days regardless, then why does the Oracle documentation read:
...
Application Developer's Guide - Fundamentals
Currently, the flashback query feature keeps track of times up to a maximum of 5 days. This period reflects server uptime, not wall-clock time. For example, if the server is down for a day during this period, then you can
specify as far back as 6 days. To query data farther back than this, you must specify an SCN rather than a date and time. You must record the SCN yourself at the time of interest, such as before doing a DELETE.
...
It quite clearly states you can query back further using an SCN.
And why can I go back 7 or 8 days? (And no, I haven't updated the data dictionary)
2) You talk about CR blocks being constructed and brought into the cache. Well why do I still get good performance if I restart the DB? Nothing in the cache then. In my example, I had 10,000 updates as well. Still getting fast query time:
SQL> set echo on;
SQL>
SQL> set timing on;
SQL>
SQL> alter tablespace data_app offline;
Tablespace altered.
Elapsed: 00:00:00.08
SQL>
SQL> alter tablespace data_app online;
Tablespace altered.
Elapsed: 00:00:01.02
SQL>
SQL> select *
2 from big_table as of scn 817778
3 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS')
4 minus
5 select *
6 from big_table as of scn 805637
7 where change_date > to_date('25-APR-2004 22:35:31','DD-MON-YYYY HH24:MI:SS');
101 rows selected.
Elapsed: 00:00:00.09
Still only 9/100ths of a second... nothing to worry about.
(P.S. How's Iceland... cold?)
April 26, 2004 - 7:51 am UTC
1) documentation is incorrect then. i'll file a doc bug.
what's your min(scn_bas)? that'll be the line. peek at that table.
2) how many rows/block do you have, how many times do you re-read a block? you see, if you read the same block 100 times -- it won't roll it back 100 times -- just once.
also, please measure LIO's vs LIO's -- LIOs = latching. did you test this under load, where multiple people doing excessive LIO's are banging into each other?
i'm not measuring time, it's all about "what work am i doing and how will this affect me in a real system"
read rollback from a day ago = PHYSICAL IO's added to system
use N days of rollback to reconstruct block = LOGICAL IO's added to system
in a real system, with lots of users, both of those are things you don't want to do unless you really have to.
don't get me wrong -- flashback is *cool*, just beware of how it physically works (run that little test with queries a, b, c -- see what happens, now multiply that by 100 users, 1000 users, 10000 users, maybe near simultaneously).
And finally
Jon, April 26, 2004 - 8:05 am UTC
Yes, that doco is misleading then.
I must say, though, many people are playing with breaking the 5-day barrier. Inserting into SMON_SCN_TIME being one of them, and JL suggested setting event 10311 to bypass the SCN => timestamp lookup.
I like your CR sample script - simple and clear. I was being slightly disingenuous with you on that last post. My 9/100ths of a sec after clearing the cache was actually 14,000 LIOs on a single user system, dropping to 36 LIOs on the second run.
Just seeing if you were on your toes... you were ;-)
April 26, 2004 - 9:19 am UTC
so, when does my beer arrive?
:)
Beer
Jon, April 26, 2004 - 10:02 am UTC
Yes indeed. I owe you. When are you next in Australia?
Great discussion but isn't there a simpler answer?
JAG, April 26, 2004 - 1:41 pm UTC
What about this for an alternative:
- Add a column to the query table (tab1) and populate via a sequence (seq1)
- Create a new table (tab2) that contains the max sequence number queried by each user
- Create type (type1) that is equivalent to the definition of tab1
- Create a type (type2) that is a table of types type1
- Create a function (f1) that returns type2. The function queries tab1 and stuffs the data into a variable of type2. After this it updates tab2 with the max returned sequence number.
- Use cast etc. to allow the call to f1 to behave like a table.
April 26, 2004 - 2:29 pm UTC
no -- because of read consistency (either that, or you would be serializing updates against a single row, which is what the lock table in share mode did in a way as well) it suffers the same exact problem as reading by timestamp!
the simple answer:
a) use timestamp
b) adjust your time based on v$transaction <<== only difference from "existing code"
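A compressed sketch of (a) plus (b) together (the table t, its last_modified column and the sync_watermark bookkeeping table are assumed for illustration):
declare
  l_last_watermark date;
  l_next_watermark date;
begin
  -- the watermark saved by the previous run
  select last_checked into l_last_watermark from sync_watermark;
  -- (b) the next watermark: the start of the oldest open transaction, or "now"
  select nvl( min( to_date(start_time,'mm/dd/rr hh24:mi:ss') ), sysdate )
    into l_next_watermark
    from v$transaction;
  -- (a) pull everything changed since the saved watermark; duplicates between
  --     runs are tolerated (they become redundant upserts at the target)
  for r in ( select * from t where last_modified >= l_last_watermark )
  loop
    null;  -- apply r to the target here
  end loop;
  update sync_watermark set last_checked = l_next_watermark;
  commit;
end;
/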
Nothing simpler so far ...
Gabriel, April 26, 2004 - 3:16 pm UTC
Jag,
Your sequence-based alternative is no different than the timestamp method (except for not being able to link into v$transaction).
User A inserts new record in Tab1 ... uses sequence 100
no commit
User A updates existing record in Tab1
uses sequence 101
no commit
User B inserts new record in Tab1
uses sequence 102
commit
Sync process (User C) executes with a last max sequence number of 50
it sees the modification 102 but not 100 and 101
moves everything greater than 50 and sets the next max sequence number to 102.
User A commits changes 100 and 101.
Next time User C syncs it will get everything greater than 102
hence it totally misses 100 and 101.
Graham's method of using timestamps and v$transaction to get the smallest time window safe enough to include all potentially changed records seems to be the only method (not requiring serialization at source) so far. It only has one caveat: it may process the same change multiple times (this could be a problem if the target system is being modified by other processes; otherwise it is just a bit more CPU consumption, likely much less CPU than a full sync method would require).
Tom's insight into using both v$transaction and v$lock minimizes the possibility of processing the same change more than once. I wouldn't say the probability is near zilch: all it takes is a long open transaction and/or a busy system (add to that trigger-happy sync users). In the example above, if User C comes twice for sync and User A has still not closed that transaction, the record with change 102 will be processed at least 3 times (since in both cases the earliest safe timestamp will be set to the start of User A's transaction).
Very good thread indeed.
End up in confusion
Tony, April 27, 2004 - 12:39 am UTC
As there is a requirement for me to load changed data every night to a data warehouse database from a 24x7 OLTP database, I read this page and ended up in utter confusion.
My 24x7 OLTP system has hundreds of tables. Certainly, I do not prefer trigger-based solutions. Some of the tables do not have a timestamp column. Even if all tables had a timestamp column, I might miss some rows, as master and detail rows may have different timestamps (differing by seconds) within a transaction. It seems that we cannot use transaction-number-based queries either.
Can you please give me a reliable and simple solution to capture the changes, with an example?
Persuing V$LOCK suggestion
Graham, April 28, 2004 - 1:01 pm UTC
You suggested joining v$lock with v$transaction to get the oldest start time for "in flight" transactions with locks on the tables of interest. I've just started to look into this, and can't see a common column on which to join, or how to get the names of the affected tables.
I guess I could join via v$session, using the binary manipulations you showed here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:839412906735
Can you please suggest a query to get the oldest start time from v$transaction where the transaction has uncommitted changes on tables called 'T1' and 'T2'?
BTW, does the db that powers this forum store previews?
April 28, 2004 - 7:17 pm UTC
v$transaction(addr) -> v$session(taddr), giving you the sid, v$session(sid) -> v$lock(sid), giving you TM locks and id1 = object_id, id1 -> dba_objects(object_id)
ops$tkyte@ORA9IR2> select nvl(min(start_time),to_char(sysdate,'mm/dd/yy hh24:mi:ss'))
2 from v$transaction
3 where addr in
4 (select taddr
5 from v$session
6 where sid in
7 (select sid
8 from v$lock
9 where type = 'TM'
10 and id1 in
11 (select object_id
12 from dba_objects
13 where object_name in ('T', 'T1', 'T2' )
14 and owner = 'OPS$TKYTE')
15 )
16 )
17 /
NVL(MIN(START_TIME),
--------------------
04/28/04 19:15:10
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> select nvl(min(start_time),to_char(sysdate,'mm/dd/yy hh24:mi:ss'))
2 from v$transaction
3 where addr in
4 (select taddr
5 from v$session
6 where sid in
7 (select sid
8 from v$lock
9 where type = 'TM'
10 and id1 in
11 (select object_id
12 from dba_objects
13 where object_name in ('T', 'T1', 'T2' )
14 and owner = 'OPS$TKYTE')
15 )
16 )
17 /
NVL(MIN(START_TIME),
--------------------
04/28/04 19:18:31
should do it...
didn't understand the BTW q.
Streaming and summarizing
Duke Ganote, April 28, 2004 - 1:02 pm UTC
Tom-- I found an intriguing, simple example of streams at http://www.oracle-base.com/articles/9i/Streams9i.php
and it looks great! The article describes streams as an 'extension of a number of existing technologies including Advanced Queuing, LogMinor [sic] and Job Scheduling'. These are all 8i technologies, but it appears that 9i is mandatory because the example script includes:
GRANT EXECUTE ON DBMS_FLASHBACK TO strmadmin;
Could streams be retrofitted onto an 8i database?
Also, would it be fair to summarize the original question and 30-or-so responses as follows--
Q: How can I identify what's committed/changed since the last time I checked a table?
A: Solutions are either 'pre-commit' or 'after commit'.
(1) 'After commit' solutions looked at tying committed changes to an SCN; these were 'hard' but precise solutions.
(2) The 'pre-commit' solution uses V$TRANSACTION and V$LOCK to look for any transaction that *might* commit changes to the table. This is the recommended and simple solution.
April 28, 2004 - 7:19 pm UTC
streams requires 9ir2 -- it cannot be used before.
the solutions were not pre/post commit -- all were POST commit. there is quite simply "no other way" in Oracle.
the techniques were to minimize the duplication (retrieving the same change two times).
the easiest would be timestamp based, assuming a timestamp on each and every table row is maintained. (if you are DIY, streams would be the preferred method)
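A minimal sketch of that DIY timestamp maintenance (the table and trigger names are assumptions for illustration):
alter table t add ( last_modified date );
create or replace trigger t_stamp_last_modified
  before insert or update on t
  for each row
begin
  :new.last_modified := sysdate;   -- a TIMESTAMP column populated from systimestamp
                                   -- would give sub-second resolution on 9i and up
end;
/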
Thanks for v$lock query!
Graham, April 29, 2004 - 12:13 am UTC
Thanks for the query Tom! That's a cleaner way to go through v$session than matching the transaction id (via bit manipulation) that I envisioned. A bit convoluted but only has to be done once per sync if one wants that extra level of duplicate elimination.
On the streams vs. polling front, I agree that change capture via the log will often be more efficient and less intrusive than querying for changes, but I wouldn't say that streams are always preferred. Which performs best will depend on factors such as frequency of changes vs. frequency of syncs, how similar the syncs are, and when the syncs occur. Other benefits for querying, which are important to our customer base, are portability, simplicity, and transparency. The same technique works for all mainstream SQL RDBMS products.
Sorry for going off-topic with the BTW. I just wondered if the prototype review is recorded when one presses the Preview Review button. I wasn't being paranoid, but if you record those and ever look at them one could argue that you've made use of the equivalent of dirty reads. ;-)
April 29, 2004 - 7:28 am UTC
In general, given a system where you have to be non-intrusive (eg: no timestamps exist, no triggers can be added) there isn't really much of a choice.
The Preview Review uses a "state" table -- as all htmldb applications do. So yes, the preview is stored in a session state table, however we don't really ever look at that table - it is just variable/values for a given htmldb session.
Minor correction
Graham, April 29, 2004 - 7:58 am UTC
One correction to the v$lock query in case anyone else wants to use it - it's comparing date strings (in the hard to compare mm/dd/yy format) instead of dates. So I think it would break across year boundaries. Your first version did it better:
select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction
...
Date stringification
Duke Ganote, April 29, 2004 - 9:45 am UTC
I find an almost-ISO 8601 date-string format ( http://www.w3.org/TR/NOTE-datetime ) convenient; for example:
SQL> select to_char(sysdate,'YYYY-MM-DD HH24:MI:SS') dateTime from dual;
DATETIME
-------------------
2004-04-29 08:41:06
v$transaction.start_timestamp?
Graham, April 29, 2004 - 11:41 am UTC
For v$transaction, we're stuck with start_time in the anachronistic 'mm/dd/rr hh24:mi:ss' format. I too am a strong proponent of ISO 8601 (though I prefer a blank instead of a 'T' between date and time, which seems to have become a de facto standard).
I hope that someday Oracle adds a start_timestamp column to v_$transaction, that is a TIMESTAMP (with or without time zone) that has better than 1 sec resolution.
April 29, 2004 - 11:56 am UTC
well, in 10g there is a start_DATE of type date (removes the ambiguity with the date format) but it is a DATE, not a timestamp
Transaction time
MEHMOOD, June 09, 2004 - 3:12 am UTC
Tom:
Is there any way that we could see the date and time of a transaction which was committed one month ago? The database is in noarchivelog mode.
Thanks in advance
June 09, 2004 - 8:42 am UTC
not unless you recorded it somewhere.
A reader, August 10, 2004 - 11:46 pm UTC
ops$tkyte@ORA9IR2> insert into t values ( userenv('commitscn') );
1 row created.
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758134 24758134
ops$tkyte@ORA9IR2> exec dbms_lock.sleep(10);
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2> select x, dbms_flashback.get_system_change_number from t;
X GET_SYSTEM_CHANGE_NUMBER
---------- ------------------------
24758136 24758138
I don't get this -- how can the column t.x change magically when no updates have happened on that row?
No matter what magic userenv('commitscn') does, it is physically stored in the table, so how does it change after the sleep() and commit()?
Thanks
August 11, 2004 - 9:36 am UTC
the database can do whatever magical stuff it wants to, we wrote it after all :)
userenv('commitscn') -- which can be used but once per transaction -- is a flag to the server that says "hey, remember where you just stuffed me ok, cause when we commit, you really gotta come back and update me"
A reader
A, October 28, 2004 - 3:40 am UTC
Hi,
We have a normal OLTP db and a data warehouse db. We need to pull only the data which has changed since the last time we pulled from OLTP to the warehouse. I am aware that CDC and Streams can do this, but they are available only on Enterprise Edition, and we have Standard Edition (9i). Is there any way in which we can do this? I can think of writing triggers and adding a timestamp column to every table to accomplish it, but I don't know if it is a good idea to have triggers on every table. Please advise.
October 28, 2004 - 9:12 am UTC
well, if you don't have EE, you are in "DIY" mode (do it yourself). Your options are quite limited.
you have read only materialized views (snapshots).
you have "put a trigger on every table and use the technique described above".
Another DIY suggestion?
Bob B, October 28, 2004 - 10:59 am UTC
I'm not sure if this is what Tom was suggesting way back with AQ, but couldn't the new/changed rows be inserted into a log table? A job then comes along, takes a row from the log table, duplicates it into a user log table once for each user that would request the information, and then deletes the row from the log table. Then it is up to you to determine the rules for when a row of the user log table is considered read, and whether to then delete or update that row.
Performance wise, this puts a hit on each transaction (it has to insert into the log table) and the database takes a hit proportional to [transaction rate * user count]. If the read rule matches what the users expect, then there will be no missed changes and no duplicates. Changes may take a little longer to show up, but they will show up. I doubt this will scale well at all, but should work ok if the system can handle the combination of the transaction rate with the background job.
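A minimal sketch of that fan-out (all table and column names here are assumptions, not from this thread):
create table change_log (
  table_name  varchar2(30),
  pk_value    number,
  change_ts   date default sysdate
);
create table user_change_log (
  user_id     number,
  table_name  varchar2(30),
  pk_value    number,
  change_ts   date,
  read_flag   char(1) default 'N'
);
-- background job: copy each logged change once per subscribed user, then purge it
-- (a real job would have to avoid deleting rows that arrived after the insert ran)
insert into user_change_log ( user_id, table_name, pk_value, change_ts )
  select u.user_id, c.table_name, c.pk_value, c.change_ts
    from change_log c, app_users u;
delete from change_log;
commit;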
If the user ids are numbers or can be easily hashed to a unique number, then a bitmap on each transaction table that is to be tracked could work. On insert/update, the bitmap would be set to 000000000000... (in binary; requires at least ceiling( # users / 8 ) bytes). Each time a user checks for updates, you query the tables' bitmaps. If the bit for the user is a 0, display the row and update that bit to 1, otherwise skip it.
I haven't played around with bitmaps, I have no idea how that would perform (I see bad things if the user id is a sequence and has large gaps).
October 28, 2004 - 1:53 pm UTC
that is a lot like the timestamp approach with extra work (sort of like using read only MV's)
Excellent explanation by Tom
Juan Velez, November 08, 2004 - 3:46 pm UTC
If Tom or others can answer: I still have a doubt. How is it possible, when using the "oldest open transaction", to get duplicates?
November 08, 2004 - 5:42 pm UTC
you tend to "re-pull" the same row.
At 11:00 you did the initial copy -- so the next time you pull everything from 11:00
At 12:00 transaction t1 started.
At 12:01 transaction t2 started.
At 12:03 transaction t2 stopped.
at 12:05 you "pull" -- you remember 12:00 as the next pull time (because of the outstanding transaction). You pull everything from 11:00 on -- that included t2 since it committed.
At 12:30 transaction t1 commits.
At 13:00 you "pull" -- everything changed since 12:00 -- you re-pull transaction T2.
Changed Data Capture
Neelz, March 16, 2005 - 9:37 am UTC
Dear Tom
<quote>
Followup:
well, if you don't have EE, you are in "DIY" mode (do it yourself). Your options are quite limited.
you have read only materialized views (snapshots).
you have "put a trigger on every table and use the technique described above".
</quote>
I too am facing almost similar problem with Oracle9iR2 SE.
Scenario
--------
Our requirement is to replicate 3 tables at a remote country site which will be used for reporting. Time lag allowed is 15 min between production DB and remote DB. Network outages are common.
The approach we were planning
-----------------------------
1. Write triggers on each table for each insert, update and delete which will log these sql statements into a log table.
2. Every 15 min a job will be scheduled to execute these sql statements at the remote country site via dblink.
3. For each successful execution at the remote site, that record will be deleted from the log table.
Is this the correct approach? While executing the sql statements against the remote site, if a network outage happens, what will be the consequences and how should I take care of it?
Could you please help me on this?
Thanks and Regards
Neelz
March 16, 2005 - 10:05 am UTC
you have read only snapshots for this - for reporting purposes, you need nothing else.
Just use a materialized view (snapshot), it is already done.
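A minimal sketch of that (the db link and table names are assumptions for illustration):
-- on the production (master) database: a primary-key materialized view log
create materialized view log on orders;
-- on the remote reporting database: a read-only snapshot, fast refreshed every 15 minutes
create materialized view orders_mv
  refresh fast
  start with sysdate
  next sysdate + 15/(24*60)
as
select * from orders@prod_link;
If a refresh attempt fails because the link is down, the job queue simply retries it later (until the job is eventually marked broken after repeated failures), and the materialized view log on the master keeps the pending changes until the snapshot has refreshed, so nothing is lost.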
How about putting the timestamp in the mview itself?
Anonymous, October 19, 2005 - 7:21 pm UTC
Here's a possible solution we came up with that works great on 9iR2 (our current system) but will not run on 10g. We decided not to use it because we'll be upgrading to 10g early next year.
The idea is to have the local systimestamp added as a column to the fast refreshed materialized view. This isn't normally allowed, but if you replace it with to_timestamp(to_char(systimestamp)) it -DOES- work (again, only on 9iR2).
Check out this example to see how only the changed rows get the new timestamp:
/*******************************************************/
/* Formatted on 2005/10/19 19:12 (Formatter Plus v4.8.0) */
SELECT *
FROM v$version
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
PL/SQL Release 9.2.0.5.0 - Production
CORE 9.2.0.6.0 Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.5.0 - Production
NLSRTL Version 9.2.0.5.0 - Production
5 rows selected
DROP TABLE tb_test
Table dropped
DROP MATERIALIZED VIEW mv_test
Snapshot dropped
CREATE TABLE tb_test (
a VARCHAR2(45)
,b NUMBER
)
Table created
INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted
COMMIT
Commit complete
SELECT *
FROM tb_test
A B
--------------------------------------------- --
/1005bd30_LnkdConstant 1
/10076b23_OraCustomDatumClosur 2
/10297c91_SAXAttrList 3
/103a2e73_DefaultEditorKitEndP 4
4 rows selected
CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES
Snapshot log created
CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
FROM tb_test t
Snapshot created
COMMIT
Commit complete
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- -------------------------------------
/1005bd30_LnkdConstant 1 19-OCT-05 07.13.59.781179000 PM
/10076b23_OraCustomDatumClosur 2 19-OCT-05 07.13.59.781179000 PM
/10297c91_SAXAttrList 3 19-OCT-05 07.13.59.781179000 PM
/103a2e73_DefaultEditorKitEndP 4 19-OCT-05 07.13.59.781179000 PM
4 rows selected
INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted
COMMIT
Commit complete
begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- -------------------------------------
INSERTED --> /1005bd30_LnkdConstant 6 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /10076b23_OraCustomDatumClosur 7 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /103a2e73_DefaultEditorKitEndP 9 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /10297c91_SAXAttrList 8 19-OCT-05 07.14.00.526721000 PM
/1005bd30_LnkdConstant 1 19-OCT-05 07.13.59.781179000 PM
/10076b23_OraCustomDatumClosur 2 19-OCT-05 07.13.59.781179000 PM
/10297c91_SAXAttrList 3 19-OCT-05 07.13.59.781179000 PM
/103a2e73_DefaultEditorKitEndP 4 19-OCT-05 07.13.59.781179000 PM
8 rows selected
UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1
1 row updated
COMMIT
Commit complete
begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- -------------------------------------
UPDATED --> /1005bd30_LnkdConstant 1 19-OCT-05 07.14.00.634670000 PM
INSERTED --> /1005bd30_LnkdConstant 6 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /10297c91_SAXAttrList 8 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /10076b23_OraCustomDatumClosur 7 19-OCT-05 07.14.00.526721000 PM
INSERTED --> /103a2e73_DefaultEditorKitEndP 9 19-OCT-05 07.14.00.526721000 PM
/10076b23_OraCustomDatumClosur 2 19-OCT-05 07.13.59.781179000 PM
/10297c91_SAXAttrList 3 19-OCT-05 07.13.59.781179000 PM
/103a2e73_DefaultEditorKitEndP 4 19-OCT-05 07.13.59.781179000 PM
8 rows selected
/*******************************************************/
/******************************************************
Here's just the sql script itself so that you don't have to cut/paste (I also commented out the drop lines):
/*****************************************************/
SELECT *
FROM v$version;
--DROP TABLE tb_test;
--DROP MATERIALIZED VIEW mv_test;
CREATE TABLE tb_test (
a VARCHAR2(100)
,b NUMBER
);
INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5;
COMMIT ;
SELECT *
FROM tb_test;
CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
FROM tb_test t;
COMMIT ;
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5;
COMMIT ;
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1;
COMMIT ;
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
/******************************************************/
On 10g the mview creation gives the same error that 9iR2 gives if you just put SYSTIMESTAMP by itself:
SELECT *
FROM v$version
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.2 - Prod
PL/SQL Release 10.1.0.4.2 - Production
CORE 10.1.0.4.0 Production
TNS for Linux: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.2 - Production
5 rows selected
SQL> CREATE MATERIALIZED VIEW mv_test
2 REFRESH WITH ROWID FAST
3 AS
4 SELECT t.*
5 , TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
6 FROM tb_test t;
, TO_TIMESTAMP (TO_CHAR (SYSTIMESTAMP, 'mm/dd/yyyy hh:mi:ssxff am'), 'mm/dd/yyyy hh:mi:ssxff am') row_lup_ts
*
ERROR at line 5:
ORA-12015: cannot create a fast refresh materialized view from a complex query
/*********************************************************/
Assuming that the refreshing of the mview is being done by a single job (say every X minutes), is there any way to have a serially updated column (either a timestamp or even a sequence #) that will work in both 9iR2 and 10g? Then the process that reads from the mview can always do a greater-than check based on the max value that was in that column last time and do its delta processing using only that chunk (and if nothing changed, then it processes nothing).
Thanks...
PS: Without using Oracle Streams to capture the changes, I'm talking only about straight mviews here...
October 19, 2005 - 7:59 pm UTC
that means it was a bug in 9i fixed in 10g - the timestamp should never have been allowed.
So is there any legitimate way of getting only the changed records from the mview?
Anonymous, October 20, 2005 - 12:26 am UTC
Is there a "legitimate" way of getting only the changed records for the mview? The DB itself knows what changed (it's a fast refreshable mview) so is there a way to get that info out to the application. This seems like there is a bit more leeway than just updates to a table since there is the serialized step of refreshing the mview (which is where I see some kind of monotonic value being added).
Any ideas?
October 20, 2005 - 8:06 am UTC
no legitimate way - but the materialized view log will have the primary keys of the changed rows........ and a dml type (insert/update/delete)....
unless you direct path, then you'll have rowid ranges instead of primary keys - so it can get tricky.
There is a thing called CDC (change data capture)....
There is a thing called Streams....
Both exist in current software to help solve the situation you have, of wanting to know about the changes yourself.
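Just to illustrate what that log contains (the log created in the scripts above was WITH ROWID, so it records rowids rather than primary keys): the log ends up in a table named along the lines of MLOG$_TB_TEST, and a query such as the one below would show the rowid, DML type and bookkeeping timestamp of each pending change. The column names here are recalled from memory and, as noted further down this thread, the MLOG$_ tables are undocumented internals -- treat this as a peek under the hood, not something to build an application on.
SELECT m_row$$    -- rowid of the changed row
     , dmltype$$  -- I, U or D
     , old_new$$  -- old/new value indicator
     , snaptime$$ -- refresh bookkeeping timestamp
  FROM mlog$_tb_test;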
How about using a deterministic function?
Anonymous, October 20, 2005 - 10:16 am UTC
I changed the script to create a deterministic function that returns systimestamp and it works in both 9iR2 and 10g.
Here's the modified script:
/* =================================================== */
/* Formatted on 2005/10/20 10:05 (Formatter Plus v4.8.0) */
DROP TABLE tb_test
Table dropped
DROP MATERIALIZED VIEW mv_test
Snapshot dropped
SELECT *
FROM v$version
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.2 - Prod
PL/SQL Release 10.1.0.4.2 - Production
CORE 10.1.0.4.0 Production
TNS for Linux: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.2 - Production
5 rows selected
CREATE OR REPLACE FUNCTION fdw_tmp_ts
RETURN TIMESTAMP DETERMINISTIC
IS
BEGIN
RETURN SYSTIMESTAMP;
END;
Function created
CREATE TABLE tb_test (
a VARCHAR2(45)
,b NUMBER
)
Table created
INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted
COMMIT
Commit complete
SELECT *
FROM tb_test
A B
--------------------------------------------- --
DUAL 1
DUAL 2
SYSTEM_PRIVILEGE_MAP 3
SYSTEM_PRIVILEGE_MAP 4
4 rows selected
CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES
Snapshot log created
CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, fdw_tmp_ts row_lup_ts
FROM tb_test t
Snapshot created
COMMIT
Commit complete
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
DUAL 1 20-OCT-05 10.06.29.013737000 AM
4 rows selected
INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5
4 rows inserted
COMMIT
Commit complete
begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
INSERTED --> SYSTEM_PRIVILEGE_MAP 9 20-OCT-05 10.06.29.526768000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 8 20-OCT-05 10.06.29.526460000 AM
INSERTED --> DUAL 7 20-OCT-05 10.06.29.526100000 AM
INSERTED --> DUAL 6 20-OCT-05 10.06.29.521610000 AM
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
DUAL 1 20-OCT-05 10.06.29.013737000 AM
8 rows selected
UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1
1 row updated
COMMIT
Commit complete
begin
DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
end;
PL/SQL procedure successfully completed
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC
A B TO_CHAR(ROW_LUP_TS)
--------------------------------------------- -- ----------------------------------------
UPDATED --> DUAL 1 20-OCT-05 10.06.29.675079000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 9 20-OCT-05 10.06.29.526768000 AM
INSERTED --> SYSTEM_PRIVILEGE_MAP 8 20-OCT-05 10.06.29.526460000 AM
INSERTED --> DUAL 7 20-OCT-05 10.06.29.526100000 AM
INSERTED --> DUAL 6 20-OCT-05 10.06.29.521610000 AM
SYSTEM_PRIVILEGE_MAP 4 20-OCT-05 10.06.29.017678000 AM
SYSTEM_PRIVILEGE_MAP 3 20-OCT-05 10.06.29.017600000 AM
DUAL 2 20-OCT-05 10.06.29.017503000 AM
8 rows selected
/* =================================================== */
Here's just the script itself (so you don't have to cut/paste):
DROP TABLE tb_test;
DROP MATERIALIZED VIEW mv_test;
SELECT *
FROM v$version;
CREATE OR REPLACE FUNCTION fdw_tmp_ts
RETURN TIMESTAMP DETERMINISTIC
IS
BEGIN
RETURN SYSTIMESTAMP;
END;
/
CREATE TABLE tb_test (
a VARCHAR2(45)
,b NUMBER
);
INSERT INTO tb_test
SELECT object_name
, ROWNUM
FROM all_objects
WHERE ROWNUM < 5;
COMMIT ;
SELECT *
FROM tb_test;
CREATE MATERIALIZED VIEW LOG ON tb_test WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW mv_test
REFRESH WITH ROWID FAST
AS
SELECT t.*
, fdw_tmp_ts row_lup_ts
FROM tb_test t;
COMMIT ;
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
INSERT INTO tb_test
SELECT 'INSERTED --> ' || object_name
, 5 + ROWNUM
FROM all_objects
WHERE ROWNUM < 5;
COMMIT ;
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
UPDATE tb_test
SET a = 'UPDATED --> ' || a
WHERE ROWNUM = 1;
COMMIT ;
EXEC DBMS_MVIEW.REFRESH(USER||'.MV_TEST');
SELECT a
, b
, TO_CHAR (row_lup_ts)
FROM mv_test
ORDER BY row_lup_ts DESC;
/* =================================================== */
How consistent do you think this would be in actual use? The function is only called on the changed records so only those timestamps are being updated (as in the example above)...
October 20, 2005 - 4:32 pm UTC
you are lying to us, it is not deterministic.
Play with fire.
You get burned.
Think about it.
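A quick illustration of why the DETERMINISTIC lie is risky: the keyword is a promise that lets Oracle invoke the function fewer (or more) times than there are rows, so the "timestamps" it produces stop meaning anything you can rely on. The sketch below (package and function names made up for illustration) simply counts how often a falsely-deterministic function actually gets called; the count you observe will vary with version and execution plan, and that variability is exactly the point.
-- count how often Oracle actually invokes a function declared DETERMINISTIC
CREATE OR REPLACE PACKAGE pkg_call_count AS
  g_calls NUMBER := 0;
END;
/
CREATE OR REPLACE FUNCTION f_fake_deterministic (p IN NUMBER)
  RETURN NUMBER DETERMINISTIC
IS
BEGIN
  pkg_call_count.g_calls := pkg_call_count.g_calls + 1;  -- side effect: call counter
  RETURN p;
END;
/
SET SERVEROUTPUT ON
SELECT f_fake_deterministic(1) FROM all_objects WHERE ROWNUM <= 10;
BEGIN
  -- may print anything from 1 to 10; you do not control it
  DBMS_OUTPUT.PUT_LINE('calls = ' || pkg_call_count.g_calls);
END;
/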
Query on the above thread..
A reader, January 22, 2007 - 1:23 pm UTC
Hi,
Pardon the ignorance, but would the above example not suffice if there were only conventional inserts (no direct path) and no updates, AND the refresh is done once a day?
Regards,
last modified record
D, October 10, 2007 - 6:40 am UTC
Hi Tom,
Could you please let me know whether there is any way to find out the last inserted record in a table?
Thanks,
Dheeraj
v$transaction solution
A reader, October 29, 2009 - 6:23 am UTC
Tom,
"select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction;"
Does this solution work in a RAC config ?
I'm having problems with it on Release 11.1.0.7 with 2 RAC nodes. A user has performed an update via one node (not committed). The batch job which selects from v$transaction can run on either node; if it happens to run on the node where the interactive user performed the update, it can see a value in V$TRANSACTION, otherwise it can't.
Any idea how I can get round this problem?
October 29, 2009 - 10:10 am UTC
use gv$transaction in RAC
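For reference, the RAC-aware version of the earlier query would presumably just swap in the global view, so that transactions on every instance are visible:
select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')), sysdate )
  from gv$transaction;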
Which changes does CDC keep track of?
A reader, May 28, 2010 - 5:38 pm UTC
Hi Tom,
If I have a table A with columns C1, C2, C3, C4, C5 and I configure CDC to keep track of changes (I/U/D) for table A only for column C5, will CDC capture changes if an update is done to table A on column C3?
My understanding is that for update statements, CDC captures a change only if the column on which CDC is set up to track changes is modified (in my example, CDC keeps track of updates to table A only if column C5 is modified).
Am I right?
Thanks in advance for your help.
A reader, May 04, 2011 - 2:42 pm UTC
We have a query in our application that is -
SELECT /*+ FIRST_ROWS (100) */
a_ordered.output_sequence_no
,a_ordered.message_type
,a_ordered.message_data
FROM
(
SELECT
a.output_sequence_no,
a.message_type,
a.message_data
FROM message_audit a
WHERE a.trading_day = :trading_day
AND a.exchange_code = :exchange_code
AND a.class IN ( SELECT ucp_message_class
FROM tramp_ucp_message_class
WHERE tramp_message_class = :message_class
)
and a.firm IN ( SELECT g.firm from operator_group g
WHERE g.group_name = :operator_group_name
AND g.trading_day = :trading_day
AND g.exchange_code = :exchange_code
)
AND a.output_sequence_no > :output_sequence_no
AND a.ora_rowscn <= ( SELECT NVL( tx.earliest_scn, db.current_scn ) AS query_scn
FROM ( SELECT MIN( start_scn ) AS earliest_scn
FROM v$transaction
WHERE name = USER||'_MESSAGE_AUDIT'
) tx,
v$database db
)
order by a.trading_day, a.exchange_code, a.output_sequence_no
) a_ordered
WHERE rownum < :req_message
/
The requirements for the query are:
1. Trade transactions will be processed and committed in parallel.
2. Each transaction can produce several new message audit records (no updates).
3. The query must return audit records in output_sequence_no (OSN) order, but can't return larger OSNs if a transaction with lower OSNs is still pending.
The problem:
1. When operating right at the point where active transactions are committing, the query is returning a record for a later OSN while earlier transactions are still pending. This results in records with earlier OSNs committed a fraction of a second later being missed from the audit trail because the later OSN is used to drive the next iteration of the query. This is the issue that the V$TRANSACTION data is being used to avoid, but clearly it is not working as expected.
The design is in several parts:
1. The incoming data is pre-processed in PL/SQL, which names the transaction using SET TRANSACTION command.
2. The name appears in V$TRANSACTION view as soon as real transactional work is done (INSERT, UPDATE or SELECT FOR UPDATE).
3. The queries are running in a loop to pick up new audit messages.
4. The queries use data from V$TRANSACTION to get the MIN(start_scn) for in-flight transactions and exclude committed rows with a higher ORA_ROWSCN.
We are making some assumptions as to how V$TRANSACTION works:
1. That all transactions will appear in V$TRANSACTION, if only briefly.
2. That the INSERT to the message_audit table will assign the new MESSAGE_ID from the sequence and add the transaction to V$TRANSACTION as an atomic step. This captures the appropriate start_scn.
3. That the entry stays in V$TRANSACTION until the commit is complete, even when synchronous DataGuard is in place.
I have looked on Metalink for any bugs around ORA_ROWSCN and can't find anything relevant to the problem we are experiencing.
Any help would be much appreciated.
May 04, 2011 - 3:36 pm UTC
what is an OSN?
I presume you meant "output_sequence_no"?
How is that number assigned? What rule do you have in place to ensure that a higher 'OSN' means "my transaction started after yours"?
That the INSERT to the message_audit table will assign the new MESSAGE_ID
from the sequence and add the transaction to V$TRANSACTION as an atomic step.
This captures the appropriate start_scn.
That cannot happen - it is not even feasible to consider it happening that way. You will have an entry in v$transaction as part of a sequential set of steps - things that take place one after the other.
The other issue is that queries against v$ views are not read consistent - they are volatile, in-memory data structures.
Instead of trying to say what this query is supposed to do for you - why don't you tell us the requirements of the process you are trying to accomplish?
A reader, May 05, 2011 - 2:31 am UTC
Ok, as simply as I can explain it: we have a business requirement to send audit details back to our members.
The process is that the member sends in the last output_sequence_no (OSN) they received; the OSN is a number generated by a sequence in Oracle. We then need to provide them with all their audit data since the OSN they provided. They request this in batches, which is why the rownum limit is in the query.
However, it is entirely possible that OSN 10 could be committed before OSN 9, so our requirement is to look at any in-flight transactions on the message_audit table. To cater for this we set the transaction name on inserts into message_audit so we can query v$transaction for those transactions.
The problem that is occurring is that when we send OSN 10 and do not include any rows below it, that audit data will never be sent, as the member will request OSN > 10 on the next feed.
Again any help would be gratefully appreciated.
Regards
Gareth
May 06, 2011 - 9:32 am UTC
why not just send them the last N committed transactions - mark them as you retrieve them as "sent". Wouldn't that be trivial to accomplish and solve the requirement?
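A rough sketch of that idea, assuming a hypothetical SENT flag column added to MESSAGE_AUDIT (the column name, the predicates and the batch handling are illustrative, not part of the original design):
DECLARE
   TYPE t_osn_tab IS TABLE OF message_audit.output_sequence_no%TYPE;
   l_osns t_osn_tab;
BEGIN
   -- flag up to :req_message committed-but-unsent rows in one statement;
   -- rows inserted by still-pending transactions are invisible here and
   -- simply get picked up on a later run, so nothing is skipped
   UPDATE message_audit
      SET sent = 'Y'
    WHERE sent = 'N'
      AND trading_day   = :trading_day
      AND exchange_code = :exchange_code
      AND ROWNUM <= :req_message
   RETURNING output_sequence_no BULK COLLECT INTO l_osns;

   -- ... hand the rows identified by l_osns to the member feed ...
   COMMIT;
END;
/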
Using Materialized View Log instead of CDC
Lal, September 13, 2012 - 8:47 am UTC
Tom,
I have a few queries related to change data capture.
Do you foresee any issues if I use MV logs for getting the data changes in OLTP (insert, update, delete) and then use the same for populating the DWH?
The same thing can be achieved with triggers as well. Will MV logs be more efficient compared to triggers?
Currently we use CDC and we are facing some issues with it (already raised with Oracle Support).
It seems that Golden Gate is the Oracle-recommended solution for CDC going forward, but that would mean extra license cost.
Is it possible to get the changes as insert/update/delete (IUD) with Golden Gate, similar to CDC?
September 14, 2012 - 6:45 pm UTC
you cannot use the mv logs, plain and simple. they are ours, they are undocumented, do not use them.
I would recommend either
a) streams
b) golden gate
yes, golden gate has the ability to process transformations just like streams (and more)
Extract delta from MV
karma, September 07, 2013 - 12:54 pm UTC
I have declared MV logs on 4 source tables (database A) and then created 4 materialized views on these tables (database B, using dblinks) with FAST refresh, which works fine. But these materialized views will be refreshed every 5 minutes, and I want to extract only the recent changes (i.e. the delta) and apply them to 4-5 different target tables. These are big tables and the target tables need to be refreshed instantly; how can I get just the delta?
It may seem obvious, but I am not able to design a reliable process/mechanism. Any help/suggestions?
September 10, 2013 - 9:38 am UTC
materialized views are not designed to provide you that information - they are designed for us to be able to apply deltas from one set of tables to another. We do not track the "deltas" of the "other" set of tables anywhere at all.
Now, you can have materialized views of materialized views - perhaps these 4-5 other tables should be materialized views themselves - of the materialized views.
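The shape of that suggestion would presumably be something like the sketch below. Object names are illustrative, and the exact fast-refresh restrictions for nested/multi-tier materialized views depend on the MV type and version (in particular, materialized views built on other materialized views generally need to be primary-key based), so check the documentation for your release.
-- log changes on the existing (first-level) materialized view itself...
CREATE MATERIALIZED VIEW LOG ON mv_source WITH PRIMARY KEY;

-- ...so a second-level materialized view can fast refresh from it,
-- applying only the delta each time it is refreshed
CREATE MATERIALIZED VIEW mv_target
  REFRESH FAST
  AS
  SELECT * FROM mv_source;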
Extract delta from MV
karma_hs, September 11, 2013 - 2:38 pm UTC
Ok. Is creating an MV log on the materialized view an option to get the delta and use it to feed the target tables? Any drawbacks or issues with this approach? Or is this not advisable?
September 23, 2013 - 5:51 pm UTC
you may have materialized views (mvs) of mvs, yes.
but we do the extraction from the mv logs - you do not.