A reader, March 13, 2013 - 11:15 am UTC
"Assume Datawarehouse Database is in Archivelogmode
a. 1000 Tables
b. 10 different Tablespace
c. Total Size of 2 TB
d. 500 Tables are loaded in nologging mode. (To reducde redo log generation
(These tables can be easily reconstructed in case of loss)
I need to capture change for 10 Tables apply to Datamart Database
a. All these 10 Tables are in 1 Tablespace
b. size of these 10 Tables is around 50GB (1/20 of actual size of Database )
c. These 10 Tables are having loggind mode. "
Tom
What do you suggest apart from GoldenGate? We don't have money to buy GoldenGate. :) How can we replicate the data with minimum impact on the Source database and minimum delay in the Target database?
Thanks
A reader, March 13, 2013 - 12:30 pm UTC
Tom
1. All the tables that I need to replicate are logged. The Source Database is in Archivelog mode. The other tables, which are not of interest to me, are not logged - so I don't care about them.
2. Why do I need to replicate? Because there are 600 users on my Source Database, while the expected users on the Target database would be 25,000. I don't want to burden my Source Database with 25k users. They want this database to be highly available.
Thanks
Koshal
March 13, 2013 - 12:47 pm UTC
get a bigger machine, seriously - I'm not kidding. A single big machine (or cluster of machines) would be infinitely preferable to multi-database and replication.
And it would be much more highly available.
You do understand that replication is a burden on the source machine as well - right...
Anyway, when you phrased the question above, you should have just said:
I have 10 tables in a single tablespace that are definitely logged and want to replicate them.
it would make it a lot easier to understand....
everything else you wrote just confused the issue. 1,000 tables - not relevant. 10 different tablespaces, not necessary to know. 2tb - irrelevant. 500 tables (of which the 10 in question are not part of) are done in no-logging mode - well, that doesn't mean anything to us...
You have 10 tables of 50gb in size that you want to replicate. That is all we know - no transaction volumes/rate of change against them. No information on the amount of redo generated by the database (any log based replication is going to have to look at the entire redo log stream, every bit of it).
The link above points you to a paper that describes our future directions with respect to replication. goldengate is the suggested path, materialized views are there as well. CDC is not a path to take at all. streams could be used, but is on a similar trajectory as CDC in the future.
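Since materialized views come up as a supported path, a minimal sketch of read-only materialized view replication for one of those tables might look like the following (the table name T1, the link name and the 30-minute interval are made up for illustration; note that fast refresh does require a materialized view log on the source):

```sql
-- On the source (DB1): a materialized view log lets fast refresh ship
-- only the changed rows instead of re-reading the whole table.
CREATE MATERIALIZED VIEW LOG ON t1 WITH PRIMARY KEY;

-- On the target (DB2): a read-only materialized view over a database
-- link, fast-refreshed every 30 minutes.
CREATE MATERIALIZED VIEW t1
  REFRESH FAST
  START WITH SYSDATE NEXT SYSDATE + 30/(24*60)
AS SELECT * FROM t1@db1_link;
```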
A reader, March 13, 2013 - 1:33 pm UTC
Thanks a lot Tom for your quick response!
So what I understood from your answer is that even if I want to capture changes for only 5% of the tables that generate redo, I need to look at each and every log file. This does not seem to me an optimal way of replicating. So we don't have any sort of index over here which says redo log file "RD1" contains data of interest to Target Database "TD1" and "RD2" contains data of interest to "TD2". Why do I need to read 1000 redo log files when the data that I am interested in is available in only 1 file?
I know adding an index is overhead. But, in this case, the benefit definitely outweighs the cost.
The reason why I mentioned the 1000 Tables and 2TB is to give you the size of the Database; the 500 Tables to tell you that the database generates a lot of redo and only 50% of the tables generate redo; the 10 Tables that need to be replicated to tell you that I am interested in only 1/20 of the information in the redo. Assuming Oracle is smart enough to know which Redo file to read. :)
Thanks a lot for all your help!
March 13, 2013 - 1:42 pm UTC
... Why do I need to read 1000 redo log files when the data that i am
interested is available in only 1 file? ...
redo is a continuous stream of information - what you need is *not* in a single redo log file or even a single set of redo logs - it is spread all over the place and intermingled with all of the other redo being generated. it is not organized, clustered together, it just lands where it falls.
... But, in this case, the benefit definitely outweighs the cost....
no, no it wouldn't. it would be a huge cost to something that has to be fast - and of very very very limited use in the real world.
... Reason why I mentioned about 1000 Tables and 2TB is to give you size of
Database. ....
but I didn't need to know that, what we'd need to understand would be volume of transactions against your 10 tables and total redo volume for the database.
... the 10 Tables that need to be replicated to tell you that I am interested in only 1/20 of the information in the redo. Assuming Oracle is smart enough to know which Redo file to read. :) ...
it is - it *knows* it has to read *all* of the redo since it could be in there all over the place. redo isn't a neat and tidy bit of information - it is a continuous stream of data.
reading the redo isn't going to be a problem, it isn't going to be your bottleneck or overhead. That part doesn't faze me - if you can write it, we can read and process it. don't worry about that.
what you need to worry about is the pain in the butt that is this thing called replication. It will consume your time like you won't believe.
A reader, March 13, 2013 - 2:20 pm UTC
I agree with you that we don't know which file to read if I am simultaneously updating tables that need to be replicated and tables that do not need to be replicated.
For Example take this Scenario
1st Business day of the month - there is going to be a huge volume of data. It updates Tables T1...T100 and generates redo files RD1..RD1000, totalling 100TB.
From the 2nd Business day till the second-last business day it updates Tables T101...T200 and generates redo log files RD1001 to RD1100, around 10TB.
On the last business day it updates Tables T201 to T300 and generates redo log files RD1101 to RD1200, another 10TB. These are the tables that need to be replicated to the Target Database. I feel RD1101 should not in any way be dependent on any of the files from RD1..RD1000. I feel there is definitely going to be some cost in reading that 100TB of redo log files which is not required, because they are old and in no way related. I know what I am doing on the 1st business day, and I don't want to read files RD1..RD1000 either on the first business day or on the last business day.
Thanks a lot again for your time!
March 14, 2013 - 8:24 am UTC
but you see that you are a very very very special case, different from pretty much everyone else and that such a micro optimization - which would penalize everyone else (having to "index" redo logs) - wouldn't make sense in general.
the cost of having to read those files is going to be less than maintaining a database of precisely what is recorded in each redo log file over time.
any and all log based replication mechanisms - from Oracle, from 3rd parties, will have to process the redo stream as a continuous thing - that is the only way to read redo and make sense of it. redo is like a story of the database.
tell you what - why don't you just take the inputs and the process you run on database 1 and run *the same exact thing* on database 2? Why bother with replication at all?
Or, since the data is read only most of the time, do your bulk load, make the tablespace read only and transport it.
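The bulk-load-then-transport idea above could be sketched roughly like this (the tablespace, directory and file names are hypothetical):

```sql
-- On the source: after the bulk load, freeze the tablespace.
ALTER TABLESPACE ts1 READ ONLY;

-- Then, from the OS, export its metadata with Data Pump:
--   expdp system DIRECTORY=dp_dir DUMPFILE=ts1.dmp TRANSPORT_TABLESPACES=ts1
-- copy the ts1 datafiles plus ts1.dmp to the target host, and plug in:
--   impdp system DIRECTORY=dp_dir DUMPFILE=ts1.dmp
--         TRANSPORT_DATAFILES='/u01/oradata/ts1_01.dbf'
```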
eg: don't do replication (again, I come back to this!)
Combination of Rman/Transportable Tablespace
A reader, March 14, 2013 - 3:54 pm UTC
I can't believe the amount of passion you have for Oracle Database.
1. We can't do Transportable Tablespace as Data needs to be refreshed every 1/2 hour
2. Don't want to have mlogs on Source DB. It is going to burden the Source Database
3. Need to have all the changes that happened in Source DB to Target DB.
4. I am looking for solution which is sleek. Something like this
Initial Setup
Using Transportable Tablespace I will have an exact copy of Tablespace TS1 on both DB1 (Source DB) and DB2 (Target DB).
Schedule RMAN every 1/2 hour so that whatever block changes happened on the TS1 datafiles of DB1 are applied on Database DB2.
Then export/import the data dictionary for TS1 from DB1 to DB2.
Is it possible? :)
The advantages of this approach are:
1. I am not transferring the whole datafile, I am transferring only the blocks that got changed
2. Faster/Quicker
3. No network issues.
What am I missing here? Why this does not work?
Thanks
March 25, 2013 - 9:21 am UTC
I will again state: do not replicate, sounds like a single system to me.
I don't know who you are or what your specific needs are - you don't tell us very much... but...
You can either:
a) use active data guard, that will allow you to open the failover database read only for queries. You can freely query the data for reports. assuming that was your (unstated) goal.
b) use a logical standby, you will be able to query the data copied from production and add auxiliary structures like indexes/materialized views - and create read/write tables that do not exist at production
c) use golden gate, that would allow you to do log based replication fully.
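For option (a), the part that matters to the read-only reporting users can be sketched as follows (assuming a physical standby already exists; with the Active Data Guard option licensed, the standby stays open for queries while redo keeps being applied - "real-time query"):

```sql
-- On the standby: pause redo apply, open read only, then resume apply.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT;
```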
If you use rman, you would take an outage every half hour to apply a half hour of block changes to the system and then recover them - which in itself could take the half hour - and then you'd be able to open READ ONLY. It wouldn't be practical and it would be hugely labor intensive.
3rd party replication option
Wagner Ramos, March 19, 2013 - 10:12 am UTC
Why not use some 3rd party replication software that can be configured to collect data changes only for these 10 tables?
I believe you can find 3 or more software products that can collect data changes with triggers and solve your needs without changing log modes, etc.
The impact on the source database will be only on insert/update/delete operations for these 10 tables.
If you have some test environment and would like to try, I can help - my solution is not as expensive as GG.
March 25, 2013 - 3:20 pm UTC
you mean like golden gate. It will only collect data changes for the 10 tables *from the redo*.
hah, triggers - you want to see something run *slower than slow*, go for triggers!!! (and by the way, advanced replication which you already own does that but violates the "do not impact my production site please".)
anything trigger based will definitely, massively and measurably and in so many bad ways affect your production site.
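To make that overhead concrete: in its simplest hypothetical form, trigger-based capture means every single row inserted, updated or deleted on the source also performs a synchronous insert into a change table, inside the same transaction (the table and column names here are invented):

```sql
-- Hypothetical change table and row-level capture trigger on T1.
CREATE TABLE t1_changes ( op VARCHAR2(1), pk NUMBER, ts TIMESTAMP );

CREATE OR REPLACE TRIGGER t1_capture
AFTER INSERT OR UPDATE OR DELETE ON t1
FOR EACH ROW
BEGIN
  -- This extra insert runs for every row touched on the source -
  -- exactly the production impact being warned about above.
  INSERT INTO t1_changes
  VALUES ( CASE WHEN INSERTING THEN 'I'
                WHEN UPDATING  THEN 'U'
                ELSE 'D' END,
           COALESCE( :new.id, :old.id ),
           SYSTIMESTAMP );
END;
/
```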