Home>Question Details



Serge -- Thanks for the question regarding "OLTP to Datawarehose", version 9.2.0.4

Submitted on 27-Feb-2005 20:29 Central time zone
Last updated 16-Apr-2009 10:14

You Asked

Tom,

Given a production OLTP database with several huge partitioned tables, each of them with 
potentially 1 milion transactions per day, we need to copy/transfer nightly to a separate 
Data Warehouse those transactions (occured in OLTP from the previous copy/transfer 
process).
The purpose of the Data Warehouse is reporting, so the tables over here might have a 
similar structure to OLTP tables or might not.
Complex reports and queries will run against Data Warehouse and the execution as fast as 
possible is a requirement.

Which technology would you choose to transfer every night these transactions from OLTP to 
DW, considering that it's a huge volume of data to be copied, loaded, probably 
transformed or adjusted, and the OLTP must remain available to DML/DDL operations all 
this time.

Could you detail as much as possible ?

Thank you, 

and we said...

I might be looking at streams -- while the number 1,000,000 looks very large, it is a 
fairly small number in the year 2005.  This is not an intimidating amount of data to be 
moved. (not trying to downplay it, just saying that it isn't as overwhelming as it 
sounds)


streams will allow you to mine the redo log, capture the changes, stage the changes in a 
"message queue", perhaps even transforming the information (pre-process it).  The data 
warehouse would "subscribe" to this information and dequeue the messages and apply them.

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96571/toc.htm
 

Reviews    
2 stars what if huge update   February 28, 2005 - 10am Central time zone
Reviewer: Serge from Waterloo, Ontario
Thank you for your answer Tom.

One of my hesitation regarding Oracle Streams is related to updates. I didn't work yet a serious 
project based on Oracle Streams but I saw in one of the replies of your "asktom" threads that there 
is a problem with Streams:

If I have in my OLTP a huge update operation, which is going to update aprox 1 mil. rows, with 
Oracle Streams a sort of log will be generated, and on DW side, not the update operation itself 
will be "reproduced" but based on that log, one by one the rows on the DW table will be updated. 
If on OLTP side, by using a bulk update, doing all kind of tricks to improve performance it takes 
30-45 min to do the update, then on DW side wil take forever to update the rows one by one. That is 
unacceptable.

Please comment on that. 


Followup   February 28, 2005 - 10am Central time zone:

you have 1,000,000 transactions -- do they do bulk updates (like do these transactions update 
10,000 rows at a time??) ?

And even if they do -- short of doing the bulk update in both places (eg: replicate the processing, 
not the data) all replication techniques are going to devolve into row by row modifications. 

2 stars if not Oracle Streams then Exp/Imp   February 28, 2005 - 11am Central time zone
Reviewer: Serge from Waterloo, ONTARIO
Tom,

I understand from your answer that Oracle Stream will take the rows one by one and it will update 
in this manner the DW. It will take unacceptable amount of time.

If that's so, then what do you think about the following approach as an alternative:

Export from OLTP the subset of rows (inserted/updated) occurred since last export. The SELECT used 
over here will use of course, a timestamp: give me the rows updated/inserted between DATE1 and 
DATE2.

The Import will do its job if it is the case into a temporrary stage area, some transformations if 
required, and then in the final tables. We can use in this way performant things like bulk updates 
and inserts.

A set of Korne shell scripts and probably cronjobs will control the export/import.

It is not the classic incremental exp/imp I'm talking about.
It is somehow a "customized" solution, tailed on specific situation.

Please advice.
Thank you 


Followup   February 28, 2005 - 12pm Central time zone:

rather than solve hypotheticals and write lots of stuff to maintain -- I would use the builtin 
capability (no lie there, I would)

millions of transactions is "not a lot"

tell me -- do you update the same row 1,000,000 times in this system?

there are serious issues with that timestamp concept. 

2 stars   February 28, 2005 - 4pm Central time zone
Reviewer: Serge from Waterloo, ONTARIO
No, Tom I am not updating the same row.

Here is a usual daily scenario in OLTP production: files with payments are coming in, they get 
loaded (aprox. 1,000,000 rows/file), a response in a form of xml is going back, the 
client-institution analyses those payments from xml, makes some decisions and generates back 
another file. This new file is arriving to our OLTP, gets loaded as well, but this one updates the 
records of the first one with whatever is inside, based of course, on some business rules. This is 
one complete cycle in our OLTP.

I know that 1,000,000 rows/day/table is not that big deal, however there are about 12 partitioned 
tables involved in 6 similar cycles in OLTP.

1 update process is taking roughly 45 min as I mentioned, using any possible tricks to make it 
faster.

If I replicate with Oracle Streams this operation on DW side we're not gonna finish in reasonable 
time. I mean, if the process of refreshing DW is going to happen during the night, it must be done 
until morning when the client will interogate the DW and run the reports.
 


Followup   February 28, 2005 - 5pm Central time zone:

so, they you have 1,000,000 row updates to propagate -- your concern (move only the last change of 
the row) would be valid perhaps if you update the same row(s) over and over and over -- and instead 
of having 1,000,000 row changes -- you have say 10,000 (you update each row 100 times)


Why won't you finish i a reasonable time?  You'll probably finish faster than the OLTP process as 
you don't have to validate data, parse data, slow by slow process each bit of data.  You just have 
to propagate the changes. 

2 stars   February 28, 2005 - 5pm Central time zone
Reviewer: Serge from Waterloo
I'm sorry Tom, here I'm afraid I don't follow you.
 We are somehow back at the begining of discution. 

How I'm going to propagate 1,000,000 updates by using Oracle Streams in a reasonable time, when on 
DW side, this guy is doing the update one by one ?

Something I miss here ... 


Followup   February 28, 2005 - 6pm Central time zone:

yes, but what else?  you dump and load, you'll be one by one?  or am I missing something? 

2 stars   February 28, 2005 - 6pm Central time zone
Reviewer: Serge from Waterloo
By dumping and loading in DW, at least for inserts (maybe for updates also), I can use bulk 
inserts, external tables, insert as select ... and so on. Inserts in this way will be definitely 
faster than Oracle Stream if OS will insert rows one by one. 

Anyway, I don't know what to say more than that. It looks to me that every modern replication 
concept is not fast enough on the destination side, so, the old out-of-fashioned export/import is 
the guy who can save me. 


Followup   February 28, 2005 - 6pm Central time zone:

have you actually benchmarked it?

export import denote to me "slower than sin"... if you mean the tools export and import. 

2 stars How about CDC?   March 1, 2005 - 2am Central time zone
Reviewer: Sunil Shetty from Malacca, Malaysia
Hello Serge,

 I guess Oracle Change Data Capture will be useful for you.
Check out current Oracle Magazine 

or

http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_bi.html
 


3 stars Change Data Capture   March 1, 2005 - 10am Central time zone
Reviewer: Serge from Waterloo, ONTARIO
Thank you for this interesting ideea.

I read that article, and I do have 2 questions:

It looks to me that the Data Warehouse is installed just on another Schema within the same database 
as OLTP Schema.

But what if DW is on another differrent physical machine and differrent database ?

For example how you gonna manage the step:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .PREPARE_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'PRICE_LIST', -
    VIEW_NAME => :sv_price_list); 

If DW is on another database ? There is no reference to database. Do we have to use database links 
all over the places, or it just doesn't work ?

Second question: 

In the mentioned article there is the following statement:

"With CDC, data extraction takes place at the same time the INSERT, UPDATE, or DELETE operations 
occur in the source tables, and the change data is stored inside the database in change tables. "

Because I have to load files into the OLTP production database within a 2 hours window, how is 
going to impact performance the above statement ? Is it going to be a big impact or a neglijable 
one ? 


Followup   March 1, 2005 - 11am Central time zone:

only one word for you:

benchmark.


you are rejecting some things out of hand, like streams because it will be too slow, without 
actually knowing if that is true or not.

in 9i, the CDC is syncronous (and row level, at the row level, just like the other technology, only 
it is captured as part of the local transaction instead of being mined from the redo)


http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96520/toc.htm
has the technical details on CDC 

5 stars   March 1, 2005 - 11am Central time zone
Reviewer: A reader 
Tom,

Can Logical Standby Database be a solution for the original poster of the question.

Thanks. 


Followup   March 1, 2005 - 11am Central time zone:

logical standby is built on streams as a foundation technology.  If they want a row by row, column 
by column copy -- logical standby automates this, yes. 

3 stars CDC   March 1, 2005 - 11am Central time zone
Reviewer: Serge from Waterloo, ON
Yes, you're right. Only testing will answer to performance questions. 

However, regarding CDC, I didn't get the answer: is it working with DW sitting on another database 
on another machine or not ? If yes, DB Links should be used ?

Thank you. 


Followup   March 1, 2005 - 11am Central time zone:

you would use db links with CDC in a distributed environment. 

log into the database with the changes and push them through a dblink. 

4 stars Would transportable tablespaces help?   March 1, 2005 - 12pm Central time zone
Reviewer: Rich from Montreal
Hi Tom,

Would it be faster to copy the entire tablespaces (using export) to the Data Warehouse instead?

Rich 


Followup   March 1, 2005 - 12pm Central time zone:

so rebuild the dw ?  thought we were looking for a changes only approach 

2 stars distributed environment   March 1, 2005 - 12pm Central time zone
Reviewer: Serge from Waterloo, ON
On the DW side, is it correct to say:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .PREPARE_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
/* usage of DBLINK in this way is correct or not ?? */
    SOURCE_TABLE=>'PRICE_LIST@dblink_name_to_OLTP', - 
    VIEW_NAME => :sv_price_list); 

I couldn't find any specific example on using CDC between differrent databases.
 


Followup   March 1, 2005 - 12pm Central time zone:

you log into the local database, you have all of the changes views there, you can dump them out, 
you can push them through a dblink to the DW, whatever. 

4 stars Light comes   March 1, 2005 - 12pm Central time zone
Reviewer: Serge from Waterloo, ON
Now I think I understand what you meant:

In the context of what you just said, is it correct the following approach ?

 - On the same database as OLTP, I can build another Schema (as Subscriber) where I will put the 
subscriber's stuff: CDC Window, subscriber Views and so on. 
 - Keep the change tables and the publisher on OLTP Schema

 - Between the views from Subscriber's Schema and the remote DW database push data through DBLINKS.

Is this the way how you see the usage of CDC in distributed env. ?

Thanks again 


Followup   March 1, 2005 - 12pm Central time zone:

either something like that or a "dump, move, load" 

4 stars   March 1, 2005 - 1pm Central time zone
Reviewer: Serge from Waterloo, ON
Thank you very much for your expertise. 

And also many thanks to Sunil Shetty from Malacca, Malaysia who came with his CDC suggestion.

CDC in distributed environment sounds as a viable solution, at least on my case. 


3 stars OLTP to Datawarehose   March 28, 2005 - 12pm Central time zone
Reviewer: Lalit Dhingra from Gurgaon, Haryana, India
Just got to know 2 news things. Streams and CDC. Let me see how can I use them in my situation. 
Thanks. 


4 stars How to differentiate between a OLTP & DSS system   January 7, 2006 - 5am Central time zone
Reviewer: Baiju_P from India
Sir,

I wanted to know how to differentiate between an OLTP and DSS system.  Eg: Asktom, Goolge, Yahoo 
.... What type of database is it really.. Is it OLTP or DSS type database.

What are the things to know so as to differentiate between such systems.

Thanks 


Followup   January 7, 2006 - 10am Central time zone:

asktom is basically an OLTP system - quick, short, fast, well defined transactions.  
It has some DSS features as well - but you don't see them, I do (my reports).

Google - the search engine - I would classify that as a transactional system.  Run quick search - 
customize output based on search (ads) - show results.

Yahoo - pick one of the applications first (same with google really).  Probably many of them would 
be classified as transactional systems.


 

2 stars CDC - What you see is NOT what you get   May 15, 2006 - 10pm Central time zone
Reviewer: Randy from Chicago,IL
Oracle is packaging CDC under the banner of Real Time ETL process. But the purpose of staging area 
is some what defeated if you have to put subscriber tables in same OLTP location. We need a staging 
database where we can dump data and do some trasformation and of course we want to stay away from 
OLTP database while doing any database intensive job for DW only. Surprisingly oracle's own 
documentation showing staging which is same as subscriber database in CDC as separate database. For 
example 
http://www.nyoug.org/lisigcdc.ppt#2
But in reality they are using both data store (staging and oltp) in same location. I was 
exasperated by reading documentation of CDC and already on my way to a design to managemnt how to 
make real time dw. Now I am fidnding myself in a very compromising situation. Is oracle trying to 
change meaning of staging database ?
 


3 stars cdc ? stream ?   April 12, 2007 - 3pm Central time zone
Reviewer: Yoav 
Hi Tom,

Once per hour , i need to transfer date from OLTP to data warehouse.
The process include joining two huge partitioned tables.
The result of this join is about 100,000 rows.

1. Does cdc or streaming allow joining two tables and
  transfering only the results?
2. What seems to be a proper solution CDC or STREAM ?

Thanks


Followup   April 13, 2007 - 12pm Central time zone:

I don't understand the need for the join - so I cannot answer the question.

cdc and streams both will allow you to capture and process the row level changes to a set of tables. Knowing that - that you have the changes made to the two tables - only you can answer whether that is sufficient to complete your processing....
3 stars cdc ? streams ?   April 13, 2007 - 1pm Central time zone
Reviewer: Yoav 
Hi Tom,

" I don't understand the need for the join - so I cannot answer the question. "

To make it simple, lets say i DONT need to transfer each and every change that made to the following
two table, and i need to transfer only the results of this query:

SELECT emp.empno, emp.ename, emp.hiredate, dept.dname, dept.loc
FROM  emp, dept
WHERE emp.deptno=dept.deptno
AND  emp.hiredate > Last_Day(Add_Months(sysdate,-1))

Can i transfer the query results, using CDC/STREAMS ?

Thank You.


Followup   April 13, 2007 - 7pm Central time zone:

give better example - that would depend on when you executed that query and that just cannot be correct.


using streams, you can get the logical change records, and see in your custom apply routine if you want to apply it or not.


Seems like you should just be using a materialized view perhaps however?
4 stars Use Shareplex from Quest Software   April 15, 2007 - 5am Central time zone
Reviewer: Mihai Ghita from Bucharest, Romania
I would try to test shareplex.

Is platform and oracle version independent. Has been tested for more than 8 years. Works with all platforms from 7.3.4 to 10g R2 in any combination for all hardware platforms oracle certified.

I've done many implementations for heavy OLTP systems to DW and I'll recommend it.

Mihai


5 stars reader   May 1, 2007 - 2pm Central time zone
Reviewer: A reader 
It seems CDC works on the streams technology
That is, it track changes on an oracle table of the changes using steams technology while DMLs are taking place.

Is it possible to SQL*Load a flat file from mainframe into a oracle table T1.

After 0ne week, if I SQL*Load a flat file extractd from the SAME table of a mainframe into an oracle table T2

Can I compare T1 and T2 and find the differences using CDC



Followup   May 1, 2007 - 4pm Central time zone:

... Is it possible to SQL*Load a flat file from mainframe into a oracle table T1. ...

sure...


but you would not use CDC to "find the differences" between two flat files. change data capture is used to capture changes made to an Oracle database, you are not making changes to an Oracle database.

However, if you have a file and load it up into a table, Oracle can certainly "compare two tables" for you using sql quite readily.
5 stars Change tracking with orarow_scn   October 6, 2008 - 2pm Central time zone
Reviewer: bakunian from OC, CA
Tom,

1. Do you think orarow_scn can be used to track changes?
For example on OLTP side enable orarow_scn on staging side create table that will store only last 
orarow_scn value. Then ETL extract from OLTP will only pickup rows that have their orarow_scn newer 
than the staging orarow_scn.

2. What would happen if I use Oracle 10gR2 Asynchronous AutoLog Archive CDC feature and prod DBA 
clears redo log?


Thank you for your feedback.


Followup   October 7, 2008 - 1pm Central time zone:

1) it could be, but at a huge cost (full scan every time). I'd rather use almost any other documented approach (streams for example)

2) if you clear it before it could be used, you will break the stream - just like you would obviate all of your backups
4 stars Change tables, data clean up and sizing considerations   November 14, 2008 - 7am Central time zone
Reviewer: Jayadevan from India
Hi Tom,
1)
How much volume do the change tables occupy? It will definitely depend on the coulmns and data. In 
addition to that, won't it depend on the duration for which data will be kept in those tables?
2) How can we manage when/how the change tables are purged/cleaned?
Regards,
Jay


5 stars CDC and referential integrity   November 24, 2008 - 3am Central time zone
Reviewer: Jay from India
Hi Tom,
We are planning to move changed data from our OLTP system to a data mart using CDC. Here is one 
doubt I have about the design. Let us say we have 2 tables that provide data to the Customer 
dimension - Customer master and Customer address. When the customer record is created with address 
details, both records are available in the subscriber view. The tables are joined and data is 
populated in DIM_CUSTOMER - name coming from master table and address coming from child table. Now, 
the customer address is changed in OLTP. If we extend the window,the subscriber view for customer 
master has no records and view for customer address has one record. We can not join these two views 
to populate DIM_CUSTOMER. Could you please suggest what the right approach is?


Followup   November 24, 2008 - 6pm Central time zone:

I'm not following why you would even join them?
3 stars CDC   November 24, 2008 - 11pm Central time zone
Reviewer: Jay from India
HI,
These are the transaction tables
create table cust_mst(
id number primary key,
name varchar2(50),
dept  varchar2(50));


create table cust_add(
id number
 ,addr varchar2(50),
constraint fk_cust_mst foreign key(id) references cust_mst(id))


DW table
create table dim_customer (ID_CUSTOMER  number , 
-- surrogate key
customer_prod_id number, 
-- production id 
name varchar2(50),
dept  varchar2(50),
addr varchar2(50)  
-- address column
)


I would be doing a 
select 
A.id customer_prod_id ,
A.name name,
A.dept department,
b.addr address
from 
cust_mst_change_view a,
cust_add_change_view b
where a.id=b.id
to get the data to populate DIM_CUSTOMER
The actual tables have many more columns.
It is possible that only cust_mst_change_view has data for a particular customer or 
cust_add_change_view alone has data. 
Usually I would depend on change date/timestamp columns in the source system to pull necessary 
data, ensuring referential integrity. In this scenario, source system is not tracking that, so I am 
thinkng about CDC.


Followup   November 25, 2008 - 10am Central time zone:

Like I said - I'm still not sure why you would join - normally, people would just store in a "star schema" (that is, your existing tables look right) I don't get why you are joining????

You refresh the cust fact table
You refresh the cust_address dimension table

If you feel the need to denormalize, you'll have to basically full outer join (nasty) and pick up the lastest set of information for the missing data from the base tables.

Since you want the latest change for the customer record to be joined with the address record, seems to make sense to join the address change view to the cust_mst table and the customer change view to the address table (to always get the latest/greatest)

What happens when you have two updates to a cust_mst record (and there are two records in the view) and two updates to a cust_addr record (and there are two records in the view) - your join will get 4 records, a cartesian join.

I'm not sure you have thought this all of the way through - the join still confuses me (as to "why")
5 stars CDC   November 26, 2008 - 12am Central time zone
Reviewer: Jay from India
Hi Tom,
You got the reason - denormalising. May be I should have mentioned that word in my explanation and 
it would have been easy. The customer data is available in a few tables in the transactional 
system. It is a frequent flyer program database. It has corporate and individual customers. Since 
the attributes are different, the source has a few tables to capture all the attributes- both for 
individual and corporate customers. That is why I have to denormalise to gather all the information 
that needs to go into DIM_CUSTOMER.
Joining with source tables - is an option. There could be different scenarios - right? Only address 
changed, no changes to other master data, master data changed, address has not changed and so on. 
It will again get a bit complex.
The datawarehouse will be in a data centre in USA and transactional db in UK. So connecting via a 
DB link - not very sure about performance. Is there some way of ensuring referential integrity in 
change tables? To me, that seems to be the best way out. If there is any change in any one table, 
data will get into the change tables for all related tables too (even if data in those tables have 
not changed). Like a mateerialized view with many underlying tables. Any other ideas are welcome. 
As of now I am proceeding with an approach that will involve many steps to load just one dimension, 
not very convincing.



Followup   November 28, 2008 - 4pm Central time zone:

I still don't see why you would join the changes to the changes.


and it seems you really wanted an incrementally maintained materialized view - why do this with cdc - it seems way harder.

there are three scenarios

o customer record changed
o address record changed
o customer and address record changed

seems you would want to process the customer records first

merge into your_copy
using (select * from customer_changes ) cc
on (join keys)
when matched then update (customer information)
when not matched then insert (customer information)


and same with address - no?
5 stars CDC   November 26, 2008 - 12am Central time zone
Reviewer: Jay from India
Hi Tom,
Please read my post just above this post. I am trying to clarify it a bit more. Both the tables I 
mentioned- Custmst and custaddr, have reference data. So I need to join them to get data for my 
DIM_CUSTOMER (denormalising?). The fact data would come from accrual/redemption and related tables. 
I have not reached that far yet.


5 stars CDC   December 4, 2008 - 3am Central time zone
Reviewer: Jay from India
Hi Tom,
Thanks a lto for the reply. I wish there were only 3 tables involved. I am following SCD2 (new record) for some columns like address) , SCD1 for some coulmns (like e-mail, telephone number) and so on. I am not sure how I can use an MV for these. The tables involved are joined like this -
from stg_cusmst A
inner join stg_cusindinf B on B.CUSNUM = A.CUSNUM and
                            B.CMPCOD = A.CMPCOD
inner join stg_cuscntinf C on C.CMPCOD = A.CMPCOD and
                            C.CUSNUM = A.CUSNUM and C.ADRTYP = B.PRFADR
left outer join stg_comctrmst L on L.CTRCOD = C.CTR and L.CMPCOD = C.CMPCOD
left outer join lkp_region N on N.COUNTRY_CODE = C.CTR
left outer join  stg_cuscntinf D on   D.CMPCOD = A.CMPCOD and
                            D.CUSNUM = A.CUSNUM and D.ADRTYP <> B.PRFADR   
left outer join stg_comctrmst M on M.CTRCOD = D.CTR and M.CMPCOD = D.CMPCOD
left outer join lkp_region O on O.COUNTRY_CODE = D.CTR
left outer join stg_comonetim E on E.CMPCOD = B.CMPCOD and E.FLDVAL = B.MEMTLE and 
                           E.fldcod = 'common.member.title'
left outer join stg_comonetim F on F.CMPCOD = B.CMPCOD and F.FLDVAL = B.INCBND and 
                           F.fldcod = 'common.member.incomeBand'
left outer join stg_comonetim G on G.CMPCOD = B.CMPCOD and G.FLDVAL = B.IDTTYP and 
                           G.fldcod = 'customer.profile.typeofindustry'    
left outer join stg_comonetim H on H.CMPCOD = B.CMPCOD and H.FLDVAL = B.MEMGND and 
                           H.fldcod = 'customer.profile.gender' 
left outer join stg_comonetim I on I.CMPCOD = B.CMPCOD and I.FLDVAL = B.MRLSTA and 
                           I.fldcod = 'customer.profile.maritalStatus' 
left outer join stg_comlngmst J on J.LNGCOD = B.PRFLNG and J.CMPCOD = B.CMPCOD
left outer join stg_comctrmst K on K.CTRCOD = B.MEMNLY and K.CMPCOD = B.CMPCOD

I am not going into the business significance of the columns/tables and filters like those on stg_comonetim which you will see many times.
Even with these (especially the SCD - different approach for different columns), would you suggest MV? Please advice

By the way, have a great time in India, wish I could be there in Bangalore to attend your sessions. Our company is in cost-cutting mode, so can't make it.

Followup   December 9, 2008 - 11am Central time zone:

depends on the relationships with the outer joins - are they 1:1 optional or 1:m

and it seems you want MERGE doesn't it, not selecting from a JOIN.
5 stars CDC   December 18, 2008 - 7am Central time zone
Reviewer: Jay from India
Hi Tom,
I still have doubts about MERGE approach since we will have to do merges with 6-7 tables for some 
of the dimensions. 
Anyway, we decided not to use CDC. We split the design into two parts - for cases where we have 
tables with up to a million records, we will use MERGE . We are using a tool called Pentaho for 
ETL. So I am not sure if it uses MERGE command or not - but effectively, that is what will happen. 
For big tables (with more than a million records - you may say those are not big :) ), the source 
system owners agreed to have accurate timpestamping of the records. We sill use those to eliminate 
old records.
Thanks for your suggestions


4 stars Streams on DWH Vs noarchive log mode   April 5, 2009 - 2am Central time zone
Reviewer: Yossi from Israel
Since you offered using streams, I am wondering about the effect on my DWH environment moving it to 
noarchive log mode


Followup   April 7, 2009 - 5am Central time zone:

I couldn't follow your question...

If you move to noarchive log mode, you will not be using streams to capture changes.

If you have a DWH, you typically do not capture changes from it.

So, not really sure what you are asking
2 stars   April 7, 2009 - 9am Central time zone
Reviewer: A reader 
My mistake "noarchive log mode" should be written "archive log mode"
Currently our DWH is running with no archive log mode, I am planning to implement CDC from my 
operational database to my DWH DB.
Since the DWH is going down for a cold backup on weekends, the changes from the operational system 
could not be captured.
It looks like I will have to put the DWH in archive log mode
Is it acceptable that the DWH will work in archive log mode?
is there any other solution ?


Followup   April 13, 2009 - 12pm Central time zone:

... Is it acceptable that the DWH will work in archive log mode? ...

of course, some things will change - you will generate REDO for direct path operations (which you are not doing so now - a create table as select is not generating redo for you now, it will when you go into archive log mode). Expect that, size for that - make sure you have sufficient online redo log capacity and contention free archive destinations.

... Since the DWH is going down for a cold backup on weekends, the changes from the
operational system could not be captured.
...

I did not understand the connection between the two. So the data warehouse goes offline, why does that prevent you from capturing changes?
4 stars Sreams question   April 16, 2009 - 3am Central time zone
Reviewer: yoav ben moha 
Hi Tom ,

"I did not understand the connection between the two. So the data warehouse goes offline, why does 
that prevent you from capturing changes? 
"

If the target instance (data warehouse) goes offline for the weekend , and the source instance 
generate alot of archives during this time - they could not be applied during that time.

After 16 tries the propogation process will change it status to abort :
(http://www.mcs.csueastbay.edu/support/oracle/doc/10.2/server.102/b14229/strms_qpmon.htm)

More than that , since RMAN 10g is stream aware , most of the archive logs in the source instance 
will keep on the disk , and could not backedup to tape - which can lead to 
an archive error in the source instance.



Followup   April 16, 2009 - 9am Central time zone:

and after 16 tries, the job 'breaks' and you reset it - so what????


you would not lose a thing. The database is sort of there to ensure "data is not lost"


so, back to square one - why does that prevent you from ANYTHING.
4 stars Streams - Noarchivelog - DWH   April 16, 2009 - 10am Central time zone
Reviewer: yoav ben moha 
Hi Tom,
Thank you for your feedback.

"
so, back to square one - why does that prevent you from ANYTHING. 
"

Yet , if the source instance is in Archivelog mode and the target instance (DWH) is in NoArchivelog 
mode ,
and as i wrote since RMAN 10g is stream aware , most of the archive logs in the source instance 
will keep on the disk , and could not backedup to tape - which can lead to 
an archive error in the source instance.

The only way i may prevent it from happening is:
1. move the target instance to Archivelog mode - which 
   definitely lead to performance downgrade.
2. Make sure that there is suffient space in the archive 
   log directory in order to prevent the archive log 
   directory to became full. 

Thanks Again






Followup   April 16, 2009 - 10am Central time zone:

if you use streams, streams mines the redo and puts it into a message queue, not sure where you are going here - streams runs with the source database.


1) why? why would you move the target, destination to archive log mode for this??? The SOURCE database needs to be.


2) see initial comments.

Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement