Copy data from Transaction database (schema) to Reporting database (schema)


Question and Answer

Tom Kyte

Thanks for the question, Rajeshwaran.

Asked: August 05, 2010 - 2:05 pm UTC

Last updated: June 22, 2011 - 12:13 pm UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

Tom:

We are planning to move a copy of our transaction tables' data into another database (or schema) for our reporting applications (say, a refresh will happen every day at midnight).
The transaction tables are hash partitioned (8 partitions with 400 million rows).
To make this happen I am planning to:
1) Truncate the data in the reporting database (or schema).
2) Direct-path insert into the reporting database (or schema) as select * from the transaction tables.
3) Rebuild the indexes and enable the constraints.

If the transaction tables were range partitioned by some transaction date, then I could directly pull the latest transaction data (partitions) rather than pulling the old 5 years of data (which will never get modified).

Do you think any other approach is better than the above? Do we have any special utility/API available in 10g to handle this? I know Data Pump is there from 10g to do this smarter.
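
For illustration, a minimal sketch of this plan; the schema, table, index, constraint, and database-link names here are hypothetical and would need to be adapted:

-- 1) truncate the reporting copy; skip index/constraint maintenance during the load
TRUNCATE TABLE rpt.transactions;
ALTER TABLE rpt.transactions DISABLE CONSTRAINT transactions_fk1;
ALTER INDEX rpt.transactions_ix1 UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- 2) direct-path load from the transaction schema (here over a db link)
INSERT /*+ APPEND */ INTO rpt.transactions
SELECT * FROM txn.transactions@oltp_link;
COMMIT;

-- 3) rebuild the index and re-enable the constraint
ALTER INDEX rpt.transactions_ix1 REBUILD NOLOGGING;
ALTER TABLE rpt.transactions ENABLE NOVALIDATE CONSTRAINT transactions_fk1;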

and Tom said...

I would not consider your path at all.


Paths to consider:


- if you want to do the entire reload, I would transport from backup sets.

http://docs.oracle.com/cd/B19306_01/backup.102/b14191/ontbltrn.htm#CACFBHIE

You would be restoring your backups of your OLTP system to your reporting system, basically. A lot faster than unloading and reloading/reindexing all of your data.
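
A minimal sketch of what that looks like in RMAN; the tablespace names, staging destinations, and point in time are assumptions. The command produces the datafile copies plus an export dump and an import script to plug the tablespaces into the reporting database:

RMAN> TRANSPORT TABLESPACE txn_data, txn_idx
        TABLESPACE DESTINATION '/stage/tts'
        AUXILIARY DESTINATION  '/stage/aux'
        UNTIL TIME 'SYSDATE - 1/24';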


- Use data guard. Run it open READ ONLY for reporting in the daytime and put it back into managed recovery mode at night to catch it up.


that way you have a failover site as well. If you wanted to, in 11g you could use Active Data Guard and even have the reporting instance maintained during the day.
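
On a 10g physical standby, that daily toggle is roughly the following sketch; depending on release and configuration, a restart to MOUNT may be needed before redo apply resumes:

-- morning: stop redo apply and open for reporting
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;

-- night: put it back into managed recovery to catch it up
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;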


- use streams to replicate the changes only - instead of everything

Fewer resources all around - just move the changes; sort of like data guard above (data guard would be very automated for you, streams you would "program" a little bit)
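
A sketch of the Streams capture side only, run on the source database; the queue names and schema name are assumptions, and propagation and apply rules plus instantiation on the reporting side are still required:

BEGIN
  -- create the Streams queue that the capture process will enqueue into
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'strmadmin.txn_q_tab',
    queue_name  => 'strmadmin.txn_capture_q');

  -- capture all DML for the TXN schema
  DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
    schema_name  => 'TXN',
    streams_type => 'capture',
    streams_name => 'txn_capture',
    queue_name   => 'strmadmin.txn_capture_q',
    include_dml  => TRUE,
    include_ddl  => FALSE);
END;
/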

- use basic replication and materialized views

Very simple to set up and maintain; some impact on the existing OLTP system that would need to be tested, as you would want materialized view logs.
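
A minimal sketch, with hypothetical table, MV, and db-link names: the materialized view log lives on the OLTP side so the nightly fast refresh ships only the changed rows.

-- on the OLTP database: a materialized view log on the master table
CREATE MATERIALIZED VIEW LOG ON txn.transactions WITH PRIMARY KEY;

-- on the reporting database: a fast-refreshable copy over a db link
CREATE MATERIALIZED VIEW rpt.transactions_mv
  REFRESH FAST ON DEMAND
  AS SELECT * FROM txn.transactions@oltp_link;

-- nightly refresh, e.g. from a scheduler job ('F' = fast/incremental)
EXEC DBMS_MVIEW.REFRESH( 'RPT.TRANSACTIONS_MV', method => 'F' );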


- look into change data capture (CDC) to capture the changes only and apply them to your system.
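
A sketch of publishing one change table with synchronous CDC; the schema, column list, and option values are illustrative only, and the full parameter list should be checked against the 10.2 documentation. Subscribers then read the changes through DBMS_CDC_SUBSCRIBE.

BEGIN
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'CDCPUB',
    change_table_name => 'transactions_ct',
    change_set_name   => 'SYNC_SET',   -- predefined synchronous change set
    source_schema     => 'TXN',
    source_table      => 'TRANSACTIONS',
    column_type_list  => 'TXN_ID NUMBER, CREATED_DT DATE, AMOUNT NUMBER',
    capture_values    => 'both',
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'n',
    object_id         => 'n',
    source_colmap     => 'y',
    target_colmap     => 'y',
    options_string    => NULL);
END;
/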




Comments

How about using expdp/impdp?

Daniel Blondowski, August 05, 2010 - 2:22 pm UTC


Tom Kyte
August 05, 2010 - 3:01 pm UTC

that would be "dump and reload"

I would not want to

read all of the data out
write all of the data back
reindex it all


Just move the changes, or just copy the datafiles and apply some redo to them (transport from backup).
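
For reference, the dump-and-reload Daniel suggests would look roughly like this; the directory object, dump-file, and schema names are hypothetical:

expdp system schemas=TXN directory=DP_DIR dumpfile=txn_%U.dmp parallel=4

impdp system remap_schema=TXN:RPT directory=DP_DIR dumpfile=txn_%U.dmp \
      table_exists_action=replace parallel=4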

Using CDC

Rajeshwaran, Jeyabal, August 05, 2010 - 2:45 pm UTC

Tom:

I thought of using CDC for this data refresh from the transaction database (or schema) to the reporting database (or schema). My question about measuring the latency remains unanswered.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8784259916366#2588011900346985554


Do you have any answer to this?
Tom Kyte
August 05, 2010 - 3:02 pm UTC

I don't have anything to add.

Alexander, August 05, 2010 - 3:02 pm UTC

Rajeshwaran,

A silly question, but are you aware that Oracle has non-blocking reads (unlike Sybase and SQL Server)? So you can actually run reports against the online system, unless they are really nasty Hyperion-generated ones or something like that.

I have seen Tom recommend transporting tablespaces using RMAN a few times, but I have tested it on every platform and patch level imaginable and hit bugs every time. So unless I see a cut-and-paste of it working in 10g, I don't believe it actually works.

How big is this database? I would not even consider using expdp/impdp for anything over 50 GB; it will take all night.

I like the dataguard option Tom suggested. And if you can get a server in a different data center, you can kill two birds with one stone.

To Alexander

Rajeshwaran, Jeyabal, August 05, 2010 - 3:23 pm UTC

But you look too silly than me. Look at my question: the transaction tables are hash partitioned (8 partitions with 400 million rows). But I don't want my reporting tables to be hash partitioned; instead I need them to be range partitioned by Created_dt and subpartitioned by hash (based on the primary key). As Tom suggests, I benchmarked the queries against the existing hash partitioning and this new range/hash partitioning, and things look good with range/hash. Hence I need to do a data refresh from the existing transaction tables to the reporting tables.
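
For illustration, a minimal DDL sketch of that range/hash layout; the column and partition names are hypothetical:

CREATE TABLE rpt.transactions (
  txn_id     NUMBER  NOT NULL,
  created_dt DATE    NOT NULL,
  amount     NUMBER,
  CONSTRAINT transactions_pk PRIMARY KEY (txn_id)
)
PARTITION BY RANGE (created_dt)
SUBPARTITION BY HASH (txn_id) SUBPARTITIONS 8
( PARTITION p_2009 VALUES LESS THAN (DATE '2010-01-01'),
  PARTITION p_2010 VALUES LESS THAN (DATE '2011-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE) );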
Tom Kyte
August 05, 2010 - 4:42 pm UTC

Rajeshwaran -

"it might be a silly question" is an idiom. Nothing harsh was meant by it - not at all.

Alexander, August 05, 2010 - 3:53 pm UTC

"But you look too silly than me. "

I'm sorry? Do you want to try that again in a language you can speak?

"look at my question..."

Yes I see that. See Tom's response?

"I would not consider your path at all."

Mark, August 05, 2010 - 4:41 pm UTC

I am somewhat familiar with Rajesh's reporting database situation. We have recommended a physical/logical standby to our technical team and have been rejected (it is viewed as a DR solution only), which really ties our hands. They are giving us a RAC node as the reporting DB solution. This is a multi-TB 10g DB with some intense I/O and batch-processing requirements.
I am having a difficult time getting any info on real-life case studies using CDC. I also see that it is Tom's last choice. Is CDC viable for a high-volume batch-processing/I/O load?
Tom Kyte
August 05, 2010 - 4:45 pm UTC

I personally would prefer data guard (easier way to implement the logical equivalent of streams)

and then streams

and probably would not use CDC for something as simple as "create a copy"


your technical team rejected data guard because they think it is just disaster recovery? Not very technical of them, is it...

Try GoldenGate

VicC, August 06, 2010 - 2:48 am UTC

Hi there,

Have you thought of trying GoldenGate? I have been testing it and it seems to work well. You can filter data using the equivalent of WHERE clauses, etc. The data is mined from the log files and sent over the network to your target database - which can be SQL Server or Sybase, etc.

Have a look on OTN for more info.
There are some good examples of how to set it up, and see it in action here:
http://gavinsoorma.com/oracle-goldengate-veridata-web/

I am open to correction, but I believe it's going to take over from Streams as the main replication tool.

Regards,

VicC
Tom Kyte
August 06, 2010 - 8:31 am UTC

they have no really sophisticated needs here. They want to copy an Oracle database to an Oracle database - without anything fancy. They already own more than enough software to accomplish that in a myriad of ways.

Mark, August 08, 2010 - 9:07 pm UTC

This is correct. In this case we are looking strictly for a simple reporting DB. We have suggested GoldenGate as a potential solution for other applications that are requiring more than a simple reporting DB solution. Tx for the suggestion.

Slight twist on original question.

Aaron, September 14, 2010 - 5:25 pm UTC

Tom,
We are considering the very options you have proposed for this problem, with a slight twist. We are indeed looking to replicate the data for the purposes of reporting, but the reporting database would reside outside of our network, in a partner's data center. I don't believe this necessarily prohibits any of the suggestions you've made thus far; I was wondering if you believe it warrants a reordering of preferences from your prior feedback of:


"I personally would prefer data guard (easier way to implement the logical equivalent of streams)

and then streams

and probably would not use CDC for something as simple as "create a copy" ..."

Thanks for the help.

Tom Kyte
September 15, 2010 - 7:51 am UTC

I don't know what your refresh requirements are, what the network between you and the partner looks like, the volume of data, how much of the source database needs to be available on the remote site, the transaction volumes.... So, I cannot prioritize.


the list was not ordered in any sort of priority or preference.

Delivering data for reporting

A reader, June 22, 2011 - 11:06 am UTC

We have a situation that is similar to what's been discussed in this thread, with a slight twist. We are a software vendor that provides packaged software (running on Oracle) to customers in various industries. Every customer has a different reporting need that suits their particular profile and industry. As such, instead of building an extensive reporting capability into our main application/database that addresses every customer requirement, we are thinking about providing a set of APIs that allow customers to extract some data out of the main database and import it into a target of their choice, which could be any RDBMS, or even spreadsheets or XML if they so desired. The goal is to get them the data, and they would do whatever they like with it. Most customer needs can be met with a subset of a few large tables in the database per data extraction. The size of the data to be exported could range anywhere from 100 MB to a few GB at a time.

What's the best approach to implement this type of utility?

Tom Kyte
June 22, 2011 - 12:13 pm UTC

Insufficient data to answer.

firstly - why would they need to take data out to report on it? The SQL database is perhaps the most powerful ad-hoc tool I know of.

But assuming the data needs to come out - what are the real requirements? Just dumping an ad-hoc query to disk? Getting all of the changes to data since X? Getting sets of read consistent data from multiple queries to dump to disk?

At the end of the day, you'll be writing custom code to do this - a client application, basically - and using various database features to accomplish it (like read-only transactions, XML queries, bulk fetching, etc.)...
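
As one illustration only (the directory object, table, and columns are hypothetical): a read-only transaction gives every query in the extraction the same consistent snapshot, and BULK COLLECT with a LIMIT keeps memory bounded while the file is written.

DECLARE
  CURSOR c IS
    SELECT order_id || ',' || TO_CHAR(order_dt,'YYYY-MM-DD') || ',' || amount AS line
    FROM   txn.orders;
  TYPE t_lines IS TABLE OF VARCHAR2(4000);
  l_lines t_lines;
  l_file  UTL_FILE.FILE_TYPE;
BEGIN
  SET TRANSACTION READ ONLY;   -- read-consistent view for the whole extract
  l_file := UTL_FILE.FOPEN('EXTRACT_DIR', 'orders.csv', 'w');
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_lines LIMIT 1000;
    FOR i IN 1 .. l_lines.COUNT LOOP
      UTL_FILE.PUT_LINE(l_file, l_lines(i));
    END LOOP;
    EXIT WHEN c%NOTFOUND;
  END LOOP;
  CLOSE c;
  UTL_FILE.FCLOSE(l_file);
  COMMIT;                      -- ends the read-only transaction
END;
/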

Same instance, Two Schemas

Cyril, April 01, 2015 - 7:45 am UTC

Hi Tom

We are actually discussing creating a second schema for reporting (a star schema) to simplify access to the data and do some "complex" aggregations.

Nowadays, what would you recommend to ETL ("real-time") data from one schema to another schema on the same instance? (Streams, CDC, GoldenGate, or something else?)

We do not want to use triggers because they might lower the performance of the OLTP schema.

Thanks in advance !

Cyril
