Thanks for the question, Mikhail.

Asked: February 05, 2021 - 7:24 am UTC

Last updated: February 15, 2021 - 3:49 pm UTC

Version: 12.1.0.2

Viewed 1000+ times

You Asked

Hello Chris, Hello Connor.

Is there any way to do a direct path insert when pushing data via dblink?

Our DWH currently truncates and loads data into our tables and it takes way too long. I'm not sure what exactly takes it so long, whether it's the load on their servers or network or anything else, but I am certain that generating undo on our side is unnecessary because the tables can be either empty or in a complete state.

DWH won't allow us to create a dblink to their base, so we cannot make an insert /*+ append*/ on our side.

In other responses I've stumbled into mentions that dblinks are not meant to be a good way to transfer large amounts of data between databases. What is a good way to do that aside from golden gate?

and Chris said...

As the docs say:

A transaction containing a direct-path INSERT statement cannot be or become distributed.

https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/INSERT.html#GUID-903F8043-0254-4EE9-ACC1-CB8AC0AF3423

So I think you're out of luck here.

I'm not sure what exactly takes it so long

Then find out!

Trace the sessions on the local and remote instances before running the insert to see where all the time is going:

exec dbms_monitor.session_trace_enable@remote ( waits => true, binds => true );
exec dbms_monitor.session_trace_enable ( waits => true, binds => true );

You can then parse the trace files using TKPROF to get the plans and a detailed breakdown of their execution. If you're struggling to understand these, post the formatted plans here.

What is a good way to do that aside from Golden Gate?

What exactly are you trying to do? Is this a one-off load or weekly/daily/hourly transfer? What do you consider to be a "large" amount of data?

Rating

(4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Possible Alternative for Direct Path Happiness.

Tubby, February 09, 2021 - 5:53 pm UTC

https://jonathanlewis.wordpress.com/2010/10/07/distributed-pipelines/

Cheers,

February 10, 2021 - 4:58 am UTC

Nice stuff.

Similar solutions also can be a CTAS into a GTT, and then direct-mode insert from the GTT all on the local instance.

Thanks for the question

Mikhail Onishchenko, February 12, 2021 - 10:54 am UTC

Unfortunately I don't have access to trace files even in our database. The best I can get is AWR.

By large amounts I mean tables around 5-20GB and a total amount of data transfered via DBLINK around 150GB per day.

This is a daily process of updating our datamarts:

1. DWH job truncates + inserts tables into our buffer scheme via their dblink to our DB once per day.
2. Our app is using synonyms. Synonyms look either at %TABLE%_A or %TABLE%_B for any given datamart.
3. Our job loads a table from the buffer scheme into an inactive %TABLE%_A/%TABLE%_B(whichever a synonym is not looking at) with another truncate + insert /*+ append */.
4. Then it rebuilds indexes, calculates statistics and switches the synonym.

There is so much replicating of data because we need it available 24/7 and because our DWH refuses to provide us a dblink that we could use to load data from them.

On that topic I have another question. Is it smart to calculate statistics every time with

      DBMS_STATS.GATHER_TABLE_STATS (
        ownname => '"CRMDM"',
        tabname => v_table_inactive,
        estimate_percent => dbms_stats.auto_sample_size,
        degree => dbms_stats.auto_degree
      );

Right now I'm observing 60-80% spikes in CPU/IO wait when lots of indexes are being rebuilt and statistics are gathered, which is undesirable.

We have GATHER_STATS_JOB running daily, and the data doesn't change too dramatically from day to day.

I'm going to set a low parallel degree for now to flatten the workload, but I'm wondering if gathering stats every time is a lot of useless work.

February 15, 2021 - 3:49 pm UTC

The optimizer already gathers stats during bulk load operations (create table as select, direct-path insert ... select). So you can probably skip the manual gather; there are some restrictions and it omits histograms:

https://docs.oracle.com/database/121/TGSQL/tgsql_statscon.htm#TGSQL344

Thanks for the CTAS GTT trick

Mikhail Onishchenko, February 12, 2021 - 1:18 pm UTC

Thanks for the CTAS GTT + direct path insert trick, Connor. Very simple and very effective.

I just tested it in our DB and it shaved off 35% of the execution time. I'm going to try to persuade our DWH developer to implement that.

Regarding my original post, it would seem that the huge execution time was entirely due to the DWH DB being very busy.

They switched their job execution from prod to standby DB and the execution time went from 3-5 hours to 30 minutes for our most problematic datamart.

February 15, 2021 - 3:49 pm UTC

Glad we could help

Thanks for the link

Mikhail Onishchenko, February 19, 2021 - 2:01 pm UTC

Thanks for the link Chris! I didn't know about the bulk load statistics gathering.

Unfortunately in my case I don't think I can afford to ignore histograms, so I used options => 'GATHER AUTO'. It didn't do much for partitioned tables, but still provided a noticeable execution time reduction on regular tables.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Possible Alternative for Direct Path Happiness.

Thanks for the question

Thanks for the CTAS GTT trick

Thanks for the link

More to Explore

SQL

Direct path insert via dblink

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Possible Alternative for Direct Path Happiness.

Thanks for the question

Thanks for the CTAS GTT trick

Thanks for the link

More to Explore

SQL