You Asked

Hi
The Oracle DB version I am working on is : Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Its a RAC with 2 instances.

This is about a query which runs quickly (under 15s) under instance #1 but takes between 10-11 mins when it runs under Instance #2.
The query is utilizing indexes as expected.

User provided me the query and I got the sql_ids of its historical runs via DBA_HIST_SQLTEXT.
That gave me 5 sql_ids of the same query run between 20-Jul and 28-Jul.

select sql_id from   dba_hist_sqltext where  sql_text like '%CLAIMTYP%claimtype%');

Observation 1:

Via following query, I discovered that slower runs are on Instance #2, & faster ones on #1.

select distinct h.instance_number,trunc(h.sample_time) run_date,h.sql_exec_start, h.sql_id , h.session_id, h.sql_child_number, h.sql_full_plan_hash_value,
 to_char(trunc(elapsed_time_total/1000000/60/60/24),'9') || ':' ||to_char(trunc(MOD(elapsed_time_total/1000000/60/60, 24)),'09') ||':' ||
to_char(trunc(MOD(elapsed_time_total/1000000,3600)/60),'09') || ':' ||to_char(MOD(MOD(elapsed_time_total/1000000, 3600),60),'09')  as "Time Taken [dd:hh:mi:ss]"
from dba_hist_active_sess_history h,DBA_HIST_SQLSTAT s
 where s.sql_id=h.sql_id and h.instance_number=s.instance_number
and   h.sql_id in ('73b4smcjjk38s','bt8sw7vj42sp3',........);

Observation 2:
In dba_hist_active_sess_history, under column DELTA_INTERCONNECT_IO_BYTES, I found several non-null values under Instance #2 run, which as per Oracle documentation means "Number of I/O bytes sent over the I/O interconnect over the last DELTA_TIME microseconds".
Value of that column is NULL in all other cases where same query ran on Instance #1.
And there are many rows in ASH for slower run sql_ids but faster runs have only couple of rows (each row indicating 10s interval).

Does above observations indicate some issue with server of Instance #2?

Thanks.

and Connor said...

In RAC we have the concept of "node affinity", namely, in a perfect world, a particular piece of data would always be accessed from the same node

Because in a "just started" RAC system, the cost of accessing the data is the same in all instances, ie, reading it from the shared disk.

However, in an actively running system, the costs can be different across nodes because accessing data from buffer cache of the node a query is running on will will be faster than accessing data from the buffer cache of another node. In particular, if there is a LOT of changes to blocks on "remote" nodes, then the workload on remote nodes can be have an impact on queries on a single node, because

a) cross-instance locks take more coordination, and
b) we need to ship data across the interconnect...which adds overhead, and
b) the remote node might be called upon to construct a consistent read of a block

So (for example) you might be running a query on node A and that node has plenty of spare CPU.
Node B on the other hand is being absolutely smashed by some other job.

If the query on Node A needs consistent reads of blocks from node B, then this could take a long time (because Node B is under stress).

Huge values for your interconnect bytes stats indicate that a lot of cross-node chatter is going on to satisfy that request.

Rating

(2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks for reply & explanation

A reader, July 30, 2020 - 10:24 am UTC

Query performance difference in RAC instances.
This makes good sense.

Thanks

A reader, May 19, 2022 - 7:45 am UTC

Is there any solution to avoid or solve this case?

May 23, 2022 - 3:42 am UTC

The database should generally take care of it with dynamic remastering, ie, shuffling data to the instance that uses it most.

Check out MOS note 390483.1

You can help this by using services so that particular applications have a "preference" for a particular node.

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database

Breadcrumb

Question and Answer