Question and Answer

Chris Saxon

Thanks for the question, Ingimundur.

Asked: November 11, 2024 - 11:00 am UTC

Last updated: November 13, 2024 - 8:24 am UTC

Version: 19

Viewed 1000+ times

You Asked

Hi Team,

We are doing some testing on heavy loads, but we are experiencing strange behavior.

We have a process that is doing a lot of updates and inserts, primarily on 2 tables. Some of the aggregate queries have a tendency to suddenly become slow (after being very fast) and to do far more logical I/O than seems normal.

During testing we are running on quite homogeneous data, so we should not see much difference between runs. I have also tested the scripts on similar data and never had slow runs.

All code is run from PL/SQL here. During the test we run 5 sessions, each working with one entity that has from 2000 to 400000 transactions in the run (always the same number for each entity in each run, although this will not be the case when we get to production).

For 2000, 10000, 20000 and 40000 per entity (10000, 50000, 100000 and 200000 in total), we have no problems. Most of the select and update scripts run within seconds (often in less than a second).

When we enter a 500000 run (100000 per entity), we start seeing problems. The first 2-3 times the scripts run they take seconds; after that they take minutes (for this volume, up to 12-18 minutes). Logical I/O rises from almost nothing to 60000000.

When we enter a 1000000 run (200000 per entity), things get even worse. Almost all of the runs then take more than 20 minutes, and more often than not some of them never finish properly, as we have a 1-hour timeout.

Here is an example of the code:
WITH source AS (
    SELECT
        MAX(lpad(t2.column6, 7, '0') || '0000') AS aggregate_reference,
        t1.column1,
        t1.column2,
        t1.column3,
        t1.column4,
        t1.column5
    FROM
             table1 t1
        INNER JOIN table2 t2 ON t1.column4 = t2.column4
                                AND t1.initiatorbankid = t2.initiatorbankid
                                AND t1.column3 = t2.column3
                                AND t1.column1 = t2.column1
                                AND t1.column2 = t2.column2
                                AND t1.column5 = t2.column5
    WHERE
            t1.column1 <> 'T'
        AND t1.column7 = '2544'
        AND t1.column4 = 2081
    GROUP BY
        t1.column3,
        t1.column1,
        t1.column2,
        t1.column4,
        t1.column5
)
SELECT
    appdata.type_for_reference(pk_column, aggregate_reference)
BULK COLLECT
INTO v_ref
FROM
         table1 tx
    INNER JOIN source ON tx.column4 = source.column4
                         AND tx.column3 = source.column3
                         AND tx.column1 = source.column1
                         AND tx.column2 = source.column2
                         AND tx.column5 = source.column5;


In a 500000 run, table1 has 500000 records (100000 per entity) and table2 has 200000 (40000 per entity). The WITH clause returns about 50000 records (10000 per entity) and the whole statement about 375000 (75000 per entity).

Previously we had the whole statement, including the update, combined into one. That never worked.

We have now split it into 3 steps (2 selects and 1 insert): we run the WITH clause part first into a separate collection type, and then use that type to get the result:

SELECT
    MAX(lpad(t2.column6, 7, '0') || '0000') AS aggregate_reference,
    t1.column1,
    t1.column2,
    t1.column3,
    t1.column4,
    t1.column5
BULK COLLECT
INTO v_info
FROM
         table1 t1
    INNER JOIN table2 t2 ON t1.column4 = t2.column4
                            AND t1.initiatorbankid = t2.initiatorbankid
                            AND t1.column3 = t2.column3
                            AND t1.column1 = t2.column1
                            AND t1.column2 = t2.column2
                            AND t1.column5 = t2.column5
WHERE
        t1.column1 <> 'T'
    AND t1.column7 = '2544'
    AND t1.column4 = 2081
GROUP BY
    t1.column3,
    t1.column1,
    t1.column2,
    t1.column4,
    t1.column5;

SELECT
    appdata.type_for_reference(pk_column, aggregate_reference)
BULK COLLECT
INTO v_ref
FROM
         table1 tx
    INNER JOIN TABLE ( v_info ) source ON tx.column4 = source.column4
                                          AND tx.column3 = source.column3
                                          AND tx.column1 = source.column1
                                          AND tx.column2 = source.column2
                                          AND tx.column5 = source.column5;


This seems to work.
My DBA and I differ in opinion on what is happening. Our system is a powerful machine (Exadata) with a lot of CPUs and a lot of memory, but I have a feeling that the memory areas, especially for the aggregation work, are not properly set up. My DBA thinks our scripts are badly designed.

The 5 columns from the GROUP BY clause are indexed together on table1. The combination of column7 and column1 is indexed on table2. I can't see any way to rewrite the scripts (and in any case they run quickly when run independently).

Now the question.
How can I see what might be causing this and if some memory settings are not properly set up?

I am sorry for not providing a LiveSQL example. We only see this with a lot of data, so setting it up there would be difficult.

Regards,
Ingimundur Gudmundsson

and Chris said...

I've no idea what you mean by a 500000 run. What exactly is involved in a run? One update changing 500,000 rows? 500,000 updates changing one row each? Something else? You mention inserts and updates, but I only see selects in the examples provided.

We'll need more clarity around what the process is doing to give good insights. But here's a complete guess that may be relevant.

Statements can only see changes committed by other sessions. If you run a query and another session has uncommitted DML changes on that table, the database has to reconstruct how the table looked before these changes.

It does this by working back through the undo. If there are lots of uncommitted changes, it has to do lots of work.

For example, this creates a table, inserts a row into it, and then counts its rows:

create table t ( c1 int );
insert into t values ( 1 );
commit;

set serveroutput off
alter session set statistics_level = all;
select count(*) from t;
select * from dbms_xplan.display_cursor( format => 'ALLSTATS LAST');

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS FULL| T    |      1 |      1 |      1 |00:00:00.01 |       7 |
-------------------------------------------------------------------------------------


As expected, this only needs a few I/Os to read the table.

Now start a second session and update the table 1,000 times:

begin 
  for i in 1 .. 1000 loop
    update t set c1 = c1 + 1;
  end loop;
end;
/


Leave these changes uncommitted.

Go back to the first session and rerun the count. The database has to undo all 1,000 of these changes and the query does just over 1,000 I/Os:

select count(*) from t;
select * from dbms_xplan.display_cursor( format => 'ALLSTATS LAST');

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |    1008 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |    1008 |
|   2 |   TABLE ACCESS FULL| T    |      1 |      1 |      1 |00:00:00.01 |    1008 |
-------------------------------------------------------------------------------------


If you increase the number of updates in the second session, you'll see the query in the first do more and more work => slower queries. Eventually it'll start reading from disk which will slow things even further.
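
If you want to check whether this is what's happening in your runs, one option is to compare a couple of session statistics before and after one of the slow queries. Treat this as a starting point for investigation rather than a complete diagnosis:

-- Run this in the session executing the slow query, before and after it.
-- A big jump in "data blocks consistent reads - undo records applied"
-- alongside the jump in "consistent gets" suggests the extra I/O comes
-- from applying undo to build read-consistent copies of the blocks.
select sn.name, st.value
from   v$mystat st
       join v$statname sn on sn.statistic# = st.statistic#
where  sn.name in ( 'consistent gets',
                    'data blocks consistent reads - undo records applied' );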

Like I said - this is just a guess, but it is consistent with the symptoms you describe. Please share more details on what the process is doing to confirm (or deny!) this and discuss solutions.


Comments

Thanks for the answer, a little bit more explanation

Ingimundur K. Gudmundsson, November 11, 2024 - 3:14 pm UTC

Hi Chris,

Thanks for the quick answer. I will try to explain what I mean by 500000 run. :)

It just means that the bigger table (table1) has 500000 records and the smaller table (table2) has around 40% of that.

The tables are both used for staging and manipulation of that staged data.

5 processes run at once, each one "owning" a part of the data. Each process will never manipulate any other process's data.

The data is changed a number of times during the process and is committed each time the Java application takes control.

Can small redo logs cause this problem? We seem to have a huge number of redo log switches during the big runs (up to 200 per hour).
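
A switch count like that can be taken from V$LOG_HISTORY, for example (a sketch; it assumes access to that view):

-- Log switches per hour over the last day.
select trunc(first_time, 'HH24') as switch_hour,
       count(*)                  as switches
from   v$log_history
where  first_time > sysdate - 1
group by trunc(first_time, 'HH24')
order by switch_hour;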

Regards,
Ingimundur K. Gudmundsson
Chris Saxon
November 11, 2024 - 5:21 pm UTC

This problem is about the number of DML statements in other sessions. Notice in the example above: running 1000 updates => the query does an extra 1000 I/Os.

The data is changed a number of times

I think this is the problem - how exactly is the data changed? How many times does it change?

Small redo logs may also be an issue - 200 switches/hour is excessive. Lots of small changes will generate more redo than a few large changes, though. Changing the process is probably the solution here.
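
To illustrate the kind of change I mean, here's a rough sketch. The column names are borrowed from your example and the change itself is made up, so treat it purely as an illustration of row-by-row versus set-based processing:

-- Row-by-row: one DML statement, and its own undo/redo, per row.
begin
  for r in ( select pk_column from table1 where column4 = 2081 ) loop
    update table1
    set    column1 = 'T'
    where  pk_column = r.pk_column;
  end loop;
end;
/

-- Set-based: the same net change as a single statement. This generates
-- less redo overall and leaves far fewer separate changes for queries in
-- other sessions to roll back.
update table1
set    column1 = 'T'
where  column4 = 2081;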

Calling user-defined functions...

Rajeshwaran Jeyabal, November 12, 2024 - 11:44 am UTC

Also, please don't forget that you are calling a user-defined function in the SELECT statement:

SELECT
    appdata.type_for_reference(pk_column, aggregate_reference)


which could lead to context switches between SQL and PL/SQL.
Can you tell us a bit more about the implementation of the function, so we can understand why it can't be replaced with a pure SQL-based solution?

Ingimundur K. Gudmundsson, November 12, 2024 - 1:38 pm UTC

Thanks Chris,

We will look into holding some of the changes in collections before updating the tables. We probably have too many updates going on at the same time, even though they are not working on the same records.
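
Roughly what that could look like - buffer the new values in collections while processing, then apply them with a single FORALL at the end (the columns used here are just placeholders):

declare
  type t_ids  is table of table1.pk_column%type index by pls_integer;
  type t_vals is table of table1.column2%type   index by pls_integer;
  l_ids  t_ids;
  l_vals t_vals;
begin
  -- ... populate l_ids / l_vals during processing instead of issuing an
  -- UPDATE for every row ...

  -- Then apply all the buffered changes in one bulk statement.
  forall i in 1 .. l_ids.count
    update table1
    set    column2   = l_vals(i)
    where  pk_column = l_ids(i);

  commit;
end;
/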

@Rajeshwaran
This is a type, not a function. This is not causing any context switching.
Chris Saxon
November 12, 2024 - 2:07 pm UTC

Yes, that's probably the problem. For the scenario I've described above, what matters is whether the queries access rows with uncommitted updates in other sessions.

How about GTT

Rajeshwaran, Jeyabal, November 13, 2024 - 4:55 am UTC

The result set of the first query is populated into the collection "v_info" and used in the second query like this:

SELECT
    appdata.type_for_reference(pk_column, aggregate_reference)
BULK COLLECT INTO v_ref
FROM table1 tx
INNER JOIN TABLE ( v_info ) source
 ON tx.column4 = source.column4 ...


When you read from the collection, the optimizer estimates 8K rows by default. Is it possible to use a global temporary table (GTT) there? With that we can gather statistics to give the optimizer better estimates.
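
A minimal sketch of that idea - the table name gtt_info and the datatypes are made up for illustration:

-- One-off DDL: a session-private staging table to replace the collection.
create global temporary table gtt_info (
  aggregate_reference varchar2(20),
  column1             varchar2(10),
  column2             varchar2(10),
  column3             varchar2(10),
  column4             number,
  column5             varchar2(10)
) on commit preserve rows;

-- In the process, INSERT the result of the first query into gtt_info
-- instead of BULK COLLECTing it into v_info, then join table1 to gtt_info
-- exactly as the TABLE( v_info ) join does today. From 12c onwards the
-- database keeps session-private statistics on GTTs, so the optimizer
-- works with real row counts rather than a fixed default.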

Does this help here?
Chris Saxon
November 13, 2024 - 8:24 am UTC

I don't think optimizer estimates are the problem here - but I could be wrong. We really need a complete description of the process to give the best help.
