Materialized View Log Performance on Bulk Loads


Question and Answer

Chris Saxon

Thanks for the question, Dave.

Asked: February 04, 2021 - 4:30 pm UTC

Last updated: February 05, 2021 - 1:53 pm UTC

Version: 12.1

Viewed 1000+ times

You Asked

Regarding materialized view logs: how do they handle bulk inserts?

I saw a mention somewhere of using a sequence to establish a range on the materialized view log.

For example:
We have 3 underlying tables

Table A
Table B
Table C

These tables come together to form a materialized view 'ABC'. That materialized view 'ABC' is refreshed on demand with a complete refresh, run by the nightly batch process that propagates information into Tables A, B, and C.

To help improve the nightly refresh process of the materialized view, I would like to incorporate materialized view logs on the underlying tables and implement a fast refresh instead of a complete refresh. (No need to replicate the same data over and over again, since the underlying tables keep growing in size.) Let's say we have MVLOG_A now collecting change information on Table A.

My nightly load process for Table A now inserts 10,000 records. If I create my materialized view log on rowid, will it simply record that range, or process each individual insertion as a record in MVLOG_A? The latter would imply that the underlying trigger fires multiple times.

Thanks for any input. I just want to make sure I have a thorough grasp of this before I implement anything.
Dave

and Chris said...

Yes, the MV log will record an entry for every single row you insert. It can't rely on rowid ranges or other tricks, because other sessions may be inserting rows at the same time!

Exactly how much data this captures depends on how you create the MV log. A basic log captures minimal information, but you can also tell it to record new values, extra columns, and other metadata:

create table t (
  c1 int primary key,
  c2 int,
  c3 date,
  c4 varchar2(100)
);

create materialized view log on t;

desc mlog$_t

Name               Null?    Type            
C1                          NUMBER           
SNAPTIME$$                  DATE             
DMLTYPE$$                   VARCHAR2(1)      
OLD_NEW$$                   VARCHAR2(1)      
CHANGE_VECTOR$$             RAW(255 BYTE)    
XID$$                       NUMBER  

drop materialized view log on t;
create materialized view log on t
  with primary key, rowid, sequence ( c2, c3, c4 )
  including new values;
  
desc mlog$_t

Name               Null?    Type            
C1                          NUMBER           
C2                          NUMBER           
C3                          DATE             
C4                          VARCHAR2(100)    
M_ROW$$                     VARCHAR2(255)    
SEQUENCE$$                  NUMBER           
SNAPTIME$$                  DATE             
DMLTYPE$$                   VARCHAR2(1)      
OLD_NEW$$                   VARCHAR2(1)      
CHANGE_VECTOR$$             RAW(255 BYTE)    
XID$$                       NUMBER
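
To see the one-entry-per-row behaviour directly, you can insert some rows and count the log entries. A quick check, assuming the table and detailed log created above are in place:

insert into t
  select level, mod ( level, 37 ), sysdate + level,
         lpad ( 'x', 100, 'x' )
  from   dual
  connect by level <= 100;

select count(*) from mlog$_t;
-- one log entry per inserted row, so this returns 100

rollback;
-- undoing the insert also removes the log entries,
-- because the log is maintained in the same transaction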


Capturing these extra details can make the refresh faster, because the refresh process then has all the information it needs.
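
For example, the detailed log above is enough to support a fast-refreshable aggregate MV along these lines (a sketch; the name MV_T and the query are illustrative, and the exact log requirements vary with the MV's query):

create materialized view mv_t
  refresh fast on demand
  as
  select c2, count(*) ct
  from   t
  group  by c2;

exec dbms_mview.refresh ( 'MV_T', method => 'F' )

drop materialized view mv_t;
-- clean up so the timing test below is unaffected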

But what effect does it have on the insert?

Let's test:

declare
  start_time pls_integer;

  procedure ins_rows ( num_rows integer ) as
  begin
    insert into t 
    with rws as (
      select level x from dual
      connect by level <= num_rows
    )
      select x, mod ( x, 37 ), sysdate + x, 
             lpad ( 'x', 100, 'x' )
      from   rws;
  end;
begin
  -- run 1: the detailed log (primary key, rowid, sequence, new values) is in place
  start_time := dbms_utility.get_time ();
  ins_rows ( 10000 );
  dbms_output.put_line ( 
    'Run time = ' || ( dbms_utility.get_time () - start_time ) 
  );

  -- run 2: swap to the basic, primary-key-only log
  execute immediate 'drop materialized view log on t';
  execute immediate 'create materialized view log on t';
  execute immediate 'truncate table t';
  start_time := dbms_utility.get_time ();
  ins_rows ( 10000 );
  dbms_output.put_line ( 
    'Run time = ' || ( dbms_utility.get_time () - start_time ) 
  );

  -- run 3: no MV log at all
  execute immediate 'drop materialized view log on t';
  execute immediate 'truncate table t';
  start_time := dbms_utility.get_time ();
  ins_rows ( 10000 );
  dbms_output.put_line ( 
    'Run time = ' || ( dbms_utility.get_time () - start_time ) 
  );
end;
/

Run time = 246
Run time = 235
Run time =  16


So yes, adding the log does make the insert take longer (dbms_utility.get_time reports centiseconds, so that's roughly 2.5 seconds with a log versus 0.16 seconds without), but there's minimal difference between the log types: the skinny log is only marginally faster than the one capturing everything.

Is this an acceptable overhead? Only you can answer that!

Test and benchmark to see what effect it has on your tables, so you can assess the trade-off of slower inserts and faster refreshes versus faster inserts and (slow?) complete refreshes. Bear in mind that fast refresh will scale better: complete refreshes get progressively slower over time (assuming you keep all the data).
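
As a starting point for that benchmark, you could time both refresh methods with the same approach used in the insert test (a sketch, assuming a fast-refreshable MV called MV_ABC over your three tables; 'C' requests a complete refresh and 'F' a fast one):

declare
  start_time pls_integer;
begin
  start_time := dbms_utility.get_time ();
  dbms_mview.refresh ( 'MV_ABC', method => 'C' );  -- complete refresh
  dbms_output.put_line ( 
    'Complete refresh = ' || ( dbms_utility.get_time () - start_time ) 
  );

  -- run your nightly load into tables A, B, and C here, then:

  start_time := dbms_utility.get_time ();
  dbms_mview.refresh ( 'MV_ABC', method => 'F' );  -- fast (incremental) refresh
  dbms_output.put_line ( 
    'Fast refresh = ' || ( dbms_utility.get_time () - start_time ) 
  );
end;
/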


Comments

Thank you

A reader, February 04, 2021 - 6:13 pm UTC

This was exactly the in-depth explanation I was looking for. Makes sense.

Thanks a ton!
Chris Saxon
February 05, 2021 - 1:53 pm UTC

You're welcome
