Skip to Main Content
  • Questions
  • Materialized view fast refresh with joins - what work is done where?

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, John.

Asked: February 14, 2018 - 8:35 am UTC

Last updated: February 15, 2018 - 11:31 am UTC

Version: 12.1

Viewed 10K+ times! This question is

You Asked

If I have two tables, both with materialized view logs, and I join them together in a fast refreshable materialized view, when that MV is refreshed, where is the work done to eliminate records that have not been deleted, but should no longer be returned due to predicates in the MV? For example, I have a table with a date in it, and I join this to another table where col > date_table.date_column:

create table xxjk_master
as
select level id, date '2017-01-01' + mod(level,365) creation_date from dual connect by level <= 1000000;

alter table xxjk_master add constraint xxjk_master_pk primary key (id);

create table xxjk_date
as
select sysdate - 100 refresh_date from dual;

alter table xxjk_date add constraint xxjk_date primary key (refresh_date);

create materialized view log on xxjk_master with rowid;

create materialized view log on xxjk_date with rowid;

create materialized view xxjk_tst_mv 
refresh
on demand
fast
as
select i.rowid i_rid, d.rowid d_rid, i.id, i.creation_date from xxjk_master i, xxjk_date d
where i.creation_date > d.refresh_date;

select count(*) from  xxjk_tst_mv;


I update rows in the master table and as expected they come through into the MV log.

insert into xxjk_master (id, creation_date) values (-1, sysdate);
insert into xxjk_master (id, creation_date) values (-2, sysdate-500);

select * from  MLOG$_XXJK_MASTER;

select * from  MLOG$_XXJK_DATE;


I understand that on a fast refresh the MV just pulls this delta through.

exec dbms_mview.refresh(list=>'xxjk_tst_mv', method=>'F');

select count(*) from  xxjk_tst_mv;


However if I then update the date in the date table, I will get one row in the MV log for that but I obviously don't get any in the xxmaster log because no rows have been inserted/updated/deleted.

update xxjk_date set refresh_date = date '2017-12-31';

select * from  MLOG$_XXJK_MASTER;

select * from  MLOG$_XXJK_DATE;


So when I do a refresh now:

exec dbms_mview.refresh(list=>'xxjk_tst_mv', method=>'F');

select count(*) from  xxjk_tst_mv;


What is Oracle doing to eliminate those rows that no longer match the predicate "i.creation_date > d.refresh_date"?

* Is the source table full scanned (assuming I have no index on that column ) with the predicate applied and the rows to keep compared with the MV?
* Is the inverse of the predicate applied in order to get a list of now-not-required rows, which are then removed from the MV?
* Is the predicate applied against the existing rows in the MV to eliminate them (i.e. as the MV log is empty, the source master table isn't even visited?)
* Is the whole query just run (i.e. it's not a fast refresh at all)
* Something else?

Clearly how this is done internally is likely to have an impact on the refresh time. If I have a million row table on my master site and I am replicating say only the last 5 days (5K rows) on the remote site, it is going to be more efficient if the third point I put above is used. Or, if the master site is scanned, then whether that is done with the predicate as-is in the MV would matter - I might put an index on the date column for example.

Hope that makes sense. LiveSQL script attached.

Thanks,
John

with LiveSQL Test Case:

and Chris said...

There's an easy way to find out:

Trace the session!

alter session set tracefile_identifier = mv_refresh;
exec dbms_monitor.session_trace_enable(waits => true, binds => true);

exec dbms_mview.refresh(list=>'xxjk_tst_mv', method=>'F');

exec dbms_monitor.session_trace_disable();


Then you can inspect the formatted trace file for references to delete/insert from XXJK_TST_MV.

When running your example I found:

delete from "CHRIS"."XXJK_TST_MV" sna$
where "D_RID" in (
  select /*+ NO_MERGE  HASH_SJ  */ *
  from (
    select chartorowid( "MAS$"."M_ROW$$" ) rid$
    from "CHRIS"."MLOG$_XXJK_DATE" "MAS$"
    where "MAS$".snaptime$$ >:b_st0
    and   not (
      "MAS$".dmltype$$ = 'U'
      and   ( sys.dbms_snapshot_utl.vector_compare(:b_cv0,"MAS$".change_vector$$ ) = 0 )
    )
  ) as of snapshot (:b_scn ) mas$
);

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  XXJK_TST_MV (cr=107 pr=0 pw=0 time=192262 us starts=1)
     15062      15062      15062   HASH JOIN RIGHT SEMI (cr=102 pr=0 pw=0 time=40491 us starts=1 cost=33 size=409572 card=1468)
         1          1          1    TABLE ACCESS FULL MLOG$_XXJK_DATE (cr=10 pr=0 pw=0 time=758 us starts=1 cost=6 size=269 card=1)
     15062      15062      15062    MAT_VIEW ACCESS FULL XXJK_TST_MV (cr=92 pr=0 pw=0 time=13815 us starts=1 cost=27 size=150610 card=15061)

insert  /*+ NOAPPEND */ into "CHRIS"."XXJK_TST_MV"
  select /*+ NO_MERGE("JV$") */ "MAS$1".rowid,
         "JV$"."RID$",
         "MAS$1"."ID",
         "MAS$1"."CREATION_DATE"
  from (
    select "MAS$"."ROWID" "RID$",
           "MAS$".*
    from "CHRIS"."XXJK_DATE" "MAS$"
    where rowid in (
      select  /*+ HASH_SJ */ chartorowid( "MAS$"."M_ROW$$" ) rid$
      from   "CHRIS"."MLOG$_XXJK_DATE" "MAS$"
      where  "MAS$".snaptime$$ >:b_st0
      and    not (
        "MAS$".dmltype$$ = 'U'
        and   ( sys.dbms_snapshot_utl.vector_compare(:b_cv0,"MAS$".change_vector$$ ) = 0 )
      )
    )
  ) as of snapshot (:b_scn ) "JV$",
        "XXJK_MASTER" as of snapshot (:b_scn ) "MAS$1"
  where "MAS$1"."CREATION_DATE" > "JV$"."REFRESH_DATE";

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  XXJK_TST_MV (cr=438 pr=0 pw=0 time=55929 us starts=1)
         1          1          1   NESTED LOOPS  (cr=438 pr=0 pw=0 time=55843 us starts=1 cost=271 size=509966 card=14999)
         1          1          1    VIEW  (cr=164 pr=0 pw=0 time=42835 us starts=1 cost=7 size=21 card=1)
         1          1          1     HASH JOIN SEMI (cr=164 pr=0 pw=0 time=42832 us starts=1 cost=7 size=277 card=1)
         1          1          1      INDEX FULL SCAN XXJK_DATE (cr=158 pr=0 pw=0 time=41893 us starts=1 cost=1 size=8 card=1)(object id 80954)
         1          1          1      TABLE ACCESS FULL MLOG$_XXJK_DATE (cr=6 pr=0 pw=0 time=629 us starts=1 cost=6 size=269 card=1)
         1          1          1    TABLE ACCESS FULL XXJK_MASTER (cr=274 pr=0 pw=0 time=12991 us starts=1 cost=264 size=194987 card=14999)


My MV had 15,062 rows in it. So the database first removes everything from it. Then inserting back in the valid rows.

We've left this process undocumented, so you may find differences across versions. In any case, once you've got the refresh SQL you could look to see if there's anything you could do to improve it. Bearing in mind you can't change it and it may change!

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Interesting...

John Keymer, February 14, 2018 - 12:13 pm UTC

Thank-you very much for the super fast response!

OK, so unless I have misunderstood, from that it would seem that having such a predicate pretty much negates the fast refresh benefits completely? I can see why this happens - because a change in the date might either eliminate rows (if the date was brought forward in this case) or alternatively, might bring additional rows in, if the date was pushed backwards. Ultimately this:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  XXJK_TST_MV (cr=438 pr=0 pw=0 time=55929 us starts=1)
         1          1          1   NESTED LOOPS  (cr=438 pr=0 pw=0 time=55843 us starts=1 cost=271 size=509966 card=14999)
         1          1          1    VIEW  (cr=164 pr=0 pw=0 time=42835 us starts=1 cost=7 size=21 card=1)
         1          1          1     HASH JOIN SEMI (cr=164 pr=0 pw=0 time=42832 us starts=1 cost=7 size=277 card=1)
         1          1          1      INDEX FULL SCAN XXJK_DATE (cr=158 pr=0 pw=0 time=41893 us starts=1 cost=1 size=8 card=1)(object id 80954)
         1          1          1      TABLE ACCESS FULL MLOG$_XXJK_DATE (cr=6 pr=0 pw=0 time=629 us starts=1 cost=6 size=269 card=1)
         1          1          1    TABLE ACCESS FULL XXJK_MASTER (cr=274 pr=0 pw=0 time=12991 us starts=1 cost=264 size=194987 card=14999)


Means the fast refresh is doing what the full refresh would be doing anyway, give or take? So I am at no benefit creating MV logs on the underlying tables to try and force this fast - I may as well just have a standard MV with a complete refresh?

Do you have any tips on replicating subsets of data based on date ranges to a remote site? I.e. I have a master table on my ERP dating back many years, but I only want to replicate a rolling window of that? That window may be say 1% or 50% of the total data (so indexing the creation_date may or may not help). I am ok with the hit on the initial creation - I was hoping I could trickle in the new rows/updates and get rid of the old rows a bit more efficiently. I guess I could refresh the whole thing once, use fast refresh ongoing, and a subsequent delete routine to purge out old records? That way I'm only looking at rows in the MV rather than the master table? In my case, the window would only ever be moved forwards, never backwards, so I wouldn't need to pull in older data if the date changed.

Comments?
Chris Saxon
February 14, 2018 - 5:47 pm UTC

Weeell, it depends ;)

A complete refresh will do full delete, then re-run the base query for the MV. For example, I get:

delete from "CHRIS"."XXJK_TST_MV"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.10       0.12          0         99      16000       15062
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.11       0.13          0         99      16000       15062

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 131     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  XXJK_TST_MV (cr=99 pr=0 pw=0 time=129846 us starts=1)
     15062      15062      15062   MAT_VIEW ACCESS FULL XXJK_TST_MV (cr=99 pr=0 pw=0 time=8727 us starts=1 cost=27 size=0 card=15061)
     
INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "CHRIS"."XXJK_TST_MV"("I_RID",
  "D_RID","ID","CREATION_DATE") SELECT "I".ROWID,"D".ROWID,"I"."ID",
  "I"."CREATION_DATE" FROM "XXJK_MASTER" "I","XXJK_DATE" "D" WHERE 
  "I"."CREATION_DATE">"D"."REFRESH_DATE"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0        273          2           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0        273          2           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 131     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  XXJK_TST_MV (cr=273 pr=0 pw=0 time=6531 us starts=1)
         1          1          1   NESTED LOOPS  (cr=273 pr=0 pw=0 time=6449 us starts=1 cost=75 size=313404 card=14924)
         1          1          1    INDEX FULL SCAN XXJK_DATE (cr=1 pr=0 pw=0 time=23 us starts=1 cost=1 size=8 card=1)(object id 81010)
         1          1          1    TABLE ACCESS FULL XXJK_MASTER (cr=272 pr=0 pw=0 time=6416 us starts=1 cost=74 size=194012 card=14924)


So it comes down to whether this query is more efficient than the one internally generated for fast refresh. You'll have to test this out yourself.

Other things you could try:

- Create a fast refreshed MV per table that selects your rolling window:

select * from t@dblink where insert_date > sysdate - ...


then treat these like regular tables for querying, indexing, etc.

- Load the data into a table partitioned by day/week/your chosen granularity. This makes removing old data a snap: drop the oldest partition.

John Keymer, February 15, 2018 - 8:21 am UTC

- Create a fast refreshed MV per table that selects your rolling window:

select * from t@dblink where insert_date > sysdate - ...


Can I do that?

create table xxjktst (a date, constraint xxjktst_pk primary key (a));
create materialized view log on xxjktst;
create materialized view xxjktst_mv refresh fast on demand as select * from xxjktst where a > sysdate;


Table created.
Materialized View log created.
>> create materialized view xxjktst_mv refresh fast on demand as select * from xxjktst where a > sysdate
Error at line 3
ORA-12015: cannot create a fast refresh materialized view from a complex query


In terms of negating the fast refresh, what I meant was that the master table is full scanned in both cases. So if that is significantly large then there is always going to be that overhead - whereas in a fast refresh it will pick just the delta rows via rowid (providing only a sufficiently small percentage have changed).

Thanks for the suggestions - I will look into the partitioning option.
Chris Saxon
February 15, 2018 - 11:31 am UTC

Ahhh, yes, you're right. You can't use sysdate in a fast refresh MV.

I get what you're doing now. Your approach is the workaround for this. :)

In terms of negating the fast refresh, what I meant was that the master table is full scanned in both cases

Well that depends. If you have an index on creation_date in the master, the database could use this.

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.