Recently I have a case with Oracle materialized view (MV later in tekst), which was causing performance problems in our RAC database. In AWR report, SQL ordered by Gets, SQL ordered by CPU Time, SQL ordered by Reads, it was number one event in monitored period:
Quick view in general numbers, characteristic for any Oracle MV, gave a result:
select count(*) from "TAB"."MLOG$_PORTAL_KORISNIK"
38.773.887
select count(*) from "TAB"."PORTAL_KORISNIK"
207.180
Characteristic of this MV (and 99% of all used MV logs) is that materialized view log is not getting purged regularly. Fast refreshes on the materialized views
do not purge the materialized view logs. As detailed in the Oracle documentation, complete refresh on the materialized views is
not purging the materialized view log as well. But when purge process want to perform deletion in a way, that is logged in AWR:
delete from "TAB"."MLOG$_PORTAL_KORISNIK" where snaptime$$ <= :1
a massive full table scan is performed with huge I/O overhead (look closer in "Physical reads" column-
2.1GB). Time to do something!
Before solution
Obviously MV purge needs to be performed. But before purging ... one important step should be done-get wider picture why log is dozen times bigger then parent table? For that I have used next SQL, which query MV log:
SELECT TRUNC(SNAPTIME$$) DATUM, COUNT(*)
FROM TAB.MLOG$_PORTAL_KORISNIK
GROUP BY TRUNC(SNAPTIME$$)
ORDER BY DATUM DESC;
DATUM COUNT(*)
--------- ---------
1.1.4000 99083
19.4.2012 198159
18.4.2012 594574
17.4.2012 594595
16.4.2012 594574
15.4.2012 594408
14.4.2012 594415
13.4.2012 594505
12.4.2012 594578
11.4.2012 594469
10.4.2012 594517
9.4.2012 594192
8.4.2012 594192
7.4.2012 594197
6.4.2012 594249
5.4.2012 594157
4.4.2012 594075
3.4.2012 594038
2.4.2012 593951
1.4.2012 593808
31.3.2012 593817
30.3.2012 593843
29.3.2012 593863
28.3.2012 593820
27.3.2012 593811
26.3.2012 593774
25.3.2012 593622
24.3.2012 593630
23.3.2012 593678
22.3.2012 593624
21.3.2012 593578
20.3.2012 593644
19.3.2012 593613
18.3.2012 593370
17.3.2012 593373
16.3.2012 593458
15.3.2012 593448
14.3.2012 593417
13.3.2012 593375
12.3.2012 593298
11.3.2012 593172
10.3.2012 593178
9.3.2012 593241
8.3.2012 593222
7.3.2012 593153
6.3.2012 593072
5.3.2012 593133
4.3.2012 592980
3.3.2012 592982
2.3.2012 593124
1.3.2012 593054
29.2.2012 593014
28.2.2012 592984
27.2.2012 592897
26.2.2012 592644
25.2.2012 592656
24.2.2012 592727
23.2.2012 592649
22.2.2012 592645
21.2.2012 592598
20.2.2012 592516
19.2.2012 592416
18.2.2012 592417
17.2.2012 592479
16.2.2012 592453
15.2.2012 592397
14.2.2012 592378
SUM 38.773.887
------------------------------------------
Well something is wrong because MV log was refreshed cca 600.000 times a day!! Closer log on smaller segment gave me better picture, in which part that happened. I have use one random day as pattern:
SELECT TRUNC(SNAPTIME$$) DATUM, DMLTYPE$$, COUNT(*)
FROM TAB.MLOG$_PORTAL_KORISNIK
WHERE TRUNC (SNAPTIME$$)='18.04.2012'
GROUP BY TRUNC(SNAPTIME$$), DMLTYPE$$
ORDER BY DATUM DESC;
DATUM DMLTYPE$$ COUNT(*)
--------- --------- --------
18.4.2012 I 49
18.4.2012 U 594525
So problem is in updating MV log part. This was enough for me because finding what is updating parent table through database job is simple task if you go to "OWNER" of PL/SQL code that is dealing with it. And I knew a guy!
The solution
Now I was prepared to purge MV. I have wait until evening (general advice is to avoid to much DBA actions on loaded database) and according previous value, perform next purge statement:
execute DBMS_MVIEW.PURGE_LOG( master => 'TAB.PORTAL_KORISNIK', num=>'40000000', flag => 'delete' ) ;
If you do not have enough time for one time execution repeat n times with smaller chunks (num parameter). If you leave MV in this state when first FAST refresh happened, you'll get this error:
BEGIN dbms_mview.refresh( list => 'TAB.MLOG$_PORTAL_KORISNIK', method => 'F' ); END;
*
ERROR at line 1:
ORA-12034: materialized view log on "TAB"."PORTAL_KORISNIK" younger than last refresh
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2537
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2743
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2712
ORA-06512: at line 1
This is because Oracle MV rules says that after MV log purge, first MV refresh must be COMPLETE, not FAST. So be it:
QL> execute dbms_mview.refresh( list => 'TAB.PORTAL_KORISNIK ', method => 'C' );
BEGIN dbms_mview.refresh( list => 'TAB.PORTAL_KORISNIK ', method => 'C' ); END;
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02292: integrity constraint (TAB.RAC_KOR_FK) violated - child record found
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2563
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2776
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2745
ORA-06512: at line 1
This is first error that unprepared DBA might face (and frankly I was in this case-admit). Because COMPLETE refresh delete all records in MV, if this table is used as parent table for some other table, action cannot be performed before disable action on that constraint is performed. This constraint is, obviously, on child table, so let us disable:
SQL> ALTER TABLE TAB.SOME_CHILD_TABLE
2 MODIFY CONSTRAINT RAC_KOR_FK
3 DISABLE;
In this case I was lucky to have just one table with just one level connected constrains. In real life you might get in real trouble when trying to disable all needed constraints around database. But as stated, I was lucky. After that COMPLETE refresh pass:
SQL> execute dbms_mview.refresh( list => 'TAB.PORTAL_KORISNIK ', method => 'C' );
PL/SQL procedure successfully completed.
Do not forget to enable previously disabled constraint:
SQL> ALTER TABLE TAB.SOME_CHILD_TABLE
2 MODIFY CONSTRAINT RAC_KOR_FK
3 ENABLE;
Table altered.
SQL>
And for the end make test with FAST refresh type:
execute dbms_mview.refresh( list => 'PORTAL_KORISNIK', method => 'F' );
PL/SQL procedure successfully completed.
Looks OK. Promissing.
Is that all? No!
In this moment many MV purge guides finish, but this one goes one step beyond. Happy DBA might check MV log in a way to see number of records (that all are purged). And this is moment of happiness and sadness in the same moment.
COUNT(*)
----------
0
Elapsed: 00:00:31.00
SQL>
0 records is expected result but time to gather that result is not-31 seconds for empty segment! MV left in this state will show minor performance gain-because reclaimable storage from previos data left and is still active.
So shrinking space is needed because space for MV is not shorten (remember, before purge was performed on MV log with 38 millions of records in it). This may be done in two steps.
First step is one performed where MV table is placed, where MV will be shrink:
SQL> alter table tab.portal_korisnik enable row movement;
Table altered.
SQL> ALTER MATERIALIZED VIEW "TAB"."PORTAL_KORISNIK" shrink space cascade;
Materialized view altered.
SQL>
Next step should be done on side where MV log is placed. This is where MV log will be shrinked:
SQL> alter MATERIALIZED VIEW LOG ON TAB.PORTAL_KORISNIK ENABLE ROW MOVEMENT;
Materialized view log altered.
Elapsed: 00:00:00.87
SQL> alter MATERIALIZED VIEW LOG ON TAB.PORTAL_KORISNIK SHRINK SPACE CASCADE;
Materialized view log altered.
SQL> alter MATERIALIZED VIEW LOG ON TAB.PORTAL_KORISNIK DISABLE ROW MOVEMENT;
Materialized view log altered.
Elapsed: 00:00:00.15
SQL>
SQL> alter table tab.portal_korisnik disable row movement;
Table altered.
SQL>
Notice that these two sides actions, mostly differ in "LOG" part only. Now all seems to be OK, let us try once again problematic count(*) statement:
SQL> select count(*) from TAB.MLOG$_PORTAL_KORISNIK;
COUNT(*)
----------
0
Elapsed: 00:00:00.09
SQL>
This is just fine now.
Performance gain
If you ask how much this affect in performance, here are some jobs statistic query, based on custom jobs_diary tables, which has column SECS_RUN which shows how much seconds has job lasted. First let me show job which was inserting records (minor impact in MV log):
ID INST_ID JOB_INFO SECS_RUN CREATE_D CREATE_USER EDIT_DAT EDIT_USER
---------- ---------- -------------------------------- ---------- -------- -------------------------------- -------- ----------
2809 1 423 snapshots 385 19.04.12 TAB 19.04.12 TAB
2816 1 423 snapshots 277 19.04.12 TAB 19.04.12 TAB
2825 1 423 snapshots 322 19.04.12 TAB 19.04.12 TAB
2832 1 423 snapshots 326 19.04.12 TAB 19.04.12 TAB
2842 1 423 snapshots 277 19.04.12 TAB 19.04.12 TAB
2849 1 423 snapshots 342 19.04.12 TAB 19.04.12 TAB
2858 1 423 snapshots 356 19.04.12 TAB 19.04.12 TAB
2866 1 423 snapshots 286 19.04.12 TAB 19.04.12 TAB
2878 1 423 snapshots 528 19.04.12 TAB 19.04.12 TAB
2885 1 423 snapshots 415 19.04.12 TAB 19.04.12 TAB
2894 1 423 snapshots 187 19.04.12 TAB 19.04.12 TAB
2901 1 423 snapshots 263 19.04.12 TAB 19.04.12 TAB
2911 1 423 snapshots 179 19.04.12 TAB 19.04.12 TAB
2918 1 423 snapshots 339 19.04.12 TAB 19.04.12 TAB
2927 1 423 snapshots 372 19.04.12 TAB 19.04.12 TAB
2934 1 423 snapshots 535 19.04.12 TAB 19.04.12 TAB
2945 1 423 snapshots 437 19.04.12 TAB 19.04.12 TAB
2952 1 423 snapshots 252 19.04.12 TAB 19.04.12 TAB
2961 1 423 snapshots 248 19.04.12 TAB 19.04.12 TAB
2969 1 423 snapshots 268 19.04.12 TAB 19.04.12 TAB
2979 1 423 snapshots 214 19.04.12 TAB 19.04.12 TAB
2986 1 423 snapshots 268 19.04.12 TAB 19.04.12 TAB
2995 1 423 snapshots 260 19.04.12 TAB 19.04.12 TAB
3003 1 423 snapshots 258 19.04.12 TAB 19.04.12 TAB
3014 1 423 snapshots 19.04.12 TAB
3016 1 423 snapshots 19.04.12 TAB
3018 1 423 snapshots 19.04.12 TAB
3021 1 423 snapshots 106 19.04.12 TAB 19.04.12 TAB
3030 1 423 snapshots 159 19.04.12 TAB 19.04.12 TAB
3037 1 423 snapshots 175 19.04.12 TAB 19.04.12 TAB
3047 1 423 snapshots 64 19.04.12 TAB 19.04.12 TAB
3054 1 423 snapshots 60 19.04.12 TAB 19.04.12 TAB
3063 1 423 snapshots 67 19.04.12 TAB 19.04.12 TAB
33 rows selected.
3014, 3016 and 3016 were jobs which failed to run because purging was taking place.
3021, 3030, 3037 were jobs before shrink space happened.
3047, 3054, 3063 were jobs after shrink space happened.
Generally speaking, it looks like:
SQL> select min(secs_run), max(secs_run), avg(secs_run) from tools.jobs_diary where job_info='423 snapshots' ;
MIN(SECS_RUN) MAX(SECS_RUN) AVG(SECS_RUN)
------------- ------------- -------------
60 669 280.133739
This is, on average base, at least 4 times less then before.
Let us look for job which was doing that massive updates. Interesting, that job was running pretty fast (MV log is on remote database) so all seems nice all the time:
SQL> select * from tools.jobs_diary where job_info='3364 Ažuriranje korisnika' order by id;
ID INST_ID JOB_INFO SECS_RUN CREATE_D CREATE_USER EDIT_DAT EDIT_USER
---------- ---------- -------------------------------- ---------- -------- -------------------------------- -------- ----------
710 3 3364 Ažuriranje korisnika 8 13.04.12 TAB 13.04.12 TAB
780 1 3364 Ažuriranje korisnika 8 13.04.12 TAB 13.04.12 TAB
849 2 3364 Ažuriranje korisnika 8 13.04.12 TAB 13.04.12 TAB
919 2 3364 Ažuriranje korisnika 8 13.04.12 TAB 13.04.12 TAB
989 2 3364 Ažuriranje korisnika 16 13.04.12 TAB 13.04.12 TAB
1058 1 3364 Ažuriranje korisnika 9 13.04.12 TAB 13.04.12 TAB
1129 1 3364 Ažuriranje korisnika 8 14.04.12 TAB 14.04.12 TAB
1198 3 3364 Ažuriranje korisnika 16 14.04.12 TAB 14.04.12 TAB
1267 1 3364 Ažuriranje korisnika 7 14.04.12 TAB 14.04.12 TAB
1336 3 3364 Ažuriranje korisnika 13 14.04.12 TAB 14.04.12 TAB
1406 3 3364 Ažuriranje korisnika 8 14.04.12 TAB 14.04.12 TAB
1474 2 3364 Ažuriranje korisnika 10 14.04.12 TAB 14.04.12 TAB
1548 3 3364 Ažuriranje korisnika 13 15.04.12 TAB 15.04.12 TAB
1618 4 3364 Ažuriranje korisnika 18 15.04.12 TAB 15.04.12 TAB
1686 4 3364 Ažuriranje korisnika 14 15.04.12 TAB 15.04.12 TAB
1756 4 3364 Ažuriranje korisnika 17 15.04.12 TAB 15.04.12 TAB
1826 3 3364 Ažuriranje korisnika 11 15.04.12 TAB 15.04.12 TAB
1894 4 3364 Ažuriranje korisnika 12 15.04.12 TAB 15.04.12 TAB
1966 3 3364 Ažuriranje korisnika 12 16.04.12 TAB 16.04.12 TAB
2036 3 3364 Ažuriranje korisnika 10 16.04.12 TAB 16.04.12 TAB
2105 1 3364 Ažuriranje korisnika 21 16.04.12 TAB 16.04.12 TAB
2175 1 3364 Ažuriranje korisnika 6 16.04.12 TAB 16.04.12 TAB
2245 4 3364 Ažuriranje korisnika 19 16.04.12 TAB 16.04.12 TAB
2314 2 3364 Ažuriranje korisnika 9 16.04.12 TAB 16.04.12 TAB
2387 1 3364 Ažuriranje korisnika 11 17.04.12 TAB 17.04.12 TAB
2456 3 3364 Ažuriranje korisnika 12 17.04.12 TAB 17.04.12 TAB
2525 3 3364 Ažuriranje korisnika 23 17.04.12 TAB 17.04.12 TAB
2595 2 3364 Ažuriranje korisnika 41 17.04.12 TAB 17.04.12 TAB
2665 3 3364 Ažuriranje korisnika 9 17.04.12 TAB 17.04.12 TAB
2734 1 3364 Ažuriranje korisnika 24 17.04.12 TAB 17.04.12 TAB
2806 4 3364 Ažuriranje korisnika 7 18.04.12 TAB 18.04.12 TAB
2875 2 3364 Ažuriranje korisnika 13 18.04.12 TAB 18.04.12 TAB
2944 2 3364 Ažuriranje korisnika 8 18.04.12 TAB 18.04.12 TAB
3014 1 3364 Ažuriranje korisnika 9 18.04.12 TAB 18.04.12 TAB
3114 4 3364 Ažuriranje korisnika 7 18.04.12 TAB 18.04.12 TAB
3251 2 3364 Ažuriranje korisnika 108 18.04.12 TAB 18.04.12 TAB
3399 1 3364 Ažuriranje korisnika 13 19.04.12 TAB 19.04.12 TAB
3543 2 3364 Ažuriranje korisnika 8 19.04.12 TAB 19.04.12 TAB
3683 3 3364 Ažuriranje korisnika 10 19.04.12 TAB 19.04.12 TAB
3840 2 3364 Ažuriranje korisnika 0 19.04.12 TAB 19.04.12 TAB
3983 3 3364 Ažuriranje korisnika 1 19.04.12 TAB 19.04.12 TAB
41 rows selected.
3840 and 3983 were jobs that happened after shrink space happened. As stated before, performance gain seems small. Look in general data:
SQL> select min(secs_run), max(secs_run), avg(secs_run) from tools.jobs_diary where job_info='3364 Ažuriranje korisnika';
MIN(SECS_RUN) MAX(SECS_RUN) AVG(SECS_RUN)
------------- ------------- -------------
0 108 14.2682927
If you look in average, then again performance gain is 10 times, even on small numbers this is huge percent value.
The end
But the best gain was when MV log was purged, because FTS was small. This gain may not be exposed in numbers in any mean.
For the end test once again FAST refresh and if all went OK, purging is done in all way with best result.
Hope this helps someone...
Cheers!