Monday, February 22, 2010

Odd CBO behavior


CBO ... So much words has been said for Oracle most curious technology today and still most of us know very little about it.

Until now, I was pretty sure that any SQL that was parsed without errors cannot be wrong at run time... To be more clear I was sure that SQL result may not depend on physical data in tables.

But I was wrong! Following example came from real live problem on 10g database.

The problem

I wanted to write a script which they can query TEMP usage through SQLPlus as ordinary sql query. After some time I managed to get satisfactory query.

I run it as an test in my environment:
SQL> select
  2    s.username,
  3    s.sid, s.serial#,
  4    round(((su.blocks*pa.value)/1024/1024),2) size_mb,
  5    s.status,
  6    su.segfile#,
  7    su.segblk#
  8    from v$session s,
  9    v$sort_usage su,
 10    v$process pr,
 11    v$parameter pa
 12  where 
 13        s.saddr = su.session_addr
 14    and s.paddr = pr.addr
 15    and pa.name='db_block_size'
 16    and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
 17  order by size_mb;

no rows selected

SQL>
Result "no rows selected" force me to occupy some TEMP space to test properly. So I run in other SQLPlus session script which do exactly that-occupy TEMP space:
SQL> select * from dba_source order by text desc; 
After some time (when previous script start to output data) I run again mine TEMP script and get an error:
SQL> select
  2    s.username,
  3    s.sid, s.serial#,
  4    round(((su.blocks*pa.value)/1024/1024),2) size_mb,
  5    s.status,
  6    su.segfile#,
  7    su.segblk#
  8    from v$session s,
  9    v$sort_usage su,
 10    v$process pr,
 11    v$parameter pa
 12  where 
 13        s.saddr = su.session_addr
 14    and s.paddr = pr.addr
 15    and pa.name='db_block_size'
 16    and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
 17  order by size_mb;
and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
                       *
ERROR at line 16:
ORA-01722: invalid number

SQL> 
I was surprised with an error.

Firstly I have rechecked return value from v$parameter query:
SQL> select value
  2    from v$parameter pa
  3   where pa.name='db_block_size';

VALUE
--------------------------------
8192

SQL> 

Then I retest that returned value is correct (it is a number):
SQL> select value * 10
  2    from v$parameter pa
  3   where pa.name='db_block_size';

  VALUE*10
----------
     81920

SQL> 

Then I tried to place to_number() to explicitly cast problematic part-no success!

Then I thought that Oracle database statistic (data dictionary) was too old so I run it manually (dbms_stats.gather_database_stats)-no success!

After that I run explain plan to see what is going on with the query:
SQL> explain plan for
  2  select
  3    s.username,
  4    s.sid, s.serial#,
  5    round(((su.blocks*pa.value)/1024/1024),2) size_mb,
  6    s.status,
  7    su.segfile#,
  8    su.segblk#
  9    from v$session s,
 10    v$sort_usage su,
 11    v$process pr,
 12    v$parameter pa
 13  where 
 14        s.saddr = su.session_addr
 15    and s.paddr = pr.addr
 16    and pa.name='db_block_size'
 17    and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
 18  order by size_mb;

Explained.

SQL> set lines 2000
SQL> set head off
SQL> select plan_table_output from table(dbms_xplan.display('PLAN_TABLE',null,'advanced'));

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |     1 |  2461 |     3 (100)| 00:00:01 |
|   1 |  SORT ORDER BY                 |                 |     1 |  2461 |     3 (100)| 00:00:01 |
|   2 |   NESTED LOOPS                 |                 |     1 |  2461 |     2 (100)| 00:00:01 |
|   3 |    NESTED LOOPS                |                 |     1 |  2448 |     2 (100)| 00:00:01 |
|*  4 |     HASH JOIN                  |                 |     1 |  2422 |     2 (100)| 00:00:01 |
|*  5 |      HASH JOIN                 |                 |     1 |  2390 |     2 (100)| 00:00:01 |
|   6 |       NESTED LOOPS             |                 |     1 |  2283 |     1 (100)| 00:00:01 |
|   7 |        NESTED LOOPS            |                 |     1 |  2270 |     1 (100)| 00:00:01 |
|*  8 |         HASH JOIN              |                 |     1 |  2244 |     1 (100)| 00:00:01 |
|   9 |          NESTED LOOPS          |                 |     1 |  2176 |     1 (100)| 00:00:01 |
|* 10 |           HASH JOIN            |                 |     1 |   142 |     1 (100)| 00:00:01 |
|* 11 |            FIXED TABLE FULL    | X$KSUSE         |     1 |    71 |     0   (0)| 00:00:01 |
|* 12 |            FIXED TABLE FULL    | X$KTSSO         |     1 |    71 |     0   (0)| 00:00:01 |
|* 13 |           FIXED TABLE FULL     | X$KSPPCV        |     5 | 10170 |     0   (0)| 00:00:01 |
|* 14 |          FIXED TABLE FULL      | X$KSPPI         |     1 |    68 |     0   (0)| 00:00:01 |
|* 15 |         FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |     1 |    26 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |    13 |     0   (0)| 00:00:01 |
|* 17 |       FIXED TABLE FULL         | X$KSUSE         |     1 |   107 |     0   (0)| 00:00:01 |
|* 18 |      FIXED TABLE FULL          | X$KSUPR         |     1 |    32 |     0   (0)| 00:00:01 |
|* 19 |     FIXED TABLE FIXED INDEX    | X$KSLWT (ind:1) |     1 |    26 |     0   (0)| 00:00:01 |
|* 20 |    FIXED TABLE FIXED INDEX     | X$KSLED (ind:2) |     1 |    13 |     0   (0)| 00:00:01 |
As it is easy to see, the problem rely in view v$parameter and column value which has mixed values (numbers as well as varchars). So when Oracle's CBO perform Full Table Scan (FTS) (regardless value I was retrieving was pure number) it breaks.
The whole explain plan output can be downloaded from one file .

Solution

Solution was relatively easy when we know the cause. So I'll show one most acceptable way. Placing hint will always prevent Oracle's CBO to go wrong-to make FTS!
SQL> select /*+ rule */
  2    s.username,
  3    s.sid, s.serial#,
  4    round(((su.blocks*pa.value)/1024/1024),2) size_mb,
  5    s.status,
  6    su.segfile#,
  7    su.segblk#
  8    from v$session s,
  9    v$sort_usage su,
 10    v$process pr,
 11    v$parameter pa
 12  where 
 13        s.saddr = su.session_addr
 14    and s.paddr = pr.addr
 15    and pa.name='db_block_size'
 16    and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
 17  order by size_mb;

USERNAME             SID    SERIAL#    SIZE_MB STATUS     SEGFILE#    SEGBLK#
-------------- --------- ---------- ---------- -------- ---------- ----------
TAB                  307        905        226 INACTIVE         81      60681

SQL>
The same could be done with rule FIRST_ROWS.

To see that realy CBO plan changed, let us look into it:
SQL> explain plan for
  2  select /*+ rule */
  3    s.username,
  4    s.sid, s.serial#,
  5    round(((su.blocks*pa.value)/1024/1024),2) size_mb,
  6    s.status,
  7    su.segfile#,
  8    su.segblk#
  9    from v$session s,
 10    v$sort_usage su,
 11    v$process pr,
 12    v$parameter pa
 13  where 
 14        s.saddr = su.session_addr
 15    and s.paddr = pr.addr
 16    and pa.name='db_block_size'
 17    and ((nvl(su.blocks,0)*pa.value)/1024/1024) > 0
 18  order by size_mb;

Explained.

SQL> set lines 2000
SQL> set head off
SQL> select plan_table_output from table(dbms_xplan.display('PLAN_TABLE',null,'advanced'));

--------------------------------------------------------
| Id  | Operation                           | Name     |
--------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |
|   1 |  SORT ORDER BY                      |          |
|   2 |   MERGE JOIN                        |          |
|   3 |    SORT JOIN                        |          |
|   4 |     MERGE JOIN                      |          |
|   5 |      SORT JOIN                      |          |
|   6 |       MERGE JOIN                    |          |
|   7 |        SORT JOIN                    |          |
|   8 |         MERGE JOIN                  |          |
|   9 |          SORT JOIN                  |          |
|  10 |           MERGE JOIN                |          |
|  11 |            SORT JOIN                |          |
|  12 |             MERGE JOIN              |          |
|  13 |              SORT JOIN              |          |
|  14 |               MERGE JOIN            |          |
|  15 |                SORT JOIN            |          |
|  16 |                 NESTED LOOPS        |          |
|  17 |                  MERGE JOIN         |          |
|  18 |                   FIXED TABLE FULL  | X$KSPPCV |
|* 19 |                   FILTER            |          |
|* 20 |                    SORT JOIN        |          |
|* 21 |                     FIXED TABLE FULL| X$KSPPI  |
|* 22 |                  FIXED TABLE FULL   | X$KTSSO  |
|* 23 |                SORT JOIN            |          |
|* 24 |                 FIXED TABLE FULL    | X$KSUSE  |
|* 25 |              SORT JOIN              |          |
|  26 |               FIXED TABLE FULL      | X$KSLWT  |
|* 27 |            SORT JOIN                |          |
|  28 |             FIXED TABLE FULL        | X$KSLED  |
|* 29 |          SORT JOIN                  |          |
|* 30 |           FIXED TABLE FULL          | X$KSUSE  |
|* 31 |        SORT JOIN                    |          |
|* 32 |         FIXED TABLE FULL            | X$KSUPR  |
|* 33 |      SORT JOIN                      |          |
|  34 |       FIXED TABLE FULL              | X$KSLWT  |
|* 35 |    SORT JOIN                        |          |
|  36 |     FIXED TABLE FULL                | X$KSLED  |
--------------------------------------------------------
The whole explain plan output can be downloaded from one file

The conclusion

This problem I have reproduced on 10g database as well as 11g (32/64 bit Win/Lin versions) with default init ora parameter optimizer_mode defined as ALL_ROWS.

Because "bug" was too obvious finally I search on Metalink and found exactly the same case: "Bug 6263716: QUERY WORKS FOR ORACLE 9I BUT IT DOESN'T WORK FOR ORACLE 10G-ORA-1722" but it is closed as not a bug but expected behaviour.

So, be careful on tables with mixed values in one column-they might reproduce an unexpected error at run time ... and this is expected behaviour...
;-)

Cheers!

4 comments:

  1. I onece encountered the following:
    system@DB10G> select index_name, degree from dba_indexes where degree > 2;
    select index_name, degree from dba_indexes where degree > 2
    *ERROR at line 1:
    ORA-01722: invalid number

    Until then I realized the degree is VARCHAR2 type.

    ReplyDelete
  2. Yu,
    For certain arithmetic operations Oracle do automatic casting. Look in post for exactly this demo.
    Your case was not that-you are comparing values and here oracle makes no implicit casting.

    Damir Vadas
    http://damir-vadas.blogspot.com/

    ReplyDelete
  3. I think in the col > nubmer case oracle makes implicit casting too:


    TESTDB>create table t(deg varchar2(30));

    Table created.

    TESTDB>insert into t values('1');

    1 row created.

    TESTDB>insert into t values('2');

    1 row created.

    TESTDB>insert into t values('3');

    1 row created.

    TESTDB>commit;

    Commit complete.

    TESTDB>set autotrace traceonly
    TESTDB>select * from t where deg > 2;


    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2153619298

    --------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    --------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| T | 1 | 17 | 3 (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter(TO_NUMBER("DEG")>2)

    Note
    -----
    - dynamic sampling used for this statement


    Statistics
    ----------------------------------------------------------
    28 recursive calls
    0 db block gets
    17 consistent gets
    0 physical reads
    0 redo size
    218 bytes sent via SQL*Net to client
    238 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    TESTDB>set autotrace off
    TESTDB>insert into t values('default');

    1 row created.

    TESTDB>select * from t where deg > 2;
    ERROR:
    ORA-01722: invalid number



    no rows selected

    ReplyDelete
  4. Yun,
    Seems you are right...it makes implicit cast.
    ;-)
    ThX for nice explanation and time to make that alive here...
    Damir

    ReplyDelete