Saturday, November 28, 2009

Example how to save deadlock exceptions into table (beside alert log)!

Deadlock is a normal situation in a multi user environments that causes two or more sessions to wait indefinitely for a locked resource. First session needs a resource locked by the second one and the second session needs a resource locked by the first session. You may accept deadlock as a easiest solution from Oracle side to ensure normal functioning in unsolvable situations.



After deadlock occur, beside exception (ORA-00060), alert log entry is entered with message, one trace file in background_dump_dest directory is generated. In that trace file, between all others information about deadlock, it is very clearly written:
...
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock
...

Because alert log and trace files are on the server side, where most of developers cannot reach, this is why all such problems fall in DBA hands, regardless most of us do not do design (!?) or program the database which we administer.

9i deadlock monitoring solution

While I was working in 2005 as DBA in one company with Oracle 9i, where deadlock was a serious and constant problem, I was tired from transferring all the time files to developers. This is why I created a standalone solution which capture deadlock and save all important data to two native Oracle tables. This 9i solution is available to download .

Solution is implemented in separate schema (tools) which should have enough privileges for that. So this is one additional step is needed to be done for such a scenario. Because non privileged users can query "%$" views, I have to create replica with similar name and grant select to tools user:
create or replace SYS.VT$OBJ as select * from SYS.OBJ$;
GRANT SELECT ON SYS.VT$OBJ TO TOOLS;
Of course, other way was to implement the whole logic in SYS schema, but then in some future exports...this could be lost (exp doesn't export sys schema).

Deadlock information are recorded in mentioned two tables (DEADLOCK_LOG_LOCKERS, DEADLOCK_LOG_WAITERS). The whole logic is automated through database trigger, which is fired when ORA-00060 occur.
CREATE OR REPLACE TRIGGER db_aserr_tgb
AFTER SERVERERROR
ON DATABASE
BEGIN
  IF (IS_SERVERERROR (60)) THEN
    deadlock_log_prc;
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    null;
END db_aserr_tgb;
/

Finally, referenced deadlock_log_prc procedure in trigger, do the whole job-fill tables with information about deadlocks. If you perform code from picture at the beginning (attached source for this exampel is in RAR files as well) then on 9i database you get something like:

Notice that beside other information from session that caused deadlock, rightmost column contains SQL from session that survived.


Second table describes more with global data what was in the moment of deadlock.

Using ID and DEADLOG_DATE fields you can join tables and analyze the deadlock situation. Because I granted public select on both of tables developers were able to help them self with no intervention from DBA, what makes my work easier.

As you can see those two tables has almost all important data which you can find in trace file.

Non automatic usage

Because deadlock is in core classical exception, following, usual "developers like" code, will prevent deadlock logging:
BEGIN
  UPDATE....
EXCEPTION
  WHEN OTHERS THEN
    null;
END;
/
because exception is silently "crushed" in general exception handler without proper processing interesting part of it.

So it is much, much better next code which will fix previous problem with missed logging, leaving the whole program functionality as it was:
DECLARE
  my_deadlock_exc EXCEPTION;
  PRAGMA EXCEPTION_INIT (my_deadloc_exc, -60);
BEGIN
  UPDATE....
EXCEPTION
  WHEN my_deadlock_exc THEN
    deadlock_log_prc;
  WHEN OTHERS THEN
    null;
END;
As you can see, because trigger cannot be called implicitly, we are calling explicitly deadlock_log_prc which is in the case exception handler for deadlock exception.

Whenever I had a chance, I always tried to learn developers to make some exception handlers at least for some harmless events (resource bussy, record not found etc)... to prevent hiding real problems in WHEN OTHERS part.

11g deadlock changes!

After that I was working as APPS 11i DBA, locking story passed away from my mind.  The reason is very easy: in EBS there is no ref constraints-so deadlocks are very, very rare...frankly if they happened it is usually from the customization part.

Recently I jump into one interesting post 11g is more deadlock sensitive than 10g?, which mentioned something about expanded logic in deadlock. As I have at home my 11gR1, I decided to test that immediately.

Very soon I realize that 11g-9i difference in this part are really huge. As you can see, my solution, which was perfectly working on 9i, on 11g, beside important data which must be captured, was capturing many others unnecessary data, which were not produced in 9i.

11g deadlock monitoring workaround

After some small investigation, the quickest solution can be done by adding additional filters before posting captured sqls. For that i have re changed a little bit logic in deadlock_log_prc procedure. Here is how it now it looks (change code is especially colored):
CREATE OR REPLACE PROCEDURE deadlock_log_prc IS
/*------------------------------------------------------------------------------
 Filename: 3_deadlock_log_prc.sql
 Purpose : Procedure which fill deadlock monitoring tables with data
           when deadlock occur

 Date    : 18.10.2005
 Author  : Damir Vadas
 CR      : 1500

 Remarks : run as priviledged user
Should work on 9i-11g Oracle versions

 Changes (DD.MM.YYYY):
          27.11.2009. Damir Vadas
                      Rechanged logic for 11g databases (not too much tested-looks good)
 ----------------------------------------------------------------------------*/
PRAGMA AUTONOMOUS_TRANSACTION;
  v_id INTEGER;
  v_date DATE;
BEGIN
  SELECT tools_seq.nextVal, sysdate
    INTO v_id, v_date
    FROM dual
  ;

  INSERT INTO tools.deadlock_log_lockers (
      id
    , deadlog_date
    , os_locker
    , locker_schema
    , locker_pid
    , os_waiter
    , waiter_schema
    , waiter_pid
    , sql_text_waiter
  )
    /* TX row locks  */ 
    WITH Q AS (
      SELECT /*+ rule */
          S_LOCKER.OSUSER OS_LOCKER
        , S_LOCKER.USERNAME LOCKER_SCHEMA
        , S_LOCKER.PROCESS LOCK_PID
        , S_WAITER.OSUSER OS_WAITER
        , S_WAITER.USERNAME WAITER_SCHEMA
        , S_WAITER.PROCESS WAITER_PID
        , (SELECT sql_fulltext from v$sql where sql_id = O.SQL_ID AND ROWNUM<=1)
        SQL_TEXT
      FROM
          V$LOCK L_WAITER
        , V$LOCK L_LOCKER
        , V$SESSION S_WAITER
        , V$SESSION S_LOCKER
        , V$_LOCK L1_WAITER
        , V$OPEN_CURSOR O
      WHERE S_WAITER.SID = L_WAITER.SID
         AND L_WAITER.TYPE IN ('TX')
         AND S_LOCKER.sid = L_LOCKER.sid
         AND L_LOCKER.ID1 = L_WAITER.ID1
         AND L_WAITER.REQUEST > 0
         AND L_LOCKER.LMODE > 0
         AND L_WAITER.ADDR != L_LOCKER.ADDR
         AND L1_WAITER.LADDR = L_WAITER.ADDR
         AND L1_WAITER.KADDR = L_WAITER.KADDR
         AND L1_WAITER.SADDR = O.SADDR
         AND O.USER_NAME != 'SYS'
    )
    SELECT
            v_id
          , v_date
          , OS_LOCKER
          , LOCKER_SCHEMA
          , LOCK_PID
          , OS_WAITER
          , WAITER_SCHEMA
          , WAITER_PID
          , CAST (DBMS_LOB.SUBSTR (SQL_TEXT, 4000) AS VARCHAR (4000)) 
      FROM Q
     WHERE 
            SUBSTR(UPPER(
                           CAST (DBMS_LOB.SUBSTR (SQL_TEXT, 4000) AS VARCHAR (4000))
                          )
                          ,1,7)!='SELECT '                   -- 11g
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'INSERT INTO SYS.AUD$')<=0  -- auditing
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'OBJ#')<=0                  -- 11g
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'AUDIT$')<=0                -- 11
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'TABLE_1_FF')<=0            -- 11
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'DECLARE  TYPE ATTRS_CUR IS REF CURSOR;  M_CUR')<=0            -- 11
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'SDO_GEOR_DDL__TABLE$$')<=0 -- 11
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;')<=0 -- 11
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'DEADLOCK_LOG_LOCKERS')<=0  -- log tables
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'DEADLOCK_LOG_WAITERS')<=0  -- log tables
       AND INSTR(UPPER(RTRIM(Q.SQL_TEXT)), 'IS_SERVERERROR')<=0        -- log trigger which monitor deadlocks
   UNION
    /* table-level locks (TM) and mixed TM/TX TX/TM */
    SELECT  /*+ rule */
        v_id
      , v_date
      , S_LOCKER.OSUSER OS_LOCKER
      , S_LOCKER.USERNAME LOCKER_SCHEMA
      , S_LOCKER.PROCESS LOCKER_PID
      , S_WAITER.OSUSER OS_WAITER
      , S_WAITER.USERNAME WAITER_SCHEMA
      , S_WAITER.PROCESS WAITER_PID
      , 'Table lock (TM): ' || U.NAME || '.' || O.NAME || ' - Mode held: '||
          DECODE(L_LOCKER.LMODE,
            0, 'None', /* same as Monitor */
            1, 'Null', /* N */
            2, 'Row-S (SS)', /* L */
            3, 'Row-X (SX)', /* R */
            4, 'Share', /* S */
            5, 'S/Row-X (SSX)', /* C */
            6, 'Exclusive', /* X */
            '???: ' || to_char(L_LOCKER.LMODE)
          ) || ' / Mode requested: '||
          DECODE(L_WAITER.REQUEST,
            0, 'None', /* same as Monitor */
            1, 'Null', /* N */
            2, 'Row-S (SS)', /* L */
            3, 'Row-X (SX)', /* R */
            4, 'Share', /* S */
            5, 'S/Row-X (SSX)', /* C */
            6, 'Exclusive', /* X */
            '???: '||to_char(RTRIM(L_WAITER.REQUEST))
          )
        SQL_TEXT_WAITER
      FROM
          SYS.VT$OBJ O
        , SYS.USER$ U
        , V$LOCK L_WAITER
        , V$LOCK L_LOCKER
        , V$SESSION S_WAITER
        , V$SESSION S_LOCKER
     WHERE S_WAITER.SID = L_WAITER.SID
       AND L_WAITER.TYPE IN ('TM')
       AND S_LOCKER.sid = L_LOCKER.sid
       AND L_LOCKER.ID1 = L_WAITER.ID1
       AND L_WAITER.REQUEST > 0
       AND L_LOCKER.LMODE > 0
       AND L_WAITER.ADDR != L_LOCKER.ADDR
       AND L_WAITER.ID1 = O.OBJ#
       AND U.USER# = O.OWNER#
   ;

  INSERT INTO tools.deadlock_log_waiters (
      id
    , deadlog_date
    , wait
    , os_user
    , process
    , locker
    , object_owner
    , object_name
    , program)
    SELECT /*+ ordered */
        v_id
      , v_date
      , DECODE(L.REQUEST,0,'NO','YES') WAIT
      , S.OSUSER
      , S.PROCESS
      , S.USERNAME LOCKER
      , U.NAME T_OWNER
      , O.NAME OBJECT_NAME
      , ' '||S.PROGRAM PROGRAM
    FROM
        V$LOCK L
      , SYS.USER$ U
      , SYS.VT$OBJ O
      , V$SESSION S
    WHERE U.USER# = O.OWNER#
       AND S.SID = L.SID
       AND L.ID1 = O.OBJ#
       AND L.TYPE = 'TM'
       AND U.NAME != 'SYS'
       AND O.NAME != 'DEADLOCK_LOG_LOCKERS'
       AND O.NAME != 'DEADLOCK_LOG_WAITERS'
   UNION
    SELECT /*+ ordered */
       v_id
     , v_date
     , DECODE(L.REQUEST,0,'NO','YES') WAIT
     , S.OSUSER
     , S.PROCESS
     , S.USERNAME LOCKER
     , '-'
     , 'Record(s)'
     , ' '||S.PROGRAM PROGRAM
    FROM
        V$LOCK L
      , V$SESSION S
    WHERE S.SID = L.SID
       AND L.TYPE = 'TX'
  ;
  COMMIT;
EXCEPTION
  WHEN OTHERS THEN
    dbms_output.put_line(SQLERRM);
END;
/

The whole 11g changed solution is packed in additional RAR file

What next

I'm perfectly aware that this approach is far from perfect, but for me and a frend of mine, who needed this code ASAP, this is acceptable working solution for a while. When I find some additional not necessary rows are shown in table, I'll just add new filter and it will be quickly fixed.

When (and if!) I get time to re change that code I'll post it. In the meantime I'd like to get some enhancements from someone of you...if you have experience the same problems.
On 12th December I have added a small amount of code according vt$obj which was not clearly described. Thank you for that!

Cheers!

5 comments:

  1. Excellent posting. I think select for update with option nowait or wait <> is best solution for resolution of deadlock or anything else?

    ReplyDelete
  2. I do not think so. Only if you will "select for update" all needed erecords accross the database, what is not so easy.
    However, on small transactions (not too many involved tables) this is the correct solution.

    ReplyDelete
  3. Hi,
    when i tried to compile your script in 11gR1, it says invalid object: vt$obj

    ??
    am i missing something ?

    -- Raj

    ReplyDelete
  4. Rahul Shah,

    Thank you very much...your are welcome!

    ReplyDelete
  5. Sorry...forgot to say that this is custom view created because non privileged user doesn't have right to query any of "v$" views.
    So here is the definition how to correct that. All has to be done in SYS schema:
    create or replace SYS.VT$OBJ as
    select * from SYS.OBJ$;
    then grant select on that view to some user (tools in my case in your case may differ):
    GRANT SELECT ON SYS.VT$OBJ TO TOOLS;

    If you place everything in SYS schema then just replace "VT$OBJ" with "OBJ$" in any appearance!

    Thank you for pointing out minor inconsistency...

    Hope this helps,
    Damir Vadas
    http://damir-vadas.blogspot.com

    ReplyDelete