Wednesday, May 19, 2010

Blocking session ... detect, unblock, notify (Part II)

In my first post on blocking session problem (Blocking session ... detect, unblock, notify) I have shown how to handle these problems pretty clear and, from mine point of view, easy to understand. Recent events on one RAC database and little less experienced DBA, force me to make some updates on this subject to avoid all possible traps that mine solution may have in explanation.

The problem

In the middle of one day on one RAC database (3 nodes), "sb" procedure had output like:
SQL> @sb
Oracle version: 10.1.0.4.0 (10.1.0.4.0)
Blocker         Inst  SID     Serial    [sec]   Lock Type     Module
----------------------------------------------------------------------------------------------------
1. ANAL          1     282        6661     7981   Transaction
       ZLSA        2   372           126     1128     Exclusive 
        SLPE        2   368            48     6836     Exclusive 
        DRI2        3   237           232     6727     Exclusive 
2. DRI2          3     237         232     6724   Transaction 
        DRIA        2   355           195      216     Exclusive  
        NESE        3   114            80     5527     Exclusive  
3. NESE          3     114          80     5521   Transaction
        DRI2        2   379           328      388   Share-Table 
        DRI2        3   115           269     5439   Share-Table 
        DRI2        3   423           619     3529     Exclusive 
        DRI2        2   373           108     5952   Share-Table 
        NESE        1   303          2811     4870     Exclusive 
        DRLO        3   205           749     2862     Exclusive 
        NESE        3   146           220     4766     Exclusive 
        
To kill first from the list, perform:
        
NON RAC (or RAC logged on that node):
---------------------------------
ALTER SYSTEM DISCONNECT  SESSION '282,6661' IMMEDIATE;
ALTER SYSTEM KILL        SESSION '282,6661' IMMEDIATE;
        
        
RAC (logged on any node) :
--------------------------
declare
  v_job binary_integer;
begin
  DBMS_JOB.submit ( job     =>v_job
,what    =>'begin execute immediate ''ALTER SYSTEM KILL SESSION
''''282,6661'''' IMMEDIATE''; end; '
,instance=>1
); 
  commit;
end;
/

PL/SQL procedure successfully completed.

SQL>
Previously mentioned DBA, was using mine solution for a long time without any problems and like in many situation before, submit a job like it was advised.

After some time when he checked blockers with "sb" procedure again, he was astonished that procedure produce the same output like before-like nothing happened. Without thinking why, he submit job again ... and later once again and after third submit that results no "unblocking", he came to me and ask for help.

The analyze part

Like in many situation like this, checking instance alert log file (this is RAC database) is the first step that DBA should do. It's content has at the end entries like:
ORA-12012: error on auto execute of job 1241
ORA-00030: User session ID does not exist.
ORA-06512: at line 1
Checking of dba_jobs shows me that 2 jobs are trying to execute again and again to perform "kill session" for the session that doesn't exist!

Logical question was "How session doesn't exist if "sb" is still showing it? "

After checking, answer was very easy-because of bug in it's logic!

If you look in code of "sb" procedure, you'll find this part in definition of blocking session:
CURSOR c_w IS 
  SELECT  gvw.sid waiter_sid, 
          gvw.inst_id waiter_inst_id 
  FROM gv$lock gvb, gv$lock gvw, gv$session gvs 
  WHERE (gvb.id1, gvb.id2) in ( 
                               SELECT id1, id2 FROM gv$lock WHERE request=0 
                               INTERSECT 
                               SELECT id1, id2 FROM gv$lock WHERE lmode=0 
                              ) 
    AND gvb.id1=gvw.id1 
    AND gvb.id2=gvw.id2 
    AND gvb.request=0 
    AND gvw.lmode=0 
    AND gvb.sid=gvs.sid 
    AND gvb.inst_id=gvs.inst_id 
ORDER BY gvb.CTIME desc ;
All looks OK (and in fact it is!!) but in some real cases, like this one was, it was not properly used. Later in code I loop through that cursor without checking status column (that I do not have at all!) of that session and shows all sessions that are blocked by every cursor entry. So shortly, it was wrong because this cursors shows all sessions regardless their status (which can be "ACTIVE", "INACTIVE", "KILLED" etc.) and some of them might get STATUS = 'KILLED'!
Beside this there was a problem with waiters that are mixed because there was no blockers SID passed as an parameter. So this is why I completely removed c_w cursor and put all in one c_waiters which has two parameters:
CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS
    SELECT 
      nvl(username,'Blocker ???') waiter_user, 
      gvl.sid, 
      gvs.serial# serial, 
      gvl.inst_id, 
      gvl.ctime , 
      gvs.status STATUS,
      module,
      decode(gvl.request, 
                         0, 'None',
                         1, 'NoLock',
                         2, 'Row-Share',
                         3, 'Row-Exclusive',
                         4, 'Share-Table',
                         5, 'Share-Row-Exclusive',
                         6, 'Exclusive',
                            'Nothing-'
      ) lock_req,
      action
     FROM gv$lock gvl, 
          gv$session gvs
    WHERE gvl.request>0
      AND gvl.sid=gvs.sid
      AND gvl.inst_id=gvs.inst_id
      AND gvs.blocking_session=p_blocker_waiter_sid
      AND gvl.inst_id=p_blockers_waiter_inst_id
  ORDER BY gvl.CTIME desc
  ;
And this is exactly what happened in this case. First submitted job really killed that blocking session ("mark for kill") and other two hasn't find anything to kill so they were trying to execute and after some time gone to broken status after n retries.

To make more clear, let me show you the same situation with added STATUS field:
Blocker         Inst  SID     Serial    [sec]   Lock Type     Status       Module
----------------------------------------------------------------------------------------------------
1. ANAL          1     282        6661     7981   Transaction    KILLED
        ZLSA        2   372           126     1128     Exclusive     ACTIVE
        SLPE        2   368            48     6836     Exclusive     ACTIVE
        DRI2        3   237           232     6727     Exclusive     ACTIVE
2. DRI2          3     237         232     6724   Transaction    ACTIVE
        DRIA        2   355           195      216     Exclusive     ACTIVE
        NESE        3   114            80     5527     Exclusive     ACTIVE
3. NESE          3     114          80     5521   Transaction    ACTIVE
        DRI2        2   379           328      388   Share-Table     ACTIVE
        DRI2        3   115           269     5439   Share-Table     ACTIVE
        DRI2        3   423           619     3529     Exclusive     ACTIVE
        DRI2        2   373           108     5952   Share-Table     ACTIVE
        NESE        1   303          2811     4870     Exclusive     ACTIVE
        DRLO        3   205           749     2862     Exclusive     ACTIVE
        NESE        3   146           220     4766     Exclusive     ACTIVE
Now this all looks reasonable...but...then come the second bug. The second bug in sb procedure was that "kill statement" was always constructed from first blocker in list. And this was again wrong because this session may be killed and then this kill command would produce an error!

The solution

When I realize that, I rewrite a new sb procedure:
/* ---------------------------------------------------------------------------
 Filename: check_bloking_session.prc
 CR/TR#  : 
 Purpose : Find blocking sessions and create release statements for RAC and non RAC environment  

 Date    : 08.07.2005.
 Author  : Damir Vadas, damir.vadas@gmail.com
 
 Remarks : Tested on 10g only (should work on 11g as well and lower versions)

 Changes (DD.MM.YYYY, Name, CR/TR#):
          11.12.2009, Damir Vadas
                      added p_to and p_cc as parameters to avoid bad header encodding on hac server
                      (no success!) 
          30.12.2009, Damir Vadas
                      added checking (LENGTH(s_mail_text)+LENGTH(p_text)+1 <= 4000)
                      because "ORA-06502: PL/SQL: numeric or value error: character string buffer too small"
          06.01.2010, Damir Vadas
                      added 32k size of message and exception log handler
          30.04.2010, Damir Vadas
                      fixed bug with showing last blocker not first one in kill statement
                      added STATUS column to show if session is killed (needed for previous fixed)
          01.09.2010, Damir Vadas
                      fixed bug blockers and waiters to be 11g compatible
--------------------------------------------------------------------------- */ 
set serveroutput on size 123456
set linesize 140
set pagesize 100
set trimout on

declare
  cursor c_blockers IS
    SELECT DISTINCT 
       nvl(username,'BLOCKER ???') blocker_user, 
       gvl.sid, 
       gvs.serial# serial, 
       gvl.inst_id, 
       gvl.ctime ,
       gvs.status STATUS,      
       module, 
       action,
       decode(gvl.type, 'MR', 'Media_recovery',
                        'RT', 'Redo_thread',
                        'UN', 'User_name',
                        'TX', 'Transaction',
                        'TM', 'Dml',
                        'UL', 'PLSQL User_lock',
                        'DX', 'Distrted_Transaxion',
                        'CF', 'Control_file',
                        'IS', 'Instance_state',
                        'FS', 'File_set',
                        'IR', 'Instance_recovery',
                        'ST', 'Diskspace Transaction',
                        'IV', 'Libcache_invalidation',
                        'LS', 'LogStaartORswitch',
                        'RW', 'Row_wait',
                        'SQ', 'Sequence_no',
                        'TE', 'Extend_table',
                        'TT', 'Temp_table',
                              'Nothing-'
       ) lock_type
     FROM gv$lock gvl, 
          gv$session gvs
    WHERE gvl.request=0
      AND gvl.block=1
      AND gvl.sid=gvs.sid
      AND gvl.inst_id=gvs.inst_id
  ORDER BY CTIME desc
  ;  
  
  CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS
    SELECT 
      nvl(username,'WAITER ???') waiter_user, 
      gvl.sid, 
      gvs.serial# serial, 
      gvl.inst_id, 
      gvl.ctime , 
      gvs.status STATUS,
      module,
      decode(gvl.request, 
                         0, 'None',
                         1, 'NoLock',
                         2, 'Row-Share',
                         3, 'Row-Exclusive',
                         4, 'Share-Table',
                         5, 'Share-Row-Exclusive',
                         6, 'Exclusive',
                            'Nothing-'
      ) lock_req,
      action
     FROM gv$lock gvl, 
          gv$session gvs
    WHERE gvl.request>0
      AND gvl.sid=gvs.sid
      AND gvl.inst_id=gvs.inst_id
      AND gvs.blocking_session=p_blocker_waiter_sid
      AND gvl.inst_id=p_blockers_waiter_inst_id
  ORDER BY gvl.CTIME desc
  ;

  bHasAny  boolean;
  i        NUMBER := 0;
  v_sid    NUMBER := 0;
  v_serial NUMBER := 0;
  v_first_sid    NUMBER := -1;
  v_first_serial NUMBER := -1;
  v_instance        BINARY_INTEGER := 0;
  v_first_instance  BINARY_INTEGER := 0;
  db_ver            VARCHAR2(128);
  db_ver2           VARCHAR2(128);  
  s_job_command     VARCHAR2 (4000);
  s_mail_text       VARCHAR2 (32000);
  s_subject         VARCHAR2 (256);
  s_smtp_out_server VARCHAR2 (256);
  s_host_name       v$instance.host_name%TYPE;
  s_db_name         v$database.name%TYPE;
  s_instance_name   v$instance.instance_name%TYPE;
  s                 VARCHAR2(1024);
  
  PROCEDURE add_message_line (p_text VARCHAR2) is
  BEGIN
    -- for proceduire version with mail part
    -- s_mail_text := s_mail_text || p_text || chr(10);     
    dbms_output.put_line (p_text);
  END;
  
begin
  dbms_utility.db_version(db_ver,db_ver2);
  add_message_line ('Oracle version: '||db_ver|| ' ('||db_ver2||')');
  bHasAny := FALSE;  
  FOR v_blockers in c_blockers loop
    bHasAny := TRUE;     
    IF (i=0) THEN                  
      add_message_line (rpad('Blocker',12,' ')||'  '||'  Inst  '||rpad('SID',6,' ')||'  '||rpad('Serial',8,' ')||'  '||'  '||rpad('[sec]',6,' ')||' '|| 
                        lpad('Lock Type',10,' ')||' '|| lpad('Status',14, ' ') || lpad('Module',13,' ')
                        );
      add_message_line (rpad('-',120,'-'));
    END IF;
    v_sid := v_blockers.sid;
    v_serial := v_blockers.serial;
    v_instance := v_blockers.inst_id ;
    -- only first blocker (who is not killed!) is important!
    IF v_blockers.status != 'KILLED' AND v_first_sid = -1 THEN
      v_first_sid := v_blockers.sid;
      v_first_serial := v_blockers.serial;
      v_first_instance := v_blockers.inst_id ;
    END IF;
    add_message_line (LPAD(to_char(i+1),3,' ')||'. '||rpad(v_blockers.blocker_user,12,' ')||'  '||rpad(v_blockers.inst_id,4,' ')||'  '||rpad(v_blockers.SID,8,' ' )||'  '||lpad(v_blockers.serial,5,' ')||'  '||lpad(v_blockers.CTIME,7,' ')||' '||
                      lpad(v_blockers.lock_type,13,' ')||lpad(v_blockers.status,14,' ')||lpad(nvl(v_blockers.module,'?'),15,' ')
                     );
    FOR v_waiters IN c_waiters (v_blockers.sid,v_blockers.inst_id ) LOOP 
      add_message_line (lpad(chr(9),9)||rpad(v_waiters.waiter_user,10,' ')||'  '||rpad(to_char(v_waiters.inst_id),4,' ')||rpad(to_char(v_waiters.sid),4,' ')|| ' '||lpad(to_char(v_waiters.serial),12,' ')||' ' ||' '|| 
                  ' ' ||lpad(to_char(v_waiters.ctime),6,' ')||' '|| lpad(to_char(v_waiters.lock_req),13,' ')||' '|| lpad(v_blockers.status,14,' ')|| ' '|| lpad(to_char(nvl(v_waiters.module,'?')),15,' ')
                  );      
    END LOOP;
    IF bHasAny THEN
      i:=i+1;
    END IF;
  END LOOP;
  -- show any kind of blocker (killed one also...as info)
  IF bHasAny THEN
    -- show blocking rows info
    -- add_message_line (chr(9));
    -- show_blocked_records (s_blocked_rows);
    -- dbms_output.put_line(s_blocked_rows);
    -- add_message_line (s_blocked_rows);  
    -- for blockers that are valid (not killed!) create kill statement
    IF ( v_first_sid > 0 ) THEN
      SELECT instance_name, host_name INTO s_instance_name, s_host_name FROM v$instance;
      SELECT name INTO s_db_name FROM v$database; 
      add_message_line (chr(9));
      add_message_line ('To kill first from the list, perform: ');
      add_message_line (chr(9));
      IF db_ver like '10%' THEN
        add_message_line ('NON RAC (or RAC logged on that node):');
        add_message_line ('---------------------------------');
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line (chr(9));
        add_message_line (chr(9));
        s := '''begin execute immediate ''''ALTER SYSTEM KILL SESSION '''''''''||v_first_sid||','||v_first_serial||''''''''' IMMEDIATE''''; end; ''';     
        add_message_line ('RAC (logged on any node) :');
        add_message_line ('--------------------------');
        s_job_command := 'declare'|| chr(10)||
                         '  v_job binary_integer;'|| chr(10)|| 
                         'begin'|| chr(10)||
                         '  DBMS_JOB.submit ('||
                         ' job     =>v_job'||chr(10)||
                         ',what    =>'||s||chr(10)||
                         ',instance=>'||v_first_instance||chr(10)||
                         '); '||chr(10)||
                         '  commit;'||chr(10)||
                         'end;'|| chr(10) ||
                         '/'
        ;
        add_message_line (s_job_command);
      ELSIF db_ver like '11%' THEN
        add_message_line (chr(9));
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;');    
      ELSE
        add_message_line (chr(9));
        add_message_line ('Untested version... Try to logon as priveledged user on node ('||s_host_name||') and perform:');
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');    
      END IF; 
    ELSE
      add_message_line ('All valid blocker sessions allready killed. Try kill with OS level command!');
    END IF;
  ELSE
    add_message_line ('No blocking sessions found!');
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    -- auto_log_error_core('"check_blocking_session" procedure. ');
    null;
end;
/

Now when you run the sb procedure, the same situation will look like:
SQL> @sb
Oracle version: 10.1.0.4.0 (10.1.0.4.0)
Blocker         Inst  SID     Serial    [sec]   Lock Type     Status       Module
----------------------------------------------------------------------------------------------------
1. ANAL          1     282        6661     7981   Transaction    KILLED
       ZLSA        2   372           126     1128     Exclusive     ACTIVE
        SLPE        2   368            48     6836     Exclusive     ACTIVE
        DRI2        3   237           232     6727     Exclusive     ACTIVE
2. DRI2          3     237         232     6724   Transaction    ACTIVE
        DRIA        2   355           195      216     Exclusive     ACTIVE
        NESE        3   114            80     5527     Exclusive     ACTIVE
3. NESE          3     114          80     5521   Transaction    ACTIVE
        DRI2        2   379           328      388   Share-Table     ACTIVE
        DRI2        3   115           269     5439   Share-Table     ACTIVE
        DRI2        3   423           619     3529     Exclusive     ACTIVE
        DRI2        2   373           108     5952   Share-Table     ACTIVE
        NESE        1   303          2811     4870     Exclusive     ACTIVE
        DRLO        3   205           749     2862     Exclusive     ACTIVE
        NESE        3   146           220     4766     Exclusive     ACTIVE
        
To kill first from the list, perform:
        
NON RAC (or RAC logged on that node):
---------------------------------
ALTER SYSTEM DISCONNECT  SESSION '237,232' IMMEDIATE;
ALTER SYSTEM KILL        SESSION '237,232' IMMEDIATE;
        
        
RAC (logged on any node) :
--------------------------
declare
  v_job binary_integer;
begin
  DBMS_JOB.submit ( job     =>v_job
,what    =>'begin execute immediate ''ALTER SYSTEM KILL SESSION
''''237,232'''' IMMEDIATE''; end; '
,instance=>3
); 
  commit;
end;
/

PL/SQL procedure successfully completed.

SQL>
So it would construct kill statement for second blocker and skip first blocker which is "marked for kill". And now this is regular.

Kill "marked for kill"

If you look in oracle theory, "KILLED" status does not affect the work performed by the command, but it returns control back to the current session immediately, rather than waiting for confirmation of the kill. If the marked session persists for some time you may consider killing the process at the operating system level.

So practically it was still running in Oracle and the only way to immediately kill this session was to perform from OS level command!

Just to remind you differences between DISCONNECT and KILL SESSION
  • IMMEDIATE clause with KILL SESSION
    instruct Oracle to roll back ongoing transactions, release all session locks, recover the entire session state, and return control to you immediately.
  • IMMEDIATE clause with DISCONNECT SESSION
    disconnects the session and recovers the entire session state immediately, without waiting for ongoing transactions to complete.
If you are on Linux this action is peace of cake but on Windows ... good luck with "orakill" command, command which in most of the cases doesn't kill at all!

Be proactive

According this do not forget to replace other part of solution (proactive checking of blocking session-"check_blocking_session.prc") which I place in job with 5 minutes schedule and run automatically. Here is final version for that cause.
CREATE OR REPLACE procedure check_blocking_session (
/* $Header: check_blocking_session.prc 1.12 10/19/2010 20:45 damirv $ */
/* ---------------------------------------------------------------------------

           Copyright(C) 1995-2009 TEB Informatika d.o.o.


 Filename: check_blocking_session.prc
 CR/TR#  : 
 Purpose : Find blocking sessions and create release statements for RAC and non RAC environment  

 Date    : 08.07.2005.
 Author  : Damir Vadas, damir.vadas@gmail.com
 
 Remarks : Tested on 10g only (should work on 11g as well and lower versions)

 Changes (DD.MM.YYYY, Name, CR/TR#):
          11.12.2009, Damir Vadas
                      added p_to and p_cc as parameters to avoid bad header encodding on hac server
                      (no success!) 
          30.12.2009, Damir Vadas
                      added checking (LENGTH(s_mail_text)+LENGTH(p_text)+1 <= 4000)
                      because "ORA-06502: PL/SQL: numeric or value error: character string buffer too small"
          06.01.2010, Damir Vadas
                      added 32k size of message and exception log handler
          30.04.2010, Damir Vadas
                      fixed bug with showing last blocker not first one in kill statement
                      added STATUS column to show if session is killed (needed for previous fixed)
          01.09.2010, Damir Vadas
                      fixed bug blockers and waiters to be 11g compatible
          20.09.2010, Damir Vadas
                      Reunited logic for script/procedure
          19.10.2010, Damir Vadas
                      Removed p_smtp_out_server VARCHAR2 DEFAULT 'mail.iskon.hr' declaration
--------------------------------------------------------------------------- */
                                        p_sender          VARCHAR2 DEFAULT 'dba@teb-informatika.hr',
                                        p_to              VARCHAR2 DEFAULT 'damir.vadas@teb-informatika.hr',
                                        p_cc              VARCHAR2 DEFAULT null,
                                        p_subject         VARCHAR2 DEFAULT 'Blocking session occur!',
                                        p_send_mail       BOOLEAN  DEFAULT TRUE)
AS
  b_is_script BOOLEAN DEFAULT FALSE;
  CURSOR c_blockers IS
    SELECT DISTINCT 
           NVL(username,'BLOCKER ???') blocker_user, 
           gvb.sid, 
           gvs.serial# serial, 
           gvb.inst_id, 
           gvb.ctime ,
           gvs.status STATUS,      
           module, 
           action,
           NVL (
             decode(gvb.type, 
                       'AE', 'Edition Lock',                                          -- Prevent Dropping an edition in use 11g
                       'BL','Buffer hash table',
                       'CF','Control File Transaction',
                       'CI','Cross Instance Call',
                       'CS','Control File Schema',
                       'CU','Bind Enqueue',
                       'DF','Data File',
                       'DL','Direct-loader index-creation',
                       'DM','Mount/startup db primary/secondary instance',
                       'DR','Distributed Recovery Process',
                       'DX','Distributed Transaction Entry',
                       'FI','SGA Open-File Information',
                       'FS','File Set',
                       'IN','Instance Number',
                       'IR','Instance Recovery Serialization',
                       'IS','Instance State',
                       'IV','Library Cache InValidation',
                       'JQ','Job Queue',
                       'KK','Redo Log "Kick"',
                       'LS','Log Start/Log Switch',
                       'MB','Master Buffer hash table',
                       'MM','Mount Definition',
                       'MR','Media Recovery',
                       'PF','Password File',
                       'PI','Parallel Slaves',
                       'PR','Process Startup',
                       'PS','Parallel Slaves Synchronization',
                       'RE','USE_ROW_ENQUEUE Enforcement',
                       'RT','Redo Thread',
                       'RW','Row Wait',
                       'SC','System Commit Number',
                       'SH','System Commit Number HWM',
                       'SM','SMON',
                       'SQ','Sequence Number',
                       'SR','Synchronized Replication',
                       'SS','Sort Segment',
                       'ST','Space Transaction',
                       'SV','Sequence Number Value',
                       'TA','Transaction Recovery',
                       'TD','DDL enqueue',
                       'TE','Extend-segment enqueue',
                       'TM','DML enqueue',
                       'TO', 'Temp Object',                                              --Sync DDL and DML operations on a temp object
                       'TS','Temporary Segment',
                       'TT','Temporary Table',
                       'TX','Transaction',
                       'UL','User-defined Lock',
                       'UN','User Name',
                       'US','Undo Segment Serialization',
                       'WL','Being-written redo log instance',
                       'WS','Write-atomic-log-switch global enqueue',
                       'XA','Instance Attribute',
                       'XI','Instance Registration', 
                       decode(substr(gvb.type,1,1),
                                   'L','Library Cache     ('||substr(gvb.type,2,1)||')', -- Library cache lock instance lock (A..P = namespace)
                                   'N','Library Cache Pin ('||substr(gvb.type,2,1)||')', -- Library cache pin instance (A..Z = namespace)
                                   'Q','Row Cache         ('||substr(gvb.type,2,1)||')', -- Row cache instance (A..Z = cache)
                                   '????'
                       ),
                       'Bl. lt:"'||gvb.type||'"' 
                   ), 'NULL Block Type'
           ) lock_type
     FROM gv$lock gvb, gv$lock gvw, gv$session gvs
    WHERE (gvb.id1, gvb.id2) in (
                                 SELECT id1, id2 FROM gv$lock WHERE request=0
                                 INTERSECT
                                 SELECT id1, id2 FROM gv$lock WHERE lmode=0
                                )
      AND gvb.id1=gvw.id1
      AND gvb.id2=gvw.id2
      AND gvb.request=0
      AND gvw.lmode=0
      AND gvb.sid=gvs.sid
      AND gvb.inst_id=gvs.inst_id
  ORDER BY CTIME desc
  ;
  
  -- waiters 
  CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS
    SELECT 
           NVL(username,'WAITER ???') waiter_user, 
           gvs.sid, 
           gvs.serial# serial, 
           gvs.inst_id, 
           gvw.ctime , 
           gvs.status STATUS,
           module,
           action,
           decode(gvw.request, 
                              0, 'None',
                              1, 'NoLock',
                              2, 'Row-Share',
                              3, 'Row-Exclusive',
                              4, 'Share-Table',
                              5, 'Share-Row-Exclusive',
                              6, 'Exclusive',
                                 'Nothing-'
           ) lock_req
     FROM gv$lock gvw, gv$session gvs
    WHERE 1=1
      AND gvw.request>0
      AND gvw.lmode=0
      AND gvw.sid=gvs.sid
      AND gvw.inst_id=gvs.inst_id
      AND gvs.blocking_session=p_blocker_waiter_sid 
      AND gvs.blocking_instance=p_blockers_waiter_inst_id
  ORDER BY CTIME desc
  ;
  
  bHasAny  boolean;
  i        NUMBER := 0;
  v_sid    NUMBER := 0;
  v_serial NUMBER := 0;
  v_first_sid    NUMBER := -1;
  v_first_serial NUMBER := -1;
  v_instance        BINARY_INTEGER := 0;
  v_first_instance  BINARY_INTEGER := 0;
  db_ver            VARCHAR2(128);
  db_ver2           VARCHAR2(128);  
  s_job_command     VARCHAR2 (4000);
  s_mail_text       VARCHAR2 (32000);
  s_subject         VARCHAR2 (256);
  s_smtp_out_server VARCHAR2 (256);
  s_host_name       v$instance.host_name%TYPE;
  s_db_name         v$database.name%TYPE;
  s_instance_name   v$instance.instance_name%TYPE;
  s                 VARCHAR2(1024);
  
  s_blocked_rows    VARCHAR2(4000);
  
  l_temp            VARCHAR2(64);

  FUNCTION add_indent_char RETURN VARCHAR2 
  IS
  BEGIN
    IF b_is_script THEN
      RETURN chr(255);
    ELSE
      RETURN chr(9);
    END IF;
  END;

  PROCEDURE add_message_line (p_text VARCHAR2) is
  BEGIN
    dbms_output.put_line (p_text);
    s_mail_text := s_mail_text || p_text || chr(10);     
  END;
  
BEGIN
  b_is_script := NOT p_send_mail;
  add_message_line (chr(9));
  dbms_utility.db_version(db_ver,db_ver2);
  add_message_line ('Oracle version: '||db_ver|| ' ('||db_ver2||')');
  dbms_output.put_line (chr(9)); 
  bHasAny := FALSE;  
  FOR r_blockers in c_blockers loop
    bHasAny := TRUE;     
    IF (i=0) THEN       
      add_message_line (chr(9));    
      add_message_line (rpad('Blocker',17,' ')||
                        lpad('Inst',4,' ')||
                        lpad('SID',7,' ')||
                        lpad('Serial',8,' ')||
                        lpad('[sec]',10,' ')|| 
                        lpad('Lock Type/Req. lock',30,' ')||
                        lpad('Status',13, ' ') ||
                        lpad('Module',25,' ')
                        );
      add_message_line (rpad('-',120,'-'));
    END IF;
    v_sid := r_blockers.sid;
    v_serial := r_blockers.serial;
    v_instance := r_blockers.inst_id ;
    -- only first blocker (who is not killed!) is important!
    IF r_blockers.status != 'KILLED' AND v_first_sid = -1 THEN
      v_first_sid := r_blockers.sid;
      v_first_serial := r_blockers.serial;
      v_first_instance := r_blockers.inst_id ;
    END IF;
    -- print blockers sessions
    add_message_line (rpad((to_char(i+1)||'.'||r_blockers.blocker_user),17,' ')||
                      lpad(r_blockers.inst_id,4,' ')||
                      lpad(r_blockers.SID,7,' ' )||
                      lpad(r_blockers.serial,8,' ')||
                      lpad(r_blockers.CTIME,10,' ')||
                      lpad(r_blockers.lock_type,30,' ')||
                      lpad(r_blockers.status,13,' ')||
                      lpad(nvl(r_blockers.module,'?'),25,' ')
                     );
    -- print waiters session for current blocker
    FOR r_waiters IN c_waiters (r_blockers.sid ,r_blockers.inst_id) LOOP 
      IF b_is_script THEN
        l_temp := add_indent_char; -- || add_indent_char;
      ELSE
        l_temp := add_indent_char;
      END IF;
      add_message_line (l_temp||
                        rpad(r_waiters.waiter_user,17,' ')||
                        lpad(r_waiters.inst_id,4,' ')||
                        lpad(r_waiters.sid,7,' ')||
                        lpad(r_waiters.serial,8,' ')||
                        lpad(r_waiters.ctime,10,' ')||
                        lpad(r_waiters.lock_req,30,' ')|| 
                        lpad(r_waiters.status,13,' ')|| 
                        lpad(nvl(r_waiters.module,'?'),25,' ')
                  );      
    END LOOP;
    IF bHasAny THEN
      i:=i+1;
    END IF;
  END LOOP;
  
  IF (v_first_serial = 1 ) THEN
    add_message_line (chr(9));
    add_message_line ('Warning!');
    add_message_line ('-------------------------------------------------------------------------------');
    add_message_line ('Blocking sessions with serial=1 are core ORACLE PROCESSES.');
    add_message_line ('Killing them without REAL NEED and DOUBLE CHECKING may dangerous your instance!');
    add_message_line ('-------------------------------------------------------------------------------');
    add_message_line (chr(9));
  END IF;
  -- show any kind of blocker (killed one also...as info)
  IF bHasAny THEN
    IF NOT b_is_script THEN
      -- show blocking rows info
      add_message_line (chr(9));
      add_message_line ('---------------  Blocking records information  ---------------');
      add_message_line (chr(9));
      show_blocked_records (s_blocked_rows);
      IF s_blocked_rows is null THEN
        add_message_line(chr(9)||chr(9)||'No locked records found.');
      ELSE
        s_mail_text := s_mail_text || s_blocked_rows || chr(10);   
      END IF;
    END IF;
    -- for blockers that are valid (not killed!) create kill statement
    SELECT instance_name, host_name INTO s_instance_name, s_host_name FROM v$instance;
    SELECT name INTO s_db_name FROM v$database; 
    s_subject := p_subject || '  '|| s_db_name || '('|| s_instance_name || ') on ' || s_host_name ||''; 
    add_message_line (chr(9));
    IF (v_first_sid > 0 ) THEN
      add_message_line ('To kill first from the list, perform: ');
      add_message_line (chr(9));
      IF db_ver like '10%' THEN
        add_message_line ('NON RAC (or RAC logged on that node):');
        add_message_line ('---------------------------------');
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line (chr(9));
        add_message_line (chr(9));
        s := '''begin execute immediate ''''ALTER SYSTEM DISCONNECT SESSION '''''''''||v_first_sid||','||v_first_serial||''''''''' IMMEDIATE''''; end; ''';     
        add_message_line ('RAC (logged on any node) :');
        add_message_line ('--------------------------');
        s_job_command := 'declare'|| chr(10)||
                         '  v_job binary_integer;'|| chr(10)|| 
                         'begin'|| chr(10)||
                         '  DBMS_JOB.submit ('||
                         ' job     =>v_job'||chr(10)||
                         ',what    =>'||s||chr(10)||
                         ',instance=>'||v_first_instance||chr(10)||
                         '); '||chr(10)||
                         '  commit;'||chr(10)||
                         'end;'|| chr(10) ||
                         '/'
        ;
        add_message_line (s_job_command);
      ELSIF db_ver like '11%' THEN
        add_message_line (chr(9));
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;');    
      ELSE
        add_message_line (chr(9));
        add_message_line ('Untested version... Try to logon as priveledged user on node ('||s_host_name||') and perform:');
        add_message_line ('ALTER SYSTEM DISCONNECT  SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');
        add_message_line ('ALTER SYSTEM KILL        SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;');    
      END IF;
    ELSE
      add_message_line ('All valid blocker sessions allready killed. Try kill with OS level command!');
    END IF; 
    IF NOT b_is_script THEN
      s_mail_text := s_mail_text ||  (chr(9)) || chr(10);
      s_mail_text := s_mail_text ||  ('This is automated message so, do not reply to this mail.') || chr(10);
      s_mail_text := s_mail_text ||  ('Regards,')|| chr(10);
      s_mail_text := s_mail_text ||  ('Your auto DBA d.o.o.')|| chr(10);
      s_mail_text := s_mail_text ||  (CHR(9));
      select to_char(sysdate, 'dd.mm.yyyy hh24:mi:ss') into s from dual;
      s_mail_text := s_mail_text ||  ('Zagreb, '||s);
      -- if you run this script from sqlplus then there is no need for this (p_send_mail=FALSE)
      IF p_send_mail THEN
        SYS.UTL_MAIL.send(sender => p_sender,
                  recipients => p_to,
                          cc => p_cc,
                     subject => s_subject,
                     message => s_mail_text,
                   mime_type => 'text/plain; charset=us-ascii');
      END IF;
    END IF;
  ELSE
    dbms_output.put_line ('No blocking sessions found!');
  END IF;
/*
EXCEPTION
  WHEN OTHERS THEN
    dbms_output.put_line(SQLERRM);
    */
END;
/

The end

If you get mail now for blocking session and all blocker sessions are killed, then you'll see output like this:
Blocker         Inst  SID     Serial    [sec]   Lock Type     Status       Module
----------------------------------------------------------------------------------------------------
1. ANAL          1     282        6661     7981   Transaction    KILLED
        ZLSA        2   372           126     1128     Exclusive     ACTIVE
        SLPE        2   368            48     6836     Exclusive     ACTIVE
        DRI2        3   237           232     6727     Exclusive     ACTIVE
2. DRI2          3     237         232     6724   Transaction    KILLED
        DRIA        2   355           195      216     Exclusive     ACTIVE

All valid blocker sessions already killed. Try kill with OS level command!
What is right and correct message!

For the end here is how this rewrite version produce result. Here is mail that you'll get in a case of blocking session:
Oracle version: 11.2.0.1.0 (11.1.0.0.0)

Blocker         Inst   SID    Serial    [sec]   Lock Type        Status          Module
------------------------------------------------------------------------------------------------------------------------
 1. DAMIRV        1     20        6       54   Transaction      INACTIVE         SQL*Plus
       DAMIRV      1     141       40       26     Exclusive       INACTIVE        SQL*Plus
 2. DAMIRV        1     141       4       42   Transaction        ACTIVE         SQL*Plus
       DAMIRV      1     144       58       16     Exclusive         ACTIVE        SQL*Plus

---------------  Blocking records information  ---------------
DAMIRV (1 '20,68')
               DAMIRV    1     141,40           26 sec       Exclusive         ACTIVE        SQL*Plus
               SELECT C1 FROM DAMIRV.X WHERE rowid = 'AAAVoXAAGAAAAS1AAA' ;
                       (delete x                                  )

DAMIRV (1 '141,40')
               DAMIRV    1     144,58           16 sec       Exclusive         ACTIVE        SQL*Plus
               SELECT C1 FROM DAMIRV.XX WHERE rowid = 'AAAVoYAAGAAAAS8AAA' ;
                       (delete xx where c1='1'                    )



To kill first from the list, perform:

ALTER SYSTEM DISCONNECT  SESSION '20,68,@1' IMMEDIATE;
ALTER SYSTEM KILL        SESSION '20,68,@1' IMMEDIATE;
--------------------------------------------------------
This is automated message so, do not reply to this mail.
Regards,
Your auto DBA d.o.o.

Zagreb, 21.05.2010 23:39:33
As you see now you can see which row is causing block and statement that is blocked in waiter session-looks promising!

Keep in mind that these two procedures are called in two separate calls, so sometimes second call may shown no blocking records ... and they do not exists in second case call and was existing in first time call.
Cheers!

11 comments:

  1. Veoma korisno!
    Kako rjesavas slucajeve, kada ti session nije blokiran od nekog drugog sessiona, ali ipak "stoji" (hungs) sa wait eventom "latch: cache buffers chains" ?

    ReplyDelete
  2. "latch: cache buffers chains"
    Nepresušno vrelo Oracle problema.
    Preopširno ali mislim da je dobro pokriveno na WEB-u i zaista ovisi od slučaja do slučaja kao i od verzije i mnoštva detalja.
    U svakom slučaju "pain in the ass" ako se trebaš s time "bakčati"...
    Preporuka za početak:
    http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/
    Ako nije i upgrade...i onda pooznati bug:
    Bug 5009307: HIGH WAITS ON GC CR GRANT CONGESTED EVENT AFTER 10G UPGRADE

    ReplyDelete
  3. Dear Damir,

    I tried to run check_blocking_session.prc script but there was no locked information display. This is what I did:

    On first session
    ================

    SQL> select instance_name from v$instance;

    INSTANCE_NAME
    ----------------
    histdb1

    SQL> select * from t1;

    C1
    ----------
    5
    2

    SQL> update t1 set c1=8 where c1=5;

    1 row updated.


    On other session
    ================

    SQL> select instance_name from v$instance;

    INSTANCE_NAME
    ----------------
    histdb2

    SQL> update t1 set c1=10 where c1=5;


    On new session
    ==============


    SQL> select instance_name from v$instance;

    INSTANCE_NAME
    ----------------
    histdb1

    SQL> @check_blocking_session.prc
    Oracle version: 10.2.0.4.0 (10.1.0.2.0)
    No blocking sessions found!

    PL/SQL procedure successfully completed.

    Please help why the program not display locked session.

    Regards,
    Jack

    ReplyDelete
  4. Can you prove from Toad that you are prodcing blocking session really?

    ReplyDelete
  5. I have checked "check_bloking_session.prc" script and works as expected. Try firstly that script.

    "procedure check_blocking_session" is working in mine place OK.

    I think you are not generating real blocking session or you are performing execution with users that has not enough privileges. Second script is sending mail.

    ReplyDelete
  6. Hello Damir,
    is it possible to turn your script to be more aggressive, let say. dont just send email but acctually do the Killing itself if the blocker is reaching 60secs. , then send email just for info.
    sometimes these block can block a very central record like Sequence record, and happens during night while system need to go on .
    please help enlight me, thanks a lot

    ReplyDelete
    Replies
    1. @y.g.
      Yes.
      Just place text content of mail which is sent to you inside "execute immediste ''" command ant it will kill immediately.
      But then do make soem additional checking that this is not core oracle process or any sys, system session!

      Rg,
      Damir

      Delete
    2. @Damir, I do it by checking the (session.OSUSER is not 'oracle' OR OSUSER is null) to prevent from killing oracle process.
      many many thanks to you...
      y.g.

      Delete
  7. This comment has been removed by the author.

    ReplyDelete
  8. Hi,

    Just recently I have the unfortunate event of having a blocking session from OEM that has (J001) DBMS_SCHEDULER : GATHER_STATS_JOB blocking MMON_SLAVE : Auto-Purge Slave A.

    I run both versions of sb.sql. The one from part I shows the blocking session but the part II version doesn't? I can't understand why? This is on a 3-node RAC, the blocking session is on NODE 1 and the blocked session is on NDOE 3. I run both versions of the sb.sql script on all nodes and they both give the same result. That is, Part I version shows the blocking session as expected but the Part II version doesn't.

    Not sure if this is something that you may want to test. Unfortunately as these are Oracle's background process, I can't reproduce on demand, the stats run from 10pm to 6am by default and the blocking session stop when the job was stopped. This is the first time that this blocking session happens. The DB is on 10.2.0.5.

    ReplyDelete
  9. are you saying that the version I gives you different result than version II?
    if so, could you output result from both versions by mail?

    ReplyDelete