The problem
In the middle of one day on one RAC database (3 nodes), "sb" procedure had output like:SQL> @sb Oracle version: 10.1.0.4.0 (10.1.0.4.0) Blocker Inst SID Serial [sec] Lock Type Module ---------------------------------------------------------------------------------------------------- 1. ANAL 1 282 6661 7981 Transaction ZLSA 2 372 126 1128 Exclusive SLPE 2 368 48 6836 Exclusive DRI2 3 237 232 6727 Exclusive 2. DRI2 3 237 232 6724 Transaction DRIA 2 355 195 216 Exclusive NESE 3 114 80 5527 Exclusive 3. NESE 3 114 80 5521 Transaction DRI2 2 379 328 388 Share-Table DRI2 3 115 269 5439 Share-Table DRI2 3 423 619 3529 Exclusive DRI2 2 373 108 5952 Share-Table NESE 1 303 2811 4870 Exclusive DRLO 3 205 749 2862 Exclusive NESE 3 146 220 4766 Exclusive To kill first from the list, perform: NON RAC (or RAC logged on that node): --------------------------------- ALTER SYSTEM DISCONNECT SESSION '282,6661' IMMEDIATE; ALTER SYSTEM KILL SESSION '282,6661' IMMEDIATE; RAC (logged on any node) : -------------------------- declare v_job binary_integer; begin DBMS_JOB.submit ( job =>v_job ,what =>'begin execute immediate ''ALTER SYSTEM KILL SESSION ''''282,6661'''' IMMEDIATE''; end; ' ,instance=>1 ); commit; end; / PL/SQL procedure successfully completed. SQL>Previously mentioned DBA, was using mine solution for a long time without any problems and like in many situation before, submit a job like it was advised.
After some time when he checked blockers with "sb" procedure again, he was astonished that procedure produce the same output like before-like nothing happened. Without thinking why, he submit job again ... and later once again and after third submit that results no "unblocking", he came to me and ask for help.
The analyze part
Like in many situation like this, checking instance alert log file (this is RAC database) is the first step that DBA should do. It's content has at the end entries like:ORA-12012: error on auto execute of job 1241 ORA-00030: User session ID does not exist. ORA-06512: at line 1Checking of dba_jobs shows me that 2 jobs are trying to execute again and again to perform "kill session" for the session that doesn't exist!
Logical question was "How session doesn't exist if "sb" is still showing it? "
After checking, answer was very easy-because of bug in it's logic!
If you look in code of "sb" procedure, you'll find this part in definition of blocking session:
CURSOR c_w IS SELECT gvw.sid waiter_sid, gvw.inst_id waiter_inst_id FROM gv$lock gvb, gv$lock gvw, gv$session gvs WHERE (gvb.id1, gvb.id2) in ( SELECT id1, id2 FROM gv$lock WHERE request=0 INTERSECT SELECT id1, id2 FROM gv$lock WHERE lmode=0 ) AND gvb.id1=gvw.id1 AND gvb.id2=gvw.id2 AND gvb.request=0 AND gvw.lmode=0 AND gvb.sid=gvs.sid AND gvb.inst_id=gvs.inst_id ORDER BY gvb.CTIME desc ;All looks OK (and in fact it is!!) but in some real cases, like this one was, it was not properly used. Later in code I loop through that cursor without checking status column (that I do not have at all!) of that session and shows all sessions that are blocked by every cursor entry. So shortly, it was wrong because this cursors shows all sessions regardless their status (which can be "ACTIVE", "INACTIVE", "KILLED" etc.) and some of them might get STATUS = 'KILLED'!
Beside this there was a problem with waiters that are mixed because there was no blockers SID passed as an parameter. So this is why I completely removed c_w cursor and put all in one c_waiters which has two parameters:
CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS SELECT nvl(username,'Blocker ???') waiter_user, gvl.sid, gvs.serial# serial, gvl.inst_id, gvl.ctime , gvs.status STATUS, module, decode(gvl.request, 0, 'None', 1, 'NoLock', 2, 'Row-Share', 3, 'Row-Exclusive', 4, 'Share-Table', 5, 'Share-Row-Exclusive', 6, 'Exclusive', 'Nothing-' ) lock_req, action FROM gv$lock gvl, gv$session gvs WHERE gvl.request>0 AND gvl.sid=gvs.sid AND gvl.inst_id=gvs.inst_id AND gvs.blocking_session=p_blocker_waiter_sid AND gvl.inst_id=p_blockers_waiter_inst_id ORDER BY gvl.CTIME desc ;And this is exactly what happened in this case. First submitted job really killed that blocking session ("mark for kill") and other two hasn't find anything to kill so they were trying to execute and after some time gone to broken status after n retries.
To make more clear, let me show you the same situation with added STATUS field:
Blocker Inst SID Serial [sec] Lock Type Status Module ---------------------------------------------------------------------------------------------------- 1. ANAL 1 282 6661 7981 Transaction KILLED ZLSA 2 372 126 1128 Exclusive ACTIVE SLPE 2 368 48 6836 Exclusive ACTIVE DRI2 3 237 232 6727 Exclusive ACTIVE 2. DRI2 3 237 232 6724 Transaction ACTIVE DRIA 2 355 195 216 Exclusive ACTIVE NESE 3 114 80 5527 Exclusive ACTIVE 3. NESE 3 114 80 5521 Transaction ACTIVE DRI2 2 379 328 388 Share-Table ACTIVE DRI2 3 115 269 5439 Share-Table ACTIVE DRI2 3 423 619 3529 Exclusive ACTIVE DRI2 2 373 108 5952 Share-Table ACTIVE NESE 1 303 2811 4870 Exclusive ACTIVE DRLO 3 205 749 2862 Exclusive ACTIVE NESE 3 146 220 4766 Exclusive ACTIVENow this all looks reasonable...but...then come the second bug. The second bug in sb procedure was that "kill statement" was always constructed from first blocker in list. And this was again wrong because this session may be killed and then this kill command would produce an error!
The solution
When I realize that, I rewrite a new sb procedure:/* --------------------------------------------------------------------------- Filename: check_bloking_session.prc CR/TR# : Purpose : Find blocking sessions and create release statements for RAC and non RAC environment Date : 08.07.2005. Author : Damir Vadas, damir.vadas@gmail.com Remarks : Tested on 10g only (should work on 11g as well and lower versions) Changes (DD.MM.YYYY, Name, CR/TR#): 11.12.2009, Damir Vadas added p_to and p_cc as parameters to avoid bad header encodding on hac server (no success!) 30.12.2009, Damir Vadas added checking (LENGTH(s_mail_text)+LENGTH(p_text)+1 <= 4000) because "ORA-06502: PL/SQL: numeric or value error: character string buffer too small" 06.01.2010, Damir Vadas added 32k size of message and exception log handler 30.04.2010, Damir Vadas fixed bug with showing last blocker not first one in kill statement added STATUS column to show if session is killed (needed for previous fixed) 01.09.2010, Damir Vadas fixed bug blockers and waiters to be 11g compatible --------------------------------------------------------------------------- */ set serveroutput on size 123456 set linesize 140 set pagesize 100 set trimout on declare cursor c_blockers IS SELECT DISTINCT nvl(username,'BLOCKER ???') blocker_user, gvl.sid, gvs.serial# serial, gvl.inst_id, gvl.ctime , gvs.status STATUS, module, action, decode(gvl.type, 'MR', 'Media_recovery', 'RT', 'Redo_thread', 'UN', 'User_name', 'TX', 'Transaction', 'TM', 'Dml', 'UL', 'PLSQL User_lock', 'DX', 'Distrted_Transaxion', 'CF', 'Control_file', 'IS', 'Instance_state', 'FS', 'File_set', 'IR', 'Instance_recovery', 'ST', 'Diskspace Transaction', 'IV', 'Libcache_invalidation', 'LS', 'LogStaartORswitch', 'RW', 'Row_wait', 'SQ', 'Sequence_no', 'TE', 'Extend_table', 'TT', 'Temp_table', 'Nothing-' ) lock_type FROM gv$lock gvl, gv$session gvs WHERE gvl.request=0 AND gvl.block=1 AND gvl.sid=gvs.sid AND gvl.inst_id=gvs.inst_id ORDER BY CTIME desc ; CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS SELECT nvl(username,'WAITER ???') waiter_user, gvl.sid, gvs.serial# serial, gvl.inst_id, gvl.ctime , gvs.status STATUS, module, decode(gvl.request, 0, 'None', 1, 'NoLock', 2, 'Row-Share', 3, 'Row-Exclusive', 4, 'Share-Table', 5, 'Share-Row-Exclusive', 6, 'Exclusive', 'Nothing-' ) lock_req, action FROM gv$lock gvl, gv$session gvs WHERE gvl.request>0 AND gvl.sid=gvs.sid AND gvl.inst_id=gvs.inst_id AND gvs.blocking_session=p_blocker_waiter_sid AND gvl.inst_id=p_blockers_waiter_inst_id ORDER BY gvl.CTIME desc ; bHasAny boolean; i NUMBER := 0; v_sid NUMBER := 0; v_serial NUMBER := 0; v_first_sid NUMBER := -1; v_first_serial NUMBER := -1; v_instance BINARY_INTEGER := 0; v_first_instance BINARY_INTEGER := 0; db_ver VARCHAR2(128); db_ver2 VARCHAR2(128); s_job_command VARCHAR2 (4000); s_mail_text VARCHAR2 (32000); s_subject VARCHAR2 (256); s_smtp_out_server VARCHAR2 (256); s_host_name v$instance.host_name%TYPE; s_db_name v$database.name%TYPE; s_instance_name v$instance.instance_name%TYPE; s VARCHAR2(1024); PROCEDURE add_message_line (p_text VARCHAR2) is BEGIN -- for proceduire version with mail part -- s_mail_text := s_mail_text || p_text || chr(10); dbms_output.put_line (p_text); END; begin dbms_utility.db_version(db_ver,db_ver2); add_message_line ('Oracle version: '||db_ver|| ' ('||db_ver2||')'); bHasAny := FALSE; FOR v_blockers in c_blockers loop bHasAny := TRUE; IF (i=0) THEN add_message_line (rpad('Blocker',12,' ')||' '||' Inst '||rpad('SID',6,' ')||' '||rpad('Serial',8,' ')||' '||' '||rpad('[sec]',6,' ')||' '|| lpad('Lock Type',10,' ')||' '|| lpad('Status',14, ' ') || lpad('Module',13,' ') ); add_message_line (rpad('-',120,'-')); END IF; v_sid := v_blockers.sid; v_serial := v_blockers.serial; v_instance := v_blockers.inst_id ; -- only first blocker (who is not killed!) is important! IF v_blockers.status != 'KILLED' AND v_first_sid = -1 THEN v_first_sid := v_blockers.sid; v_first_serial := v_blockers.serial; v_first_instance := v_blockers.inst_id ; END IF; add_message_line (LPAD(to_char(i+1),3,' ')||'. '||rpad(v_blockers.blocker_user,12,' ')||' '||rpad(v_blockers.inst_id,4,' ')||' '||rpad(v_blockers.SID,8,' ' )||' '||lpad(v_blockers.serial,5,' ')||' '||lpad(v_blockers.CTIME,7,' ')||' '|| lpad(v_blockers.lock_type,13,' ')||lpad(v_blockers.status,14,' ')||lpad(nvl(v_blockers.module,'?'),15,' ') ); FOR v_waiters IN c_waiters (v_blockers.sid,v_blockers.inst_id ) LOOP add_message_line (lpad(chr(9),9)||rpad(v_waiters.waiter_user,10,' ')||' '||rpad(to_char(v_waiters.inst_id),4,' ')||rpad(to_char(v_waiters.sid),4,' ')|| ' '||lpad(to_char(v_waiters.serial),12,' ')||' ' ||' '|| ' ' ||lpad(to_char(v_waiters.ctime),6,' ')||' '|| lpad(to_char(v_waiters.lock_req),13,' ')||' '|| lpad(v_blockers.status,14,' ')|| ' '|| lpad(to_char(nvl(v_waiters.module,'?')),15,' ') ); END LOOP; IF bHasAny THEN i:=i+1; END IF; END LOOP; -- show any kind of blocker (killed one also...as info) IF bHasAny THEN -- show blocking rows info -- add_message_line (chr(9)); -- show_blocked_records (s_blocked_rows); -- dbms_output.put_line(s_blocked_rows); -- add_message_line (s_blocked_rows); -- for blockers that are valid (not killed!) create kill statement IF ( v_first_sid > 0 ) THEN SELECT instance_name, host_name INTO s_instance_name, s_host_name FROM v$instance; SELECT name INTO s_db_name FROM v$database; add_message_line (chr(9)); add_message_line ('To kill first from the list, perform: '); add_message_line (chr(9)); IF db_ver like '10%' THEN add_message_line ('NON RAC (or RAC logged on that node):'); add_message_line ('---------------------------------'); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line (chr(9)); add_message_line (chr(9)); s := '''begin execute immediate ''''ALTER SYSTEM KILL SESSION '''''''''||v_first_sid||','||v_first_serial||''''''''' IMMEDIATE''''; end; '''; add_message_line ('RAC (logged on any node) :'); add_message_line ('--------------------------'); s_job_command := 'declare'|| chr(10)|| ' v_job binary_integer;'|| chr(10)|| 'begin'|| chr(10)|| ' DBMS_JOB.submit ('|| ' job =>v_job'||chr(10)|| ',what =>'||s||chr(10)|| ',instance=>'||v_first_instance||chr(10)|| '); '||chr(10)|| ' commit;'||chr(10)|| 'end;'|| chr(10) || '/' ; add_message_line (s_job_command); ELSIF db_ver like '11%' THEN add_message_line (chr(9)); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;'); ELSE add_message_line (chr(9)); add_message_line ('Untested version... Try to logon as priveledged user on node ('||s_host_name||') and perform:'); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); END IF; ELSE add_message_line ('All valid blocker sessions allready killed. Try kill with OS level command!'); END IF; ELSE add_message_line ('No blocking sessions found!'); END IF; EXCEPTION WHEN OTHERS THEN -- auto_log_error_core('"check_blocking_session" procedure. '); null; end; /
Now when you run the sb procedure, the same situation will look like:
SQL> @sb Oracle version: 10.1.0.4.0 (10.1.0.4.0) Blocker Inst SID Serial [sec] Lock Type Status Module ---------------------------------------------------------------------------------------------------- 1. ANAL 1 282 6661 7981 Transaction KILLED ZLSA 2 372 126 1128 Exclusive ACTIVE SLPE 2 368 48 6836 Exclusive ACTIVE DRI2 3 237 232 6727 Exclusive ACTIVE 2. DRI2 3 237 232 6724 Transaction ACTIVE DRIA 2 355 195 216 Exclusive ACTIVE NESE 3 114 80 5527 Exclusive ACTIVE 3. NESE 3 114 80 5521 Transaction ACTIVE DRI2 2 379 328 388 Share-Table ACTIVE DRI2 3 115 269 5439 Share-Table ACTIVE DRI2 3 423 619 3529 Exclusive ACTIVE DRI2 2 373 108 5952 Share-Table ACTIVE NESE 1 303 2811 4870 Exclusive ACTIVE DRLO 3 205 749 2862 Exclusive ACTIVE NESE 3 146 220 4766 Exclusive ACTIVE To kill first from the list, perform: NON RAC (or RAC logged on that node): --------------------------------- ALTER SYSTEM DISCONNECT SESSION '237,232' IMMEDIATE; ALTER SYSTEM KILL SESSION '237,232' IMMEDIATE; RAC (logged on any node) : -------------------------- declare v_job binary_integer; begin DBMS_JOB.submit ( job =>v_job ,what =>'begin execute immediate ''ALTER SYSTEM KILL SESSION ''''237,232'''' IMMEDIATE''; end; ' ,instance=>3 ); commit; end; / PL/SQL procedure successfully completed. SQL>So it would construct kill statement for second blocker and skip first blocker which is "marked for kill". And now this is regular.
Kill "marked for kill"
If you look in oracle theory, "KILLED" status does not affect the work performed by the command, but it returns control back to the current session immediately, rather than waiting for confirmation of the kill. If the marked session persists for some time you may consider killing the process at the operating system level.So practically it was still running in Oracle and the only way to immediately kill this session was to perform from OS level command!
Just to remind you differences between DISCONNECT and KILL SESSION
- IMMEDIATE clause with KILL SESSION
instruct Oracle to roll back ongoing transactions, release all session locks, recover the entire session state, and return control to you immediately.
- IMMEDIATE clause with DISCONNECT SESSION
disconnects the session and recovers the entire session state immediately, without waiting for ongoing transactions to complete.
Be proactive
According this do not forget to replace other part of solution (proactive checking of blocking session-"check_blocking_session.prc") which I place in job with 5 minutes schedule and run automatically. Here is final version for that cause.CREATE OR REPLACE procedure check_blocking_session ( /* $Header: check_blocking_session.prc 1.12 10/19/2010 20:45 damirv $ */ /* --------------------------------------------------------------------------- Copyright(C) 1995-2009 TEB Informatika d.o.o. Filename: check_blocking_session.prc CR/TR# : Purpose : Find blocking sessions and create release statements for RAC and non RAC environment Date : 08.07.2005. Author : Damir Vadas, damir.vadas@gmail.com Remarks : Tested on 10g only (should work on 11g as well and lower versions) Changes (DD.MM.YYYY, Name, CR/TR#): 11.12.2009, Damir Vadas added p_to and p_cc as parameters to avoid bad header encodding on hac server (no success!) 30.12.2009, Damir Vadas added checking (LENGTH(s_mail_text)+LENGTH(p_text)+1 <= 4000) because "ORA-06502: PL/SQL: numeric or value error: character string buffer too small" 06.01.2010, Damir Vadas added 32k size of message and exception log handler 30.04.2010, Damir Vadas fixed bug with showing last blocker not first one in kill statement added STATUS column to show if session is killed (needed for previous fixed) 01.09.2010, Damir Vadas fixed bug blockers and waiters to be 11g compatible 20.09.2010, Damir Vadas Reunited logic for script/procedure 19.10.2010, Damir Vadas Removed p_smtp_out_server VARCHAR2 DEFAULT 'mail.iskon.hr' declaration --------------------------------------------------------------------------- */ p_sender VARCHAR2 DEFAULT 'dba@teb-informatika.hr', p_to VARCHAR2 DEFAULT 'damir.vadas@teb-informatika.hr', p_cc VARCHAR2 DEFAULT null, p_subject VARCHAR2 DEFAULT 'Blocking session occur!', p_send_mail BOOLEAN DEFAULT TRUE) AS b_is_script BOOLEAN DEFAULT FALSE; CURSOR c_blockers IS SELECT DISTINCT NVL(username,'BLOCKER ???') blocker_user, gvb.sid, gvs.serial# serial, gvb.inst_id, gvb.ctime , gvs.status STATUS, module, action, NVL ( decode(gvb.type, 'AE', 'Edition Lock', -- Prevent Dropping an edition in use 11g 'BL','Buffer hash table', 'CF','Control File Transaction', 'CI','Cross Instance Call', 'CS','Control File Schema', 'CU','Bind Enqueue', 'DF','Data File', 'DL','Direct-loader index-creation', 'DM','Mount/startup db primary/secondary instance', 'DR','Distributed Recovery Process', 'DX','Distributed Transaction Entry', 'FI','SGA Open-File Information', 'FS','File Set', 'IN','Instance Number', 'IR','Instance Recovery Serialization', 'IS','Instance State', 'IV','Library Cache InValidation', 'JQ','Job Queue', 'KK','Redo Log "Kick"', 'LS','Log Start/Log Switch', 'MB','Master Buffer hash table', 'MM','Mount Definition', 'MR','Media Recovery', 'PF','Password File', 'PI','Parallel Slaves', 'PR','Process Startup', 'PS','Parallel Slaves Synchronization', 'RE','USE_ROW_ENQUEUE Enforcement', 'RT','Redo Thread', 'RW','Row Wait', 'SC','System Commit Number', 'SH','System Commit Number HWM', 'SM','SMON', 'SQ','Sequence Number', 'SR','Synchronized Replication', 'SS','Sort Segment', 'ST','Space Transaction', 'SV','Sequence Number Value', 'TA','Transaction Recovery', 'TD','DDL enqueue', 'TE','Extend-segment enqueue', 'TM','DML enqueue', 'TO', 'Temp Object', --Sync DDL and DML operations on a temp object 'TS','Temporary Segment', 'TT','Temporary Table', 'TX','Transaction', 'UL','User-defined Lock', 'UN','User Name', 'US','Undo Segment Serialization', 'WL','Being-written redo log instance', 'WS','Write-atomic-log-switch global enqueue', 'XA','Instance Attribute', 'XI','Instance Registration', decode(substr(gvb.type,1,1), 'L','Library Cache ('||substr(gvb.type,2,1)||')', -- Library cache lock instance lock (A..P = namespace) 'N','Library Cache Pin ('||substr(gvb.type,2,1)||')', -- Library cache pin instance (A..Z = namespace) 'Q','Row Cache ('||substr(gvb.type,2,1)||')', -- Row cache instance (A..Z = cache) '????' ), 'Bl. lt:"'||gvb.type||'"' ), 'NULL Block Type' ) lock_type FROM gv$lock gvb, gv$lock gvw, gv$session gvs WHERE (gvb.id1, gvb.id2) in ( SELECT id1, id2 FROM gv$lock WHERE request=0 INTERSECT SELECT id1, id2 FROM gv$lock WHERE lmode=0 ) AND gvb.id1=gvw.id1 AND gvb.id2=gvw.id2 AND gvb.request=0 AND gvw.lmode=0 AND gvb.sid=gvs.sid AND gvb.inst_id=gvs.inst_id ORDER BY CTIME desc ; -- waiters CURSOR c_waiters (p_blocker_waiter_sid gv$lock.sid%TYPE, p_blockers_waiter_inst_id gv$lock.inst_id%TYPE) IS SELECT NVL(username,'WAITER ???') waiter_user, gvs.sid, gvs.serial# serial, gvs.inst_id, gvw.ctime , gvs.status STATUS, module, action, decode(gvw.request, 0, 'None', 1, 'NoLock', 2, 'Row-Share', 3, 'Row-Exclusive', 4, 'Share-Table', 5, 'Share-Row-Exclusive', 6, 'Exclusive', 'Nothing-' ) lock_req FROM gv$lock gvw, gv$session gvs WHERE 1=1 AND gvw.request>0 AND gvw.lmode=0 AND gvw.sid=gvs.sid AND gvw.inst_id=gvs.inst_id AND gvs.blocking_session=p_blocker_waiter_sid AND gvs.blocking_instance=p_blockers_waiter_inst_id ORDER BY CTIME desc ; bHasAny boolean; i NUMBER := 0; v_sid NUMBER := 0; v_serial NUMBER := 0; v_first_sid NUMBER := -1; v_first_serial NUMBER := -1; v_instance BINARY_INTEGER := 0; v_first_instance BINARY_INTEGER := 0; db_ver VARCHAR2(128); db_ver2 VARCHAR2(128); s_job_command VARCHAR2 (4000); s_mail_text VARCHAR2 (32000); s_subject VARCHAR2 (256); s_smtp_out_server VARCHAR2 (256); s_host_name v$instance.host_name%TYPE; s_db_name v$database.name%TYPE; s_instance_name v$instance.instance_name%TYPE; s VARCHAR2(1024); s_blocked_rows VARCHAR2(4000); l_temp VARCHAR2(64); FUNCTION add_indent_char RETURN VARCHAR2 IS BEGIN IF b_is_script THEN RETURN chr(255); ELSE RETURN chr(9); END IF; END; PROCEDURE add_message_line (p_text VARCHAR2) is BEGIN dbms_output.put_line (p_text); s_mail_text := s_mail_text || p_text || chr(10); END; BEGIN b_is_script := NOT p_send_mail; add_message_line (chr(9)); dbms_utility.db_version(db_ver,db_ver2); add_message_line ('Oracle version: '||db_ver|| ' ('||db_ver2||')'); dbms_output.put_line (chr(9)); bHasAny := FALSE; FOR r_blockers in c_blockers loop bHasAny := TRUE; IF (i=0) THEN add_message_line (chr(9)); add_message_line (rpad('Blocker',17,' ')|| lpad('Inst',4,' ')|| lpad('SID',7,' ')|| lpad('Serial',8,' ')|| lpad('[sec]',10,' ')|| lpad('Lock Type/Req. lock',30,' ')|| lpad('Status',13, ' ') || lpad('Module',25,' ') ); add_message_line (rpad('-',120,'-')); END IF; v_sid := r_blockers.sid; v_serial := r_blockers.serial; v_instance := r_blockers.inst_id ; -- only first blocker (who is not killed!) is important! IF r_blockers.status != 'KILLED' AND v_first_sid = -1 THEN v_first_sid := r_blockers.sid; v_first_serial := r_blockers.serial; v_first_instance := r_blockers.inst_id ; END IF; -- print blockers sessions add_message_line (rpad((to_char(i+1)||'.'||r_blockers.blocker_user),17,' ')|| lpad(r_blockers.inst_id,4,' ')|| lpad(r_blockers.SID,7,' ' )|| lpad(r_blockers.serial,8,' ')|| lpad(r_blockers.CTIME,10,' ')|| lpad(r_blockers.lock_type,30,' ')|| lpad(r_blockers.status,13,' ')|| lpad(nvl(r_blockers.module,'?'),25,' ') ); -- print waiters session for current blocker FOR r_waiters IN c_waiters (r_blockers.sid ,r_blockers.inst_id) LOOP IF b_is_script THEN l_temp := add_indent_char; -- || add_indent_char; ELSE l_temp := add_indent_char; END IF; add_message_line (l_temp|| rpad(r_waiters.waiter_user,17,' ')|| lpad(r_waiters.inst_id,4,' ')|| lpad(r_waiters.sid,7,' ')|| lpad(r_waiters.serial,8,' ')|| lpad(r_waiters.ctime,10,' ')|| lpad(r_waiters.lock_req,30,' ')|| lpad(r_waiters.status,13,' ')|| lpad(nvl(r_waiters.module,'?'),25,' ') ); END LOOP; IF bHasAny THEN i:=i+1; END IF; END LOOP; IF (v_first_serial = 1 ) THEN add_message_line (chr(9)); add_message_line ('Warning!'); add_message_line ('-------------------------------------------------------------------------------'); add_message_line ('Blocking sessions with serial=1 are core ORACLE PROCESSES.'); add_message_line ('Killing them without REAL NEED and DOUBLE CHECKING may dangerous your instance!'); add_message_line ('-------------------------------------------------------------------------------'); add_message_line (chr(9)); END IF; -- show any kind of blocker (killed one also...as info) IF bHasAny THEN IF NOT b_is_script THEN -- show blocking rows info add_message_line (chr(9)); add_message_line ('--------------- Blocking records information ---------------'); add_message_line (chr(9)); show_blocked_records (s_blocked_rows); IF s_blocked_rows is null THEN add_message_line(chr(9)||chr(9)||'No locked records found.'); ELSE s_mail_text := s_mail_text || s_blocked_rows || chr(10); END IF; END IF; -- for blockers that are valid (not killed!) create kill statement SELECT instance_name, host_name INTO s_instance_name, s_host_name FROM v$instance; SELECT name INTO s_db_name FROM v$database; s_subject := p_subject || ' '|| s_db_name || '('|| s_instance_name || ') on ' || s_host_name ||''; add_message_line (chr(9)); IF (v_first_sid > 0 ) THEN add_message_line ('To kill first from the list, perform: '); add_message_line (chr(9)); IF db_ver like '10%' THEN add_message_line ('NON RAC (or RAC logged on that node):'); add_message_line ('---------------------------------'); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line (chr(9)); add_message_line (chr(9)); s := '''begin execute immediate ''''ALTER SYSTEM DISCONNECT SESSION '''''''''||v_first_sid||','||v_first_serial||''''''''' IMMEDIATE''''; end; '''; add_message_line ('RAC (logged on any node) :'); add_message_line ('--------------------------'); s_job_command := 'declare'|| chr(10)|| ' v_job binary_integer;'|| chr(10)|| 'begin'|| chr(10)|| ' DBMS_JOB.submit ('|| ' job =>v_job'||chr(10)|| ',what =>'||s||chr(10)|| ',instance=>'||v_first_instance||chr(10)|| '); '||chr(10)|| ' commit;'||chr(10)|| 'end;'|| chr(10) || '/' ; add_message_line (s_job_command); ELSIF db_ver like '11%' THEN add_message_line (chr(9)); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||',@'||v_first_instance||''' IMMEDIATE;'); ELSE add_message_line (chr(9)); add_message_line ('Untested version... Try to logon as priveledged user on node ('||s_host_name||') and perform:'); add_message_line ('ALTER SYSTEM DISCONNECT SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); add_message_line ('ALTER SYSTEM KILL SESSION '''||v_first_sid||','||v_first_serial||''' IMMEDIATE;'); END IF; ELSE add_message_line ('All valid blocker sessions allready killed. Try kill with OS level command!'); END IF; IF NOT b_is_script THEN s_mail_text := s_mail_text || (chr(9)) || chr(10); s_mail_text := s_mail_text || ('This is automated message so, do not reply to this mail.') || chr(10); s_mail_text := s_mail_text || ('Regards,')|| chr(10); s_mail_text := s_mail_text || ('Your auto DBA d.o.o.')|| chr(10); s_mail_text := s_mail_text || (CHR(9)); select to_char(sysdate, 'dd.mm.yyyy hh24:mi:ss') into s from dual; s_mail_text := s_mail_text || ('Zagreb, '||s); -- if you run this script from sqlplus then there is no need for this (p_send_mail=FALSE) IF p_send_mail THEN SYS.UTL_MAIL.send(sender => p_sender, recipients => p_to, cc => p_cc, subject => s_subject, message => s_mail_text, mime_type => 'text/plain; charset=us-ascii'); END IF; END IF; ELSE dbms_output.put_line ('No blocking sessions found!'); END IF; /* EXCEPTION WHEN OTHERS THEN dbms_output.put_line(SQLERRM); */ END; /
The end
If you get mail now for blocking session and all blocker sessions are killed, then you'll see output like this:Blocker Inst SID Serial [sec] Lock Type Status Module ---------------------------------------------------------------------------------------------------- 1. ANAL 1 282 6661 7981 Transaction KILLED ZLSA 2 372 126 1128 Exclusive ACTIVE SLPE 2 368 48 6836 Exclusive ACTIVE DRI2 3 237 232 6727 Exclusive ACTIVE 2. DRI2 3 237 232 6724 Transaction KILLED DRIA 2 355 195 216 Exclusive ACTIVE All valid blocker sessions already killed. Try kill with OS level command!What is right and correct message!
For the end here is how this rewrite version produce result. Here is mail that you'll get in a case of blocking session:
Oracle version: 11.2.0.1.0 (11.1.0.0.0) Blocker Inst SID Serial [sec] Lock Type Status Module ------------------------------------------------------------------------------------------------------------------------ 1. DAMIRV 1 20 6 54 Transaction INACTIVE SQL*Plus DAMIRV 1 141 40 26 Exclusive INACTIVE SQL*Plus 2. DAMIRV 1 141 4 42 Transaction ACTIVE SQL*Plus DAMIRV 1 144 58 16 Exclusive ACTIVE SQL*Plus --------------- Blocking records information --------------- DAMIRV (1 '20,68') DAMIRV 1 141,40 26 sec Exclusive ACTIVE SQL*Plus SELECT C1 FROM DAMIRV.X WHERE rowid = 'AAAVoXAAGAAAAS1AAA' ; (delete x ) DAMIRV (1 '141,40') DAMIRV 1 144,58 16 sec Exclusive ACTIVE SQL*Plus SELECT C1 FROM DAMIRV.XX WHERE rowid = 'AAAVoYAAGAAAAS8AAA' ; (delete xx where c1='1' ) To kill first from the list, perform: ALTER SYSTEM DISCONNECT SESSION '20,68,@1' IMMEDIATE; ALTER SYSTEM KILL SESSION '20,68,@1' IMMEDIATE; -------------------------------------------------------- This is automated message so, do not reply to this mail. Regards, Your auto DBA d.o.o. Zagreb, 21.05.2010 23:39:33As you see now you can see which row is causing block and statement that is blocked in waiter session-looks promising!
Keep in mind that these two procedures are called in two separate calls, so sometimes second call may shown no blocking records ... and they do not exists in second case call and was existing in first time call.
Cheers!
Veoma korisno!
ReplyDeleteKako rjesavas slucajeve, kada ti session nije blokiran od nekog drugog sessiona, ali ipak "stoji" (hungs) sa wait eventom "latch: cache buffers chains" ?
"latch: cache buffers chains"
ReplyDeleteNepresušno vrelo Oracle problema.
Preopširno ali mislim da je dobro pokriveno na WEB-u i zaista ovisi od slučaja do slučaja kao i od verzije i mnoštva detalja.
U svakom slučaju "pain in the ass" ako se trebaš s time "bakčati"...
Preporuka za početak:
http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/
Ako nije i upgrade...i onda pooznati bug:
Bug 5009307: HIGH WAITS ON GC CR GRANT CONGESTED EVENT AFTER 10G UPGRADE
Dear Damir,
ReplyDeleteI tried to run check_blocking_session.prc script but there was no locked information display. This is what I did:
On first session
================
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
histdb1
SQL> select * from t1;
C1
----------
5
2
SQL> update t1 set c1=8 where c1=5;
1 row updated.
On other session
================
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
histdb2
SQL> update t1 set c1=10 where c1=5;
On new session
==============
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
histdb1
SQL> @check_blocking_session.prc
Oracle version: 10.2.0.4.0 (10.1.0.2.0)
No blocking sessions found!
PL/SQL procedure successfully completed.
Please help why the program not display locked session.
Regards,
Jack
Can you prove from Toad that you are prodcing blocking session really?
ReplyDeleteI have checked "check_bloking_session.prc" script and works as expected. Try firstly that script.
ReplyDelete"procedure check_blocking_session" is working in mine place OK.
I think you are not generating real blocking session or you are performing execution with users that has not enough privileges. Second script is sending mail.
Hello Damir,
ReplyDeleteis it possible to turn your script to be more aggressive, let say. dont just send email but acctually do the Killing itself if the blocker is reaching 60secs. , then send email just for info.
sometimes these block can block a very central record like Sequence record, and happens during night while system need to go on .
please help enlight me, thanks a lot
@y.g.
DeleteYes.
Just place text content of mail which is sent to you inside "execute immediste ''" command ant it will kill immediately.
But then do make soem additional checking that this is not core oracle process or any sys, system session!
Rg,
Damir
@Damir, I do it by checking the (session.OSUSER is not 'oracle' OR OSUSER is null) to prevent from killing oracle process.
Deletemany many thanks to you...
y.g.
This comment has been removed by the author.
ReplyDeleteHi,
ReplyDeleteJust recently I have the unfortunate event of having a blocking session from OEM that has (J001) DBMS_SCHEDULER : GATHER_STATS_JOB blocking MMON_SLAVE : Auto-Purge Slave A.
I run both versions of sb.sql. The one from part I shows the blocking session but the part II version doesn't? I can't understand why? This is on a 3-node RAC, the blocking session is on NODE 1 and the blocked session is on NDOE 3. I run both versions of the sb.sql script on all nodes and they both give the same result. That is, Part I version shows the blocking session as expected but the Part II version doesn't.
Not sure if this is something that you may want to test. Unfortunately as these are Oracle's background process, I can't reproduce on demand, the stats run from 10pm to 6am by default and the blocking session stop when the job was stopped. This is the first time that this blocking session happens. The DB is on 10.2.0.5.
are you saying that the version I gives you different result than version II?
ReplyDeleteif so, could you output result from both versions by mail?