Monday, November 9, 2009

Check database instance/listener is running (old fashion way)

There are many implementation of checking status of database instance and listeners. However there are more or less successful works, I had make mine script which has suite mine needs 100% on many database projects. Mine solution has following characteristics:
  • RAC and non RAC environment are covered (ASM instance are covered as well as "ordinary" Oracle database instances)
  • Support for unlimited number of instances on the same server.
  • In /home/oracle directory, $1.env file must exist. This is pure environment initialization file so that Oracle can works with wondered instance. In my case I have several env files - for each instance separate one. Example:
    /home/oracle/ASM.env
    /home/oracle/PROD.env
    /home/oracle/TEST.env
    This initialization is on the start of the script so change it to apply your need.
  • Script is run with only one parameter database name (i.e. ASM, PROD, DEV)
  • Script is looping in endless loop so checking is permanent regardless instance/listener is up or down
  • CHECK_PERIOD is variable where you define number of seconds before next loop of checking is started. Default is 200 seconds (adjust it as you need).
  • For the first time when start mail is sent to user defined with MAIL_TO and MAIL_CC variable with current status. This is very interesting to me to see when someone has started the server. To use mail feature mail command must exists on Linux (what is enabled by default installation).
  • When state of instance/listener is change to "DOWN", mail is sent immediately. Here is an example of mail for database:


    and for listener:

    where real names are deleted because of security reasons.
  • After that, when SEND_AGAIN_PERIOD is reached, (default 5*CHECK_PERIOD seconds) mail will be resend again as another warning. It will stop sending mail only when state change to UP! With easy calculation we come to at least 2 mail per instance/listener per hour in worst case scenario. With ability to place indefinite number of people in warning mail (MAIL_CC), this is fair enough to make a notice official and seen by everyone! Adjust timing and frequency to your need.
  • When listener/instance change status from DOWN to UP only one mail is sent, saying that instance/listener is started again. This mail will happened only once:



    After that, next mail will be sent only when instance/listener state change to DOWN. This make checking proactive to send mail only when things went bad!
  • With LAST_DB_ERROR_IN_SECONDS and LAST_LSNR_ERROR_IN_SECONDS variables script control differently listener and instance status.
  • This intelligent mail sending (according UP/DOWN state and vice verse) is a feature that I haven't seen in other script.
  • Script can be, of course, automatically started if you put in /etc/rc.d/rc.local file. Here is an example:
    "su - oracle -c "/oracle/export/rman_backup/scripts/check_instance ASM &"
    "su - oracle -c "/oracle/export/rman_backup/scripts/check_instance TEST &"
    "su - oracle -c "/oracle/export/rman_backup/scripts/check_instance PROD1 &"
    "su - oracle -c "/oracle/export/rman_backup/scripts/check_instance DEV &"
  •  You can start manually checking scripts as well, using screen Linux capability:
    screen
    check_instance ASM &
    check_instance PROD1 &
    check_instance TEST &
    and then Ctrl+A+D (inside screen) to exit screen and leave scripts running.
  • Each script can be manually terminated and leave other instance/listener checking active with no influence.
  • This is Linux bash script and it will not run on Windows
Here is the code of script:
############################################################################
# NAME    : check_instance ver 1.1
#
# PURPOSE : check_instance and listener status on RAC and non RAC configuration
#
# Date    : 08.09.2008
# Author  : Damir Vadas
#
# Remarks : Should be run on each node in RAC
#           mail support shouold be avilable on OS for oracle user
#           example of ussage:
#           screen
#                /u01/rman_backup/scripts/check_instance ASM &
#                /u01/rman_backup/scripts/check_instance DB1 &
#                /u01/rman_backup/scripts/check_instance DB2 &
#                /u01/rman_backup/scripts/check_instance DB3 &
#
#           and then Ctrl+A+D (inside screen) to exit screen
#
#           or (permanent solution-boot activation)
#           in /etc/rc.d/rc.local
#                su - oracle -c "/oracle/export/rman_backup/scripts/check_instance ASM &"
#                ...
#                su - oracle -c "/oracle/export/rman_backup/scripts/check_instance DB3 &"
#        
# Changes (DD.MM.YYYY, Name, CR/TR#):
#          09.09.2008, Damir Vadas
#                      Changed wrong message in mail
#          10.09.2008, Damir Vadas
#                      Changed logic to send positive mail when instance was previously down
#                      All instances are checked by opened status!
#                      Send_mail is now in function
#          09.11.2009, Damir Vadas
#                      Added listener check and second script parameter as listener port
#############################################################################

# mail definition
MAIL_TO="damir.vadas@gmail.hr"
MAIL_CC="-c user1_cc@gmail.com user2_cc@gmail.com"

function bad_usage ()
{
  echo "Usage:"
  echo "$0 '$1' , where parameter is name of database on this node (i.e. DB1, ASM, DB2 etc.)"
  echo "Error raised (retvalue=$retvalue) for passed parameter '$1' !" | mail -s "`date`  check_instance start script error!"  ${MAIL_TO} ${MAIL_CC}
  exit 1
}

if [ "$1" = "" ]; then
  bad_usage
else
  DB_NAME=$1
fi

. /home/oracle/$1.env
retvalue=$?
if [ $retvalue -ne 0 ]; then
  bad_usage
fi

# initial variable settings for listener as well for database
LAST_DB_ERROR_IN_SECONDS=0
LAST_LSNR_ERROR_IN_SECONDS=0

# Default time interval of checking (seconds)
CHECK_PERIOD=200

# Every 1000 seconds check again when last instance/listener state has changed
let SEND_AGAIN_PERIOD=5*${CHECK_PERIOD}

function f_send_mail ()
{ if [ "$2" = "DB" ]; then
    if [ "$1" = "DOWN" ]; then
      if [ ${LAST_DB_ERROR_IN_SECONDS} -eq 0 ]; then
        # this is firt time for warning mail...
        echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} is DOWN!" ${MAIL_TO} ${MAIL_CC}
      else
        # n-th warning mail is going after multiple SEND_AGAIN_PERIOD seconds
        let SEND_AGAIN=${LAST_DB_ERROR_IN_SECONDS}%${SEND_AGAIN_PERIOD}
        if [ ${SEND_AGAIN} -eq 0 ]; then
          echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} is DOWN at least ${LAST_DB_ERROR_IN_SECONDS} seconds!"  ${MAIL_TO}  ${MAIL_CC}
        fi
      fi
      let LAST_DB_ERROR_IN_SECONDS=${LAST_DB_ERROR_IN_SECONDS}+${CHECK_PERIOD}
    else
      if [ "$1" = "UP" ]; then 
        echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} is UP!" ${MAIL_TO}  ${MAIL_CC}
        LAST_DB_ERROR_IN_SECONDS=0
      else
        echo "Bad first parameter passed to 'f_send_mail' function ($1 - DB). Exiting..."
        exit 1
      fi
    fi
  else
    if [ "$2" = "LSNR" ]; then 
      if [ "$1" = "DOWN" ]; then
        if [ ${LAST_LSNR_ERROR_IN_SECONDS} -eq 0 ]; then
          # this is firt time for warning mail...
          echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} LISTENER is DOWN!" ${MAIL_TO} ${MAIL_CC}
        else
          # n-th warning mail is going after multiple SEND_AGAIN_PERIOD seconds
          let SEND_AGAIN=${LAST_LSNR_ERROR_IN_SECONDS}%${SEND_AGAIN_PERIOD}
          if [ ${SEND_AGAIN} -eq 0 ]; then
            echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} LISTENER is DOWN at least ${LAST_LSNR_ERROR_IN_SECONDS} seconds!"  ${MAIL_TO}  ${MAIL_CC}
          fi
        fi
        let LAST_LSNR_ERROR_IN_SECONDS=${LAST_LSNR_ERROR_IN_SECONDS}+${CHECK_PERIOD}
      else
        if [ "$1" = "UP" ]; then 
          echo "This is automated message so, please do not reply!" | mail -s "`date`  ${ORACLE_SID} LISTENER is UP!" ${MAIL_TO}  ${MAIL_CC}
          LAST_LSNR_ERROR_IN_SECONDS=0
        else
          echo "Bad first parameter passed to 'f_send_mail' function ($1 - LSNR). Exiting..."
          exit 1
        fi
      fi
    else
      echo "Bad second parameter passed to 'f_send_mail' function ("$1" , "$2"). Exiting..."
      exit 1     
    fi
  fi
}

while true; do
  pslist="`ps -ef | grep pmon`"
  echo  ${pslist} | grep  "pmon_${ORACLE_SID}"  >> /dev/null 2>&1
  retvalue=$?
  # echo "retval=$retvalue"
  if [ $retvalue -eq 0  ] ; then
    INSTANCE_STATUS=`echo "select 'xX '||status from v\\$instance;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
    # echo $INSTANCE_STATUS
    if [ "$DB_NAME" = "ASM" ]; then
      if [ "$INSTANCE_STATUS" = "STARTED" ]; then
        if [ ${LAST_DB_ERROR_IN_SECONDS} -gt 0 ]; then
          # previous LAST_DB_ERROR_IN_SECONDS > 0 => instance was down
          # let us send mail about good news ...
          f_send_mail "UP" "DB"
        fi
      else
        f_send_mail "DOWN" "DB"
      fi
    else 
      # non ASM instances-real Oracle instances!
      if [ "$INSTANCE_STATUS" = "OPEN" ]; then
        if [ ${LAST_DB_ERROR_IN_SECONDS} -gt 0 ]; then
          # previous LAST_DB_ERROR_IN_SECONDS > 0 => instance was down
          # let us send mail about good news ...
          f_send_mail "UP" "DB"
        fi
      else
        f_send_mail "DOWN" "DB"
      fi
    fi
  else
    f_send_mail "DOWN" "DB"
  fi
  # check listener part
  ps -ef | grep LISTENER | grep -v grep |grep _${DB_NAME}_
  retvalue=$?
  if [ $retvalue -eq 0  ] ; then
    if [ ${LAST_LSNR_ERROR_IN_SECONDS} -gt 0 ]; then
      f_send_mail "UP" "LSNR"
    fi
  else
    # no listener
    f_send_mail "DOWN" "LSNR"
  fi
  sleep ${CHECK_PERIOD}
done

In mine case listener has always ${DB_NAME} in it's name and has capital letters. Here are some examples of listener on my environment:
ora.node2.LISTENER_PROD_NODE2.lsnr
ora.node3.LISTENER_PROD_NODE3.lsnr
ora.node1.LISTENER_ASM_NODE1.lsnr
ora.node4.LISTENER_DEV_NODE4.lsnr

If you have different situation with listener names, apply it to your need and change line.
ps -ef | grep LISTENER | grep -v grep |grep _${DB_NAME}_

Listener checking may be done in a different way. With dummy user which connect to real ${ORACLE_SID}. Implementation should be like following:
create user dummy identified by dummy;
grant create session to dummy;
It is important to grant only create session privilege and nothing else because of security reasons. The rest is to check status of sqlplus command with such an user:
sqlplus dummy/dummy@${ORACLE_SID}

But this approach need aditional Oracle user (sometimes it is not possible to do that), additional logic in script (ASM instance has no users and placing sys pasword in script is not acceptable)...so previously describe solution looks to me more easy and enough correct in the same time.

Hope this will help someone to build even more sophisticated chack_instance script base on my "old fashioned way"-manual codding!

Here I call someone to show windows version of such an script...

Welcome!

No comments :

Post a Comment

Zagreb u srcu!

Copyright © 2009-2014 Damir Vadas

All rights reserved.


Sign by Danasoft - Get Your Sign