Coraid Monitoring Scripts

Here you will find the most current versions of our CORAID monitoring scripts:

These scripts are intended to be used to monitor the status of CORAID's AoE (ATA over Ethernet) devices.

While working with CORAID's AoE devices for our clients, I realized that I had no quick way to be notified when a problem arose, such as a failed drive, a change in a device's configuration, or a loss of connectivity between the servers and the CORAID devices.

The good news is that some of our clients' CORAID devices have been running for almost three years without any type of failure, but as everyone knows, there are two types of hard drives: those that have failed and those that will fail. With that in mind, these scripts were born.

The following scripts have been tested against CORAID's SR421 and SR1521 devices, are known to work, and are in production in multiple locations. In other words, they work for me, but your mileage may vary. Please submit any comments, suggestions or changes so that I may update these scripts accordingly.

ToDo: Add support for password protected CEC sessions in the expect script embedded in the cec-chk-coraid.sh script.

NOTES: By default, CORAID's "cec" program requires the "CTRL-\" key sequence to escape from the command line interface to a ">>>" prompt, where a "q" may be entered to disconnect from the CORAID device. In the cec-chk-coraid.sh script, the embedded expect script calls cec with the "-ee" option, which changes the default "\" (backslash) escape character to an "e" (so the sequence becomes CTRL-e), because "\" characters can sometimes be ugly in bash shell scripts. :)

If you attempt to cut and paste this script, it will fail because the line -- send "^E" -- does not send a "^" (caret) followed by an "E" (capital letter e), but rather a single "Control-e" character, which is displayed here as the two characters "^E".

If you have pasted this script from this website to a local file, you can change the resulting two-character "^E" in the script to a single CONTROL-e character from within the vi editor: place the cursor on the "^", delete both characters (x, x), and then enter the following keystrokes:

i, CTRL-v, CTRL-e, esc

Then save the file.
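If you would rather not edit the file by hand, the same repair can be scripted. The following is a minimal sketch, not part of the original scripts: the function name fix_ctrl_e is hypothetical, and it assumes GNU sed (for -i) and that the pasted line reads exactly send "^E" at the start of a line, as it does in cec-chk-coraid.sh. Lines that merely mention "^E", such as the NOTES comments, are left untouched.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): replace the
# two-character "^E" that a copy from a web page leaves on the expect
# 'send' line with a single Control-e byte (octal 005).
# Assumes GNU sed, for the in-place -i option.
fix_ctrl_e() {
  local file="$1"
  local ctrl_e
  ctrl_e=$(printf '\005')   # a single Control-e character
  # Only rewrite a line that is exactly: send "^E"
  # (in a GNU basic regex, "^" is literal when not at the start)
  sed -i "s/^send \"^E\"\$/send \"${ctrl_e}\"/" "$file"
}

# Example: fix_ctrl_e /usr/local/sbin/cec-chk-coraid.sh
```

Running it a second time is harmless, because the pattern no longer matches once the control character is in place.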


Related Scripts

We have also written several other scripts that you may find useful. You may find them HERE


aoe-chk-coraid.sh

#!/bin/bash
#
# aoe-chk-coraid.sh - The most current version of this script may
# be found at http://www.revpol.com/coraid_scripts
#
# William A. Arlofski
# Reverse Polarity, LLC
# 860-824-2433 Office
# http://www.revpol.com/
#
# ---------
# Changelog
# ---------
# 20080227 - waa - Initial version
# 20080306 - waa - Minor modifications
#          - Slight changes in wording of INSTALLATION section
#          - Modified umask from 007 to 0007
#
# ----------------------------------------------------------------------
#
# -------
# PURPOSE
# -------
# - Sample bash shell script which makes use of the aoeping and
#   aoe-stat commands to check the status of one or more CoRAID
#   lblades.
#
# - Given a list of lblades to check, the first test will verify the
#   current output of the aoeping command against a known good output
#   of the aoeping command.
#
# - The next test verifies the current output of the aoe-stat command
#   against a known good output of the aoe-stat command.
#
# - If the current output of either test differs from the known good
#   baseline files, an admin will be notified by email.
#
# ------------
# INSTALLATION
# ------------
# - Edit the variables below as needed to match your site configuration
#
# - Create your baseline files by issuing the following commands as root:
#
#  # aoe-stat > /var/state/aoe-stat.baseline
#  # aoeping -v -s 5 -S return_status 0 0 eth1 > /var/state/shelf0.slot0.baseline
#
# - Replace the shelf, slot and interface settings above with your own
#
# - Do this for each lblade that you want to monitor
#
# - It is a good idea to chattr +i the baseline files immediately after they
#   are created. This way your "known good" files remain "good."
#
# - Add this script to your crontab file to run at any interval you desire. I
#   run this script once per hour from my crontab file like so:
#
#   0 * * * * /usr/local/sbin/aoe-chk-coraid.sh > /dev/null 2>&1
#
# - Use cec-chk-coraid.sh for a script that makes use of expect to talk
#   to the CoRAID devices using the cec (Coraid Ethernet Console) program to
#   perform a similar test.
#
###############################################################################
#
# Copyright (C) 2008 William A. Arlofski - waa-at-revpol-dot-com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License, version 2, as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
# or visit http://www.gnu.org/licenses/gpl.txt
#
###############################################################################
#
#
# ---------------------
# Set up some variables
# ---------------------
umask 0007
aoeif="eth1"
admin="admin@example.com"
host="server@example.com"
statedir="/var/state"
diff="/usr/bin/diff"
mktemp="/bin/mktemp"
aoeping="/usr/sbin/aoeping"
aoestat="/usr/sbin/aoe-stat"
sendmail="/usr/sbin/sendmail"
# --------------------------------------
# Define the lblades you wish to monitor
# here in the form of "shelfX.slotY"
# Separate multiple lblades with spaces
# --------------------------------------
lbladelist="shelf0.slot0"

# -------------------------------------------
# Nothing below here should need to be edited
# -------------------------------------------
#
#
# -------------
# Check lblades
# -------------
for lblade in $lbladelist; do
 shelf=`echo "$lblade" | cut -d'.' -f1 | sed -e 's/shelf//'`
 slot=`echo "$lblade" | cut -d'.' -f2 | sed -e 's/slot//'`
 "$aoeping" -v -s 5 -S return_status "$shelf" "$slot" "$aoeif" > "$statedir/$lblade.current"
 "$diff" "$statedir/$lblade.current" "$statedir/$lblade.baseline" >> /dev/null

 if [ "$?" != "0" ]; then
   # -------------------------------------
   # An error was detected, emailing admin
   # -------------------------------------
   mailfile=`"$mktemp"`
   echo "To: $admin" > "$mailfile"
   echo "From: $admin" >> "$mailfile"
   echo "Reply-To: $admin" >> "$mailfile"
   echo "Subject: CoRAID error on $lblade at $host" >> "$mailfile"
   echo >> "$mailfile"
   echo "Baseline of $lblade" >> "$mailfile"
   echo "--------------------------" >> "$mailfile"
   cat "$statedir/$lblade.baseline" >> "$mailfile"
   echo >> "$mailfile"
   echo >> "$mailfile"
   echo "CURRENT STATUS of $lblade" >> "$mailfile"
   echo "-------------------------------" >> "$mailfile"
   cat "$statedir/$lblade.current" >> "$mailfile"
   cat "$mailfile" | "$sendmail" -t
   # ---------------
   # Cleanup tmpfile
   # ---------------
   rm -f "$mailfile"
 fi
done

# ---------------
# Check aoe-stats
# ---------------
# This test WILL generate an email if AoE devices
# or lblades are added, removed or changed on the
# network interface defined. Disable this check
# if you don't need or care to know about these instances
# -------------------------------------------------------
"$aoestat" > "$statedir/aoe-stat.current"
"$diff" "$statedir/aoe-stat.current" "$statedir/aoe-stat.baseline" >> /dev/null

if [ "$?" != "0" ]; then
  # -------------------------------------
  # An error was detected, emailing admin
  # -------------------------------------
  mailfile=`"$mktemp"`
  echo "To: $admin" > "$mailfile"
  echo "From: $admin" >> "$mailfile"
  echo "Reply-To: $admin" >> "$mailfile"
  echo "Subject: CoRAID aoe-stat change reported on $aoeif by host $host" >> "$mailfile"
  echo >> "$mailfile"
  echo "Baseline of aoe-stat command" >> "$mailfile"
  echo "--------------------------" >> "$mailfile"
  cat "$statedir/aoe-stat.baseline" >> "$mailfile"
  echo >> "$mailfile"
  echo >> "$mailfile"
  echo "CURRENT STATUS of aoe-stat command" >> "$mailfile"
  echo "--------------------------" >> "$mailfile"
  cat "$statedir/aoe-stat.current" >> "$mailfile"
  cat "$mailfile" | "$sendmail" -t
  # ---------------
  # Cleanup tmpfile
  # ---------------
  rm -f "$mailfile"
fi
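The INSTALLATION section asks you to create one baseline per lblade plus one for aoe-stat. A hypothetical make_aoe_baselines function, sketched below, does that in one pass. It assumes the script's own variables (aoeping, aoestat, aoeif, statedir, lbladelist) are already set and that it is run as root on a healthy system. It calls aoeping with the same options as the check loop, including -s 5, so that a healthy run should produce identical output, and it splits "shelfX.slotY" with bash parameter expansion instead of echo, cut and sed.

```shell
#!/bin/bash
# Hypothetical helper, not part of aoe-chk-coraid.sh: create the baseline
# files described in the INSTALLATION section in one pass.
# Assumes aoeping, aoestat, aoeif, statedir and lbladelist are set as in
# the script. Whatever it captures becomes the "known good" output.
make_aoe_baselines() {
  local lblade shelf slot
  for lblade in $lbladelist; do
    # Split "shelfX.slotY" with parameter expansion instead of cut/sed
    shelf=${lblade%%.*}       # shelf0.slot0 -> shelf0
    shelf=${shelf#shelf}      # shelf0       -> 0
    slot=${lblade#*.}         # shelf0.slot0 -> slot0
    slot=${slot#slot}         # slot0        -> 0
    # Same aoeping options as the check loop, so a healthy run matches
    "$aoeping" -v -s 5 -S return_status "$shelf" "$slot" "$aoeif" \
      > "$statedir/$lblade.baseline"
  done
  "$aoestat" > "$statedir/aoe-stat.baseline"
}
```

Review the resulting files, then chattr +i them as recommended above. To re-baseline later, remove the immutable flag first with chattr -i, because the shell cannot overwrite an immutable file.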

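Each alert in aoe-chk-coraid.sh (and the similar one in cec-chk-coraid.sh) is built from a long run of echo lines written to a temporary file, which is then piped to sendmail and deleted. A hypothetical send_report function, sketched below, produces a message in the same layout and pipes it straight to "sendmail -t", which reads the recipients from the To: header, so no temporary file is needed. It assumes the scripts' admin and sendmail variables are set; it is a suggestion, not the scripts' current code.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts: build the alert
# in the same layout as the existing echo blocks and pipe it directly
# to "sendmail -t". Assumes the admin and sendmail variables are set.
#   $1 = Subject line, $2 = name shown in the body (e.g. an lblade),
#   $3 = baseline file, $4 = current output file
send_report() {
  local subject="$1" name="$2" baseline="$3" current="$4"
  {
    echo "To: $admin"
    echo "From: $admin"
    echo "Reply-To: $admin"
    echo "Subject: $subject"
    echo                                # blank line ends the headers
    echo "Baseline of $name"
    echo "--------------------------"
    cat "$baseline"
    echo
    echo
    echo "CURRENT STATUS of $name"
    echo "--------------------------"
    cat "$current"
  } | "$sendmail" -t
}
```

Inside the lblade loop, the body of the if block could then shrink to a single call such as send_report "CoRAID error on $lblade at $host" "$lblade" "$statedir/$lblade.baseline" "$statedir/$lblade.current".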

cec-chk-coraid.sh

#!/bin/bash
#
# cec-chk-coraid.sh - The most current version of this script may
# be found at http://www.revpol.com/coraid_scripts
#
# William A. Arlofski
# Reverse Polarity, LLC
# 860-824-2433 Office
# http://www.revpol.com/
#
# ---------
# Changelog
# ---------
# 20080227 - waa - Initial version
# 20080306 - waa - Minor modifications
#          - Slight changes in wording of INSTALLATION section
#          - Added NOTES section to explain CONTROL character in
#            expect script
#          - Modified umask from 007 to 0007
#          - Added aoeif variable and changed expect spawn command to
#            make use of this variable instead of the hard-coded "eth1"
#
# ------------------------------------------------------------------------
#
# -------
# PURPOSE
# -------
# - Sample bash shell script which makes use of expect and the cec
#   program to check the status of one or more CoRAID shelves.
#
# - Given a list of CoRAID shelves to check, this script will check the
#   output of the expect script against known good output.
#
# - If the current output of the expect script differs from the known good
#   baseline, an email will be sent to the defined admin(s).
#
#
# ------------
# INSTALLATION
# ------------
# - Edit the variables below as needed to match your site configuration
#
# - Create your baseline files by running this script once - ignoring any
#   errors reported - then copy cec-shelfX.current to cec-shelfX.baseline
#
# - It is a good idea to chattr +i the baseline files immediately after
#   they are created. This way your "known good" files remain "good."
#
# - Add this script to your crontab file to run at any interval you desire.
#   I run this script once per hour from my crontab file like so:
#
#   0 * * * * /usr/local/sbin/cec-chk-coraid.sh > /dev/null 2>&1
#
# - Use aoe-chk-coraid.sh to make use of the aoeping and aoe-stat commands
#   to talk to your CoRAID devices and perform similar tests.
#
# -----
# NOTES
# -----
# By default, Coraid's "cec" program requires the "CTRL-\" key sequence to
# escape from the command line interface to a ">>>" prompt, where a "q" may be
# entered to disconnect from the Coraid device. In the cec-chk-coraid.sh
# script, the embedded expect script calls cec with "-ee", which changes the
# default "\" (backslash) escape character to an "e" (CTRL-e), because "\"
# characters can sometimes be ugly in bash shell scripts. :)
#
# If you attempt to cut and paste this script, it will fail because the line
# -- send "^E" -- does not send a "^" (caret) followed by an "E" (capital
# letter e), but rather a single "Control-e" character.
#
# If you have pasted this script from the website to a local file, you can
# change the resulting two-character "^E" in the script to a single CONTROL-e
# character from within the vi editor: place the cursor on the "^", delete
# both characters (x, x), and then enter the following keystrokes:
#
# i, CTRL-v, CTRL-e, esc
#
# Then save the file.
#
#
###############################################################################
#
# Copyright (C) 2008 William A. Arlofski - waa-at-revpol-dot-com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License, version 2, as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
# or visit http://www.gnu.org/licenses/gpl.txt
#
###############################################################################
#
# ---------------------
# Set up some variables
# ---------------------
umask 0007
aoeif="eth1"
admin="admin@example.com"
host="server@example.com"
statedir="/var/state"
cec="/usr/sbin/cec"
expect="/usr/bin/expect"
sendmail="/usr/sbin/sendmail"
diff="/usr/bin/diff"
mktemp="/bin/mktemp"
#
# Define the shelves to be tested here in the
# form of shelfX, where X represents the shelf
# number. Separate multiple shelves with spaces
# ---------------------------------------------
shelflist="shelf0"

# -------------------------------------------
# Nothing below here should need to be edited
# -------------------------------------------
#
#
# Get the current status using expect to talk
# to the CoRAID devices via the cec program
# -------------------------------------------
for shelf in $shelflist; do
shelfnum=`echo "$shelf" | sed 's/shelf//'`
"$expect" > "$statedir/cec-$shelf.current" << WAAcecEOF
spawn "$cec" -s "$shelfnum" -ee "$aoeif"
expect "Escape is Ctrl-e"
send "\r"
expect -re "SR shelf(.*)>"
send "show -l\r"
expect -re "SR shelf(.*)>"
send "list -l\r"
expect -re "SR shelf(.*)>"
send "\r"
send "^E"
expect ">>>"
send "q\r"
WAAcecEOF

# Test current cec output against known good baseline output
# ----------------------------------------------------------
"$diff" "$statedir/cec-$shelf.current" "$statedir/cec-$shelf.baseline" >> /dev/null

 if [ "$?" != "0" ]; then
   # -------------------------------------
   # An error was detected, emailing admin
   # -------------------------------------
   mailfile=`"$mktemp"`
   echo "To: $admin" > "$mailfile"
   echo "From: $admin" >> "$mailfile"
   echo "Reply-To: $admin" >> "$mailfile"
   echo "Subject: CoRAID error detected on $shelf at $host" >> "$mailfile"
   echo >> "$mailfile"
   echo "------------------" >> "$mailfile"
   echo "Baseline of $shelf" >> "$mailfile"
   echo "------------------" >> "$mailfile"
   cat "$statedir/cec-$shelf.baseline" >> "$mailfile"
   echo >> "$mailfile"
   echo >> "$mailfile"
   echo "------------------------" >> "$mailfile"
   echo "CURRENT STATUS of $shelf" >> "$mailfile"
   echo "------------------------" >> "$mailfile"
   cat "$statedir/cec-$shelf.current" >> "$mailfile"
   cat "$mailfile" | "$sendmail" -t
   # ---------------
   # Cleanup tmpfile
   # ---------------
   rm -f "$mailfile"
 fi
done
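The INSTALLATION section of cec-chk-coraid.sh asks you to run the script once and then copy each cec-shelfX.current to cec-shelfX.baseline. A hypothetical promote_cec_baselines function, sketched below under the assumption that the script's statedir and shelflist variables are set, does that copy for every shelf and refuses to promote a missing or empty capture. Passing lock as its argument also applies chattr +i, which needs root and a filesystem that supports the immutable attribute.

```shell
#!/bin/bash
# Hypothetical helper, not part of cec-chk-coraid.sh: after one run has
# written cec-shelfX.current files, copy each one to cec-shelfX.baseline.
# Assumes statedir and shelflist are set as in the script.
# Pass "lock" as $1 to also chattr +i each new baseline (needs root).
promote_cec_baselines() {
  local shelf cur base
  for shelf in $shelflist; do
    cur="$statedir/cec-$shelf.current"
    base="$statedir/cec-$shelf.baseline"
    # Refuse to promote a capture that is missing or empty
    if [ ! -s "$cur" ]; then
      echo "no output captured for $shelf: $cur is missing or empty" >&2
      return 1
    fi
    cp "$cur" "$base" || return 1
    if [ "$1" = "lock" ]; then
      chattr +i "$base" || return 1
    fi
  done
}
```

Read through each .baseline file before locking it, since whatever it contains becomes the "known good" state. To re-baseline later, run chattr -i on the old files first, since cp cannot overwrite a file that has the immutable attribute.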