Nagios Event Handler - Restart Remote Service

I wrote a few quick posts on using Nagios Event_Handlers to restart a service on the local system.  Mostly I followed the example from the Nagios documentation, but it was a little tricky using SUDO to restart a service.  Once I solved that, the logical next step was to be able to restart a service on a REMOTE system with the event_handlers and NRPE.

NAGIOSSVR runs Nagios and monitors both itself and WEBSVR.  I use the "check_linux_procs" script, which is also known as "check_system_procs".  On the remote server WEBSVR, the script configuration lines look something like:

# Processes to check
PROCLIST_RED="httpd sendmail nrpe"
PROCLIST_YELLOW="crond"

# Ports to check
PORTLIST="25 80 5666"

The check_linux_procs script is executed on the remote server via NRPE.  We can use NRPE to remotely execute event handlers as well as service checks.  Setup is a bit more complex than a local host configuration.

Proper SUDO configuration is required on the remote system, WEBSVR.  Read my other post on the Nagios Local Service Restart with Event_Handlers for more information on the SUDO settings.

On WEBSVR I created a very simple script that uses sudo to restart the services.  Something like:

#!/bin/sh
#
/usr/bin/sudo /sbin/service httpd restart
/usr/bin/sudo /sbin/service sendmail restart
exit 0

NRPE is not listed because... well, if NRPE crashes, the event_handler cannot run, since it uses NRPE to connect and execute the script.  Do not forget to add your script to your nrpe.cfg file like below and restart NRPE on WEBSVR.

command[remote_restart]=/usr/local/nagios/libexec/eventhandlers/remote-restart

On NAGIOSSVR, this service check has a max_check_attempts of 3.  So I had to tweak the script I used before.  The trick here is passing the right variables through.  In my Nagios commands.cfg I added the $HOSTADDRESS$ value to the end of the line like so:

command_line    $USER1$/eventhandlers/restart-services-remote \
$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
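
For context, here is a rough sketch of how that command_line might sit inside full object definitions.  The command name, the generic-service template, the service description, and a check_nrpe command that takes the remote command name as $ARG1$ are all assumptions on my part, so adjust them to match your own config:

```
define command{
        command_name    restart-services-remote
        command_line    $USER1$/eventhandlers/restart-services-remote $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
        }

# Hypothetical service definition; names are placeholders.
define service{
        use                     generic-service
        host_name               WEBSVR
        service_description     System Procs
        check_command           check_nrpe!check_linux_procs
        max_check_attempts      3
        event_handler           restart-services-remote
        }
```

The event_handler directive is what ties the service check to the handler command.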

And the local "event_handler" script on NAGIOSSVR looks similar to the localhost example, but the "sudo" restart command is replaced with:

/usr/local/nagios/libexec/check_nrpe -H $4 -c remote_restart

Don't forget to change the case logic if you need to adjust for a different max_check_attempts value in your config.
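
Putting those pieces together, the handler on NAGIOSSVR could look roughly like the sketch below.  It follows the structure of the example in the Nagios documentation, not necessarily my exact script.  The handle_event function name and the CHECK_NRPE override are just for illustration, and restarting on the second SOFT failure is an assumption that fits a max_check_attempts of 3:

```shell
#!/bin/sh
# Sketch of restart-services-remote on NAGIOSSVR.
# Args: $1=$SERVICESTATE$ $2=$SERVICESTATETYPE$ $3=$SERVICEATTEMPT$ $4=$HOSTADDRESS$

# Path to check_nrpe; can be overridden from the environment for dry runs.
: "${CHECK_NRPE:=/usr/local/nagios/libexec/check_nrpe}"

handle_event() {
    state=${1:-} state_type=${2:-} attempt=${3:-} host=${4:-}

    case "$state" in
    CRITICAL)
        case "$state_type" in
        SOFT)
            # Assumption: with max_check_attempts 3, attempt 2 is the
            # last SOFT state, so try a restart before going HARD.
            case "$attempt" in
            2) "$CHECK_NRPE" -H "$host" -c remote_restart ;;
            esac
            ;;
        HARD)
            # Still broken after all retries; try once more.
            "$CHECK_NRPE" -H "$host" -c remote_restart
            ;;
        esac
        ;;
    esac
    # OK, WARNING and UNKNOWN fall through and do nothing.
    return 0
}

handle_event "$@"
```

If you raise max_check_attempts, bump the "2)" branch to one less than the new value.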

NOTES: 

  • You could break this into two checks and event_handlers, but I just restart both services to keep it as simple as possible. 
  • You may also try using key-based SSH w/o a password as an alternative to NRPE.  That may be my next tweak to work around NRPE itself crashing.
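
I have not set up the SSH variant yet, but a minimal sketch might look like this.  The restart_via_ssh function, the key path, and the remote "nagios" user are all hypothetical, and it assumes the public key is already in that user's authorized_keys on WEBSVR:

```shell
#!/bin/sh
# Sketch: run the remote restart script over SSH instead of NRPE.

: "${SSH:=/usr/bin/ssh}"
: "${SSH_KEY:=/home/nagios/.ssh/id_rsa}"   # hypothetical key path

restart_via_ssh() {
    # $1 is the host address ($HOSTADDRESS$ passed in by Nagios).
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which matters because an event handler has no terminal to type into.
    "$SSH" -i "$SSH_KEY" -o BatchMode=yes -o ConnectTimeout=10 \
        "nagios@$1" /usr/local/nagios/libexec/eventhandlers/remote-restart
}
```

In the event handler, a call like restart_via_ssh "$4" would then take the place of the check_nrpe line.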

HINTS: 

  • Always double check script ownership and permissions.  I had forgotten to make the script executable on WEBSVR, and that held me up for a while trying to sort it out.
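
A quick way to catch that kind of mistake is a small check like the sketch below.  The check_script function is just an illustration, and it only tests the user you run it as, so run it as the account NRPE uses (often "nagios", but that is an assumption):

```shell
#!/bin/sh
# Report whether a handler script exists and is executable by the current user.
check_script() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 2
    elif [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 1
    fi
    echo "ok: $1"
}
```

For example, check_script /usr/local/nagios/libexec/eventhandlers/remote-restart on WEBSVR would have pointed me at the problem right away; a chmod 755 on the script fixes the "not executable" case.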