Long time no see…
One of my main interests in working with production systems is being able to sleep well at night. An important part of making sure I can is knowing when things go bad, which they will, sooner or later. It is just part of life. Just like a car or any other mechanical thing, a computer system will eventually have a hiccup. It is better to find out yourself when and what went wrong than to have a customer call and tell you that something in your shop is broken.
Short story: You can implement quite mature and powerful monitoring even on a very small budget. Even large corporations are looking into cost-effective solutions.
Today, I checked out OP5 Monitor, a commercial yet very attractive extension of Nagios. It has many bells and whistles that are not part of the standard issue, mainly when it comes to reporting and configuration. It still took me a couple of hours to set it up the way I wanted. But man, the configuration is a walk in the park in comparison. After the first hit, there is almost no way back to plain vanilla Nagios.
I have used Nagios quite a lot in the past, but it is ugly (eh, the GUI honestly looks like crap, but it for sure fulfills its purpose) and there is a horde of config files to keep track of.
Well, being an old-school Nagios hacker, I already know the basic concepts. Perhaps configuring OP5 Monitor is easier for me than for many others, but I will put that aside. Here, I will just give you a quick glance at how easy it is to extend NRPE (Nagios Remote Plugin Executor), so that the monitoring server (Nagios or OP5) can execute remote scripts on a host without having to deal with weird home-grown ssh scripts and keys.
First, I have to give you a short introduction to how Nagios checks a service. It is simple, really simple.
If you want to write your own check script, you need to know what you want to check. A good example is to look for the presence of a file, e.g. /tmp/foo.bar. Let us say that your whole corporation depends on knowing whether this file exists. A simple way to check this is to write a script.
#!/bin/ksh
[ ! -f /tmp/foo.bar ] && echo "The file does not exist"
This will just echo a warning if the file does not exist.
If you want Nagios to understand this, you need to tell it just a little more: a return code. Nagios reads the exit status of the script, where 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN.
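#!/bin/ksh
# Nagios plugin: report whether /tmp/foo.bar exists
if [ ! -f /tmp/foo.bar ]
then
    msg="CRITICAL - The file does not exist"
    rc=2
else
    msg="OK - The file is here!"
    rc=0
fi
echo $msg
# exit, rather than return, reliably passes the status back from a top-level script
exit $rc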
It is as simple as that (plus the tedious job of configuring the chkcommand.cfg file and your Nagios services). With this you have a simple Nagios module.
To make this an NRPE module, which is executed remotely by the Nagios or OP5 server on the server of your choice, you just have to put this script somewhere on your monitored server, e.g. in /opt/plugins/check_myfile, and set up the NRPE configuration.
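The script above only knows OK and CRITICAL. Here is a minimal sketch of how the same kind of check might also report WARNING and UNKNOWN. The check_file function name and the rule that an empty file counts as a WARNING are my own assumptions for illustration, and I stick to plain /bin/sh syntax, which should also run under ksh.

```shell
#!/bin/sh
# Sketch only: check_file and the "empty file = WARNING" rule are
# illustrative assumptions, not part of any standard plugin.
check_file() {
    file=$1
    if [ -z "$file" ]; then
        # 3 = UNKNOWN: the check itself could not be carried out
        echo "UNKNOWN - no file name given"
        return 3
    elif [ ! -f "$file" ]; then
        # 2 = CRITICAL
        echo "CRITICAL - $file does not exist"
        return 2
    elif [ ! -s "$file" ]; then
        # 1 = WARNING: the file is there, but has no content
        echo "WARNING - $file exists but is empty"
        return 1
    fi
    # 0 = OK
    echo "OK - $file is here!"
    return 0
}
```

To turn this into a plugin, the script would end by calling check_file /tmp/foo.bar and then exit $?, so that the function's status becomes the exit status of the script.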
remote host $> sudo chmod 755 /opt/plugins/check_myfile
remote host $> grep check_myfile /etc/nrpe.d/my_config.cfg
command[myfile]=/opt/plugins/check_myfile
remote host $> sudo /etc/init.d/nrpe restart
On the Nagios server, check that your script works (my remote host has the IP address 192.168.2.90):
OP5 $> /opt/plugins/check_nrpe -H 192.168.2.90 -c myfile
CRITICAL - The file does not exist
remote host $> touch /tmp/foo.bar
OP5 $> /opt/plugins/check_nrpe -H 192.168.2.90 -c myfile
OK - The file is here!
That is basically it! Now, go ahead and configure a new NRPE service for a host in your OP5 environment, put the word “myfile” in the “check_command_args” field, and you are done. Two minutes of work, and you save yourself tons of headache.
DEBUG: The script has to send at least something to stdout; it doesn’t really matter what. Otherwise you will get an error message from the server-side check_nrpe script:
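Before relying on the new service, it can be worth confirming that the exit code, and not just the text, makes it back through NRPE, since the exit code is what decides the state. The following is a rough sketch of a helper for that. The names state_name and run_check are made up for this example, and it is written in plain /bin/sh, which should also work in ksh.

```shell
#!/bin/sh
# Sketch only: state_name and run_check are hypothetical helper names.

# Translate a plugin exit code into the conventional state name.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo OUT-OF-RANGE ;;   # not a valid plugin state
    esac
}

# Run any check command, then print its state, exit code and output.
run_check() {
    rc=0
    output=$("$@") || rc=$?
    echo "[$(state_name "$rc")] rc=$rc output: $output"
    return "$rc"
}
```

On the monitoring server you could then run run_check /opt/plugins/check_nrpe -H 192.168.2.90 -c myfile and see the state and the message on one line.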
remote host $> grep echo /opt/plugins/check_myfile
# echo $msg
OP5 $> /opt/plugins/check_nrpe -H 192.168.2.90 -c myfile
NRPE: Unable to read output
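One way to guard against this is to never print an empty message. Below is a small sketch of that idea. The emit_status function and its fallback text are assumptions of mine, not something NRPE or OP5 provides.

```shell
#!/bin/sh
# Sketch only: emit_status is a hypothetical helper.
# Print the plugin message, or a placeholder when the message is empty,
# so that there is always at least one line on stdout.
emit_status() {
    msg=$1
    rc=$2
    if [ -z "$msg" ]; then
        msg="UNKNOWN - plugin produced no output (exit code $rc)"
    fi
    echo "$msg"
    return "$rc"
}
```

A plugin would finish with emit_status "$msg" "$rc" followed by exit $?, so the original exit code is kept while the output is never empty.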