Collect pnp4nagios data in Check_MK distributed environment

Introduction

I hope you enjoy the great monitoring software Check_MK. If you have a big environment with lots of hosts and services distributed over dozens of sites (company locations), you might have already configured your Check_MK to use the distributed monitoring in WATO. I assume that you have a master and some slaves. The master, in Check_MK terms, is the host that you use to configure your whole Check_MK installation and to look at all hosts and services, whilst the slaves are the hosts that receive the configuration from the master and deliver their monitoring results to it. Distributed monitoring is an extremely nice and powerful feature of Check_MK that allows you to keep your configuration in one place and that takes care of distributing configuration changes to the slaves. It is also very easy to connect a new slave to the master site.

In this article I’d like to talk about the handling of pnp4nagios graphs, which you have hopefully enabled and love. I’ve found a restriction in the standard Check_MK design: although you don’t access the slaves directly to see check results, the PNP graphs are still stored on the slaves and transferred to your web browser from there. You need to add some special reverse proxy directives to your Apache configuration to point to the PNP graphs that are stored on the slave at the remote site. From my point of view, that design has the following disadvantages:

  • Network performance: If your slave has only a slow Internet connection, it might take many seconds to show a graph, because it is loaded from the remote location each time you hover the mouse over the small PNP icon. If you open the PNP pages of a service, they load very slowly and are no fun to work with.
  • Data safety if a slave is lost: If a slave crashes, its PNP data is gone as well. The configuration is stored on the master (for which you have a backup procedure, I assume) and can be put onto a new slave very quickly. But the valuable PNP statistics are stored on the slave. Either you back up the slaves in addition to the master, or you will lose your PNP data if a slave crashes.
  • Combined views: If you would like to view services that are monitored by different slaves together, you cannot do this easily. The reason is that the PNP data is spread over different slaves and pnp4nagios cannot reach across host boundaries (or at least I don’t know of such a capability).

To get rid of these disadvantages, I provide a solution that stores the PNP data on the master, giving you a centralized PNP data collection. In a few words: it’s based on a shell script that uses rsync + ssh + sed and is started periodically by cron. Here are the steps to create the script and the appropriate cron job.

Assumptions

Let us say that the master site’s name is “global”. Let us assume that there are 4 slave hosts: “omdnorth”, “omdeast”, “omdsouth” and “omdwest”, and that the sites running on them are named after the hosts: “north”, “east”, “south” and “west”. I assume further that your master’s host name is “omdmaster.company.com”.
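The naming convention assumed above can be written down as a small helper. This is only a sketch: the function name slave_source is made up for illustration, and the perfdata path is the one used by the script in the next section.

```shell
#!/bin/bash
# Sketch of the naming convention: site "north" runs on host "omdnorth"
# and is owned by the user "north". slave_source is a hypothetical helper.
perfdata="var/pnp4nagios/perfdata"

slave_source() {
    local site="$1"
    printf '%s@omd%s:/omd/sites/%s/%s/\n' "$site" "$site" "$site" "$perfdata"
}

# Print the rsync source for every slave site.
for site in north east south west; do
    slave_source "$site"
done
```

If your hosts and sites follow a different scheme, this mapping is the only place that would need to change.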

Create the script

Log in as the site owner (remember: the site’s name is “global”, so the site owner’s name is “global” as well). Create the script as “local/bin/pnp-rsync.sh” in the master site, so that the full path to the script is /omd/sites/global/local/bin/pnp-rsync.sh

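#!/bin/bash
# Pull the PNP performance data of every slave site into the master site.
master="global"
perfdata="var/pnp4nagios/perfdata"

# The paths below are relative to the site directory.
cd "${OMD_ROOT:-/omd/sites/${master}}" || exit 1

echo "+++ started at $(date)"

for slave in north east south west; do
    echo "Syncing ${slave}..."
    # 1. Copy everything except the RRD files (in practice the XML metadata).
    rsync -av -e ssh --exclude='*.rrd' "${slave}@omd${slave}:/omd/sites/${slave}/${perfdata}/" "${perfdata}/"
    # 2. Point the paths inside the XML files at the master site.
    find "${perfdata}" -name '*.xml' -print0 | xargs -0 -r sed -i "s#omd/sites/${slave}/#omd/sites/${master}/#g"
    # 3. Copy the RRD files in the background so that the next slave can start.
    rsync -av -e ssh --exclude='*.xml' "${slave}@omd${slave}:/omd/sites/${slave}/${perfdata}/" "${perfdata}/" &
done

# Wait for the background transfers before reporting completion.
wait

echo "+++ finished at $(date)"
echo '--------------------------------------------------------------------'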
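To see what the sed step in the script does, here is a minimal sketch that applies the same kind of substitution to a single line. The sample line and the helper name rewrite_site are made up for illustration; real PNP XML files contain many more tags. The pattern here ends with a slash so that a site called north does not also match a site called north2.

```shell
#!/bin/bash
# Sketch: rewrite the site path in text read from standard input.
# rewrite_site is a hypothetical helper, not part of pnp4nagios or OMD.
rewrite_site() {
    local slave="$1" master="$2"
    sed "s#omd/sites/${slave}/#omd/sites/${master}/#g"
}

# Illustrative sample line only.
echo '<RRDFILE>/omd/sites/north/var/pnp4nagios/perfdata/host1/PING.rrd</RRDFILE>' \
    | rewrite_site north global
```

After the substitution the line points to /omd/sites/global/..., which is where the copied RRD file lives on the master.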

Save the script and make it executable:

$ chmod +x local/bin/pnp-rsync.sh

Prepare slaves

Generate a new SSH key for the communication between the master and the slaves, and append the generated public key to the file .ssh/authorized_keys of the site user on each slave. You can refer to numerous articles on the Internet that provide detailed instructions. Make sure that you can connect to each of the slaves from the master using SSH like this:

$ ssh -l north omdnorth
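The key setup could look roughly like the following. This is a sketch under assumptions, not a tested procedure: it assumes that the site user “global” has no key pair yet, that an RSA key without a passphrase is acceptable (cron cannot type a passphrase), and that ssh-copy-id is available on the master. The helper only prints the commands so that you can review them before running them by hand.

```shell
#!/bin/bash
# Sketch: print the commands that create a key pair for the site user and
# install the public key on every slave. Nothing is executed here.
# key_setup_commands is a hypothetical helper name.
key_setup_commands() {
    local site
    echo "ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa"
    for site in north east south west; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub ${site}@omd${site}"
    done
}

key_setup_commands
```

ssh-copy-id asks once for the password of the site user on each slave; if the site users have no password, the public key has to be appended to .ssh/authorized_keys in another way.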

Run an initial synchronization

You might want to run the first synchronization manually rather than through cron. I recommend this, because it might take a long time to copy all PNP data from the slaves to the master and because it gives you the opportunity to see whether everything works as expected.

Before the initial run, please make sure that the partition on which OMD resides has enough free space to store all the PNP data.
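A quick way to look at the free space is sketched below. The helper name free_kb is made up for illustration; it reads the "Available" column of the POSIX output format of df.

```shell
#!/bin/bash
# Sketch: report the free space (in KiB) on the partition that holds a path.
# free_kb is a hypothetical helper; df -P prints one line per file system.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Falls back to the current directory when run outside an OMD site.
site_dir="${OMD_ROOT:-.}"
echo "Free space under ${site_dir}: $(free_kb "$site_dir") KiB"
```

To estimate how much will arrive, you could run du -sk var/pnp4nagios/perfdata as the site user on each slave and add up the results.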

Run the script with a command like this:

$ pnp-rsync.sh >> $OMD_ROOT/var/log/pnp-rsync.log 2>&1 &

and look at the file $OMD_ROOT/var/log/pnp-rsync.log by executing

$ tail -f $OMD_ROOT/var/log/pnp-rsync.log

You should see messages from rsync about the files being copied. The destination directory var/pnp4nagios/perfdata should be filling up with new data.
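Besides watching the log, you can check whether the XML files on the master still refer to a slave’s site path. The following is a minimal sketch; the helper name stale_xml_count is made up for illustration, and a count of 0 for a slave means that the path rewrite has been applied to all of its XML files.

```shell
#!/bin/bash
# Sketch: count XML files below a directory that still mention a slave's
# site path. stale_xml_count is a hypothetical helper.
stale_xml_count() {
    local dir="$1" slave="$2"
    grep -rl --include='*.xml' "omd/sites/${slave}/" "$dir" 2>/dev/null | wc -l
}

for slave in north east south west; do
    count=$(stale_xml_count "${OMD_ROOT:-.}/var/pnp4nagios/perfdata" "$slave")
    echo "${slave}: ${count} XML file(s) still pointing at the slave"
done
```

While a synchronization is running, a non-zero count can be normal, because freshly copied XML files are only rewritten in the following step.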

Set up a new cron job

Create a new cron job by creating a new file like this:

$ vim /omd/sites/global/etc/cron.d/pnp-rsync

The file should consist of only this one line:

*/10 * * * * pnp-rsync.sh >> $OMD_ROOT/var/log/pnp-rsync.log 2>&1

That means that our script is executed every 10 minutes. In my production environment with about 15 slaves, more than 500 monitored hosts and more than 20,000 services, it takes about 2 minutes to re-synchronize all the PNP data (currently about 22.5 GByte). You should experiment with this interval to find the best value for your needs.
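If a run can take longer than the interval, two instances of the script might overlap. One possible safeguard, which is not part of my setup, is to wrap the call in flock from util-linux. The sketch below assumes that flock is installed; the helper name run_exclusive and the lock file location are made up for illustration.

```shell
#!/bin/bash
# Sketch: run a command only if no other instance holds the lock.
# run_exclusive and the lock file location are assumptions, not part of OMD.
lock_file="${TMPDIR:-/tmp}/pnp-rsync.lock"

run_exclusive() {
    # flock -n fails immediately instead of waiting when the lock is taken.
    flock -n "$lock_file" "$@"
}

run_exclusive echo "lock acquired, the sync would run here"
```

In practice the command passed to run_exclusive would be pnp-rsync.sh; a second run that starts while the first one is still busy then exits right away.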

After the cron job file has been created, we have to reload the crontab (I hope that you didn’t change the OMD site’s default setting, in which cron is enabled):

$ omd reload crontab
Removing Crontab...
Initializing Crontab...OK

Apache configuration

To make sure that the Apache web server running on the master knows where to find the new data, you have to reconfigure the Multisite-related part of its configuration as root like this:

# vim /etc/httpd/conf.d/multisite_proxy.conf

RewriteEngine On
RewriteRule ^/$ http://omdmaster.company.com/global/ [R,L]
RewriteRule ^/(north|east|south|west)/pnp4nagios/(.*) http://omdmaster.company.com/global/pnp4nagios/$2 [P]
RewriteRule ^/(north|east|south|west)/(.*) http://omdmaster.company.com/$1/$2 [P]

Save the file and reload the Apache web server. On a CentOS machine it can be done with a command like this:

# service httpd reload
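To illustrate what the pnp4nagios rule does to a request path, the following sketch emulates it with sed. It is only an illustration of the pattern and the substitution; Apache itself evaluates the real rule, and the helper name map_url and the sample paths are made up.

```shell
#!/bin/bash
# Sketch: emulate the pnp4nagios RewriteRule with sed to see where a request
# path ends up. map_url is a hypothetical helper.
map_url() {
    echo "$1" | sed -E \
        's#^/(north|east|south|west)/pnp4nagios/(.*)#http://omdmaster.company.com/global/pnp4nagios/\2#'
}

# A PNP request for a slave site is redirected to the master site ...
map_url "/north/pnp4nagios/index.php/graph"
# ... while a request for the master site itself is left alone.
map_url "/global/check_mk/"
```

The first call prints a URL below /global/pnp4nagios/, which is why the graphs are now served from the data copied to the master.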

Enjoy!

P.S. What I haven’t verified are possible issues with access rights. I can imagine that some of them no longer work. That is something I haven’t taken care of yet.

4 thoughts on “Collect pnp4nagios data in Check_MK distributed environment”

  1. Cristian Beltran

    Great!!

    I’m configuring my own Check_MK distributed monitoring with OMD 1.20, but my question is:

    Where do I put the Apache configuration with the rewrite rules on the OMD server? Is it necessary to create the file multisite_proxy.conf in the Apache folder of OMD?

  2. Hermann Maurer Post author

    Yes, the file has to be created in the directory of the Apache web server that contains its other configuration files. As I wrote, you can add the directives to the file /etc/httpd/conf.d/multisite_proxy.conf. Depending on your distribution the path can differ. Simply note that this piece of configuration has to refer to the global Apache web server that every user talks to and that acts as a reverse proxy. You don’t need to modify the configuration of the Apache web server running in the site itself.

  3. Thomas

    Hi Hermann,

    Nice write-up, just wondering if you found a solution to this:

    If you move hosts between slaves, the perf data is not transferred to the new slave. If you then view the pnp4nagios graphs they will only show the data from the ‘new’ slave… ideas? I heard of the mod_gearman way of pushing the perf data to the master, however, I’m not sure this is ideal.

    Best regards,
    Thomas

    1. Hermann Maurer

      Thomas, that’s an issue indeed. A workaround could be copying the pnp data manually to the new slave. I don’t know mod_gearman, can’t say anything about it.
