April 10, 2008 by alex
autoki has recently grown into a more and more complex application. Besides two clusters of mongrels and the mysql database we have a memcached server, ferret and starling plus clients for asynchronous processing. WIth so many services running (and sometimes not running) the need to monitor all these grew. We decided to set up nagios on one of the servers - it’s ugly but it lets you monitor all sorts of stuff pretty easily via remote agents that run on each monitored server.
With nagios we have access to quite a number of monitoring plugins, e.g. for monitoring TCP ports (e.g. for checking memcached is still alive), HTTP, server load, free disk space etc. Yesterday I came to a point where I wanted to monitor something nagios couldn’t: When a user uploads a bunch of photos, the task of creating copies of the photos in different sizes is put in a starling queue for asycnhronous processing. If something with that processing goes wrong and the queue gets too big I want nagios to pick this up and notify me. Time for my own nagios plugin.
Nagios plugins are actually very simple. All you have to provide is something that can be executed in a shell and that returns either 0, 1 or 2 for an OK, Warning or Critical state of the monitored service. So here’s the source code for monitoring the number of photo uploads in the queue (RAILS_ROOT/lib/check_photo_uploads.rb):
At autoki the server that runs the asynchronous processes is called jobs1 and this is also where the queue should be checked, so I added this to the /etc/nagios/nrpe.cfg file (the config file for the remote nagios agent):
Then I had to add the service to the nagios configuration on the monitoring server:
Well, that was it, and this is how it looks - ugly but it works :)
(I recently signup with scout - looks much prettier and the setup is much easier than nagios, plugins are written as ruby classes and it comes as a ruby gem - sweet concept so far, could have been my idea, more on that later)