Nagios Plugin: Advanced Traceroute to Check the Time Between Two Devices

We had to create a plugin that does the following:
1) Run a typical traceroute from the Nagios box to a destination IP.
2) Instead of measuring the time from the Nagios server to the destination host, report the time between two hosts along the way.

In other words, a typical traceroute goes:
NagiosServer -> Gateway -> Hop 1 -> Hop 2 -> Hop 3 -> Destination

When configured correctly, this plugin measures the time (in ms) from Hop 1 through Hop 3, graphs it, and alerts when the value crosses the warning and critical thresholds you set.
Here are the plugin and the relevant configuration files you will probably need.
NOTE: This was created and tested on Debian, so you may need to tweak it for other OSes.
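Going by the script below, the number the plugin reports is the sum of the per-hop times traceroute prints, from the hop where the start IP answers (hop a) through the hop where the end IP answers (hop b). Written out, with t_h as the time traceroute shows for hop h:

```latex
T_{a \to b} = \sum_{h=a}^{b} t_h
```

For example, for hops 3 to 5 of the sample trace later in this post, T = 1.026 + 1.218 + 1.488 = 3.732 ms. Note that each t_h is a round-trip time measured from the Nagios server, so T is a sum of those round-trip times rather than a direct measurement taken between the two hops.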
The plugin

  • Place the plugin in the Nagios plugins directory, typically /usr/local/nagios/libexec
  • Paste the script below into a file named, say, trace-time
  • Make sure it is owned by the nagios user and is executable, e.g.
  • chown nagios:nagios /usr/local/nagios/libexec/trace-time
  • chmod +x /usr/local/nagios/libexec/trace-time
#####START PLUGIN#####
#!/bin/bash
# usage
# ./trace-time <final-dest> <startip> <endip> <warning> <critical>
# Note: all five arguments are required.
# tip: do a traceroute first, then decide from which IP to which IP you want to measure.

DEST=$1
IP1=$2
IP2=$3
WARNING=$4
CRITICAL=$5

PROG=$(command -v traceroute)
if [[ -z $PROG ]]; then
    echo "UNKNOWN: traceroute not found"
    exit 3
fi
if [[ -z $DEST ]]; then
    echo "UNKNOWN: No destination ip defined"
    exit 3
fi
if [[ -z $IP1 ]]; then
    echo "UNKNOWN: No start ip defined"
    exit 3
fi
if [[ -z $IP2 ]]; then
    echo "UNKNOWN: No end ip defined"
    exit 3
fi
if [[ -z $WARNING ]]; then
    echo "UNKNOWN: No warning value defined"
    exit 3
fi
if [[ -z $CRITICAL ]]; then
    echo "UNKNOWN: No critical value defined"
    exit 3
fi
# Numeric comparison via awk ([[ a > b ]] would compare them as strings)
if awk -v w="$WARNING" -v c="$CRITICAL" 'BEGIN { exit !(w + 0 > c + 0) }'; then
    echo "UNKNOWN: Warning value larger than critical value"
    exit 3
fi

# Perfdata label of the form 'startip->endip' (assumed; $grapher was never set)
grapher="${IP1}->${IP2}"
tempfile=$(mktemp) || exit 3
trap 'rm -f "$tempfile"' EXIT

# One probe per hop, numeric output; hop lines look like " 3  <ip>  1.026 ms"
"$PROG" -n -q 1 "$DEST" > "$tempfile" 2>/dev/null

# Hop numbers at which the start and end IPs answered (exact IP match)
numberip1=$(awk -v ip="$IP1" '$2 == ip && $4 == "ms" { print $1; exit }' "$tempfile")
numberip2=$(awk -v ip="$IP2" '$2 == ip && $4 == "ms" { print $1; exit }' "$tempfile")
if [[ -z $numberip1 || -z $numberip2 ]]; then
    echo "UNKNOWN: $IP1 or $IP2 not found in traceroute to $DEST"
    exit 3
fi

# Sum the times of every hop from numberip1 through numberip2
startcalc=$(awk -v a="$numberip1" -v b="$numberip2" \
    '$4 == "ms" && $1 + 0 >= a + 0 && $1 + 0 <= b + 0 { s += $3 } END { print s + 0 }' "$tempfile")

if awk -v t="$startcalc" -v c="$CRITICAL" 'BEGIN { exit !(t + 0 > c + 0) }'; then
    echo "CRITICAL($startcalc): Time exceeds critical value|'$grapher'=$startcalc;$WARNING;$CRITICAL;;"
    exit 2
fi
if awk -v t="$startcalc" -v w="$WARNING" 'BEGIN { exit !(t + 0 > w + 0) }'; then
    echo "WARNING($startcalc): Time exceeds warning value|'$grapher'=$startcalc;$WARNING;$CRITICAL;;"
    exit 1
fi
echo "OK($startcalc): Time OK|'$grapher'=$startcalc;$WARNING;$CRITICAL;;"
exit 0
#####END PLUGIN#####
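If you want to check the parsing without sending probes across the network, the same hop-summing logic can be pulled into a bash function and fed saved traceroute output. This is a sketch, not part of the original plugin: the function name sum_hop_times and the 10.0.0.x addresses are made up, and it assumes the Linux traceroute line format " 3  10.0.0.3  1.026 ms" that -n -q 1 produces. It exits with status 1, printing nothing, when either IP is missing from the trace.

```shell
# Hypothetical helper: sum_hop_times START_IP END_IP < saved-trace.txt
# Sums the hop times from the hop where START_IP answered through the hop
# where END_IP answered. Exits 1 if either IP is not in the trace.
sum_hop_times() {
    awk -v ip1="$1" -v ip2="$2" '
        # Keep only answered hop lines: hop number, IP, time, "ms"
        $4 == "ms" { n++; hop[n] = $1 + 0; addr[n] = $2; t[n] = $3 }
        END {
            for (i = 1; i <= n; i++) {
                if (addr[i] == ip1 && !a) a = hop[i]
                if (addr[i] == ip2 && !b) b = hop[i]
            }
            if (!a || !b) exit 1            # an IP dropped out of the path
            for (i = 1; i <= n; i++)
                if (hop[i] >= a && hop[i] <= b) s += t[i]
            print s + 0
        }'
}
```

Live, it would be used as `traceroute -n -q 1 <destination> | sum_hop_times <startip> <endip>`.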

Nagios – Host.cfg

define host{
        use                     debian5-linuxserver
        host_name               Google WWW server
        alias                   For Tracing TimeHop Distances
        }

Nagios – commands.cfg

define command{
        command_name    check_time_between_hosts
        command_line    $USER1$/trace-time $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$
        }

Nagios – services.cfg

define service{
        use                     debian5-linuxservice
        host_name               Google WWW server
        service_description     Between IP to
        action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
        check_command           check_time_between_hosts!!!10!20
        }
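Nagios passes each `!`-separated field after the command name in check_command into the matching $ARGn$ macro of command_line, so ARG1 is the start IP, ARG2 the end IP, ARG3 the warning and ARG4 the critical threshold, with $HOSTADDRESS$ as the final destination. The bash function below only imitates that substitution to show the resulting plugin call; it is not how Nagios implements it, expand_check is a hypothetical name, and the addresses are documentation examples.

```shell
# Illustration only: split a check_command string on "!" the way Nagios
# fills $ARG1$..$ARG4$, then print the plugin call that would result.
expand_check() {
    local hostaddress=$1 check_command=$2
    local -a parts
    IFS='!' read -r -a parts <<< "$check_command"
    # parts[0] is the command name; parts[1..4] map to $ARG1$..$ARG4$
    echo "\$USER1\$/trace-time $hostaddress ${parts[1]} ${parts[2]} ${parts[3]} ${parts[4]}"
}

expand_check 203.0.113.10 'check_time_between_hosts!192.0.2.1!192.0.2.9!10!20'
```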
  • Note: the templates debian5-linuxservice and debian5-linuxserver are not Nagios defaults; define them first or use the default templates.
Now restart Nagios to load the new configuration.

More info
To find the hops you wish to monitor, simply run a traceroute:

traceroute -n -q 1
-n = numeric output (do not resolve IP addresses to host names)
-q 1 = send only one probe per hop
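On a long trace it can help to pull out just the hop number, IP and time of every hop that answered, so you can pick the start and end IPs. A minimal bash sketch; list_hops is a hypothetical name, and it assumes the Linux traceroute line format " 3  10.0.0.3  1.026 ms" produced with -n -q 1, skipping the header line and silent hops ("*").

```shell
# Hypothetical helper: print "hop IP time" for each hop that answered.
# Reads `traceroute -n -q 1` output on stdin.
list_hops() {
    awk '$4 == "ms" { print $1, $2, $3 }'
}
```

Usage: `traceroute -n -q 1 <destination> | list_hops`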

In the example below, I am tracing to one of Google's servers; the output of the trace looks like this (NOTE: actual IPs have been changed):

1  0.554 ms
2  0.667 ms
3  1.026 ms
4  1.218 ms
5  1.488 ms
6  1.627 ms
7  1.542 ms
8  2.322 ms
9  3.075 ms
10  2.801 ms
So let's say you want to measure the time between IP and IP 113.23.161.66; to test, simply run the plugin on the CLI with these values:

./trace-time 10 20

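And the output will look like this:
OK(5.909): Time OK|'->'=5.909;10;20;;
This is the standard plugin output Nagios expects: the status text comes before the |, and the performance data after it (label=value;warn;crit;min;max) is what PNP graphs.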
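The status line and exit code follow a fixed pattern, so they can be built by one small function. A sketch in bash, mirroring the plugin's messages; nagios_status is a hypothetical helper, and min/max are left empty in the perfdata just as in the sample output above. awk does the comparison because bash cannot compare decimal values such as 5.909 natively.

```shell
# Hypothetical helper: nagios_status LABEL VALUE WARN CRIT
# Prints a Nagios status line with perfdata and returns the Nagios
# exit code (0 OK, 1 WARNING, 2 CRITICAL).
nagios_status() {
    local label=$1 value=$2 warn=$3 crit=$4 state msg code
    if awk -v v="$value" -v c="$crit" 'BEGIN { exit !(v + 0 > c + 0) }'; then
        state="CRITICAL"; msg="Time exceeds critical value"; code=2
    elif awk -v v="$value" -v w="$warn" 'BEGIN { exit !(v + 0 > w + 0) }'; then
        state="WARNING"; msg="Time exceeds warning value"; code=1
    else
        state="OK"; msg="Time OK"; code=0
    fi
    # perfdata: 'label'=value;warn;crit;min;max
    echo "$state($value): $msg|'$label'=$value;$warn;$crit;;"
    return "$code"
}
```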


  1. Sanjay, nice job. I did some testing to see if I could adapt the code to test an MPLS connection between offices, using it to check the primary (ideal) path. We discovered that you are not testing whether the IPs actually appear in the traceroute results. Let's say the second hop IP drops off the network: the plugin still returns a time. I think you need to check that numberip1 and numberip2 are found in the traceroute results; they are blank when the IPs are missing. This should make your plugin more reliable.
