UCD-SNMP (NET-SNMP) Integration


Note: Nagios is not designed to be a replacement for a full-blown SNMP management application like HP OpenView or OpenNMS. However, you can set things up so that SNMP traps received by a host on your network can generate alerts in Nagios. Here's how...

Introduction

This example explains how to easily generate alerts in Nagios for SNMP traps that are received by the UCD-SNMP snmptrapd daemon. These directions assume that the host which is receiving SNMP traps is not the same host on which Nagios is running. If your monitoring box is the same box that is receiving SNMP traps you will need to make a few modifications to the examples I provide. Also, I am assuming that you having installed the nsca daemon on your monitoring server and the nsca client (send_nsca) on the machine that is receiving SNMP traps.

For the purposes of this example, I will be describing how I setup Nagios to generate alerts from SNMP traps received by the ArcServe backup jobs running on my Novell servers. I wanted to get notified when backups failed, so this worked very nicely for me. You'll have to tweak the examples in order to make it suit your needs.

Additional Software

Translating SNMP traps into Nagios events can be a bit tedious. If you'd like to make it easier, you might want to check out Alex Burger's SNMP Trap Translator project located at http://www.snmptt.org which, combined with Net-SNMP, provides a more enhanced trap handling system. The snmptt documentation includes integration details for Nagios.

Defining The Service

First off you're going to have to define a service in your object configuration file for the SNMP traps (in this example, I am defining a service for ArcServe backup jobs). Assuming that the host that the alerts are originating from is called novellserver, a sample service definition might look something like this:

define service{
	host_name                       novellserver
	service_description             ArcServe Backup
	is_volatile                     1
	active_checks_enabled		0
	passive_checks_enabled		1
	max_check_attempts              1
	contact_groups                  novell-backup-admins
	notification_interval           120
	notification_period             24x7
	notification_options            w,u,c,r
	check_command                   check_none
	}

Important things to note are the fact that this service has the volatile option enabled. We want this option enabled because we want a notification to be generated for every alert that comes in. Also of note is the fact that active checks are disabled for the service, while passive checks are enabled. This means that the service will never be actively checked - all alert information will have to be sent in passively by the nsca client on the SNMP management host (in my example, it will be called firestorm).

ArcServe and Novell SNMP Configuration

In order to get ArcServe (and my Novell server) to send SNMP traps to my management host, I had to do the following:

  1. Modify the ArcServe autopilot job to send SNMP traps on job failures, successes, etc.
  2. Edit SYS:\ETC\TRAPTARG.CFG and add the IP address of my management host (the one receiving the SNMP traps)
  3. Load SNMP.NLM
  4. Load ALERT.NLM to facilitate the actual sending of the SNMP traps

SNMP Management Host Configuration

On my Linux SNMP management host (firestorm), I installed the UCD-SNMP (NET-SNMP) software. Once the software was installed I had to do the following:

  1. Install the ArcServe MIBs (included on the ArcServe installation CD)
  2. Edit the snmptrapd configuration file (/etc/snmp/snmptrapd.conf) to define a trap handler for ArcServe alerts. This is detailed below.
  3. Start the snmptrapd daemon to listen for incoming SNMP traps

In order to have the snmptrapd daemon route ArcServe SNMP traps to our Nagios host, we've got to define a traphandler in the /etc/snmp/snmptrapd.conf file. In my setup, the config file looked something like this:

#############################
# ArcServe SNMP Traps
#############################

# Tape format failures
traphandle ARCserve-Alarm-MIB::arcServetrap9 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 9

# Failure to read tape header
traphandle ARCserve-Alarm-MIB::arcServetrap10 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 10

# Failure to position tape
traphandle ARCserve-Alarm-MIB::arcServetrap11 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 11

# Cancelled jobs
traphandle ARCserve-Alarm-MIB::arcServetrap12 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 12

# Successful jobs
traphandle ARCserve-Alarm-MIB::arcServetrap13 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 13

# Imcomplete jobs
traphandle ARCserve-Alarm-MIB::arcServetrap14 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 14

# Job failures
traphandle ARCserve-Alarm-MIB::arcServetrap15 /usr/local/nagios/libexec/eventhandlers/handle-arcserve-trap 15

This example assumes that you have a /usr/local/nagios/libexec/eventhandlers/ directory on your SNMP mangement host and that the handle-arcserve-trap script exists there. You can modify these to fit your setup. Anyway, the handle-arcserve-trap script on my management host looked something like this:

#!/bin/sh

# Arguments:
#  $1 = trap type

	# First line passed from snmptrapd is FQDN of host that sent the trap
     read host

    # Given a FQDN, get the short name of the host as it is setup in Nagios
    hostname="unknown"
    case $host in
        novellserver.mylocaldomain.com)
            hostname="novellserver"
            ;;
        nt.mylocaldomain.com)
            hostname="ntserver"
            ;;
	esac
	
    # Get severity level (OK, WARNING, UNKNOWN, or CRITICAL) and plugin output based on trape type
    state=-1
    output="No output"
    case "$1" in

        # failed to format tape - critical
        11)
            output="Critical: Failed to format tape"
            state=2
            ;;

        # failed to read tape header - critical
        10)
            output="Critical: Failed to read tape header"
            state=2
            ;;

        # failed to position tape - critical
        11)
            output="Critical: Failed to position tape"
            state=2
            ;;

        # backup cancelled - warning
        12)
            output="Warning: ArcServe backup operation cancelled"
            state=1
            ;;

        # backup success - ok
        13)
            output="Ok: ArcServe backup operation successful"
            state=0
            ;;

        # backup incomplete - warning
        14)
            output="Warning: ArcServe backup operation incomplete"
            state=1
            ;;

        # backup failure - critical
        15)
            output="Critical: ArcServe backup operation failed"
            state=2
            ;;
    esac


    # Submit passive check result to monitoring host
    /usr/local/nagios/libexec/eventhandlers/submit_check_result $hostname "ArcServe Backup" $state "$output"

exit 0

Notice that the handle-arcserve-trap script calls the submit_check_result script to actually send the alert back to the monitoring host. Assuming your monitoring host is called monitor, the submit check_result script might look like this (you'll have to modify this to specify the proper location of the send_nsca program on your management host):

#!/bin/sh

# Arguments
#	$1 = name of host in service definition
#	$2 = name/description of service in service definition
#	$3 = return code
#	$4 = output

/bin/echo -e "$1\t$2\t$3\t$4\n" | /usr/local/nagios/bin/send_nsca monitor -c /usr/local/nagios/etc/send_nsca.cfg

Finishing Up

You've now configured everything you need to, so all you have to do is restart the Nagios on your monitoring server. That's it! You should be getting alerts in Nagios whenever ArcServe jobs fail, succeed, etc.