Indirect Host and Service Checks

Introduction

Chances are, many of the services that you're going to be monitoring on your network can be checked directly by using a plugin on the host that runs Nagios. Examples of services that can be checked directly include availability of web, email, and FTP servers. These services can be checked directly by a plugin from the Nagios host because they are publicly accessible resources. However, there are a number of things you may be interested in monitoring that are not as publicly accessible as other services. These "private" resources/services include things like disk usage, processor load, etc. on remote machines. Private resources like these cannot be checked without the use of an intermediary agent. Service checks which require an intermediary agent of some kind to actually perform the check are called indirect checks.

Indirect checks are useful for:

Monitoring "local" resources (such as disk usage, processer load, etc.) on remote hosts
Monitoring services and hosts behind firewalls
Obtaining more realistic results from checks of time-sensitive services between remote hosts (i.e. ping response times between two remote hosts)

There are several methods for performing indirect active checks (passive checks are not discussed here), but I will only talk about how they can be done by using the nrpe addon.

Indirect Service Checks

The diagram below shows how indirect service checks work. Click the image for a larger version...

Multiple Indirected Service Checks

If you are monitoring servers that lie behind a firewall (and the host running Nagios is outside that firewall), checking services on those machines can prove to be a bit of a pain. Chances are that you are blocking most incoming traffic that would normally be required to perform the monitoring. One solution for performing active checks (passive checks could also be used) on the hosts behind the firewall would be to poke a tiny hold in the firewall filters that allow the Nagios host to make calls to the nrpe daemon on one host inside the firewall. The host inside the firewall could then be used as an intermediary in performing checks on the other servers inside the firewall.

The diagram below show how multiple indirect service checks work. Notice how the nrpe daemon is running on hosts #1 and #2. The copy that runs on host #2 is used to allow the nrpe agent on host #1 to perform a check of a "private" service on host #2. "Private" services are things like process load, disk usage, etc. that are not directly exposed like SMTP, FTP, and web services. Click on the diagram for a larger image...

Indirect Host Checks

Indirect host checks work on the same principle as indirect service checks. Basically, the plugin used in the host check command asks an intermediary agent (i.e. a daemon running on a remote host) to perform the host check for it. Indirect host checks are useful when the remote hosts being monitored are located behind a firewall and you want to restrict inbound monitoring traffic to a particular machine. That machine (remote host #1 in the diagram below) performs will perform the host check and return the results back to the top level check_nrpe plugin (on the central server). It should be noted that with this setup comes potential problems. If remote host #1 goes down, the check_nrpe plugin will not be able to contact the nrpe daemon and Nagios will believe that remote hosts #2, #3, and #4 are down, even though this may not be the case. If host #1 is your firewall machine, then the problem isn't really an issue because Nagios will detect that it is down and mark hosts #2, #3, and #4 as being unreachable.

The diagram below shows how an indirect host check can be performed by using the nrpe daemon and check_nrpe plugin. Click the image for a larger version.