ERROR: CHECK_NRPE: Socket timeout after 10 seconds.

Several conditions can trigger this error in your Nagios checks.  Many of them are obvious, but this one had me stumped for a while.

Problem

All my Nagios NRPE checks against one host were failing with the message "CHECK_NRPE: Socket timeout after 10 seconds."  I logged into the host, confirmed NRPE was running, and even restarted it.  I double-checked the firewall rules to make sure the port was open.  From my Nagios server, I ran nslookup, ping, and telnet to the port to confirm that I was resolving the correct IP address and could connect.  The machine in question is a Virtual Private Server (VPS), so it does sometimes become sluggish and unresponsive, but when I poked around, everything seemed fine.  Running the check by hand from the command line of my Nagios server gave the same result.
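
For anyone who wants to repeat the resolve-and-connect checks in one go, here is a rough sketch in Python.  It is not part of Nagios or NRPE; the function names are my own invention, and it assumes NRPE listens on its default port, 5666.  It times the name lookup and the TCP connect separately, since a slow step is as much a clue as a failed one.

```python
import socket
import time

NRPE_PORT = 5666  # NRPE's default port; change it if your daemon listens elsewhere


def timed(func, *args):
    """Run func(*args) and return (result, seconds_elapsed)."""
    start = time.monotonic()
    result = func(*args)
    return result, time.monotonic() - start


def check_host(hostname, port=NRPE_PORT, timeout=10.0):
    """Resolve hostname, then open a TCP connection to it, timing both steps.

    Hypothetical helper for illustration. Raises OSError if the lookup
    or the connection fails.
    """
    ip, dns_seconds = timed(socket.gethostbyname, hostname)

    def connect():
        # Open and immediately close a connection; we only care that it works.
        with socket.create_connection((ip, port), timeout=timeout):
            return True

    _, connect_seconds = timed(connect)
    return {"ip": ip, "dns_seconds": dns_seconds, "connect_seconds": connect_seconds}
```

Calling check_host("vps.example.com") (a placeholder hostname) from the Nagios server returns the resolved address plus the two timings.  Note that this only tests the path from the Nagios server to the host, which is exactly the direction that looked healthy in my case.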

Solution

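What got me looking in the right direction was pinging my Nagios server from the host.  The ping worked, but I noticed it took a few seconds to resolve the hostname.  So I checked the DNS servers configured on my Linux VPS.  The first server listed was not pingable, so every lookup had to wait for it to time out before falling back to the second.  Presumably that delay on the host side was enough to push the NRPE check past its 10-second timeout.  I swapped the order of the servers in my resolv.conf and VOILA!  The command-line check from my Nagios server worked again.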
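
If you would rather test the nameservers directly than rely on ping, the sketch below reads them from resolv.conf in listed order and times a small DNS query against each.  Treat it as an illustration under assumptions: the helper names are made up, "example.com" is just a placeholder name to look up, and it sends a bare-bones A-record query over UDP port 53 without validating the reply.

```python
import socket
import struct
import time


def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order resolv.conf lists them."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop '#' comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name="example.com", timeout=2.0):
    """Return seconds until server sends any reply, or None on timeout/error."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        return time.monotonic() - start


def report(path="/etc/resolv.conf"):
    """Probe every nameserver in the file; returns [(server, seconds_or_None)]."""
    with open(path) as fh:
        return [(server, probe(server)) for server in parse_nameservers(fh.read())]
```

Running report() on the monitored host gives one entry per nameserver.  A None for the first entry would match what I saw: the resolver tries servers in listed order, so a dead first server slows down every lookup.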