It's been Critical for how long?
Nagios has a wonderful ‘duration’ column in its web interface that’s always bemused me. At what point does a check being in a warning or, even worse, a critical state stop being a problem worthy of head space and start being normal operating procedure?
Checks can stay in an extended broken state for many reasons, but they all seem to be symptoms of a larger problem. If it’s a small thing, then are you getting enough time to do housekeeping?