Software development is roughly 20% writing code and 80% maintenance. Since most of your time is spent keeping the lights on, your logs need to be easy to read. High-quality logs aren’t just a “nice to have”; they are the only way to make observability actually work.
This piece focuses on ERROR logs, the ones that deal with exceptions. A few rules of thumb for error logs:
- Only log an ERROR if the operation (e.g. a request) failed unexpectedly: if the operation keeps going, it’s not an error.
- Clear: State exactly what went wrong in plain English.
- Human intervention: If a developer doesn’t need to jump in and fix something, it shouldn’t be an ERROR. Log severity should encode actionability, not just system correctness.
These rules exist to ensure every log is accurate and actionable.
Accuracy is paramount. If your code handles a bad password gracefully, don’t log an ERROR. That’s just the system doing its job. If the request didn’t terminate unexpectedly, keep the log level at INFO or WARN.
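To make the bad-password case concrete, here is a minimal sketch using Python's standard `logging` module. The `login` and `check_password` functions, the `InvalidCredentials` exception, and the in-memory `store` are hypothetical names chosen for illustration, not part of any real library.

```python
import logging

logger = logging.getLogger("auth")


class InvalidCredentials(Exception):
    """Expected failure: the user supplied the wrong password."""


def check_password(username, password, store):
    # Hypothetical credential check against an in-memory dict.
    if store.get(username) != password:
        raise InvalidCredentials(username)


def login(username, password, store):
    try:
        check_password(username, password, store)
    except InvalidCredentials:
        # Handled gracefully: the system is doing its job, so this is not an ERROR.
        logger.info("Login rejected for user %r: invalid credentials", username)
        return False
    except Exception:
        # Unexpected failure: the request cannot complete.
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Login failed unexpectedly for user %r", username)
        raise
    logger.info("Login succeeded for user %r", username)
    return True
```

A wrong password produces an INFO record and a normal `False` return. Only a failure the code did not anticipate, such as a broken credential store, reaches the ERROR branch.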
Actionability is about what happens next. When an error log alert hits Slack, it should tell the developer exactly what happened and how to investigate or fix the problem. If you see an alert and your first instinct is to ignore it, the log level needs to be lowered.
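As a rough illustration of an error log that says what happened and where to look next, consider the sketch below. Everything specific in it is an assumption made up for the example: the `charge_customer` function, the `gateway` object, the runbook path, and the suggested next step.

```python
import logging

logger = logging.getLogger("billing")


def charge_customer(order_id, amount_cents, gateway):
    """Charge an order through a hypothetical payment gateway client."""
    try:
        gateway.charge(order_id, amount_cents)
    except ConnectionError as exc:
        # State what failed, for which order, why, and what to do next.
        logger.error(
            "Charge failed for order %s (%d cents): payment gateway unreachable (%s). "
            "Next step: check the gateway status, then retry the charge.",
            order_id,
            amount_cents,
            exc,
            # Extra fields are attached to the log record for structured handlers.
            extra={"order_id": order_id, "runbook": "docs/runbooks/payment-gateway.md"},
        )
        raise
```

Compare this with a bare `logger.error("charge failed")`: the message above names the order, the cause, and a first step, so whoever picks up the alert does not have to reconstruct the context from scratch.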
This is where the Noise Tax comes in.
Most teams don’t have a defined process for fixing noisy logs. You might tweak a dashboard here or there, but the noise creeps back in, and that leads to alert fatigue: you start ignoring the alerts. Keep your logs clean. If an error doesn’t demand an answer, it’s just expensive noise.