Logging Best Practices, Part 1: Priority Number One

4 MIN READ

Laura Santamaria

6.30.20

As LogDNA’s Developer Advocate, Laura Santamaria loves to learn and explain how things work. She bridges the gap between external developers and SREs and internal engineering teams. In addition, she is the curator for A Minute on the Mic. Apart from work, she co-hosts Austin DevOps and Cloud Austin, taught Python for Women Who Code Austin for many years, is an organizer for DevOpsDays Texas, and volunteers with DevOpsDays Austin. Outside of tech, Laura runs, plays with her dogs, throws discs, and watches clouds—the real kind.

Isn’t all logging pretty much the same? Logs appear by default, like magic, without any further intervention by teams other than simply starting a system… right?

While logging may seem like simple magic, there’s a lot to consider. Logs don’t just automatically appear for all levels of your architecture, and any logs that do automatically appear probably don’t have all of the details that you need to successfully understand what a system is doing. Let’s start our exploration of logging best practices by thinking about priority number one for logs: actionable logs.

Regardless of how you log, the most important thing after simply having logs is having logs you can actually use. Actionable logs provide enough information, enough details, and enough history that you can use them to get something done.

Let’s examine an example system. This example system is an application with a microservices architecture running on a Kubernetes cluster that lives on top of a FlatcarOS box. By default, you get logs from FlatcarOS and Kubernetes. How do you get logs for the different microservices? You have to program those logs into your application while building the codebase. Consider the following elements when you think about how to build actionable logs.
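
As a minimal sketch of what programming those logs into a microservice can look like, the Python below uses the standard `logging` module to emit one JSON object per line on stdout, the stream Kubernetes captures for `kubectl logs`. The service name `checkout-service`, the `make_logger` helper, and the JSON field names are hypothetical choices for illustration, not a LogDNA or Kubernetes requirement.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service  # hypothetical name identifying this microservice

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # record.created is a Unix timestamp; convert it to ISO 8601 in UTC
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def make_logger(service: str) -> logging.Logger:
    """Build a logger that writes JSON lines to stdout for one service."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.propagate = False     # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout-service")
    log.info("service started")
```

Structured lines like these are easier for a log platform to parse and search than free-form text, which matters for the three properties below.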

Enough Information

Your logs should provide enough information to answer “when,” “who,” and “where” for any event. Take access logs as an example. The logline for an access event should record when the access occurred, known as the timestamp. It should also record who accessed the system. This detail often comes as an IP address or another identifier, and the “who” should cover not just human actions but also actions by other systems. Always log every access! Finally, the logline should record where the system was accessed, meaning which service or which part of the platform was accessed.
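
Here is one hedged way to shape such an access logline, assuming a JSON format. The `when`/`who`/`where` keys, the `access_event` helper, and the sample values (including the IP address, taken from a documentation-only range) are all hypothetical; real schemas often use other field names.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("access")


def access_event(actor: str, actor_type: str, service: str, resource: str) -> str:
    """Build and emit one access logline answering when, who, and where."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),       # timestamp in UTC
        "who": {"id": actor, "type": actor_type},             # human or system caller
        "where": {"service": service, "resource": resource},  # what was accessed
    }
    line = json.dumps(entry)
    logger.info(line)
    return line


if __name__ == "__main__":
    # A human user and another service are logged the same way.
    access_event("203.0.113.7", "human", "orders-api", "/orders/42")
    access_event("billing-worker", "service", "orders-api", "/orders/42/invoice")
```

Recording the caller type alongside the identifier keeps system-to-system access visible rather than lumping it in with human traffic.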

Enough Details

Your logs should provide enough details to answer “what” exactly happened during any single event. Answering who, when, and where doesn’t tell you what happened, and a bare message such as “this pod started” doesn’t give you enough detail to distinguish a pod restart triggered by a code change from a system crash that caused the orchestrator to restart pods.
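
To make the pod example concrete, the sketch below attaches the “what” to a pod start event so that a deploy and a crash restart no longer look identical. The `PodStartEvent` class and its fields are hypothetical, not a Kubernetes API. The exit code 137 is only a plausible example: it is what a process killed by SIGKILL reports (128 + 9).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PodStartEvent:
    """Hypothetical 'what' details for a pod start; field names are illustrative."""
    pod: str
    trigger: str                              # e.g. "deploy" or "crash-restart"
    image_version: str                        # code version now running
    previous_exit_code: Optional[int] = None  # set when the prior container died

    def to_log_line(self) -> str:
        return json.dumps({"event": "pod_started", **asdict(self)})


deploy = PodStartEvent("orders-7f9c", "deploy", "v1.4.2")
crash = PodStartEvent("orders-7f9c", "crash-restart", "v1.4.1", previous_exit_code=137)

if __name__ == "__main__":
    print(deploy.to_log_line())
    print(crash.to_log_line())
```

Both lines still share the same `event` value, so you can search for every pod start, but the extra fields tell you which kind of start each one was.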

Enough History

Finally, your logs should provide enough history to answer “why” something happened. Taken together, logs help you understand the context of an action. A single logline doesn’t give you enough context to understand the “why” behind any action or event, so writing log lines to a terminal somewhere without storing them is pointless. How could you tell whether an event is normal for a system without an easily accessible history?
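
A rough sketch of why stored history matters: the Python below appends events to a JSON Lines file and then compares one day's count of an event against the average of the other days in the history that recorded it. The file layout, the helper names, and the 3x threshold are illustrative assumptions; in practice you would lean on a log management platform's retention and search rather than a local file.

```python
import json
import tempfile
from collections import Counter
from datetime import date, datetime, timedelta, timezone
from pathlib import Path


def append_log(path: Path, event: str, when: datetime) -> None:
    """Persist one event as a JSON line so it outlives the terminal session."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"when": when.isoformat(), "event": event}) + "\n")


def daily_counts(path: Path, event: str) -> Counter:
    """Count how often `event` appears on each calendar day of the stored history."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["event"] == event:
                counts[datetime.fromisoformat(entry["when"]).date()] += 1
    return counts


def is_unusual(path: Path, event: str, day: date, factor: float = 3.0) -> bool:
    """Flag `day` if its count exceeds `factor` times the mean of the other recorded days."""
    counts = daily_counts(path, event)
    others = [n for d, n in counts.items() if d != day]
    if not others:
        return False  # no history to compare against yet
    baseline = sum(others) / len(others)
    return counts[day] > factor * baseline


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "events.jsonl"
        start = datetime(2020, 6, 1, tzinfo=timezone.utc)
        for offset in range(7):  # a normal week: one restart per day
            append_log(log_file, "pod_restart", start + timedelta(days=offset))
        for _ in range(5):  # then five restarts on the eighth day
            append_log(log_file, "pod_restart", start + timedelta(days=7, hours=1))
        print(is_unusual(log_file, "pod_restart", (start + timedelta(days=7)).date()))  # True
```

Five restarts in a day means nothing on its own; only against a week of one-per-day history does it stand out as worth investigating.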

All three of these components of a logging plan lead to actionable logs: logs you can actually use to understand what happened, why it happened, and how you can act to handle the situation. If you can answer who, what, where, when, and why from your logs, you probably have actionable logs and are well on your way to meeting or exceeding logging best practices.