Logging Best Practices, Part 3: Text-Based Logs and Structured Logs
7.24.20
Isn't all logging pretty much the same? Logs appear by default, like magic, without any further intervention by teams other than simply starting a system… right?
While logging may seem like simple magic, there's a lot to consider. Logs don't just automatically appear for all levels of your architecture, and any logs that do automatically appear probably don't have all of the details that you need to successfully understand what a system is doing. We've talked about actionable logs, log levels, logs for all components and needs, and different methodologies for generating logs in parts 1 and 2 of this Logging Best Practices series. Now let's explore best practices specifically for text-based and structured logging.
There are two general methodologies for logging in modern architectures: text-based logging and structured logging. Before we can even compare the two techniques, we should define what each term means.
What Are They?
Text-based logs are a series of text messages, one for each logged event, each with a timestamp. Generally, these messages are strings intended to be human-readable. The log message is often built from a template with one or more variables inserted into the string so that specific data surfaces for the user to examine. That practice can make the logs appear fairly uniform. However, a machine that parses the message with pattern-matching rules is easily tricked when a variable's value happens to contain text that resembles part of the message template, which leads to misparsed logs. For this reason, we typically say humans are the primary users of text-based logs.
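To make the parsing pitfall concrete, here is a minimal Python sketch. The message template, the field names, and the regular expression are all hypothetical and chosen only for illustration; they are not taken from any particular logging library.

```python
import re

# Hypothetical message template; the wording and field names are assumptions.
TEMPLATE = "{ts} {level} user {user} performed action {action}"

# A rule-based parser that mirrors the template above.
LINE_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<level>\S+) user (?P<user>.+?) performed action (?P<action>.+)$"
)


def format_line(ts, level, user, action):
    """Build a human-readable log line from the template."""
    return TEMPLATE.format(ts=ts, level=level, user=user, action=action)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


# A well-behaved value parses as intended.
normal = parse_line(format_line("2020-07-24T12:00:00Z", "INFO", "alice", "login"))

# A value that happens to contain template wording shifts the field boundaries.
tricky = parse_line(
    format_line("2020-07-24T12:00:01Z", "INFO", "bob performed action delete", "login")
)

print(normal["user"], "|", normal["action"])
print(tricky["user"], "|", tricky["action"])
```

In the second case the parser reports the user as "bob" and the action as "delete performed action login", even though a human reading the line could probably work out what happened.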
Structured logs, on the other hand, are a series of data-enriched objects, one object for each logged event. You can think of them as sets of metadata about a specific event, always including a timestamp and typically also including information on the system, platform, application, and various functions. Examples of structured log formats are JSON and XML; JSON is the de facto standard simply because it is so widespread. Unlike the data in text-based logs, every piece of data in a structured log is tagged with a field name and is therefore easily discernible for a machine. While nearly every structured logging design includes a human-readable field to help with troubleshooting, humans are no longer the primary audience for logs with this methodology. Instead, we intend for machines to use the logs, and then humans are the audience for the resulting output.
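As a rough illustration, the sketch below emits one JSON object per event using only Python's standard json module. The helper name make_structured_log and the field names (timestamp, level, message, user, action, service) are assumptions made for this example, not a standard schema.

```python
import json


def make_structured_log(timestamp, level, message, **fields):
    """Serialize one log event as a single JSON object (one line per event).

    The core field names used here are illustrative, not a standard.
    """
    event = {"timestamp": timestamp, "level": level, "message": message}
    event.update(fields)  # extra metadata such as service, user, or action
    return json.dumps(event, sort_keys=True)


line = make_structured_log(
    "2020-07-24T12:00:01Z",
    "INFO",
    "user performed action",  # human-readable field kept for troubleshooting
    service="auth",
    user="bob performed action delete",  # awkward value, still unambiguous
    action="login",
)

# A machine reads each value back by its field name, with no guessing.
event = json.loads(line)
print(event["user"])
print(event["action"])
```

Because each value travels under its own key, a value that contains message-like wording cannot be confused with another field.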
How Are They Used?
In general, structured logs are preferred over text-based logs when you need to perform automated functions like alerting, monitoring, and analytics or large data-crunching jobs like searching. Structured logs are easier for a machine to parse and act upon, and that ease in parsing leads to faster search times and cleaner output. You'll also get a lot more consistency across multiple platforms. Remember: with structured logs, machines are now the primary users, not humans. You're not using the logs directly, just the machine-parsed results.
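A small sketch of the kind of automated processing described above: counting error events per service from JSON log lines, which is the sort of summary an alerting rule might act on. The field names level and service, and the sample lines, are hypothetical.

```python
import json
from collections import Counter


def count_errors_by_service(lines):
    """Count ERROR-level events per service from JSON-formatted log lines."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("level") == "ERROR":
            # Events without a service field are grouped under "unknown".
            counts[event.get("service", "unknown")] += 1
    return counts


# Hypothetical sample data, one JSON object per line.
sample_lines = [
    '{"timestamp": "2020-07-24T12:00:00Z", "level": "INFO", "service": "auth"}',
    '{"timestamp": "2020-07-24T12:00:01Z", "level": "ERROR", "service": "auth"}',
    '{"timestamp": "2020-07-24T12:00:02Z", "level": "ERROR", "service": "billing"}',
    '{"timestamp": "2020-07-24T12:00:03Z", "level": "ERROR", "service": "auth"}',
]

error_counts = count_errors_by_service(sample_lines)
print(dict(error_counts))
```

The human in this workflow reads the resulting counts, not the raw log lines.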
There are best practices to consider for each methodology. Most of the best practices are good common sense, but there are things to consider based on the intended audience and overall use of each type of log. We'll explore those best practices next.