Mezmo Best Practices: Live Long and Log
8.25.20
We examined best practices for logging in a prior series. But how can you apply those best practices in real life? Let's dive into how you could use Mezmo, formerly known as LogDNA, in an opinionated way to put those practices to work and bring value to your DevOps-focused projects.
How can we ensure we follow best practices and keep our logs secure and compliant as noted in the previous series? Let’s pretend we're setting up centralized log management with Mezmo for a new team and project. We want to ensure we have actionable logs that developers, sysadmins, SREs, security analysts, and others can search and use to build, operate, and secure systems up and down the stack. Let's explore how we can do that.
Why
By gathering all of your logs into a centralized log management system, you can ensure that all of the information, details, and history for any log line (the context) is available when you need it. You also get a more secure logging approach. By its very nature, a centralized log management system like Mezmo is append-only: there's no way for a user's system to delete or tamper with prior log lines. Ideally, you'd also turn on archiving (if you need it) to send an archive of your logs to a read-only location separate from your main infrastructure. You still need to secure your logs in transit; that's where encryption comes in (Mezmo encrypts your archives).
Setting the Stage
You've decided to use a centralized log management system. Remember how we said we needed to separate our concerns? You can use organizations within Mezmo to help ensure staging logs are separated from production logs, for example. Let's say you have four environments (dev, test, stage, and prod) with an automated, gated release process to promote changes from one environment to another [1]. If you were to get audited, the auditors will likely want to see just one set of logs at a time. Or maybe you want your development team to have access to dev, but they are restricted from accessing any prod data for compliance reasons. A Mezmo organization for each environment ensures that, no matter what happens, nobody accidentally gets access to another environment's log data.
Separate organizations by environment also ensure you can set different retention plans and archiving needs for each environment. You don't need to retain logs for dev environments nearly as long as logs for a prod environment. In fact, you may only want live tail of your logs in dev whereas you probably want to retain and archive your logs in prod to stay in compliance and aid any forensic work for troubleshooting or security needs. You can do that in Mezmo by maintaining a separate organization for each environment. This method is a great way to manage the cost of your logging system, too, since each organization has a separate billing plan. You can use a barebones plan for a dev environment that just needs the basics and a full-fledged plan for prod that needs all the bells and whistles. Pretty helpful! But wait, there's more!
With separate organizations comes the ability to fine-tune access controls by environment, as well, with standard role-based access control (RBAC). Devs may need administrative access to a dev environment's logging system, for example, but you don't want to provide anything but read access to prod's logging. Set up access control for every organization when you create it. You'll be happy you did.
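To make that concrete, it can help to write the per-environment decisions down before you create the organizations. The sketch below is only a planning aid in Python, not a Mezmo API or config format: the EnvPolicy class, the retention numbers, the team names, and the role names are all hypothetical placeholders to swap for your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvPolicy:
    """Hypothetical planning record for one environment's organization."""
    retention_days: int  # 0 means live tail only, nothing retained
    archive: bool        # whether logs are archived to a separate location
    roles: dict          # team name -> role in that organization


# Illustrative values only; pick numbers that match your compliance needs.
POLICIES = {
    "dev": EnvPolicy(0, False, {"developers": "admin"}),
    "test": EnvPolicy(7, False, {"developers": "member"}),
    "stage": EnvPolicy(14, False, {"developers": "read-only"}),
    "prod": EnvPolicy(30, True, {"developers": "read-only", "sre": "admin"}),
}


def role_for(env, team):
    """Return the team's role in an environment, or None for no access."""
    return POLICIES[env].roles.get(team)
```

A table like this makes it easy to review access with your security team: developers are admins in dev but read-only in prod, and only prod pays for long retention and archiving.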
Ship It
So, you've set up your Mezmo environments and organizations. Now it's time to send logs. There are a lot of options for log ingestion with Mezmo, from the Mezmo Agent for different platforms and operating systems, to syslog, to our code libraries and API. Having options to cover many use cases makes it more likely that anyone adding projects to an ecosystem will send logs to the centralized platform. The Mezmo code libraries are one way to ensure your application logging is properly structured, since importing a logging library into your code helps standardize your log format. They're also a great way to integrate with Mezmo without needing to manage additional infrastructure, so if you don't control your infrastructure enough to configure the platform agents, you can at least own the dependency in a familiar way.
Along with knowing how you're sending logs, you need to understand whether you're sending text-based logs or structured logs. We already said that structured logs are better for centralized log management systems, as you can do more automation, get better search results, and otherwise get more use out of your logs. Of course, you may still choose to use text-based logs for any number of reasons. Have you considered how you'll ensure that you can handle that data quickly? Text-based logs can often be hard to parse; many a development or operations IC hears the word "regex" and dreads needing to own the relevant ticket. If you use tokenizable strings, however, you could use Mezmo's custom parsing system to avoid the use of regex altogether.
Let's back up. When passing any type of logging message as a string to Mezmo, our system will attempt to parse the message based on common standards for logging messages. As a result, you will end up with a pseudo-structured log in our UI as a starting point. Sometimes, if your message is almost-but-not-quite standardized, the automated workers may parse the logs a bit differently than you intended. The Mezmo UI, however, includes a system called custom parsing. In that system, you can build out your own parsing template and test it against data that you've sent to Mezmo. So, if you've followed the best practice of tokenizable strings, you will have a much easier time building out a parsing template for any strings that aren't being parsed as you expected them to be.
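To see why tokenizable strings are so much friendlier than free-form text, here is a minimal Python sketch that splits a key=value log line into fields without a single regex. It is not how Mezmo's parser or custom parsing templates work; the sample line, its field names, and the parse_tokens helper are made up for illustration.

```python
import shlex


def parse_tokens(line):
    """Split a key=value log line into a dict, honoring quoted values.

    shlex.split keeps quoted text such as msg="payment declined" together
    as one token. It raises ValueError on an unbalanced quote, so this
    sketch assumes well-formed lines.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# Hypothetical log line written as tokenizable key=value pairs.
SAMPLE = 'ts=2020-08-25T11:00:00Z level=error service=checkout msg="payment declined" order_id=1234'
```

Calling parse_tokens(SAMPLE) gives you a dictionary with ts, level, service, msg, and order_id keys. A line with that shape is just as easy to describe in a parsing template as it is to split in code.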
Of course, if you send JSON structured logs to Mezmo, you don't have to think about that potential extra step.
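If you want a feel for what that looks like in an application, here is a small sketch that emits one JSON object per log line using only Python's standard logging module. It does not use the Mezmo code libraries; the JsonFormatter class, the build_logger helper, the "context" key, and the field names are assumptions for illustration, and you would still need an agent or library to ship the output.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # human-readable string
        }
        # Merge optional structured fields passed via extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.error("payment declined", extra={"context": {"order_id": 1234}})
```

Each line keeps a human-readable message next to machine-friendly fields, so the same entry works for search and for automation.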
Now What
Once we get logs flowing into Mezmo, we can start thinking about some of the general best practices. Want to ensure no system is sending unnecessary data like random artifacts to logs? Run an initial search for that with some filters and then create a view. From a view, you can create a related alert. Are there certain teams that only want to see ERR or higher level logs? An admin can create a filtered view and then share that view with those teams. Because you can tune a view to a team's needs, you reduce the temptation to turn off logging because it is too noisy, as long as everyone follows the best practice of using log levels wisely. Remember the suggestion to get a company- or organization-wide agreement on which log levels to use when? This example is why that's so important.
With the logs in Mezmo, you can set up presence and absence alerting, too, which meets the process alerting practice from the security and compliance best practices article. Not only can you get alerts based on the presence of things like personally identifiable information (PII) patterns in your data, but you can also get alerts when systems stop sending logs of any type that you can filter on. Alerts can go to your chat system, paging system, or email client of choice through integrations or a webhook, so go wild! Get alerted for everything you need, however you want to get those alerts. Those alerts are often a first line of awareness when something goes down, gets attacked, or otherwise has something wrong.
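Here is a quick Python sketch of why that agreement matters. It is not Mezmo's view or filter syntax; the level names, their ranking, and the at_least helper are assumptions for illustration. An "ERR or higher" filter only works when every team uses the same names in the same order.

```python
# Hypothetical organization-wide agreement: level name -> severity rank.
LEVEL_RANK = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERR": 40, "CRIT": 50}


def at_least(records, minimum="ERR"):
    """Keep records whose level is at or above the minimum severity.

    Records with a missing or unknown level rank as 0 and are dropped,
    which is exactly the failure mode you hit when teams invent their
    own level names.
    """
    floor = LEVEL_RANK[minimum]
    return [
        record
        for record in records
        if LEVEL_RANK.get(str(record.get("level", "")).upper(), 0) >= floor
    ]
```

Notice that a record logged with a home-grown level such as "oops" silently falls out of the filtered results, even if it describes a real failure.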
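If the difference between the two alert types feels abstract, this Python sketch shows the idea behind each. It is a conceptual illustration and not how Mezmo implements alerts: the simplified email pattern stands in for a PII check, and the function names and the five-minute silence threshold are assumptions.

```python
import re
from datetime import datetime, timedelta

# Simplified email pattern as a stand-in for a PII check (illustrative only).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def presence_alert(lines):
    """Presence: return the lines that contain something email-like."""
    return [line for line in lines if EMAIL.search(line)]


def absence_alert(last_seen, now, max_silence=timedelta(minutes=5)):
    """Absence: return sources that have been silent longer than allowed.

    last_seen maps a source name to the datetime of its latest log line.
    """
    return sorted(
        source
        for source, seen_at in last_seen.items()
        if now - seen_at > max_silence
    )
```

Presence fires on what shows up in the data; absence fires on what stops showing up. You want both, because a crashed service is usually a quiet one.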
Speaking of troubleshooting or debugging, that search feature becomes very important when you need to get data immediately with a minimum of fuss. Let's say you know the time something happened and want to get context. If you have retention turned on, you can either jump to a specific time within your retention window with a simple search like yesterday at 11a (note that’s local time) or you can use the timeline to highlight a section to jump to. Not completely sure of the time, but know part of the log you need? That extra human-readable string in a structured log or the simple language in a text-based log comes into play here as you can use natural-language search syntax to find that snippet anywhere in your live tail or your retained logs. Pretty helpful when you have a manager, product owner, or other user panicking about something not working and needing it back up ASAP.
Live Long and Log
All in all, using a platform like Mezmo to centralize your logs in one place is a smart log management practice. You can handle access, compliance, and security concerns; gather all of your data in one place to manage application ecosystems large and small with many options for ingestion; keep a watchful eye on logging best practices; and get what you need, when you need it, as fast as you need it. In the end, the biggest thing to remember is this: Always check your logs.