Logging Fundamentals 1
4.10.19
When you work inside a company that lives and breathes logging, observability, and DevOps intelligence, it sometimes takes a moment to step back and explain what we do to friends and family.
The simplest way to explain what LogDNA does for companies with IT systems and software is to compare it to the black box on a plane, which keeps a record of the flight data and cockpit voice recordings. If something goes wrong or a system fails, the logs are readily accessible and quickly searchable, so problems can be rectified.
Beginners Series
In Logging Fundamentals 1, we will introduce the basics of logs including what they are, how they are created, and the role they play in developing and maintaining systems and applications. In this lesson, you will learn:
- How logs are created, how they are structured, and the kinds of data they store
- The role that logs play in maintaining, troubleshooting, and securing your infrastructure
- The risks of not recording or retaining logs
Lesson 1: What are Logs?
A log is a timestamped record of an event that took place on a computer or device. A log can be related to almost any component running on a system, such as the operating system, services, and applications. Logs store detailed information about specific events, including:
- A description of the event
- The date and time that it occurred
- Contextual information, such as the component, user(s), or process(es) that caused the event
Logs provide a significant amount of information about how your infrastructure is operating. They can provide diagnostic information, configuration details, performance measurements, and any errors that occurred.
What is a Log File?
A log file is a collection of logs stored on a device. Log files provide a persistent chronological record of logs, making them effective tools for analyzing, troubleshooting, and optimizing your systems and applications.
The process of writing logs to a log file is called logging. Software applications and services typically create their own log file(s) to avoid mixing logs with other applications. Most log files also store their logs in plain text so you can easily review the log history. Over time, this can lead to log files taking up a significant amount of disk space, but the benefit is the ability to quickly view application-specific logs in a single file.
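To make this concrete, here is a minimal sketch of an application writing its own plain-text log file using Python's standard logging module. The application name (myapp), the file location, and the log messages are all hypothetical and chosen only for illustration.

```python
import logging
import os
import tempfile

# Hypothetical application name and log location, chosen only for this example.
log_path = os.path.join(tempfile.mkdtemp(), "myapp.log")

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this example's records out of the root logger

# FileHandler appends each record to the file as one line of plain text.
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

logger.info("Service started")
logger.warning("Disk usage above 80 percent")

logger.removeHandler(handler)
handler.close()

# Because the file is plain text, reviewing the history is just reading it.
with open(log_path, encoding="utf-8") as log_file:
    lines = log_file.read().splitlines()

for line in lines:
    print(line)
```

Each call produces one timestamped line in the file, so the log history for this application stays in a single place.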
Modern operating systems also include services that aggregate logs from other applications and services in a single location. For example, syslog is both a logging service and a log format found on most Linux systems. Applications can send their logs to a syslog server, which formats each log into a uniform structure while preserving the log's original contents, then writes it to a common destination such as a file or another syslog server.
For example, the following log was generated by the systemd service on an Ubuntu server:
Mar 22 14:56:21 node1 systemd[1]: Starting Cleanup of Temporary Directories...
Let's break this log down into its individual fields:
- Mar 22 14:56:21: the date and time of the event
- node1: the name of the device that this event occurred on
- systemd[1]: the name (and ID) of the process that created the log
- Starting Cleanup of Temporary Directories...: the event's actual message
The syslog format includes additional contextual information about each log, such as its importance (severity level) and category (facility). In a future lesson, we'll explain these fields in more detail and how to define a custom log format.
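The field breakdown above can also be done programmatically. The following is a rough sketch in Python that assumes every line follows the same layout as our example; the pattern and the parse_syslog_line helper are our own illustration, not part of syslog itself, and real-world logs may need a more forgiving pattern.

```python
import re

# Pattern for the line layout shown above:
# "<timestamp> <host> <process>[<pid>]: <message>"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return the line's fields as a dict, or None if it does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


sample = "Mar 22 14:56:21 node1 systemd[1]: Starting Cleanup of Temporary Directories..."
fields = parse_syslog_line(sample)

for name, value in fields.items():
    print(f"{name}: {value}")
```

Once each log is split into named fields, you can filter or group logs by host, process, or time instead of reading them line by line.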
Lesson 2: Why Should I Log?
Logs provide a wealth of information about operations, security incidents, user activity, errors and warnings, and countless other events. They are one of your most important assets when it comes to troubleshooting problems, identifying changes and trends, and detecting suspicious or anomalous activity across your infrastructure.
Troubleshooting and Root Cause Analysis
Eventually, a problem will occur within your infrastructure. When something does go wrong, you need to be able to determine:
- What the problem is
- Where it occurred
- How to fix it
- How to prevent it from recurring
Logs can provide all of this information. For one, logs often contain a description of the event along with the name of the host, process, or application that the problem occurred in. Application logging frameworks such as Log4Net (.NET), Winston (Node.js), and Log4j (Java) can provide even more granular information, down to the line of code that caused the error. This saves you from having to track down the source of the error, while also providing a complete history of events leading up to the error. Having this amount of contextual information can help you identify the true source of the problem and implement a more effective solution.
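As a small illustration of this level of detail, the sketch below uses Python's standard logging module (standing in for the frameworks named above) to record the file, line number, and function for each log, plus a full traceback when an error occurs. The orders logger and the average_order_value function are hypothetical.

```python
import io
import logging

# Capture output in memory so the example is self-contained;
# a real application would log to a file or a log service.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s %(filename)s:%(lineno)d %(funcName)s: %(message)s")
)

logger = logging.getLogger("orders")  # hypothetical component name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)


def average_order_value(total, order_count):
    """Hypothetical function that fails when order_count is zero."""
    try:
        return total / order_count
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the traceback,
        # which points at the line that raised the error.
        logger.exception("Could not compute average for %d orders", order_count)
        return None


result = average_order_value(100.0, 0)
output = stream.getvalue()
print(output)
```

The resulting log names the function and line where the error was logged, and the traceback shows where the exception was raised.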
In addition, each log file tells an ongoing story about the activities that took place leading up to the problem. With enough log data, you can follow the trail of an error over the course of an hour, a day, a week, or even longer. You can also use this long-term operational data to set baseline expectations for how your systems behave. This can help you catch sudden changes and deviations faster.
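One simple way to turn log history into a baseline is to count errors per hour and flag any hour that falls far outside the usual range. The sketch below is only one possible approach: the log line format, the sample counts, and the three-standard-deviation threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def errors_per_hour(lines):
    """Count lines containing ' ERROR ' per hour, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        if " ERROR " in line:
            counts[line[:13]] += 1  # first 13 characters: date plus hour
    return counts


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it exceeds the historical mean by more than
    `threshold` sample standard deviations."""
    return current > mean(history) + threshold * stdev(history)


# Hypothetical log lines in an assumed "YYYY-MM-DD HH:MM:SS LEVEL message" format.
sample_lines = [
    "2019-04-10 14:03:22 ERROR Database connection timed out",
    "2019-04-10 14:17:45 INFO Retrying connection",
    "2019-04-10 14:41:09 ERROR Database connection timed out",
    "2019-04-10 15:02:30 ERROR Payment service unavailable",
]
hourly = errors_per_hour(sample_lines)

# Hypothetical error counts from previous hours, used as the baseline.
history = [4, 6, 5, 7, 5, 6, 4, 5]
print(is_anomalous(history, 6))
print(is_anomalous(history, 21))
```

A quiet hour with 6 errors sits within the baseline, while a sudden jump to 21 stands out as a deviation worth investigating.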
Centralization
Modern IT infrastructures are becoming increasingly distributed, especially as more teams adopt cloud computing. When you have applications running in different data centers or on completely different platforms, overseeing them can quickly become a problem.
Fortunately, logs are portable. Not only can you copy log files between hosts, but network-capable logging services such as syslog can send logs between hosts. A common strategy is to send logs from all of your infrastructure components to a dedicated log host where they can be viewed and managed from a single location. In essence, this is how log management solutions like LogDNA work. As software architectures become increasingly distributed and abstracted, the need for centralized logging will also increase.
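Here is a minimal sketch of sending a log over the network, using the SysLogHandler class from Python's standard library. To keep the example self-contained, a UDP socket on localhost stands in for the dedicated log host; in practice you would point the handler at your syslog server's address. The webapp logger name and the message are hypothetical.

```python
import logging
import logging.handlers
import socket

# Stand-in for a central log host: a UDP socket on localhost with an
# OS-assigned port. A real deployment would use a syslog server instead.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
log_host = receiver.getsockname()

# SysLogHandler sends each record as a syslog datagram (UDP by default).
handler = logging.handlers.SysLogHandler(address=log_host)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))

logger = logging.getLogger("webapp")  # hypothetical application name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.warning("Cache miss rate above threshold")

# The "central host" receives the message over the network.
datagram, sender = receiver.recvfrom(4096)
received = datagram.decode("utf-8").rstrip("\x00")
print(received)

logger.removeHandler(handler)
handler.close()
receiver.close()
```

The number in angle brackets at the start of the received message is the syslog priority, which encodes the log's facility and severity.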
Persistence
Another challenge of modern infrastructures is ephemerality. Platforms like AWS Lambda and Kubernetes are designed for applications that only run for as long as necessary to complete their task. Once they are done, the platform deletes the entire application instance. Lambda functions last anywhere from a few seconds to 15 minutes at most, making manual oversight nearly impossible.
Logs allow you to record the execution flow of ephemeral workloads so you can audit them even after they've been deleted.
Lesson 3: What are the Risks of Not Logging?
As we discussed in the previous lesson, Why Should I Log, logs can help you monitor your systems, troubleshoot applications, and keep a record of your applications' executions. However, logs also play an important role in security, auditing, and compliance. Not keeping comprehensive logs can lead to the following problems.
Limited Security Oversight
Logs are often the first step in auditing security incidents. Logging is such a crucial part of application security that it has become a widely accepted best practice among application developers. OWASP, a non-profit organization that promotes secure application development, lists poor logging practices as one of its top 10 most critical web application security risks.
The consequences are real for all organizations. In a study by F5 Networks on data breaches, applications were the initial target for 53% of all data breaches between 2005 and 2017. This accounted for 47% of the nearly $3.29 billion in damages caused to organizations around the world.
Logging won't prevent these attacks, but it will help you detect and respond to them in a timely manner. Even basic security logs can help you track:
- User-driven events such as logins and administrator actions
- Security alerts and errors generated by applications
- Operational logs that could identify application and system vulnerabilities
No Audit Trail
Logs are a historical record of events trailing from this moment back to the beginning of the log file. In general, logs are also treated as immutable: once an entry has been written, it isn't supposed to be changed. These two attributes of log files make them critical resources when auditing your systems.
For example, consider a log file that tracks all user-related security events on a computer. The file records login attempts, login successes, logouts, and login failures. Each log includes the name of the relevant user account, the host that the attempt occurred on, and a timestamp. With just a quick look through the logs, you can immediately determine:
- Which users are currently logged in
- The last time that a specific user logged in, and how long their session was active
- How many failed login attempts occurred
- How often a user forgets their password (indicated by the number of failed logins)
A common example is detecting login attacks. Attackers often try to gain access to systems by randomly guessing different username and password combinations. On Debian- and Ubuntu-based Linux systems, these types of events are stored in the /var/log/auth.log file. For example, these messages were generated after an attacker tried logging into an SSH server with the username admin:
Mar 27 15:09:10 client sshd[10800]: Invalid user admin from [attacker IP address] port 33239
Mar 27 15:09:10 client sshd[10800]: input_userauth_request: invalid user admin [preauth]
Mar 27 15:09:11 client sshd[10800]: Connection closed by [attacker IP address] port 33239 [preauth]
These three messages tell us when the event occurred, where the attack originated, and what the result was (fortunately, the attacker was denied access). By comparing them with other entries in the file, we can also tell whether this was a one-off or a recurring attack. We can use this information to develop strategies such as blocking the attacker's IP address, or stopping the SSH service if we're not using it.
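To spot recurring attempts, you can count the "Invalid user" entries per source address. The sketch below assumes entries in the same format as the messages above. The sample entries are hypothetical, the IP addresses come from ranges reserved for documentation, and the cutoff of two attempts is an arbitrary choice for illustration.

```python
import re
from collections import Counter

# Matches sshd "Invalid user" entries in the format shown above.
INVALID_USER = re.compile(
    r"sshd\[\d+\]: Invalid user (?P<user>\S+) from (?P<source>\S+) port \d+"
)


def failed_attempts_by_source(lines):
    """Count 'Invalid user' entries per source address."""
    counts = Counter()
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            counts[match.group("source")] += 1
    return counts


# Hypothetical entries; the addresses are reserved for documentation (RFC 5737).
sample_log = [
    "Mar 27 15:09:10 client sshd[10800]: Invalid user admin from 203.0.113.7 port 33239",
    "Mar 27 15:09:10 client sshd[10800]: input_userauth_request: invalid user admin [preauth]",
    "Mar 27 15:09:11 client sshd[10800]: Connection closed by 203.0.113.7 port 33239 [preauth]",
    "Mar 27 15:09:14 client sshd[10812]: Invalid user root1 from 203.0.113.7 port 33241",
    "Mar 27 15:10:02 client sshd[10830]: Invalid user test from 198.51.100.23 port 40112",
]

attempts = failed_attempts_by_source(sample_log)
# Arbitrary cutoff for this example: two or more attempts counts as recurring.
repeat_sources = [source for source, count in attempts.items() if count >= 2]
print(repeat_sources)
```

Sources that show up repeatedly are good candidates for the blocking strategy described above.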
Compliance
Certain laws, regulations, and industry standards, such as PCI DSS and HIPAA, have strict requirements for logging electronic systems. Your organization might be required to log certain kinds of information, retain these logs for a certain amount of time, and periodically audit these logs to ensure that your systems remain in compliance. This can include:
- All user access to system components (PCI DSS Requirement 10.1)
- Any actions performed by administrator users (European Banking Authority PSD2 Guidelines 5.1 and 5.3)
- Access to components that provide core business functionality (PSD2 Guideline 4.10)
Not complying with certain regulations can result in hefty fines and even criminal penalties. At the same time, different regulations will have different requirements for logging. If you or your organization are bound by certain regulations, you should check their requirements before developing a logging strategy.