What is Structured Logging?
Learning Objectives
- Understand why you need structured logging
- Learn examples of data necessary for log analysis
- Explore common log structures
As humans, we understand sentences and words based on their grammar and spelling. Machines also need consistently formatted input to read and act on instructions. Errors in the way input is structured can cause bugs or unexpected behavior, in the same way that humans can’t always parse an incorrectly formatted sentence. Structured logging is the practice of writing log messages (e.g., application errors or server messages) in a consistent format that analytics applications can read and parse to provide visualized output to the reader. This reader could be a server administrator, security analyst, engineer, or any other individual responsible for making decisions based on the parsed data.
As an example, the following image is a structured log record:
Why use structured logging?
Without structured logging, organizations would not be able to use a standard analytics tool to view data. They would need to build their own solution or be limited to applications that can read non-standard structures, and a custom-built solution carries its own risk of failure. Formatted data in log files gives organizations the ability to choose any security information and event management (SIEM) or log aggregation tool that supports the stored structure. Many of these tools support standard structures out of the box, so no additional work is needed to ingest the logs.
There are three major issues with unstructured event logs:
- Non-standard formatting requires a customized solution to read and parse the logged events.
- Should administrators need to read the raw data files, the events can be difficult to interpret if the structure is unknown.
- Should any other applications need to consume the logged data, they would not be able to do so without additional parsing support.
Structured logs also give developers a way to write customized applications using open-source third-party libraries that already support a standard structure. This saves development time because developers do not need to build a tool from scratch to consume the data.
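The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the unstructured line, its layout, and the regular expression are all hypothetical, invented only to show how much custom work a non-standard format needs compared with a JSON line.

```python
import json
import re

# Hypothetical unstructured line: its layout is known only to whoever wrote it.
raw_line = (
    "2021-02-20 11:20:30 web-server1 login failed for "
    "customer-43683 (incorrect password)"
)

# A custom pattern is required, and it breaks as soon as the wording changes.
pattern = re.compile(
    r"^(?P<time>\S+ \S+) (?P<host>\S+) login failed for "
    r"(?P<user>\S+) \((?P<reason>[^)]*)\)$"
)
match = pattern.match(raw_line)
unstructured_event = match.groupdict() if match else None

# The same event as a structured (JSON) line: any JSON parser can read it.
json_line = (
    '{"EventTime": "2021-02-20 11:20:30", "Hostname": "web-server1", '
    '"User": "customer-43683", "Reason": "incorrect password"}'
)
structured_event = json.loads(json_line)

print(unstructured_event["user"])
print(structured_event["User"])
```

Both paths recover the user, but only the first depends on a pattern that must be rewritten whenever the message text changes.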
What data is included in structured logs?
Logs are only as useful as the information that they contain. The information included in an event is what feeds dashboards, graphs, charts, algorithmic analysis, and any other output used to determine the health of the environment. Structured logs also make searching for specific events more efficient. Because log analysis tools parse each field for you, a structured log can carry any number of data points in a single event.
Some examples of information that can be included in an event:
- The date and time the event happened
- The type or severity of the event (e.g., informational, warning, error, or critical)
- The location of the triggered event (e.g., an API endpoint or running application)
- A description of the event (e.g., a credit card failure could be logged to detect potential fraud)
- A unique event ID
- The customer ID or username
- Protocol used to access the application
- The port used to execute a function
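As a rough sketch of how an application might emit events carrying the fields listed above, the following uses Python's standard `logging` module with a small custom formatter. The class name `JsonFormatter`, the helper `make_logger`, and the convention of passing extra data under a `fields` key are assumptions made for this illustration, not part of any particular logging product.

```python
import io
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "EventTime": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "Level": record.levelname,
            "SourceName": record.name,
            "Description": record.getMessage(),
            "EventID": str(uuid.uuid4()),
        }
        # Extra data points arrive via extra={"fields": {...}};
        # this key name is a convention assumed for the sketch.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON events to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Usage: write one event to an in-memory buffer and read it back.
buffer = io.StringIO()
logger = make_logger("ecommerce-application", buffer)
logger.warning(
    "credit card declined",
    extra={"fields": {"User": "customer-43683", "Protocol": "HTTP", "SourcePort": 80}},
)
event = json.loads(buffer.getvalue())
print(event["Level"], event["User"], event["Description"])
```

In a real deployment the stream would typically be a file or a log shipper rather than an in-memory buffer.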
Logged events should contain enough information that any critical errors can be remediated, but not all information should be stored in logs. For example, never store passwords, secrets, keys, or any other sensitive information in logs. Logs should be locked down so that only authorized users can access them, but if an attacker manages to escalate privileges during a cybersecurity event, the logs themselves could be exposed in a data breach, which is why sensitive data should never be stored in them.
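One way to enforce this rule is to scrub events before they are written. The sketch below is illustrative only: the `SENSITIVE_KEYS` deny-list, the `redact` function, and the sample event are hypothetical, and a real application would need a list that matches the fields it actually handles.

```python
import json

# Hypothetical deny-list; adjust it to the fields your application handles.
SENSITIVE_KEYS = {"password", "secret", "apikey", "token", "cardnumber"}


def redact(event: dict) -> dict:
    """Return a copy of the event with sensitive values replaced.

    Nested dictionaries are scrubbed recursively; the input is not modified.
    """
    clean = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean


# Sample event with values that must never reach the log file.
event = {
    "User": "customer-43683",
    "Reason": "incorrect password",
    "Password": "hunter2",
    "Request": {"Token": "abc123", "Path": "/login"},
}
safe_event = redact(event)
print(json.dumps(safe_event))
```

Matching on key names only catches fields that are labeled as sensitive; secrets embedded in free-text descriptions would need a separate check.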
What are examples of structured log events?
The formatted information in structured logs is typically JSON. JSON is a standard format across different operating systems and environments, so it’s easy to share data across multiple platforms (e.g., Linux and Windows). Older structures might use XML formatting. Most logging applications, whether in the cloud or on-premises, use standard structures so that administrators can aggregate logs in a single location and use an application such as a SIEM to parse and read the data. For example, an organization could have five servers collecting log data. These five servers log events that are then aggregated into one location, where a SIEM consumes the events and displays the information in a visualized format for analysts to review.
Most SIEMs and other applications will automatically parse standard structures, but it is still beneficial for organizations and developers to know the structure should they decide to create their own applications that will consume the data.
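For developers who do build their own consumer, the aggregation step described above might look roughly like the following. This is a simplified sketch under stated assumptions: the `server_logs` data, the `aggregate` function, and the use of ISO 8601 timestamps (which sort correctly as plain strings) are all invented for illustration, and a real pipeline would read from files or a message queue.

```python
import json

# Hypothetical JSON-lines output collected from two servers.
server_logs = {
    "web-server1": [
        '{"EventTime": "2021-02-20T11:20:30", "Level": "INFO", "Reason": "incorrect password"}',
        '{"EventTime": "2021-02-20T11:25:02", "Level": "INFO", "Reason": "login succeeded"}',
    ],
    "web-server2": [
        '{"EventTime": "2021-02-20T11:21:10", "Level": "ERROR", "Reason": "database timeout"}',
    ],
}


def aggregate(logs_by_host: dict) -> list:
    """Merge per-server JSON lines into one list ordered by event time."""
    events = []
    for host, lines in logs_by_host.items():
        for line in lines:
            event = json.loads(line)
            # Record which server produced the event if it does not say so.
            event.setdefault("Hostname", host)
            events.append(event)
    # ISO 8601 timestamps in the same time zone sort chronologically as strings.
    events.sort(key=lambda e: e["EventTime"])
    return events


events = aggregate(server_logs)
for e in events:
    print(e["EventTime"], e["Hostname"], e["Level"], e["Reason"])
```

Because every server writes the same structure, the merge needs no per-server parsing logic.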
The following is an example of a JSON log entry:
{
"User": "customer-43683",
"Level": "INFO",
"EventTime": "2021-2-20 11:20:30",
"Hostname": "web-server1",
"SourceName": "ecommerce-application",
"ProcessID": 9485,
"AuthenticationMethod": "Windows",
"Reason": "incorrect password",
"SourceIPAddress": "10.0.1.155",
"SourcePort": 80,
"Protocol": "HTTP"
}
From the JSON entry above, you could infer that it records a failed authentication attempt by a specific customer accessing a web application over HTTP on port 80, possibly one that resulted in a 401 authentication failure. From this entry alone, you cannot tell whether the request was malicious or came from a legitimate user who entered the wrong password. However, numerous events with the same information could indicate malicious intent, which a security analyst could later review.
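Spotting that kind of repetition is straightforward once events are structured. The following is a minimal sketch, assuming events shaped like the JSON entry above; the `suspicious_sources` function, the threshold of 3, and the sample data are hypothetical choices for illustration, not a recommended detection rule.

```python
from collections import Counter

# Hypothetical cutoff; a real rule would also consider a time window.
THRESHOLD = 3


def suspicious_sources(events: list, threshold: int = THRESHOLD) -> dict:
    """Count failed logins per (user, source IP) and keep the repeated ones."""
    failures = Counter(
        (e["User"], e["SourceIPAddress"])
        for e in events
        if e.get("Reason") == "incorrect password"
    )
    return {key: count for key, count in failures.items() if count >= threshold}


# Sample events: three failures from one source, one from another.
events = [
    {"User": "customer-43683", "SourceIPAddress": "10.0.1.155", "Reason": "incorrect password"},
    {"User": "customer-43683", "SourceIPAddress": "10.0.1.155", "Reason": "incorrect password"},
    {"User": "customer-10021", "SourceIPAddress": "10.0.1.201", "Reason": "incorrect password"},
    {"User": "customer-43683", "SourceIPAddress": "10.0.1.155", "Reason": "incorrect password"},
]
flagged = suspicious_sources(events)
print(flagged)
```

The result is a candidate list for an analyst to review, not proof of an attack.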
The above format is JSON, but here is the same event in XML:
<event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <system>
    <provider name="Microsoft-Windows-Security-Auditing"></provider>
    <eventid>54849625</eventid>
    <level>INF</level>
    <timecreated systemtime="2021-2-20 11:20:30"></timecreated>
    <hostname>web-server1</hostname>
  </system>
  <eventdata>
    <data name="User">customer-43683</data>
    <data name="Level">INF</data>
    <data name="SourceName">ecommerce-application</data>
    <data name="ProcessID">9485</data>
    <data name="AuthenticationMethod">Windows</data>
    <data name="Reason">incorrect password</data>
    <data name="SourceIPAddress">10.0.1.155</data>
    <data name="SourcePort">80</data>
    <data name="Protocol">HTTP</data>
  </eventdata>
</event>
As you can see above, the XML log entry contains the same information, but the structure and syntax are different. If an application cannot parse the format it receives, the data is unusable to that application.
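To illustrate why the consumer must know which structure it is reading, the sketch below parses a shortened JSON entry and a shortened XML entry into the same Python dictionary shape using only the standard library. The helper names `from_json` and `from_xml` and the trimmed sample entries are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Namespace taken from the XML entry; the "e" prefix is chosen for this sketch.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

json_entry = (
    '{"User": "customer-43683", "Reason": "incorrect password", "SourcePort": 80}'
)

xml_entry = """
<event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <eventdata>
    <data name="User">customer-43683</data>
    <data name="Reason">incorrect password</data>
    <data name="SourcePort">80</data>
  </eventdata>
</event>
"""


def from_json(text: str) -> dict:
    """Parse a JSON log entry; value types (such as numbers) are preserved."""
    return json.loads(text)


def from_xml(text: str) -> dict:
    """Parse an XML log entry into a dict; every value arrives as a string."""
    root = ET.fromstring(text.strip())
    return {
        data.get("name"): data.text
        for data in root.findall("e:eventdata/e:data", NS)
    }


json_event = from_json(json_entry)
xml_event = from_xml(xml_entry)
print(json_event["User"] == xml_event["User"])
print(type(json_event["SourcePort"]).__name__, type(xml_event["SourcePort"]).__name__)
```

Note that the XML parser returns the port as text while JSON keeps it as a number, so a consumer that accepts both formats has to normalize types itself.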
Any logging solution you choose should support the structure that you need. Before choosing a SIEM, ensure that your application writes logs in the format that you need and that your logging solution can parse that format for efficient analytics and reporting.