Gartner IOCS Conference Recap: Monitoring and Observing Environments with Telemetry Pipelines
12.14.23
Last week, I attended the Gartner IT Infrastructure, Operations & Cloud Strategies Conference (IOCS). Gartner IOCS is my favorite conference every year because of the quality and depth of the presentations. Gartner analysts deliver most sessions and put a lot of effort into the presentations and supporting research.
I’d like to highlight two sessions that I found to be very informative. One was “Use Telemetry Pipelines to Efficiently Monitor Your Hybrid Environments” by Gregg Siegfried, VP Analyst at Gartner. The second was “The Future of Observability” by Mrudula Bangera, Director Analyst at Gartner. Here are some highlights:
Use Telemetry Pipelines to Efficiently Monitor Your Hybrid Environments
Gregg Siegfried started by saying that I&O (Infrastructure and Operations) has long had an image problem: telemetry is treated like wastewater, and we, the plumbers, are left to make sure it’s properly dealt with. Yes, there’s an operational aspect to gathering telemetry data in terms of service reliability and application performance. However, business-critical insights are in the data if you “know where to look.”
Data Engineering Meets Telemetry Pipelines
Data engineering is precisely what I&O needs, and happily, there are tools to help. Gregg indicated that data engineering principles must be applied to infrastructure and operational telemetry data, and telemetry pipelines are the pathway to engineering that data.
Telemetry Pipelines Help Data Engineering
Gregg noted that the arrangement of I&O-specific telemetry pipelines mirrors that of the general-purpose data engineering pipeline. However, there is some specific terminology to know for operational telemetry data.
Gartner defines the functions of a telemetry pipeline as Collect, Transform, Enrich, and Route. Collection is straightforward and may involve a vendor agent or other popular interfaces, including syslog, Fluentd, Fluent Bit, Logstash, OTLP, or Splunk forwarders. Transform manipulates the data into more efficient forms: you can change the structure and format, turn logs into metrics, rename fields, mask fields, sample, filter, normalize, reduce, or apply any of a wide variety of other transformations. The Route function then sends the data to one or more destinations based on the use case.
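To make the Transform stage concrete, here is a minimal Python sketch of a few of those operations. The event shape, field names, and metric name are all hypothetical; real pipelines express these steps through their own configuration language or UI.

```python
# Hypothetical Transform operations applied to a single log event.
def rename_fields(event: dict, mapping: dict) -> dict:
    """Rename keys, e.g. {"Node Name": "host_id"}, leaving other keys as-is."""
    return {mapping.get(k, k): v for k, v in event.items()}

def mask_field(event: dict, key: str) -> dict:
    """Redact a sensitive field while keeping the event structure intact."""
    return {**event, key: "****"} if key in event else event

def log_to_metric(event: dict) -> dict:
    """Derive a compact metric record from a log event (log-to-metric)."""
    return {
        "name": "http_request_duration_ms",
        "value": float(event["duration_ms"]),
        "tags": {"status": event.get("status", "unknown")},
    }

raw = {"Node Name": "web-01", "user_email": "jane@example.com",
       "duration_ms": "182", "status": "200"}
event = mask_field(rename_fields(raw, {"Node Name": "host_id"}), "user_email")
print(event)                 # renamed and masked log event
print(log_to_metric(event))  # metric derived from the same event
```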
What Does Data “Enrichment” Mean?
Enrichment means adding context, sometimes from external sources, to your data in motion. Examples of data enrichment include timestamps, geolocation data, names, IDs, or anything else that can be correlated with the data to help analysis at the destination system. Gregg also categorized data rehydration as an enrichment function.
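Here is a hypothetical Python sketch of an enrichment step. The static geo table stands in for whatever external source you would correlate against, and the field names are invented for illustration.

```python
# Hypothetical enrichment step: add context to events in motion.
from datetime import datetime, timezone

# Stand-in for an external source (a geo or CMDB lookup, for example).
GEO_BY_HOST = {"web-01": {"region": "us-east-1", "city": "Ashburn"}}

def enrich(event: dict) -> dict:
    enriched = dict(event)
    # Add an ingest timestamp if the source did not provide one.
    enriched.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    # Correlate the event with the external table to aid downstream analysis.
    enriched.update(GEO_BY_HOST.get(event.get("host_id"), {}))
    return enriched

print(enrich({"host_id": "web-01", "message": "login ok"}))
```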
Use Cases for Telemetry Pipelines
Gregg described the common use cases for telemetry pipelines.
Cost Control
Some IT organizations send terabytes of telemetry data without understanding what is needed and what can be discarded or placed into object storage. A telemetry pipeline is a natural place to make these decisions before the data incurs a toll by driving up ingest charges on the destination system. For example, a telemetry pipeline can filter, deduplicate, or summarize data, or route it to low-cost object storage.
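A toy Python version of those decisions might look like the following; the log levels, destination names, and naive in-memory dedupe are illustrative only.

```python
# Illustrative cost-control logic: filter, dedupe, and route each event.
import hashlib

seen = set()  # naive in-memory dedupe; real pipelines use windowed state

def route(event: dict):
    # Filter: debug noise rarely justifies per-GB ingest charges.
    if event.get("level") == "DEBUG":
        return None  # drop
    # Dedupe: suppress events we have already forwarded.
    digest = hashlib.sha256(repr(sorted(event.items())).encode()).hexdigest()
    if digest in seen:
        return None
    seen.add(digest)
    # Route: keep errors on the hot path, archive the rest cheaply.
    return "observability_backend" if event["level"] == "ERROR" else "object_storage"

for e in [{"level": "DEBUG", "msg": "tick"},
          {"level": "ERROR", "msg": "db down"},
          {"level": "ERROR", "msg": "db down"},  # duplicate, suppressed
          {"level": "INFO", "msg": "deploy ok"}]:
    print(route(e))
```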
Consolidation From the Edge
Centralized or edge pipelines consolidate and organize data within the user’s environment. With a SaaS control plane, centralized configuration and deployment can significantly improve your ability to scale data optimization. Gregg pointed out that it is not always a requirement to egress data to an external service or SaaS to process it, which can be a huge advantage for organizations concerned with data integrity.
Editorial note: Mezmo introduced Edge Pipelines in October 2023; read Introducing Mezmo Edge: A Secure Approach To Telemetry Data.
Maintaining Unified Taxonomy
Different teams and tools will naturally name and categorize things differently. For example, one team will use the term “Host ID,” and another will use “Node Name.” This simple difference can complicate backend analytics. However, a telemetry pipeline can normalize such differences to improve consistency, clarity, and analysis. In addition, if your telemetry data contains PII or other confidential information, a centralized telemetry pipeline can apply a consistent set of rules for masking or redaction instead of leaving that work to be repeated by every individual team.
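A small Python sketch of both ideas, normalization and centralized redaction, assuming an invented alias map and a deliberately simple email regex:

```python
# Illustrative taxonomy normalization plus one shared PII masking rule.
import re

ALIASES = {"Host ID": "host_id", "Node Name": "host_id", "hostname": "host_id"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # simplistic, for demo only

def normalize(event: dict) -> dict:
    # Map every team's field name onto one agreed key.
    out = {ALIASES.get(k, k): v for k, v in event.items()}
    # Apply one masking rule in the pipeline, not per team.
    return {k: EMAIL_RE.sub("<redacted>", v) if isinstance(v, str) else v
            for k, v in out.items()}

print(normalize({"Host ID": "web-01", "msg": "reset sent to jane@example.com"}))
print(normalize({"Node Name": "web-01", "msg": "ok"}))
```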
OpenTelemetry - Many to Many
It’s not uncommon to have many collectors of telemetry data and multiple observability and analytics tools. For example, you may have many OpenTelemetry collectors. This creates a “many to many” complexity that can be simplified with a telemetry pipeline. The telemetry pipeline can also co-reside with a centralized control plane, easing the management of many distributed or edge-located collectors.
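One way to picture the simplification: rather than every collector knowing about every backend, a single routing table in the pipeline decides where each source’s data goes. The source and destination names below are hypothetical.

```python
# Illustrative fan-out: one routed hop instead of many-to-many wiring.
ROUTES = {
    # (source, signal) -> destinations
    ("otel-collector-eu", "traces"): ["apm_backend"],
    ("otel-collector-us", "traces"): ["apm_backend"],
    ("k8s-agent", "logs"): ["log_backend", "object_storage"],
}

def fan_out(source: str, signal: str, event: dict) -> list[tuple[str, dict]]:
    """One pipeline decides where each (source, signal) pair goes."""
    return [(dest, event) for dest in ROUTES.get((source, signal), [])]

print(fan_out("k8s-agent", "logs", {"msg": "pod restarted"}))
```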
Use Case Summary
Overall, a telemetry pipeline can significantly reduce costs, increase efficiency, and improve collaboration. Gregg used the expression “manage your telemetry, or it will manage you.”
Open-Source Telemetry Pipelines
In addition to surveying various telemetry pipeline vendors, including Mezmo, Gregg described some of the open-source options for telemetry pipelines.
Gregg noted that open-source Vector (acquired by Datadog) is a good option because it is optimized as a telemetry pipeline rather than a generic data-streaming system such as Kafka. Vector is well documented, has a wide selection of sources and sinks, and can be deployed in a highly available manner. If you are staffed to support open source at this scale, Vector is the preferred choice because telemetry pipelining is its core functionality. Also mentioned as open-source options were observIQ’s BindPlane OP, Apache Kafka, and Apache NiFi.
Recommendations
In conclusion, the benefits of telemetry pipelines include cost reduction, analysis simplification, and improved incident response.
The Future of Observability
Mrudula Bangera started by saying that monitoring is dead. That definitely got everybody’s attention! Monitoring fails to provide the context to understand the “whys” behind anomalies, making root-cause identification very difficult. Observability, by contrast, helps with unknown unknowns: the answers you don’t find in your monitoring dashboard.
More than Metrics, Logs, and Traces
So, observability is the future, but it is not delivered by one tool. Observability is delivered through a combination of capabilities, including the ability to analyze metrics, logs, and traces. However, as Mrudula explained, observability should also include telemetry data about APIs, networking, service mesh, and service topology, as well as business context.
Historically, telemetry data was gathered by vendor-specific agents. But increasingly, it is important to consider standards such as OpenTelemetry or eBPF (extended Berkeley Packet Filter) and other open standards. Mrudula explained that the process of instrumenting for telemetry data is becoming more vendor-agnostic.
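As a concrete taste of vendor-agnostic instrumentation, here is a minimal tracing snippet using the OpenTelemetry Python API and SDK. The console exporter stands in for whichever backend you would actually export to (for example, over OTLP); the service and span names are invented.

```python
# Minimal vendor-agnostic tracing with OpenTelemetry for Python.
# Requires the opentelemetry-api and opentelemetry-sdk packages.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Configure a provider; swapping vendors means swapping the exporter,
# not rewriting the instrumentation below.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

def handle_request(order_id: str) -> None:
    # Emit a span for this unit of work, tagged with business context.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("order.id", order_id)

handle_request("A-1001")
```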
Telemetry Data Overload Causes Pain
She also explained that the huge volumes of telemetry data generated by observability agents and collectors drive unpredictable costs. This data can reach petabyte scale, incurring huge expenses for observability solution subscribers.
Observability pricing is typically metered by the quantity of data ingested, and because most users do not control or optimize their telemetry data, costs can spike unexpectedly, leaving subscribers feeling helpless.
Telemetry Pipelines Can Help
Mrudula explained that the problems of telemetry data overload and lack of control can be solved with a telemetry pipeline. A telemetry pipeline can filter the data, derive metrics, and route the telemetry to the appropriate tools and teams, or to low-cost storage, significantly increasing efficiency.
In a slide shown during the Future of Observability presentation, Gartner recommended deploying a telemetry pipeline to reduce costs and stay in control of your data.
About Mezmo
In the Gartner presentations last week, vendors were not the main focus, but I’d like to take a moment to describe Mezmo’s solution.
Mezmo offers a Telemetry Pipeline deployable in your environment and centrally managed, or deployable in the Mezmo Cloud. Unlike competitive solutions, both Edge and Cloud use the same control plane and are managed as a single set of pipelines. Mezmo supports a wide variety of sources, including OTLP, Azure Event Hub, AWS S3, Kafka, Kubernetes, Datadog agent, and Splunk HEC. Processors include the ability to parse data from common sources, dedupe, filter, sample, and transform logs into metrics - just to name a few. Many popular destinations are supported, including Datadog, Splunk, Grafana, or S3 for low-cost storage.
A visual user interface makes Recipe selection and Pipeline configuration easy. If preferred, Pipelines can be created as code and automated using Terraform. All Mezmo functions are accessible via APIs.
The Mezmo workflow starts with Understanding your data and then recommends pre-configured pipeline Recipes that optimize the data for common log patterns.
If you’d like to understand your data better and quickly realize the power of what a telemetry pipeline can do, let’s get in touch!