Transform Logs into Actionable Insights with Mezmo Pipelines & Dashboards
Dive into the key steps for reshaping raw, messy data so it delivers more value to analytics tools, observability platforms, and AI models.
What is Data Transformation?
Data transformation is the process of converting raw data from one format, structure, or value state into another so that it becomes more useful, consistent, and compatible with downstream systems. It is a foundational step in data pipelines, analytics workflows, observability platforms, and AI applications.
Raw data is often messy, inconsistent, and not structured in a way that analytics tools, observability pipelines, or AI models can consume effectively. Transformation makes it standardized, enriched, and optimized — turning raw input into actionable insight.
Four Main Types of Data Transformation
Structural Transformation
Changing the data schema, such as converting rows to columns or JSON to CSV. This includes normalizing or denormalizing data tables and aggregating or pivoting datasets.
Content Transformation
Cleaning data by removing duplicates and fixing errors, standardizing formats, masking or obfuscating sensitive information, and converting codes or labels into human-readable form.
Enrichment
Adding contextual information such as geo lookups, user metadata, or log enrichment, and joining data from multiple sources to create a fuller picture.
Optimization
Filtering irrelevant data, sampling or summarizing to reduce volume, and applying compression or encoding to improve storage efficiency and performance.
Data transformation is the bridge between messy input and actionable insight.
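To make the four types concrete, here is a minimal Python sketch that touches each one. The record fields, lookup table, and masking rule are illustrative assumptions, not taken from any particular tool:

```python
import csv
import io

# Hypothetical raw log records; field names are illustrative only.
records = [
    {"user": "alice", "email": "alice@example.com", "status": "200", "bytes": 512},
    {"user": "alice", "email": "alice@example.com", "status": "200", "bytes": 512},  # duplicate
    {"user": "bob", "email": "bob@example.com", "status": "500", "bytes": 2048},
]

# Content transformation: drop duplicates and mask the sensitive email field.
seen, cleaned = set(), []
for r in records:
    key = tuple(sorted(r.items()))
    if key in seen:
        continue
    seen.add(key)
    cleaned.append({**r, "email": "***masked***"})

# Enrichment: join in contextual metadata (here, a static lookup table).
regions = {"alice": "us-east-1", "bob": "eu-west-1"}
for r in cleaned:
    r["region"] = regions.get(r["user"], "unknown")

# Optimization: filter out records that are irrelevant downstream.
errors_only = [r for r in cleaned if r["status"] != "200"]

# Structural transformation: reshape JSON-like dicts into CSV.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=errors_only[0].keys())
writer.writeheader()
writer.writerows(errors_only)
print(buf.getvalue())
```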
What is the Purpose of Log Transformation in Data Analysis?
Log transformation applies a logarithm function — commonly log base 10, natural log, or log base 2 — to each data point in a dataset. This is distinct from observability logs (system-generated events). Its purpose is mathematical: to normalize distributions, stabilize variance, make multiplicative patterns linear, reduce outlier influence, and improve interpretability in statistical analysis.
Log transformation helps when data is skewed, when variance grows with the mean, or when the relationship between variables is multiplicative rather than additive. It is a standard preprocessing step in data science and machine learning workflows.
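As a brief illustration, assuming NumPy and SciPy are available (the latency values below are made up for demonstration):

```python
import numpy as np
from scipy.stats import skew

# Right-skewed sample values (made-up request latencies in milliseconds).
latencies = np.array([12, 15, 14, 18, 22, 30, 45, 80, 150, 900, 2500])

# log1p computes ln(1 + x), which tolerates zeros; np.log10 or np.log2
# behave similarly when all values are strictly positive.
transformed = np.log1p(latencies)

print("skewness before:", skew(latencies))    # strongly right-skewed
print("skewness after: ", skew(transformed))  # much closer to symmetric
```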
The Basics of Telemetry Data Transformation
Telemetry data — logs, metrics, traces, and events — is often high-volume, noisy, and redundant. Before it reaches your backend systems, transforming it in-flight controls cost, improves performance, and increases usefulness.
Basic Filtering
Filtering means deciding which telemetry data to keep, drop, or route differently before ingestion — so only the signals that matter reach downstream systems.
Well-applied filtering delivers:
- Cost reduction: fewer bytes ingested means lower storage and licensing fees
- Noise reduction: analysts and AI agents see fewer irrelevant signals
- Compliance: sensitive or regulated data can be dropped before it leaves the environment
- Performance: faster queries and dashboards built on cleaner datasets
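A minimal sketch of the idea, written as a Python predicate applied to each event before forwarding; the field names and drop rules are illustrative assumptions, not any product's built-in policy:

```python
# Paths whose events are pure health-check chatter (illustrative).
DROP_PATHS = {"/healthz", "/readyz", "/metrics"}

def keep(event: dict) -> bool:
    """Return True if the event should be forwarded downstream."""
    if event.get("level") == "DEBUG":    # noise reduction
        return False
    if event.get("path") in DROP_PATHS:  # drop health-check chatter
        return False
    if "ssn" in event:                   # compliance: never ship regulated fields
        return False
    return True

events = [
    {"level": "INFO", "path": "/checkout", "msg": "order placed"},
    {"level": "DEBUG", "path": "/checkout", "msg": "cache miss"},
    {"level": "INFO", "path": "/healthz", "msg": "ok"},
]
forwarded = [e for e in events if keep(e)]
print(forwarded)  # only the checkout INFO event survives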
Adding or Deleting Attributes
Telemetry is most useful when it carries the right contextual attributes — fields, tags, or labels. Transformation lets you enrich or trim that context before ingestion.
- Adding attributes enriches telemetry with useful context, increasing its analytical and operational value
- Deleting attributes removes noise, reduces data volume and cost, and limits exposure of sensitive fields
The goal is telemetry that is both rich enough to be useful and lean enough to be efficient.
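A hypothetical sketch of that trade-off in Python; the attribute names in the add and delete sets are illustrative assumptions:

```python
# Illustrative attribute policy.
ADD = {"deployment.environment": "prod", "team": "payments"}
DELETE = {"password", "internal_debug_id"}

def reshape(event: dict) -> dict:
    """Trim sensitive or noisy fields, then attach useful context."""
    out = {k: v for k, v in event.items() if k not in DELETE}
    out.update(ADD)
    return out

raw = {"msg": "login ok", "password": "hunter2", "internal_debug_id": "x9"}
print(reshape(raw))
# {'msg': 'login ok', 'deployment.environment': 'prod', 'team': 'payments'}
```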
Renaming Metrics or Metric Labels
Telemetry comes from many sources, each with its own naming conventions. Left unchecked, this creates inconsistency, duplication, and confusion across teams and tools. Renaming ensures consistency, clarity, and cross-system compatibility — preventing duplication, improving queryability, and making observability data reliably usable.
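One common approach is a rename map applied in-stream. The sketch below is a generic Python illustration; the metric names are assumptions, loosely following OpenTelemetry-style naming:

```python
# Rename map unifying vendor-specific metric names onto one convention.
RENAMES = {
    "mem_used_bytes": "system.memory.usage",
    "memory.used": "system.memory.usage",
    "cpu_pct": "system.cpu.utilization",
}

def normalize(metric: dict) -> dict:
    name = metric["name"]
    return {**metric, "name": RENAMES.get(name, name)}

samples = [
    {"name": "mem_used_bytes", "value": 3.2e9},
    {"name": "memory.used", "value": 2.9e9},
]
print([normalize(m)["name"] for m in samples])
# Both report as 'system.memory.usage', so dashboards can query one name.
```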
Enriching Telemetry with Resource Attributes
Enrichment with resource attributes turns raw logs, metrics, and traces into context-rich signals that make debugging, monitoring, and cost attribution far easier.
Resource attributes are key–value pairs — such as service.name, cloud.region, deployment.environment — that add context about the source system, environment, and infrastructure. Standards like OpenTelemetry Semantic Conventions make them portable and interoperable across tools.
Enriched telemetry can be searched, correlated across services, and attributed to cost centers or teams — transforming raw data into signals that drive faster decisions.
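For example, with the OpenTelemetry Python SDK, resource attributes can be attached once at provider setup so every span inherits them; the attribute values below are illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Resource attributes follow OpenTelemetry Semantic Conventions;
# the values are placeholders.
resource = Resource.create({
    "service.name": "checkout-api",
    "cloud.region": "us-east-1",
    "deployment.environment": "prod",
})

# Every span produced by this provider now carries the resource context,
# so backends can search and correlate by service, region, and environment.
trace.set_tracer_provider(TracerProvider(resource=resource))
```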
Setting a Span Status
A span's status summarizes the outcome of a traced operation. It is distinct from log level, exceptions, or HTTP codes — it is a normalized success/failure signal for distributed traces, set at instrumentation time and optionally refined in the pipeline.
Accurate span status:
- Enables faster triage and root cause identification
- Powers SLO compliance tracking
- Reduces cost by routing only meaningful spans to expensive backends
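A sketch of setting span status at instrumentation time with the OpenTelemetry Python SDK; the operation name and error handling are illustrative:

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def charge_card(order_id: str) -> None:
    with tracer.start_as_current_span("charge_card") as span:
        try:
            ...  # call the payment provider (omitted)
            span.set_status(Status(StatusCode.OK))
        except Exception as exc:
            # Record the failure as a normalized trace signal, independent
            # of log level or HTTP code.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
```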
Mezmo Active Telemetry: The Next Wave of Observability
What is Active Telemetry?
Active Telemetry is Mezmo's approach to observability for the AI era. Traditional observability operates on a "collect everything first, ask questions later" model — generating mountains of noise, driving up platform costs, and feeding AI agents low-quality context that leads to slow troubleshooting and unreliable results.
Active Telemetry flips this model. Instead of passive collection and reactive analysis, it engages with telemetry at the moment it is generated, making intelligent, real-time decisions about what is valuable, how to shape it, and where to send it — before it reaches any downstream system.
The result: up to 99.98% data compression, 70–90% reduction in observability costs, and dramatically faster incident resolution.
The Three Pillars of Active Telemetry
Active Engagement
Developers get the exact telemetry they need, precisely when they need it — whether through their IDE, an MCP integration, or Mezmo's UI. No waiting on a platform team. No digging through irrelevant noise. Developers can pull live, high-fidelity data on demand.
Active Routing
Telemetry is directed with intent. Downstream systems — including AI agents and APM tools — receive only the relevant, contextualized data they need to function effectively. This slashes costs and prevents noise from propagating downstream.
Active Analysis
Intelligent, in-stream decisions about data happen as it flows. Mezmo identifies what is valuable, maintains the right level of cardinality, and detects anomalies through live tailing as telemetry is being created — not after the fact.
How Active Telemetry Works in Practice
Active Telemetry sits between your telemetry sources and your observability destinations, operating through four core functions:
Enrich & Contextualize
Adds business metadata, environment tags, and trace correlation at ingestion — the full picture is already there before analysis begins.
Filter & Reduce
Removes low-value data (verbose health checks, duplicate events, debug logs) before it reaches destinations — delivering 50–85% cost reduction for real customers.
Route & Normalize
Directs specific data types to appropriate destinations while normalizing schemas across tools.
Detect & Respond
Automatically identifies anomalies and cost spikes, triggering reroutes or capturing diagnostic data during incidents in real time — no batch delay, no manual intervention.
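A schematic of those four functions as in-stream stages, written as generic Python rather than Mezmo's actual configuration; every name and threshold here is an assumption for illustration:

```python
def enrich(event):
    event.setdefault("deployment.environment", "prod")    # contextualize
    return event

def keep(event):
    return event.get("path") not in {"/healthz"}          # filter & reduce

def route(event):
    return "apm" if event.get("trace_id") else "archive"  # route & normalize

def detect(event, baseline_ms=500):
    return event.get("duration_ms", 0) > baseline_ms      # detect & respond

for event in [{"path": "/pay", "trace_id": "abc", "duration_ms": 900}]:
    if not keep(event):
        continue
    event = enrich(event)
    destination = route(event)
    if detect(event):
        print(f"anomaly -> capture diagnostics, send to {destination}")
```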
Active Telemetry for AI Workflows
As AI agents take on more operational work — incident response, root cause analysis, automated remediation — the quality of the data they receive becomes critical. Bloated, noisy telemetry leads to context window overload, hallucinations, and slow resolution.
Active Telemetry is built to serve AI natively. It deduplicates, clusters, and enriches data before agents see it, ensuring they receive curated, right-sized, and trustworthy signals. This is also the data layer powering AURA, Mezmo's production AI agent platform for SRE workflows.
Every token of noise removed from the telemetry stream saves inference cost and makes AI-driven decisions faster and more reliable.
Mezmo Telemetry Pipeline: What is it?
The Mezmo Telemetry Pipeline is the real-time, cloud-native engine that powers Active Telemetry. It sits between your telemetry sources and your observability platforms — giving you full control over what gets ingested, how it is shaped, and where it goes.
Core Capabilities
Ingestion
Supports a wide range of sources: applications, services, cloud infrastructure, Kubernetes, OpenTelemetry, Fluentd/Fluent Bit, syslog, and more. Handles high-volume, real-time data at scale.
Transformation
- Filtering: drop irrelevant data such as debug logs and health checks
- Attribute management: add, delete, or rename fields and labels
- Enrichment: append metadata such as service name, region, environment, and trace IDs
- Normalization: standardize metric and log naming, formats, and schemas
- Redaction: remove sensitive or PII data before it leaves your environment
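As a small illustration of the redaction step, here is a generic regex-based sketch; the patterns are simplified examples, not Mezmo's built-in rules:

```python
import re

# Simplified redaction rules; production pipelines use richer,
# audited pattern libraries.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),          # US SSN
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),  # email address
]

def redact(line: str) -> str:
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line

print(redact("user bob@example.com failed auth, ssn 123-45-6789"))
# user <EMAIL> failed auth, ssn <SSN>
```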
Optimization
- Sampling and aggregation to reduce volume while preserving signal
- Cardinality control: monitors and limits metric cardinality in real time, preventing exponential cost increases while preserving essential dimensional data
- Tail-based sampling with instant rehydration: filtered-out data can be replayed the moment an anomaly is detected
- Compression and batching to improve throughput and lower cost
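To illustrate cardinality control specifically, here is a generic sketch that caps the number of distinct label values per metric; the cap and label names are assumptions, not Mezmo's implementation:

```python
from collections import defaultdict

MAX_VALUES = 3  # illustrative cap on distinct values per (metric, label)
seen = defaultdict(set)

def bound_label(metric: str, label: str, value: str) -> str:
    """Pass known or low-cardinality values through; collapse the long tail."""
    values = seen[(metric, label)]
    if value in values or len(values) < MAX_VALUES:
        values.add(value)
        return value
    return "other"  # overflow values collapse into one series

for user in ["u1", "u2", "u3", "u4", "u5"]:
    print(bound_label("http.requests", "user_id", user))
# u1, u2, u3, other, other -> the series count stays bounded
```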
Routing
Dynamically send different data streams to different destinations:
- Enriched traces to APM tools (Datadog, New Relic)
- Processed logs and metrics to low-cost object storage (S3, GCS)
- Alerts and anomalies to the teams and workflows that need them
Multi-sink delivery means one copy of telemetry can serve many teams simultaneously.
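A generic sketch of content-based, multi-sink routing; the destination names are placeholders, not actual Mezmo configuration:

```python
def destinations(event: dict) -> list[str]:
    sinks = ["s3_archive"]             # every event is archived once
    if event.get("type") == "trace":
        sinks.append("apm")            # enriched traces to APM tools
    if event.get("severity") == "critical":
        sinks.append("pagerduty")      # alerts to on-call workflows
    return sinks

print(destinations({"type": "trace", "severity": "critical"}))
# ['s3_archive', 'apm', 'pagerduty']: one copy of telemetry, many consumers
```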
Context Engine
Acts as a pipeline agent — automatically catching events and anomalies to trigger pre-defined SRE actions, without requiring manual intervention or batch processing delays.
How the Mezmo Telemetry Pipeline and Active Telemetry Transform Your Data
Data ingestion and normalization
Inconsistent data from dozens of sources arrives standardized and ready to analyze. No more manual wrangling before every query.
Attribute enrichment and context
Telemetry gains business and operational context at ingestion time, making root-cause analysis faster and correlations obvious rather than manual.
Transformation and optimization
Less telemetry volume means lower cost, faster queries, and clearer dashboards — with no loss of the signals that matter.
Routing and multi-use distribution
The same telemetry simultaneously powers observability, compliance, and business intelligence — without duplication or redundant ingestion costs.
Noise removal
Filtering and noise reduction cut storage costs, reduce alert fatigue, and speed up troubleshooting by ensuring analysts and AI agents only see relevant data.
Real-time insights
Data shaping at ingestion time means real-time visibility into relevant events with full context — service, environment, location, trace correlation — from the moment an event occurs.
Actionable decisions
Combined with Active Telemetry's detect-and-respond capability, transformation enables automated actions — rerouting, alerting, capturing diagnostics — the moment conditions change.
Together, the Mezmo Telemetry Pipeline and Active Telemetry turn raw observability data into a competitive advantage: faster incident response, lower costs, and a reliable data foundation for AI-driven operations.
Transformed logs flow into Mezmo for search, visualization, and alerting.
