Understanding Metric Formats and Models Like OTel, Prometheus, and StatsD
- Understand what a metric is, its components, and its significance in the context of telemetry data and IT/software applications.
- Recognize how OpenTelemetry has emerged as the standard for metric instrumentation—and what that means for teams still running Prometheus and StatsD alongside it.
- Learn how OTel, Prometheus, and StatsD differ in their data models, transport protocols, and metric type support.
- Understand why real production environments rarely run a single format—and how a telemetry pipeline functions as the control layer that manages translation, cardinality, and routing between them.
- Explore how the Active Telemetry pipeline normalizes, governs, and routes metric data for both human observability and AI agent consumption.
Metrics are a bedrock of observability. They offer crucial insights into the performance and health of systems and applications—and for most production engineering teams today, they arrive in more than one format simultaneously.
OpenTelemetry has emerged as the industry standard for metric instrumentation: vendor-neutral, semantically rich, and natively supported by the major observability platforms. But production environments don't migrate cleanly. StatsD is still running in services that predate the OTel era. Prometheus remote-write is wired into infrastructure that teams aren't ready to re-instrument. Both will remain in the data pipeline for years alongside newer OTel instrumentation.
That mixed reality is where the practical complexity of metrics lives. Understanding the structure of each format, how they differ, and how to manage translation and normalization between them isn't an academic exercise—it's the operational baseline for anyone building or maintaining a telemetry infrastructure.
This piece covers the anatomy of a metric, how OTel, Prometheus, and StatsD represent data differently, and how a telemetry pipeline functions as the control layer that governs what happens to metrics between collection and consumption—whether the downstream consumer is a Grafana dashboard, a Prometheus backend, or an AI agent performing incident triage.
What Is a Metric?
In the context of telemetry data and IT/software applications, a metric is a quantifiable measure that provides insight into a system's performance, status, or behavior. These measurable values, captured and monitored at regular intervals, span a wide range of aspects, from system performance (CPU usage, memory usage) and application behavior (transaction rates, error counts) to user activity (number of active users, session duration).
Consider the number of users currently logged into a web application; tracking that value over time yields a metric. When collected, analyzed, and understood, metrics reveal essential patterns, facilitate troubleshooting, and empower decision-making. They are indispensable for IT professionals, SREs, DevOps engineers, and data analysts, providing the lens through which these professionals identify potential issues, understand system usage, and guide their strategies and actions.
The Components of a Metric
At its core, a metric consists of multiple components, each playing an essential role in conveying information and ensuring clarity.
Let’s explore the anatomy of a metric:
- Name: The metric's identifier, which communicates its purpose.
- Value: The core quantitative measure—a single number, or a set of values such as the bucket counts of a histogram.
- Timestamp: Marks when the metric value is captured, vital for tracking temporal changes.
- Tags / Labels: Key-value pairs that provide extra context. For instance, tags such as server:server1 pinpoint specifics.
- Type: Indicates the metric category, distinguishing between types like counters and gauges.
- Unit: The standard of measurement, ensuring users interpret metrics like percentages or milliseconds correctly.
- Description: Also referred to as "metadata." A brief note offering extra clarity or context about the metric's use or origin.
- Aggregation: How the metric has been condensed or summed up, such as through averaging.
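To make this anatomy concrete, here is a minimal sketch of a single metric represented as a plain Python dictionary. The field names simply mirror the components listed above; the values are illustrative and not tied to any particular SDK or vendor format.

```python
# Illustrative only: one metric sample, with each anatomical component named above.
metric_sample = {
    "name": "http_server_request_duration",               # identifier that communicates purpose
    "value": 182.4,                                        # the quantitative measure
    "timestamp": "2024-05-01T12:00:00Z",                   # when the value was captured
    "tags": {"server": "server1", "region": "us-east-1"},  # key-value context
    "type": "gauge",                                       # metric category
    "unit": "ms",                                          # how to interpret the value
    "description": "Latency of HTTP requests as observed by the server",
    "aggregation": "average over a 60s window",            # how the value was condensed
}
```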
Understanding the anatomy of a metric equips professionals to extract meaningful insights, ensure data integrity, and make informed decisions. Each component plays a part in painting a clearer picture of system and application behavior.
Common Metric Types and Formats
Metrics can be represented in various types and formats, each serving a unique purpose and providing distinct insights.
Common Metric Visualization Methods
- Counter: A simple tally of an event or action, such as the number of clicks on a web page. For example, user_logins: 350 signifies 350 users have logged into an application. (Note that OpenTelemetry uses a Sum instead of a counter; see OTel Sums below.)
- Gauge: A snapshot of a value at a particular point in time, like the current CPU usage of a server. For instance, current_cpu_usage: 55% implies the CPU is currently operating at 55% of its capacity.
- Histogram: A distribution of numerical data—for instance, the distribution of page load times for a website. An example would be load_time: {100ms: 50, 200ms: 30, 300ms: 20}, showing that 50 pages loaded in 100ms, 30 in 200ms, and 20 in 300ms.
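As a rough illustration of how these three types behave, the sketch below maintains a counter, a gauge, and a simple bucketed histogram in plain Python. No metrics library is involved; the bucket boundaries and sample values are arbitrary.

```python
user_logins = 0            # counter: only ever incremented
current_cpu_usage = 0.0    # gauge: overwritten with the latest observed value
load_time_buckets = {"100ms": 0, "200ms": 0, "300ms": 0}  # histogram: per-bucket counts

def record_login():
    global user_logins
    user_logins += 1       # counters accumulate event occurrences

def record_cpu(percent: float):
    global current_cpu_usage
    current_cpu_usage = percent  # gauges capture a point-in-time snapshot

def record_load_time(ms: float):
    # Place each observation into the first bucket whose upper bound covers it.
    for upper_bound in (100, 200, 300):
        if ms <= upper_bound:
            load_time_buckets[f"{upper_bound}ms"] += 1
            break

for sample in (80, 95, 150, 180, 250):
    record_load_time(sample)
print(load_time_buckets)   # {'100ms': 2, '200ms': 2, '300ms': 1}
```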
Metric Formats
- Ratio: The quantitative relation between two values. For example, cache_hits_to_misses: 5:1 indicates five cache hits for every cache miss.
- Percentage: Expresses a number or ratio as a fraction of 100. For example, disk_usage: 75% means 75% of the disk capacity is used.
- Average: The sum of values divided by the number of values. For instance, average_response_time: 200ms signifies the mean response time is 200 milliseconds.
- Median: Represents the middle value in a series of numbers. For example, median_load_time: 150ms means the central value of all load times is 150 milliseconds.
- Mode: The value appearing most frequently in a data set. An example: mode_load_time: 120ms, indicating the most common load time is 120 milliseconds.
- Range: The difference between the highest and lowest values. For instance, temperature_range: 20-30°C indicates temperature varies from 20 to 30 degrees Celsius.
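Most of these formats are straightforward derivations over a series of raw samples. A quick sketch using Python's standard statistics module, with made-up sample values:

```python
import statistics

load_times_ms = [100, 120, 120, 150, 200, 210, 300]

average = statistics.mean(load_times_ms)                 # ~171.4 ms
median = statistics.median(load_times_ms)                # 150 ms
mode = statistics.mode(load_times_ms)                    # 120 ms (most frequent)
value_range = max(load_times_ms) - min(load_times_ms)    # 200 ms

cache_hits, cache_misses = 500, 100
ratio = cache_hits / cache_misses                        # 5.0, i.e. 5:1

disk_used_gb, disk_total_gb = 750, 1000
disk_usage_pct = disk_used_gb / disk_total_gb * 100      # 75.0%
```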
Now that we understand the basic forms metrics come in, let's look at how OTel, Prometheus, and StatsD each represent them—and why the differences matter.
From Metrics to Models: OTel, Prometheus, and StatsD
Within telemetry, a data model defines the structure of data—specifically how metrics are represented, related, and stored. The three most prevalent data models in the observability landscape are OpenTelemetry (OTel), Prometheus, and StatsD.
These aren't mutually exclusive choices you make once and stick with. Most production environments run all three simultaneously: newer services instrumented with OTel SDKs, Prometheus scraping infrastructure components, and StatsD still present in legacy application code. The practical question isn't "which one do you pick"—it's "how do you manage the translation between them without losing fidelity or incurring cost at every hop."
Understanding the models individually is the prerequisite for that.
OpenTelemetry (OTel)—The Standard
OpenTelemetry has become the de facto standard for metric instrumentation in modern observability. It provides a vendor-neutral, specification-driven data model that supports a broad range of metric types and—critically—allows metrics to be correlated with traces and logs through shared context propagation. That correlation capability is what makes OTel the most useful format for environments where you need to connect a latency metric spike to the specific trace and log entries that explain it.
OTel uses OTLP (OpenTelemetry Protocol) for transport, which operates over gRPC or HTTP/protobuf. The major observability platforms—Datadog, Grafana, Elastic, Honeycomb—now accept OTLP natively, as does Prometheus from version 2.47 onward.
An OTel metric looks like this:
otel.cpu_utilization{service.name="serviceA", service.instance.id="abcd1234"} 0.9

This metric indicates 90% CPU utilization for service instance abcd1234 of serviceA. The resource attributes (service.name, service.instance.id) follow OTel semantic conventions, which means any OTel-aware backend can interpret them consistently without custom configuration.
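For illustration, here is a minimal sketch of emitting a metric like this with the OpenTelemetry Python SDK. It assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and that a collector is listening on the default OTLP/gRPC endpoint; the service names and the 0.9 value simply mirror the example above.

```python
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Resource attributes follow OTel semantic conventions (service.name, service.instance.id).
resource = Resource.create({"service.name": "serviceA", "service.instance.id": "abcd1234"})
reader = PeriodicExportingMetricReader(OTLPMetricExporter())  # OTLP over gRPC by default
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))

def observe_cpu(options: CallbackOptions):
    # A real service would read actual CPU utilization here; 0.9 mirrors the example.
    yield Observation(0.9)

meter = metrics.get_meter("example.instrumentation")
meter.create_observable_gauge(
    "cpu.utilization", callbacks=[observe_cpu], unit="1",
    description="Fraction of CPU in use",
)
```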
For teams starting new instrumentation or planning a migration path, OTel is the right default.
Prometheus (Prom)—Time-Series, Pull-Based, Infrastructure-Native
Prometheus uses a time-series data model—all data is stored as timestamped values alongside key-value pair labels. It operates on a pull model, where Prometheus scrapes metrics endpoints at configured intervals rather than receiving pushed data. This makes it well-suited for infrastructure monitoring where the services being monitored are long-running and accessible for scraping.
Prometheus integrates tightly with Grafana for visualization and remains the dominant metric backend for Kubernetes-native stacks. The common pattern—Prometheus collecting metrics, Grafana rendering dashboards—is referenced frequently in comparisons involving prometheus vs grafana vs kibana, where Prometheus handles metric collection, Grafana handles visualization, and Kibana handles log search.
A Prometheus metric looks like this:
prom.http_requests_total{method="POST", handler="/api/books"} 1027

Here, the metric represents 1,027 HTTP POST requests to /api/books. The label-based system is flexible but carries a cost: high-cardinality labels (user IDs, request IDs, dynamic values) can cause a combinatorial explosion in time-series count, which translates directly to storage and query cost.
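For comparison, here is a minimal sketch of exposing a counter like this from a Python service with the prometheus_client library. Prometheus would then scrape the /metrics endpoint started below; the port, label values, and request loop are illustrative.

```python
import random
import time

from prometheus_client import Counter, start_http_server

# Counter with bounded-cardinality labels: method and handler take only a handful of values.
http_requests_total = Counter(
    "http_requests_total", "Total HTTP requests", ["method", "handler"]
)

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        http_requests_total.labels(method="POST", handler="/api/books").inc()
        time.sleep(random.uniform(0.1, 1.0))  # stand-in for real request handling
```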
Prometheus 3.x added native OTLP ingestion, which means OTel-instrumented services can push metrics directly to Prometheus without a translation layer for that specific destination. The translation between OTel semantic conventions and Prometheus naming conventions (dots to underscores, unit suffixes, etc.) is now configurable via Prometheus's otlp.translation_strategy setting.
StatsD
StatsD focuses on simplicity and speed. It supports a limited set of metric types (count, timer, gauge, set) and sends metrics over UDP with minimal overhead—a fire-and-forget model that makes it excellent for high-volume, high-throughput scenarios where some data loss is acceptable.
StatsD predates both Prometheus and OTel by years and remains embedded in a significant portion of legacy application code. The debate over statsd vs opentelemetry is largely settled for new instrumentation (OTel wins on richness and ecosystem), but StatsD's presence in existing codebases is why translation capability in a pipeline remains relevant.
A StatsD metric looks like this:
statsd.page_view:1|c

This represents a count (c) of a single page view event. No labels, no resource attributes, no semantic conventions—just a name and a value. This simplicity is both its strength (low instrumentation overhead) and its limitation (minimal context for analysis or correlation).
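Because the wire format is just a short line sent over UDP, emitting a StatsD metric needs nothing more than a socket. A minimal sketch, assuming a StatsD daemon is listening on the default port 8125 on localhost:

```python
import socket

def send_statsd(metric: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    # Fire-and-forget: a single UDP datagram, no acknowledgement and no retry.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(metric.encode("utf-8"), (host, port))
    sock.close()

send_statsd("page_view:1|c")          # counter increment
send_statsd("response_time:320|ms")   # timer, in milliseconds
send_statsd("queue_depth:12|g")       # gauge
```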
OTel Sums Versus Time Series Counters: Where the Models Diverge
OpenTelemetry and Prometheus have distinct ways of representing cumulative values—a distinction that matters when you're operating both simultaneously or routing OTel data to a Prometheus backend.
OTel Sums
In the OTel data model, a "Sum" captures the total accumulated value of a measurement over a specified period. Sums can be monotonic (only increasing, like a request count) or non-monotonic (can increase or decrease, like items in a cart).
- Monotonic Sums: Ideal for values that can only increase. For example, total registered users cannot decrease.
- Non-monotonic Sums: Useful when the value can decrease, such as a count of active connections that opens and closes.
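In the OTel API, the two flavors map to different instruments. A minimal sketch of the API shape, assuming the opentelemetry-api package is installed; without a configured SDK (as set up in the earlier OTel example) the meter returned here is a no-op, so this only illustrates usage:

```python
from opentelemetry import metrics

meter = metrics.get_meter("example.instrumentation")

# Monotonic Sum: a Counter can only be added to with non-negative values.
registered_users = meter.create_counter("users.registered", unit="{user}")
registered_users.add(1, {"plan": "free"})

# Non-monotonic Sum: an UpDownCounter can move in both directions.
active_connections = meter.create_up_down_counter("connections.active", unit="{connection}")
active_connections.add(1)   # connection opened
active_connections.add(-1)  # connection closed
```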
Time Series Counters (Prometheus)
In Prometheus, a counter is a metric type that only goes up, or resets to zero on process restart. It's used to count the number of times an event occurs. Prometheus can handle counter resets and accurately compute rate over time even after a reset.
- Continuous Tracking: Prometheus counters capture the state at every scrape interval.
- Resetting: When a counter resets (e.g., due to a service restart), Prometheus handles the gap gracefully for rate calculations.
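The reset handling is worth seeing concretely. Below is a rough sketch of reset-tolerant logic (illustrative, not Prometheus's actual implementation): when a sample is lower than its predecessor, it is treated as a restart from zero rather than a negative increment.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across scrape samples, tolerating counter resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Counter reset detected (e.g. process restart): the counter started
            # again from zero, so the whole current value counts as new increase.
            total += current
    return total

# Climbs, resets after a restart, climbs again; the reset never produces a negative rate.
print(counter_increase([1000, 1027, 3, 40]))  # 27 + 3 + 37 = 67
```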
The Practical Difference
- Granularity: OTel sums give you the cumulative total over a reporting period. Prometheus counters give you a continuous, fine-grained increment trail.
- Flexibility: OTel non-monotonic sums handle values that decrease; Prometheus counters are strictly monotonic.
- Translation: When routing OTel data to a Prometheus backend, understanding this distinction is essential. A monotonic OTel Sum maps naturally to a Prometheus counter. A non-monotonic Sum does not—and the pipeline handling that translation needs to be configured accordingly.
Further Differences at a Glance
There are several other distinguishing factors among these data models:
- Protocol and Transport: StatsD uses a UDP-based protocol, which allows for fire-and-forget data sending with very low overhead. OTel and Prometheus, by contrast, move data over TCP-based protocols (OTLP over gRPC or HTTP, Prometheus over HTTP scrapes and remote write), which ensure reliable delivery but come with higher overhead.
- Push vs. Pull Metrics: Prometheus uses a pull model where it scrapes metrics from instrumented jobs. Both StatsD and OTel use a push model, pushing metrics to the monitoring system as they occur.
- Metric Types: StatsD supports simpler metric types such as count, timer, gauge, and set. Prometheus expands on these with additional types like histograms and summaries. OTel provides the most extensive set, covering everything StatsD and Prometheus offer and adding instruments such as UpDownCounter, ObservableGauge, and ObservableCounter.
- Contextual Information: OTel allows correlation of metrics, logs, and traces, providing a holistic view of the system. This ability is less emphasized in StatsD and Prometheus, which focus more on metric data.
- Integration and Instrumentation: All three models have wide community support and offer numerous client libraries in various languages. However, the ease of instrumentation can vary.
- Storage and Visualization: Prometheus comes with its time-series database and Grafana integration for visualization. OTel and StatsD are more flexible and do not prescribe any specific storage or visualization solution, allowing you to choose what best fits your needs.
The choice of data model matters. It's not just about housing metrics; each model shapes how your telemetry data is organized, analyzed, and ultimately how observable your systems are. The right data model doesn't just store your telemetry data; it standardizes and influences it.
Active Telemetry Pipelines as the Metric Control Layer
The previous sections describe three metric models that most production environments are running simultaneously. The practical challenge isn't choosing between them—it's governing the flow of metric data across all of them without losing fidelity, incurring avoidable cost, or creating data that downstream tools (or AI agents) can't reliably use.
That's the function of the Active Telemetry pipeline: not just normalizing formats, but acting as the control layer between metric sources and metric consumers.
What "Control Layer" Means for Metrics
A passive telemetry approach collects metrics and delivers them as-is to a backend. A control layer makes active decisions about metric data in flight:
- Format translation: OTel data destined for a Prometheus backend gets translated according to the correct translation_strategy. StatsD data gets enriched with missing context before it reaches a downstream tool that expects labeled data. Translation happens at the pipeline, not inside the destination.
- Cardinality governance: High-cardinality labels—user IDs, trace IDs, request UUIDs embedded in metric names—cause exponential time-series growth in Prometheus and equivalent cost inflation in commercial backends. The Active Telemetry pipeline monitors cardinality in real time and can drop, aggregate, or cap high-cardinality dimensions before they reach storage (see the sketch after this list).
- Aggregation before egress: Thousands of raw request-level metrics get converted to p50/p95/p99 aggregates at the pipeline layer, reducing volume by orders of magnitude before the data hits a backend that charges per time-series or per sample.
- Intelligent sampling: Not all metrics carry the same signal value. The pipeline applies rules—spike detection, anomaly thresholds, error rate triggers—to decide what gets retained at full granularity and what gets summarized or dropped.
- Routing by destination: A single metric stream can be simultaneously routed to Prometheus for infrastructure alerting, to Datadog for engineering dashboards, and to an MCP-accessible context store for AI agent consumption—with different filtering and enrichment applied per destination.
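As one concrete illustration of cardinality governance (a generic sketch, not Mezmo's implementation), a pipeline stage might cap the number of distinct values allowed per label and fold any overflow into a catch-all value before the series ever reaches storage:

```python
from collections import defaultdict

MAX_VALUES_PER_LABEL = 100
_seen_values: dict[str, set] = defaultdict(set)

def govern_labels(labels: dict[str, str]) -> dict[str, str]:
    """Cap per-label cardinality by folding excess values into 'other'."""
    governed = {}
    for key, value in labels.items():
        seen = _seen_values[key]
        if value in seen or len(seen) < MAX_VALUES_PER_LABEL:
            seen.add(value)
            governed[key] = value
        else:
            governed[key] = "other"  # prevents unbounded time-series growth downstream
    return governed

print(govern_labels({"handler": "/api/books", "user_id": "u-48210"}))
```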
OTel as the Ingestion Standard
For teams building or migrating telemetry infrastructure, OTel is the right ingestion format to standardize on. The Active Telemetry pipeline accepts OTLP natively, which means OTel-instrumented services feed in directly without a custom collector configuration for each destination.
The pipeline then handles the routing: OTel metrics normalized to Prometheus format for existing Prometheus backends, OTel resource attributes preserved and enriched for AI agent context, cardinality controlled before data reaches cost-generating backends.
This is the practical value of the OTel-first approach: instrument once, let the pipeline manage what each destination needs.
Managing the StatsD and Prometheus Legacy
StatsD and Prometheus data in existing systems doesn't disappear because you've adopted OTel for new instrumentation. The pipeline handles both:
- StatsD ingestion: StatsD metrics are ingested, parsed, and enriched with contextual metadata they didn't carry natively. A bare statsd.page_view:1|c becomes a labeled, timestamped event that downstream tools can query and correlate (see the sketch after this list).
- Prometheus scraping and remote write: Prometheus data flowing through the pipeline can be filtered for cardinality, aggregated to reduce volume, and re-routed to destinations beyond the primary Prometheus backend.
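To make the StatsD enrichment step concrete, here is a rough sketch (illustrative only, not Mezmo's pipeline code) of parsing a bare StatsD line and attaching the timestamp and labels it never carried on the wire; sample rates and multi-metric packets are ignored for brevity:

```python
from datetime import datetime, timezone

def parse_and_enrich(line: str, static_labels: dict) -> dict:
    """Turn a bare line like 'page_view:1|c' into a labeled, timestamped event."""
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return {
        "name": name,
        "value": float(value),
        "type": {"c": "counter", "g": "gauge", "ms": "timer", "s": "set"}.get(metric_type, metric_type),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "labels": static_labels,  # context the StatsD wire format couldn't carry
    }

print(parse_and_enrich("page_view:1|c", {"service": "web-frontend", "env": "prod"}))
```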
The mixed-format environment most teams are operating in is a design constraint the pipeline is built for, not an edge case.
Agent-Ready Metric Output
As AI agents take on incident response tasks, the quality requirements for metric data change. A human analyst can interpret a noisy, high-cardinality metric stream—it's slow, but manageable. An AI agent receiving the same stream will either consume it at significant inference cost or reason poorly from the noise.
The Active Telemetry pipeline prepares metric data for agent consumption by applying the same cardinality control, aggregation, and enrichment described above, then delivering curated, semantically consistent metric context through Mezmo's MCP server. Agents querying for metrics during incident triage get task-scoped data—the relevant metrics for this incident, at the right granularity, with context attached—not a raw time-series firehose.
This is covered in more depth in AI-ready context engineering.
The Importance of Normalizing Metrics
Normalization ensures that metrics are consistent and interpretable across the tools and teams that consume them. But normalization at the format level—converting OTel to Prometheus wire format—is the minimum. The higher-value work is what happens alongside that conversion.
Mezmo's Active Telemetry pipeline handles format conversion as a baseline capability: ingesting OTel, Prometheus, and StatsD data and transitioning it to the appropriate output format for each destination using Mezmo's metric data model. That covers the compatibility problem.
The control-layer capabilities above—cardinality governance, aggregation, routing, agent-ready output—are what determine whether normalized data is actually useful downstream, not just technically compatible.
A few things worth knowing for teams evaluating their normalization approach:
- Prometheus 3.x added native OTLP ingestion with configurable translation strategies (UnderscoreEscapingWithSuffixes, NoUTF8EscapingWithSuffixes, NoTranslation, etc.). For the specific use case of OTel-to-Prometheus delivery, this reduces the need for an intermediate translation layer—but it doesn't address cardinality control, multi-destination routing, or agent-ready output, which still require a pipeline.
- Cardinality limits remain operationally important regardless of which translation path you take. High-cardinality metric streams are a cost and reliability problem at the storage layer, not a format problem. Format normalization doesn't fix cardinality; the pipeline control layer does.
- Mixed-format environments (OTel + Prometheus + StatsD) still require normalization to a common internal representation before you can do meaningful cross-source analysis. Prometheus-native OTel support handles OTel-to-Prometheus. It doesn't handle the reverse, or StatsD, or routing to non-Prometheus backends.
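To show what a common internal representation can look like in practice, here is a sketch of a neutral record type that OTel, Prometheus, and StatsD inputs could all be mapped onto before cross-source analysis or routing happens. The field names are illustrative and are not Mezmo's actual metric data model:

```python
from dataclasses import dataclass, field

@dataclass
class NormalizedMetric:
    name: str                       # normalized metric name (dots vs. underscores resolved)
    value: float
    timestamp_unix_ms: int
    kind: str                       # "sum", "gauge", or "histogram"
    unit: str = ""
    attributes: dict = field(default_factory=dict)  # labels/tags from any source
    source_format: str = "unknown"  # "otlp", "prometheus", or "statsd"

# The same logical measurement arriving from different sources maps onto one shape:
from_prometheus = NormalizedMetric("http_requests_total", 1027, 1714560000000, "sum",
                                   attributes={"method": "POST", "handler": "/api/books"},
                                   source_format="prometheus")
from_statsd = NormalizedMetric("page_view", 1, 1714560000000, "sum",
                               source_format="statsd")
```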
Metrics Mastery: Strategic Advantage In Telemetry
Metrics are both a technical necessity and a strategic advantage for observability. The teams that get the most from them understand not just what each format represents, but how to govern the flow of metric data across the mixed-format environments that real production systems produce.
OTel is the direction. Instrument with it where you can. Understand Prometheus and StatsD well enough to manage them where they exist. And use the pipeline as the control layer that makes all three work together—normalizing formats, governing cardinality, routing to the right destinations, and preparing data for the AI agents that are increasingly part of the incident response loop.
Whether you're planning an OTel migration, working through statsd vs opentelemetry tradeoffs in existing services, or managing the prometheus vs grafana vs kibana question for your visualization stack, the pipeline is what makes the architecture composable rather than brittle.