Petabyte Scale, Gigabyte Costs: Mezmo’s Evolution from ElasticSearch to Quickwit

4 MIN READ

Eric Satterwhite

February 5, 2025

Eric is a Software Engineer V at Mezmo.

At Mezmo, we handle an enormous volume of telemetry data for our customers and ourselves, requiring a robust and efficient search and analytics backend. For years, ElasticSearch served us well, but as our infrastructure grew to a multi-cluster, multi-petabyte scale, we started to see the cracks—rising costs, performance bottlenecks, and scalability concerns. We needed a change, one that would make our system more cost-effective while maintaining speed and reliability.

After extensive research, we landed on Quickwit, an open-source, cloud-native search engine for logs. The transition was far from simple—it required significant engineering effort to revamp our infrastructure and help Quickwit adjust to new demands while ensuring a seamless experience for our customers. This blog takes you through our journey: why we moved, how we made the decision, the challenges we faced, and key lessons learned.

Recognizing the Need for Change

ElasticSearch had been the backbone of our search and analytics system, but we wanted to make things better:

Cost Optimization – Storing and searching petabytes of data is expensive, and we needed to do it without breaking the bank.
Performance – While there were no impacts on our service, we felt that performance had room to improve.
Scalability – As with performance, we saw room to improve scalability. With Quickwit, we saw an opportunity to separate the storage layer from the compute layer, making it possible to spin up nodes faster.
Reliability – While we had a good process to ensure a reliable customer experience, the scale of our ElasticSearch deployment demanded a lot of internal attention and intervention.

Choosing the Right Solution: The Decision Framework

Selecting a new backend wasn’t just about finding a cheaper alternative—we needed a solution that was:

Cost-efficient – Able to reduce storage and compute expenses without sacrificing performance.
Cloud-native – Designed to leverage modern cloud infrastructure.
Scalable – Able to handle our ever-growing data volume without constant tuning.
Performance-driven – At least on par with ElasticSearch, and ideally delivering faster queries and lower resource consumption.
Reliable – Being faster and cheaper means nothing if the customer experience suffers.

After evaluating multiple options, Quickwit stood out for its efficient indexing, cloud-native architecture, and log-specific optimizations.

Architectural and Engineering Challenges

Migrating a mission-critical system comes with challenges. Our biggest hurdles included:

Data migration at scale – Moving petabytes of data without downtime.
Query compatibility – Ensuring Quickwit could support our existing search patterns.
Maintaining performance – Avoiding disruptions to customer experience during migration.

To tackle these, we implemented a phased migration strategy, rigorous testing, and real-time performance monitoring to ensure a smooth transition.
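One way to picture the query-compatibility testing is a harness that runs the same query against both backends and reports any difference in the results. The sketch below is only an illustration under stated assumptions, not our production tooling: `SearchFn`, `Comparison`, `compare_backends`, and the in-memory stand-in backends are all hypothetical names, and real clients would wrap the ElasticSearch and Quickwit search APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical: a backend is any callable that takes a query string and
# returns the IDs of the matching log lines.
SearchFn = Callable[[str], Iterable[str]]


@dataclass
class Comparison:
    query: str
    only_in_old: set  # IDs returned by the old backend only
    only_in_new: set  # IDs returned by the new backend only

    @property
    def matches(self) -> bool:
        return not self.only_in_old and not self.only_in_new


def compare_backends(query: str, old: SearchFn, new: SearchFn) -> Comparison:
    """Run the same query on both backends and diff the returned IDs."""
    old_ids, new_ids = set(old(query)), set(new(query))
    return Comparison(query, old_ids - new_ids, new_ids - old_ids)


# Stand-in backends over a tiny in-memory corpus (illustrative data only).
LOGS = {"a1": "error: disk full", "a2": "info: started", "a3": "error: timeout"}


def old_backend(q: str):
    return [i for i, line in LOGS.items() if q in line]


def new_backend(q: str):
    # Deliberately drops one hit so the comparison has something to report.
    return [i for i, line in LOGS.items() if q in line and i != "a3"]


report = compare_backends("error", old_backend, new_backend)
print(report.matches, sorted(report.only_in_old))  # False ['a3']
```

Comparing sets of IDs ignores ordering and scoring, which is a simplification; a stricter check could also compare hit counts or result order.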

Executing the Migration Without Disrupting Customers

To minimize risk, we took an incremental approach:

Proof of Concept (PoC) – Testing a couple of applications on Quickwit and comparing them against ElasticSearch.
Parallel Deployment – Running Quickwit alongside ElasticSearch to validate results and make refinements as needed.
Gradual Traffic Shift – Slowly redirecting queries to Quickwit while monitoring performance, fixing and adjusting whenever something broke.
Full Cutover – Decommissioning ElasticSearch once Quickwit met all requirements.

This measured rollout ensured zero service interruptions and seamless adoption.
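The gradual traffic shift can be sketched as a percentage-based router. This is a minimal illustration, assuming traffic is split per customer using a stable hash; the article does not describe the actual mechanism, and `bucket`, `choose_backend`, and the customer IDs are hypothetical.

```python
import hashlib


def bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_backend(customer_id: str, rollout_percent: int) -> str:
    """Route to Quickwit when the customer's bucket is below the rollout percentage."""
    return "quickwit" if bucket(customer_id) < rollout_percent else "elasticsearch"


# Made-up customer IDs, used only to show how the split changes over time.
customers = [f"customer-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    on_quickwit = sum(choose_backend(c, percent) == "quickwit" for c in customers)
    print(f"{percent:>3}% rollout -> {on_quickwit} of {len(customers)} customers on Quickwit")
```

Because the bucket depends only on the customer ID, a customer who has moved to Quickwit stays there as the percentage increases, which keeps the experience consistent between rollout steps.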

Key Takeaways & Lessons Learned

Recognize when to evolve – Don’t wait until performance issues impact customers.
Use a structured decision framework – Prioritize scalability, cost, reliability, and performance.
Plan for migration challenges – Expect roadblocks you did not anticipate and address them proactively. For example, the markers that signal production issues in Quickwit were different from the ones we relied on in ElasticSearch.
Customer experience is paramount – Maintain stability while making backend improvements. With our parallel deployment model, we could switch a customer back to ElasticSearch, with no interruption to the customer, if something wasn’t performing right in Quickwit.
Continuous optimization is key – Post-migration tuning can unlock even more efficiencies. For example, we are still learning the right size for our cluster components and constantly fine-tuning them.
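The per-customer switch-back mentioned above could look something like the following. It is a simplified sketch, assuming both backends hold the same data during the parallel deployment; the `Router` class and its method names are hypothetical, and the two lambdas stand in for real search clients.

```python
from typing import Callable, List, Tuple

# Hypothetical: a backend is a callable from a query string to a list of hits.
SearchFn = Callable[[str], List[str]]


class Router:
    """Sends queries to Quickwit unless a customer has been switched back."""

    def __init__(self, quickwit: SearchFn, elasticsearch: SearchFn) -> None:
        self.quickwit = quickwit
        self.elasticsearch = elasticsearch
        self.switched_back = set()  # customers pinned to ElasticSearch

    def switch_back(self, customer_id: str) -> None:
        """Pin a customer to ElasticSearch while an issue is investigated."""
        self.switched_back.add(customer_id)

    def restore(self, customer_id: str) -> None:
        """Return a customer to Quickwit once the issue is resolved."""
        self.switched_back.discard(customer_id)

    def search(self, customer_id: str, query: str) -> Tuple[str, List[str]]:
        """Return the name of the backend used and its results."""
        if customer_id in self.switched_back:
            return "elasticsearch", self.elasticsearch(query)
        return "quickwit", self.quickwit(query)


router = Router(quickwit=lambda q: ["hit-from-quickwit"],
                elasticsearch=lambda q: ["hit-from-elasticsearch"])
router.switch_back("acme")
print(router.search("acme", "error")[0])    # elasticsearch
print(router.search("globex", "error")[0])  # quickwit
```

Keeping the decision in a single routing layer means the switch is invisible to the customer: the query interface stays the same and only the backend that serves it changes.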

Conclusion

Switching from ElasticSearch to Quickwit transformed our search and analytics infrastructure, reducing data storage/search costs by 90% while improving scalability and performance. While the journey wasn’t easy, the payoff was worth it. For companies struggling with costly, inefficient search architectures, our advice is simple: evaluate your needs, explore alternatives, and don’t be afraid to evolve.

Click here to watch Petabyte Scale, Gigabyte Costs: Mezmo’s ElasticSearch to Quickwit Evolution on-demand.
