Consolidating monitoring for a complex SaaS application at scale

One of the benefits that we recognize from Outlyer is that we’ve been able to pull old monitoring tools out of place, put Outlyer in place, and we’re starting to converge everything into Outlyer so that we have a single place to go to for our dashboards to look at everything.

Mark Schliemann, VP Technical Operations

About Moz

Moz is a Seattle-based Software-as-a-Service company serving over 28,000 customers. The company provides online applications that help inbound marketers with their marketing efforts and search engine optimization (SEO).

Moz hosts its SaaS applications in a large and complex environment: over 1,500 virtual machines running on over 400 physical servers across 2 private data centers in the U.S. The environment includes custom applications written in over a dozen languages and huge database clusters running MySQL, Redis, Riak, HBase and MongoDB, storing 65TB of data in total. Moz is also a large OpenStack user, with a 1.5PB Swift object store.

Moz’s Monitoring Challenges

As Moz’s infrastructure grew, with multiple teams responsible for different areas of the service, the number of open-source monitoring tools running at Moz quickly exploded. None of the tools was set up consistently, which left coverage uneven across the infrastructure and made it painful to set up monitoring for new services added to Production.

In addition, there was no clear visibility into how different areas of the service were performing, because the tools didn’t provide easy-to-view dashboards that could be shared not just with Operations, but also with Developers and Business Stakeholders.

Moz was also moving further into DevOps, which required it to offer monitoring as a self-service solution to its product Development teams.

How Outlyer Helped

Moz is currently rolling out Outlyer across all of its infrastructure to consolidate and replace all of its existing monitoring solutions.

Outlyer has provided a flexible but consistent framework that all of Moz’s teams can collaborate around to easily set up monitoring for new services in Production. It has also let them quickly create highly visual, easily shareable dashboards for the services they care about, improving visibility and coverage across their entire infrastructure.

Monitoring 1,500 VMs in real time also created a scaling challenge for Moz’s legacy monitoring solutions, and Outlyer has removed this issue entirely. Moz sends as many metrics as it needs to Outlyer and no longer has to manage and scale large amounts of monitoring infrastructure, freeing up the team to focus on higher-value tasks.

In the future, Moz plans to roll out Outlyer as a self-service monitoring solution to all of its development teams, so that they can write their own Nagios checks and collect the metrics they care about to display on dashboards and alert on.

Key Results

  • Consolidated all infrastructure and service monitoring into Outlyer, increasing coverage and visibility across the service.
  • Achieved greater visibility across the organization and other teams into how Moz’s services are performing, with easily shareable dashboards.
  • Saved the team considerable time: they no longer have to manage and scale on-premises monitoring solutions, and other teams can set up their own monitoring through a self-service solution without their involvement.