LogicMonitor builds scalable, reliable data centers with Cumulus Networks

“Deploying Cumulus Linux on our data center switches means our network engineers can automate configuration and policy management with tools like Puppet and Ansible. We already have a team of incredibly talented engineers who are well versed in our current toolset (Ansible, Puppet, and more), and strategically we want to make use of that existing team rather than find and hire people who specialize in a particular vendor’s environment. Eliminating the need for employees who specialize in vendor-specific configuration interfaces simplifies hiring, makes staff easier to train, and increases our efficiency as an operations team.”

Andrew Martin, Network Engineer at LogicMonitor

Industry

SaaS provider

Business Objective

Standardized, scalable solution

Partners

Puppet, Ansible, Dell

Overview

LogicMonitor, founded in 2008, is a SaaS-based performance monitoring platform for enterprise IT. The company delivers the world’s most complete IT infrastructure performance monitoring solution for cloud, data center, and on-premises environments with the simplicity of a SaaS-based platform. It provides native support for thousands of devices and instances and integrates with a wide range of IT tools such as ServiceNow, Puppet, and Atlassian. Its production infrastructure consists of multiple global data centers as well as cloud infrastructure in AWS. LogicMonitor has about 250 employees and is based in Santa Barbara, California, with additional offices in Austin, Boston, London, China, and Singapore.

Challenges

LogicMonitor needed a scalable network infrastructure that could keep pace with the applications they deliver to their customers. Before migrating to Cumulus Linux, LogicMonitor’s data center ran multiple network devices, each with its own operating system, feature set, and licenses. Managing all of these different devices was time-consuming and expensive.

Their network consisted of Cisco switches, with Equinix as a colocation provider. As the business grew, LogicMonitor found that the existing network design could not scale at the same pace. They were managing an L2-heavy architecture that became increasingly complex to configure and scale. Complicated licensing plans created further issues and limitations, which left them running a mixture of OSPF and BGP routing protocols.

The limited set of available automation tools made it difficult for LogicMonitor to streamline network operations. Their goal was to use Ansible to automate both server and network configuration, but, as Martin noted, “the Cisco-Ansible modules were limited and had their own set of complications.”

The tipping point came when an outage with their legacy solution took part of their network down, cutting off access to various services. Because of limited visibility, they had to troubleshoot manually. After the outage, the LogicMonitor network team knew they had to change both the design and the implementation of the network.

Solution

With the development of a new data center, LogicMonitor saw an opportunity to refresh their technology. Having concluded that legacy solutions were too expensive, too difficult to manage, and not redundant enough, LogicMonitor turned to Cumulus Networks to build the company’s newest data center network. They first tested the technology with Cumulus VX, the virtual version of Cumulus Linux, and found that the operating system provided the reliability and scalability they were looking for. As Martin puts it, “We fell in love with it the moment we used it.”

Part of resolving the issues associated with traditional networking involved removing the complexities of the L2 architecture and implementing an L3 architecture with routing on the host. This architecture includes at least three spine switches, two leaf switches per rack, and an out-of-band network for management. With 64 servers per rack, the racks are quite dense. LogicMonitor also required, at minimum, SNMP support for network monitoring, which Cumulus Linux provides and which was incorporated into the new network. The new design improved redundancy and, in turn, the network’s reliability.
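To make the leaf-spine topology concrete, the short Python sketch below models it using the figures from this case study (three spines, two leaves per rack, 64 servers per rack). It is illustrative only: the class and method names are hypothetical, and it assumes that every leaf has one uplink to every spine and that every server is dual-homed to both leaves in its rack, which the case study does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpineFabric:
    """Hypothetical model of a two-tier leaf-spine fabric.

    Assumptions (not stated in the case study): every leaf switch has one
    uplink to every spine switch, and every server is dual-homed to both
    leaf switches in its rack.
    """

    spines: int = 3
    leaves_per_rack: int = 2
    servers_per_rack: int = 64

    def leaf_spine_links(self, racks: int) -> int:
        # Full mesh between tiers: each leaf has one link to each spine.
        return racks * self.leaves_per_rack * self.spines

    def server_ports(self, racks: int) -> int:
        # Each server uses one port on each leaf in its rack.
        return racks * self.servers_per_rack * self.leaves_per_rack

    def host_paths(self, failed_leaves: int = 0, failed_spines: int = 0) -> int:
        # With routing on the host, a server can send through any healthy leaf
        # in its rack, and each leaf can load-balance (ECMP) across every
        # healthy spine, so paths = healthy leaves * healthy spines.
        healthy_leaves = max(0, self.leaves_per_rack - failed_leaves)
        healthy_spines = max(0, self.spines - failed_spines)
        return healthy_leaves * healthy_spines


if __name__ == "__main__":
    fabric = LeafSpineFabric()
    print(fabric.leaf_spine_links(racks=1))    # 6
    print(fabric.server_ports(racks=1))        # 128
    print(fabric.host_paths())                 # 6
    print(fabric.host_paths(failed_spines=1))  # 4
    print(fabric.host_paths(failed_leaves=1))  # 3
```

Under these assumptions, each additional rack adds six leaf-to-spine links and 128 server-facing ports, and a host that routes for itself has six equal-cost paths to the spine layer, so losing any single leaf or spine still leaves it connected. That path diversity is one way to read the redundancy gain described in this section.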

In the future, LogicMonitor is looking to improve both how they proactively validate configuration changes before pushing them to production and how they monitor the network, using Cumulus NetQ. As an agent-based technology, NetQ provides visibility into every Linux switch to simplify network analysis, validate the impact of potential changes, and improve troubleshooting across the Linux data center. For LogicMonitor, the main expected benefit of NetQ is real-time visibility into faulty network states, helping to prevent outages. Its proactive alerting and host-to-switch diagnostics are expected to integrate easily with LogicMonitor’s performance monitoring platform and further streamline their operations.

“Since deploying Cumulus Networks, we were able to expand our data center 66% faster than with previous vendors.”

Andrew Martin, Network Engineer at LogicMonitor

Results

By implementing Cumulus Linux, LogicMonitor was able to design a more flexible, scalable solution for building new data centers efficiently and to keep pace with the business and the company’s growth. Other major benefits include:

  • Reduced TCO: By moving away from legacy solutions, LogicMonitor greatly reduced their CapEx, saving on both licensing and hardware costs. They also reduced future OpEx, since they no longer need additional Cisco or Juniper specialists; all of their engineers are capable of learning Cumulus Linux.

  • Faster time to market: Using Cumulus Networks solutions, LogicMonitor had its data center up and running quickly. Since deploying Cumulus Networks, LogicMonitor has been able to expand its data center 66% faster than with previous vendors.

  • Reliability: With the L2 complexities gone, LogicMonitor faces less risk of human misconfiguration and outages. The additional visibility provided by Cumulus tools has established safeguards against network errors. Because an hour of downtime can cost a business thousands of dollars, these safeguards are crucial.

Summary

LogicMonitor needed a more reliable, scalable networking solution for their newest data center to keep pace with the business and their growth. By deploying Cumulus Linux and replacing their L2-heavy architecture with an L3 design, LogicMonitor reduced their TCO, built a network that is less susceptible to outages and misconfigurations, and achieved the scalability required for the company’s continued growth.

“We are growing at an exponential pace, but because we have Cumulus, we’re not worried about expanding the network to keep up with that growth,” Martin says. “We now have an infinitely more reliable and scalable network.”