NephoScale is a cloud Infrastructure-as-a-Service (IaaS) provider and cloud software development company. Their NephOS end-to-end cloud software stack powers the IaaS platform and was developed from the ground up to accelerate and simplify provisioning, deploying, and installing highly scalable cloud infrastructure. NephOS enables a hybrid cloud solution that seamlessly ties a customer’s on-premise cloud environment with their hosted off-premise cloud environment, visible in a single-pane view.
Since infrastructure is such a critical part of the business, NephoScale needs to rely on fast, scalable, and flexible designs to meet customer needs as the business grows.
When NephoScale faced capacity limitations and recognized the need to upgrade their 1G environment to 10G, the team investigated options that would allow high capacity at scale, automation, and programmability. These criteria reflected their need to respond quickly to customer demands. The criteria for choosing a solution included:
Standardize on the OS that allows freedom
The NephoScale IT team was very familiar with Linux because their data center infrastructure already ran on it. NephoScale wanted to apply existing tools like Chef, along with the team's Linux skill set, to the network. They wanted the same operational framework on their switches as on their servers.
Eliminate Layer 2 complexities as much as possible
One of the core beliefs at NephoScale was that their infrastructure should be built for scale, and a solution should be able to take them from one rack to hundreds of racks. The architecture and the solution should support VXLAN tunneling and eliminate Layer 2 complexities.
After comparing multiple architectures and solutions from various vendors, including Cisco, Arista, Dell/Force10, and Open Networking switches running Cumulus Linux, NephoScale deployed a Layer 3 Clos architecture and settled on Cumulus Linux to provide flexible solutions that can scale affordably. The solution enabled NephoScale to deploy high capacity that was easy to manage.
NephoScale leveraged EdgeCore 10G platforms running Cumulus Linux for top of rack switches. The leaf switches were connected to spine switches in a Layer 3 Clos architecture to provide maximum flexibility and scalability.
Extensive use of BGP
The network made extensive use of BGP, not only on switches but also on some compute nodes. It also leveraged IS-IS in parts of the network. Being able to run routing on both switches and servers helped provide a consistent and familiar experience.
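To make the leaf-and-spine design described above more concrete, here is a minimal Python sketch that enumerates the links of a two-tier Layer 3 Clos fabric and assigns each link a /31 subnet and a pair of BGP AS numbers. The switch counts, the 10.0.0.0/24 address block, the private ASN range, and the scheme of one shared ASN for spines and one ASN per leaf are all illustrative assumptions, not NephoScale's actual design.

```python
import ipaddress
from itertools import product


def plan_fabric(num_spines, num_leaves, p2p_block="10.0.0.0/24", base_asn=64512):
    """Return one record per leaf-spine link in a two-tier Clos fabric.

    Assumptions (hypothetical, for illustration only):
      - every leaf connects to every spine with one routed link
      - each link gets its own /31 carved from p2p_block
      - all spines share base_asn; leaf N uses base_asn + N
    """
    subnets = ipaddress.ip_network(p2p_block).subnets(new_prefix=31)
    links = []
    for leaf, spine in product(range(1, num_leaves + 1), range(1, num_spines + 1)):
        subnet = next(subnets, None)
        if subnet is None:
            raise ValueError("p2p_block is too small for this fabric")
        spine_ip, leaf_ip = list(subnet)  # a /31 holds exactly two addresses
        links.append({
            "leaf": f"leaf{leaf}",
            "spine": f"spine{spine}",
            "subnet": str(subnet),
            "spine_ip": str(spine_ip),
            "leaf_ip": str(leaf_ip),
            "spine_asn": base_asn,
            "leaf_asn": base_asn + leaf,
        })
    return links


if __name__ == "__main__":
    for link in plan_fabric(num_spines=2, num_leaves=4):
        print(link)
```

Because every link is a routed point-to-point connection, adding a rack only means adding one leaf and its links to each spine. No Layer 2 domain has to be stretched across racks.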
With Cumulus Linux, the process of provisioning and configuration management was entirely automated. When the switch boots for the first time, it retrieves the Cumulus Linux OS image and installs the OS almost instantly. Configuration management tools then take over the auto provisioning process, eliminating the need for an administrator to log in to the box, as all server and network provisioning is automated.
NephoScale uses Chef extensively as a configuration management tool on the network infrastructure. Using configuration management tools to provision and update switches is extremely important because it guarantees that there is no discrepancy among the switches running Cumulus Linux in various environments. This could not be guaranteed with manual configuration. Switches running Cumulus Linux are provisioned with Chef, and all user accounts and configurations in the infrastructure are driven by Chef.
Using Linux as a platform was not limited to configuration management. NephoScale used both industry-standard tools and internal tools to monitor systems. The NephoScale team wrote plugins for Nagios that ran on the Cumulus Linux switches and published Cumulus Linux statistics into Graphite. The entire network leverages sFlow (through InMon's sFlow) to help trend capacity and pinpoint anything out of the ordinary, such as an attack. As for internal tools, the team was free to write their own agent to interact with the switch.
The next step to simplify installation even further was to leverage Prescriptive Topology Manager from Cumulus Networks.
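As an illustration of the monitoring approach described above, where statistics gathered on the switch are published into Graphite, the following Python sketch reads the standard Linux interface counters from /proc/net/dev and formats them in Graphite's plaintext protocol (one "path value timestamp" line per metric, conventionally sent to TCP port 2003). This is not NephoScale's plugin: the metric naming scheme, the host name, and the Graphite server address are hypothetical.

```python
import socket
import time


def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def to_graphite_lines(counters, host, timestamp):
    """Format counters as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    lines = []
    for iface, (rx, tx) in sorted(counters.items()):
        prefix = f"switches.{host}.{iface}"  # hypothetical metric naming scheme
        lines.append(f"{prefix}.rx_bytes {rx} {timestamp}")
        lines.append(f"{prefix}.tx_bytes {tx} {timestamp}")
    return lines


def send_to_graphite(lines, server, port=2003):
    """Send the lines to a Graphite (carbon) plaintext listener over TCP."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        stats = parse_proc_net_dev(f.read())
    metrics = to_graphite_lines(stats, host="leaf1", timestamp=int(time.time()))
    # "graphite.example.com" is a placeholder address, not a real endpoint.
    send_to_graphite(metrics, server="graphite.example.com")
```

Because the switch runs a native Linux distribution, a script like this can run directly on the switch in the same way it would on a server.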
NephoScale’s business can scale even better with an agile and flexible infrastructure.
From a savings standpoint, NephoScale experienced a drastic reduction in OpEx compared with alternative solutions by using automation, expanding the use of existing data center tools, and leveraging the transparency of a native Linux distribution. They realized additional savings from CapEx cost reductions of at least 3x per 10G port compared with "traditional" 10G providers.