Microsoft just published information on their internal tool called “CrystalNet” which Microsoft defines as “a high-fidelity, cloud-scale network emulator in daily use at Microsoft. We built CrystalNet to help our engineers in their quest to improve the overall reliability of our networking infrastructure.” You can read more about their tool in this detailed ACM Paper. But what I want to talk about is how this amazing technology is accessible to you, at any organization, right now, with network verification using Cumulus VX.
What Microsoft has accomplished is truly amazing. They can simulate their network environment and prevent nearly 70% of the network issues they experienced in a two-year period. They have the ability to spin up hundreds of nodes with the exact same configurations and protocols they run in production. Then, by running network tests against that environment, they verify whether proposed changes will have a negative impact on applications and services. This work took the team of Microsoft researchers over two years to develop. It’s really quite the feat!
What I find exciting about this is it validates exactly what we at Cumulus have been preaching for the last two years as well. The ability to make a 1:1 mirror of your network, with matching ports, protocols, software and features, and the ability to run automated tests against this environment is the next frontier in network management.
When we released Cumulus VX, our virtual Cumulus Linux platform, in 2015, we knew immediately how powerful a tool it would be. Our field teams are able to do more than 90% of their testing and training on Cumulus VX. Our QA teams use Cumulus VX to test any software-based feature, such as routing or TACACS. Even our consulting team has moved to 100% Cumulus VX-based training for our instructor-led bootcamp courses.
A common question I see from customers is “What’s different about Cumulus VX compared to other vendors’ VM platforms?” The difference is subtle, but incredibly important. First, a quick recap of the Cumulus Linux architecture. Cumulus Linux is a complete Linux distribution, based on Debian Jessie. Cumulus relies on the Linux kernel as the source of truth for everything on the system. This means every application that runs on a Cumulus Linux-based switch is just an unmodified Linux application. In fact, we have customers using our routing suite, FRR, directly on their Linux-based servers.
What’s highlighted in the image is “switchd”, our switch driver, which takes information from the Linux kernel, such as VXLAN tunnels, routes or MAC addresses, and programs it into the switch hardware to give line-rate performance. What is important here is that switchd relies on the Linux kernel for this information. Switchd only programs the hardware based on what is in the Linux kernel. If it’s not in the software, it’s not in the hardware.
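To make the “source of truth” idea concrete, here is a toy Python sketch. It is not how switchd is implemented; the function name, the dictionaries and the route entries are all made up for illustration. It only shows the direction of the data flow: the hardware table is derived from the kernel table, never the other way around.

```python
def reconcile(kernel_routes, hardware_routes):
    """Work out what must change so the hardware table mirrors the kernel table.

    Both arguments are hypothetical {prefix: next_hop_interface} dictionaries.
    Returns (to_add, to_remove).
    """
    # Anything in the kernel that is missing or different in hardware gets programmed.
    to_add = {
        prefix: nexthop
        for prefix, nexthop in kernel_routes.items()
        if hardware_routes.get(prefix) != nexthop
    }
    # Anything in hardware that the kernel does not know about gets removed.
    to_remove = sorted(p for p in hardware_routes if p not in kernel_routes)
    return to_add, to_remove


# Illustrative data only.
kernel = {"10.0.0.0/24": "swp1", "10.0.1.0/24": "swp2"}
hardware = {"10.0.0.0/24": "swp1", "10.9.9.0/24": "swp3"}

to_add, to_remove = reconcile(kernel, hardware)
print("program into hardware:", to_add)
print("remove from hardware:", to_remove)
```

In a VM there is simply no hardware table to program, but the kernel state, which is where every feature actually lives, is still all there.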
But again, how is this different from the VMs provided by other network vendors? The difference is that for everything we do, we rely on the software to be the source of truth. Our competitors frequently write “platform-dependent” features whose CLI commands directly program an ASIC or line card, with no software layer in between. This means that without hardware (as in a VM), the feature doesn’t work at all. Have you ever used GNS3 and found out you couldn’t enable a VLAN, an ACL or a VXLAN tunnel? This is exactly why. A VM without the features you are running in production isn’t very useful, now is it?
By relying on the software as the source of truth, any feature that works on a switch will work exactly the same in Cumulus VX. Furthermore, we can use standard Linux techniques to map virtual interface names to exactly what is cabled in production. With our open source topology converter tool, we can produce a virtual environment that is an exact replica of your physical switch no matter which ports are in use, even if you skip ports.
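As a rough illustration of the idea, the Python sketch below reads a cabling description and works out which interface names each virtual device needs. The link syntax is an assumption (a simple DOT-style `"device":"port" -- "device":"port"` line), and the device names, port names and `parse_links` helper are hypothetical; the real topology converter may expect something different and does far more than this.

```python
import re
from collections import defaultdict

# Assumed link syntax: "deviceA":"portA" -- "deviceB":"portB"
LINK_RE = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')


def parse_links(text):
    """Return {device: [interface names in use]} from a cabling description."""
    ports = defaultdict(list)
    for line in text.splitlines():
        match = LINK_RE.search(line)
        if not match:
            continue  # skip blank lines or anything that is not a link
        dev_a, port_a, dev_b, port_b = match.groups()
        ports[dev_a].append(port_a)
        ports[dev_b].append(port_b)
    return dict(ports)


# Hypothetical cabling: leaf01 uses swp1 and swp51, skipping everything in between.
TOPOLOGY = '''
"leaf01":"swp1" -- "server01":"eth1"
"leaf01":"swp51" -- "spine01":"swp1"
"leaf02":"swp51" -- "spine01":"swp2"
'''

ports = parse_links(TOPOLOGY)
for device in sorted(ports):
    print(device, ports[device])
```

The point is that the virtual leaf01 ends up with interfaces named swp1 and swp51, just like the physical one, rather than a generic eth0 and eth1, so production configurations can be applied unchanged.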
With the ability to build a system of hundreds of virtual switches, cabled exactly as you would cable them in production and running the exact same software features you have in production, the possibilities are endless. We’ve shown how customers can build automated continuous integration/continuous delivery (CI/CD) pipelines so that any proposed network change can automatically be validated against a set of user-defined tests. Some of our customers have even shown off what they are doing. We’ve made this even easier for them with Cumulus NetQ; by leveraging the “netq check” commands, customers don’t even need to manually write test suites as part of their testing pipeline. Imagine replacing hundreds of lines of Python code with a simple “netq check bgp” to see if BGP is running correctly on every device, whether there are 4 switches or 400.
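For a sense of what those hand-written tests look like, here is a minimal Python sketch of a BGP check a pipeline might run against the virtual network. Everything in it is assumed for illustration: the `failed_bgp_sessions` helper, the shape of the `sessions` dictionary and the sample data are hypothetical, and the code that would actually collect session state from each device is left out entirely. It is not a description of how NetQ works.

```python
def failed_bgp_sessions(sessions):
    """Return (device, peer, state) for every session that is not Established.

    `sessions` is a hypothetical {device: {peer: state}} dictionary that a
    pipeline would have to gather from every device before calling this.
    """
    failures = []
    for device in sorted(sessions):
        for peer, state in sorted(sessions[device].items()):
            if state != "Established":
                failures.append((device, peer, state))
    return failures


# Illustrative data only; a real test would collect this from each switch.
sample = {
    "leaf01": {"spine01": "Established", "spine02": "Established"},
    "leaf02": {"spine01": "Established", "spine02": "Idle"},
}

failures = failed_bgp_sessions(sample)
for device, peer, state in failures:
    print(f"{device}: session to {peer} is {state}")
```

Even this small check hides the hard part, which is gathering and parsing the state from every device, and it covers only one protocol. That collection and parsing work is what grows into hundreds of lines.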
If you want to know how you can do in weeks what took a Microsoft research team two years, reach out to your friendly neighborhood sales team to learn more about Cumulus VX and NetQ.
This blog post is part of a series called “NetDevOpEd” where various Cumulus employees and partners write an “op-ed” style piece on an industry topic.