The continuous integration/continuous delivery (CI/CD) process is very popular in the DevOps industry. CI/CD creates a more agile software development environment, which provides benefits including faster delivery of applications. As a network engineer, are there any aspects of this I can benefit from to improve network operations and achieve the same goal: design and deploy an agile network that gives customers access to those applications as fast as they are deployed? After all, quick, reliable application delivery is only as fast as customers can access it.
This blog post outlines how treating infrastructure as code and implementing a CI/CD workflow can ease the life of a network engineer. It also describes how using Cumulus VX and Cumulus NetQ can simplify this process further.
What does “infrastructure as code” mean?
Generally, it means keeping all your network node configurations as code that you manage outside the nodes themselves. A program identifies each individual node and renders all the configurations for all the nodes in the network in one step. This also means all configuration changes happen in this code, and the code itself, not the engineer, accesses the nodes to deploy the configurations. Configuration deployment can be done automatically, or an engineer can invoke it during an outage window.
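As a rough illustration of rendering every node's configuration in one step, here is a minimal Python sketch. The inventory, hostnames, addresses, AS numbers and the configuration template are all hypothetical, and the rendered text is not real Cumulus Linux syntax; in practice a tool such as Ansible would fill this role.

```python
from string import Template

# Hypothetical inventory: one entry per node, managed outside the nodes.
INVENTORY = {
    "leaf01": {"asn": 65011, "loopback": "10.0.0.11"},
    "leaf02": {"asn": 65012, "loopback": "10.0.0.12"},
    "spine01": {"asn": 65020, "loopback": "10.0.0.21"},
}

# Illustrative template only; not actual switch configuration syntax.
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface lo\n"
    "  address $loopback/32\n"
    "router bgp $asn\n"
    "  router-id $loopback\n"
)


def render_all(inventory):
    """Render the configuration text for every node in a single pass."""
    return {
        host: CONFIG_TEMPLATE.substitute(hostname=host, **node_vars)
        for host, node_vars in inventory.items()
    }


configs = render_all(INVENTORY)
print(configs["leaf01"])
```

Changing a value in the inventory and rendering again updates every affected node, so the change lives in the code rather than on any one switch.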
Infrastructure as code can be implemented via home-made programs or DevOps/NetDevOps tools such as Ansible, Puppet or Chef. For example, Ansible allows us to represent all the switch and server configurations in the network as a piece of code. Ansible calls the piece of code a playbook. Ansible is also capable of deploying the configurations. Since Cumulus Linux is just Linux, we can use the same modules used by the servers for our switches, unifying the network.
Usually the CI/CD methodology is applied when using infrastructure as code.
What exactly is CI/CD?
CI/CD is a process that lets an engineer deliver incremental code changes swiftly and create reproducible, testable builds. A plethora of different approaches exist to implement CI/CD, but usually a developer checks out a code branch, makes a change to that branch and then merges it back into the master branch. The new change can be tested immediately via automation (the merge can trigger the validation, or the validation can happen at regular intervals). This process helps catch issues early, which saves time and allows developers to deliver their code to customers rapidly.
Some organizations go one step further by also enabling continuous deployment. Continuous deployment means that as soon as the code passes validation, it is automatically configured for delivery, deployed on the servers, and monitored, which saves time and effort.
Tools such as Jenkins with GitHub help expedite and automate the CI/CD workflow. We will cover more information on the workflow in a future post.
As a network engineer, why do I care about CI/CD?
As we move from more traditional, monolithic networks to agile web-scale networking, we need a more efficient and reliable way to make changes to the network. Network engineers typically access nodes individually and run CLI commands manually, since all the configs live on each network node. This old method can be tedious and error-prone, and it may require a longer outage window for changes.
Moving into the modern era, we can apply the same CI/CD principles that developers use to our configurations, which are now represented as code (e.g. an Ansible playbook). The code is placed in a version-controlled repository such as GitHub. If we need to make a network configuration change, we first check out a branch of the known good code. Then we make the required changes to the code and merge it back into the master branch. We can then validate the changes in a virtual environment before deploying them to the production network.
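The ordering described above, where production deployment happens only after validation in the virtual environment succeeds, can be sketched in Python as a simple gate. This is a minimal sketch under stated assumptions: the stage names and their pass/fail results are hypothetical placeholders, and a real workflow would be driven by a CI tool such as Jenkins rather than a script like this.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that were actually executed.
    """
    executed = []
    for name, step in stages:
        executed.append(name)
        if not step():
            break  # later stages, including production, never run
    return executed


# Hypothetical stages; each returns True on success.
def render_configs() -> bool:
    return True


def validate_in_virtual_network() -> bool:
    return False  # pretend validation caught a misconfiguration


def deploy_to_production() -> bool:
    return True


PIPELINE = [
    ("render_configs", render_configs),
    ("validate_in_virtual_network", validate_in_virtual_network),
    ("deploy_to_production", deploy_to_production),
]

executed = run_pipeline(PIPELINE)
print(executed)  # ['render_configs', 'validate_in_virtual_network']
```

Because the failed validation stops the run, the faulty change never reaches the production stage.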
Our new method greatly reduces potential downtime in the production network as seen below.
How can Cumulus VX and NetQ help simplify it even further?
CI/CD involves deployment and validation in a virtual environment prior to production deployment. The validation in a virtual environment greatly reduces the potential for network outage during a change. Cumulus VX is used to set up the virtual environment, as described in this blog post.
To perform automated validation, many engineers write a large number of test cases, which can be extremely tedious. NetQ, along with Cumulus VX, greatly simplifies the validation process in CI/CD because engineers no longer need to write the tests themselves. Let NetQ do the test writing for you!
NetQ validation integrates the switch and the host together, so the testing can be done end-to-end. NetQ even integrates containers into the environment.
For example, an engineer may need to write one test case to check all the MTUs in the network and another to check all the BGP peers. NetQ has these checks built in: a few commands below show some MTU mismatches in the network and a BGP peer that is down because of a down interface. Note that the NetQ commands can be run from anywhere in the virtual network or directly from your automation tool. The choice is yours.
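For contrast, the kind of hand-written test cases an engineer might otherwise maintain could look like the following minimal Python sketch. It is not NetQ and does not reflect NetQ output; the links, interface names, MTU values and BGP session states are a hypothetical snapshot that some collection step would have to gather first.

```python
# Hypothetical snapshot of network state gathered by some collection step.
LINKS = [
    (("leaf01", "swp51"), ("spine01", "swp1")),
    (("leaf02", "swp51"), ("spine01", "swp2")),
]
MTUS = {
    ("leaf01", "swp51"): 9216,
    ("spine01", "swp1"): 9216,
    ("leaf02", "swp51"): 1500,  # deliberately mismatched with its peer
    ("spine01", "swp2"): 9216,
}
BGP_SESSIONS = {
    ("leaf01", "spine01"): "Established",
    ("leaf02", "spine01"): "Idle",  # deliberately down
}


def find_mtu_mismatches(links, mtus):
    """Return every link whose two ends are configured with different MTUs."""
    return [(a, b) for a, b in links if mtus[a] != mtus[b]]


def find_down_bgp_peers(sessions):
    """Return every peering whose session state is not Established."""
    return sorted(pair for pair, state in sessions.items() if state != "Established")


print(find_mtu_mismatches(LINKS, MTUS))
print(find_down_bgp_peers(BGP_SESSIONS))
```

Even this small sketch needs its own data collection and upkeep for every new check, which is the work the built-in validation removes.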
NetQ show commands can also be used for validation. For example, with one simple netq show clag command, we can see the status of all the CLAG peers in the network.
We can also trace the paths from end-to-end to make sure all our ECMP paths are up and operational.
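To give a sense of what checking ECMP paths involves when done by hand, here is a minimal Python sketch that enumerates all equal-cost paths between two endpoints. It assumes a hypothetical leaf-spine topology and uses hop count as the cost; real ECMP behavior depends on the routing protocol's metrics, and this is not how NetQ performs its trace.

```python
from collections import deque

# Hypothetical topology as an adjacency list.
TOPOLOGY = {
    "server01": ["leaf01"],
    "leaf01": ["server01", "spine01", "spine02"],
    "spine01": ["leaf01", "leaf02"],
    "spine02": ["leaf01", "leaf02"],
    "leaf02": ["spine01", "spine02", "server02"],
    "server02": ["leaf02"],
}


def equal_cost_paths(graph, src, dst):
    """Return every shortest (fewest-hop) path from src to dst."""
    # Breadth-first search from src gives each node's hop count.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if dst not in dist:
        return []  # dst is unreachable

    # Walk forward only along edges that move exactly one hop farther
    # from src; any such walk that reaches dst is a shortest path.
    paths = []

    def extend(path):
        node = path[-1]
        if node == dst:
            paths.append(path)
            return
        for nbr in graph[node]:
            if dist[nbr] == dist[node] + 1 and dist[nbr] <= dist[dst]:
                extend(path + [nbr])

    extend([src])
    return paths


paths = equal_cost_paths(TOPOLOGY, "server01", "server02")
for path in paths:
    print(" -> ".join(path))
```

With two spines in the sketch topology, two paths are expected; fewer would suggest that a link or node along one of them is down.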
Using Cumulus VX and Cumulus NetQ for validation in the CI/CD workflow provides much faster, more robust network delivery, which results in more reliable access to the money-making applications and improves your company’s bottom line. As a network engineer, this also means more time for design and less outage time spent troubleshooting unforeseen errors.
How can I check this out for myself?
Try it out using Cumulus in the Cloud (CITC) in a blank slate environment. Cumulus in the Cloud provides an automatic VX environment to test configurations. An Ansible playbook to validate with is available on GitHub. After spinning up your CITC environment and downloading the playbook to it, test it out for yourself. The instructions are on the GitHub page. You can then insert a few errors into the playbook, such as an MTU change or an incorrect BGP peer. Run the playbook with the misconfigurations and use NetQ to find the errors. Fix the errors in the playbook, re-run it and watch the errors disappear from NetQ. You will see how easy it is to find configuration and deployment errors with one or two commands, which will save you time. This gives you more time for the fun stuff like designing and optimizing your network.