Thanks to the limitations of traditional networks, network operators are accustomed to doing everything manually and slowly. But they want to perform configuration, troubleshooting and upgrades faster and with fewer mistakes. They’re ready and willing to learn a new approach, but they want to know what their options are. More importantly, they want to do it right. The good news is, regardless of your organization’s specific goals, you can operationalize Cumulus Linux to meet those objectives faster and more consistently. This post will help you understand your options for developing agile, speed-of-business workflows for:

  • Configuration management
  • Backup and recovery
  • Troubleshooting

And if you’re looking for a deeper, more technical dive into how to implement these network operations, download this white paper.

Configuration management

Automation

The biggest disadvantage of manual configurations is that they simply don’t scale. Implementing BGP across dozens of switches is a copy-and-paste endeavor that’s time-consuming and prone to error. Not only that, checking that the configuration took effect and works as expected requires hop-by-hop verification in addition to testing route propagation and IP connectivity. That said, in a small network, there’s no shame in at least starting out doing everything by hand.

Cumulus Linux lets you use a configuration management platform such as Ansible, Puppet or Chef to make frequent, sweeping changes at scale. But more importantly, automation comes with an “undo” button that lets you revert those changes immediately and painlessly should you change your mind.
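To make the "at scale" point concrete, here is a minimal Python sketch that renders the same BGP change for every switch in an inventory instead of copying and pasting it by hand. The inventory, hostnames, AS numbers and port names are made up for illustration, and the net add bgp lines follow common NCLU usage, so check them against the documentation for your release. In practice a tool such as Ansible would play this role.

```python
from typing import Dict, List


def bgp_commands(asn: int, router_id: str, uplinks: List[str]) -> List[str]:
    """Build the NCLU commands that configure BGP on one switch."""
    commands = [
        f"net add bgp autonomous-system {asn}",
        f"net add bgp router-id {router_id}",
    ]
    # One BGP unnumbered session per uplink interface.
    for port in uplinks:
        commands.append(
            f"net add bgp neighbor {port} interface remote-as external"
        )
    # Nothing takes effect until the pending changes are committed.
    commands.append("net commit")
    return commands


# Hypothetical inventory: every value here is an assumption for the example.
INVENTORY = {
    "leaf01": {"asn": 65011, "router_id": "10.0.0.11", "uplinks": ["swp51", "swp52"]},
    "leaf02": {"asn": 65012, "router_id": "10.0.0.12", "uplinks": ["swp51", "swp52"]},
}


def render_all(inventory: Dict[str, dict]) -> Dict[str, List[str]]:
    """Render the command list for every switch in the inventory."""
    return {host: bgp_commands(**facts) for host, facts in inventory.items()}


if __name__ == "__main__":
    for host, commands in render_all(INVENTORY).items():
        print(f"# {host}")
        print("\n".join(commands))
```

Adding a switch then means adding one inventory entry rather than retyping a block of commands.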

By storing your configurations in a centralized repository using a version control system such as Git, you essentially keep a backup of all your past and current configurations. If you make a change and it fails, you can revert to a previous working configuration with the push of a button. Version control functions as a de facto backup. Even better, version control makes it easier to push your configurations to a test environment that mirrors your production network.

Network Command Line Utility (NCLU) vs. editing configuration files

Regardless of whether you choose an automated or manual approach, you also must decide how to get your configurations onto your devices. You have two options: editing flat configuration files, or the Network Command Line Utility (NCLU).

Editing configuration files by hand is prone to error, and there are no safety checks to ensure that the directives in your configuration files are valid. Even if you successfully push out a new configuration file, you won’t necessarily know something is wrong until you see symptoms of a broken network. You should test all changes in a simulated lab environment first.

Thankfully, editing flat files isn’t the only option. You can use the NCLU to handle this process behind the scenes. Rather than editing one file to change your IGP configurations, and another to change your network interface settings, you can use the NCLU to do it all. One big advantage of the NCLU is that it checks for typos and doesn’t accept invalid commands, in much the same way Cisco IOS rejects commands with missing parameters or invalid values.

You can invoke the NCLU in two ways: manually via the CLI, or through the NCLU Ansible module. For manual work, the net command lets you specify configuration commands directly at the CLI. For automation, Ansible ships with an NCLU module that lets you specify the same commands in your declarative code.

The NCLU also offers a rollback feature that lets you return to a previous configuration, regardless of whether that configuration was done manually or via automation. Issuing net show commit history shows you recent commits, which include both manual and automated changes.

Backup and recovery

Cumulus Linux is just Linux, so if you’re already backing up Linux hosts in your environment, adding your Cumulus Linux network devices to your regular backup processes is seamless. There are numerous network-specific folders and files you should back up, including both Linux system files and Cumulus-specific configuration files. Some of these include:

 /etc/network/
 /etc/cumulus/ports.conf
 /etc/cumulus/switchd.conf
 /etc/frr/

This list isn’t exhaustive, and you should see the Installation Management chapter of the Cumulus Linux User Guide for a full list.
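If you don’t already have a backup job to extend, a small script can bundle these paths into a single archive. The following is a minimal sketch, assuming Python 3 is available on the switch or backup host. The function name and archive location are hypothetical, and the path list is only the partial one given above.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List

# The partial list from this post; extend it from the User Guide.
BACKUP_PATHS = [
    "/etc/network/",
    "/etc/cumulus/ports.conf",
    "/etc/cumulus/switchd.conf",
    "/etc/frr/",
]


def create_backup(archive: str, paths: Iterable[str] = BACKUP_PATHS) -> List[str]:
    """Write a gzip-compressed tarball and return the paths that were archived."""
    archived = []
    with tarfile.open(archive, "w:gz") as tar:
        for raw in paths:
            path = Path(raw)
            if not path.exists():
                # Skip files that this particular switch doesn't have.
                continue
            # Store paths relative to / so the archive can be unpacked anywhere.
            tar.add(str(path), arcname=str(path).lstrip("/"))
            archived.append(raw)
    return archived
```

Calling create_backup("/tmp/leaf01-config.tar.gz") writes the archive locally. You still need to copy it off the switch for it to count as a backup.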

The NCLU can also print the running configuration as a list of net commands (net show configuration commands), which you can save to a file. Of course, remember to copy the file to a safe place, otherwise it’s not a backup! In addition to serving as a backup, it’s a great learning tool. You can just glance at the configuration commands to understand what the configuration does. And if you’re contemplating automation, having the net commands at your fingertips makes it a breeze to construct your automation playbooks. Check out the automated NCLU backup playbook to help you get started.

Troubleshooting

Network troubleshooting consists of three basic steps:

  • Isolating the problem
  • Implementing a fix
  • Verifying the fix resolves the problem

Cumulus Linux and Cumulus NetQ speed up the entire troubleshooting process. Cumulus NetQ maintains a graph of your network, not just as it is now, but as it was hours, days, and even weeks ago. This lets you identify not only what part of the network changed, but when it changed.

Isolating the problem

The first task is to rule the network in or out. Often, the network is presumed guilty until proven innocent, so most troubleshooting starts by ruling the network out as the culprit.

Once you determine the network is indeed the problem, you must determine where on the network the problem lies. Is it a network configuration error, switch operating system issue, carrier fault or hardware issue?

Cumulus Linux and Cumulus NetQ take the pain out of the troubleshooting process. You can use Cumulus NetQ to perform the bulk of your diagnostics from a single switch or management server in seconds.

Cumulus NetQ lets you pipe the output from one command to the netq resolve command to have it resolve the IP addresses of your switches to their hostnames. This gives you a powerful way to see your network topology without having to manually look up IP addresses. For example, if a switch named spine01 has failed OSPF neighbors, a glance at the resolved output can tell you whether spine01 still has layer 2 connectivity to them.
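To see why resolved output is easier to read, here is a minimal Python sketch of the general idea: replace every IPv4 address in a line of text with a known hostname. It is only an illustration and not how NetQ implements netq resolve. The function name, the address and the sample line are made up for the example.

```python
import re
from typing import Dict

# Matches dotted-quad IPv4 addresses; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def resolve_line(line: str, hostnames: Dict[str, str]) -> str:
    """Replace each known IP address in the line with its hostname."""
    # Unknown addresses are left untouched so no information is lost.
    return IPV4.sub(lambda m: hostnames.get(m.group(0), m.group(0)), line)


if __name__ == "__main__":
    # Hypothetical mapping of switch addresses to hostnames.
    known = {"10.0.0.21": "spine01"}
    print(resolve_line("neighbor 10.0.0.21 is down", known))
```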

Implementing a fix

Once you’ve narrowed the issue down to the configuration on a handful of switches, you can start to determine exactly what changed. Because Cumulus Linux keeps a record of every change made using the NCLU, figuring this out isn’t guesswork; it’s just a matter of using the net show commit last command. To undo changes, issue the net rollback last command. Then check the new configuration.

When you perform a rollback, the NCLU takes another snapshot. Even if the rollback doesn’t resolve the problem, you can roll back the rollback and start over. This gives you peace of mind that you’re not compounding the problem by making a bunch of unnecessary changes.
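If you want to script that review-then-undo step, the sketch below wraps the two NCLU commands mentioned above in Python. It assumes the script runs on the switch itself with permission to run net commands. The function names and the confirm hook are hypothetical, and the command runner is passed in so you can try the logic without touching a real device.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_on_switch(argv: List[str]) -> str:
    """Run a command locally and return its output; raises on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def review_and_rollback(
    run: Runner = run_on_switch,
    confirm: Callable[[str], bool] = lambda change: True,
) -> bool:
    """Show the last commit and roll it back only if confirm() approves."""
    last_change = run(["net", "show", "commit", "last"])
    if not confirm(last_change):
        # The operator decided the last commit isn't the culprit.
        return False
    run(["net", "rollback", "last"])
    return True
```

Passing a confirm function that prints the change and asks for a yes or no keeps a human in the loop before anything is undone.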

Verifying the fix

How you verify whether the problem is resolved depends on the nature of the problem. With intermittent problems, it’s a matter of waiting to see whether the problem recurs. With ongoing problems, you have to check whether the original symptoms are still occurring. But nothing’s more annoying than believing you’ve resolved a problem, only to have it show up again later. Cumulus NetQ can help you validate that the fixes you implemented really did have the effect you intended.
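For intermittent problems, "waiting and seeing" can itself be scripted. The following Python sketch repeats a health check several times and reports success only if every attempt passes. The check function is a placeholder you would supply, for example one that tests IP connectivity or inspects the output of a validation command. The function name, attempt count and interval are assumptions for the example.

```python
import time
from typing import Callable


def stays_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if check() passes on every one of the attempts."""
    for attempt in range(attempts):
        if not check():
            # A single failure means the problem has come back.
            return False
        if attempt < attempts - 1:
            # Wait between checks, but not after the final one.
            sleep(interval)
    return True
```

A longer observation window, meaning more attempts or a larger interval, gives more confidence for problems that recur rarely.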

The choice is yours

How you operationalize Cumulus Linux is really up to you. Instead of being locked into a set way of doing things as with traditional networks, you have the flexibility to form your own workflows and processes. Cumulus Linux will adapt to you! Traditional networks have conditioned us to think of the network as a decentralized collection of disparate devices that require a lot of individual attention. Thanks to Cumulus Linux and Cumulus NetQ, you can configure and troubleshoot the network as a cohesive unit from a single management server.