At Cumulus Networks, one of our main goals is to make networks accessible for all.

Networking, specifically in the data center, is in the throes of a major transformation. One of the fundamental problems with networking has been how networking devices are managed: networking is one of the last holdouts against a DevOps style of device management. It has slowly started to yield, allowing simple interface configuration, specifically Layer 2 configuration such as VLANs, link aggregation, and device names. Even IP address configuration is not supported in many of these models, let alone routing. Yet routing is a linchpin technology in modern data centers.

Cumulus Networks has been working closely with CFEngine to usher Layer 3 configuration into the DevOps fold. CFEngine already supported network interface configuration, and with this partnership with Cumulus Networks, it adds IP routing to its repertoire. We developed a model of IP routing and routing protocols with a focus on the data center, but extensible to other domains as well.

A Simpler Approach To Configuring Routing

At their core, routing protocols have three simple constructs to configure: whom they peer with, what they communicate to those peers, and lastly, in some cases, performance tuning.

For example, in the case of a link-state protocol such as OSPF, the peers are typically specified as the interfaces over which peering is to be established; what's communicated is the local link-state database along with the databases received from other peers; and performance tuning typically involves configuring the area of an interface and reducing the protocol timers to low values, allowing rapid convergence in the presence of soft failures. A similar case can be made for BGP.

In CFEngine, a simple 2×2 redundant routing block (see Figure 1) running OSPF might look as simple as this:

Figure 1


bundle agent main
{
  interfaces:

    "swp1"
      link_services => ospf_area("0");

    "swp2"
      link_services => ospf_area("0");
}

body routing_services control
{
  ospf_redistribute => { "kernel", "static" };

  # Map Linux hostnames to a router_id
  hostname_1:: ospf_router_id => "1.1.1.1";
  hostname_2:: ospf_router_id => "2.2.2.2";
  hostname_3:: ospf_router_id => "3.3.3.3";
  hostname_4:: ospf_router_id => "4.4.4.4";
}

The ability to know the intended state and to monitor the actual state, while using CFEngine's machine learning to trace performance characteristics, now drops out of the framework for free. With constructs such as unnumbered interfaces in OSPF, the bulk of the configuration lies in naming the boxes, a generic theme when managing namespaces according to promise theory. This pattern is supplemented by a basic template like this one, which can be kept in a standard library for re-use:

body link_services ospf_area(area)
{
  ospf_area => "$(area)";
  ospf_authentication_digest => "ABCDEFGHIJK";
  ospf_link_type => "point-to-point";
  ospf_hello_interval => "5";
}

The purpose of this very generic code is clearly not to micromanage details through continuous tweaking, but rather to make the problem go away as a managed service.

The House That Clos Built

Large data centers today use non-blocking topologies to scale traffic levels horizontally with commodity hardware, an architecture pioneered by Charles Clos in the 1950s. The highly regular Clos structures are well suited to patterns of promises. Dynamic change for application management can be handled by protocols like iBGP, without bringing new virtualization into the picture.

For example, repeating the approach for iBGP, one might imagine a similar data-driven pattern for a 2×5 tree (Figure 2):

Figure 2


bundle agent LeafSpine
{
  vars:

    # Generate the interface lists used on the routers

    "spine"  slist => expandrange("swp[1-5]", "1"); # points at the 5 leaf switches
    "leaves" slist => expandrange("swp[1-2]", "1"); # points at the 2 spine switches

    "net_adverts[leaf1]" slist => { "10.10.10.1/24", "10.10.20.1/24" };
    "net_adverts[leaf2]" slist => { "10.10.30.1/24", "2001:0DB9:0:f101::1/64" };
    "net_adverts[leaf3]" slist => { "192.168.1.0/24" };
    "net_adverts[leaf4]" slist => { "192.168.1.0/24" };
    "net_adverts[leaf5]" slist => { "192.168.1.0/24" };

    "router_id[spine1]" string => "2.0.0.1";
    "router_id[spine2]" string => "2.0.0.2";
    "router_id[leaf1]"  string => "1.0.0.1";
    "router_id[leaf2]"  string => "1.0.0.2";
    "router_id[leaf3]"  string => "1.0.0.3";
    "router_id[leaf4]"  string => "1.0.0.4";
    "router_id[leaf5]"  string => "1.0.0.5";

  interfaces:

    spine::

      "$(spine)"
        link_services => ibgp_reflector("server");

    leaves::

      "$(leaves)"
        link_services => ibgp_reflector("client");
}

With the data specified in an associative array, two generic promise patterns are sufficient to configure the 2×5 tree.
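The `ibgp_reflector` body plays the same role for iBGP that the `ospf_area` template plays for OSPF, but its definition is not shown in this example. By analogy with the OSPF template, a library version might be sketched as follows; the attribute names here are illustrative assumptions, not confirmed CFEngine syntax:

```
# Hypothetical companion to the ospf_area template above.
# Attribute names are assumptions for illustration only.
body link_services ibgp_reflector(role)
{
  # Spines act as route-reflector servers and leaves as clients,
  # so leaves need not maintain a full iBGP mesh with one another.
  bgp_session_type => "ibgp";
  bgp_reflector    => "$(role)";
}
```

With such a body kept in a standard library, the `spine::` and `leaves::` promises in the bundle above are enough to describe the peering roles of the whole fabric.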

Model-based for WebScale

Model-based configuration is not just a “nice to have”; it is a “must have” feature of a scalable architecture. The three challenges of today’s software stack remain: scale, complexity, and knowledge management. Brute force can handle the first of these, but going forward, a model-based approach is the only plausible option. What CFEngine adds is its usual promise of self-healing, convergent end-state management and a knowledge-oriented approach.

SDN, by any other name

The promise of SDN was to make networking more accessible by tailoring it to support the applications that use it, while bringing a more policy-based approach to management, with version control. This was the world that CFEngine brought to IT management — now often called ‘Infrastructure as Code’.

CFEngine’s research into Promise Theory has focused on leveraging patterns to create desired end-state technology with self-repairing capability, a so-called ‘executable documentation’ model. By declaring simple promises and relying on its configuration engine, one can go beyond simple automation to bring vertical and horizontal resilience to network structures.

By providing a native Linux interface, Cumulus Linux enabled CFEngine to develop its agent without requiring any special code to access the routing and interface configuration of a device running Cumulus Linux. Furthermore, since Cumulus Linux is just Linux, the agent could be developed completely independently of any involvement from Cumulus Networks. This is the essence of what distinguishes a modern network OS from a traditional black-box OS: letting users control how they interact with the system, and enabling applications and usage beyond the imaginations of the platform developers. In other words, being an enabler, not a gatekeeper.

By partnering with CFEngine to create a model for managing IP routing, Cumulus Networks has been working to take away several of the burdens that haunt network engineers. That kind of evolution in the management of network services is long overdue.