In the beginning, there were switches. And connected to these switches were servers, routers and other pieces of gear. These devices ran one application, or at a stretch, multiple applications on the same operating system and thus IP stack. It was very much one-server-per-port; the SQL Server was always on port 0/8, and shutting down port 0/8 would affect only that machine.
This is no longer true, as network engineers well know. Physical hardware no longer dictates what, where, and how servers and other workloads exist. Cloud computing, multi-tenant virtual infrastructures and dynamically reallocated virtual resources mean that one port can cover 20 or 200 servers. Conversely, link aggregation and similar multi-link techniques mean that one server can have fault-tolerant aggregated links across one, five or 50 ports.
A new way of looking at switching is required: as a logical, rather than physical, topology. In this view, switches aren’t so much pieces of the network architecture themselves, but simply collections of ports that can be used to set up much more complex logical links. This article will focus on two main concepts: switching protocols such as STP (which ensure that an accidental or deliberate physical loop in the switching fabric doesn’t degrade the whole network) and routing protocols (which allow better utilization of otherwise underutilized links).
Topology management: spanning tree protocol—use those cables!
Switches are growing: not only in speed, but in port density and complexity. Because of this, careful management of the layer 2 logical topology needs to be considered. Would link aggregation for speed or redundancy be better for a media server? Should multiple links be created between two switches or should a high-bandwidth single link be used? These questions are important, and so, too, is the management of the complex links created by answering them.
Spanning Tree Protocol (STP) is an important protocol for the management and integrity of these links. It prevents forwarding loops, and the broadcast storms they cause, along with other layer 2 switching problems that are increasingly hard to avoid in large datacenter-scale networks.
Without STP (or a similar protocol), a network engineer who connects two switches with two cables, believing they have created a redundant link, could very easily create a highly damaging broadcast storm, taking down a whole network and putting significant stress on the infrastructure components. With STP, however, the switches themselves detect the loop, block the redundant path, and adapt to changing network topologies as routers, servers and application workloads move between hosts.
STP is an older protocol and has some drawbacks, such as the amount of time it takes for all switches on the network to become aware of a change in topology. There have been several protocols developed to try to address these deficiencies. TRILL (Transparent Interconnection of Lots of Links) is one example. However, TRILL wasn’t widely adopted, due to a number of industry factors. Most of today’s networks are still using STP.
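To make the idea concrete, here is a minimal sketch of what a spanning tree does with a looped topology. It is a deliberate simplification and not the real protocol: actual STP is distributed, exchanges BPDUs, and weighs path costs and port IDs, whereas this sketch assumes a global view, equal-cost links, and hypothetical switch names and bridge IDs. It elects the switch with the lowest bridge ID as root, keeps one loop-free set of links forwarding, and blocks the rest.

```python
from collections import deque


def spanning_tree(bridge_ids, links):
    """Return (root, forwarding_links, blocked_links) for a switch topology.

    bridge_ids: dict of switch name -> numeric bridge ID (hypothetical values).
    links: list of (switch_a, switch_b) tuples; a second cable between the
           same pair of switches simply appears as a second entry.
    """
    # The switch with the lowest bridge ID becomes the root.
    root = min(bridge_ids, key=bridge_ids.get)

    # Adjacency list that remembers which link index reaches each neighbor.
    neighbors = {name: [] for name in bridge_ids}
    for index, (a, b) in enumerate(links):
        neighbors[a].append((b, index))
        neighbors[b].append((a, index))

    # Breadth-first search from the root: the first link that reaches a
    # switch stays forwarding; any later link to that switch would form a loop.
    reached = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbor, index in neighbors[current]:
            if neighbor not in reached:
                reached.add(neighbor)
                forwarding.add(index)
                queue.append(neighbor)

    forwarding_links = [links[i] for i in sorted(forwarding)]
    blocked_links = [links[i] for i in range(len(links)) if i not in forwarding]
    return root, forwarding_links, blocked_links


# Hypothetical topology: two cables between sw1 and sw2, plus a triangle.
bridge_ids = {"sw1": 32768, "sw2": 4096, "sw3": 32769}
links = [("sw1", "sw2"), ("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3")]

root, forwarding, blocked = spanning_tree(bridge_ids, links)
print("root:", root)
print("forwarding:", forwarding)
print("blocked:", blocked)
```

In this example the second sw1-to-sw2 cable and the sw1-to-sw3 link end up blocked, so the "redundant" cabling stays physically in place as a standby path without forming a loop.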
Dynamic routing: BGP, OSPF and so on—use those links!
The second problem to solve now is how to use all the links—redundant, aggregated and regular—that your layer 2 protocols have provided you with. In the old days, static routes would be the name of the game. But in today’s world of virtual machines (VMs) bouncing between hosts and hosts changing their uses on the fly, that simply will not do.
Enter dynamic routing, where hosts and routers inform each other of the changing status of the layer 3 network. Did a high-bandwidth link just open up in the same subnet as media-server.example.com? Tell that host about it. Did the primary link for crucial-server.example.org go down? Send it a backup route.
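The failover case can be sketched in a few lines. The example below is an illustration only, assuming a link-state style of routing (the approach OSPF is based on) in which every link has a cost and the lowest-cost path is recomputed with Dijkstra's algorithm whenever the topology changes. The node names, link costs, and the helper functions `best_paths` and `path_to` are hypothetical and not taken from any real router implementation.

```python
import heapq


def best_paths(links, source):
    """Dijkstra: lowest total cost and previous hop from source to each node.

    links: dict of (node_a, node_b) -> cost, treated as bidirectional.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    cost_to = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > cost_to.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < cost_to.get(neighbor, float("inf")):
                cost_to[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return cost_to, previous


def path_to(previous, source, target):
    """Rebuild the path from the previous-hop map, or None if unreachable."""
    if target != source and target not in previous:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical topology: a cheap primary path and a costlier backup path.
links = {
    ("edge", "core-a"): 1,
    ("core-a", "crucial-server"): 1,
    ("edge", "core-b"): 2,
    ("core-b", "crucial-server"): 2,
}

_, previous = best_paths(links, "edge")
primary = path_to(previous, "edge", "crucial-server")

# The primary link to the server goes down; recompute on what is left.
surviving = {pair: cost for pair, cost in links.items()
             if pair != ("core-a", "crucial-server")}
_, previous = best_paths(surviving, "edge")
backup = path_to(previous, "edge", "crucial-server")

print("primary:", primary)
print("backup:", backup)
```

With static routes, an administrator would have to make the second calculation by hand after noticing the outage. A dynamic routing protocol performs the equivalent recomputation as soon as the link-down event is advertised.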
As datacenters grow and virtual workloads become more complex, dynamic routing becomes correspondingly more important. Hundreds of VMs might be in an average datacenter, and the network has to pick the optimal route for every packet between them, taking into account security, speed, latency, and other important policies and criteria.
BGP, OSPF and other dynamic routing protocols have been used by hyperscalers for years. As with any technology, what was once hyperscale can quickly become an everyday data center. The technology that was once the domain of the hyperscaler then becomes a requirement for the average business, no matter what its focus.
Beyond the basics
Modern networks cannot survive without backup links between infrastructure components, nor can they survive without careful management of routing. The reason new protocols are continually developed to replace their predecessors is that older protocols don’t scale adequately, and this is the state faced by many organizations today.
While STP is “good enough” for smaller organizations today, it’s already inadequate for large enterprises, and the number of organizations seeking alternatives is growing. Similarly, OSPF is giving way to BGP in today’s data centers, with larger data centers pushing even the boundaries of BGP’s capabilities.
These protocols were developed during an era in which each networking device was effectively an island; each device maintained its own route lists, managed its own links, and generally acted independently of all other networking devices. Protocols such as BGP served to exchange information between networking devices about what each device was doing. However, decisions about how to act upon that information were independently calculated and applied by each device.
Software-defined networking (SDN) was to be an answer to this. By separating the control plane from the data (or forwarding) plane, decisions could be made about the entire network fabric at once, speeding reconvergence after a link failure or route change.
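The "island" model can be illustrated with a small sketch. It assumes a plain distance-vector exchange (Bellman-Ford style), which is simpler than BGP: BGP is a path-vector protocol with policy controls that are not modeled here. The `Router` class, router names, and link costs are hypothetical. The point is only that each device receives advertisements from its neighbors and then decides for itself whether to change its own route table.

```python
class Router:
    """A hypothetical device with its own table: destination -> (cost, next_hop)."""

    def __init__(self, name):
        self.name = name
        self.routes = {name: (0, name)}

    def advertise(self):
        """Share destinations and costs only; receivers make their own decisions."""
        return {dest: cost for dest, (cost, _) in self.routes.items()}

    def receive(self, neighbor, link_cost, advertisement):
        """Apply a neighbor's advertisement locally. Returns True if a route changed."""
        changed = False
        for dest, cost in advertisement.items():
            candidate = cost + link_cost
            current_cost, _ = self.routes.get(dest, (float("inf"), None))
            if candidate < current_cost:
                self.routes[dest] = (candidate, neighbor)
                changed = True
        return changed


# Hypothetical topology: r1 reaches r3 directly (cost 5) or via r2 (cost 1 + 1).
routers = {name: Router(name) for name in ("r1", "r2", "r3")}
links = [("r1", "r2", 1), ("r2", "r3", 1), ("r1", "r3", 5)]

# Keep exchanging advertisements over every link until no table changes.
changed = True
rounds = 0
while changed:
    changed = False
    rounds += 1
    for a, b, cost in links:
        if routers[b].receive(a, cost, routers[a].advertise()):
            changed = True
        if routers[a].receive(b, cost, routers[b].advertise()):
            changed = True

for name, router in routers.items():
    print(name, router.routes)
print("rounds until stable:", rounds)
```

No device ever sees the whole topology here. The network only settles after several rounds of exchange, which is one way to picture why convergence takes time when every device calculates on its own.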
SDN adoption has been far from universal. However, it highlighted the importance of centralized management, even for networks that rely on traditional protocols, such as STP and OSPF. The problem that everyone is ultimately trying to solve here is how to cope with change.
The faster you need to adapt to change, the more burdensome older protocols become. Even without adopting SDN’s full separation of the control and data planes, whole-network visibility is a powerful tool for planning changes and for ensuring that outages are handled efficiently.
For most modern organizations, networks are business-critical. Using dynamic protocols to handle link management and routing is a necessity. So, too, however, is centralized management, with the importance of the latter only set to grow in the near future. Check out the Cumulus best-in-class network operations solution, NetQ, to see how Cumulus can help you solve agility and scaling problems on your network today.