The Internet is big. Moreover, the Internet is bigger now than when that first sentence was written, and keeps increasing in size. The growth of the Internet from its humble beginnings as a DARPA research project was unprecedented and almost entirely unexpected. This—as well as the widespread usage of older routers and switches as crucial connection points in the Internet—has resulted in real-world scaling issues.
One of these issues, known commonly as “512K day,” occurred on Aug. 12, 2014. On that day, Verizon, a large United States-based Internet provider and Internet exchange point (IXP), announced an extra 15,000 routes to the global BGP routing table. As these routes propagated across the network, they were accepted by some routers: the ones that had new firmware, or that were configured to store only a subset of the global routing table.
But in other routers, this additional route load overran the 512,000 route maximum expected by the firmware designers, causing widespread Internet outages and degradation of service. In many cases, the issue was resolved quickly, but not as quickly as it could have been. Proprietary vendors were required to push out firmware updates for hundreds of router and switch models—a process that can take months. Not all devices were still within support; no doubt at least some of those routers were running outdated firmware simply due to a lack of funding to purchase the support contracts required to update them.
That full-up feeling, all over again
In many cases, the firmware was deployed onto routers with limited RAM and/or flash memory. Many routers for Internet backbone links are older models from the early or mid-2000s, with consequently limited storage space. Because of this, many vendors chose to raise the routing table limit by only 256k, to 768,000 routes.
The result was a repeat of 512k day in the form of 768k day: the Band-Aid patch required yet another Band-Aid, because the underlying problem had not actually been addressed. Per a report from ZDNet, routing table tracking firms say that the Internet routing tables should already have reached that 768k limit in 2019, though this is far from certain. Many routers may not yet have had to be patched for 768k day due to careful filtering of routes by providers looking to extend the life of their existing equipment.
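To make the arithmetic concrete, here is a minimal Python sketch of how an operator might estimate the room left before a table limit is hit. The function names and the sample inputs are hypothetical, chosen only for illustration, and the limits are the round figures quoted in this post rather than any vendor's exact value.

```python
import math

# Limits as quoted in this post; exact vendor limits may differ.
LIMIT_512K = 512_000
LIMIT_768K = 768_000


def headroom(route_count: int, limit: int) -> int:
    """Routes that can still be added before the limit is reached (never negative)."""
    return max(limit - route_count, 0)


def days_until_full(route_count: int, limit: int, routes_per_day: float) -> float:
    """Days until the table reaches the limit at a constant growth rate.

    Returns 0.0 if the table is already at or over the limit, and
    infinity if the table is not growing.
    """
    if route_count >= limit:
        return 0.0
    if routes_per_day <= 0:
        return math.inf
    return (limit - route_count) / routes_per_day


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements.
    current_routes = 700_000
    growth_per_day = 150.0
    print("headroom:", headroom(current_routes, LIMIT_768K))
    print("days left:", round(days_until_full(current_routes, LIMIT_768K, growth_per_day)))
```

A constant growth rate is a simplification. As 512k day showed, a single burst of new routes can use up the remaining headroom in one step.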
These recurring issues are a natural consequence of the Internet’s inexorable growth, driven by the Internet of Things (IoT) devices already online and the billions more expected to come online over the next few years. This post will look at how you can prepare for future iterations of this issue by configuring switches and routers to handle the growth intelligently, and by keeping hardware up to date, which is an easy thing to do with Cumulus-based gear.
Escaping Groundhog Day
The first (and most obvious) thing for a network administrator to do is make sure all switches and routers are up to date. Upgrading networking devices that use Cumulus Linux is straightforward. However, that’s not always the case for other vendors’ network devices, especially when the networking equipment, or the devices attached to it, aren’t clustered or using multipath for redundancy.
One of the most important things to do right away is to increase the amount of memory allocated to the BGP routing table across all your devices. Of course, admins need to take into account how much RAM is used for other services, though the BGP routing table is not, by modern standards, large. However, this tweak is really only a temporary fix, and the underlying problem is exacerbated by IPv6.
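The trade-off can be sketched in a few lines of Python. This is an illustration under loudly stated assumptions, not a description of any particular platform: it assumes a fixed pool of forwarding-table slots shared between IPv4 and IPv6, with each IPv6 route costing more slots than an IPv4 route. The pool size, the per-route cost, and the function name are all placeholders.

```python
# ASSUMPTION: a fixed pool of forwarding-table slots is shared by IPv4 and
# IPv6, and each IPv6 route costs more slots than an IPv4 route. Both
# numbers below are placeholders; check your platform's documentation.
TOTAL_SLOTS = 1_000_000
SLOTS_PER_IPV6_ROUTE = 2


def ipv6_capacity(ipv4_routes_reserved: int,
                  total_slots: int = TOTAL_SLOTS,
                  slots_per_ipv6: int = SLOTS_PER_IPV6_ROUTE) -> int:
    """IPv6 routes that still fit after reserving room for IPv4 routes."""
    if ipv4_routes_reserved > total_slots:
        raise ValueError("IPv4 reservation exceeds the pool")
    return (total_slots - ipv4_routes_reserved) // slots_per_ipv6


if __name__ == "__main__":
    # Raising the IPv4 reservation shrinks what is left for IPv6.
    for reserved in (512_000, 768_000):
        print(reserved, "IPv4 routes leaves room for", ipv6_capacity(reserved), "IPv6 routes")
```

Under these assumed numbers, every increase in the IPv4 allocation comes directly out of the space available for IPv6, which is why a bigger allocation only buys time.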
At its core, this problem is about more routes existing in the BGP table than existing devices were ever expected to cope with. While this problem is predominantly discussed in relation to the global IPv4 routing table (in large part because IPv4 address spaces are increasingly being broken into smaller segments due to scarcity), the IPv6 address space will inevitably run into this problem as well. In the case of IPv6, however, the driver won’t be scarcity-driven segmentation, but widespread adoption.
The most popular option is to filter routing tables, which prevents route bloat on older equipment. A common example of this approach is to stop carrying (that is, to reject) IPv4 /24s or other non-essential prefixes. Typically, traffic for the filtered prefixes is then directed to newer equipment or to an upstream transit provider, usually by way of a default route, so that comparatively more modern gear handles it. This is an excellent option if your gear is simply too old to handle any kind of enlarged routing table, and you don’t have (or cannot obtain) firmware updates for it.
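As a rough illustration of the idea, the following Python sketch filters a list of IPv4 prefixes by length and relies on a default route for everything that was dropped. It uses only the standard library `ipaddress` module. The function names, the `/23` cutoff, and the sample prefixes are assumptions made for this example. On a real router this logic would live in the routing policy (prefix lists or route maps), not in a script.

```python
import ipaddress

# The default route catches traffic for any prefix that was filtered out.
DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")


def filter_routes(prefixes, max_prefix_len=23):
    """Keep IPv4 prefixes no longer than max_prefix_len, plus a default route.

    With the (assumed) cutoff of 23, every /24 and longer prefix is rejected.
    """
    kept = {DEFAULT_ROUTE}
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net.version == 4 and net.prefixlen <= max_prefix_len:
            kept.add(net)
    return sorted(kept)


def lookup(table, address):
    """Longest-prefix match for an IPv4 address against the filtered table."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen)


if __name__ == "__main__":
    # Sample prefixes from private, benchmarking, and documentation ranges.
    received = ["10.0.0.0/8", "198.18.0.0/15", "192.0.2.0/24", "203.0.113.0/24"]
    table = filter_routes(received)
    print([str(net) for net in table])
    print(lookup(table, "10.1.2.3"))    # matched by a prefix that was kept
    print(lookup(table, "192.0.2.10"))  # its /24 was dropped, so the default route is used
```

The older device ends up with a smaller table, and destinations covered only by a rejected /24 still resolve through the default route toward a router that carries the full table.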
Filtering is a complex topic, and usually involves weighing the cost of simply replacing older equipment against the need to create hierarchies of routers in order to adequately route various packets. In most cases, however, each network device needn’t carry the full global routing table in order to perform its task.
While filtering is currently viewed by many primarily as a means to avoid service interruptions caused by routing tables filling up, this is ultimately likely to become common practice, especially as IPv6 routes have started to rapidly proliferate.
It’s impossible to determine how many routers worldwide were affected by 512k day or 768k day, or how many will be affected by future versions. Almost certainly there will continue to be issues; unpatched or misconfigured routers (especially old Cisco 6000 series routers) are still, even now, in widespread use.
The Internet is growing, as network administrators were rather rudely reminded in 2014, and it will continue to do so, especially with the current explosive growth of IoT devices and the build-out of the 5G networks to feed them. With the Internet growing this quickly, it’s no longer OK to have a 10-year-old Cisco 6500 at the edge of your network. Routers of that era simply do not scale, and never will.
Organizations need a fast, up-to-date, and open source solution to the problem of connecting to the fast, up-to-date, and open source Internet they now face. If left unpatched and ignored, this issue will keep happening. Is your organization ready?