Border Gateway Protocol (BGP) is one of the most important protocols on the internet. When it breaks, it can also be one of the most catastrophic.
As the internet grows ever larger and more complex, a well-configured BGP deployment is crucial to keeping everything running smoothly. When BGP is misconfigured, the consequences can be disastrous.
This blog provides a brief explanation of what BGP is, then dives into some common problems and pitfalls. We cannot go too deep into the intricacies of BGP, which can (and do) fill entire books. We can, however, give an overview of how Linux, with its mature open-source BGP daemons and in-depth monitoring, analysis, and control tools, can be used to alleviate some of these common issues.
What is BGP?
BGP is a path-vector routing protocol that runs over TCP (port 179) and exchanges routing information within and between autonomous systems (ASes), which are networks operated under a single routing policy. In large networks, BGP tells routers which paths a packet can take from site A to site B and, if a site or router goes down, how to reroute the packet so that it still reaches its destination.
Many routers, switches, and other network devices support BGP, but you will not find it in a typical home Wi-Fi router, which simply has no need for it. A home network sends all traffic leaving the LAN to a single static default gateway, and this is usually true of smaller businesses, too.
In larger organizations such as internet service providers (ISPs), however, BGP is crucial for exchanging routing information both between networks belonging to different ASes (external BGP, or eBGP) and within a single AS (internal BGP, or iBGP).
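To make the path-selection idea concrete, here is a minimal Python sketch of how a BGP speaker might choose among several announcements for the same prefix and fall back when one is withdrawn. It models only a few steps of the real decision process (highest local preference, then shortest AS path, then lowest neighbor identifier) plus AS-path loop prevention; real implementations evaluate more attributes. The AS numbers and addresses are hypothetical, taken from the documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

LOCAL_AS = 64500  # hypothetical AS number for "our" router


@dataclass(frozen=True)
class Route:
    prefix: str               # destination prefix, e.g. "198.51.100.0/24"
    as_path: Tuple[int, ...]  # ASes the announcement crossed, neighbor first, origin last
    peer_id: str              # BGP identifier of the neighbor that sent the route
    local_pref: int = 100     # higher is preferred; 100 is a common default


def acceptable(route: Route) -> bool:
    # Loop prevention: ignore any path that already contains our own AS.
    return LOCAL_AS not in route.as_path


def best_path(candidates: Sequence[Route]) -> Optional[Route]:
    """Pick a route using a few early steps of the BGP decision process."""
    usable = [r for r in candidates if acceptable(r)]
    if not usable:
        return None
    # Highest LOCAL_PREF, then shortest AS_PATH, then lowest neighbor identifier.
    return min(
        usable,
        key=lambda r: (-r.local_pref, len(r.as_path), ipaddress.IPv4Address(r.peer_id)),
    )


if __name__ == "__main__":
    routes = [
        Route("198.51.100.0/24", (64501, 64510), "192.0.2.1"),
        Route("198.51.100.0/24", (64502, 64503, 64510), "192.0.2.2"),
        Route("198.51.100.0/24", (64504, 64500, 64510), "192.0.2.3"),  # loops through us
    ]
    print(best_path(routes).peer_id)  # 192.0.2.1
    # The neighbor 192.0.2.1 goes down, so its route is withdrawn.
    routes = [r for r in routes if r.peer_id != "192.0.2.1"]
    print(best_path(routes).peer_id)  # 192.0.2.2
```

The withdrawal step is the "reroute" described above: nothing has to be reconfigured by hand, because the next-best announcement is already known.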
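The practical difference between the two session types shows up in how routes are passed on. The Python sketch below assumes a single hypothetical local AS and encodes two rules from the BGP specification (RFC 4271): a route sent to an external peer has the local AS prepended to its AS path, and a route learned from an internal peer is not re-advertised to other internal peers unless route reflection is in use. Everything else about real export policy is left out.

```python
from typing import Optional, Tuple

LOCAL_AS = 64500  # hypothetical AS number of this router


def session_type(peer_as: int) -> str:
    """A session is internal BGP if both ends share an AS, external otherwise."""
    return "iBGP" if peer_as == LOCAL_AS else "eBGP"


def export_path(as_path: Tuple[int, ...],
                learned_from_as: Optional[int],
                to_peer_as: int) -> Optional[Tuple[int, ...]]:
    """Return the AS_PATH to advertise to a neighbor, or None to withhold the route.

    learned_from_as is the AS of the neighbor the route came from,
    or None for a route this router originates itself.
    """
    if session_type(to_peer_as) == "eBGP":
        # Leaving our AS: prepend our own AS number.
        return (LOCAL_AS,) + as_path
    if learned_from_as == LOCAL_AS:
        # Learned over iBGP: not passed to other iBGP peers (no route reflection
        # modeled here), which is why plain iBGP needs a full mesh of sessions.
        return None
    # Staying inside our AS: the AS_PATH is passed on unchanged.
    return as_path


if __name__ == "__main__":
    print(export_path((64510,), 64510, 64501))  # (64500, 64510): eBGP-learned, sent to eBGP
    print(export_path((64510,), 64510, 64500))  # (64510,): eBGP-learned, sent to iBGP
    print(export_path((64510,), 64500, 64500))  # None: iBGP-learned, not sent to iBGP
```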
Common BGP problems
BGP is a complex protocol, and it has to be: it carries the routing information that holds the internet together. Without BGP, the major ISPs would need static routes to each other’s networks, and every change in network topology would require those routes to be updated by hand; at the scale of today’s internet, that is clearly impossible.
Like any complex protocol, though, BGP has its issues, ranging from minor configuration differences among vendors’ implementations to major, show-stopping errors such as the mass advertisement of incorrect routes.
For an example of the latter, look no further than the blackholing of youtube.com on February 24, 2008, by Pakistan Telecommunication Company Ltd. (PTCL), a Pakistani state-owned telecommunications company. In attempting to censor YouTube within Pakistan, PTCL announced a route for 208.65.153.0/24, part of YouTube’s address space, that led to a blackhole. The announcement was meant to stay inside Pakistan, but it leaked to PTCL’s upstream provider, PCCW, and propagated rapidly across the internet. Because the /24 was more specific than the 208.65.152.0/22 that YouTube itself announced, routers that received it preferred it and sent YouTube-bound traffic to PTCL, which dropped it in accordance with the censorship it was trying to implement. YouTube was unreachable from much of the internet for roughly two hours.
The PTCL case is an example of “BGP hijacking”: an accidental one, caused by a misconfiguration rather than an attack. Because BGP routers, by default, trust the announcements their peers send without verifying who is entitled to originate a prefix, bad routes like this one can spread rapidly across the internet.
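The reason the leaked route won is longest-prefix matching. BGP treats the /22 and the /24 as two separate routes, so both can end up in a router’s forwarding table, and the forwarding lookup picks the most specific prefix that contains the destination address. The Python sketch below uses the standard `ipaddress` module with a small, illustrative IPv4 table modeled on the incident (the default route is added for contrast); real routers use specialized lookup structures rather than a linear scan.

```python
import ipaddress

# Illustrative forwarding table modeled on the February 2008 incident.
ROUTES = {
    ipaddress.ip_network("208.65.152.0/22"): "AS36561 (YouTube)",
    ipaddress.ip_network("208.65.153.0/24"): "AS17557 (PTCL)",
    ipaddress.ip_network("0.0.0.0/0"): "default route",
}


def lookup(address: str) -> str:
    """Return the label of the most specific route containing an IPv4 address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return ROUTES[best]


if __name__ == "__main__":
    print(lookup("208.65.153.1"))  # AS17557 (PTCL): the /24 beats the /22
    print(lookup("208.65.154.1"))  # AS36561 (YouTube): only the /22 matches
```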
Another common BGP problem is route flapping, in which a route is advertised and withdrawn, advertised and withdrawn, repeatedly and in rapid succession. Flapping is most commonly caused by a software or hardware fault, such as an unstable link, and it can cause packet loss, as well as a flood of update processing on routers far beyond the source, as routing tables are repeatedly updated and reverted.
On older routers, route flapping can cause even more serious problems. Their limited processors may be unable to keep up with the constant stream of BGP updates, and a router bogged down this way can degrade routing both within its local network and beyond it.
How can these problems be solved?
All things considered, it might sound as if BGP is a free-for-all protocol, with no authentication, verification, or risk mitigation. There are, however, ways to alleviate much of this.
Linux has been used as a BGP router for many years. The kernel itself does not speak BGP; instead, routing daemons such as BIRD, Quagga, and FRR (a fork of Quagga) run the BGP sessions and install the routes they select into the kernel’s routing table. These daemons include route filtering and validation features, such as dampening for flapping routes, that let a correctly configured Linux-based router participate in BGP with a much lower risk of the kind of misconfiguration that takes down parts of the internet.
These daemons also let administrators define complex rules, filters, and checks, giving fine-grained control over exactly which routes are accepted and propagated, how often a route may be withdrawn and re-advertised before it is treated as flapping and suppressed, and what to do when a possible hijack is detected, among many other things.
For example, a Linux-based router can be configured to accept a specific prefix only when it is originated by the AS expected to hold it, and to reject the same prefix from any other AS. This ties a set of addresses to its rightful origin and helps prevent hijacks like the PTCL incident.
A firewall or attack-detection system can also work together with the BGP daemon. For example, when a distributed denial-of-service (DDoS) attack is detected, the daemon can announce a route that diverts traffic for the targeted addresses to a DDoS scrubbing center. The router then does not have to filter the attack traffic itself; all it has to do is get the BGP route out.
The daemons also expose their state for in-depth monitoring and alerting on BGP route changes, security violations, and more, giving the network administrator a real-time view of the BGP landscape and insight into what’s going on in the network.
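Flap dampening is usually penalty-based rather than a simple count (the scheme described in RFC 2439): each flap adds to a route’s penalty, the penalty decays exponentially over time, and the route is suppressed when the penalty crosses one threshold and reused once it decays below a lower one. The Python sketch below models that idea. The parameter values are assumptions chosen to resemble common defaults, a flap is modeled simply as a withdrawal, and the maximum suppression time that real implementations also enforce is omitted.

```python
# Hypothetical parameters, chosen to resemble common defaults.
HALF_LIFE = 15 * 60    # seconds for the penalty to decay by half
FLAP_PENALTY = 1000    # penalty added each time the route is withdrawn
SUPPRESS_LIMIT = 2000  # suppress the route once the penalty exceeds this
REUSE_LIMIT = 750      # reuse it once the penalty decays below this


class DampenedRoute:
    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0  # time in seconds of the last penalty update
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every HALF_LIFE seconds.
        self.penalty *= 0.5 ** ((now - self.last_update) / HALF_LIFE)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now: float) -> None:
        """Record a withdrawal of the route at time now."""
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        """Whether the route may be used and propagated at time now."""
        self._decay(now)
        return not self.suppressed


if __name__ == "__main__":
    route = DampenedRoute()
    for t in (0, 60, 120):  # three flaps, one minute apart
        route.flap(t)
    print(route.usable(120))   # False: penalty is about 2867, above 2000
    print(route.usable(1800))  # False: decayed to about 786, still above 750
    print(route.usable(1900))  # True: decayed below 750, route is reused
```

Two flaps a minute apart stay just under the suppress limit here; the third pushes the route over it, and it stays suppressed for roughly half an hour.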
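One standardized way to express which AS may originate which prefix is RPKI route origin validation (RFC 6811), which FRR and BIRD can both perform using data from an RPKI validator. The Python sketch below shows only the core classification, with a hard-coded, hypothetical list of route origin authorizations (ROAs) instead of real RPKI data: a route is valid if a covering ROA matches its origin AS and allows its prefix length, invalid if covering ROAs exist but none match, and not-found if no ROA covers it. The example ROA is illustrative only; RPKI had not been deployed in 2008.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Roa:
    prefix: Network   # address block the authorization covers
    max_length: int   # longest prefix length the origin may announce
    origin_as: int    # the only AS authorized to originate it


# Hypothetical ROA: YouTube's /22 may be originated only by AS36561, with no more-specifics.
ROAS = [Roa(ipaddress.ip_network("208.65.152.0/22"), 22, 36561)]


def validate(prefix: str, origin_as: int) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covering = [roa for roa in ROAS
                if roa.prefix.version == net.version and net.subnet_of(roa.prefix)]
    if not covering:
        return "not-found"
    for roa in covering:
        if roa.origin_as == origin_as and net.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


if __name__ == "__main__":
    print(validate("208.65.152.0/22", 36561))  # valid
    print(validate("208.65.153.0/24", 17557))  # invalid: wrong origin, too specific
    print(validate("198.51.100.0/24", 64501))  # not-found: no ROA covers it
```

A router that drops invalid routes would have rejected the leaked /24 while still accepting YouTube’s own /22.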
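The trigger logic itself can be small. The Python sketch below assumes a hypothetical detector that reports packets per second per destination address, a hypothetical threshold, and a hypothetical scrubbing-center next hop; for each destination over the threshold it produces a host route (/32) that a BGP daemon could then announce with that next hop. Handing the route to an actual daemon is left out, and in practice such host routes are usually only accepted inside your own network or by a mitigation provider, since prefixes longer than /24 are commonly filtered on the public internet.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List

SCRUBBING_NEXT_HOP = "192.0.2.1"  # hypothetical address of the scrubbing center
PPS_THRESHOLD = 1_000_000         # hypothetical packets-per-second trigger


@dataclass(frozen=True)
class Diversion:
    prefix: str    # route to announce, e.g. "203.0.113.10/32"
    next_hop: str  # where matching traffic should be sent


def diversions(pps_by_destination: Dict[str, int]) -> List[Diversion]:
    """Return one host-route diversion for each destination above the threshold."""
    result = []
    for dest, pps in sorted(pps_by_destination.items()):
        if pps > PPS_THRESHOLD:
            host = ipaddress.ip_network(dest)  # a bare address becomes a /32 network
            result.append(Diversion(str(host), SCRUBBING_NEXT_HOP))
    return result


if __name__ == "__main__":
    observed = {"203.0.113.10": 2_500_000, "203.0.113.20": 12_000}
    for d in diversions(observed):
        print(d.prefix, "via", d.next_hop)  # 203.0.113.10/32 via 192.0.2.1
```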
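As a simple example of the kind of alerting this enables, the Python sketch below compares two snapshots of a routing table, each a mapping from prefix to origin AS, and reports withdrawn prefixes, new prefixes, and origin changes. An unexpected origin change is one possible sign of a hijack, though it can also be legitimate. How the snapshots are collected (for example by parsing the daemon’s command-line output or from a BGP Monitoring Protocol feed) is left out, and the prefixes and AS numbers are hypothetical.

```python
from typing import Dict, List


def diff_routes(old: Dict[str, int], new: Dict[str, int]) -> List[str]:
    """Compare two {prefix: origin_as} snapshots and describe each change."""
    alerts = []
    for prefix in sorted(old.keys() - new.keys()):
        alerts.append(f"withdrawn {prefix}")
    for prefix in sorted(new.keys() - old.keys()):
        alerts.append(f"new {prefix} from AS{new[prefix]}")
    for prefix in sorted(old.keys() & new.keys()):
        if old[prefix] != new[prefix]:
            # Same prefix, different origin: worth a closer look as a possible hijack.
            alerts.append(f"origin change {prefix}: AS{old[prefix]} -> AS{new[prefix]}")
    return alerts


if __name__ == "__main__":
    before = {"198.51.100.0/24": 64501, "203.0.113.0/24": 64502}
    after = {"198.51.100.0/24": 64511, "192.0.2.0/24": 64503}
    for alert in diff_routes(before, after):
        print(alert)
```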
Complexity can work in your favor
The strength of Linux is in its fine-grained control of networking configurations. BGP is indeed a complex protocol that can potentially create internet-breaking problems. However, careful use of the powerful Linux BGP daemons will give administrators much more insight into their networks.
Learn more about BGP in the datacenter with this definitive guide by O’Reilly.