A little background information first.

When a business requirement calls for tenancy, most solutions tend to lean toward VRFs. That is because VLANs require a distributed L2 environment, which brings spanning tree, MLAG and a whole glut of inefficient network control plane protocols along with it. Upleveling the infrastructure to L3 ends up requiring VRF technology to enforce tenancy.

Once you’ve settled on this feature as the solution to the business requirement, the next question is: how do I deploy VRFs at scale in a large distributed environment while minimizing the management burden and still enforcing tenancy in all the important parts of my network? Most conversations surrounding this question lead down two solution paths:

  1. VXLAN with EVPN
  2. VRF Lite

First, let’s define VXLAN with EVPN and VRF Lite.

VXLAN with EVPN leverages VRFs at every border and leaf switch, while all the intermediate devices (i.e. spines and super spines) only see the encapsulated VXLAN traffic, and hence need no VRF intelligence or visibility.
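As a hedged sketch of what that looks like (interface names, ASNs and the VNI below are all made up), a leaf maps each tenant VRF to an L3VNI and runs the EVPN address family, while a spine carries no VRF configuration at all — just underlay BGP relaying EVPN routes:

```
! leaf01 (FRR-style, illustrative): tenant VRF mapped to an L3VNI
vrf RED
 vni 104001
!
router bgp 65101
 neighbor swp51 interface remote-as external
 address-family l2vpn evpn
  neighbor swp51 activate
  advertise-all-vni
!
! spine01: no VRFs anywhere -- underlay BGP plus the EVPN address family
router bgp 65201
 neighbor swp1 interface remote-as external
 address-family l2vpn evpn
  neighbor swp1 activate
```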

A VRF Lite solution is fundamentally simpler since it has fewer moving parts. The thought of enabling the EVPN address family and encapsulating traffic into a VXLAN tunnel interface can feel daunting. As such, a VRF Lite solution uses only two technologies:

  1. VRFs that are local to a switch
  2. Subinterfaces with dynamic route peerings
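On a Cumulus-style leaf, those two building blocks look roughly like the sketch below; the interface names, VLAN tag, addressing and ASN are all illustrative:

```
# /etc/network/interfaces -- a locally defined VRF and a subinterface in it
auto RED
iface RED
    vrf-table auto

auto swp51.10
iface swp51.10
    vrf RED
    address 10.1.0.1/31

# frr.conf -- a dynamic route peering inside that VRF
router bgp 65101 vrf RED
 neighbor 10.1.0.0 remote-as external
```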

The biggest challenge with a VRF Lite solution is that the VRFs, though configured locally on the switches, must also be represented on the spines, making them globally significant there. Additionally, the VRF configuration must scale on several fronts. Every VRF requires:

  • a unique VLAN
  • a unique IP address
  • a unique subinterface
  • a unique route peering

This is necessary because VRFs are locally significant, so every resource needs an equivalent on every connected device. There is no way to pass VRF information between nodes, since a VRF Lite solution doesn’t rely on a unified control plane. Instead, it connects individual control planes together to enforce tenancy.
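To make the multiplication concrete, here is a hedged sketch of one spine-facing link once a second tenant exists. Every value is made up, and every additional VRF repeats the entire pattern on both ends of every link:

```
# one physical link, two tenants: each VRF needs its own VLAN tag,
# subinterface, IP address and BGP peering (spine side shown)
auto swp1.10
iface swp1.10
    vrf RED
    address 10.1.0.0/31

auto swp1.20
iface swp1.20
    vrf BLUE
    address 10.2.0.0/31

# frr.conf -- one BGP instance per VRF
router bgp 65201 vrf RED
 neighbor 10.1.0.1 remote-as external
!
router bgp 65201 vrf BLUE
 neighbor 10.2.0.1 remote-as external
```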

So, where does that leave us?

Though VRF Lite is simpler to configure, it’s significantly more complicated to manage at scale. Each piece of hardware may have its own unique limits on VLAN, subinterface and route scale, and the route count is now multiplied by the number of VRFs present.

Take, for example, 2 routes inside VRF RED and 2 routes inside VRF BLUE. When advertised into a VRF Lite fabric and relearned back by the switches, the four configured routes actually become eight programmed routes: VRF RED holds its two static routes plus two routes dynamically relearned from the fabric, and likewise for VRF BLUE. Depending on the deployment scale, a hidden limit on route scale may suddenly become exposed.
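The arithmetic from the example above can be sketched in a few lines; the VRF names and route counts are just the illustrative values from the example:

```python
# Illustrative sketch of route-table inflation in a VRF Lite fabric.
# Assumption: each VRF programs its locally configured routes plus the
# copies relearned back from the fabric over that VRF's peering.

static_routes = {"RED": 2, "BLUE": 2}  # made-up counts from the example

# local routes + relearned copies, per VRF
entries_per_vrf = {vrf: n * 2 for vrf, n in static_routes.items()}
total_entries = sum(entries_per_vrf.values())

print(entries_per_vrf)  # {'RED': 4, 'BLUE': 4}
print(total_entries)    # 8
```

Four routes of intent become eight entries of state, and the gap only widens as VRFs and prefixes are added.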

VXLAN with EVPN sidesteps this limitation by using route targets and route distinguishers to build a distributed control plane. This control plane information is propagated via BGP to ensure that any learned routes are programmed into the correct VRF. The biggest advantage of this solution is that VLANs and interfaces no longer need to scale with the number of VRFs. In Cumulus Linux, we make things easy by auto-generating the RD and RT information to ensure it matches across the infrastructure.
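For a sense of what that auto-generation saves you, here is a hedged FRR-style sketch of the equivalent manual configuration (router-id, ASN and VNI are illustrative); when these lines are omitted, the RD and RT are derived automatically and stay consistent across the fabric:

```
router bgp 65101
 address-family l2vpn evpn
  vni 104001
   ! auto-derived from the router-id and VNI index when omitted
   rd 10.0.0.11:2
   ! auto-derived as ASN:VNI when omitted
   route-target import 65101:104001
   route-target export 65101:104001
```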

So, which solution is better?

I’m biased. VRF Lite feels like a hack to me, so I’m going to lean toward VXLAN with EVPN. That’s even with its known limitations around scale, which I feel 99% of actual users will never hit. The management challenges of a distributed VRF Lite infrastructure are too heavy, since it requires the network operator to constantly adjust the configuration of the spines or super spines to accommodate changes on even just one leaf, rack or application.

This level of impact is too broad for my interests.

I’ve heard arguments that VXLAN with EVPN is still too new, but this technology has been in active production for more than 5 years. In technology years, that is a lifetime. The benefit of configuring the “core” of your network (i.e. spines and super spines) once and never having to touch it again far outweighs the broad impact that regular changes inflict on a network.

If this blog was interesting to you, be sure to check out our upcoming webinar with Dinesh Dutt on Wednesday, March 25th, 2020 at 9 AM Pacific. We’ll be talking about what’s new with EVPN, including:

  • When and how to deploy EVPN multicast replication
  • Underlay design considerations when using multicast replication
  • How EVPN multihoming simplifies host deployments
  • EVPN open source contributions and the future of open EVPN networks

Unable to join us on the 25th? Register anyway and we will send you the recording to watch on demand! Just fill out the form via the link here.