
The network infrastructure powering Faithlife has seen a massive transformation in the last eighteen months. We’re excited about these changes and the measurable impact they’ve had on our employees, our customers, and the products and features we’re able to offer. Sharing our solution seemed like a fun way to live our values and showcase work we think is pretty cool.

Philosophy

At Faithlife we value smart, versatile learners and automation over expensive vendor solutions. Smart, versatile learners don’t lose value when technology changes or the company changes direction, as vendor solutions often do. If we can use commodity hardware and open source software to replace expensive vendor solutions, we do.

Commodity hardware is generally re-configurable and reusable, and lets us treat our hardware like Lego bricks. Open source software allows us to see behind the curtain and work more easily with other existing tools. We’re empowered to fix our own issues by utilizing the talent we already employ, not just sit on our hands waiting for a vendor support engineer to help us out (though we do like to keep that option available when possible). Additionally, by combining commodity hardware with automation tools like Puppet, we’re able to stay nimble.

By staying nimble and leveraging in-house talent, Lego-brick hardware, and open source software, we’re able to save a considerable amount of cash. Saving cash on operational expenses enables us to make business decisions that would otherwise have been cost prohibitive. At Faithlife we have the IT problems of a large company with a small company budget.

Network Infrastructure

Not long ago we were exhausting the capacity of a variety of Cisco and F5 1Gb network gear. Bottlenecks were popping up left and right, packet loss was high, retransmits were through the roof, and changes to network hardware happened at a glacial pace. We were beyond the limits of 1Gb, our topology was problematic, and shortcuts were continually being taken to keep up with the demand of our sites and services. At the same time, we had just begun moving to Puppet and automating our server deployments, which meant server changes could easily outpace network changes. Additionally, the gear did not fit our hardware philosophy.

Fast forward to today: we are now running Cumulus Linux on Penguin Computing and Dell switches. With Cumulus Linux we get Linux on a switch, so we can manage switches like servers, and we can easily introduce switches to team members who would normally not know what to do with a traditional CLI-based switch. This solution has allowed us to scale projects and get the business where it needed to be.
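To give a feel for what "managing switches like servers" means: on Cumulus Linux, front-panel ports show up as ordinary Linux network interfaces and are configured through ordinary files and tools. A tiny illustration follows; the port name and address are made up, and we write to a temp file so the snippet runs on any Linux box, not just a switch:

```shell
# On Cumulus Linux, a switch port (e.g. swp1) is configured in
# /etc/network/interfaces exactly like a NIC on a Debian server.
# This example writes to /tmp so it is safe to run anywhere.
cat > /tmp/interfaces.example <<'EOF'
auto swp1
iface swp1
    address 10.0.0.1/31
EOF

# Everyday Linux tooling applies to switch config too:
grep -c swp1 /tmp/interfaces.example   # the stanza names swp1 on two lines
```

On a real switch, `ifup swp1` (or a reload of networking) would apply the stanza, and tools like `ip link` inspect the port just as they would a server interface.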

For a while now, we’ve been automating our servers, load balancers, and many other key pieces of our infrastructure with Puppet. We can now use the same model to automate our switches. This wasn’t possible before, since other vendors only allowed basic configuration through Puppet. In contrast, with Cumulus Linux we can provision a switch with a high level of customization. We’ve set Puppet up so that we can simply iterate through templates.
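As a rough sketch of that model (the class name, template path, and parameters here are hypothetical, not our actual manifests), a switch’s interface file can be managed with the same file-plus-template pattern we already use for servers:

```puppet
# Hypothetical sketch: render /etc/network/interfaces on a Cumulus switch
# from a template, iterating over a hash of ports passed in as data.
class network::interfaces (
  Hash $ports = {},  # e.g. { 'swp1' => { 'address' => '10.0.0.1/31' } }
) {
  file { '/etc/network/interfaces':
    ensure  => file,
    content => template('network/interfaces.erb'),  # loops over $ports
    notify  => Service['networking'],
  }

  service { 'networking':
    ensure => running,
  }
}
```

Because the switch is just another Puppet node, adding a port or changing an address becomes a data change rather than a manual CLI session.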

Another aspect of scaling comes from leveraging bare metal switches. We now use Penguin Arctica and Dell switches in our network. The savings on hardware, SFPs, and cables enabled us to redirect CapEx toward building more cabinets to scale our business.

We converted our data center topology to a modified spine-and-leaf, or “folded Clos,” design. A simplified layer 3 Clos fabric means we don’t have to deal with all the restrictions of layer 2. We isolated our layer 2 environment to a single cabinet, so we don’t have to worry about Spanning Tree issues or a lack of bandwidth. Instead, we can add capacity easily when more is needed, and hardware failures don’t impact us nearly as much.

We use OSPF to route traffic between cabinets. A pair of leaf switches is placed in each cabinet; they represent a layer 2 boundary and allow us to MLAG our servers for switch redundancy within that boundary. In addition, a pair of spine switches is placed in an end-of-row networking cabinet. We have multiple edge routers and firewalls connected to an area border router via OSPF, and the edge routers are connected to our ISPs via BGP.

Using Quagga on our switches means we can use the same OSPF and BGP configuration across leaves, spines, firewalls, and area border routers, starting from a common template for all of these devices. And since Cumulus Networks upstreams its code, we also get the same enhancements, such as IP unnumbered, across devices. IP unnumbered makes point-to-point links simpler: we don’t have to assign a /30 to every link and keep track of a separate IP address per link.
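To illustrate the IP unnumbered idea, here is a minimal Quagga OSPF stanza for an unnumbered point-to-point fabric link; the interface name and loopback address are made up for the example, not our production config. The fabric port borrows the loopback’s /32 instead of consuming a dedicated /30:

```
! Hypothetical sketch of OSPF with an unnumbered point-to-point link.
interface lo
 ip address 10.255.0.1/32
!
interface swp51
 ip address 10.255.0.1/32
 ip ospf network point-to-point
!
router ospf
 ospf router-id 10.255.0.1
 network 10.255.0.0/24 area 0.0.0.0
```

Because every fabric link reuses the loopback address, the same templated stanza works on any leaf or spine with only the router ID varying per device.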

Summary

Our network infrastructure now reaps the benefits of open software and our choice of hardware and software vendors, while enjoying the level of support normally provided by a huge company like Cisco. This meets our criteria for smart, versatile, and automated solutions. And we can now apply the same philosophy not only to servers and storage, but across our entire data center infrastructure.

Guest blog: Richard Kiene, Principal Engineer at Faithlife