OpenStack is the de facto open source orchestration standard for modern cloud infrastructure. The foundational components stitch together compute, storage and, of course, networking. Linked together, these components are used for both public and private clouds all around the world. Cumulus Networks naturally fits into this ecosystem, and Cumulus Linux is the universal underlay or enabler for such deployments.
Over the past two quarters, Cumulus Networks has shared solution guides for our 2.5.x releases. In this post we’re going to dive into how you can automate a proof-of-concept OpenStack deployment. For those who learn by watching, a recent video from the OpenStack Summit in Vancouver (May 2015) may be helpful; the presentation summarizes all of the behind-the-scenes tasks described below.
Our goal is to set up an end-to-end OpenStack deployment with the fewest interactive steps, making it as unattended as possible, and ideally taking no more than 20 minutes. The configuration scope includes all networking, server and storage components.
To facilitate a consistent architecture, we’ve imposed a few basic cabling and physical requirements. To make the PoC easy to implement, we assume no external Internet access is available — the entire solution is autonomous with all prerequisites present or cached.
For our first OpenStack solution (many future variants are planned), we chose to build a large MLAG topology. MLAG, or multi-chassis link aggregation, is a method of spanning a common layer 2 network, or single broadcast domain, over two or more switches. This is a popular technique for plugging hosts into multiple switches for high availability without requiring a routing protocol on the host. From the host’s perspective, it sees a single logical LAG, or link aggregation group (802.3ad), across two distinct switch ports. The design has been tested on simple two-switch and 6+ switch leaf-spine topologies. For deployments planned to grow beyond 10+ racks, or for a standards-based design, it’s worth evaluating overlays as a potential architecture.
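From the server side, the MLAG pair simply looks like one LACP partner. A minimal sketch of an Ubuntu ifupdown bond stanza illustrates the idea; the interface names and the use of DHCP here are assumptions for illustration, not the PoC’s actual host configuration:

```
# Hypothetical /etc/network/interfaces excerpt: one 802.3ad bond built from
# one link to each switch in the MLAG pair (interface names illustrative)
auto bond0
iface bond0 inet dhcp
    bond-slaves eth1 eth2
    bond-mode 802.3ad      # LACP; the MLAG pair appears as a single partner
    bond-miimon 100
    bond-lacp-rate 1
```

Because both switch ports belong to the same logical LAG, either switch can fail without the host losing connectivity.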
As mentioned before, we’re going to assume some basic cabling requirements — at least two links are required between each pair of MLAG switches. Separately, each server will receive a single unique link between each MLAG switch pair.
Automation across an out-of-band management network
Additionally, all servers and switches are cabled together on a common management/out-of-band network, which is used solely to bring up the solution in an unattended fashion.
All equipment is also assumed to be in a factory default or uninstalled state. For switches, this means ONIE is present but no network operating system is installed. On the server side, the BIOS is set to continue looping in PXE or network boot mode; internal hard drives aren’t specified in the boot order setting.
Once you have everything cabled up, but prior to powering on, insert a prepared USB thumb drive into the first switch, which we’re calling the “genesis” switch. The genesis switch only has a mated MLAG peer and downstream switches or servers; nothing upstream. Once you power on all of the equipment, the genesis switch will install Cumulus Linux off the USB stick and automatically run an initial shell script upon boot.
Under the covers
Cumulus Linux offers a flexible zero-touch provisioning (ZTP) system: scripts can be sourced from the management network or from a USB drive. In either case, the script can be written in any of the pre-installed scripting languages (Bash, Python, Ruby and Perl). On the genesis switch, the ZTP script installs a license, detects its MLAG peer, sets up the initial network configuration and configures the network-based provisioning environment.
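Since Python is one of the supported ZTP languages, the genesis script’s skeleton might look something like the sketch below. The license path, hostname-based genesis rule and helper names are assumptions for illustration; `cl-license` is the real license-install command on these releases, and the `CUMULUS-AUTOPROVISIONING` marker is the string the ZTP agent looks for in a provisioning script:

```python
#!/usr/bin/env python
# CUMULUS-AUTOPROVISIONING  <- marker string the Cumulus Linux ZTP agent expects
# Hypothetical sketch of a genesis-switch ZTP script. The license path and the
# genesis-detection rule below are illustrative assumptions, not the PoC's code.
import subprocess

def license_cmd(usb_mount="/mnt/usb"):
    """Build the cl-license invocation for a license staged on the USB stick."""
    return ["cl-license", "-i", usb_mount + "/license.txt"]

def is_genesis(hostname):
    """Toy rule: treat the switch named 'leaf01' as the genesis switch."""
    return hostname == "leaf01"

def main():
    subprocess.check_call(license_cmd())
    # ... detect the MLAG peer, write the network configuration, and (on the
    # genesis switch only) start the DHCP/TFTP/HTTP/NTP/Puppet services ...

if __name__ == "__main__":
    main()
```

The same script structure can run from USB or over the management network; only the source of the script differs.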
The provisioning environment comprises HTTP, DHCP, TFTP, NTP and Puppet daemons. All of these services are bound to eth0, the management interface of the genesis switch, so every other switch and server can boot and install from this network. The second switch boots off the provisioning network rather than a USB stick, but follows the same model: install a license key, determine its MLAG peer and bring up a complete network configuration.
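The post doesn’t show the daemon configurations, but the DHCP piece of the puzzle is essentially a standard PXE setup. As an illustration, an ISC dhcpd fragment pointing clients at the genesis switch could be templated as below; the subnet, address range and genesis IP are assumptions, while `next-server` and `filename` are the standard dhcpd options for network boot:

```python
# Hypothetical: render an ISC dhcpd fragment that sends PXE clients to the
# TFTP server on the genesis switch's eth0. All addresses are illustrative.
DHCPD_TEMPLATE = """subnet {subnet} netmask {netmask} {{
    range {range_start} {range_end};
    next-server {tftp_server};   # genesis switch eth0
    filename "pxelinux.0";       # conventional PXELINUX bootloader name
}}
"""

def render_dhcpd(subnet="192.168.0.0", netmask="255.255.255.0",
                 range_start="192.168.0.100", range_end="192.168.0.200",
                 tftp_server="192.168.0.1"):
    """Fill in the dhcpd.conf template for the provisioning network."""
    return DHCPD_TEMPLATE.format(subnet=subnet, netmask=netmask,
                                 range_start=range_start, range_end=range_end,
                                 tftp_server=tftp_server)
```

Everything downstream of the genesis switch, switches and servers alike, takes its boot instructions from this one network.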
Servers follow a common large-scale deployment model. The BIOS is left in a state to always PXE or network boot. During this initial provisioning phase, servers continue to loop, looking for an installation target, which the genesis switch offers once it’s fully operational. All of the servers install an Ubuntu long-term support (LTS) release. Ubuntu’s preseed support installs the OS with sensible defaults for packages, hard drive partitioning and other commonly prompted settings.
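A few representative preseed lines give a feel for how those prompts get answered. These directives are standard Debian-installer preseed keys, but the specific values are illustrative, not the PoC’s actual preseed file:

```
# Hypothetical preseed excerpt; a real deployment carries many more answers.
d-i partman-auto/method string regular
d-i partman/confirm boolean true
d-i pkgsel/include string openssh-server puppet
d-i passwd/user-fullname string OpenStack Admin
```

With Puppet pre-installed via the preseed, each server can check in with the provisioning environment as soon as it finishes its first boot.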
As with the switches, server installs are parallelized: the first server to request an installation receives the OpenStack controller duties, and all other servers are configured as compute nodes to host tenant VMs. Puppet automatically deploys all of the OpenStack packages and their dependencies, such as message queueing, databases and authentication credentials.
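That first-come-first-served role assignment boils down to very simple logic on the provisioning side. The post doesn’t show the actual mechanism, so the function below is a hypothetical sketch of the idea:

```python
# Hypothetical first-come-first-served role assignment: the first server to
# request an install becomes the OpenStack controller; the rest become
# compute nodes.
def assign_roles(install_requests):
    """Map hostnames, in arrival order, to OpenStack roles."""
    roles = {}
    for i, host in enumerate(install_requests):
        roles[host] = "controller" if i == 0 else "compute"
    return roles
```

For example, `assign_roles(["server01", "server02", "server03"])` returns `{"server01": "controller", "server02": "compute", "server03": "compute"}`. Since the assignment depends only on arrival order, no per-server configuration is needed up front.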
If you’re using x86-based switches, this complete process takes around 20 minutes. PowerPC-based switches take slightly longer due to slower flash write time during the initial Cumulus Linux install process.
Rocket Turtle as the automation enabler
The true power and beauty of this solution is its relevance outside of OpenStack, and even its ability to enforce different provisioning rules. For example, with this documented approach, which is loose on cabling enforcement, a partner or end customer could adjust it to meet their own architecture requirements, say at least four 40G-rated links between each MLAG pair. The same unattended approach could also be used for non-OpenStack deployments, such as a traditional L3 Clos or leaf-spine topology. Cumulus Linux shines when it comes to flexibility around automation, monitoring and custom tooling.
The power of server scripting, now available for the network!