Businesses today have to get applications to market faster than ever, with the same budget or less. To meet this requirement, modern data centers are evolving to support a change in application delivery. To get applications to market faster and increase revenue, applications that were once built as one monolithic entity are being split into pieces that are deployed separately and communicate with each other. These pieces, sometimes referred to as microservices, are often deployed as containers, which results in much faster deployment and a quicker update cycle. However, the network teams operating the infrastructure that supports these applications often have no visibility into how their networks are being used, so they make design, operations and troubleshooting decisions blindly. Now, Cumulus NetQ provides this visibility from container deployments all the way to the spine switches and beyond, accelerating operations and providing the crucial information needed to efficiently design and operate networks running containers.
Understanding the challenges of container management
Unfortunately, this new way of designing and deploying applications with containers makes it very challenging to operate and manage the infrastructure that supports them. The containers often have to talk with each other within or across data centers, or to the outside world. An orchestration tool, such as Kubernetes, is often used to automatically manage container deployment and scaling (among other duties). The containers are also often ephemeral, automatically spun up and destroyed as needed on any server in the cluster. And since the containers live inside a host, they can be invisible to network engineers, who never know where they are located or when they are created and destroyed.
Operating a modern, agile data center is difficult when traffic patterns keep changing and you have no visibility into why. For example, in the scenario below, the containers on server01 need to talk with the other containers in the network. However, those containers were just redeployed across the data center, so the traffic pattern changed very quickly!
Now, it just got a lot faster and easier to operate a dynamic network like the one shown above. Cumulus NetQ 1.3 was just released with Kubernetes integration, in addition to the Docker Swarm integration introduced in NetQ 1.1. Kubernetes is the #1 container orchestrator on the market, deployed by 71 percent of enterprises, according to a 2017 study by 451 Research.
With NetQ 1.3, we have visibility into the network not only from the spine to the host and containers inside the host, but also into the Kubernetes API. This allows a unified view of a network running containers, accelerates operations of the modern data center and simplifies the network engineer’s life.
How does Cumulus NetQ integration with Kubernetes work?
A Kubernetes (k8s) cluster is managed by one or more cluster master nodes. The master node is the control plane of the cluster, which includes the API server, scheduler and controller manager. When the master launches a service or an application on the k8s cluster, the application’s containers are launched onto the worker nodes as k8s pods, based upon worker node availability. A pod is a group of containers that share a common namespace, resources and an IP address. IP address management for the pods and connectivity between the nodes in a cluster are handled by a container network interface (CNI) plug-in.
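To make the IP address management part a little more concrete, here is a minimal Python sketch of one common approach: carve a per-node pod subnet out of a cluster-wide range, then hand out addresses from the node's subnet. The 10.244.0.0/16 range, the /24-per-node split and both function names are my assumptions for illustration. Real CNI plug-ins differ in the details (many reserve the first address for a gateway, for example).

```python
import ipaddress

# Assumption: the cluster-wide pod range and the /24-per-node split are
# illustrative, not taken from the cluster in this post.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")


def assign_node_subnets(nodes, cluster_cidr=CLUSTER_CIDR, prefix=24):
    """Give each worker node its own pod subnet carved from the cluster range."""
    subnets = cluster_cidr.subnets(new_prefix=prefix)
    return {node: next(subnets) for node in nodes}


def allocate_pod_ip(subnet, used):
    """Return the first free host address in the node's subnet and mark it used."""
    for ip in subnet.hosts():
        if ip not in used:
            used.add(ip)
            return ip
    raise RuntimeError(f"no free addresses left in {subnet}")


node_subnets = assign_node_subnets(["server02", "server03", "server04"])
used_on_server02 = set()
first_ip = allocate_pod_ip(node_subnets["server02"], used_on_server02)
second_ip = allocate_pod_ip(node_subnets["server02"], used_on_server02)
print(node_subnets["server03"], first_ip, second_ip)
```

The useful property for a network engineer is that a pod's address tells you which node it lives on, as long as you know the node-to-subnet mapping.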
The master is responsible for launching an application and maintaining its state. When an application is launched on the cluster, the requested state (such as the number of replicas) is stored in the etcd server for persistence. When the master detects that the actual state doesn’t match the requested state (e.g., if a worker node is lost), new pods are deployed to match the requested state. As you can see, this is a very dynamic environment in which the network engineer has virtually no visibility, even though the pod locations greatly affect traffic patterns in a data center.
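The requested-versus-actual idea can be sketched in a few lines of Python. This is a toy model, not how the k8s controllers are actually implemented: the reconcile function, the least-loaded placement rule and the replacement pod names are all assumptions I made up for the example, and it only handles scaling back up after a loss.

```python
def reconcile(desired_replicas, pods, healthy_nodes):
    """Return a pod -> node mapping that matches the requested replica count.

    pods maps pod name -> node name. Pods on lost nodes are dropped, then
    replacements are placed on the least-loaded healthy node.
    """
    if not healthy_nodes:
        raise ValueError("no healthy worker nodes available")

    # Keep only the pods whose node is still healthy.
    running = {pod: node for pod, node in pods.items() if node in healthy_nodes}
    load = {node: 0 for node in healthy_nodes}
    for node in running.values():
        load[node] += 1

    counter = 0
    while len(running) < desired_replicas:
        name = f"apache1-new-{counter}"  # hypothetical naming scheme
        counter += 1
        target = min(sorted(load), key=load.get)  # least loaded, ties by name
        running[name] = target
        load[target] += 1
    return running


# Four replicas requested; server04 (hosting two of them) is lost.
before = {
    "apache1-a": "server02",
    "apache1-b": "server03",
    "apache1-c": "server04",
    "apache1-d": "server04",
}
after = reconcile(4, before, ["server02", "server03"])
print(after)
```

Notice that the replica count is restored, but the pods are now in different places. That relocation is exactly what changes the traffic pattern without the network team hearing about it.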
In addition to reading from all the switches and hosts in the network, NetQ can now directly access the k8s API server. NetQ taps into the k8s API server to view the health, status and connectivity of the nodes and the workloads in the cluster, including their networking characteristics. This makes viewing and troubleshooting networks with containers a whole lot faster and easier.
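To give a feel for the kind of information the k8s API exposes, here is a small Python sketch that takes a pod listing in the JSON shape the API server returns (a PodList, with fields such as metadata.name, spec.nodeName, status.podIP and status.phase) and groups a deployment's pods by node. The sample data, the "app" label and the pods_by_node helper are made up for illustration. This is not NetQ's code, just the general idea of turning API data into a "where are my pods" view.

```python
import json
from collections import defaultdict

# Assumption: a trimmed, made-up PodList in the shape the k8s API server
# returns when listing pods. Pod names, labels and addresses are illustrative.
POD_LIST_JSON = """
{
  "kind": "PodList",
  "items": [
    {"metadata": {"name": "apache1-aaa", "labels": {"app": "apache1"}},
     "spec": {"nodeName": "server02"},
     "status": {"phase": "Running", "podIP": "10.244.0.2"}},
    {"metadata": {"name": "apache1-bbb", "labels": {"app": "apache1"}},
     "spec": {"nodeName": "server03"},
     "status": {"phase": "Running", "podIP": "10.244.1.2"}},
    {"metadata": {"name": "apache1-ccc", "labels": {"app": "apache1"}},
     "spec": {"nodeName": "server03"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "nginx-ddd", "labels": {"app": "nginx"}},
     "spec": {"nodeName": "server04"},
     "status": {"phase": "Running", "podIP": "10.244.2.2"}}
  ]
}
"""


def pods_by_node(pod_list, label_key, label_value):
    """Group pods carrying the given label by the node they run on."""
    grouped = defaultdict(list)
    for pod in pod_list["items"]:
        labels = pod["metadata"].get("labels", {})
        if labels.get(label_key) != label_value:
            continue
        node = pod["spec"].get("nodeName", "<unscheduled>")
        status = pod.get("status", {})
        grouped[node].append(
            (pod["metadata"]["name"], status.get("podIP"), status.get("phase"))
        )
    return dict(grouped)


placement = pods_by_node(json.loads(POD_LIST_JSON), "app", "apache1")
for node, pods in sorted(placement.items()):
    print(node, pods)
```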
Show me some of what I can do with NetQ and k8s integration
Say, as a network engineer, you just heard from the application team that they plan to scale up the apache1 deployment. Demand for apache1 has just risen sharply, and it’s a real money-making application. You want to ensure there is enough bandwidth in the data center to accommodate the new demand, and you want to test connectivity between containers so the team coming in at 6am has no surprises. However, since pods are located within servers, the network team typically has no visibility into where the extra bandwidth may be needed!
Now, NetQ and k8s integration just accelerated network operations with containers. By tapping into the k8s API, NetQ can easily determine and display the new pod locations and how they are connected to the network.
So let’s start by viewing the cluster. In this case, we have four nodes: server01 is the master and servers 02-04 are the worker nodes. We can perform this command from any node (switch or server) in the network — I am showing it from the out-of-band management server (which is also the NetQ telemetry server).
We can see how many replicas exist now (before they scale it up).
Now the application team just scaled it up, and we can see the changes right in NetQ!
Next, let’s determine where these are located. Directly from NetQ, we can see the servers that host pods from this deployment, including all of their IP addresses and their statuses.
NetQ also allows you to go back in time. For example, we can even see what this looked like 20 minutes ago, where we had only 5 pods. The time machine debugging can be done for any command in NetQ to show how the information looked before.
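If you are curious how an "as of" lookup like this can work in principle, here is a minimal Python sketch: keep time-stamped snapshots and return the latest one taken at or before the requested time. The History class and the timestamps are assumptions for illustration only. I am not claiming this is how NetQ stores its data.

```python
import bisect


class History:
    """Keep time-stamped snapshots and answer 'what did this look like at time t?'"""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, timestamp, value):
        # Assumes snapshots are recorded in increasing timestamp order.
        self._times.append(timestamp)
        self._values.append(value)

    def as_of(self, timestamp):
        # Index of the last snapshot taken at or before the requested time.
        i = bisect.bisect_right(self._times, timestamp) - 1
        if i < 0:
            raise LookupError("no snapshot recorded that early")
        return self._values[i]


# Timestamps are minutes since an arbitrary start; values are replica counts.
replicas = History()
replicas.record(0, 5)    # deployment running with 5 pods
replicas.record(50, 10)  # application team scales up to 10

now = 60
print(replicas.as_of(now), replicas.as_of(now - 20))
```

With these made-up timestamps, the lookup for "now" returns 10 and the lookup for 20 minutes ago returns 5.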
Next, let’s check the connectivity of all the pods to the network.
We can see the location of all these pods, which server they are on and how they are connected to the top of rack switch. This tells us exactly how the newly deployed pods are accessed, thus allowing us to ensure there is enough bandwidth to reach that new money-making deployment!
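Once you know which server each pod is on and which top of rack switches that server connects to, working out where the new load lands is a simple join. Here is a rough Python sketch of that reasoning. The pod counts, the cabling and the pods_behind_each_leaf helper are all made up for the example and are not taken from the output above.

```python
from collections import Counter

# Assumptions: made-up pod placement and cabling, for illustration only.
pods_on_server = {"server02": 3, "server03": 3, "server04": 4}
uplinks = {
    "server02": ["leaf01", "leaf02"],
    "server03": ["leaf03", "leaf04"],
    "server04": ["leaf03", "leaf04"],
}


def pods_behind_each_leaf(pods_on_server, uplinks):
    """Count how many pods are reachable through each top of rack switch."""
    counts = Counter()
    for server, pod_count in pods_on_server.items():
        for leaf in uplinks.get(server, []):
            counts[leaf] += pod_count
    return dict(counts)


per_leaf = pods_behind_each_leaf(pods_on_server, uplinks)
for leaf, count in sorted(per_leaf.items()):
    print(leaf, count)
```

In this made-up layout, leaf03 and leaf04 front seven of the ten pods, so that is the rack whose uplinks I would look at first.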
Let’s also trace the path between 2 new containers. Using 2 IP addresses found in an earlier step, we can see the path these 2 containers could use to communicate with each other, along with the path MTU.
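A quick note on the path MTU: it is simply the smallest MTU of any link along the path, since that link limits the largest packet that can cross without fragmentation. A tiny Python sketch, with a made-up path and made-up MTU values:

```python
def path_mtu(hops):
    """Return the smallest link MTU along the path.

    hops is a list of (link description, MTU in bytes) tuples.
    """
    if not hops:
        raise ValueError("path has no hops")
    return min(mtu for _link, mtu in hops)


# Assumption: an illustrative server-to-server path through two leafs and a spine.
example_path = [
    ("server02 -> leaf01", 1500),
    ("leaf01 -> spine01", 9216),
    ("spine01 -> leaf03", 9216),
    ("leaf03 -> server04", 1500),
]
print(path_mtu(example_path))
```

Even with jumbo frames in the fabric, the host-facing links in this example cap the path at 1500 bytes.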
What if I need to swap out a leaf switch?
NetQ has you covered! NetQ also has the ability to predetermine what deployment impact you will get if you need to swap out some hardware. For example, say we still have the 10 pod instances of apache1 running, but we need to swap out leaf01 — can we do this one evening or do we need to wait for an official outage window? NetQ can help you make the determination! Simply type the below command from any node in the network.
The items in green indicate no impact, items in yellow indicate partial impact and items in red indicate full impact. We can see that only 3 of the 10 pods will have a 50% impact (a total impact of only 15%), so this evening is probably a good time to do that swap!
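The 15% figure is just the share of affected pods multiplied by how badly each one is affected. Here is that arithmetic as a short Python sketch. The deployment_impact function is mine, and it assumes every affected pod is impacted equally.

```python
def deployment_impact(total_pods, affected_pods, per_pod_impact):
    """Overall impact = fraction of pods affected x impact on each affected pod."""
    if total_pods <= 0:
        raise ValueError("total_pods must be positive")
    return affected_pods / total_pods * per_pod_impact


# 3 of 10 pods, each losing 50%, as in the leaf01 example.
impact = deployment_impact(total_pods=10, affected_pods=3, per_pod_impact=0.5)
print(f"{impact:.0%}")
```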
How can I learn more?
Attend our webinar on May 10th at 11am PDT! In this webinar, we will discuss the operational benefits of NetQ, along with technical details about how NetQ works. We will also include a live demo.
You could also simply go to Cumulus in the Cloud and request a Kubernetes workbench. There are step by step instructions to get you started. And it’s all free!
In fact, all the output I have shown here came from the Kubernetes workbench on Cumulus in the Cloud, so you can follow along! (Hint: From server01, scale up the apache1 deployment using the Kubernetes command-line tool: kubectl scale deployment apache1 --replicas=10)
Have fun, and keep those awesome networks running!