Every now and again, we like to highlight a piece of technology or solution featured in Cumulus Linux that we find especially useful. Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) are exactly such things. In short, these technologies allow you to converge networks and save money. By supporting lossless or near lossless Ethernet, you can now run applications such as RDMA over Converged Ethernet (RoCE) or RoCEv2 over your current data center infrastructure. In this post, we’ll concentrate on the end-to-end solution for RoCEv2, ECN, and how it can help you optimize your network. We will cover PFC in a future post.
What is explicit congestion notification?
ECN is a mechanism supported by Cumulus Linux that helps provide end-to-end lossless communication between two endpoints over an IP routed network. Normally, protocols like TCP treat dropped packets as the sign of congestion, which tells the sender to “slow down”. Explicit congestion notification uses the same concept, but instead of waiting until the queues are completely full and packets are dropped, it notifies the receiving host of congestion before the queues fill up, thereby avoiding dropped traffic. It uses the IP layer (the two ECN bits in the IP header, which are the low-order bits of the former TOS byte) to tell the receiving host whether congestion was experienced along the path. It is then up to the receiver to notify the sending host to “slow down” to avoid losing traffic. It is important to note that as long as queue space is available, packets are always forwarded and never dropped; ECN only indicates congestion to the endpoints.
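To make the "ECN bits" concrete, here is a minimal Python sketch of how the former TOS byte splits into a 6-bit DSCP value and the 2-bit ECN field, following the codepoints defined in RFC 3168. The function names `split_tos` and `ecn_name` are our own, chosen for illustration; they are not part of Cumulus Linux.

```python
from typing import Tuple

# ECN codepoints (RFC 3168): the two low-order bits of the IPv4 ToS /
# IPv6 Traffic Class byte. The upper six bits carry the DSCP value.
NOT_ECT = 0b00  # sender does not support ECN for this packet
ECT_1 = 0b01    # ECN Capable Transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN Capable Transport, codepoint ECT(0)
CE = 0b11       # Congestion Experienced, set by a switch or router

ECN_NAMES = {NOT_ECT: "Not-ECT", ECT_1: "ECT(1)", ECT_0: "ECT(0)", CE: "CE"}


def split_tos(tos: int) -> Tuple[int, int]:
    """Split a ToS / Traffic Class byte into (dscp, ecn). Hypothetical helper."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    return tos >> 2, tos & 0b11


def ecn_name(tos: int) -> str:
    """Return the human-readable ECN codepoint carried in a ToS byte."""
    _, ecn = split_tos(tos)
    return ECN_NAMES[ecn]
```

For example, a ToS byte of `0xBA` splits into DSCP 46 with an ECT(0) codepoint, meaning the sender has declared the packet eligible for ECN marking.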
ECN must first be enabled on a flow, as ECN flows and non-ECN flows can co-reside throughout a network. Following the ECN standard, TCP is used between hosts to negotiate the capability as depicted below.
Here’s how it works. If the sender requires the flow to be lossless and wants to use ECN, it signals this during the TCP three-way handshake with the receiver. It initiates an ECN-capable connection by setting the ECE and CWR bits in the TCP header of the SYN packet. The receiver replies with a SYN-ACK that has the ECE bit set, and the sending host then responds with the ACK. Now both ends are ready to use ECN.
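The handshake check can be sketched in a few lines of Python. The flag values are the standard TCP header flag bits; the helper names (`is_ecn_setup_syn`, `is_ecn_setup_synack`, `ecn_negotiated`) are hypothetical and exist only for this illustration.

```python
# Standard TCP header flag bits.
SYN = 0x02
ACK = 0x10
ECE = 0x40  # ECN-Echo
CWR = 0x80  # Congestion Window Reduced

_HANDSHAKE_MASK = SYN | ACK | ECE | CWR


def is_ecn_setup_syn(flags: int) -> bool:
    """SYN with both ECE and CWR set: the sender asks to use ECN."""
    return (flags & _HANDSHAKE_MASK) == (SYN | ECE | CWR)


def is_ecn_setup_synack(flags: int) -> bool:
    """SYN-ACK with ECE set (and CWR clear): the receiver agrees."""
    return (flags & _HANDSHAKE_MASK) == (SYN | ACK | ECE)


def ecn_negotiated(syn_flags: int, synack_flags: int) -> bool:
    """ECN is usable on the flow only if both sides signalled support."""
    return is_ecn_setup_syn(syn_flags) and is_ecn_setup_synack(synack_flags)
```

If either side omits its bits, for instance a receiver that answers with a plain SYN-ACK, the flow simply falls back to a non-ECN connection.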
Every switch within the path must be configured to enable ECN Capable Transport (ECT) on any or all queues. This is how the switch knows to watch the buffer of that queue and mark packets if congestion is experienced. If a packet enters a non-ECT queue, it is never considered for ECN. Which queue a flow enters is configurable based on the packet’s DSCP bits. If ECN is desired, the sending host sets an ECT codepoint (ECN bits of 01 or 10) on each transmitted packet. The packet then enters a switch running Cumulus Linux:
- If the packet is directed to an ECT-enabled queue and there is no congestion, the packet is forwarded as normal.
- If the packet is directed to an ECT-enabled queue and the average congestion in that queue is above a pre-specified threshold (based on configurable limits), the switch rewrites the ECN field of those packets, with a certain probability, to “11”, indicating Congestion Experienced.
- If there is no congestion from end to end, or the packet enters a non-ECT enabled queue, everything happens normally and the ECN bits are never changed.
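The three cases above can be sketched as a small Python model. This is a simplified illustration rather than the switch's actual implementation: we assume a RED-style scheme in which the marking probability ramps linearly between a minimum and a maximum threshold and every eligible packet is marked above the maximum. The `EcnQueueConfig` class and its field names are hypothetical.

```python
import random
from dataclasses import dataclass

ECT_1, ECT_0, CE = 0b01, 0b10, 0b11  # ECN codepoints


@dataclass
class EcnQueueConfig:
    """Hypothetical per-queue settings, named for this sketch only."""
    ecn_enabled: bool
    min_threshold_bytes: int   # below this average depth, never mark
    max_threshold_bytes: int   # at or above this depth, always mark
    max_probability: float     # marking probability reached at the max threshold


def mark_probability(cfg: EcnQueueConfig, avg_queue_bytes: float) -> float:
    """Assumed RED-style ramp from 0 up to max_probability."""
    if not cfg.ecn_enabled or avg_queue_bytes < cfg.min_threshold_bytes:
        return 0.0
    if avg_queue_bytes >= cfg.max_threshold_bytes:
        return 1.0
    span = cfg.max_threshold_bytes - cfg.min_threshold_bytes
    return cfg.max_probability * (avg_queue_bytes - cfg.min_threshold_bytes) / span


def forward(ecn_bits: int, cfg: EcnQueueConfig, avg_queue_bytes: float,
            rng=random.random) -> int:
    """Return the ECN bits the packet carries when it leaves the queue."""
    eligible = ecn_bits in (ECT_0, ECT_1)
    if eligible and rng() < mark_probability(cfg, avg_queue_bytes):
        return CE  # rewrite to "11": Congestion Experienced
    return ecn_bits  # no congestion, non-ECT queue, or non-ECT packet
```

With thresholds of 40,000 and 200,000 bytes and a maximum probability of 1.0 (values picked purely for illustration), an average depth of 120,000 bytes gives each ECT packet a 50% chance of being marked.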
The figure below shows what happens when an ECT-marked packet enters a congested ECT-enabled queue.
The receiving host is responsible for informing the sending host to back off if congestion was experienced. This can be done in a variety of ways, depending on the application.
What about RoCEv2 again?
To reiterate, explicit congestion notification makes IP networking accessible for even more applications, even those sensitive to loss. Customers like ECN (and other technologies like PFC) because it allows them to converge their networks into one, reducing cost and adding simplicity.
For example, in the past, InfiniBand applications often required a separate InfiniBand network. Customers using these applications worried about making the jump to IP networks to serve the same traffic. RoCEv2 allows the InfiniBand payload to run on top of UDP/IP, thus converging two historically separate networks, but it requires nearly lossless data transfer. ECN helps meet that near-lossless requirement. However, since RoCEv2 uses UDP (instead of TCP), it relies on the InfiniBand (IB) Transport Protocol over UDP to notify the sender of the congestion. This behavior is depicted below.
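As a rough mental model of that feedback loop, here is a toy Python sketch. It is not the RoCEv2 wire protocol or any NIC's real rate-control algorithm: the `Sender` and `Receiver` classes, the halving of the rate, and the rate floor are all assumptions made for illustration.

```python
from dataclasses import dataclass

CE = 0b11  # Congestion Experienced codepoint set by a switch


@dataclass
class Sender:
    """Toy sender; the halving policy and the floor are assumptions."""
    rate_gbps: float
    min_rate_gbps: float = 0.1

    def on_congestion_notification(self) -> None:
        # Back off when the receiver reports congestion.
        self.rate_gbps = max(self.min_rate_gbps, self.rate_gbps / 2)


class Receiver:
    """Toy receiver that reports CE-marked packets back to the sender."""

    def __init__(self, sender: Sender) -> None:
        self.sender = sender
        self.notifications_sent = 0

    def on_packet(self, ecn_bits: int) -> None:
        if ecn_bits == CE:
            # In RoCEv2 this notification travels over the IB transport
            # carried in UDP; here it is just a direct method call.
            self.notifications_sent += 1
            self.sender.on_congestion_notification()
```

The point of the sketch is the division of labor: the switch only marks packets, the receiver turns marks into notifications, and the sender decides how to slow down.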
How do you deploy explicit congestion notification?
If ECN sounds like a good fit for your network, we recommend getting started with the simple configuration steps. ECN configuration is coming soon to NCLU, but if you’re hoping to get started immediately, it is very straightforward in traditional Linux. Only one file (/etc/cumulus/datapath/traffic.conf) needs to be edited and one process (switchd) restarted to get rocking and rolling. More information can be found in the user guide.
If you would like to learn more about Cumulus Linux, here are three ways you can get started, based on your current Linux comfort level.
- Working on the Linux basics? If you need help taking your Cumulus Linux skills to the next level, we recommend a Cumulus Networks bootcamp for personalized training based on your needs. You’ll learn everything you need to know about Cumulus Linux configurations.
- Need help with specific topics? Sign up for one of our upcoming webinars or view past webinars on demand.
- Looking for some 1-on-1 assistance? Our knowledgeable sales engineers would be happy to walk you through any questions you may have about Cumulus Linux configurations. Contact your account representative to get started.