
Overview
In recent years, demand for Artificial Intelligence and Machine Learning (AI/ML) capabilities in data centers has grown steadily. Most applications now rely on deep learning deployed across distributed neural networks. This approach keeps resources from blocking one another and lets computation run in parallel, so capacity can scale smoothly as service demand increases. In such high-speed network environments, DCQCN (Data Center Quantized Congestion Notification) is a pivotal congestion control algorithm for RoCEv2 networks: it combines ECN (Explicit Congestion Notification) and PFC (Priority Flow Control) to provide end-to-end lossless Ethernet.
There is one main difference between AI/ML networking and cloud networking: AI/ML traffic contains more elephant flows. In other words, higher link speeds are needed to absorb peaks in data flow and to carry the growing volume of distributed computing traffic. Meeting these challenges requires a way to tune a lossless, low-latency network for a higher-speed environment. The problem can be approached from two perspectives: the compute node view and the communication network view.
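To make the ECN side of DCQCN more concrete, the following is a minimal sketch of a RED-style marking curve of the kind a switch might apply to an egress queue: no marking below a lower threshold, a linear ramp up to a maximum probability at an upper threshold, and marking of every packet beyond that. The function name, parameter names, and threshold values are assumptions chosen for illustration and are not taken from any specific switch or NIC configuration.

```python
def ecn_mark_probability(queue_kb: float, k_min_kb: float,
                         k_max_kb: float, p_max: float) -> float:
    """Return the probability of ECN-marking a packet for a given queue depth.

    All parameters are hypothetical tuning knobs for this sketch:
      k_min_kb - queue depth below which nothing is marked
      k_max_kb - queue depth above which every packet is marked
      p_max    - marking probability reached at k_max_kb
    """
    if queue_kb <= k_min_kb:
        return 0.0                      # queue is short: no congestion signal
    if queue_kb > k_max_kb:
        return 1.0                      # queue is long: mark everything
    # Linear ramp from 0 at k_min_kb up to p_max at k_max_kb.
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


if __name__ == "__main__":
    # Illustrative thresholds only; real values depend on link speed and buffer size.
    for depth in (50, 250, 400, 500):
        print(depth, ecn_mark_probability(depth, 100, 400, 0.2))
```

Lower thresholds signal congestion earlier, which favors low latency; higher thresholds leave more room for bursts before senders are slowed down.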

Overview
Enterprise SONiC by Broadcom® is an open source network operating system based on Linux that runs on merchant silicon-based platforms. The open source SONiC project is available at GitHub (https://github.com/Azure/SONiC/wiki).
Enterprise SONiC is in production today at multiple web-scale companies for Data Center fabric deployments and has a thriving developer community and vendor ecosystem. The underlying architecture of SONiC is described at GitHub (https://github.com/Azure/SONiC/wiki/Architecture). Enterprise SONiC is a commercial offering based on open source SONiC with feature enrichment and hardening that is targeted at Data Center leaf, spine, and super-spine deployments. Enterprise SONiC supports ODM and OEM platforms based on the StrataXGS® family of silicon from Broadcom.
Enterprise SONiC offers cloud performance and simplicity, built on industry-leading merchant silicon and a standards-based IP Clos architecture. It also provides agility driven by a Unified Manageability Framework with programmatic APIs and an extensible, container-based architecture. Its open source foundation and standardized ecosystem provide strong economic benefits for a Data Center fabric solution.
Customer Use Cases
Data Center L3 CLOS Overlay Use Case (with VXLAN and BGP-EVPN)
Starting with the 3.4.0 release, Enterprise SONiC can also be deployed in enterprises or service providers for select workloads, such as Hadoop, that require an overlay to support multi-tenancy.
Using an overlay architecture in the data center allows end users (network administrators) to place endpoints (servers or virtual machines) anywhere in the network and remain connected to the same logical Layer 2 network, enabling the virtual topology to be decoupled from the physical topology. This decoupling allows the data center network to be programmatically provisioned at a per-tenant level.
Overlay networking generally supports both Layer 2 and Layer 3 transport between servers or VMs. It also supports a much larger scale. SONiC overlay networks use a control-plane protocol (BGP-EVPN) to learn and share endpoint information, and use the VXLAN tunneling protocol to create the data plane for the overlay layer.
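As an illustration of the VXLAN data plane mentioned above, the sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348: a flags byte with the "VNI valid" bit set, 24 reserved bits, a 24-bit VXLAN Network Identifier (VNI), and 8 more reserved bits. The function names and the example VNI are assumptions for illustration; the sketch does not show how Enterprise SONiC programs the hardware, and it omits the outer Ethernet, IP, and UDP headers.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag in the first header byte (RFC 7348)


def build_vxlan_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, remaining 24 bits reserved (zero).
    # Word 1: VNI in the top 24 bits, low 8 bits reserved (zero).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid flag."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8


if __name__ == "__main__":
    hdr = build_vxlan_header(5000)   # 5000 is an arbitrary example tenant VNI
    print(hdr.hex(), parse_vni(hdr))
```

The 24-bit VNI is what lets each tenant's Layer 2 segment be identified independently of the physical topology underneath.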

Gigabit Ethernet – Ethernet media defined by the IEEE 802.3 standards at various speeds, including 1 GbE, 10 GbE, 25 GbE, 40 GbE, 100 GbE, 200 GbE, 400 GbE, etc. The family covers a range of standards and protocols for twisted-pair copper wires, optical fibers, auto link speed negotiation, link-based full duplex flow control, link aggregation, Power over Ethernet, per-priority flow control, enhanced transmission selection for bandwidth assignment based on priorities, the Data Center Bridging Capabilities Exchange protocol, etc. The subject is vast, and the author assumes a basic and reasonable understanding of the topic.
L2 Switching – Ethernet, Token Ring, or Fibre Channel data frames are switched based on a media-based protocol frame carrying destination and source addresses. L2 networks consist of L2 bridges or switches that forward L2 frames based on the destination address and use spanning tree to prevent traffic storms. These networks can be physical LANs or Virtual LANs (VLANs), which follow the 802.1Q standard. L2 frames can carry unicast, multicast, or broadcast traffic.
Spanning Tree – Built to avoid looping and storming of L2 frames. The Spanning Tree Protocol and its variations run on the bridges to establish a loop-free spanning tree.
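The L2 Switching and Spanning Tree entries above can be sketched in a few lines. The example below is a simplified model under stated assumptions: `LearningSwitch` and `elect_root_bridge` are hypothetical names, the MAC table is keyed by (VLAN, MAC), and the spanning-tree part shows only the root election rule (lowest bridge ID, compared as priority first and MAC address second). Real spanning-tree implementations also compute path costs and port roles, which are omitted here.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def elect_root_bridge(bridges):
    """Pick the root from (priority, mac) tuples: the lowest bridge ID wins."""
    return min(bridges, key=lambda b: (b[0], int(b[1].replace(":", ""), 16)))


class LearningSwitch:
    """Toy L2 switch: learns source addresses, forwards on destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # (vlan, mac) -> port where that MAC was last seen

    def receive(self, in_port, vlan, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the source address is reachable through the ingress port.
        self.mac_table[(vlan, src_mac)] = in_port
        out_port = self.mac_table.get((vlan, dst_mac))
        # Broadcast or unknown destination: flood to every other port.
        if dst_mac == BROADCAST or out_port is None:
            return sorted(self.ports - {in_port})
        # Known destination on the ingress port itself: nothing to forward.
        if out_port == in_port:
            return []
        return [out_port]


if __name__ == "__main__":
    sw = LearningSwitch([1, 2, 3])
    a, b = "00:00:00:00:00:0a", "00:00:00:00:00:0b"
    print(sw.receive(1, 10, a, b))   # b unknown yet, so the frame is flooded
    print(sw.receive(2, 10, b, a))   # a was learned on port 1
    print(elect_root_bridge([(32768, a), (32768, b)]))
```

Flooding of unknown and broadcast frames is the reason a loop between switches turns into a storm, which is what the spanning tree is there to prevent.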