The open Ethernet solution combines DriveNets Network Cloud-AI software with white boxes from Accton built on Broadcom’s Jericho-3-AI and Ramon-3 ASICs in a DDC scheduled fabric architecture.
DriveNets – a leader in innovative networking solutions – and Accton Technology – a leader in advanced technologies for hyperscale data centers, AI, and edge computing – announced the successful testing and launch of two new white boxes based on Broadcom’s Jericho-3-AI and Ramon-3 ASICs. DriveNets and Accton are the first companies to make white boxes built on the new Broadcom ASICs commercially available for AI networking.
The solution combines proven high-scale software from DriveNets with white boxes from Accton, and supports AI and ML clusters of up to 32K GPUs connected over 800Gbps interfaces. The white boxes are based on the OCP DDC (Distributed Disaggregated Chassis) scheduled fabric architecture, offering a scalable solution that is fast and easy to deploy and can grow with a company’s needs. The architecture has successfully passed proofs of concept (POCs) with Tier 1 AI customers. The solution addresses the growing needs of hyperscalers building huge GPU clusters, as well as enterprises building large AI clusters with thousands of GPUs.
“There’s high demand for the new Broadcom ASICs by companies building high-scale AI clusters that want hardware diversity without compromising performance,” said Ryan Donnelly, DriveNets Chief Operating Officer (COO). “Our software supports the new Accton white boxes and provides our customers with an Open Ethernet-based AI Networking alternative to InfiniBand without any compromise to performance.”
“Accton brings years of engineering and design expertise to the table with millions of units shipped to date. Our latest OCP-compliant open networking white box switches are on display at OCP Summit 2024 and demonstrate the performance and reliability needed for today’s AI back-ends,” said Mike Wong, Head of Product Management. “DriveNets’ operating system solution allows for the elastic growth of that network using a Distributed Disaggregated Chassis (DDC) topology that matches the performance of older proprietary InfiniBand solutions. Together, we’re providing hyperscalers, enterprises and all AI builders with a high-performance, open-standard alternative to traditional closed hardware.”
A proven solution
The new Accton white boxes consist of:
- NCP-5 (Accton ASA926-18XKE), based on Broadcom’s Jericho-3-AI ASIC, supporting 18 network ports of 800Gbps and 20 fabric ports of 800Gbps
- NCF-2 (Accton AS9936-128D), based on Broadcom’s Ramon-3 ASIC, supporting 128 fabric ports of 800Gbps
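To put the port counts above in context, the sketch below estimates how many NCP and NCF boxes a cluster of a given size might require. It is a back-of-envelope illustration only, assuming one 800Gbps port per GPU and a fully provisioned scheduled fabric; the announcement does not specify the actual topology, oversubscription ratio, or radix planning used in a deployed DDC.

```python
import math

# Published per-box port counts (from the announcement); the sizing logic
# below is an assumption for illustration, not the vendors' design rule.
NCP_NETWORK_PORTS = 18   # 800G GPU-facing ports per NCP-5
NCP_FABRIC_PORTS = 20    # 800G fabric-facing ports per NCP-5
NCF_FABRIC_PORTS = 128   # 800G fabric ports per NCF-2

def ddc_box_count(num_gpus: int) -> tuple[int, int]:
    """Rough NCP/NCF counts, assuming one 800G port per GPU and that
    every NCP fabric port terminates on an NCF fabric port."""
    ncps = math.ceil(num_gpus / NCP_NETWORK_PORTS)
    ncfs = math.ceil(ncps * NCP_FABRIC_PORTS / NCF_FABRIC_PORTS)
    return ncps, ncfs

if __name__ == "__main__":
    for gpus in (32, 1_000, 32_000):
        ncps, ncfs = ddc_box_count(gpus)
        print(f"{gpus:>6} GPUs -> ~{ncps} NCP-5s, ~{ncfs} NCF-2s")
```

Under these assumptions, the 32K-GPU figure quoted above works out to on the order of 1,800 NCP-5s and fewer than 300 NCF-2s; real deployments will differ with the chosen topology.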
Prior to launch, the white boxes underwent rigorous testing in Accton’s lab in Taiwan, using NCP-5s, NCF-2s, Spirent’s AI workload emulation solution, and Intel Gaudi servers with 32 GPUs in a cluster running BERT and ResNet models. The test results showed more than 30% better Job Completion Time (JCT) than an Ethernet Clos architecture. This testing highlights the superiority of a DDC scheduled fabric over other Ethernet solutions and shows performance on par with InfiniBand.
Spirent’s AI workload emulation solution, the industry’s first, generates real-world AI traffic patterns at scale over RoCEv2 transport with integrated Collective Communications Library (CCL) support, identifying issues that can lead to network congestion, higher latency, and lower throughput. It reduces the complexity and effort of validating AI infrastructure by providing repeatable tests and actionable metrics, such as Job Completion Time (JCT), tail latency, and algorithm and bus bandwidth, so performance and efficiency issues can be diagnosed intuitively at a fraction of the cost of building real xPU systems.
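For readers unfamiliar with these metrics, the sketch below shows how they are conventionally derived. It is a minimal illustration using the NCCL-tests-style definitions of algorithm and bus bandwidth for an all-reduce; Spirent’s actual formulas and the test values are not described in the release, and the numbers in the usage example are hypothetical.

```python
# Conventional definitions (an assumption here, following the NCCL-tests style):
# algorithm bandwidth is what the application sees; bus bandwidth scales it by
# 2*(n-1)/n for all-reduce to reflect the data each link actually carries.

def all_reduce_bandwidths(message_bytes: float, elapsed_s: float, ranks: int):
    """Return (algorithm bandwidth, bus bandwidth) in bytes/s for one all-reduce."""
    alg_bw = message_bytes / elapsed_s
    bus_bw = alg_bw * 2 * (ranks - 1) / ranks
    return alg_bw, bus_bw

def jct_improvement(jct_baseline_s: float, jct_new_s: float) -> float:
    """Relative Job Completion Time improvement, e.g. the >30% figure quoted above."""
    return (jct_baseline_s - jct_new_s) / jct_baseline_s

if __name__ == "__main__":
    # Hypothetical: 1 GiB all-reduce in 25 ms across a 32-GPU cluster.
    alg, bus = all_reduce_bandwidths(1 << 30, 0.025, ranks=32)
    print(f"algbw ~ {alg / 1e9:.1f} GB/s, busbw ~ {bus / 1e9:.1f} GB/s")
    # Hypothetical baseline vs. new job completion times.
    print(f"JCT improvement: {jct_improvement(100.0, 68.0):.0%}")
```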
Source: PRNewswire