Innovating AI Infrastructures with UALink


An Interview with Alibaba

By: Vincent Kong, Ultra-Link Chief Architect, Alibaba Cloud Infrastructure


The UALink Consortium is establishing an open, interoperable standard for high-performance computing connections in scale-up AI environments. The UALink 1.0 Specification enables 200G-per-lane scale-up connections for up to 1,024 accelerators within an AI computing pod, delivering an open-standard interconnect for next-generation AI cluster performance.

We recently met with UALink Consortium Board member Alibaba Cloud to discuss the benefits of an open ecosystem and UALink technology. Read on for highlights from our conversation.


Q: What is the importance of an open ecosystem?
As a key part of rack-level AI solutions, scale-up interconnects demand ultra-high performance and stability, encompassing multiple vertical technical domains. By forming an open industry standard body, the UALink Consortium brings together system, hardware, chips, connectors, and other vendors to collaborate on UALink technology and deliver optimal performance, quality, and Total Cost of Ownership (TCO). This broad ecosystem will also serve as the foundation for the evolution and iteration of system architectures, enabling more innovative ideas to be implemented across the industry.

Q: Why did your company join the UALink Consortium?
Alibaba Cloud believes that defining the core needs and solutions for AI accelerator scale-up interconnect technology from the perspective of cloud computing and applications is key to building competitive intelligent-computing super nodes. As a leader in AI accelerator interconnects, the UALink Consortium has brought together key members of the AI infrastructure industry to define an interconnect protocol natively designed for AI accelerators. This will drive innovation in AI infrastructure, improve the execution efficiency of AI workloads, and contribute to an open, innovative industry ecosystem.

Q: How does UALink technology enhance AI workloads?
The characteristics defined by UALink, including memory semantics and high efficiency natively designed for AI scenarios, can support a wide range of AI accelerators. The protocol's lightweight nature also simplifies controller design, reducing chip area and power consumption. In addition, its low latency enables tighter computational coupling between accelerators within a scale-up domain, forming the basis for continued growth in model capacity. The UALink 1.0 Specification supports high-density integration of up to 1,024 accelerators and provides security features.
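The memory-semantics point above can be illustrated with a toy software model. This is purely a sketch: UALink is a hardware protocol, and all names below are illustrative assumptions, not the UALink API. The idea it shows is that a memory-semantic fabric lets one accelerator read and write a peer's memory with plain load/store-style operations, whereas a message-based fabric requires explicit packing, sending, and unpacking in software.

```python
# Toy model contrasting memory-semantic access with message passing.
# All names here are illustrative assumptions, not the UALink API.

class PeerMemory:
    """Models a remote accelerator's memory exposed over the fabric."""
    def __init__(self, size):
        self.cells = [0] * size

    def store(self, offset, value):
        # Memory semantics: a single store, forwarded by the fabric.
        self.cells[offset] = value

    def load(self, offset):
        # Memory semantics: a single load returns the remote value.
        return self.cells[offset]

def send_message(peer, payload):
    # Message passing: pack a packet, transmit, unpack, then apply.
    packet = {"offset": payload[0], "value": payload[1]}  # pack
    peer.cells[packet["offset"]] = packet["value"]        # unpack + apply

peer = PeerMemory(16)
peer.store(3, 42)           # one load/store-style operation
send_message(peer, (5, 7))  # explicit software message path
print(peer.load(3), peer.load(5))  # → 42 7
```

Both paths end in the same state, but the memory-semantic path avoids the software packing and unpacking steps, which is one reason a lightweight, memory-semantic protocol can reduce controller complexity and latency.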

Q: What are some use cases for UALink technology?
Demand for on-chip computing power and memory capacity keeps increasing, but growth in single-chip resources is limited, making it difficult for an individual AI accelerator to meet training and inference demands. UALink enables multiple AI accelerators to work closely together through scale-up expansion, and the protocol's low-latency, lightweight characteristics keep controller designs simple, reducing chip area and power consumption. The UALink protocol will become an important piece of rack-scale integration solutions.
