Dubbed the Habana Gaudi AI Training Processor, the chip enables near-linear scaling of training-system performance. High throughput is reportedly maintained even at small batch sizes, allowing Gaudi-based systems to scale from a single device to large systems built with hundreds of Gaudi processors.
The chip integrates RDMA over Converged Ethernet (RoCE v2) functionality directly on the AI processor, enabling AI systems to scale to any size using standard Ethernet.
The company claims this means customers can use standard Ethernet switching to both scale up and scale out AI training systems. Ethernet switches are multi-sourced, offer virtually unlimited scalability in speed and port count, and are already used in datacentres to scale compute and storage systems.
GPU-based systems, by contrast, rely on proprietary system interfaces that inherently limit scalability and choice for system designers; Habana instead takes a standards-based approach.
Linley Gwennap, principal analyst at The Linley Group, said Habana has quickly extended from inference into training, covering the full range of neural-network functions.
“Gaudi offers strong performance and industry-leading power efficiency among AI training accelerators. As the first AI processor to integrate 100G Ethernet links with RoCE support, it enables large clusters of accelerators built using industry-standard components.”
The Gaudi processor includes 32GB of HBM-2 memory and is currently offered in two forms:
• HL-200 – a PCIe card supporting 8 ports of 100Gb Ethernet;
• HL-205 – a mezzanine card compliant with the OCP OAM specification, supporting 10 ports of 100Gb Ethernet or 20 ports of 50Gb Ethernet.
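As a quick sanity check on the figures above, note that the two HL-205 port configurations expose the same aggregate bandwidth. A minimal sketch (the helper function is illustrative, not a Habana API):

```python
# Aggregate raw Ethernet bandwidth per card, in Gb/s.
# Port counts and link speeds come from the product descriptions above;
# the function itself is our own illustration, not a Habana API.

def aggregate_bandwidth_gbps(ports: int, speed_gbps: int) -> int:
    """Total raw link bandwidth exposed by a card."""
    return ports * speed_gbps

hl200 = aggregate_bandwidth_gbps(ports=8, speed_gbps=100)     # PCIe card
hl205_100g = aggregate_bandwidth_gbps(ports=10, speed_gbps=100)  # mezzanine, 100G mode
hl205_50g = aggregate_bandwidth_gbps(ports=20, speed_gbps=50)    # mezzanine, 50G mode

print(hl200, hl205_100g, hl205_50g)  # 800 1000 1000
```

Either way the HL-205 is configured, it exposes 1Tb/s of raw link bandwidth; the HL-200's eight ports total 800Gb/s.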
Habana is also introducing HLS-1, an eight-Gaudi system that includes eight HL-205 mezzanine cards, PCIe connectors for external host connectivity, and 24 100Gb Ethernet ports for connecting to off-the-shelf Ethernet switches. This allows scaling up within a standard 19-inch rack by populating it with multiple HLS-1 systems.
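The HLS-1 port budget happens to be consistent with a full all-to-all mesh between the cards inside the chassis, although the article does not spell out the internal topology; the mesh interpretation below is our assumption. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope port accounting for HLS-1, from the figures above:
# eight HL-205 cards with 10 x 100GbE ports each, of which 24 ports are
# exposed externally. The all-to-all mesh reading is our assumption,
# not something Habana has confirmed.

CARDS = 8
PORTS_PER_CARD = 10
EXTERNAL_PORTS = 24

total_ports = CARDS * PORTS_PER_CARD           # 80 ports in the chassis
internal_ports = total_ports - EXTERNAL_PORTS  # 56 left for card-to-card links

# A full all-to-all mesh among 8 cards needs C(8, 2) = 28 links,
# each consuming one port at both ends: 56 ports in total.
mesh_links = CARDS * (CARDS - 1) // 2          # 28
assert internal_ports == 2 * mesh_links

print(total_ports, internal_ports, mesh_links)  # 80 56 28
```

In other words, 56 of the 80 ports would exactly cover a non-blocking all-to-all mesh (each card linked directly to the other seven), leaving the stated 24 ports free for external switches.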
Gaudi is the second purpose-built AI processor to be launched by Habana Labs in the past year, following the Habana Goya AI Inference Processor. Goya has been shipping since Q4 2018 and has demonstrated industry-leading inference performance, with the industry's highest throughput, highest power efficiency (images per second per watt), and real-time latency.
David Dahan, CEO and co-founder of Habana Labs, said: “Training AI models requires exponentially higher compute every year, so it’s essential to address the urgent needs of the datacentre and cloud for radically improved productivity and scalability. With Gaudi’s innovative architecture, Habana delivers the industry’s highest performance while integrating standards-based Ethernet connectivity that enables unlimited scale.”
Habana already has a customer in the form of the social networking site Facebook. Facebook’s Director of Technology, Vijay Rao, said that the use of open platforms was an important bonus.
“We are pleased that the Habana Goya AI inference processor has implemented and open-sourced the backend for the Glow machine learning compiler and that the Habana Gaudi AI training processor is supporting the OCP Accelerator Module (OAM) specification”, he said.
Habana will be sampling the Gaudi to select customers in the second half of 2019.