Oracle Unveils Oracle Cloud Zettascale10 Cluster for AI

Oracle Unveils Next-Generation Oracle Cloud Infrastructure Zettascale10 Cluster for AI
Oracle announced Oracle Cloud Infrastructure (OCI) Zettascale10, the largest AI supercomputer in the cloud. OCI Zettascale10 connects hundreds of thousands of NVIDIA GPUs across multiple data centers to form multi-gigawatt clusters that deliver up to an unprecedented 16 zettaFLOPS of peak performance. OCI Zettascale10 is the fabric underpinning the flagship supercluster built in collaboration with OpenAI in Abilene, Texas, as part of Stargate.

Built on the next-generation Oracle Acceleron RoCE networking architecture, OCI Zettascale10 is powered by NVIDIA AI infrastructure and delivers breakthrough scale, extremely low GPU-to-GPU latency across the cluster, industry-leading price-performance, improved cluster utilization, and the reliability required for large-scale AI workloads. OCI Zettascale10 is a powerful evolution of the first Zettascale cloud computing cluster, which was introduced in September 2024.

OCI Zettascale10 clusters are housed in large gigawatt data center campuses that are hyper-optimized for density within a two-kilometer radius to offer the best GPU-to-GPU latency for large-scale AI training workloads. This architecture is being deployed with OpenAI at the Stargate site in Abilene.

“With OCI Zettascale10, we’re fusing OCI’s groundbreaking Oracle Acceleron RoCE network architecture with next-generation NVIDIA AI infrastructure to deliver multi-gigawatt AI capacity at unmatched scale,” said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. “Customers can build, train, and deploy their largest AI models into production using less power per unit of performance and achieving high reliability. In addition, customers will have the freedom to operate across Oracle’s distributed cloud with strong data and AI sovereignty controls.”

“OCI Zettascale10 network and cluster fabric was developed and deployed first at the flagship Stargate site in Abilene, Texas – our joint supercluster with Oracle,” said Peter Hoeschele, vice president, Infrastructure and Industrial Compute, OpenAI. “The highly scalable custom RoCE design maximizes fabric-wide performance at gigawatt scale while keeping most of the power focused on compute. We’re excited to keep scaling Abilene and the broader Stargate program together.”

OCI plans to offer multi-gigawatt deployments of OCI Zettascale10 to customers. Initially, OCI Zettascale10 clusters will target deployments of up to 800,000 NVIDIA GPUs, delivering predictable performance and strong cost efficiency, with high GPU-to-GPU bandwidth enabled by Oracle Acceleron’s ultra-low-latency RoCEv2 networking.

“Oracle and NVIDIA are bringing together OCI’s distributed cloud and our full-stack AI infrastructure to deliver AI at extraordinary scale,” said Ian Buck, vice president of Hyperscale, NVIDIA. “Featuring NVIDIA full-stack AI infrastructure, OCI Zettascale10 provides the compute fabric needed to advance state-of-the-art AI research and help organizations everywhere move from experimentation to industrialized AI.”

Oracle Acceleron RoCE networking delivers scale, reliability, and efficiency for AI on OCI Zettascale10

Oracle Acceleron RoCE networking architecture is a critical innovation for customers who build, train, and run inference on AI workloads in the cloud while taking full advantage of OCI Zettascale10’s power and capabilities. It uses the switching capability built into modern GPU NICs (network interface cards), allowing each NIC to connect to multiple switches simultaneously, with each switch on a separate and isolated network plane. This approach dramatically increases the network’s overall scale and reliability: when one network plane has a problem, traffic shifts to the other planes, avoiding costly stalls and restarts.

Key features of Oracle Acceleron RoCE networking that help customers with their critical AI workloads include:
OCI is now taking orders for OCI Zettascale10, which will be available in the second half of next calendar year, with up to 800,000 NVIDIA AI infrastructure GPU platforms.

Source: Oracle media announcement
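
For readers who want a concrete picture of the multi-plane idea described in the Oracle Acceleron RoCE section, the following is a minimal, purely illustrative Python sketch. It is not Oracle's implementation: the names `Plane`, `MultiPlaneNic`, and `send_all` are hypothetical, and the round-robin policy is an assumption chosen for simplicity. It shows only the general behavior the announcement describes, in which a NIC attached to several isolated planes steers traffic away from a plane that has a problem rather than stalling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Plane:
    """One isolated network plane (hypothetical model, not an Oracle API)."""
    name: str
    healthy: bool = True


@dataclass
class MultiPlaneNic:
    """A NIC attached to several planes at once (illustrative only)."""
    planes: List[Plane]
    _next: int = 0  # index of the plane to try first on the next send

    def pick_plane(self) -> Plane:
        """Round-robin over planes, skipping any that are marked unhealthy."""
        n = len(self.planes)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.planes[idx].healthy:
                self._next = (idx + 1) % n
                return self.planes[idx]
        raise RuntimeError("no healthy network plane available")


def send_all(nic: MultiPlaneNic, num_flows: int) -> Dict[str, int]:
    """Assign num_flows flows to planes and count how many each plane carries."""
    counts = {plane.name: 0 for plane in nic.planes}
    for _ in range(num_flows):
        counts[nic.pick_plane().name] += 1
    return counts
```

In this sketch, a NIC built with four planes spreads flows evenly across them. After setting `healthy = False` on one plane, later calls to `send_all` place no flows on it and divide the load among the remaining three, which is the kind of failover the announcement credits with avoiding stalls and restarts.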