NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development
NVIDIA announced NVIDIA Cosmos, a platform comprising state-of-the-art generative world foundation models, advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as autonomous vehicles and robots.
Physical AI models are costly to develop, and require vast amounts of real-world data and testing. Cosmos world foundation models, or WFMs, offer developers an easy way to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning Cosmos WFMs.
Cosmos models will be available under an open model license to accelerate the work of the robotics and AV community. Developers can preview the first models on the NVIDIA API catalog, or download the family of models and fine-tuning framework from the NVIDIA NGC™ catalog or Hugging Face.
Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.
“The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own,” said Jensen Huang, founder and CEO of NVIDIA. “We created Cosmos to democratize physical AI and put general robotics in reach of every developer.”
Open World Foundation Models to Accelerate the Next Wave of AI
NVIDIA Cosmos’ suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application.
Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data. The models are built for physically based interactions, object permanence, and high-quality generation of simulated industrial environments, like warehouses or factories, and of driving environments, including various road conditions.
In his opening keynote at CES, NVIDIA founder and CEO Jensen Huang showcased ways physical AI developers can use Cosmos models.
Advanced World Model Development Tools
Building physical AI models requires petabytes of video data and tens of thousands of compute hours to process, curate and label that data. Cosmos includes tools intended to help save enormous costs in data curation, training and model customization.
World’s Largest Physical AI Industries Adopt Cosmos
Pioneers across the physical AI industry are already adopting Cosmos technologies. 1X, an AI and humanoid robot company, launched the 1X World Model Challenge dataset using Cosmos Tokenizer. XPENG will use Cosmos to accelerate the development of its humanoid robot. And Hillbot and Skild AI are using Cosmos to fast-track the development of their general-purpose robots.
“Data scarcity and variability are key challenges to successful learning in robot environments,” said Pras Velagapudi, chief technology officer at Agility. “Cosmos’ text-, image- and video-to-world capabilities allow us to generate and augment photorealistic scenarios for a variety of tasks that we can use to train models without needing as much expensive, real-world data capture.”
Transportation leaders are also using Cosmos to build physical AI for AVs.
“Generative AI will power the future of mobility, requiring both rich data and very powerful compute,” said Dara Khosrowshahi, CEO of Uber. “By working with NVIDIA, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”
Developing Open, Safe and Responsible AI
NVIDIA Cosmos was developed in line with NVIDIA’s trustworthy AI principles, which prioritize privacy, safety, security, transparency and reducing unwanted bias. Trustworthy AI is essential for fostering innovation within the developer community and maintaining user trust. NVIDIA is committed to safe and trustworthy AI, in line with the White House’s voluntary AI commitments and other global AI safety initiatives.
The open Cosmos platform includes guardrails designed to mitigate harmful text and images, and features a tool to enhance text prompts for accuracy. Videos generated with Cosmos autoregressive and diffusion models on the NVIDIA API catalog include invisible watermarks to identify AI-generated content, helping reduce the chances of misinformation and misattribution. NVIDIA encourages developers to adopt trustworthy AI practices and further enhance guardrail and watermarking solutions for their applications.
Source: NVIDIA media announcement