    The First Open Omni-model for Physical AI Reasoning and Action


    By Atharva Joshi


    NVIDIA Cosmos 3 is here – and it’s available on Hugging Face today. Cosmos 3 represents a major leap forward in world foundation models (WFMs) for physical AI: a single, unified omni-model that combines world generation, physical reasoning, and action generation in one model. No more juggling between different models and inference pipelines – Cosmos 3 does it all.

    Whether you’re building for robotics, autonomous vehicles, or smart spaces, Cosmos 3 gives you the foundation to simulate and understand the physical world.

    Here’s what’s shipping with this release:

    • Cosmos 3 Super and Cosmos 3 Nano on Hugging Face with model cards and licensing
    • Cosmos 3 Diffusers integration for generation pipelines
    • Post-training scripts for training Cosmos 3 on your own data (on GitHub)
    • Open synthetic data generation (SDG) datasets for physical AI

    TABLE OF CONTENTS

    1. What’s new with Cosmos 3?
    2. Cosmos 3 Capabilities
    3. Using Cosmos 3 with Diffusers
    4. Datasets for physical AI
    5. Cosmos Framework
    6. Resources



    SECTION 1: What’s new with Cosmos 3?

    The biggest change in Cosmos 3 compared to previous Cosmos releases is that it’s an omni-model, built on a Mixture-of-Transformers (MoT) architecture. Previously, developers had to work with separate models for different capabilities like world generation (Cosmos Predict), controlled generation (Cosmos Transfer), scene understanding (Cosmos Reason) and policy generation (Cosmos Policy). Cosmos 3 enables all of this in a single model that can reason and generate different modalities in one unified forward pass.

    This means you can now do all this from one model:

    • Generate realistic and physically plausible video worlds from text, images, videos or action inputs
    • Reason about physical properties like motion, causality, and spatial relationships
    • Predict future video and action sequences based on the current state

    Why this matters for physical AI

    Cosmos 3 helps build physical AI systems capable of understanding the real world. Not just pixels and tokens, but motion, causality, physics, and action. If you’re training a robot to fold laundry, building an autonomous driving simulation, or generating synthetic training data for warehouse safety scenarios, Cosmos 3 is the foundation model designed for exactly these use-cases.


    Video generated by Cosmos 3 for robotics pick and place use-cases.


    Video generated by Cosmos 3 for long tail driving scenarios.


    Image-to-video generation using Cosmos 3 for warehouse safety data.


    Cosmos 3 chain-of-thought reasoning in an autonomous driving application.

    Architecture

    Cosmos 3 is built on an MoT backbone that processes all modalities – text, image, video, audio, and action – within a single unified architecture. Each modality is first encoded by a dedicated encoder (a ViT for visual understanding, a VAE for visual/audio generation, and domain-aware vectors for actions), then projected into a shared representation space.

    Figure: Cosmos 3 architecture diagram.

    The input sequence is split into two subsequences: an autoregressive (AR) subsequence that handles reasoning and understanding via next-token prediction, and a diffusion (DM) subsequence that handles generation via iterative denoising. AR and DM tokens use separate parameter sets within each transformer layer but interact through joint attention – this is what lets a single model seamlessly switch between acting as a VLM, a video generator, a forward/inverse dynamics model, or a robot policy without any architectural changes.
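
    The idea of separate parameter sets with joint attention can be illustrated with a toy sketch. This is not the Cosmos 3 implementation: the function names (make_params, mot_joint_attention), the single attention head, and the absence of masking, feed-forward blocks, and normalization are all simplifying assumptions made for illustration. Each token is projected with the weights of its own subsequence ("ar" or "dm"), and attention then runs over the whole sequence.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(rng, dim):
    """One parameter set (query/key/value projections) for one subsequence type."""
    def rand_matrix():
        return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    return {"q": rand_matrix(), "k": rand_matrix(), "v": rand_matrix()}


def mot_joint_attention(tokens, kinds, params):
    """Project each token with its own parameter set, then attend jointly.

    tokens: list of equal-length vectors
    kinds:  "ar" or "dm" for each token (hypothetical labels)
    params: dict mapping kind -> parameter set from make_params
    """
    dim = len(tokens[0])
    # Separate parameters: the projection used depends on the token's kind.
    queries = [matvec(params[k]["q"], t) for t, k in zip(tokens, kinds)]
    keys = [matvec(params[k]["k"], t) for t, k in zip(tokens, kinds)]
    values = [matvec(params[k]["v"], t) for t, k in zip(tokens, kinds)]

    outputs = []
    for query in queries:
        # Joint attention: every token scores against ALL tokens, AR and DM alike.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs


rng = random.Random(0)
params = {"ar": make_params(rng, 4), "dm": make_params(rng, 4)}
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
kinds = ["ar", "ar", "dm"]
mixed = mot_joint_attention(tokens, kinds, params)
print(len(mixed), len(mixed[0]))
```

    Because the attention step sees every token, changing a DM token changes the output at AR positions too, even though the two kinds never share projection weights.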

    Model Versions

    This release of Cosmos 3 includes two model sizes, optimized for different deployment scenarios:

    • Cosmos 3 Nano – This is the 8B parameter model (8B reasoner and 8B generator), optimized for efficient inference. Cosmos 3 Nano is designed to run on workstation-grade compute like the RTX PRO 6000 GPU, and is available on Hugging Face at nvidia/Cosmos3-Nano.
    • Cosmos 3 Super – This is the 32B parameter model (32B reasoner and 32B generator) designed for large-scale synthetic data generation (SDG) and research, and runs on NVIDIA Hopper and Blackwell GPUs. Cosmos 3 Super is available on Hugging Face at nvidia/Cosmos3-Super.
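
    For a rough sense of what these sizes mean for GPU memory, the arithmetic below estimates the footprint of the weights alone. It is a back-of-the-envelope sketch, not a sizing guide: it assumes bfloat16 weights (2 bytes per parameter), it assumes the reasoner and generator each hold the stated parameter count, and it ignores activations, caches, and the encoders. The helper name weight_memory_gib is hypothetical.

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: reasoner and generator are counted separately (8B + 8B, 32B + 32B).
for name, per_tower in (("Cosmos 3 Nano", 8e9), ("Cosmos 3 Super", 32e9)):
    total = 2 * per_tower
    print(f"{name}: ~{weight_memory_gib(total):.1f} GiB of bfloat16 weights")
```

    If the stated size already covers both components, halve these figures. Either way, check the model cards on Hugging Face for the actual requirements.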



    SECTION 2: Cosmos 3 Capabilities

    Cosmos 3 supports multiple input and generation modalities through a single unified model:

    Input Modality           Output Modality    Application
    Text | Image | Video     Video              Video Model
    Text | Video             Text               Vision Language Model (VLM)
    Action | Image | Text    Video              Forward Dynamics Model
    Text | Video             Action             Inverse Dynamics Model
    Image | Text             Video & Action     Policy Model
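
    The table can also be kept as a small lookup structure when routing requests in your own code. The sketch below only restates the rows above as data. The Capability class and applications_producing helper are hypothetical names, not part of any Cosmos 3 API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Capability:
    application: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


# One entry per row of the capabilities table.
CAPABILITIES = (
    Capability("Video Model", frozenset({"text", "image", "video"}), frozenset({"video"})),
    Capability("Vision Language Model (VLM)", frozenset({"text", "video"}), frozenset({"text"})),
    Capability("Forward Dynamics Model", frozenset({"action", "image", "text"}), frozenset({"video"})),
    Capability("Inverse Dynamics Model", frozenset({"text", "video"}), frozenset({"action"})),
    Capability("Policy Model", frozenset({"image", "text"}), frozenset({"video", "action"})),
)


def applications_producing(modality: str) -> List[str]:
    """List the applications whose output includes the given modality."""
    return [c.application for c in CAPABILITIES if modality in c.outputs]


print(applications_producing("action"))
```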

    Prompt Guide

    For video generation, we recommend using detailed prompts in the form of narrative paragraphs. For example:

    The video begins with a view from inside a vehicle traveling on a multi-lane highway under a clear blue sky. The road is bordered by dense green trees on both sides, creating a tranquil environment. Several vehicles, including a prominent white semi-truck and various cars, are visible ahead, maintaining a steady pace. The highway features multiple lanes separated by concrete barriers, and the scene is bathed in bright sunlight, indicating a clear day. As the video progresses, a large amount of debris suddenly appears on the lane ahead. With little time to avoid it, the ego vehicle has to drive over the debris and continue moving forward. A noticeable jolt occurs as the ego vehicle passes over the scattered objects. A point-of-view shot from inside the vehicle, capturing the road ahead and the surrounding environment.

    For action generation, prompts should be concise and provide spatial references. For example:

    Put the pot to the left of the purple item. This video is captured from a first-person perspective looking at the scene.

    The prompting guide on GitHub contains the prompt upsampling template and best practices for writing high-quality prompts.
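
    As a small illustration of the action-prompt shape shown above, the helper below joins a concise instruction with a camera-perspective sentence. The template is inferred from the single example in this post and is an assumption. The function name build_action_prompt is hypothetical, and the prompting guide on GitHub remains the reference for what the model expects.

```python
def build_action_prompt(instruction: str, perspective: str = "first-person") -> str:
    """Compose a concise action prompt followed by a camera-perspective sentence."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    # Make sure the instruction ends as a sentence before appending the suffix.
    if not instruction.endswith("."):
        instruction += "."
    return (
        f"{instruction} This video is captured from a {perspective} "
        "perspective looking at the scene."
    )


print(build_action_prompt("Put the pot to the left of the purple item"))
```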



    SECTION 3: Using Cosmos 3 with Diffusers

    Cosmos 3 is integrated with the Hugging Face Diffusers library, so you can run world generation pipelines with just a few lines of code. Cosmos3OmniPipeline follows the familiar DiffusionPipeline interface, which keeps adoption simple and lets Cosmos 3 slot into your existing pipelines.

    Let’s see a Text-to-Image example for single frame generation using the Cosmos 3 Nano model:

    import torch
    from diffusers import Cosmos3OmniPipeline

    # Load the Cosmos 3 Nano checkpoint in bfloat16 and place it on the GPU.
    pipe = Cosmos3OmniPipeline.from_pretrained(
        "nvidia/Cosmos3-Nano", torch_dtype=torch.bfloat16, device_map="cuda"
    )

    # Detailed, narrative-style prompt, as recommended in the prompt guide.
    prompt = (
        "A medium shot of a modern robotics research laboratory with white walls and a gray floor. "
        "A robotic arm with a metallic finish is mounted on a clean white workbench, its gripper positioned "
        "above a row of small colored objects. A laptop and neatly arranged tools sit beside the robot. "
        "A large monitor on the wall behind displays a software interface. The scene is brightly lit by "
        "overhead fluorescent lights."
    )

    # num_frames=1 requests a single frame, i.e. an image rather than a video.
    result = pipe(prompt=prompt, num_frames=1, height=720, width=1280)
    # Save the first (and only) generated frame as a JPEG.
    result.video[0].save("cosmos3_t2i.jpg", format="JPEG", quality=85)

    Here’s the image generated by the Cosmos 3 Nano model from the prompt above:

    The documentation also has examples of Text-to-Video, Image-to-Video and more. Find details and API usage in the Cosmos 3 Diffusers documentation.



    SECTION 4: Datasets for physical AI

    As part of the Cosmos 3 launch, NVIDIA is releasing a set of Synthetic Data Generation (SDG) datasets to help the physical AI community train and evaluate world foundation models. These datasets were generated by various NVIDIA teams and are available on Hugging Face.



    SECTION 5: Cosmos Framework

    Cosmos Framework is an end-to-end framework for training and serving WFMs like Cosmos 3. Its repo is where you’ll find inference and post-training scripts, as well as agent skills for development.

    Post-training Cosmos 3

    Cosmos 3 understands and generates world videos and actions for robotics, autonomous vehicles, and smart spaces out of the box, but some applications may require further post-training on specific datasets to get the best results. We encourage post-training Cosmos 3 for different robots, environments, and tasks – check out the post-training guide in the repo.

    Agent Skills

    The repo also comes with agent skills to make development fast and easy. These skills help validate requirements and set up the environment with its dependencies. You can also use them to learn about the repo structure and examples, draft good prompts, or run the inference and post-training scripts.



    SECTION 6: Resources

    Read the Cosmos 3 technical blog to learn about Cosmos 3 capabilities, performance, post-training, and deployment with NIM microservices.



    Acknowledgments

    Cosmos 3 is the result of an amazing collaboration between many teams and people across NVIDIA, including:

    Adeline Aubame, Aditya Mahajan, Aigul Dzhumamuratova, Akash Gokul, Akul Santhosh, Aleksandr Efitorov, Alex Sotelo, Alexander Schwarz, Alperen Degirmenci, Amol Fasale, Andrew Tham, Ankur Handa, Arihant Jain, Arslan Ali, Artur Zolkowski, Aryaman Gupta, Asawaree Bhide, Ashkan Mirzaei, Ashley Chow, Ashna Khetan, Atharva Joshi, Barnaby Simkin, Benedikt Falk, Brett Hamilton, Carlos Casanova, Chaeyeon Chung, Charles Zhou, Chen-Hsan Lin, Chen-Hsuan Lin, Chhavi Nijhawan, Chieh-Yun Chen, Chintan Shah, Chris Helvig, Chris Pruett, Cindy Zha, Cyrus Hogg, Dahjung Chung, Dan Blick, David Wehr, Dawid Majchrowski, DeLesley Hutchins, Delin Qu, Dennis Lynch, Diego Garzon, Dima Zhylko, Durra Mohsin, Egor Krivov, Ekram Mukbil, Eric Cameracci, Fangyin Wei, Fengzhe Zhou, Francesco Ferroni, Freya Li, George Kurian, Gwanghyun Kim, Haaland Hao Liang, Hai Loc Lu, Hans Yang, Hao Liang, Hao Wang, Hesam Rabeti, Hugo Hadfield, Hyejin Moon, Itai Zadok, Jayjun Lee, Jeana Choi, JF Lafleche, Jiangran Lyu, Jiaojiao Fan, Jiaxiang Tang, Jibin Varghese, Jim Fan, Jingyi Jin, Jinwei Gu, Jon Allen, Joshua Bapst, Joyjit Daw, Julia Kiczka, Julian Ouyang, Kaichun Mo, Kayley Ting, Ke Ding, Kedi Wu, Kevin Brady, Kirill Motkov, Kristen Rumley, Krzysztof Tomala, Liang Feng, Liangkai Zhang, Ling Li, Louis Marcoux, Maciej Bala, Madison Huang, Magdalena Dadela, Mahesh Patekar, Marco Di Lucca, Marilyn Reeb, Mark Carlson, Martin Antolini, Mateusz Sieniawski, Matt Cragun, Meredith Price, Michael Huang, Miguel Guerrero, Miguel Martin, Min Shi, Ming-Yu Liu, Mohammad Harrim, Morteza Ramezanali, Mukesh Beladiya, Nalin Dadhich, Naomi Eigbe, Nathan Hayes-Roth, Nicole Drumheller, Nikhilesh Joshi, Omar Laymoun, Paris Zhang, Paula Ramos, Pawel Morkisz, Peter Gambrill, Pooya Jannaty, Pooya Khaloo, Pranjali Joshi, Qi Wang, Qianli Ma, Qiao Wang, Qing Miao, Qizhi Chen, Rahul Heinrich Steiger, Raju Wagwani, Robert Denomme, Rodrigo Vieira Del Monte, Roy Anthony, Ruqing Xu, Ryan Bernard, Ryan Ji, Saeid Motiian, Sandip Bhaskar, 
Sandra Skaff, Santanu Dutta, Saurav Kumar, Sehwi Park, Sergiy Fefilatyev, Shangkun Sun, Shangru Li, Shilin Zhu, Shreyas Misra, Shun Zhang, Shuran Song, Simon Yuen, Simon Zhang, Slawek Kierat, Smita Ithape, Soha Pouya, Sophia Huang, Stefanie Manzinger, Steven Baughman, Suneel Indupuru, Sunil Srinivasa, Sunny Kim, Tavish Chen, Thabang Ngazimbi, Thomas Volk, Tianwei She, Tiffany Cai, Ting-Chun Wang, TJ Galda, Tolou Tavakkoli, Tomasz Kornuta, Trung Pham, Tsung-Yi Lin, Vanni Brighella, Varun Praveen, Wei-Cheng Tseng, Wenjie Luo, Wesley Li, Wojciech Kutak, Wojciech Rymer, Xiangyu Lu, Xiaodong Yang, Xiaotong Chen, Xin Kong, Xinquan Xu, Xiu Chia, Xuning Yang, Yan Chang, Yan Wang, Yanan Jian, Yao Xu, Yashraj Narang, Yeongho Seol, Yichu Yang, Yifan Ding, Yihuai Gao, Yilin Zhao, Yin Cui, Yogesh Balaji, Yu Wang, Yu-Wei Chao, Yue Tang, Yufan Huang, Yuke Zhu, Yuliya Zhautouskaya, Yurong You, Yuzhu Dong, Zaid Pervaiz Bhat, Zekun Hao, Zhaoshuo Li, Zhizheng Zhang.
