Open Reasoning VLA Model for Humanoid Robots

We are releasing NVIDIA Isaac GR00T N1.7 (Early Access) — an open-source, commercially licensed Vision-Language-Action model for humanoid robots, built on a simple premise: human data is the most scalable source of robot intelligence.

TL;DR

🤖 GR00T N1.7 — open-source, commercially licensed humanoid foundation model, available now on Hugging Face and GitHub
🏭 Factory-floor ready — commercial licensing enables production deployments today, across material handling, packaging, and inspection
🧠 Reasoning built for multi-step tasks — task and subtask-level reasoning improve reliability on complex workflows
🖐 Expanded dexterous manipulation — finger-level control enables contact-rich tasks like small parts assembly and handling fragile components
🔬 First-ever dexterity scaling law: pre-trained on 20,000+ hours of human egocentric video, the model shows that more human data directly and predictably improves robot dexterity, without mass teleoperation
🚀 GitHub | Hugging Face | Supports LeRobot dataset format

What is GR00T N1.7?

GR00T N1.7 is a 3B-parameter Vision-Language-Action (VLA) model that maps visual observations and natural language instructions to continuous robot actions. It uses an Action Cascade architecture — a dual-system design that separates high-level reasoning from low-level motor control:

System 2 (Vision-Language Model): A Cosmos-Reason2-2B backbone processes image tokens and language instructions to produce high-level action tokens. This is where task decomposition and multi-step reasoning happen.
System 1 (Diffusion Transformer): A 32-layer DiT conditions on the VLM’s output and the live robot state, then denoises action samples into precise motor commands in real time.

Inputs: RGB image frames (any resolution) + language instruction + robot proprioceptive state (joint positions, velocities, EEF poses)

Outputs: Continuous-value action vectors mapped to the robot’s degrees of freedom
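The dual-system flow and the inputs and outputs above can be sketched with toy stand-ins. This is a minimal illustration of the data flow only, not the actual GR00T implementation: the names `Observation`, `system2_vlm`, `system1_denoise`, and `act` are hypothetical, the "VLM" is a seeded pseudo-embedding, and the "DiT" is a fixed random projection refined over 4 steps.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes        # raw RGB buffer; a real pipeline would pass decoded frames
    instruction: str    # natural language task description
    state: list[float]  # proprioception, e.g. joint positions


def system2_vlm(obs: Observation, dim: int = 16) -> list[float]:
    """Toy stand-in for System 2: maps image + instruction to a conditioning vector."""
    seed = sum(obs.image) + sum(map(ord, obs.instruction))
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def system1_denoise(cond, state, action_dim, horizon=8, steps=4, seed=0):
    """Toy stand-in for System 1: refines noise into a chunk of actions."""
    features = list(cond) + list(state)
    # Hypothetical frozen "weights": a random projection, not a trained network.
    w_rng = random.Random(1)
    scale = 1.0 / math.sqrt(len(features))
    target = [
        math.tanh(sum(f * w_rng.gauss(0.0, scale) for f in features))
        for _ in range(action_dim)
    ]
    # Start from pure noise, one row per future timestep.
    noise_rng = random.Random(seed)
    actions = [
        [noise_rng.gauss(0.0, 1.0) for _ in range(action_dim)]
        for _ in range(horizon)
    ]
    for k in range(steps):
        # Step sizes 1/4, 1/3, 1/2, 1 for steps=4: the last step lands on the target.
        alpha = 1.0 / (steps - k)
        actions = [
            [a + alpha * (t - a) for a, t in zip(row, target)]
            for row in actions
        ]
    return actions


def act(obs: Observation, action_dim: int) -> list[list[float]]:
    """Run System 2 once, then let System 1 produce continuous actions."""
    cond = system2_vlm(obs)
    return system1_denoise(cond, obs.state, action_dim)


obs = Observation(
    image=bytes(224 * 224 * 3),
    instruction="pick up the cup",
    state=[0.0] * 14,
)
chunk = act(obs, action_dim=14)
print(len(chunk), len(chunk[0]))  # 8 14
```

The point of the split is that the slow, reasoning-heavy step produces a compact conditioning signal, while the fast step only has to turn that signal plus the current state into continuous values sized to the robot's degrees of freedom.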

Validated across loco-manipulation, tabletop manipulation, and dexterous bimanual tasks on Unitree G1, Bimanual Manipulator YAM, and AGIBot Genie 1.

Training on Human Egocentric Video Data

The central research behind GR00T N1.7 is EgoScale: pre-training on 20,854 hours of human egocentric video spanning 20+ task categories, from manufacturing and retail to healthcare and home environments. This is a significant step up from the few thousand hours of robot teleoperation data used to train N1.6.

The intuition: humans and robots share similar embodiments — two hands, a first-person viewpoint, a world full of objects to manipulate. Training on sensorized human video (ego cameras, wrist cameras, hand tracking) gives the model rich manipulation priors without requiring every behavior to be demonstrated on a physical robot first. It moves pre-training beyond what teleoperation can scale to.

The key finding from this work is the first-ever scaling law for robot dexterity: more human egocentric data produces predictable, consistent improvements in dexterous manipulation. Going from 1k to 20k hours more than doubles average task completion. In practice, this enables 22 DoF hands to perform contact-rich tasks that generalist robot models have historically struggled with.
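To make "predictable" concrete, here is a small sketch of how one might fit and extrapolate a scaling curve from one's own measurements. The post does not state the functional form of the law, so the log-linear shape `score = a + b * ln(hours)` is an assumption, and the helper names `fit_log_linear` and `predict` are hypothetical. The data points are synthetic, generated from the formula itself, and are not results from GR00T N1.7.

```python
import math


def fit_log_linear(hours, scores):
    """Ordinary least-squares fit of score = a + b * ln(hours)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, hours):
    """Evaluate the fitted curve at a given number of data hours."""
    return a + b * math.log(hours)


# Synthetic placeholder points produced from known parameters (NOT measurements).
TRUE_A, TRUE_B = -1.0, 0.2
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
scores = [TRUE_A + TRUE_B * math.log(h) for h in hours]

a, b = fit_log_linear(hours, scores)
print(f"a={a:.3f}, b={b:.3f}")
```

Under this assumed form, the gain from 1k to 20k hours is `b * ln(20)` regardless of the intercept, which is what makes the effect of additional data easy to forecast once `b` is known.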

GR00T N1.7 — 22 DoF hand handling small objects

Inference & Deployment

Install and launch a policy server against your embodiment:

git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
bash scripts/deployment/dgpu/install_deps.sh
source .venv/bin/activate

uv run python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag GR1 \
    --model-path nvidia/GR00T-N1.7

Then query it from your environment loop:

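from gr00t.policy.server_client import PolicyClient

# Connect to the policy server launched above.
policy = PolicyClient(host="localhost", port=5555)

# `env` is your own Gymnasium-style environment; it is not defined by this snippet.
obs, info = env.reset()
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)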
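The single step above extends naturally into a full episode loop. The sketch below assumes only what the snippet shows: a policy object with `get_action(obs)` returning `(action, info)`, and an environment with Gymnasium-style `reset()` and `step()`. `run_episode`, `CountdownEnv`, and `ZeroPolicy` are hypothetical stand-ins so the loop can be exercised without a robot or a running server; it also assumes `env.step` accepts whatever `get_action` returns.

```python
from typing import Any, Protocol


class Policy(Protocol):
    """Anything exposing get_action(obs) -> (action, info), like the client above."""

    def get_action(self, obs: Any) -> tuple[Any, dict]: ...


def run_episode(env, policy: Policy, max_steps: int = 500) -> float:
    """Roll out one episode and return the summed reward."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action, _ = policy.get_action(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += float(reward)
        if done or truncated:
            break
    return total_reward


class CountdownEnv:
    """Toy environment: ends after `length` steps, reward 1.0 per step."""

    def __init__(self, length: int = 3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}, {}

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return {"t": self.t}, 1.0, done, False, {}


class ZeroPolicy:
    """Toy policy: always returns a zero action."""

    def get_action(self, obs):
        return [0.0], {}


print(run_episode(CountdownEnv(length=3), ZeroPolicy()))  # 3.0
```

Swapping `ZeroPolicy()` for the `PolicyClient` instance and `CountdownEnv` for a real environment should leave `run_episode` unchanged, provided the observation and action formats match the embodiment the server was launched with.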

Inference performance figures for 4 denoising steps and a single camera view can be found here.

GR00T N1.7 is commercially licensed and supported on NVIDIA Ampere, Hopper, Lovelace, Blackwell, and Jetson platforms.

Fine-Tuning on Your Robot

N1.7 supports fine-tuning on custom embodiments using the LeRobot dataset format. Pre-registered embodiments include UNITREE_G1, LIBERO_PANDA, OXE_WIDOWX, and others — or register your own:

CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7 \
    --dataset-path <DATASET_PATH> \
    --embodiment-tag <EMBODIMENT_TAG> \
    --modality-config-path <MODALITY_CONFIG_PATH> \
    --num-gpus 1 \
    --output-dir <OUTPUT_DIR> \
    --max-steps 2000 \
    --global-batch-size 32
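If you launch fine-tuning from a script rather than a shell, a small wrapper can assemble the same command and refuse to run when a required value is empty. This is a sketch that uses only the flags shown above; `build_finetune_cmd` is a hypothetical helper, not part of the repository, and the example paths are placeholders.

```python
import shlex


def build_finetune_cmd(
    dataset_path: str,
    embodiment_tag: str,
    modality_config_path: str,
    output_dir: str,
    base_model_path: str = "nvidia/GR00T-N1.7",
    num_gpus: int = 1,
    max_steps: int = 2000,
    global_batch_size: int = 32,
) -> list[str]:
    """Build the fine-tuning command as an argument list for subprocess.run."""
    required = {
        "dataset_path": dataset_path,
        "embodiment_tag": embodiment_tag,
        "modality_config_path": modality_config_path,
        "output_dir": output_dir,
    }
    # Fail early instead of passing an empty flag value to the launcher.
    missing = [name for name, value in required.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"missing required values: {', '.join(missing)}")
    return [
        "uv", "run", "python", "gr00t/experiment/launch_finetune.py",
        "--base-model-path", base_model_path,
        "--dataset-path", dataset_path,
        "--embodiment-tag", embodiment_tag,
        "--modality-config-path", modality_config_path,
        "--num-gpus", str(num_gpus),
        "--output-dir", output_dir,
        "--max-steps", str(max_steps),
        "--global-batch-size", str(global_batch_size),
    ]


cmd = build_finetune_cmd(
    dataset_path="./my_dataset",                     # hypothetical path
    embodiment_tag="UNITREE_G1",                     # one of the tags named above
    modality_config_path="./my_modality_config.py",  # hypothetical path
    output_dir="./checkpoints",                      # hypothetical path
)
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run` from the repository root, setting `CUDA_VISIBLE_DEVICES` through the `env` argument if you need to pin a GPU.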

Upgrading from N1.6? It’s a drop-in swap: point --model-path to nvidia/GR00T-N1.7 and your existing embodiment configs and workflows carry over. The main differences are the upgraded VLM backbone (Cosmos-Reason2-2B) and EgoScale pre-training, which together improve out-of-the-box dexterity and generalization before any fine-tuning.

If you build something with GR00T N1.7, we’d love to hear from you.
