Wednesday, June 17, 2026
AI Coding Agents Are Now Training Physical Robots — Nvidia's ENPIRE Hits 99% Success

What Just Happened
AI coding agents — the same tools developers use to write Python, debug TypeScript, and refactor codebases — are now training physical robots in the real world.
Nvidia's GEAR Lab, in collaboration with Carnegie Mellon University and UC Berkeley, just published ENPIRE (ENvironment-Policy Improvement-Rollout-Evolution), a harness framework that gives frontier coding agents direct control over a fleet of robot hardware. The results are hard to ignore: up to 99% success, measured as pass@8, on dexterous manipulation tasks that would traditionally require weeks of human effort to script, tune, and debug.
The agents used? OpenAI's Codex (GPT-5.5), Anthropic's Claude Code (Opus 4.7), and Kimi Code (Kimi K2.6). The robots? A fleet of eight dual-arm YAM stations.
This is the first time AI coding agents have closed the full physical reinforcement learning loop — from hypothesis to code to robot execution to autonomous iteration — entirely on their own.
The Problem ENPIRE Solves
Training a robot to do something in simulation is relatively straightforward these days. Simulators are fast, safe, and parallelizable. The hard part comes when you deploy to real hardware: the scene needs resetting between trials, policies need tuning based on real-world physics (which never matches simulation perfectly), and every failure mode needs manual debugging.
Traditionally, this human-in-the-loop overhead is the bottleneck. A researcher resets the robot arm, checks the logs, tweaks the reward function, re-runs training, and repeats. Days or weeks per iteration.
ENPIRE replaces the human with a coding agent that can:
- Read research papers and form hypotheses about what to try next
- Write and edit training code directly
- Reset physical scenes between trials using the robot itself
- Run policies across multiple robots in parallel
- Analyze logs and iterate on failure modes
- Consult literature for known solutions to specific problems
The loop runs autonomously. The agent picks the training method — behavior cloning, reinforcement learning, or a hybrid — based on real-world success signals, not a pre-scripted plan.
The Architecture: EN-PI-R-E
ENPIRE's name spells out its four core modules. Each maps to a phase of the robot learning loop that the coding agent orchestrates.
EN — Environment Module
Handles automatic scene reset and verification. After each trial, the coding agent directs the robot to reset the workspace — reposition objects, clear debris, return the arm to starting pose — then verifies the scene is ready for the next attempt. No human touches the hardware between iterations.
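A reset-then-verify step like this can be sketched as a small retry loop. This is a minimal illustration, not ENPIRE's actual code: `reset_until_verified`, `ResetResult`, and `FakeStation` are hypothetical names, and the fake station stands in for real hardware.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResetResult:
    ready: bool
    attempts: int


def reset_until_verified(
    reset_scene: Callable[[], None],
    scene_is_ready: Callable[[], bool],
    max_attempts: int = 3,
) -> ResetResult:
    """Run the reset routine until the verifier accepts the scene or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        reset_scene()
        if scene_is_ready():
            return ResetResult(ready=True, attempts=attempt)
    return ResetResult(ready=False, attempts=max_attempts)


class FakeStation:
    """Hypothetical stand-in for a robot station: the scene is only ready after two resets."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1

    def verify(self) -> bool:
        return self.resets >= 2


station = FakeStation()
result = reset_until_verified(station.reset, station.verify)
print(result)  # ResetResult(ready=True, attempts=2)
```

Separating the reset action from the verification check means a failed reset is detected before the next trial starts, rather than showing up later as a mysterious policy failure.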
PI — Policy Improvement Module
The agent launches a policy refinement run. It might start with behavior cloning from a few human demonstrations, switch to reinforcement learning when the policy plateaus, or try a hybrid approach. The agent chooses the method. If a reward function isn't working, it reads the training logs, forms a hypothesis about what's wrong, rewrites the code, and tries again.
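To make "switch when the policy plateaus" concrete, here is one simple way a plateau check could work. In ENPIRE the agent makes this call itself; the windowed rule below, along with the names `has_plateaued` and `choose_method` and the thresholds, is an assumption for illustration only.

```python
from typing import Sequence


def has_plateaued(
    success_rates: Sequence[float], window: int = 3, min_gain: float = 0.02
) -> bool:
    """True when the mean of the last `window` rates beats the previous window by less than `min_gain`."""
    if len(success_rates) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(success_rates[-window:]) / window
    previous = sum(success_rates[-2 * window:-window]) / window
    return recent - previous < min_gain


def choose_method(current: str, success_rates: Sequence[float]) -> str:
    """Move from behavior cloning to reinforcement learning once progress stalls."""
    if current == "behavior_cloning" and has_plateaued(success_rates):
        return "reinforcement_learning"
    return current


# Hypothetical success rates per training round: fast early gains, then a stall.
history = [0.20, 0.40, 0.50, 0.51, 0.50, 0.51, 0.51, 0.50, 0.51]
print(choose_method("behavior_cloning", history[:3]))  # behavior_cloning
print(choose_method("behavior_cloning", history))      # reinforcement_learning
```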
R — Rollout Module
Evaluates the current policy on one or more physical robots running in parallel. ENPIRE scales the rollout across the fleet: all eight YAM stations can execute the same policy simultaneously, generating trial data at up to 8x the rate of a single robot. The agent collects per-trial success and failure statistics and feeds them back into the improvement step.
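The fan-out pattern described here looks roughly like the sketch below, which uses a thread pool to run one trial per station and then aggregates the outcomes. Everything in it is assumed for illustration: `run_trial` is a simulated coin flip rather than a hardware call, and `rollout_fleet` and `TrialResult` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    station_id: int
    success: bool


def run_trial(station_id: int, policy_version: str) -> TrialResult:
    """Simulated episode on one station; a real version would drive the robot."""
    # Seeded per station so repeated runs of this sketch give the same outcomes.
    rng = random.Random(f"{policy_version}-{station_id}")
    return TrialResult(station_id, rng.random() < 0.5)


def rollout_fleet(policy_version: str, num_stations: int = 8) -> list[TrialResult]:
    """Run the same policy on every station concurrently and collect the results in station order."""
    with ThreadPoolExecutor(max_workers=num_stations) as pool:
        futures = [
            pool.submit(run_trial, station_id, policy_version)
            for station_id in range(num_stations)
        ]
        return [future.result() for future in futures]


results = rollout_fleet("policy-v1")
success_rate = sum(r.success for r in results) / len(results)
print(f"{len(results)} trials, success rate {success_rate:.2f}")
```

Threads fit this shape of work because each trial mostly waits on the robot, not the CPU.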
E — Evolution Module
The most interesting piece. The coding agent analyzes the aggregated logs, identifies failure modes, consults relevant research literature, and decides what to change. It might improve the training infrastructure, adjust algorithm hyperparameters, or rewrite entire sections of the policy code. Then the cycle repeats.
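The first part of that analysis, identifying failure modes from aggregated logs, can be as simple as counting. The log schema and failure labels below are invented for this sketch; the article does not describe ENPIRE's log format.

```python
from collections import Counter


def rank_failure_modes(trial_logs: list[dict]) -> list[tuple[str, int]]:
    """Count failure labels across failed trials, most frequent first."""
    failures = Counter(
        log["failure_mode"] for log in trial_logs if not log["success"]
    )
    return failures.most_common()


# Hypothetical per-trial log records.
logs = [
    {"success": True, "failure_mode": None},
    {"success": False, "failure_mode": "missed_grasp"},
    {"success": False, "failure_mode": "misaligned_insert"},
    {"success": False, "failure_mode": "missed_grasp"},
]
print(rank_failure_modes(logs))  # [('missed_grasp', 2), ('misaligned_insert', 1)]
```

A ranking like this gives the agent an obvious place to start: the most common failure is usually the cheapest one to investigate first.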
This is not a fixed pipeline. The agent decides which module to invoke and in what order, guided by real-world outcomes, with no human choosing the next step.
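One way to picture a loop where the next module is chosen at runtime is a dispatch table plus a decision function. In the sketch below, `pick_next_module` is a rule-based stand-in for the coding agent's judgment, and the module bodies, the `LoopState` fields, and the success numbers are all placeholders rather than anything from the ENPIRE paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LoopState:
    policy_version: int = 1
    success_rate: float = 0.0
    scene_ready: bool = False
    evaluated: bool = False
    analyzed: bool = False


def environment(s: LoopState) -> LoopState:
    return replace(s, scene_ready=True)


def rollout(s: LoopState) -> LoopState:
    # Placeholder outcome: pretend each policy version adds 0.4 to the success rate.
    rate = min(1.0, 0.4 * s.policy_version)
    return replace(s, success_rate=rate, evaluated=True, scene_ready=False)


def evolution(s: LoopState) -> LoopState:
    return replace(s, analyzed=True)


def policy_improvement(s: LoopState) -> LoopState:
    return replace(s, policy_version=s.policy_version + 1, evaluated=False, analyzed=False)


MODULES: Dict[str, Callable[[LoopState], LoopState]] = {
    "environment": environment,
    "rollout": rollout,
    "evolution": evolution,
    "policy_improvement": policy_improvement,
}


def pick_next_module(s: LoopState) -> str:
    """Rule-based stand-in for the agent's decision about what to do next."""
    if not s.scene_ready:
        return "environment"
    if not s.evaluated:
        return "rollout"
    if not s.analyzed:
        return "evolution"
    return "policy_improvement"


def run_loop(target: float = 0.99, max_steps: int = 50) -> Tuple[LoopState, List[str]]:
    state, trace = LoopState(), []
    while state.success_rate < target and len(trace) < max_steps:
        name = pick_next_module(state)
        state = MODULES[name](state)
        trace.append(name)
    return state, trace


final_state, trace = run_loop()
print(trace)
print(final_state.policy_version, final_state.success_rate)  # 3 1.0
```

Swapping the rule-based function for a call to a coding agent would change who makes the decision without changing the loop itself.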
The Results
ENPIRE was evaluated on four dexterous manipulation tasks:
| Task | Description | Success Rate (pass@8) |
|---|---|---|
| PushT | Slide a T-shaped block into a target position and orientation | 99% |
| Pin Organization | Pick and sort pins into a box with precise placement | 99% |
| GPU Seating | Insert a GPU into a motherboard slot — high precision required | 99% |
| Cable Tie Cutting | Position and cut a zip-tie with a gripper-mounted tool | 99% |
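The 99% figure is a pass@8 metric: across 8 parallel robot stations running the same policy, at least one succeeded in 99% of trial batches. That is a weaker guarantee than a 99% per-attempt success rate, because individual stations can fail while the batch still counts as a pass. Even so, for tasks that traditionally require millimeter precision and days of manual tuning, this is a step change.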
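To get a feel for what pass@8 implies, it helps to relate it to a per-attempt success rate. The sketch below assumes the eight attempts succeed independently with the same probability, which real robot trials may not satisfy, so treat the numbers as a rough guide only. The function names are illustrative.

```python
def pass_at_k(per_attempt_success: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - per_attempt_success) ** k


def per_attempt_for_pass_at_k(target: float, k: int) -> float:
    """Per-attempt success rate needed to reach a target pass@k, under independence."""
    return 1.0 - (1.0 - target) ** (1.0 / k)


print(f"{pass_at_k(0.5, 8):.4f}")                   # 0.9961
print(f"{per_attempt_for_pass_at_k(0.99, 8):.3f}")  # 0.438
```

Under that independence assumption, a per-attempt success rate of roughly 44% is enough to reach 99% pass@8, so the two metrics should not be read interchangeably.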
The fleet ran continuously. The coding agents managed the full lifecycle — deploy a policy, run trials, aggregate results, debug failures, write improved code, deploy again — without a human touching the hardware between iteration cycles.
Which Coding Agents Worked
ENPIRE is agent-agnostic. The research tested three frontier coding agents:
- Codex (GPT-5.5) — OpenAI's latest coding agent
- Claude Code (Opus 4.7) — Anthropic's terminal-based agent
- Kimi Code (Kimi K2.6) — The open-weights contender from Moonshot AI
All three were able to close the loop, though performance varied by task and the agents exhibited different failure-mode diagnosis styles. The framework itself doesn't care which agent you use — it provides a standardized harness, and the agent handles the reasoning.
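An agent-agnostic harness typically means the framework codes against an interface, not a vendor. Here is a minimal sketch of that idea using a structural `Protocol`. The `CodingAgent` interface, its `propose_change` method, and `EchoAgent` are hypothetical; the agent names are used only as labels, and nothing here calls a real agent API.

```python
from typing import Protocol


class CodingAgent(Protocol):
    """Hypothetical minimal interface a harness could require from any agent."""

    name: str

    def propose_change(self, log_summary: str) -> str: ...


class EchoAgent:
    """Toy agent that satisfies the interface without calling any real model."""

    def __init__(self, name: str) -> None:
        self.name = name

    def propose_change(self, log_summary: str) -> str:
        return f"[{self.name}] investigate: {log_summary}"


def run_harness_step(agent: CodingAgent, log_summary: str) -> str:
    """The harness only depends on the interface, so any conforming agent plugs in."""
    return agent.propose_change(log_summary)


for label in ("Codex", "Claude Code", "Kimi Code"):
    print(run_harness_step(EchoAgent(label), "missed_grasp in 3 of 8 trials"))
```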
Why This Changes Things
The obvious implication is that robotics training just got dramatically faster. But the bigger story is about convergence.
Coding agents were designed to manipulate symbols — code, text, APIs. ENPIRE extends that reach to the physical world. The same Claude Code agent you use to refactor a React component can now write a policy, deploy it to a robot arm, watch the trials, and iterate until the robot can seat a GPU with surgical precision.
This has three implications worth watching:
1. Robotics hits software iteration speeds. The bottleneck in real-world robot learning is often not the hardware but the human time required to tune policies for real-world physics. ENPIRE collapses weeks of manual iteration into autonomous loops that run overnight.
2. Agentic AI graduates from chat to the real world. Until now, the conversation about AI agents has been about software tools — writing code, browsing the web, querying APIs. ENPIRE puts agents in control of physical hardware, with real-world consequences for success and failure.
3. Fleet-scale learning becomes practical. With eight robots running in parallel under agent control, trial data accumulates up to eight times faster than on a single robot. This is the same playbook that made large language models work (scale the training data), applied to physical robotics.
What This Means for Developers
If you're building AI coding agents — or using them — ENPIRE signals a shift in what's possible. The same agent architecture that powers code generation now extends to physical policy optimization. The API your agent calls might not be a REST endpoint. It might be a robot arm.
For now, the hardware setup (eight YAM stations, GPU clusters per station) is research-grade. But the pattern is portable. As Nvidia's Isaac platform standardizes the robot-agent interface, expect this pattern to trickle down to smaller setups — a single arm in a university lab, a pick-and-place station in a factory, eventually a robot in a warehouse.
What's Next
ENPIRE was published as part of Nvidia's broader CVPR 2026 research wave, which also includes new agent skills for autonomous vehicles, vision AI, and the Cosmos 3 world model platform. GEAR Lab co-lead Jim Fan has positioned ENPIRE as a step toward what he calls "physical foundation models" — agents that learn not just from text and code, but from physical interaction.
The paper and project page are live at research.nvidia.com/labs/gear/enpire. The architecture, agent variants, and full benchmark results are documented there.
For developers watching the agent space: keep an eye on this. If coding agents can close the physical RL loop today, the boundary between digital and physical AI is thinner than most people think.