ENPIRE's architecture is a closed loop of four modules, each handling a critical part of the physical research process:
EN (Environment module): Automatically resets the physical scene to a randomized initial state and verifies task completion using vision-based reward functions (e.g., segmentation models and bounding-box detectors). No human resets the robot between trials.
PI (Policy Improvement module): Launches policy refinement under any of several regimes: heuristic learning, tool calling, behavior cloning, offline reinforcement learning, or online RL. The coding agent proposes algorithmic hypotheses and writes the code.
R (Rollout module): Evaluates the candidate policy on one or more physical robots operating in parallel. It preserves state, action, video, and outcome data for audit.
E (Evolution module): Coding agents analyze logs, consult research literature, compare branches, and modify training infrastructure and algorithm code to address failure modes. Successful recipes are reused; failing hypotheses are pruned.
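To make the loop concrete, here is a minimal Python sketch of how the four modules could fit together. It is a toy simulation, not ENPIRE's code: the one-parameter `Policy`, the numeric "scene", the tolerance-based success check standing in for a vision-based reward, and every function name are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Policy:
    gain: float  # hypothetical single tunable parameter

    def act(self, state: float) -> float:
        return self.gain * state


@dataclass
class Trial:
    state: float
    action: float
    success: bool


def reset_environment(rng: random.Random) -> float:
    """EN: draw a randomized initial state (stand-in for a physical reset)."""
    return rng.uniform(-1.0, 1.0)


def verify_success(state: float, action: float, tol: float = 0.1) -> bool:
    """EN: stand-in for a vision-based check; succeeds if the action cancels the state."""
    return abs(state + action) < tol


def rollout(policy: Policy, rng: random.Random, n_trials: int = 20) -> list:
    """R: run the policy for several trials and keep every record for audit."""
    trials = []
    for _ in range(n_trials):
        state = reset_environment(rng)
        action = policy.act(state)
        trials.append(Trial(state, action, verify_success(state, action)))
    return trials


def success_rate(trials: list) -> float:
    return sum(t.success for t in trials) / len(trials)


def improve(policy: Policy, rng: random.Random, step: float = 0.2) -> Policy:
    """PI: propose a candidate policy (here, a random perturbation)."""
    return Policy(policy.gain + rng.uniform(-step, step))


def evolve(policy: Policy, rng: random.Random, iterations: int = 50):
    """E: keep candidates that score at least as well; prune the rest."""
    best, best_rate = policy, success_rate(rollout(policy, rng))
    history = [best_rate]
    for _ in range(iterations):
        candidate = improve(best, rng)
        rate = success_rate(rollout(candidate, rng))
        if rate >= best_rate:
            best, best_rate = candidate, rate
        history.append(best_rate)
    return best, history


if __name__ == "__main__":
    best, history = evolve(Policy(gain=0.0), random.Random(0))
    print(f"final gain={best.gain:.2f}, success rate={history[-1]:.2f}")
```

In this sketch the Evolution step is a simple accept-or-prune rule on a noisy success estimate; the system described above replaces that rule with coding agents that read logs and rewrite the training code itself.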
Rather than inventing an exotic orchestration layer, the framework relies on a familiar tool for distributed collaboration: Git. When one agent-station achieves a breakthrough, it commits the improved policy code. Other stations pull the update and build on it, enabling distributed, asynchronous improvement without centralized coordination.
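The commit-and-pull pattern can be sketched with ordinary Git commands. The helper below only assembles the command lines and, by default, prints them instead of running them. The remote name `origin`, the branch name `main`, and the helper names are assumptions for illustration; the source does not describe ENPIRE's actual repository layout or branching scheme.

```python
import subprocess


def publish_commands(message, remote="origin", branch="main"):
    """Commands a station might run after finding an improved policy."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def sync_commands(remote="origin", branch="main"):
    """Commands a station might run to pick up other stations' improvements."""
    return [["git", "pull", "--rebase", remote, branch]]


def run_all(commands, repo_dir, dry_run=True):
    """Print each command, or execute it inside repo_dir when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if git exits non-zero,
            # e.g. when a push is rejected because another station pushed first.
            subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    run_all(sync_commands(), repo_dir=".")
    run_all(publish_commands("Improve grasp policy"), repo_dir=".")
```

A station whose push is rejected would typically run the sync commands and retry, which is what lets stations work asynchronously against a shared history.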
The team deployed eight AI coding agents paired with eight robotic workstations, each equipped with dual six-degree-of-freedom mechanical arms, Intel RealSense depth cameras, and local NVIDIA RTX 5090 GPUs. Given an allocation of GPUs and a