Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Source
- Raw Markdown: paper_open-x-embodiment-2023.md
- PDF: paper_open-x-embodiment-2023.pdf
- Preprint: arXiv 2310.08864
- Project page: robotics-transformer-x.github.io
Core Claim
Open X-Embodiment consolidates many robot-learning datasets into a standardized multi-embodiment repository and shows that RT-X policies can transfer skills across robot platforms.
Sensor-Time-Series Notes
- The dataset is a large collection of real robot trajectories rather than a passive forecasting benchmark.
- The relevant time-series unit is a trajectory with image observations, language instructions, and control inputs.
- The repository uses RLDS to accommodate different action spaces and sensor modalities across robots.
- The RT-X experiments coarsely align observations and actions by selecting a canonical camera view, resizing images, and mapping controls into a 7-DoF end-effector action representation before discretization.
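The last step of the alignment recipe above, mapping a 7-DoF end-effector action to discrete tokens, can be sketched as uniform binning per dimension. This is a minimal illustration, not the RT-X implementation. The bin count of 256, the per-dimension bounds, and the function names `discretize_action` and `undiscretize_action` are assumptions introduced here, not details taken from this note.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: the bin count is not stated in this note


def discretize_action(
    action: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[int]:
    """Map each action dimension to an integer bin index in [0, num_bins - 1]."""
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low, and high must have the same length")
    tokens = []
    for value, lo, hi in zip(action, low, high):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        clipped = min(max(value, lo), hi)      # clamp out-of-range commands
        fraction = (clipped - lo) / (hi - lo)  # rescale to [0, 1]
        # The upper bound itself would land in bin num_bins, so cap the index.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def undiscretize_action(
    tokens: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[float]:
    """Recover the bin-center value for each token (lossy inverse)."""
    return [
        lo + (token + 0.5) / num_bins * (hi - lo)
        for token, lo, hi in zip(tokens, low, high)
    ]


# Hypothetical 7-DoF action: x, y, z, roll, pitch, yaw, gripper.
low = [-1.0] * 7
high = [1.0] * 7
example_action = [-1.0, 0.0, 1.0, 2.0, -2.0, 0.5, -0.5]
example_tokens = discretize_action(example_action, low, high)
```

Because the bounds are supplied per dimension, the same routine can serve robots whose normalized action ranges differ. The round trip is lossy: each recovered value is off by at most half a bin width.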
Model Notes
RT-1-X and RT-2-X represent two common robotics foundation-model interfaces. RT-1-X is a Transformer policy that takes a short history of recent images plus a language instruction as input and emits discretized actions. RT-2-X maps robot actions into language-token-like outputs so that a vision-language model can be co-fine-tuned for control.
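To make the "language-token-like outputs" idea concrete, the sketch below serializes discretized action bins as a plain text string and parses that string back into bins. The space-separated integer format, the 7-dimension length check, the 256-bin range, and the function names are assumptions made for illustration. This note does not specify the exact token format RT-2-X uses.

```python
from typing import List, Sequence


def action_tokens_to_text(tokens: Sequence[int]) -> str:
    """Serialize bin indices as a space-separated string a text model could emit."""
    return " ".join(str(token) for token in tokens)


def text_to_action_tokens(
    text: str,
    expected_dims: int = 7,   # assumption: 7-DoF end-effector action
    num_bins: int = 256,      # assumption: bin count not stated in this note
) -> List[int]:
    """Parse model output text back into bin indices, rejecting malformed output."""
    parts = text.split()
    if len(parts) != expected_dims:
        raise ValueError(f"expected {expected_dims} tokens, got {len(parts)}")
    tokens = [int(part) for part in parts]  # raises ValueError on non-integers
    for token in tokens:
        if not 0 <= token < num_bins:
            raise ValueError(f"token {token} is outside [0, {num_bins - 1}]")
    return tokens


encoded = action_tokens_to_text([0, 128, 255, 255, 0, 192, 64])
decoded = text_to_action_tokens(encoded)
```

Validation on the parsing side matters in this design because a generative model can emit text that is not a valid action, such as too few numbers or an out-of-range bin.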
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | adjacent | Provides large cross-embodiment robot trajectories with image observations, language instructions, and action outputs, a physical-world analogue of the digital-world robot north star. | Physical robot policy data does not cover digital telemetry, and the policies do not predict future observations under actions. |
| Context interface | adjacent | Uses language instructions plus recent visual observations to condition action generation across embodiments. | No channel metadata, topology, or numeric system-context contract. |
| Benchmarks: control utility | adjacent | RT-X experiments evaluate policy success and transfer across robots. | Does not test causal simulation, counterfactual rollouts, or TSFM latent-state quality. |
Links Into The Wiki
- Foundation Time-Series Model Research Agenda
- Robotics Time-Series Modeling
- Action-Conditioned Time-Series Datasets
- World Models
Open Questions
- Which parts of the RT-X alignment recipe are necessary for cross-embodiment transfer, and which are artifacts of the available datasets?
- How should multi-view observations, proprioception, force, tactile, and control-frequency metadata be standardized without erasing embodiment-specific dynamics?