Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Source

Raw Markdown: paper_open-x-embodiment-2023.md
PDF: paper_open-x-embodiment-2023.pdf
Preprint: arXiv 2310.08864
Project page: robotics-transformer-x.github.io

Core Claim

Open X-Embodiment consolidates many robot-learning datasets into a standardized multi-embodiment repository and shows that RT-X policies can transfer skills across robot platforms.

Sensor-Time-Series Notes

The dataset is a large collection of real robot trajectories rather than a passive forecasting benchmark.
The relevant time-series unit is a trajectory with image observations, language instructions, and control inputs.
The repository uses RLDS to accommodate different action spaces and sensor modalities across robots.
The RT-X experiments coarsely align observations and actions by selecting a canonical camera view, resizing images, and mapping controls into a 7-DoF end-effector action representation before discretization.
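The last step of that alignment recipe, turning a continuous 7-DoF end-effector action into discrete bins, can be sketched as follows. This is a minimal illustration, not the RT-X code: the `Step` container, the `discretize_action` helper, the 256-bin count, the per-dimension bounds, and the ordering (x, y, z, roll, pitch, yaw, gripper) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence

NUM_BINS = 256  # assumption: uniform bins per action dimension


@dataclass
class Step:
    """One timestep of a trajectory (hypothetical container for illustration)."""
    image: Any            # canonical camera view, already resized
    instruction: str      # natural-language task description
    action: List[float]   # assumed order: x, y, z, roll, pitch, yaw, gripper


def discretize_action(action: Sequence[float],
                      low: Sequence[float],
                      high: Sequence[float],
                      num_bins: int = NUM_BINS) -> List[int]:
    """Clip each dimension to [low, high] and map it to a bin index in [0, num_bins - 1]."""
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low, and high must have the same length")
    bins = []
    for value, lo, hi in zip(action, low, high):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        clipped = min(max(value, lo), hi)
        fraction = (clipped - lo) / (hi - lo)
        # The upper bound itself would land in bin num_bins, so cap the index.
        bins.append(min(int(fraction * num_bins), num_bins - 1))
    return bins


step = Step(image=None, instruction="pick up the block",
            action=[-1.0, -0.5, 0.0, 0.5, 1.0, 2.0, -3.0])
tokens = discretize_action(step.action, low=[-1.0] * 7, high=[1.0] * 7)
```

Out-of-range values are clipped rather than rejected here, which is one plausible choice when datasets from different robots use different action scales; a real pipeline would also need per-dataset normalization before this step.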

Model Notes

RT-1-X and RT-2-X represent two common robotics foundation-model interfaces. RT-1-X treats recent image history plus language as inputs to a Transformer policy that emits discretized actions. RT-2-X maps robot actions into language-token-like outputs so a vision-language model can be co-fine-tuned for control.
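To make the RT-2-X style interface concrete, the sketch below renders a discretized action as a short text string that a language model could emit, and parses it back. The space-separated integer format, the helper names `action_to_text` and `text_to_action`, and the 7-dimension, 256-bin defaults are assumptions for illustration, not a confirmed description of the model's tokenizer.

```python
from typing import List, Sequence


def action_to_text(bins: Sequence[int]) -> str:
    """Render discretized action bins as a space-separated string of integers."""
    return " ".join(str(b) for b in bins)


def text_to_action(text: str, expected_dims: int = 7, num_bins: int = 256) -> List[int]:
    """Parse a model-emitted string back into bin indices, validating shape and range."""
    parts = text.split()
    if len(parts) != expected_dims:
        raise ValueError(f"expected {expected_dims} values, got {len(parts)}")
    bins = [int(p) for p in parts]  # raises ValueError on non-integer output
    if any(b < 0 or b >= num_bins for b in bins):
        raise ValueError("bin index out of range")
    return bins


encoded = action_to_text([0, 64, 128, 192, 255, 255, 0])
decoded = text_to_action(encoded)
```

Validation on the decoding side matters because a generative model can emit malformed strings; a controller would need a fallback (for example, repeating the previous action) when parsing fails.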

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	adjacent	Provides large cross-embodiment robot trajectories with image observations, language instructions, and action outputs, a physical-world analogue of the digital-world robot north star.	Physical robot policy data does not model digital telemetry or predict future observations conditioned on actions.
Context interface	adjacent	Uses language instructions plus recent visual observations to condition action generation across embodiments.	No channel metadata, topology, or numeric system-context contract.
Benchmarks: control utility	adjacent	RT-X experiments evaluate policy success and transfer across robots.	Does not test causal simulation, counterfactual rollouts, or TSFM latent-state quality.

Links Into The Wiki

Open Questions

Which parts of the RT-X alignment recipe are necessary for cross-embodiment transfer, and which are artifacts of the available datasets?
How should multi-view observations, proprioception, force, tactile, and control-frequency metadata be standardized without erasing embodiment-specific dynamics?
