DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Source

Core Claim

DROID provides a large in-the-wild robot manipulation dataset (76K trajectories, 350 hours) spanning many scenes, tasks, and buildings, with synchronized camera observations and natural-language instructions for policy learning.

Sensor-Time-Series Notes

The dataset is embodied trajectory data: each episode is an ordered sequence rather than an independent image or static table row.
Each episode includes synchronized RGB camera streams, camera calibration, depth information, and natural-language instructions.
DROID is useful for studying how generalist policies adapt to new observation streams, scene distributions, and task language.
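
As a rough illustration of the episode structure described in these notes, the sketch below models an episode as an ordered list of steps with episode-level instruction and calibration. All class, field, and function names here are hypothetical and chosen for this note; they are not the DROID release schema, and frames are stood in for by raw bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical field names for illustration only; not the DROID release schema.
@dataclass
class Step:
    timestamp: float          # seconds since episode start
    rgb: Dict[str, bytes]     # camera name -> encoded RGB frame
    depth: Dict[str, bytes]   # camera name -> encoded depth frame


@dataclass
class Episode:
    instruction: str                      # natural-language task description
    calibration: Dict[str, List[float]]   # camera name -> calibration parameters
    steps: List[Step] = field(default_factory=list)


def is_time_ordered(episode: Episode) -> bool:
    """Return True if step timestamps are strictly increasing."""
    times = [s.timestamp for s in episode.steps]
    return all(a < b for a, b in zip(times, times[1:]))


def cameras_synchronized(episode: Episode) -> bool:
    """Return True if every step has an RGB and depth frame for each calibrated camera."""
    expected = set(episode.calibration)
    return all(
        set(s.rgb) == expected and set(s.depth) == expected
        for s in episode.steps
    )


def history_window(episode: Episode, end: int, length: int) -> List[Step]:
    """Return up to `length` consecutive steps ending at index `end` (inclusive)."""
    start = max(0, end - length + 1)
    return episode.steps[start:end + 1]
```

The point of `history_window` is that a model consumes a contiguous slice of the episode, not shuffled individual frames; the instruction and calibration stay constant across that slice.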

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Context interface	adjacent	The readable raw Markdown describes 76K robot trajectories, 350 hours, synchronized camera streams, depth, calibration, and natural-language instructions.	The local raw Markdown is an incomplete LaTeX conversion and does not expose a general time-series context schema.
Data diversity and long tail	adjacent	DROID spans many scenes, tasks, buildings, and months, making it useful for testing distribution breadth in embodied trajectories.	Demonstration data alone does not provide counterfactual outcomes for untried actions.
Causal and control modeling	insufficient evidence	The source is a trajectory dataset for policy learning, not a benchmark of candidate actions and future outcomes.	Needs explicit action-conditioned rollout targets and intervention evaluation.
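
To make the "action-conditioned rollout targets" in the last row concrete, here is a minimal sketch of how such targets might be sliced from a single logged trajectory. It assumes each logged step pairs an observation with the action taken at that step, which the source does not spell out; the function name and parameters are hypothetical. The resulting samples only cover actions that were actually demonstrated, so they do not supply the counterfactual outcomes noted as missing above.

```python
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")


def rollout_targets(
    observations: Sequence[Obs],
    actions: Sequence[Act],
    history: int,
    horizon: int,
) -> List[Tuple[List[Obs], List[Act], Obs]]:
    """Slice one trajectory into (observation history, action chunk, future observation).

    Assumption: actions[t] is applied after observations[t], so
    observations[t + horizon] is the logged outcome of actions[t : t + horizon].
    """
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have the same length")
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be at least 1")

    samples = []
    # t is the index of the most recent observation in the history window.
    for t in range(history - 1, len(observations) - horizon):
        past = list(observations[t - history + 1 : t + 1])
        chunk = list(actions[t : t + horizon])
        future = observations[t + horizon]
        samples.append((past, chunk, future))
    return samples
```

A trajectory shorter than `history + horizon` steps yields no samples, which is one reason temporal coverage per episode matters for this kind of target.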

Links Into The Wiki

Open Questions

How much policy transfer comes from broader scene coverage versus better temporal coverage of manipulation trajectories?
Which parts of DROID should be modeled as observation history, static context, action history, or exogenous variation?