D4RL: Datasets for Deep Data-Driven Reinforcement Learning

Source

Raw Markdown: paper_d4rl-2020.md
PDF: paper_d4rl-2020.pdf

Core Claim

D4RL packages offline RL trajectories as state-action-reward-next-state datasets across locomotion, navigation, dexterous manipulation, and kitchen tasks.

Action-Time-Series Notes

Treats time as episodic transition sequences indexed by environment step, rather than as regularly sampled calendar time.
The action channel is explicit: each transition records the action taken, usually the environment control vector.
Useful as a clean low-dimensional starting point for action-conditioned dynamics and model-based offline RL.
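
The notes above can be made concrete with a minimal sketch of how episodic transition logs turn into action-conditioned dynamics pairs. It assumes a flat dictionary layout with the keys `observations`, `actions`, `rewards`, `terminals`, and `timeouts`, which mirrors the layout D4RL datasets commonly expose but is not specified in this note. The helper names `split_episodes` and `dynamics_pairs` and the toy values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_episodes(
    terminals: Sequence[bool], timeouts: Sequence[bool]
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges, one per episode.

    An episode ends at any step flagged as terminal or as a timeout.
    """
    bounds: List[Tuple[int, int]] = []
    start = 0
    n = len(terminals)
    for t in range(n):
        if terminals[t] or timeouts[t]:
            bounds.append((start, t + 1))
            start = t + 1
    if start < n:
        # Trailing steps with no end flag are kept as a partial episode.
        bounds.append((start, n))
    return bounds


def dynamics_pairs(
    observations: Sequence[Vector],
    actions: Sequence[Vector],
    bounds: Sequence[Tuple[int, int]],
) -> List[Tuple[Tuple[Vector, Vector], Vector]]:
    """Build ((s_t, a_t), s_{t+1}) pairs that never cross an episode boundary.

    The last step of each episode is dropped because its successor
    observation is not stored in this flat layout.
    """
    pairs = []
    for start, end in bounds:
        for t in range(start, end - 1):
            pairs.append(((observations[t], actions[t]), observations[t + 1]))
    return pairs


# Toy stand-in for an offline dataset (assumed layout, made-up values):
# two episodes, one ending in a terminal state and one in a timeout.
dataset = {
    "observations": [[0.0], [1.0], [2.0], [10.0], [11.0]],
    "actions": [[1.0], [1.0], [0.0], [1.0], [0.0]],
    "rewards": [0.0, 0.0, 1.0, 0.0, 0.0],
    "terminals": [False, False, True, False, False],
    "timeouts": [False, False, False, False, True],
}

episodes = split_episodes(dataset["terminals"], dataset["timeouts"])
pairs = dynamics_pairs(dataset["observations"], dataset["actions"], episodes)
```

The time index here is the step counter within each episode, not a wall-clock timestamp. Splitting on both flags keeps a dynamics model from learning a spurious transition from the end of one episode to the reset state of the next.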

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Control and counterfactuals	adjacent	Offline trajectories expose state, action, reward, and next-state transitions for learning decision policies from fixed logs.	Simulated episodic RL states are not a streaming multivariate TSFM interface, and the Markdown extract is only an include stub.
Benchmarks: what level of modeling is tested?	partially closes	The benchmark stresses policy utility under narrow, biased, multitask, sparse-reward, human-demo, and mixed-policy datasets.	It does not test observability, numeric context, channel metadata, or always-on latent-state maintenance.
Data diversity, curriculum, and long tail	warning	The paper shows that realistic offline data collection procedures expose failures hidden by simpler online-RL-derived datasets.	No foundation-model-scale pretraining or rare-regime curriculum is provided.

Links Into The Wiki