RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Source

RL Unplugged collects logged RL transitions from several domains, including Atari, DeepMind Control, and DeepMind Lab, for offline RL evaluation.

The time-series unit is a stream of replayed transitions with observations, actions, rewards, and discounts.
Action semantics vary by domain, from Atari discrete controls to continuous control actions.
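The dataset is valuable for world models because each logged transition already pairs an action with the observations before and after it, which is the form needed to learn action-conditioned dynamics.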
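The transition layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the RL Unplugged loading API: the `Transition` class, its field names, and the helpers `dynamics_pairs` and `discounted_return` are hypothetical, and the convention that a discount of 0.0 marks a terminal step is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple, Union

# Hypothetical types for illustration only; not the RL Unplugged API.
Observation = Sequence[float]
# A discrete index (Atari-style) or a continuous control vector.
Action = Union[int, Sequence[float]]


@dataclass(frozen=True)
class Transition:
    observation: Observation
    action: Action
    reward: float
    discount: float  # assumed to be 0.0 on a terminal step
    next_observation: Observation


def dynamics_pairs(
    stream: Iterable[Transition],
) -> Iterator[Tuple[Tuple[Observation, Action], Tuple[Observation, float]]]:
    """Yield ((observation, action), (next_observation, reward)) samples.

    The first element is the input to an action-conditioned dynamics
    model; the second is its prediction target.
    """
    for t in stream:
        yield (t.observation, t.action), (t.next_observation, t.reward)


def discounted_return(stream: Iterable[Transition]) -> float:
    """Accumulate rewards, weighting each by the product of earlier discounts."""
    total, weight = 0.0, 1.0
    for t in stream:
        total += weight * t.reward
        weight *= t.discount
    return total


# A toy three-step stream that mixes discrete and continuous actions.
demo = [
    Transition([0.0], 1, 1.0, 0.5, [1.0]),
    Transition([1.0], [0.2, -0.1], 2.0, 0.5, [2.0]),
    Transition([2.0], 0, 3.0, 0.0, [3.0]),
]

print(list(dynamics_pairs(demo))[0])
print(discounted_return(demo))
```

Keeping the action type a union of a discrete index and a continuous vector lets the same sample format cover both Atari-style and control-style domains, at the cost of pushing action encoding into the model.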

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	partially closes	Provides logged transition tuples with observations, actions, rewards, next observations, and sequence data for offline RL.	Simulated/game domains do not match operational time-series interventions or digital-system remediation.
Benchmarks: what level of modeling is tested?	partially closes	Separates online and offline policy selection and includes partial observability, action-space diversity, stochasticity, and nonstationarity.	Benchmark scores control algorithms, not general TSFM state, forecast, generation, and explanation surfaces.
Streaming state, long context, and constant updates	adjacent	Sequence data include future states, actions, and rewards for recurrent models needing memory.	Does not test always-on serving or retained latent-state refresh.