RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Source

Core Claim

RL Unplugged collects logged RL transition data from several domains, including Atari, the DeepMind Control Suite, and DeepMind Lab, as a benchmark suite for offline RL evaluation.

Action-Time-Series Notes

  • The time-series unit is a stream of logged transitions, each carrying an observation, action, reward, and discount.
  • Action semantics vary by domain, from Atari discrete controls to continuous control actions.
  • The data is valuable for world models because it is already stored as action-conditioned dynamics: each step pairs an observation and an action with the resulting reward and next observation.
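To make the "action-conditioned dynamics" point concrete, here is a minimal sketch of turning a logged transition stream into supervised world-model pairs. The `Transition` record, `encode_action`, and `to_world_model_pairs` are hypothetical names that mirror the fields listed above, not the RL Unplugged on-disk schema or loader. It also assumes the dm_env-style convention that a discount of 0 marks a terminal step, and it shows one way to put Atari-style discrete actions and continuous-control actions into a common vector form.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

# Hypothetical action type: int for discrete domains (e.g. Atari),
# a float vector for continuous control.
Action = Union[int, Sequence[float]]


@dataclass(frozen=True)
class Transition:
    """Hypothetical record mirroring the fields named above (not the real schema)."""
    observation: Tuple[float, ...]
    action: Action
    reward: float
    discount: float  # assumption: 0.0 marks a terminal step (dm_env-style)
    next_observation: Tuple[float, ...]


def encode_action(action: Action, num_discrete_actions: int = 0) -> Tuple[float, ...]:
    """One-hot encode discrete actions; pass continuous action vectors through."""
    if isinstance(action, int):
        if not 0 <= action < num_discrete_actions:
            raise ValueError("discrete action out of range")
        one_hot = [0.0] * num_discrete_actions
        one_hot[action] = 1.0
        return tuple(one_hot)
    return tuple(float(a) for a in action)


def to_world_model_pairs(transitions: Sequence[Transition], num_discrete_actions: int = 0) -> List[tuple]:
    """Map each transition to ((obs, action), (next_obs, reward, is_terminal))."""
    pairs = []
    for t in transitions:
        inputs = (t.observation, encode_action(t.action, num_discrete_actions))
        targets = (t.next_observation, t.reward, t.discount == 0.0)
        pairs.append((inputs, targets))
    return pairs


# Usage: a two-step toy log with three discrete actions.
log = [
    Transition((0.0,), 2, 1.0, 1.0, (0.5,)),
    Transition((0.5,), 0, 0.0, 0.0, (0.9,)),
]
pairs = to_world_model_pairs(log, num_discrete_actions=3)
# pairs[0] == (((0.0,), (0.0, 0.0, 1.0)), ((0.5,), 1.0, False))
```

Keeping the terminal flag in the targets lets a world model learn where episodes end instead of predicting dynamics across a reset.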

Foundation TSFM Relevance

  • Causal structure, counterfactuals, and control
    Verdict: partially closes.
    Evidence: Provides logged transition tuples with observations, actions, rewards, and next observations, plus sequence data for offline RL.
    Missing pieces: Simulated and game domains do not match operational time-series interventions or digital-system remediation.
  • Benchmarks: what level of modeling is tested?
    Verdict: partially closes.
    Evidence: Separates online and offline policy-selection protocols and covers partial observability, action-space diversity, stochasticity, and nonstationarity.
    Missing pieces: The benchmark scores control algorithms (policy performance), not general TSFM state, forecasting, generation, and explanation surfaces.
  • Streaming state, long context, and constant updates
    Verdict: adjacent.
    Evidence: Sequence data include future states, actions, and rewards for recurrent models that need memory.
    Missing pieces: Does not test always-on serving or retained latent-state refresh.
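The sequence-data point in the streaming row can be illustrated with a small windowing routine. This is a minimal sketch, assuming each episode is a list of per-step dicts with hypothetical keys `observation`, `action`, and `reward`; `make_sequences`, `seq_len`, and `stride` are illustrative names, not the RL Unplugged sequence format or loader. Windows never cross episode boundaries, and the final window is right-padded with a mask so every sequence has the same length.

```python
from typing import Dict, List, Sequence

# Hypothetical per-step record: {"observation": ..., "action": ..., "reward": float}
Step = Dict[str, object]


def make_sequences(episode: Sequence[Step], seq_len: int, stride: int) -> List[Dict[str, list]]:
    """Cut one episode into fixed-length windows for a recurrent model.

    Windows start every `stride` steps and stop at the episode end; the last
    window is padded (None for observations/actions, 0.0 for rewards) and a
    mask marks which positions hold real steps.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    sequences = []
    start = 0
    while start < len(episode):
        window = list(episode[start:start + seq_len])
        pad = seq_len - len(window)
        sequences.append({
            "observation": [s["observation"] for s in window] + [None] * pad,
            "action": [s["action"] for s in window] + [None] * pad,
            "reward": [s["reward"] for s in window] + [0.0] * pad,
            "mask": [1] * len(window) + [0] * pad,
        })
        if start + seq_len >= len(episode):
            break  # this window already reached the episode end
        start += stride
    return sequences


# Usage: a five-step toy episode cut into windows of 4 with stride 2.
episode = [{"observation": t, "action": t % 2, "reward": float(t)} for t in range(5)]
seqs = make_sequences(episode, seq_len=4, stride=2)
```

With `stride < seq_len`, consecutive windows overlap, so a recurrent model sees some context from the previous window; this only approximates the "retained latent-state refresh" the row notes as untested.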