NeoRL-2

Summary

NeoRL-2 is a near-real-world offline reinforcement-learning benchmark that pairs offline datasets with evaluation simulators. It provides non-vision, action-conditioned trajectories across control tasks designed to expose delays, exogenous factors, safety constraints, rule-based behavior policies, and limited-data regimes.

Official Artifacts

Dataset Shape

The core tasks listed in the paper and the GitHub repository are Pipeline, Simglucose, RocketRecovery, RandomFrictionHopper, DMSD, Fusion, and SafetyHalfCheetah. Each task exposes observations, continuous actions, rewards, next observations, and termination flags.
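
The per-task fields above amount to a standard transition tuple. The following is a minimal sketch of that shape and of how a flat transition list could be cut into episodes using the termination flag. The class name, field names, helper function, and toy values are all hypothetical illustrations; they are not NeoRL-2's actual API or storage keys, which should be checked against the official artifacts.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    """One (obs, action, reward, next_obs, done) tuple.

    Field names are illustrative assumptions, not NeoRL-2's real keys.
    """
    obs: Sequence[float]
    action: Sequence[float]   # continuous action vector
    reward: float
    next_obs: Sequence[float]
    done: bool                # termination flag


def split_episodes(transitions: List[Transition]) -> List[List[Transition]]:
    """Group a flat transition list into episodes, cutting after each done flag."""
    episodes: List[List[Transition]] = []
    current: List[Transition] = []
    for t in transitions:
        current.append(t)
        if t.done:
            episodes.append(current)
            current = []
    if current:  # keep a trailing partial episode that never terminated
        episodes.append(current)
    return episodes


# Toy data with made-up dimensions (2-D observation, 1-D action).
toy = [
    Transition([0.0, 0.1], [0.5], 1.0, [0.1, 0.2], False),
    Transition([0.1, 0.2], [0.4], 0.5, [0.2, 0.3], True),
    Transition([0.0, 0.0], [0.1], 0.0, [0.1, 0.1], False),
]
episodes = split_episodes(toy)
print([len(ep) for ep in episodes])  # prints [2, 1]
```

Keeping the next observation inside each tuple, rather than inferring it from the following row, means episode boundaries do not leak across rows when the data is shuffled or subsampled.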

Role In The Wiki

NeoRL-2 belongs in the Tier 1 non-vision action-conditioned world-model dataset bucket. It is a stronger fit than passive forecasting, anomaly-only, or contextual bandit datasets because it exposes transition tuples and rewards.

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in neorl2-2025 rather than duplicating verdict rows here.

At the entity level, NeoRL-2 is useful as a compact benchmark for action-conditioned dynamics under practical constraints. Its main caveats are the realism of its simulators, its preprint status, and the need to pin exact artifact versions and license terms.