NeoRL-2
Summary
NeoRL-2 is a near-real-world offline reinforcement-learning benchmark that pairs offline datasets with evaluation simulators. It provides non-vision, action-conditioned trajectories across control tasks designed to exhibit delays, exogenous factors, safety constraints, rule-based policies, and limited-data regimes.
Official Artifacts
- Official GitHub: https://github.com/polixir/NeoRL2
- Official Hugging Face dataset: https://huggingface.co/datasets/polixirai/NeoRL2
- arXiv preprint: https://arxiv.org/abs/2503.19267
- Dataset metadata snapshot: neorl2-2025
Dataset Shape
The seven core tasks listed in the paper and the GitHub repository are Pipeline, Simglucose, RocketRecovery, RandomFrictionHopper, DMSD, Fusion, and SafetyHalfCheetah. Each task exposes observations, continuous actions, rewards, next observations, and termination flags.
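The per-step layout described above can be illustrated with a minimal sketch. The field names (`obs`, `action`, `reward`, `next_obs`, `done`) and the helper `split_episodes` are hypothetical and chosen only for illustration; the actual keys in the NeoRL-2 release may differ, and the data below is synthetic rather than drawn from any task.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Transition:
    """One offline RL step; field names are illustrative, not NeoRL-2's keys."""
    obs: Sequence[float]
    action: Sequence[float]   # continuous action vector
    reward: float
    next_obs: Sequence[float]
    done: bool                # termination flag


def split_episodes(transitions: Sequence[Transition]) -> List[List[Transition]]:
    """Group a flat transition list into episodes, cutting after each done=True."""
    episodes: List[List[Transition]] = []
    current: List[Transition] = []
    for t in transitions:
        current.append(t)
        if t.done:
            episodes.append(current)
            current = []
    if current:  # trailing steps with no termination flag (truncated episode)
        episodes.append(current)
    return episodes


# Synthetic toy data, not taken from any NeoRL-2 task.
toy = [
    Transition([0.0], [0.1], 1.0, [0.1], False),
    Transition([0.1], [0.2], 0.5, [0.3], True),
    Transition([0.0], [-0.1], 0.0, [-0.1], False),
]
episodes = split_episodes(toy)
episode_returns = [sum(t.reward for t in ep) for ep in episodes]
```

Keeping the termination flag on each step is what allows a flat table of transitions to be regrouped into episodes when a method needs trajectory-level quantities such as returns.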
Role In The Wiki
NeoRL-2 belongs in the Tier 1 non-vision action-conditioned world-model dataset bucket. It is a stronger fit than passive forecasting, anomaly-only, or contextual bandit datasets because it exposes transition tuples and rewards.
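The distinction drawn here can be sketched in a few lines: an action-conditioned world model trains on (observation, action) inputs with (next observation, reward) targets, whereas passive forecasting has only observation-to-observation pairs. The tuple layout, function names, and values below are assumptions made for illustration and do not reflect the NeoRL-2 storage format.

```python
from typing import List, Sequence, Tuple

# (obs, action, reward, next_obs); hypothetical layout with synthetic values.
Step = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


def world_model_pairs(steps: Sequence[Step]) -> Tuple[List[List[float]], List[List[float]]]:
    """Inputs are obs concatenated with action; targets are next_obs plus reward."""
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    for obs, action, reward, next_obs in steps:
        inputs.append(list(obs) + list(action))
        targets.append(list(next_obs) + [reward])
    return inputs, targets


def forecasting_pairs(steps: Sequence[Step]) -> Tuple[List[List[float]], List[List[float]]]:
    """Passive forecasting uses only obs -> next_obs, with no action or reward."""
    return [list(s[0]) for s in steps], [list(s[3]) for s in steps]


steps: List[Step] = [
    ([0.0, 1.0], [0.5], 1.0, [0.5, 1.0]),
    ([0.5, 1.0], [-0.5], 0.0, [0.0, 1.0]),
]
wm_inputs, wm_targets = world_model_pairs(steps)
fc_inputs, fc_targets = forecasting_pairs(steps)
```

The world-model inputs carry one extra dimension per action component, which is the information a passive forecasting dataset cannot supply.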
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in neorl2-2025 rather than duplicating verdict rows here.
At the entity level, NeoRL-2 is useful as a compact benchmark for action-conditioned dynamics under practical constraints. Its main caveats are the realism of its simulators, its preprint status, and the need to pin artifact versions and license terms before use.