# NeoRL-2

Canonical source: <https://github.com/polixir/NeoRL2>
Official dataset: <https://huggingface.co/datasets/polixirai/NeoRL2>
Introducing source: <https://arxiv.org/abs/2503.19267>
Wiki source: [NeoRL-2](../../wiki/sources/neorl2-2025.md)

## Dataset Type

NeoRL-2 is a near-real-world offline reinforcement-learning benchmark with offline datasets and corresponding evaluation simulators. It extends NeoRL with tasks designed around delays, exogenous disturbances, global safety constraints, rule-based behavior policies, and limited data.

## Temporal Structure

The GitHub interface returns training and validation data with `obs`, `next_obs`, `action`, `reward`, `done`, and `index` fields. The Hugging Face parquet dataset uses the corresponding `observations`, `actions`, `rewards`, `next_observations`, and `terminals` fields; no counterpart to `index` appears in that list.
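The two naming schemes can be reconciled with a small rename step. The sketch below is illustrative only: the `GITHUB_TO_HF` mapping and the `to_hf_names` helper are hypothetical names, and the pairing is inferred from the field lists above rather than taken from either artifact's code. Keys without a listed counterpart, such as `index`, are passed through unchanged.

```python
# Assumed pairing of GitHub-interface keys with Hugging Face column names,
# inferred from the two field lists in this section (not from NeoRL-2 code).
GITHUB_TO_HF = {
    "obs": "observations",
    "action": "actions",
    "reward": "rewards",
    "next_obs": "next_observations",
    "done": "terminals",
}


def to_hf_names(batch):
    """Rename GitHub-style keys to Hugging Face-style keys.

    Keys with no listed counterpart (for example `index`) keep their name.
    """
    return {GITHUB_TO_HF.get(key, key): value for key, value in batch.items()}


# Toy batch with one transition; the values are placeholders, not real data.
toy_batch = {
    "obs": [[0.0, 1.0]],
    "action": [[0.5]],
    "reward": [1.0],
    "next_obs": [[0.1, 0.9]],
    "done": [False],
    "index": [0],
}
renamed = to_hf_names(toy_batch)
```

Normalising to one scheme early keeps downstream code independent of which artifact the transitions came from.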

## Tasks And Shapes

| Task | Observation shape | Action shape | Done flag | Max timesteps |
|---|---:|---:|---|---:|
| Pipeline | 52 | 1 | false | 1000 |
| Simglucose | 31 | 1 | true | 480 |
| RocketRecovery | 7 | 2 | true | 500 |
| RandomFrictionHopper | 13 | 3 | true | 1000 |
| DMSD | 6 | 2 | false | 100 |
| Fusion | 15 | 6 | false | 100 |
| SafetyHalfCheetah | 18 | 6 | false | 1000 |

The paper and GitHub README describe these seven tasks. Hugging Face config metadata also lists `Salespromotion` and `Simglucose-high`, so artifact users should pin the exact task/config set they intend to use.
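One way to pin the task set is to encode the table above and reject any name outside it. This is a minimal sketch: `TaskSpec`, `TASK_SPECS`, and `check_pinned_tasks` are hypothetical names, the numbers are copied from the table, and the exact spelling of the Hugging Face config names is an assumption.

```python
from typing import NamedTuple


class TaskSpec(NamedTuple):
    obs_dim: int
    action_dim: int
    done_flag: bool
    max_timesteps: int


# Values copied from the "Tasks And Shapes" table in this note.
TASK_SPECS = {
    "Pipeline": TaskSpec(52, 1, False, 1000),
    "Simglucose": TaskSpec(31, 1, True, 480),
    "RocketRecovery": TaskSpec(7, 2, True, 500),
    "RandomFrictionHopper": TaskSpec(13, 3, True, 1000),
    "DMSD": TaskSpec(6, 2, False, 100),
    "Fusion": TaskSpec(15, 6, False, 100),
    "SafetyHalfCheetah": TaskSpec(18, 6, False, 1000),
}


def check_pinned_tasks(requested):
    """Return the requested task names that are not in the seven-task table."""
    return sorted(set(requested) - set(TASK_SPECS))


# `Salespromotion` is listed only in the Hugging Face config metadata.
unknown = check_pinned_tasks(["Pipeline", "Salespromotion"])
```

Here `unknown` is `["Salespromotion"]`, which flags a config that the paper and README do not describe.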

## Actions Or Interventions

NeoRL-2 has explicit continuous action channels and next-observation labels. The datasets are generated from online RL algorithms or PID policies; suboptimal policies whose returns fall between 50% and 80% of the expert return are selected to produce conservative offline trajectories.
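The selection rule can be read as a simple band filter on policy returns. The sketch below is an interpretation, not the authors' procedure: `select_behavior_policies` is a hypothetical helper, it assumes raw (unnormalised) returns with a positive expert return, and the policy names and numbers are made up for illustration.

```python
def select_behavior_policies(policy_returns, expert_return, low=0.5, high=0.8):
    """Keep policies whose return lies within [low, high] of the expert return.

    Assumes raw returns and a positive expert return; the source does not say
    whether the 50%-80% band is computed on raw or normalised scores.
    """
    if expert_return <= 0:
        raise ValueError("this sketch assumes a positive expert return")
    lower, upper = low * expert_return, high * expert_return
    return {
        name: ret for name, ret in policy_returns.items() if lower <= ret <= upper
    }


# Hypothetical candidate policies and returns (illustrative values only).
candidates = {"pid_a": 40.0, "rl_b": 60.0, "rl_c": 75.0, "rl_d": 95.0}
selected = select_behavior_policies(candidates, expert_return=100.0)
```

With these toy numbers only `rl_b` and `rl_c` fall inside the band; the near-expert and weak policies are both excluded.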

## Reported Scale

The Hugging Face dataset card reports 980,848 rows and a total file size of about 130 MB. It also states that a typical dataset holds about 100k transitions, while the Fusion, RocketRecovery, and SafetyHalfCheetah datasets are smaller by design.

## Suitability Note

NeoRL-2 is a Tier 1 non-vision action-conditioned world-model dataset. It exposes `observation_t`, `action_t`, `reward_t`, `observation_{t+1}`, and termination signals, and it explicitly stresses delayed effects, external factors, safety constraints, traditional controllers, and low-data regimes.
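For action-conditioned world-model training, each transition maps an (observation, action) input to a (next observation, reward) target. The sketch below shows that framing on a toy batch. `make_world_model_pairs` is a hypothetical helper, the column names follow the Hugging Face fields listed earlier, and it assumes each reward is a scalar; real arrays may need reshaping first.

```python
def make_world_model_pairs(batch):
    """Build (input, target) pairs for a one-step dynamics model.

    input  = observation_t concatenated with action_t
    target = observation_{t+1} concatenated with reward_t
    Assumes per-transition rewards are scalars.
    """
    inputs, targets = [], []
    rows = zip(
        batch["observations"],
        batch["actions"],
        batch["rewards"],
        batch["next_observations"],
    )
    for obs, act, rew, next_obs in rows:
        inputs.append(list(obs) + list(act))
        targets.append(list(next_obs) + [float(rew)])
    return inputs, targets


# Toy batch shaped like DMSD (6 observation dims, 2 action dims); values are placeholders.
toy = {
    "observations": [[0.0] * 6, [1.0] * 6],
    "actions": [[0.1, 0.2], [0.3, 0.4]],
    "rewards": [1.0, -1.0],
    "next_observations": [[1.0] * 6, [2.0] * 6],
}
inputs, targets = make_world_model_pairs(toy)
```

Termination signals are left out of the target here; a model that must predict episode ends would add `terminals` as a further output.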

The paper cautions that current offline RL baselines often do not significantly outperform the behavior policy, and that no reported baseline reaches the paper's "solved" score threshold.

## Access And License Notes

The GitHub README says all datasets are CC BY 4.0 and code is Apache 2.0. The Hugging Face dataset frontmatter marks the dataset repository as `apache-2.0`. Treat this as an artifact-level license mismatch to pin before reuse. This knowledge base records metadata only and does not mirror parquet payloads or simulators.
