NeoRL-2

Source

Dataset metadata snapshot: neorl2-2025
Metadata JSON: metadata.json
Official GitHub: https://github.com/polixir/NeoRL2
Official Hugging Face dataset: https://huggingface.co/datasets/polixirai/NeoRL2
arXiv preprint: https://arxiv.org/abs/2503.19267

Core Claim

NeoRL-2 is a near-real-world offline reinforcement-learning benchmark with explicit transition tuples and evaluation simulators. It targets practical offline RL difficulties that are underrepresented in simpler benchmarks: delays, exogenous factors, safety constraints, rule-based behavior policies, conservative data, and limited data.

Dataset Notes

The paper and GitHub README describe seven tasks: Pipeline, Simglucose, RocketRecovery, RandomFrictionHopper, DMSD, Fusion, and SafetyHalfCheetah.
The GitHub interface returns obs, next_obs, action, reward, done, and index.
The Hugging Face parquet artifact uses observations, actions, rewards, next_observations, and terminals.
Datasets are generated by policies obtained from online RL algorithms or PID controllers; among these, suboptimal policies whose returns fall between 50% and 80% of the expert return are selected.
Hugging Face reports 980,848 rows and about 130 MB total file size.
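
The two naming schemes noted above can be reconciled with a small renaming step. The following is a minimal sketch, not the official loader: the key correspondence (in particular `done` to `terminals`) is an assumption inferred from the field names, `to_hf_names` is a hypothetical helper, and the demo batch is synthetic placeholder data.

```python
import numpy as np

# Assumed correspondence between GitHub-interface keys and Hugging Face
# parquet column names. This mapping is inferred from the names, not
# confirmed by either source.
GITHUB_TO_HF = {
    "obs": "observations",
    "action": "actions",
    "reward": "rewards",
    "next_obs": "next_observations",
    "done": "terminals",
}


def to_hf_names(batch):
    """Rename GitHub-style keys to Hugging Face-style keys.

    Keys without a listed counterpart (for example `index`) pass through
    unchanged.
    """
    return {GITHUB_TO_HF.get(key, key): value for key, value in batch.items()}


# Synthetic stand-in for a real batch (RocketRecovery-sized: obs 7, action 2).
demo = {
    "obs": np.zeros((4, 7)),
    "next_obs": np.zeros((4, 7)),
    "action": np.zeros((4, 2)),
    "reward": np.zeros((4, 1)),
    "done": np.zeros((4, 1), dtype=bool),
    "index": np.array([0]),
}
renamed = to_hf_names(demo)
```

Keeping `index` under its original name avoids guessing at a Hugging Face equivalent that the metadata does not list.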

Task Shapes

Task	Observation shape	Action shape	Done flag	Max timesteps
Pipeline	52	1	false	1000
Simglucose	31	1	true	480
RocketRecovery	7	2	true	500
RandomFrictionHopper	13	3	true	1000
DMSD	6	2	false	100
Fusion	15	6	false	100
SafetyHalfCheetah	18	6	false	1000
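
The table can double as a sanity check when loading data. The sketch below copies the dimensions from the table and assumes each array is two-dimensional with shape (rows, dim); `check_batch` is a hypothetical helper, not part of the NeoRL-2 package.

```python
import numpy as np

# (observation dim, action dim, max timesteps), copied from the table above.
TASK_SHAPES = {
    "Pipeline": (52, 1, 1000),
    "Simglucose": (31, 1, 480),
    "RocketRecovery": (7, 2, 500),
    "RandomFrictionHopper": (13, 3, 1000),
    "DMSD": (6, 2, 100),
    "Fusion": (15, 6, 100),
    "SafetyHalfCheetah": (18, 6, 1000),
}


def check_batch(task, observations, actions):
    """Return True if the arrays match the expected per-task dimensions.

    Assumes arrays are laid out as (rows, dim); that layout is an
    assumption, not something the table states.
    """
    obs_dim, act_dim, _ = TASK_SHAPES[task]
    ok_obs = observations.ndim == 2 and observations.shape[1] == obs_dim
    ok_act = actions.ndim == 2 and actions.shape[1] == act_dim
    return ok_obs and ok_act and len(observations) == len(actions)


ok = check_batch("RocketRecovery", np.zeros((5, 7)), np.zeros((5, 2)))
bad = check_batch("Fusion", np.zeros((5, 15)), np.zeros((5, 5)))
```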

Action-Time-Series Notes

NeoRL-2 is a clean action-conditioned trajectory source:

observation_t + action_t -> reward_t + observation_{t+1} + terminal_t

This makes it better aligned with action-conditioned world-model training than logged decision datasets that only expose one-step outcomes. The harder part is that the benchmark intentionally includes delayed effects, external factors, conservative behavior policies, and safety constraints.
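
As an illustration of the mapping above, transition arrays can be packed into input/target pairs for a one-step world model. This is a minimal sketch under stated assumptions: the argument names follow the Hugging Face column names, arrays are assumed to be (rows, dim), and `make_world_model_pairs` is a hypothetical helper rather than anything shipped with the benchmark.

```python
import numpy as np


def make_world_model_pairs(observations, actions, rewards,
                           next_observations, terminals):
    """Build (input, target) arrays for one-step dynamics learning.

    input  = [observation_t, action_t]
    target = [reward_t, observation_{t+1}, terminal_t]
    """
    inputs = np.concatenate([observations, actions], axis=1)
    targets = np.concatenate(
        [
            rewards.reshape(-1, 1),
            next_observations,
            # Cast the boolean flag so it can share an array with floats.
            terminals.reshape(-1, 1).astype(observations.dtype),
        ],
        axis=1,
    )
    return inputs, targets


# Synthetic DMSD-sized transitions (obs 6, action 2); values are placeholders.
n = 10
inputs, targets = make_world_model_pairs(
    observations=np.zeros((n, 6)),
    actions=np.zeros((n, 2)),
    rewards=np.zeros(n),
    next_observations=np.ones((n, 6)),
    terminals=np.zeros(n, dtype=bool),
)
```

A single concatenated target keeps the example short; a real model might predict reward, next observation, and termination with separate heads, and would need history windows to capture the delayed effects mentioned above.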

Gotchas

The paper is a 2025 arXiv preprint; treat it as a current benchmark artifact, not as settled, peer-reviewed evidence.
The tasks are simulated to reflect practical issues; they are not direct real-world business data.
The paper reports that current baselines often fail to significantly improve over the behavior policy, and no reported baseline reaches the paper’s solved threshold.
Hugging Face config metadata lists Salespromotion and Simglucose-high in addition to the seven paper/GitHub tasks.
GitHub says datasets are CC BY 4.0 and code is Apache 2.0, while Hugging Face frontmatter marks the dataset repo as apache-2.0.
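
Because the Hugging Face configs include names beyond the seven paper tasks, it can help to separate them before running comparisons. The sketch below assumes config names match the task names exactly as written here, which has not been verified against the repository; `split_configs` is a hypothetical helper.

```python
# The seven tasks described in the paper and GitHub README.
PAPER_TASKS = {
    "Pipeline",
    "Simglucose",
    "RocketRecovery",
    "RandomFrictionHopper",
    "DMSD",
    "Fusion",
    "SafetyHalfCheetah",
}


def split_configs(config_names):
    """Partition config names into paper tasks and extras, preserving order."""
    paper = [name for name in config_names if name in PAPER_TASKS]
    extra = [name for name in config_names if name not in PAPER_TASKS]
    return paper, extra


paper, extra = split_configs(
    ["Pipeline", "Salespromotion", "Simglucose", "Simglucose-high"]
)
```

Here `paper` holds `Pipeline` and `Simglucose`, while `extra` holds `Salespromotion` and `Simglucose-high`.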

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	partially closes	Provides explicit `observation_t`, `action_t`, reward, `observation_{t+1}`, and terminal signals across seven non-vision control tasks.	Simulators are benchmark approximations; no direct real-world deployment data.
Benchmarks: what level of modeling is tested?	partially closes	Stresses delayed effects, external factors, safety constraints, rule-based behavior policies, conservative data, and limited data.	Needs TSFM-native model comparisons and benchmark protocols beyond offline RL baselines.
Time representation and irregular event streams	adjacent	Pipeline and Simglucose explicitly test delay; trajectories have finite horizons and termination signals.	Mostly fixed simulator step interfaces rather than irregular event streams.
Context interface: channel context and general context	adjacent	Task identity and environment-specific properties define different control domains and constraints.	No unified typed context schema for cross-domain transfer.
