RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

Source

Raw Markdown: rl2grid-2025
Rendered / retrieved PDF: paper_rl2grid-2025.pdf
OpenReview submission status check: https://openreview.net/forum?id=7J2C4QnQrl
Official code: https://github.com/emarche/RL2Grid
External source: https://arxiv.org/abs/2503.23101

Publication And Credibility

Paper date: arXiv published 2025-03-29; v2 updated 2025-06-20.
Venue/status: arXiv preprint. The ICLR 2025 OpenReview submission was checked and is marked withdrawn.
Credibility: Strong benchmark-design team, with authors from RTE, MIT, Georgia Tech, National Grid ESO, 50Hertz, and the University of Edinburgh. Treat it as important benchmark evidence, but not as an accepted ICLR paper.

Core Claim

RL2Grid standardizes Grid2Op/RTE power-grid operation tasks, state/action spaces, rewards, operational heuristics, and baseline RL evaluations to expose where current RL methods fail on realistic power-grid constraints.
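Operational heuristics are often simple rules that decide when an RL agent is allowed to act at all. The sketch below shows one common L2RPN-style pattern, assuming a per-line loading ratio `rho` (current flow divided by thermal limit, as exposed by Grid2Op observations): stay idle while every line is below a threshold, and hand control to the learned policy otherwise. The 0.95 threshold and the function names are illustrative assumptions, not RL2Grid's configuration.

```python
from typing import Any, Callable, Sequence

# Hypothetical threshold (not an RL2Grid default): consult the agent only
# when some line is loaded above 95% of its thermal limit.
DEFAULT_RHO_THRESHOLD = 0.95


def agent_should_act(rho: Sequence[float], threshold: float = DEFAULT_RHO_THRESHOLD) -> bool:
    """Return True if any line loading ratio strictly exceeds the threshold.

    `rho` mirrors Grid2Op's per-line ratio of current flow to thermal limit;
    values above 1.0 mean a line is overloaded.
    """
    return max(rho, default=0.0) > threshold


def choose_action(rho: Sequence[float], agent_policy: Callable[[Sequence[float]], Any], idle_action: Any) -> Any:
    """Gate a learned policy behind an 'idle while safe' heuristic.

    `agent_policy` is any callable mapping the observation to an action;
    `idle_action` stands in for a do-nothing action.
    """
    if agent_should_act(rho):
        return agent_policy(rho)
    return idle_action


if __name__ == "__main__":
    policy = lambda rho: "agent_action"  # placeholder policy
    print(choose_action([0.4, 0.7, 0.8], policy, "do_nothing"))   # do_nothing
    print(choose_action([0.4, 0.97, 0.8], policy, "do_nothing"))  # agent_action
```

Because a gate like this changes how many decisions the agent actually makes, two agents evaluated under different heuristics are not directly comparable.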

L2RPN / Grid2Op Notes

This is the single-agent benchmark counterpart to MARL2Grid-TR. It is useful for choosing fair baseline protocols before comparing new Grid2Op agents, especially because it centers on reward, action-space, and safety-constraint choices rather than on model architecture alone.
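One way to make a "fair baseline protocol" concrete is to freeze every choice that should stay fixed across agents (environment, action space, reward, heuristic, evaluation seeds) in a single record and fingerprint it, so results produced under different settings cannot be silently pooled. This is a minimal sketch; the class, field names, and placeholder values are hypothetical, not RL2Grid task identifiers or defaults.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalProtocol:
    """Everything that must stay fixed when comparing agents.

    Hypothetical schema: field values are placeholders, not RL2Grid names.
    """
    env_name: str
    action_space: str
    reward: str
    heuristic: str
    eval_seeds: Tuple[int, ...] = (0, 1, 2, 3, 4)

    def fingerprint(self) -> str:
        """Deterministic short hash of the protocol, for tagging result files."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def same_protocol(result_a: dict, result_b: dict) -> bool:
    """Only pool or compare results whose protocol fingerprints match."""
    return result_a["protocol"] == result_b["protocol"]


if __name__ == "__main__":
    base = EvalProtocol("grid_env_placeholder", "topology", "reward_placeholder", "idle_when_safe")
    no_heuristic = EvalProtocol("grid_env_placeholder", "topology", "reward_placeholder", "none")
    run_a = {"protocol": base.fingerprint(), "agent": "agent_a"}
    run_b = {"protocol": no_heuristic.fingerprint(), "agent": "agent_b"}
    # Only the heuristic differs, yet the runs are flagged as incomparable.
    print(same_protocol(run_a, run_b))
```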

Action-Time-Series / World-Model Notes

For world-model work, RL2Grid provides the benchmark scaffolding but not a learned dynamics model. It should be used to pin environments, action sets, rewards, and safety constraints before testing history + action -> future grid trajectory models.
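The history + action -> future framing can be turned into a supervised dataset built from rollouts logged under a pinned environment. The sketch below assumes a rollout is stored as aligned lists of per-step states and actions (their encoding is left open) and cuts it into (history, future actions, future states) triples; the function name, the alignment convention, and the window lengths are illustrative assumptions.

```python
from typing import Any, List, Sequence, Tuple

Window = Tuple[List[Any], List[Any], List[Any]]


def make_windows(states: Sequence[Any], actions: Sequence[Any],
                 history: int, horizon: int) -> List[Window]:
    """Cut one rollout into (past states, future actions, future states) triples.

    Assumed convention: actions[t] is applied after observing states[t] and
    produces states[t + 1], so len(states) == len(actions) + 1.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    windows: List[Window] = []
    # t is the last observed step: we need `history` states ending at t,
    # plus `horizon` actions and resulting states after it.
    for t in range(history - 1, len(actions) - horizon + 1):
        past = list(states[t - history + 1 : t + 1])
        acts = list(actions[t : t + horizon])
        future = list(states[t + 1 : t + horizon + 1])
        windows.append((past, acts, future))
    return windows


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4, 5]
    actions = ["a0", "a1", "a2", "a3", "a4"]
    for past, acts, future in make_windows(states, actions, history=2, horizon=2):
        print(past, acts, future)
    # [0, 1] ['a1', 'a2'] [2, 3]
    # [1, 2] ['a2', 'a3'] [3, 4]
    # [2, 3] ['a3', 'a4'] [4, 5]
```

Windowing each episode separately, rather than concatenating episodes, avoids triples that straddle a game over and reset.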

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	partially closes	Explicit simulator-backed actions, rewards, and operational constraints make action-conditioned control evaluation possible inside Grid2Op.	This is not evidence for learned causal discovery or a learned transition model; needs learned transition-model baselines and simulator-query budgets.
Benchmark hygiene	partially closes	Standardizes task definitions and baseline RL comparisons.	The ICLR OpenReview submission is withdrawn; cite the arXiv preprint rather than claiming ICLR acceptance.
Context interface	partially closes	Grid topology, forecasts, and operational heuristics are part of the task contract.	Needs a portable TSFM schema.
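The missing simulator-query budget can be illustrated with a thin wrapper that counts calls to a simulator and refuses calls past a fixed allowance, so that a planning agent's use of lookahead (for example through Grid2Op's `obs.simulate`) can be reported alongside its score. This is a hypothetical sketch: the class names and budget value are assumptions, and the wrapped callable is any function that predicts the outcome of an action.

```python
from typing import Any, Callable


class BudgetExceeded(RuntimeError):
    """Raised when an agent requests more simulator queries than allowed."""


class QueryBudget:
    """Wrap a simulator call and enforce a per-episode query budget."""

    def __init__(self, simulate: Callable[..., Any], max_queries: int) -> None:
        self._simulate = simulate
        self.max_queries = max_queries
        self.used = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.used >= self.max_queries:
            raise BudgetExceeded(f"budget of {self.max_queries} queries exhausted")
        self.used += 1
        return self._simulate(*args, **kwargs)

    def reset(self) -> None:
        """Call at the start of each episode."""
        self.used = 0


if __name__ == "__main__":
    fake_sim = lambda action: {"action": action, "ok": True}  # stand-in simulator
    sim = QueryBudget(fake_sim, max_queries=2)
    sim("a")
    sim("b")
    try:
        sim("c")
    except BudgetExceeded as err:
        print(err)  # budget of 2 queries exhausted
```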
