RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
Source
- Raw Markdown: rl2grid-2025
- Rendered / retrieved PDF: paper_rl2grid-2025.pdf
- OpenReview submission status check: https://openreview.net/forum?id=7J2C4QnQrl
- Official code: https://github.com/emarche/RL2Grid
- External source: https://arxiv.org/abs/2503.23101
Publication And Credibility
- Paper date: arXiv published 2025-03-29; v2 updated 2025-06-20.
- Venue/status: arXiv preprint. The ICLR 2025 OpenReview submission was checked and is marked withdrawn.
- Credibility: Strong benchmark-design team, with authors from RTE, MIT, Georgia Tech, National Grid ESO, 50Hertz, and the University of Edinburgh. Treat it as important benchmark evidence, but not as an accepted ICLR paper.
Core Claim
RL2Grid standardizes Grid2Op/RTE power-grid operation tasks, state/action spaces, rewards, operational heuristics, and baseline RL evaluations to expose where current RL methods fail on realistic power-grid constraints.
L2RPN / Grid2Op Notes
This is the single-agent benchmark counterpart to MARL2Grid-TR. It is useful for choosing fair baseline protocols before comparing new Grid2Op agents, especially because it centers reward, action-space, and safety-constraint choices rather than only model architecture.
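One way to make the "fair baseline protocol" idea concrete is to record the reward, action-space, and safety-constraint choices in a single immutable object and tag every result with its hash. The sketch below is only an illustration under stated assumptions: `BaselineProtocol`, its field names, and the example values are hypothetical and are not taken from the RL2Grid code base or the Grid2Op API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BaselineProtocol:
    """Hypothetical record of the choices a fair comparison should hold fixed."""

    env_name: str              # placeholder label, not a verified RL2Grid task ID
    action_space: str          # illustrative label, e.g. "topology" or "redispatch"
    reward: str                # name of the reward definition in use
    safety_constraints: tuple  # names of constraints enforced during evaluation
    seeds: tuple               # evaluation seeds shared by all compared agents

    def fingerprint(self) -> str:
        """Return a short, stable hash of the protocol for tagging result files."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Example values are placeholders chosen for illustration only.
protocol = BaselineProtocol(
    env_name="example-grid-task",
    action_space="topology",
    reward="example-reward",
    safety_constraints=("line-overload-limit",),
    seeds=(0, 1, 2),
)
print(protocol.fingerprint())
```

Two agents are then comparable only if their results carry the same fingerprint; changing the reward or the constraint set yields a different hash, which makes silent protocol drift visible.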
Action-Time-Series / World-Model Notes
For world-model work, RL2Grid provides the benchmark scaffolding but not a learned dynamics model. Use it to pin environments, action sets, rewards, and safety constraints before testing models that map history + action -> future grid trajectory.
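The history + action -> future trajectory framing can be sketched as a windowing step over a logged episode. The code below is a minimal illustration, not part of RL2Grid or Grid2Op: `make_windows`, its parameters, and the indexing convention (action `a_t` is applied at observation `o_t` and produces `o_{t+1}`) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Window = Tuple[list, list, list]


def make_windows(
    observations: Sequence,
    actions: Sequence,
    history_len: int,
    horizon: int,
) -> List[Window]:
    """Slice one episode into (history, planned actions, future) samples.

    Assumed convention: observations = o_0..o_T and actions = a_0..a_{T-1},
    where a_t is applied at o_t and leads to o_{t+1}.
    """
    if history_len < 1 or horizon < 1:
        raise ValueError("history_len and horizon must be positive")
    if len(observations) != len(actions) + 1:
        raise ValueError("expected exactly one more observation than actions")

    num_steps = len(actions)
    windows: List[Window] = []
    # t is the index of the most recent observation in the history.
    for t in range(history_len - 1, num_steps - horizon + 1):
        history = list(observations[t - history_len + 1 : t + 1])
        planned = list(actions[t : t + horizon])
        future = list(observations[t + 1 : t + horizon + 1])
        windows.append((history, planned, future))
    return windows


# Toy episode: integers stand in for grid observations, letters for actions.
samples = make_windows([0, 1, 2, 3, 4, 5], ["a", "b", "c", "d", "e"], 2, 2)
print(samples[0])
```

Each sample pairs the observed history with the actions to be applied and the observations they lead to, so a candidate model can be scored on the same environment, action set, and reward that the benchmark pins.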
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | Explicit simulator-backed actions, rewards, and operational constraints make action-conditioned control evaluation possible inside Grid2Op. | This is not evidence for learned causal discovery or a learned transition model; needs learned transition-model baselines and simulator-query budgets. |
| Benchmark hygiene | partially closes | Standardizes task definitions and baseline RL comparisons. | OpenReview status is withdrawn; cite arXiv rather than ICLR acceptance. |
| Context interface | partially closes | Grid topology, forecasts, and operational heuristics are part of the task contract. | Needs a portable TSFM schema. |