L2RPN / Grid2Op
Summary
L2RPN, “Learning to Run a Power Network”, is a series of power-grid operation challenges built around the Grid2Op ecosystem. For this wiki, it is the strongest current energy-domain example of a non-vision, action-conditioned graph time-series environment: agents observe power-grid state and scenario context, choose topology or other control inputs, and are scored on safe operation and cost-like outcomes.
Official Artifacts
- L2RPN papers and references: https://l2rpn.chalearn.org/papers-references
- Grid2Op documentation: https://grid2op.readthedocs.io/
- Grid2Op GitHub: https://github.com/Grid2op/grid2op
- l2rpn-baselines documentation: https://l2rpn-baselines.readthedocs.io/
- l2rpn-baselines GitHub: https://github.com/Grid2op/l2rpn-baselines
- lightsim2grid documentation: https://lightsim2grid.readthedocs.io/
- Grid2Viz: https://github.com/rte-france/grid2viz
- Grid2Game: https://github.com/BDonnot/grid2game/
- ChroniX2Grid: https://github.com/BDonnot/ChroniX2Grid
Benchmark Contract
flowchart LR
    Context[Scenario context: load, generation, weather, maintenance, contingencies]
    Obs[Grid observation: flows, topology, limits, cooldowns, forecasts]
    Agent[Agent or controller]
    Action[Control input: topology, redispatch, curtailment, storage]
    Disturbance[Exogenous or adversarial events: maintenance, line disconnections]
    Sim[Physical simulator / Grid2Op backend]
    Outcome[Next observation, safety status, cost/reward]
    Context --> Obs
    Obs --> Agent
    Agent --> Action
    Action --> Sim
    Context --> Sim
    Disturbance --> Sim
    Sim --> Outcome
    Outcome --> Obs
This contract is why L2RPN belongs in Action-Conditioned Time-Series Datasets. The data is graph-structured and multivariate, the action space is large and combinatorial, and the relevant failures are often rare safety events. A passive forecaster over line flows is not enough; useful models must preserve the state needed to compare candidate actions or control inputs.
The action/event distinction is important. Topology changes, redispatching, curtailment, and storage commands are actions or control inputs when the agent chooses them. Maintenance, load/generation shifts, renewable uncertainty, line failures, and adversarial disconnections are events or exogenous variables unless the experimenter controls them as part of a benchmark condition.
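The distinction above can be made explicit when logging or loading trajectories. The following is a minimal sketch, assuming one dictionary per time step; the variable names, the `Role` enum, and both helper functions are hypothetical illustrations, not Grid2Op identifiers.

```python
from enum import Enum


class Role(Enum):
    ACTION = "action"                  # chosen by the agent
    EXOGENOUS = "exogenous"            # event or external variable
    CONDITION = "benchmark_condition"  # set by the experimenter


# Default roles taken from the paragraph above; key names are illustrative.
DEFAULT_ROLES = {
    "topology_change": Role.ACTION,
    "redispatch": Role.ACTION,
    "curtailment": Role.ACTION,
    "storage": Role.ACTION,
    "maintenance": Role.EXOGENOUS,
    "load_generation_shift": Role.EXOGENOUS,
    "renewable_uncertainty": Role.EXOGENOUS,
    "line_failure": Role.EXOGENOUS,
    "adversarial_disconnection": Role.EXOGENOUS,
}


def with_experimenter_control(roles, names):
    """Return a copy of `roles` where `names` become benchmark conditions."""
    updated = dict(roles)
    for name in names:
        if updated[name] is not Role.EXOGENOUS:
            raise ValueError(f"{name} is not an exogenous variable")
        updated[name] = Role.CONDITION
    return updated


def tag_record(record, roles=DEFAULT_ROLES):
    """Group one logged time step by role; unlabeled variables raise KeyError."""
    tagged = {role: {} for role in Role}
    for name, value in record.items():
        tagged[roles[name]][name] = value
    return tagged


record = {"redispatch": 5.0, "line_failure": "line_3", "maintenance": "line_7"}
tagged = tag_record(record)

# Same log, but maintenance is scheduled by the experimenter in this benchmark.
stress_roles = with_experimenter_control(DEFAULT_ROLES, ["maintenance"])
stress_tagged = tag_record(record, stress_roles)
```

Failing loudly on unlabeled variables is a deliberate choice in this sketch: a variable with no declared role should not slip silently into either the action or the event channel.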
Paper Set Ingested
| Source | Date | Venue/status |
|---|---|---|
| MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations | OpenReview published 2026-01-26; last modified 2026-05-06 | ICLR 2026 Poster; OpenReview primary source, no arXiv version found |
| Interpretable Policy Distillation for Power Grid Topology Control | 2026-05-30 | arXiv preprint on Grid2Op policy distillation |
| Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation | 2026-04-15 | arXiv preprint on Grid2Op safety shielding |
| Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids | 2026-04-02 | arXiv preprint with action-conditioned GNN risk surrogate |
| LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration | 2026-03-14 | arXiv preprint on LLM-guided safe RL for Grid2Op-style topology control |
| Power Grid Control with Graph-Based Distributed Reinforcement Learning | 2025-09-02 | arXiv preprint on distributed graph RL in Grid2Op |
| AI challenge for safe and low carbon power grid operation | 2025; DOAJ lists Dec 2025 issue | Energy and AI 22:100564; metadata-only ingest because no arXiv/open PDF was retrieved |
| Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | 2025-03-19; arXiv v2 on 2025-06-19 | ECML PKDD 2025 ADS track / arXiv; DOI 10.1007/978-3-032-06129-4_8 |
| Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations | 2025-01-13; arXiv v3 on 2025-10-03 | arXiv preprint from TenneT/Radboud authors |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | 2025-03-29; arXiv v2 on 2025-06-20 | arXiv preprint; ICLR 2025 OpenReview submission checked and marked withdrawn |
| Graph Reinforcement Learning for Power Grids: A Comprehensive Survey | 2024-07-05; arXiv v4 on 2026-01-07 | Energy and AI 2025 survey / arXiv; DOI 10.1016/j.egyai.2025.100671 |
| RL for Mitigating Cascading Failures: Targeted Exploration via Sensitivity Factors | 2024-11-27 | NeurIPS 2024 Climate Change AI workshop / arXiv preprint |
| Learning to Run a Power Network under Varying Grid Topology | May 2022 | IEEE ENERGYCON 2022; metadata-only ingest, no arXiv/open PDF found |
| Learning to run a Power Network Challenge: a Retrospective Analysis | 2021-03-02; arXiv v2 on 2021-10-21 | PMLR NeurIPS 2020 Competition and Demonstration Track, 2021 |
| Power Grid Congestion Management via Topology Optimization with AlphaZero | 2022-11-10 | NeurIPS 2022 RL4RealLife Workshop preprint; L2RPN WCCI 2022 winning approach |
| Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic | 2021-01-12; OpenReview last modified 2023-05-05 | ICLR 2021 Spotlight, published on OpenReview |
| Exploring grid topology reconfiguration using a simple deep reinforcement learning approach | 2020-11-26; arXiv v2 on 2021-04-17 | IEEE 2021 reference on the L2RPN page; arXiv preprint available |
| Active Power Correction Strategies Based on Deep Reinforcement Learning—Part II: A Distributed Solution for Adaptability | 2021; DOI 10.17775/CSEEJPES.2020.07070 | CSEE Journal of Power and Energy Systems / IEEE-indexed reference on the L2RPN page |
| Adversarial Training for a Continuous Robustness Control Problem in Power Systems | 2020-12-21; arXiv v3 on 2021-04-16 | IEEE 2021 reference on the L2RPN page; arXiv preprint available |
| AI-Based Autonomous Line Flow Control via Topology Adjustment for Maximizing Time-Series ATCs | 2019-11-08 | IEEE 2020 reference on the L2RPN page; arXiv PDF-only e-print available |
| LEAP Nets for System Identification and Application to Power Systems | 2020 | Neurocomputing, DOI 10.1016/j.neucom.2019.12.135; HAL metadata retrieved |
| Neural Networks for Power Flow: Graph Neural Solver | 2020-12-14; PSCC 2020 reference | Electric Power Systems Research / PSCC 2020 PDF |
| Learning to run a power network challenge for training topology controllers | 2019-12-05 | PSCC 2020 reference; arXiv preprint available |
| Introducing machine learning for power system operation support | 2017-09-27 | IERP 2017 lineage source; arXiv preprint available |
| Fast Power system security analysis with Guided Dropout | 2018-01-30 | ESANN 2018; arXiv preprint available |
| Anticipating contingengies in power grids using fast neural net screening | 2018-05-03 | IJCNN 2018; arXiv preprint available |
| Optimization of computational budget for power system risk assessment | 2018-05-03 | ISGT Europe 2018; arXiv preprint available |
| Guided Machine Learning for power grid segmentation | 2017-11-13; arXiv v3 on 2018-03-30 | ISGT Europe 2018 / NeurIPS 2019 workshop lineage; arXiv preprint available |
| Expert System for topological remedial action discovery in smart grids | 2018-11-12 | MedPower 2018; HAL PDF retrieved |
| LEAP nets for power grid perturbations | 2019-08-22 | ESANN 2019 lineage; arXiv preprint available |
| Graph Neural Solver for Power Systems | 2019-07-14 | IJCNN 2019; HAL PDF retrieved; DOI 10.1109/IJCNN.2019.8851855 |
| Interpreting Atypical Conditions in Systems with Deep Conditional Autoencoders: The Case of Electrical Consumption | 2020-01-01 publication metadata; ECML PKDD 2019 paper | ECML PKDD 2019 / Springer LNCS, DOI 10.1007/978-3-030-46133-1_38 |
Role In The Wiki
L2RPN/Grid2Op is a benchmark ecosystem, not a single immutable dataset payload. Any experiment should pin the Grid2Op version, backend, environment name, chronics or scenario generator, action space, action mask, reward or cost function, and train/test split, and should state whether the agent can simulate candidate actions before committing them.
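One way to keep those pins together is a small manifest stored alongside results. This is a sketch only: the `ExperimentManifest` class, its field names, and the fingerprint scheme are assumptions made for illustration, and every field value below is a placeholder rather than a recommended setting.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ExperimentManifest:
    grid2op_version: str
    backend: str
    env_name: str
    chronics_source: str
    action_space: str
    action_mask: str
    reward_or_cost: str
    split: str
    can_simulate_candidates: bool

    def fingerprint(self) -> str:
        """Short stable hash so two runs can be checked for identical pins."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Placeholder values; replace each with what the experiment actually used.
base = ExperimentManifest(
    grid2op_version="X.Y.Z",
    backend="PandaPowerBackend",
    env_name="l2rpn_case14_sandbox",
    chronics_source="packaged chronics",
    action_space="topology-only",
    action_mask="none",
    reward_or_cost="survival steps",
    split="held-out chronics",
    can_simulate_candidates=True,
)

# Removing simulator access is a different protocol and gets a different hash.
no_sim = replace(base, can_simulate_candidates=False)
print(base.fingerprint(), no_sim.fingerprint())
```

Results whose manifests have different fingerprints should not be placed in the same comparison table without comment.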
The benchmark also gives a concrete alternative to vision-heavy world-model examples. It is closer to observability, telecom, energy, and industrial-control settings because the observations are numeric and graph-structured, actions are typed control inputs, and failures are rare but operationally severe.
The ingested lineage covers several distinct roles. The 2017 operation-support source is the pre-benchmark lineage: it treats historical topology changes as weakly annotated operator actions, uses counterfactual simulation to extract plausible remedial-action labels, and keeps ML proposals behind simulator validation. Expert-system remedial-action discovery adds the tactical rule baseline: simulator counterfactuals build an overload distribution graph, rank single-substation topological actions, and provide a comparator for learned controllers rather than a learned transition model. Guided power-grid segmentation adds the representation/decomposition branch: simulator interventions can turn physical grid state into a task-specific functional influence graph and operational regions, which is topology-context evidence rather than policy or benchmark evidence.
The 2019 topology-controller challenge is the early benchmark-design source: pypownet/OpenAI Gym, IEEE14, 5-minute control, topology-only actions, operational constraints, runtime-limited simulator queries, and an oracle gap. The simple topology-reconfiguration paper is a low-complexity sanity check for small-grid, topology-only CEM agents. Afterstate Actor-Critic adds a deterministic-afterstate decomposition and graph-masked policy/value approximation. Active Power Correction Distributed adds a distributed-control branch where control-area agents with partial observations choose bus-bar switching or do-nothing actions and filter candidate joint actions through Grid2Op simulation. Power Grid AlphaZero adds a search/planning caveat: the WCCI 2022 setup exposes a very large topology action set, but the reported agent reduces it and differs by simulator calls, oracle access, search budget, and latency. Adversarial Training adds the robustness boundary: challenge score, preventive N-1 robustness, and corrective behavior after a contingency are separate protocol axes.
Guided Dropout, Contingency Screening, and Risk Assessment form the fast security-analysis branch: learned surrogates rank or evaluate topology/contingency cases so scarce physical-simulator calls are spent on rare high-risk states. This is budgeted safety-evaluation evidence, not sequential policy or learned transition-model evidence. LEAP Nets adds the learned system-identification branch: topology and structural actionable variables are modeled explicitly so rare grid configurations are not treated only as hidden distribution shift. The 2019 LEAP perturbation paper is strongest for explicit-topology synthetic transfer; its real-grid section lacks exact topology intervention logs. The Graph Neural Solver papers belong to the simulator/surrogate layer, not the policy layer: they make topology-aware AC power-flow solving differentiable and faster, while the L2RPN policy papers supply actions, rewards, and sequential decision evidence.
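The budgeted-screening idea in the fast security-analysis branch can be sketched as follows. This is a toy illustration, not the method of any of the cited papers: the function name, the dictionary-backed surrogate and simulator, and all numbers are invented for the example.

```python
def budgeted_screening(cases, surrogate_risk, simulate, budget):
    """Rank cases by surrogate risk and simulate only the top `budget` cases."""
    ranked = sorted(cases, key=surrogate_risk, reverse=True)
    checked = {case: simulate(case) for case in ranked[:budget]}
    unchecked = ranked[budget:]
    return checked, unchecked


# Toy data: maximum line loading after each contingency (1.0 = thermal limit).
true_loading = {"L1": 0.2, "L2": 1.3, "L3": 0.9, "L4": 1.1}
surrogate_loading = {"L1": 0.1, "L2": 1.0, "L3": 1.05, "L4": 0.95}

checked, unchecked = budgeted_screening(
    cases=list(true_loading),
    surrogate_risk=surrogate_loading.get,  # cheap learned estimate
    simulate=true_loading.get,             # stands in for an expensive solver call
    budget=2,
)
overloads_found = [case for case, loading in checked.items() if loading > 1.0]
```

In this toy run the budget confirms the overload on `L2` but never simulates `L4`, which is also overloaded. That is the reason to report the simulator budget and the surrogate's miss rate together rather than only the number of overloads found.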
Conditional Autoencoders for Electrical Consumption is related RTE energy representation background, not Grid2Op control evidence. It studies passive national load profiles and rare-context discovery rather than topology actions or simulator-backed rollouts.
The 2024-2026 branch changes the picture from one-off challenge agents to a more explicit benchmark-and-hybrid-control ecosystem. RL2Grid is the current single-agent benchmark-design anchor, but its ICLR 2025 OpenReview submission was checked and found to be withdrawn, so the arXiv version is the source of record. MARL2Grid-TR is the current multi-agent benchmark anchor and is published as an ICLR 2026 Poster; it adds decentralized substation/generator scopes, partial observability, topology plus redispatch control inputs, and safety constraints. The 2025 Energy and AI challenge paper records what worked in the L2RPN 2023 IDF safe low-carbon setting: large 118-node scenarios, multimodal action engineering, action-space reduction, neural useful-action prediction, and alerting for dangerous future states.
The strongest recent method pattern is hybrid rather than pure end-to-end RL. Soft-label imitation learning distills simulator-evaluated candidate topology actions into a GNN action-ranker that can preserve multiple viable interventions per state. Physics-informed Gibbs priors go one step closer to a learned action-conditioned world-model component: a GNN surrogate predicts post-action overload risk and uses that prediction to prune/reweight candidate topology actions before policy selection. Runtime safety shielding and AlphaZero-style planning use the physical simulator as the action-conditioned forward model at decision time. LLM-guided safe RL is a training-time transition-refinement layer rather than a dynamics model. Interpretable policy distillation is the deployment/compression branch: a PPO teacher can train auditable tree policies, but the result is still a policy surrogate that needs safety guardrails rather than a learned transition model. The GNN transmission-grid source adds a representation-hygiene warning: graph encoders that expose only current busbar adjacency can hide potential connections needed for topology-action consequences, especially under OOD N-1 topology shift. Targeted exploration with sensitivity factors, graph-based distributed RL, and heterogeneous graph representations all reinforce the same conclusion: action proposal, physics priors, graph context, and online simulation/safety filtering should be treated as separable layers.
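The separable layers named above (action proposal, a physics or learned prior, and online simulation with safety filtering) can be sketched as one selection function. All callables, action names, and thresholds here are hypothetical stand-ins; in a real setup the `simulate` argument would wrap a physical simulator call, and nothing below reproduces a specific paper's pipeline.

```python
def select_action(state, propose, risk_surrogate, simulate, is_safe,
                  risk_cap, top_k, fallback):
    """Propose candidates, prune with a cheap prior, verify with the simulator."""
    candidates = propose(state)                                   # layer 1
    scored = [(risk_surrogate(state, a), a) for a in candidates]  # layer 2
    scored.sort(key=lambda pair: pair[0])
    shortlist = [a for risk, a in scored if risk <= risk_cap][:top_k]
    for action in shortlist:                                      # layer 3
        if is_safe(simulate(state, action)):
            return action
    return fallback


# Toy stand-ins: the values are predicted or simulated maximum line loading.
surrogate = {"do_nothing": 1.2, "split_sub_3": 0.7,
             "reconnect_line_7": 0.8, "split_sub_5": 0.95}
simulated = {"do_nothing": 1.25, "split_sub_3": 1.05,
             "reconnect_line_7": 0.85, "split_sub_5": 0.9}

chosen = select_action(
    state=None,
    propose=lambda s: list(surrogate),
    risk_surrogate=lambda s, a: surrogate[a],
    simulate=lambda s, a: simulated[a],
    is_safe=lambda max_loading: max_loading < 1.0,
    risk_cap=1.0,
    top_k=2,
    fallback="do_nothing",
)
```

Here the surrogate ranks `split_sub_3` first, the simulator check rejects it, and the second candidate is returned. Keeping the layers as separate arguments makes it possible to ablate each one without touching the others.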
The actual learned world-model gap remains open. Taha et al. 2022 is the older Grid2Op precedent that later surveys describe as a GCN learned physics model for action-conditioned line-loading prediction plus MCTS planning, but this ingest is metadata-only and the paper is not current SOTA. The modern Grid2Op literature still lacks a reusable latent model that learns multi-step rollouts, mapping observation history, topology, injection forecasts, and a candidate action sequence to future line loading (rho), topology, reward, blackout risk, and uncertainty.
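The missing rollout mapping can be written as a signature. The notation is an assumption introduced here for illustration and does not come from any of the ingested papers: $o_{\le t}$ is the observation history, $G_t$ the current topology, $\hat{x}_{t+1:t+H}$ the injection forecasts over a horizon $H$, and $a_{t:t+H-1}$ the candidate action sequence.

```latex
f_\theta :\;
\bigl(o_{\le t},\; G_t,\; \hat{x}_{t+1:t+H},\; a_{t:t+H-1}\bigr)
\;\longmapsto\;
\bigl\{\hat{\rho}_{t+k},\; \hat{G}_{t+k},\; \hat{r}_{t+k},\;
\hat{p}^{\,\mathrm{blackout}}_{t+k},\; \hat{u}_{t+k}\bigr\}_{k=1}^{H}
```

Here $\hat{\rho}$ is predicted line loading, $\hat{G}$ predicted topology, $\hat{r}$ predicted reward, $\hat{p}^{\,\mathrm{blackout}}$ predicted blackout risk, and $\hat{u}$ an uncertainty estimate. Under this framing, a single-step post-action risk surrogate would cover only the $H = 1$ case with $\hat{\rho}$ or a risk score as output.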
Unresolved Reference
The L2RPN reference page lists EGC 2019: Semi-supervised labelling, Towards an Extended Expert Approch, but targeted searches did not produce a verifiable paper page, PDF, arXiv ID, HAL record, or DOI. It is intentionally not ingested until a reliable source is available.