L2RPN / Grid2Op

Summary

L2RPN, “Learning to Run a Power Network”, is a series of power-grid operation challenges built around the Grid2Op ecosystem. For this wiki, it is the strongest current energy-domain example of a non-vision action-conditioned graph time-series environment: agents observe power-grid state and scenario context, choose topology changes or other control inputs, and are scored on safe operation and cost-like outcomes.

Official Artifacts

Benchmark Contract

```mermaid
flowchart LR
  Context[Scenario context: load, generation, weather, maintenance, contingencies]
  Obs[Grid observation: flows, topology, limits, cooldowns, forecasts]
  Agent[Agent or controller]
  Action[Control input: topology, redispatch, curtailment, storage]
  Disturbance[Exogenous or adversarial events: maintenance, line disconnections]
  Sim[Physical simulator / Grid2Op backend]
  Outcome[Next observation, safety status, cost/reward]

  Context --> Obs
  Obs --> Agent
  Agent --> Action
  Action --> Sim
  Context --> Sim
  Disturbance --> Sim
  Sim --> Outcome
  Outcome --> Obs
```

This contract is why L2RPN belongs in Action-Conditioned Time-Series Datasets. The data is graph-structured and multivariate, the action space is large and combinatorial, and the relevant failures are often rare safety events. A passive forecaster over line flows is not enough; useful models must preserve the state needed to compare candidate actions or control inputs.

The action/event distinction is important. Topology changes, redispatching, curtailment, and storage commands are actions or control inputs when the agent chooses them. Maintenance, load/generation shifts, renewable uncertainty, line failures, and adversarial disconnections are events or exogenous variables unless the experimenter controls them as part of a benchmark condition.
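A minimal sketch of a logging record that keeps these roles apart. The input names are borrowed from Grid2Op action types for readability, but the record layout (`Step`, `split_inputs`, the three channels) is hypothetical. It adds a third channel for experimenter-controlled benchmark conditions, so the same line disconnection can be logged as an exogenous event in one run and as a scripted condition in another.

```python
from dataclasses import dataclass, field

# Hypothetical input names; which channel a variable belongs to is an experiment choice.
AGENT_CONTROLLED = {"set_bus", "redispatch", "curtail", "storage"}


@dataclass
class Step:
    """One logged transition with the three input roles kept in separate channels."""
    t: int
    actions: dict = field(default_factory=dict)     # chosen by the agent
    conditions: dict = field(default_factory=dict)  # set by the experimenter as a benchmark condition
    events: dict = field(default_factory=dict)      # exogenous: maintenance, load shifts, failures


def split_inputs(t, inputs, conditions=frozenset()):
    """Route each named input by who controls it, not by what it physically is."""
    step = Step(t)
    for name, value in inputs.items():
        if name in AGENT_CONTROLLED:
            step.actions[name] = value
        elif name in conditions:
            step.conditions[name] = value
        else:
            step.events[name] = value
    return step


# The same disconnection is an exogenous event here...
natural = split_inputs(0, {"redispatch": {"gen_2": 5.0}, "line_disconnection": 7})
# ...and a scripted benchmark condition here.
scripted = split_inputs(0, {"redispatch": {"gen_2": 5.0}, "line_disconnection": 7},
                        conditions={"line_disconnection"})
```

Keeping the channels separate means a model trained on these logs is never asked to "choose" a maintenance outage, while evaluation can still intervene on it deliberately.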

Paper Set Ingested

| Source | Date | Venue/status |
|---|---|---|
| MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations | OpenReview published 2026-01-26; last modified 2026-05-06 | ICLR 2026 Poster; OpenReview primary source, no arXiv version found |
| Interpretable Policy Distillation for Power Grid Topology Control | 2026-05-30 | arXiv preprint on Grid2Op policy distillation |
| Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation | 2026-04-15 | arXiv preprint on Grid2Op safety shielding |
| Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids | 2026-04-02 | arXiv preprint with action-conditioned GNN risk surrogate |
| LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration | 2026-03-14 | arXiv preprint on LLM-guided safe RL for Grid2Op-style topology control |
| Power Grid Control with Graph-Based Distributed Reinforcement Learning | 2025-09-02 | arXiv preprint on distributed graph RL in Grid2Op |
| AI challenge for safe and low carbon power grid operation | 2025; DOAJ lists Dec 2025 issue | Energy and AI 22:100564; metadata-only ingest because no arXiv/open PDF was retrieved |
| Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach | 2025-03-19; arXiv v2 on 2025-06-19 | ECML PKDD 2025 ADS track / arXiv; DOI 10.1007/978-3-032-06129-4_8 |
| Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations | 2025-01-13; arXiv v3 on 2025-10-03 | arXiv preprint from TenneT/Radboud authors |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | 2025-03-29; arXiv v2 on 2025-06-20 | arXiv preprint; ICLR 2025 OpenReview submission checked and marked withdrawn |
| Graph Reinforcement Learning for Power Grids: A Comprehensive Survey | 2024-07-05; arXiv v4 on 2026-01-07 | Energy and AI 2025 survey / arXiv; DOI 10.1016/j.egyai.2025.100671 |
| RL for Mitigating Cascading Failures: Targeted Exploration via Sensitivity Factors | 2024-11-27 | NeurIPS 2024 Climate Change AI workshop / arXiv preprint |
| Learning to Run a Power Network under Varying Grid Topology | May 2022 | IEEE ENERGYCON 2022; metadata-only ingest, no arXiv/open PDF found |
| Learning to run a Power Network Challenge: a Retrospective Analysis | 2021-03-02; arXiv v2 on 2021-10-21 | PMLR NeurIPS 2020 Competition and Demonstration Track, 2021 |
| Power Grid Congestion Management via Topology Optimization with AlphaZero | 2022-11-10 | NeurIPS 2022 RL4RealLife Workshop preprint; L2RPN WCCI 2022 winning approach |
| Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic | 2021-01-12; OpenReview last modified 2023-05-05 | ICLR 2021 Spotlight, published on OpenReview |
| Exploring grid topology reconfiguration using a simple deep reinforcement learning approach | 2020-11-26; arXiv v2 on 2021-04-17 | IEEE 2021 reference on the L2RPN page; arXiv preprint available |
| Active Power Correction Strategies Based on Deep Reinforcement Learning—Part II: A Distributed Solution for Adaptability | 2021; DOI 10.17775/CSEEJPES.2020.07070 | CSEE Journal of Power and Energy Systems / IEEE-indexed reference on the L2RPN page |
| Adversarial Training for a Continuous Robustness Control Problem in Power Systems | 2020-12-21; arXiv v3 on 2021-04-16 | IEEE 2021 reference on the L2RPN page; arXiv preprint available |
| AI-Based Autonomous Line Flow Control via Topology Adjustment for Maximizing Time-Series ATCs | 2019-11-08 | IEEE 2020 reference on the L2RPN page; arXiv PDF-only e-print available |
| LEAP Nets for System Identification and Application to Power Systems | 2020 | Neurocomputing, DOI 10.1016/j.neucom.2019.12.135; HAL metadata retrieved |
| Neural Networks for Power Flow: Graph Neural Solver | 2020-12-14; PSCC 2020 reference | Electric Power Systems Research / PSCC 2020 PDF |
| Learning to run a power network challenge for training topology controllers | 2019-12-05 | PSCC 2020 reference; arXiv preprint available |
| Introducing machine learning for power system operation support | 2017-09-27 | IERP 2017 lineage source; arXiv preprint available |
| Fast Power system security analysis with Guided Dropout | 2018-01-30 | ESANN 2018; arXiv preprint available |
| Anticipating contingengies in power grids using fast neural net screening | 2018-05-03 | IJCNN 2018; arXiv preprint available |
| Optimization of computational budget for power system risk assessment | 2018-05-03 | ISGT Europe 2018; arXiv preprint available |
| Guided Machine Learning for power grid segmentation | 2017-11-13; arXiv v3 on 2018-03-30 | ISGT Europe 2018 / NeurIPS 2019 workshop lineage; arXiv preprint available |
| Expert System for topological remedial action discovery in smart grids | 2018-11-12 | MedPower 2018; HAL PDF retrieved |
| LEAP nets for power grid perturbations | 2019-08-22 | ESANN 2019 lineage; arXiv preprint available |
| Graph Neural Solver for Power Systems | 2019-07-14 | IJCNN 2019; HAL PDF retrieved; DOI 10.1109/IJCNN.2019.8851855 |
| Interpreting Atypical Conditions in Systems with Deep Conditional Autoencoders: The Case of Electrical Consumption | 2020-01-01 publication metadata; ECML PKDD 2019 paper | ECML PKDD 2019 / Springer LNCS, DOI 10.1007/978-3-030-46133-1_38 |

Role In The Wiki

L2RPN/Grid2Op is a benchmark ecosystem, not a single immutable dataset payload. Any experiment should pin Grid2Op version, backend, environment name, chronics/scenario generator, action space, action mask, reward or cost function, train/test split, and whether the agent can simulate candidate actions before committing them.
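One way to make that pinning enforceable is to write the settings into a manifest and log its hash next to every reported score. This is a sketch under assumptions, not a Grid2Op feature: `ExperimentManifest` and its field names are hypothetical, and in practice the version string would come from something like `grid2op.__version__` while `simulate_allowed` records whether the agent may call `simulate` on an observation before acting.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentManifest:
    """Settings an L2RPN/Grid2Op experiment should pin (hypothetical field names)."""
    grid2op_version: str
    backend: str
    env_name: str
    chronics: str           # chronics / scenario generator identifier
    action_space: str       # identifier or hash of the allowed action set
    action_mask: str
    reward: str             # reward or cost function
    split: str              # train/test split identifier
    simulate_allowed: bool  # may the agent simulate candidate actions before committing?

    def fingerprint(self) -> str:
        # sort_keys makes the digest independent of field order.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Two results are comparable only if their fingerprints match. Flipping `simulate_allowed` alone changes the digest, which is intended, because simulator access changes what the agent can do at decision time.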

The benchmark also gives a concrete alternative to vision-heavy world-model examples. It is closer to observability, telecom, energy, and industrial-control settings because the observations are numeric and graph-structured, actions are typed control inputs, and failures are rare but operationally severe.

The ingested lineage covers several distinct roles. The 2017 operation-support source is the pre-benchmark lineage: it treats historical topology changes as weakly annotated operator actions, uses counterfactual simulation to extract plausible remedial-action labels, and keeps ML proposals behind simulator validation. Expert-system remedial-action discovery adds the tactical rule baseline: simulator counterfactuals build an overload distribution graph, rank single-substation topological actions, and provide a comparator for learned controllers rather than a learned transition model. Guided power-grid segmentation adds the representation/decomposition branch: simulator interventions can turn physical grid state into a task-specific functional influence graph and operational regions, which is topology-context evidence rather than policy or benchmark evidence.
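The core of the expert-system comparator, ranking single-substation topological actions by simulated counterfactuals, can be reduced to a few lines. This is a simplified sketch under stated assumptions: `simulate` is a hypothetical callable returning per-line loading ratios (rho) after an action, `None` stands for do-nothing, the action labels are made up, and the ranking uses only the change in worst-case loading, skipping the overload distribution graph the original system builds.

```python
from typing import Callable, Hashable, Optional, Sequence


def rank_remedial_actions(
    candidates: Sequence[Hashable],
    simulate: Callable[[Optional[Hashable]], Sequence[float]],
    do_nothing: Optional[Hashable] = None,
):
    """Rank candidate actions by how much they lower the worst simulated line loading."""
    base = max(simulate(do_nothing))
    scored = []
    for action in candidates:
        worst = max(simulate(action))  # counterfactual worst loading after this action
        scored.append((base - worst, worst, action))
    # Largest improvement first; ties broken by lower worst-case loading.
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Each entry: (action, improvement, clears all overloads?)
    return [(action, gain, worst < 1.0) for gain, worst, action in scored]


# Toy simulator: hypothetical per-line rho under each action.
TOY = {
    None: [0.80, 1.15, 0.90],
    "sub_4_split": [0.95, 0.97, 0.92],
    "sub_7_split": [0.85, 1.05, 0.99],
}
ranked = rank_remedial_actions(["sub_7_split", "sub_4_split"], TOY.__getitem__)
```

Because every score comes from the simulator, this is a comparator for learned controllers, not a learned transition model.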

The 2019 topology-controller challenge is the early benchmark-design source: a pypownet/OpenAI Gym environment, the IEEE 14-bus grid, 5-minute control steps, topology-only actions, operational constraints, runtime-limited simulator queries, and an oracle gap. The simple topology-reconfiguration paper is a low-complexity sanity check for small-grid, topology-only cross-entropy-method (CEM) agents. Afterstate Actor-Critic adds a deterministic-afterstate decomposition and graph-masked policy/value approximation. The distributed Active Power Correction paper (Part II) adds a distributed-control branch where control-area agents with partial observations choose bus-bar switching or do-nothing actions and filter candidate joint actions through Grid2Op simulation. Power Grid AlphaZero adds a search/planning caveat: the WCCI 2022 setup exposes a very large topology action set, but the reported agent works with a reduced subset, and its results depend on simulator calls, oracle access, search budget, and latency, so comparisons with other agents must control for all four. Adversarial Training adds the robustness boundary: challenge score, preventive N-1 robustness, and corrective behavior after a contingency are separate protocol axes.
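Several of these agents filter candidates through simulation before committing, and the search-budget caveat means the number of simulator calls is part of the result. A minimal sketch, assuming a hypothetical `simulate` callable that returns per-line rho and candidates already ordered by some policy prior, shows a selector that reports the calls it spent so the budget can be logged alongside the score.

```python
def select_with_budget(ranked_candidates, simulate, budget, rho_limit=1.0, fallback=None):
    """Simulate candidates in prior order, spending at most `budget` simulator calls.

    Returns (action, calls_used, worst_rho) so the budget can be reported with the score.
    """
    best_action, best_rho = fallback, float("inf")
    calls = 0
    for action in ranked_candidates:
        if calls >= budget:
            break
        worst = max(simulate(action))  # one simulator call
        calls += 1
        if worst < rho_limit:
            return action, calls, worst  # first candidate that passes the safety check
        if worst < best_rho:
            best_action, best_rho = action, worst
    # No candidate passed: return the least-overloaded one simulated (or the fallback).
    return best_action, calls, best_rho


# Toy simulator with hypothetical per-line rho values.
TOY = {"a": [1.20, 0.90], "b": [1.05, 0.70], "c": [0.80, 0.60]}
choice = select_with_budget(["a", "b", "c"], TOY.__getitem__, budget=3)
```

With a budget of 3 the selector finds the safe action `c`; with a budget of 2 it can only return the less overloaded `b`, which is exactly why two agents with different search budgets are not directly comparable.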

Guided Dropout, Contingency Screening, and Risk Assessment form the fast security-analysis branch: learned surrogates rank or evaluate topology/contingency cases so scarce physical-simulator calls are spent on rare high-risk states. This is budgeted safety-evaluation evidence, not sequential policy or learned transition-model evidence. LEAP Nets adds the learned system-identification branch: topology and structural actionable variables are modeled explicitly so rare grid configurations are not treated only as hidden distribution shift. The 2019 LEAP perturbation paper is strongest for explicit-topology synthetic transfer; its real-grid section lacks exact topology intervention logs. The Graph Neural Solver papers belong to the simulator/surrogate layer, not the policy layer: they make topology-aware AC power-flow solving differentiable and faster, while the L2RPN policy papers supply actions, rewards, and sequential decision evidence.
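The budgeted screening pattern can be sketched as follows. It is an illustration of the general idea, not the method of any one of the three papers: `surrogate_risk` stands in for a cheap learned model, `simulate_risk` for an expensive physical simulation, the risk values are made up, and the unscreened tail is estimated with the surrogate alone.

```python
def screen_contingencies(contingencies, surrogate_risk, simulate_risk, budget):
    """Spend `budget` exact simulations on the contingencies the surrogate ranks riskiest.

    Returns the exactly simulated risks and a total-risk estimate that uses the
    surrogate only for the unscreened tail.
    """
    ranked = sorted(contingencies, key=surrogate_risk, reverse=True)
    checked = {c: simulate_risk(c) for c in ranked[:budget]}  # expensive, exact
    tail = sum(surrogate_risk(c) for c in ranked[budget:])    # cheap approximation
    return checked, sum(checked.values()) + tail


# Hypothetical per-line-outage risk values.
SURROGATE = {"l1": 0.10, "l2": 0.70, "l3": 0.05, "l4": 0.40}
EXACT = {"l1": 0.20, "l2": 0.90, "l3": 0.00, "l4": 0.30}
checked, total = screen_contingencies(list(SURROGATE), SURROGATE.get, EXACT.get, budget=2)
```

The surrogate only decides where simulator calls go; any state it misranks into the tail is never checked, which is why the evidence here is about budgeted safety evaluation rather than dynamics.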

Conditional Autoencoders for Electrical Consumption is related RTE energy representation background, not Grid2Op control evidence. It studies passive national load profiles and rare-context discovery rather than topology actions or simulator-backed rollouts.

The 2024-2026 branch changes the picture from one-off challenge agents to a more explicit benchmark-and-hybrid-control ecosystem. RL2Grid is the current single-agent benchmark-design anchor, but its ICLR 2025 OpenReview submission was checked and found withdrawn, so the arXiv version is the source of record. MARL2Grid-TR is the current multi-agent benchmark anchor and is published as an ICLR 2026 Poster; it adds decentralized substation/generator scopes, partial observability, topology plus redispatch control inputs, and safety constraints. The 2025 Energy and AI challenge paper records what worked in the L2RPN 2023 IDF safe low-carbon setting: large 118-node scenarios, multimodal action engineering, action-space reduction, neural useful-action prediction, and alerting for dangerous future states.
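The text does not say how the 2023 entrants reduced the action space, so the following is only one plausible scheme, labeled as an assumption: given, for each training state, the set of actions that simulation showed to resolve it, greedily keep the actions that cover the most still-unresolved states. The action labels and the input format are hypothetical.

```python
def reduce_action_space(useful_actions_per_state, k):
    """Greedy coverage: keep at most k actions that together resolve the most training states.

    Returns (kept_actions, number_of_states_left_unresolved).
    """
    uncovered = set(range(len(useful_actions_per_state)))
    kept = []
    while len(kept) < k and uncovered:
        counts = {}
        for i in uncovered:
            for action in useful_actions_per_state[i]:
                counts[action] = counts.get(action, 0) + 1
        if not counts:
            break  # remaining states have no useful action at all
        best = max(sorted(counts), key=counts.get)  # sorted() makes ties deterministic
        kept.append(best)
        uncovered = {i for i in uncovered if best not in useful_actions_per_state[i]}
    return kept, len(uncovered)


# Per training state: the set of actions that resolved it in simulation (hypothetical).
USEFUL = [{"a", "b"}, {"b"}, {"c"}, {"b", "c"}, set()]
kept, unresolved = reduce_action_space(USEFUL, k=3)
```

The unresolved count is worth reporting with the reduced set: states that no kept action fixes are where a learned useful-action predictor or an alert has to take over.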

The strongest recent method pattern is hybrid rather than pure end-to-end RL. Soft-label imitation learning distills simulator-evaluated candidate topology actions into a GNN action-ranker that can preserve multiple viable interventions per state. Physics-informed Gibbs priors go one step closer to a learned action-conditioned world-model component: a GNN surrogate predicts post-action overload risk and uses that prediction to prune/reweight candidate topology actions before policy selection. Runtime safety shielding and AlphaZero-style planning use the physical simulator as the action-conditioned forward model at decision time. LLM-guided safe RL is a training-time transition-refinement layer rather than a dynamics model. Interpretable policy distillation is the deployment/compression branch: a PPO teacher can train auditable tree policies, but the result is still a policy surrogate that needs safety guardrails rather than a learned transition model. The GNN transmission-grid source adds a representation-hygiene warning: graph encoders that expose only current busbar adjacency can hide potential connections needed for topology-action consequences, especially under OOD N-1 topology shift. Targeted exploration with sensitivity factors, graph-based distributed RL, and heterogeneous graph representations all reinforce the same conclusion: action proposal, physics priors, graph context, and online simulation/safety filtering should be treated as separable layers.
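Soft labels are what let an imitation target keep several viable interventions instead of a single argmax. A minimal sketch, assuming per-action simulated worst-case rho as input and a Boltzmann weighting chosen here for illustration (the cited paper's exact labeling rule may differ): viable actions share probability mass in proportion to exp(-rho / T), and actions that leave an overload get none. The same exponential form is the general shape of a Gibbs distribution over actions. Action names and the temperature are hypothetical.

```python
import math


def soft_labels(sim_max_rho, temperature=0.05, rho_limit=1.0):
    """Turn simulator-evaluated candidates into a soft target distribution over actions."""
    viable = {a: r for a, r in sim_max_rho.items() if r < rho_limit}
    if not viable:
        # No viable action: all zeros, so the caller can drop this state from training.
        return {a: 0.0 for a in sim_max_rho}
    m = min(viable.values())  # shift for numerical stability
    weights = {a: math.exp(-(r - m) / temperature) for a, r in viable.items()}
    z = sum(weights.values())
    return {a: weights.get(a, 0.0) / z for a in sim_max_rho}


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between the soft target and the ranker's predicted distribution."""
    return -sum(p * math.log(predicted[a] + eps) for a, p in target.items() if p > 0)


SIM = {"do_nothing": 1.10, "sub_4_a": 0.90, "sub_4_b": 0.95}
target = soft_labels(SIM)
```

Here both substation actions keep nonzero mass, so a GNN ranker trained with `soft_cross_entropy` is not penalized for preferring the second-best viable intervention as harshly as a one-hot target would.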

The actual learned world-model gap remains open. Taha et al. 2022 is the older Grid2Op precedent that later surveys describe as a GCN learned physics model for action-conditioned line-loading prediction plus MCTS planning, but this ingest is metadata-only and the paper is not current SOTA. The modern Grid2Op literature still lacks a reusable latent model that learns multi-step rollouts mapping history, current topology, injection forecasts, and a candidate action sequence to future rho (line loading), topology, reward, blackout risk, and uncertainty.
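The missing component can at least be pinned down as an interface. The sketch below is hypothetical (none of these classes exist in Grid2Op); it names the inputs and outputs listed above and pairs them with a naive persistence baseline, the kind of floor a learned rollout model would need to beat. `rho` follows Grid2Op's meaning of per-line flow divided by its thermal limit; the `set_bus` dictionary encoding is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class RolloutQuery:
    history: Sequence[Sequence[float]]             # past rho vectors, oldest first
    topology: Sequence[int]                        # current bus assignment per element
    injection_forecast: Sequence[Sequence[float]]  # forecast loads/generation per future step
    actions: Sequence[dict]                        # candidate action sequence, one per step


@dataclass
class RolloutPrediction:
    rho: list            # predicted line loadings per step
    topology: list       # predicted topology per step
    reward: list
    blackout_risk: list  # probability of game over per step
    rho_std: list        # predictive uncertainty on rho per step


class GridWorldModel(Protocol):
    def rollout(self, query: RolloutQuery) -> RolloutPrediction: ...


class PersistenceBaseline:
    """Naive floor: loadings stay at their last observed value; only set_bus edits topology."""

    def rollout(self, query: RolloutQuery) -> RolloutPrediction:
        last_rho = list(query.history[-1])
        topo = list(query.topology)
        out = RolloutPrediction([], [], [], [], [])
        for step, action in enumerate(query.actions):
            for element, bus in action.get("set_bus", {}).items():
                topo[element] = bus  # hypothetical {element_index: bus} encoding
            out.rho.append(list(last_rho))
            out.topology.append(list(topo))
            out.reward.append(float("nan"))  # reward is not modeled by this baseline
            out.blackout_risk.append(1.0 if max(last_rho) > 1.0 else 0.0)  # crude placeholder
            out.rho_std.append(0.01 * (step + 1))  # placeholder: uncertainty grows with horizon
        return out


model: GridWorldModel = PersistenceBaseline()
query = RolloutQuery(history=[[0.5, 0.9], [0.6, 1.02]], topology=[1, 1, 1],
                     injection_forecast=[[10.0], [11.0]], actions=[{}, {"set_bus": {2: 2}}])
prediction = model.rollout(query)
```

A learned model would replace `PersistenceBaseline` behind the same `rollout` signature, which keeps evaluation code identical across the baseline, a simulator-backed oracle, and any candidate world model.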

Unresolved Reference

The L2RPN reference page lists EGC 2019: Semi-supervised labelling, Towards an Extended Expert Approch, but targeted searches did not produce a verifiable paper page, PDF, arXiv ID, HAL record, or DOI. It is intentionally not ingested until a reliable source is available.