L2RPN / Grid2Op

Summary

L2RPN (“Learning to Run a Power Network”) is a series of power-grid operation challenges built around the Grid2Op ecosystem. For this wiki, it is the strongest current energy-domain example of a non-vision, action-conditioned, graph-structured time-series environment: agents observe power-grid state and scenario context, choose topology changes or other control inputs, and are scored on safe operation and cost-like outcomes.

Official Artifacts

L2RPN papers and references: https://l2rpn.chalearn.org/papers-references
Grid2Op documentation: https://grid2op.readthedocs.io/
Grid2Op GitHub: https://github.com/Grid2op/grid2op
l2rpn-baselines documentation: https://l2rpn-baselines.readthedocs.io/
l2rpn-baselines GitHub: https://github.com/Grid2op/l2rpn-baselines
lightsim2grid documentation: https://lightsim2grid.readthedocs.io/
Grid2Viz: https://github.com/rte-france/grid2viz
Grid2Game: https://github.com/BDonnot/grid2game/
ChroniX2Grid: https://github.com/BDonnot/ChroniX2Grid

Benchmark Contract

flowchart LR
  Context[Scenario context: load, generation, weather, maintenance, contingencies]
  Obs[Grid observation: flows, topology, limits, cooldowns, forecasts]
  Agent[Agent or controller]
  Action[Control input: topology, redispatch, curtailment, storage]
  Disturbance[Exogenous or adversarial events: maintenance, line disconnections]
  Sim[Physical simulator / Grid2Op backend]
  Outcome[Next observation, safety status, cost/reward]

  Context --> Obs
  Obs --> Agent
  Agent --> Action
  Action --> Sim
  Context --> Sim
  Disturbance --> Sim
  Sim --> Outcome
  Outcome --> Obs

This contract is why L2RPN belongs in Action-Conditioned Time-Series Datasets. The data is graph-structured and multivariate, the action space is large and combinatorial, and the relevant failures are often rare safety events. A passive forecaster over line flows is not enough; useful models must preserve the state needed to compare candidate actions or control inputs.

The action/event distinction is important. Topology changes, redispatching, curtailment, and storage commands are actions or control inputs when the agent chooses them. Maintenance, load/generation shifts, renewable uncertainty, line failures, and adversarial disconnections are events or exogenous variables unless the experimenter controls them as part of a benchmark condition.
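
The contract and the action/event split can be made concrete with a toy sketch. Everything below is a hypothetical stand-in, not the Grid2Op API: `GridState`, `Action`, `Event`, and the flow-shifting dynamics are invented for illustration. The point is only that the transition takes the agent-chosen control input and the exogenous event as separate arguments, so candidate actions can be compared under the same assumed event.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GridState:
    """Toy grid state: loading ratio (flow / limit) and status per line."""
    rho: Dict[str, float]
    connected: Dict[str, bool]


@dataclass(frozen=True)
class Action:
    """Agent-chosen control input: optionally shift flow off one line."""
    relieve_line: Optional[str] = None


@dataclass(frozen=True)
class Event:
    """Exogenous input the agent does not choose."""
    load_scale: float = 1.0
    tripped_line: Optional[str] = None


def _shift(rho: Dict[str, float], connected: Dict[str, bool],
           source: str, fraction: float) -> None:
    """Move `fraction` of the source line's loading onto other connected lines."""
    others = [k for k in rho if k != source and connected[k]]
    moved = rho[source] * fraction
    rho[source] -= moved
    for k in others:
        rho[k] += moved / len(others)


def simulate(state: GridState, action: Action, event: Event) -> GridState:
    """Toy forward model: the event is applied first, then the action."""
    rho = {k: v * event.load_scale for k, v in state.rho.items()}
    connected = dict(state.connected)
    if event.tripped_line is not None and connected[event.tripped_line]:
        connected[event.tripped_line] = False
        _shift(rho, connected, event.tripped_line, fraction=1.0)
    if action.relieve_line is not None and connected[action.relieve_line]:
        _shift(rho, connected, action.relieve_line, fraction=0.3)
    return GridState(rho=rho, connected=connected)


def max_rho(state: GridState) -> float:
    """Worst loading ratio over lines that are still connected."""
    return max(v for k, v in state.rho.items() if state.connected[k])


def choose_action(state: GridState, candidates: List[Action],
                  assumed_event: Event) -> Action:
    """Pick the candidate whose simulated outcome has the lowest worst loading."""
    return min(candidates, key=lambda a: max_rho(simulate(state, a, assumed_event)))
```

A passive forecaster would model only the `Event` path; the comparison in `choose_action` needs the state and the action argument as well, which is the property the contract above is meant to preserve.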

Paper Set Ingested

Source	Date	Venue/status
MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations	OpenReview published 2026-01-26; last modified 2026-05-06	ICLR 2026 Poster; OpenReview primary source, no arXiv version found
Interpretable Policy Distillation for Power Grid Topology Control	2026-05-30	arXiv preprint on Grid2Op policy distillation
Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation	2026-04-15	arXiv preprint on Grid2Op safety shielding
Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids	2026-04-02	arXiv preprint with action-conditioned GNN risk surrogate
LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration	2026-03-14	arXiv preprint on LLM-guided safe RL for Grid2Op-style topology control
Power Grid Control with Graph-Based Distributed Reinforcement Learning	2025-09-02	arXiv preprint on distributed graph RL in Grid2Op
AI challenge for safe and low carbon power grid operation	2025; DOAJ lists Dec 2025 issue	Energy and AI 22:100564; metadata-only ingest because no arXiv/open PDF was retrieved
Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach	2025-03-19; arXiv v2 on 2025-06-19	ECML PKDD 2025 ADS track / arXiv; DOI 10.1007/978-3-032-06129-4_8
Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations	2025-01-13; arXiv v3 on 2025-10-03	arXiv preprint from TenneT/Radboud authors
RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations	2025-03-29; arXiv v2 on 2025-06-20	arXiv preprint; ICLR 2025 OpenReview submission checked and marked withdrawn
Graph Reinforcement Learning for Power Grids: A Comprehensive Survey	2024-07-05; arXiv v4 on 2026-01-07	Energy and AI 2025 survey / arXiv; DOI 10.1016/j.egyai.2025.100671
RL for Mitigating Cascading Failures: Targeted Exploration via Sensitivity Factors	2024-11-27	NeurIPS 2024 Climate Change AI workshop / arXiv preprint
Learning to Run a Power Network under Varying Grid Topology	May 2022	IEEE ENERGYCON 2022; metadata-only ingest, no arXiv/open PDF found
Learning to run a Power Network Challenge: a Retrospective Analysis	2021-03-02; arXiv v2 on 2021-10-21	PMLR NeurIPS 2020 Competition and Demonstration Track, 2021
Power Grid Congestion Management via Topology Optimization with AlphaZero	2022-11-10	NeurIPS 2022 RL4RealLife Workshop preprint; L2RPN WCCI 2022 winning approach
Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic	2021-01-12; OpenReview last modified 2023-05-05	ICLR 2021 Spotlight, published on OpenReview
Exploring grid topology reconfiguration using a simple deep reinforcement learning approach	2020-11-26; arXiv v2 on 2021-04-17	IEEE 2021 reference on the L2RPN page; arXiv preprint available
Active Power Correction Strategies Based on Deep Reinforcement Learning—Part II: A Distributed Solution for Adaptability	2021; DOI 10.17775/CSEEJPES.2020.07070	CSEE Journal of Power and Energy Systems / IEEE-indexed reference on the L2RPN page
Adversarial Training for a Continuous Robustness Control Problem in Power Systems	2020-12-21; arXiv v3 on 2021-04-16	IEEE 2021 reference on the L2RPN page; arXiv preprint available
AI-Based Autonomous Line Flow Control via Topology Adjustment for Maximizing Time-Series ATCs	2019-11-08	IEEE 2020 reference on the L2RPN page; arXiv PDF-only e-print available
LEAP Nets for System Identification and Application to Power Systems	2020	Neurocomputing, DOI 10.1016/j.neucom.2019.12.135; HAL metadata retrieved
Neural Networks for Power Flow: Graph Neural Solver	2020-12-14; PSCC 2020 reference	Electric Power Systems Research / PSCC 2020 PDF
Learning to run a power network challenge for training topology controllers	2019-12-05	PSCC 2020 reference; arXiv preprint available
Introducing machine learning for power system operation support	2017-09-27	IERP 2017 lineage source; arXiv preprint available
Fast Power system security analysis with Guided Dropout	2018-01-30	ESANN 2018; arXiv preprint available
Anticipating contingengies in power grids using fast neural net screening	2018-05-03	IJCNN 2018; arXiv preprint available
Optimization of computational budget for power system risk assessment	2018-05-03	ISGT Europe 2018; arXiv preprint available
Guided Machine Learning for power grid segmentation	2017-11-13; arXiv v3 on 2018-03-30	ISGT Europe 2018 / NeurIPS 2019 workshop lineage; arXiv preprint available
Expert System for topological remedial action discovery in smart grids	2018-11-12	MedPower 2018; HAL PDF retrieved
LEAP nets for power grid perturbations	2019-08-22	ESANN 2019 lineage; arXiv preprint available
Graph Neural Solver for Power Systems	2019-07-14	IJCNN 2019; HAL PDF retrieved; DOI 10.1109/IJCNN.2019.8851855
Interpreting Atypical Conditions in Systems with Deep Conditional Autoencoders: The Case of Electrical Consumption	2020-01-01 publication metadata; ECML PKDD 2019 paper	ECML PKDD 2019 / Springer LNCS, DOI 10.1007/978-3-030-46133-1_38

Role In The Wiki

L2RPN/Grid2Op is a benchmark ecosystem, not a single immutable dataset payload. Any experiment should pin the Grid2Op version, backend, environment name, chronics/scenario generator, action space, action mask, reward or cost function, and train/test split, and should state whether the agent can simulate candidate actions before committing them.
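
One way to make the pinning explicit is a small frozen record with a stable fingerprint. This is a minimal sketch: the `ExperimentSpec` class and its field names are hypothetical (they mirror the list above, not any Grid2Op structure), and the values in the usage note are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical record of everything that should be pinned per experiment."""
    grid2op_version: str
    backend: str
    env_name: str
    chronics_source: str
    action_space: str
    action_mask: str
    reward_name: str
    split: str
    simulate_allowed: bool  # may the agent simulate candidates before acting?

    def fingerprint(self) -> str:
        """Short hash of the sorted JSON form; changes if any field changes."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]
```

For example, `ExperimentSpec("<pinned version>", "<backend>", "<environment name>", "<chronics id>", "topology-only", "none", "<reward>", "test", True).fingerprint()` yields a 12-character identifier that can be stored next to results, so two runs are compared only when their fingerprints match.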

The benchmark also gives a concrete alternative to vision-heavy world-model examples. It is closer to observability, telecom, energy, and industrial-control settings because the observations are numeric and graph-structured, actions are typed control inputs, and failures are rare but operationally severe.

The ingested lineage covers several distinct roles. The 2017 operation-support source is the pre-benchmark lineage: it treats historical topology changes as weakly annotated operator actions, uses counterfactual simulation to extract plausible remedial-action labels, and keeps ML proposals behind simulator validation. Expert-system remedial-action discovery adds the tactical rule baseline: simulator counterfactuals build an overload distribution graph, rank single-substation topological actions, and provide a comparator for learned controllers rather than a learned transition model. Guided power-grid segmentation adds the representation/decomposition branch: simulator interventions can turn physical grid state into a task-specific functional influence graph and operational regions, which is topology-context evidence rather than policy or benchmark evidence.

The 2019 topology-controller challenge is the early benchmark-design source: pypownet/OpenAI Gym, IEEE14, 5-minute control, topology-only actions, operational constraints, runtime-limited simulator queries, and an oracle gap. The simple topology-reconfiguration paper is a low-complexity sanity check for small-grid, topology-only CEM agents. Afterstate Actor-Critic adds a deterministic-afterstate decomposition and graph-masked policy/value approximation. Active Power Correction Distributed adds a distributed-control branch where control-area agents with partial observations choose bus-bar switching or do-nothing actions and filter candidate joint actions through Grid2Op simulation. Power Grid AlphaZero adds a search/planning caveat: the WCCI 2022 setup exposes a very large topology action set, but the reported agent reduces it and differs by simulator calls, oracle access, search budget, and latency. Adversarial Training adds the robustness boundary: challenge score, preventive N-1 robustness, and corrective behavior after a contingency are separate protocol axes.

Guided Dropout, Contingency Screening, and Risk Assessment form the fast security-analysis branch: learned surrogates rank or evaluate topology/contingency cases so scarce physical-simulator calls are spent on rare high-risk states. This is budgeted safety-evaluation evidence, not sequential policy or learned transition-model evidence. LEAP Nets adds the learned system-identification branch: topology and structural actionable variables are modeled explicitly so rare grid configurations are not treated only as hidden distribution shift. The 2019 LEAP perturbation paper is strongest for explicit-topology synthetic transfer; its real-grid section lacks exact topology intervention logs. The Graph Neural Solver papers belong to the simulator/surrogate layer, not the policy layer: they make topology-aware AC power-flow solving differentiable and faster, while the L2RPN policy papers supply actions, rewards, and sequential decision evidence.
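
The budgeted-screening idea in the fast security-analysis branch can be sketched as follows. This is an illustration of the general pattern only, not the method of any of the cited papers: `screen_with_budget`, `surrogate_risk`, and `simulate_risk` are hypothetical names, and both scoring functions are supplied by the caller.

```python
from typing import Callable, Dict, Hashable, Iterable


def screen_with_budget(
    cases: Iterable[Hashable],
    surrogate_risk: Callable[[Hashable], float],
    simulate_risk: Callable[[Hashable], float],
    budget: int,
) -> Dict[Hashable, float]:
    """Rank cases with a cheap surrogate, then spend at most `budget`
    expensive simulator calls on the highest-ranked ones.

    Returns the simulator-verified risk for each case that was checked.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Cheap pass: order all cases from highest to lowest predicted risk.
    ranked = sorted(cases, key=surrogate_risk, reverse=True)
    # Expensive pass: only the top `budget` cases reach the simulator.
    return {case: simulate_risk(case) for case in ranked[:budget]}
```

The surrogate only decides where simulator calls are spent; the returned risks all come from the simulator, which matches the framing above as safety-evaluation evidence rather than a learned transition model.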

Conditional Autoencoders for Electrical Consumption is related RTE energy representation background, not Grid2Op control evidence. It studies passive national load profiles and rare-context discovery rather than topology actions or simulator-backed rollouts.

The 2024-2026 branch changes the picture from one-off challenge agents to a more explicit benchmark-and-hybrid-control ecosystem. RL2Grid is the current single-agent benchmark-design anchor, but its ICLR 2025 OpenReview submission was checked and found to be withdrawn, so the arXiv version is the source of record. MARL2Grid-TR is the current multi-agent benchmark anchor and is published as an ICLR 2026 Poster; it adds decentralized substation/generator scopes, partial observability, topology plus redispatch control inputs, and safety constraints. The 2025 Energy and AI challenge paper records what worked in the L2RPN 2023 IDF safe low-carbon setting: large 118-node scenarios, multimodal action engineering, action-space reduction, neural useful-action prediction, and alerting for dangerous future states.

The strongest recent method pattern is hybrid rather than pure end-to-end RL. Soft-label imitation learning distills simulator-evaluated candidate topology actions into a GNN action-ranker that can preserve multiple viable interventions per state. Physics-informed Gibbs priors go one step closer to a learned action-conditioned world-model component: a GNN surrogate predicts post-action overload risk and uses that prediction to prune/reweight candidate topology actions before policy selection. Runtime safety shielding and AlphaZero-style planning use the physical simulator as the action-conditioned forward model at decision time. LLM-guided safe RL is a training-time transition-refinement layer rather than a dynamics model. Interpretable policy distillation is the deployment/compression branch: a PPO teacher can train auditable tree policies, but the result is still a policy surrogate that needs safety guardrails rather than a learned transition model. The GNN transmission-grid source adds a representation-hygiene warning: graph encoders that expose only current busbar adjacency can hide potential connections needed for topology-action consequences, especially under OOD N-1 topology shift. Targeted exploration with sensitivity factors, graph-based distributed RL, and heterogeneous graph representations all reinforce the same conclusion: action proposal, physics priors, graph context, and online simulation/safety filtering should be treated as separable layers.
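
To illustrate how simulator-evaluated candidates can become soft labels that keep several viable interventions per state, here is a minimal sketch. The labeling rule is an assumption made for illustration (a temperature softmax over the negative post-action maximum loading), not the scheme reported in the soft-label imitation paper; `soft_labels` and the action names in the usage note are hypothetical.

```python
import math
from typing import Dict


def soft_labels(simulated_max_rho: Dict[str, float],
                temperature: float = 0.05) -> Dict[str, float]:
    """Convert per-action simulated outcomes into a target distribution.

    Lower post-action max loading gives higher probability. Actions with
    equal outcomes receive equal mass, so ties are preserved instead of
    being collapsed to a single hard label.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not simulated_max_rho:
        raise ValueError("at least one candidate action is required")
    logits = {a: -r / temperature for a, r in simulated_max_rho.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    weights = {a: math.exp(v - top) for a, v in logits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

Calling `soft_labels({"do_nothing": 1.05, "split_sub_4": 0.80, "split_sub_7": 0.80})` gives the two equally good substation actions the same target probability and leaves little mass on doing nothing. A hard argmax label would have kept only one of the two.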

The actual learned world-model gap remains open. Taha et al. 2022 is the older Grid2Op precedent that later surveys describe as a GCN learned physics model for action-conditioned line-loading prediction plus MCTS planning, but this ingest is metadata-only and the paper is not current SOTA. The modern Grid2Op literature still lacks a reusable latent model that learns multi-step rollouts of the form (history, topology, injection forecasts, candidate action sequence) -> (future rho, i.e. line loading ratios; topology; reward; blackout risk; uncertainty).
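
The missing component can be stated as an interface. The sketch below is hypothetical throughout: `RolloutQuery`, `RolloutPrediction`, and `WorldModel` are illustrative names, the outputs cover only a subset of the targets listed above (rho and blackout risk), and `PersistenceBaseline` is a trivial reference point that ignores actions, not a proposed model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class RolloutQuery:
    rho_history: Sequence[Sequence[float]]         # past line loadings, oldest first
    topology: Sequence[int]                        # current bus assignment per element
    injection_forecast: Sequence[Sequence[float]]  # forecast injections per future step
    candidate_actions: Sequence[str]               # one action label per future step


@dataclass(frozen=True)
class RolloutPrediction:
    rho: List[List[float]]      # predicted line loadings per future step
    blackout_risk: List[float]  # predicted failure probability per future step


class WorldModel(Protocol):
    def rollout(self, query: RolloutQuery) -> RolloutPrediction:
        ...


class PersistenceBaseline:
    """Repeats the last observed loadings; ignores actions and forecasts."""

    def rollout(self, query: RolloutQuery) -> RolloutPrediction:
        horizon = len(query.candidate_actions)
        last = list(query.rho_history[-1])
        # Crude risk flag: 1.0 if any line is already above its limit.
        risk = 1.0 if max(last) > 1.0 else 0.0
        return RolloutPrediction(
            rho=[list(last) for _ in range(horizon)],
            blackout_risk=[risk] * horizon,
        )
```

Because the baseline returns the same prediction for every candidate action sequence, it cannot rank actions at all. An action-conditioned model would have to produce different rollouts for different `candidate_actions`.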

Unresolved Reference

The L2RPN reference page lists EGC 2019: Semi-supervised labelling, Towards an Extended Expert Approch, but targeted searches did not produce a verifiable paper page, PDF, arXiv ID, HAL record, or DOI. It is intentionally not ingested until a reliable source is available.
