Graph Reinforcement Learning for Power Grids: A Comprehensive Survey
Source
- Raw Markdown: graph-rl-power-grids-survey-2024
- Rendered / retrieved PDF: paper_graph-rl-power-grids-survey-2024.pdf
- External source: https://arxiv.org/abs/2407.04522
Publication And Credibility
- Paper date: arXiv published 2024-07-05; v4 updated 2026-01-07.
- Venue/status: Published in Energy and AI (DOI 10.1016/j.egyai.2025.100671); an arXiv version is also available.
- Credibility: Recent peer-reviewed survey by authors from Fraunhofer, the University of Kassel, and the University of Greifswald. Useful for mapping the field, but a secondary source relative to the individual method papers.
Core Claim
The survey maps graph reinforcement learning for transmission and distribution grids, emphasizing graph representation, GNN architecture, RL method, benchmark limitations, and real-world deployment gaps.
L2RPN / Grid2Op Notes
The survey explicitly discusses Grid2Op as the common simulator for many transmission-grid control papers and highlights comparability issues around grid size, horizons, stochasticity, seeds, and action spaces.
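The comparability issues listed above can be made concrete with a small record of the settings a paper would need to report. The following is a minimal sketch, not something taken from the survey or from Grid2Op: the `EvalProtocol` class, its field names, and all example values are hypothetical and chosen only to mirror the five dimensions named in the note (grid size, horizon, stochasticity, seeds, action space).

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical record of evaluation settings (not a Grid2Op type)."""
    grid_name: str              # placeholder identifier for the test grid
    n_substations: int          # proxy for grid size
    horizon_steps: int          # episode length in simulator steps
    seeds: tuple                # random seeds used for evaluation
    action_space: str           # e.g. "topology" or "topology+redispatch"
    stochastic_scenarios: bool  # whether load/generation scenarios are sampled


def comparability_gaps(a: EvalProtocol, b: EvalProtocol) -> list:
    """Return the field names on which two protocols differ, in field order."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


# Illustrative values only; they do not describe any real paper.
paper_a = EvalProtocol("example_grid", 14, 2016, (0, 1, 2), "topology", True)
paper_b = EvalProtocol("example_grid", 14, 8064, (0,), "topology", True)

gaps = comparability_gaps(paper_a, paper_b)
print(gaps)
```

An empty list would mean the two results were produced under the same recorded settings; any non-empty list names the dimensions along which a direct comparison of reported scores is questionable.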
Action-Time-Series / World-Model Notes
The survey contains the clearest secondary-source statement that Taha et al. use a GCN as a learned physics model for planning: the GCN predicts line loading under different candidate actions, and those predictions are fed to Monte Carlo tree search (MCTS). That is the main reason this survey is linked into world-model pages.
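The idea of a GCN acting as a learned physics model can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the method of Taha et al.: the weights are random and untrained, the per-line readout is an arbitrary choice, candidate actions are represented as alternative adjacency matrices, and a greedy one-step ranking stands in for MCTS. All function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_line_loading(adj, features, w_node, w_edge):
    """One GCN layer plus a per-line readout (untrained toy surrogate)."""
    hidden = np.maximum(normalized_adjacency(adj) @ features @ w_node, 0.0)
    rows, cols = np.nonzero(np.triu(adj, k=1))  # one entry per undirected line
    edge_inputs = hidden[rows] + hidden[cols]   # symmetric in the two endpoints
    return np.abs(edge_inputs @ w_edge)         # nonnegative estimate per line


def rank_actions(candidate_adjs, features, w_node, w_edge):
    """Score each candidate topology by its worst predicted line loading."""
    scores = [float(gcn_line_loading(adj, features, w_node, w_edge).max())
              for adj in candidate_adjs]
    return int(np.argmin(scores)), scores


rng = np.random.default_rng(0)

# Four-bus ring; the second candidate opens the line between bus 0 and bus 3.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
opened = ring.copy()
opened[0, 3] = opened[3, 0] = 0.0
candidates = [ring, opened]

features = rng.normal(size=(4, 2))  # placeholder per-bus features
w_node = rng.normal(size=(2, 8))    # random, untrained layer weights
w_edge = rng.normal(size=8)         # random, untrained readout weights

best, scores = rank_actions(candidates, features, w_node, w_edge)
print(best, scores)
```

In a planning setting, a tree search would query such a surrogate many times per decision instead of running a full power-flow simulation for each candidate; the greedy `argmin` here only shows where the predictions would plug in.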
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmark hygiene | partially closes | Summarizes Grid2Op evaluation pitfalls and recommends standardized protocols. | As a survey, it does not provide a new benchmark artifact. |
| Context interface | partially closes | Graph representation choices are central. | Needs TSFM-specific schemas and non-grid analogs. |
| Causal structure, counterfactuals, and control | adjacent | Reviews action-conditioned graph RL and learned surrogate planning. | Individual primary papers still need separate verification. |