Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids
Source
- Raw Markdown: gibbs-priors-topology-control-2026
- Rendered / retrieved PDF: paper_gibbs-priors-topology-control-2026.pdf
- External source: https://arxiv.org/abs/2604.01830
Publication And Credibility
- Paper date: arXiv published 2026-04-02.
- Venue/status: arXiv preprint.
- Credibility: Very recent preprint. It matters because it directly targets the Grid2Op case14, case36, and case118 environments and compares against PPO, plus a strong topology-only LJN baseline on case118. It needs independent replication before being treated as settled state of the art.
Core Claim
A semi-Markov RL agent acts only in hazardous regimes and uses a GNN surrogate to predict post-action overload risk; those predictions form a physics-informed Gibbs prior that selects a small candidate set and reweights policy logits.
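The hazard gating, candidate selection, and logit reweighting in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the Gibbs prior has the form p(a) ∝ exp(-beta * risk(a)), that the prior enters the policy by adding its log to the logits, and that a hazardous regime is detected with a simple line-loading threshold. The names `is_hazardous`, `gibbs_reweight`, `beta`, `k`, and `rho_threshold` are hypothetical.

```python
import numpy as np

def is_hazardous(line_loading, rho_threshold=0.95):
    """Assumed gate: act only if some line is loaded above the threshold."""
    return float(np.max(line_loading)) >= rho_threshold

def gibbs_reweight(policy_logits, predicted_risk, beta=5.0, k=3):
    """Keep the k lowest-risk actions and reweight their policy logits.

    predicted_risk[i] stands in for the surrogate's overload risk for
    action i (assumed to lie in [0, 1]); beta is an assumed inverse
    temperature of the Gibbs prior.
    """
    logits = np.asarray(policy_logits, dtype=float)
    risk = np.asarray(predicted_risk, dtype=float)

    # Unnormalized log of the Gibbs prior: lower risk means higher prior mass.
    log_prior = -beta * risk

    # Candidate set: the k actions with the highest prior mass.
    candidates = np.argsort(-log_prior, kind="stable")[:k]

    # Posterior-style combination: policy logit plus log prior.
    combined = logits[candidates] + log_prior[candidates]

    # Numerically stable softmax over the candidate set only.
    combined = combined - combined.max()
    probs = np.exp(combined)
    probs = probs / probs.sum()
    return candidates, probs

# Toy usage: four actions, the policy prefers action 0 but it is risky.
if is_hazardous([0.40, 0.97, 0.60]):
    cands, probs = gibbs_reweight(
        policy_logits=[2.0, 0.0, 1.0, 0.5],
        predicted_risk=[0.9, 0.1, 0.2, 0.8],
        k=2,
    )
```

In the toy call, the policy's favorite action (index 0) is dropped from the candidate set because its predicted risk is high, which is the pruning effect the claim describes. Outside hazardous regimes the sketch does nothing, mirroring the idea that the agent intervenes only when needed.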
L2RPN / Grid2Op Notes
This is one of the closest current Grid2Op papers to an action-conditioned learned world-model component. It reports reward and survival gains over PPO, near-oracle tradeoffs on case14 and case36 at much lower decision time than the Greedy oracle, and gains over PPO on case118. On case118 it still scores below the topology-only LJN baseline but runs faster.
Action-Time-Series / World-Model Notes
The learned object is a one-step mapping, (current graph state, feasible topology action) -> next-step overload risk, not a full multi-step latent dynamics model. That makes it an action-conditioned risk surrogate, suitable for pruning and ranking candidate actions before expensive simulation.
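The prune-then-simulate pattern this implies can be sketched as below. It is a minimal sketch, not the paper's pipeline: `risk_surrogate` and `simulate` are hypothetical callables standing in for the GNN surrogate and an expensive simulator call, and `top_k` is an assumed shortlist size.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

def prune_then_simulate(
    state: object,
    actions: Sequence[Hashable],
    risk_surrogate: Callable[[object, Hashable], float],
    simulate: Callable[[object, Hashable], float],
    top_k: int = 2,
) -> Tuple[Hashable, List[Hashable]]:
    """Rank all actions with the cheap surrogate, simulate only the shortlist.

    Both callables return a risk-like score where lower is better.
    """
    # Cheap pass: one surrogate call per feasible action.
    ranked = sorted(actions, key=lambda a: risk_surrogate(state, a))
    shortlist = list(ranked[:top_k])

    # Expensive pass: simulator calls only for the shortlisted actions.
    simulated = {a: simulate(state, a) for a in shortlist}

    # Final choice uses the simulated outcome, not the surrogate score.
    best = min(simulated, key=simulated.get)
    return best, shortlist
```

The design choice is that the surrogate only has to be good enough to keep the best action inside the shortlist. The final decision still rests on the simulator, so surrogate errors cost decision quality only when they push a good action out of the top `top_k`.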
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | Trains a one-step action-conditioned risk predictor from simulator outcomes. | Does not roll out future graph states over multi-step action sequences. |
| Safety and rare events | partially closes | Focuses intervention on hazardous regimes and overload risk. | Needs uncertainty and calibration for operational deployment. |
| Context interface | partially closes | Graph encoder and action embedding join topology context with control inputs. | Needs transfer tests across grids and non-grid systems. |
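
The calibration gap noted in the "Safety and rare events" row could be measured with a standard binned check such as expected calibration error (ECE). The sketch below is illustrative only: it assumes the surrogate outputs a probability-like risk in [0, 1] and that a binary overload outcome is available from the simulator. The function name and `n_bins` are hypothetical choices, and nothing here comes from the paper.

```python
import numpy as np

def expected_calibration_error(pred_risk, overloaded, n_bins=5):
    """Binned ECE: weighted gap between mean predicted risk and observed rate."""
    p = np.asarray(pred_risk, dtype=float)
    y = np.asarray(overloaded, dtype=float)

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Bin weight is the fraction of samples that fall in this bin.
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return ece
```

A value near zero means predicted risks match observed overload frequencies within each bin. Rare-event settings usually also need per-bin sample counts, since sparsely populated high-risk bins make the estimate noisy.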