Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

Source

Publication And Credibility

  • Paper date: published on arXiv 2026-04-02.
  • Venue/status: arXiv preprint.
  • Credibility: Very recent preprint. It matters because it directly targets Grid2Op case14/case36/case118 and compares against PPO and, on case118, a strong topology-only LJN baseline. Needs independent replication before being treated as settled SOTA.

Core Claim

A semi-Markov RL agent acts only in hazardous regimes and uses a GNN surrogate to predict post-action overload risk; those predictions form a physics-informed Gibbs prior that selects a small candidate set and reweights policy logits.
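The reweighting step in the claim above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the Gibbs prior takes the form p(a) ∝ exp(-beta * risk(a)), that the candidate set is the k lowest-risk actions, and that the prior is added to the policy logits in log space. The function name `gibbs_reweight`, the temperature `beta`, and the set size `k` are all hypothetical.

```python
import math


def gibbs_reweight(policy_logits, predicted_risk, beta=5.0, k=3):
    """Combine policy logits with an assumed Gibbs prior over predicted risk.

    Assumption (not taken from the paper): prior(a) is proportional to
    exp(-beta * risk(a)), and actions outside the k lowest-risk candidates
    receive zero probability.
    """
    n = len(policy_logits)
    # Candidate set: the k actions with the lowest predicted overload risk.
    candidates = sorted(range(n), key=lambda a: predicted_risk[a])[:k]
    # Log-space combination: policy logit + log prior, where log prior = -beta * risk.
    scores = {a: policy_logits[a] - beta * predicted_risk[a] for a in candidates}
    # Numerically stable softmax restricted to the candidate set.
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    probs = [weights.get(a, 0.0) / total for a in range(n)]
    return candidates, probs


# Toy inputs: action 0 has the highest policy logit but also the highest risk.
logits = [2.0, 1.0, 0.5, 0.0]
risk = [0.9, 0.1, 0.2, 0.8]
candidates, probs = gibbs_reweight(logits, risk, beta=5.0, k=2)
```

In this toy case the policy's preferred action (index 0) is excluded because its predicted risk is high, and the remaining probability mass is split between the two low-risk candidates. A larger `beta` would make the prior dominate the policy logits; a smaller one would leave the policy closer to unchanged.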

L2RPN / Grid2Op Notes

This is one of the closest current Grid2Op papers to an action-conditioned learned world-model component. It reports reward and survival gains over PPO; near-oracle tradeoffs on case14 and case36 at much lower decision time than the Greedy oracle; and, on case118, gains over PPO while scoring below the topology-only LJN baseline but running faster than it.

Action-Time-Series / World-Model Notes

The learned object is a one-step mapping, (current graph state, feasible topology action) -> next-step overload risk, not a full multi-step latent dynamics model. That makes it an action-conditioned risk surrogate suitable for pruning and ranking candidate actions before expensive simulation.
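The prune-then-simulate pattern described above can be sketched as below. This is an illustrative outline under stated assumptions, not the paper's code: `risk_surrogate` and `simulate` are hypothetical stand-in callables (the first cheap, the second expensive), the action names are made up, and "lower is better" is assumed for both the surrogate score and the simulated outcome.

```python
def prune_then_simulate(state, actions, risk_surrogate, simulate, k=2):
    """Rank actions with a cheap surrogate, then simulate only the top k.

    risk_surrogate(state, action) -> predicted next-step overload risk (cheap).
    simulate(state, action)       -> simulated outcome, lower is better (expensive).
    Both callables are hypothetical placeholders for illustration.
    """
    # Cheap pass: order every feasible action by predicted risk.
    ranked = sorted(actions, key=lambda a: risk_surrogate(state, a))
    shortlist = ranked[:k]
    # Expensive pass: run the simulator only on the shortlisted actions.
    outcomes = {a: simulate(state, a) for a in shortlist}
    best = min(shortlist, key=lambda a: outcomes[a])
    return best, shortlist


# Toy stand-ins: lookup tables in place of a GNN surrogate and a grid simulator.
surrogate_table = {"do_nothing": 0.7, "split_sub_3": 0.2,
                   "split_sub_5": 0.3, "merge_sub_1": 0.9}
simulated_max_loading = {"do_nothing": 1.05, "split_sub_3": 0.92,
                         "split_sub_5": 0.88, "merge_sub_1": 1.20}
simulator_calls = []


def toy_surrogate(state, action):
    return surrogate_table[action]


def toy_simulate(state, action):
    simulator_calls.append(action)  # track how often the expensive step runs
    return simulated_max_loading[action]


best, shortlist = prune_then_simulate(
    None, list(surrogate_table), toy_surrogate, toy_simulate, k=2
)
```

Here the simulator runs twice instead of four times, and the final choice can differ from the surrogate's top pick because the simulated outcome has the last word within the shortlist. Since the surrogate is one-step only, this pattern says nothing about multi-step consequences of the chosen action.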

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | Trains a one-step action-conditioned risk predictor from simulator outcomes. | Does not roll out future graph states over multi-step action sequences. |
| Safety and rare events | partially closes | Focuses intervention on hazardous regimes and overload risk. | Needs uncertainty and calibration for operational deployment. |
| Context interface | partially closes | Graph encoder and action embedding join topology context with control inputs. | Needs transfer tests across grids and non-grid systems. |