Interpretable Policy Distillation for Power Grid Topology Control

Source

Publication And Credibility

  • Paper date: published on arXiv 2026-05-30.
  • Venue/status: arXiv preprint.
  • Credibility: Very recent preprint evaluated on a small Grid2Op environment (the 14-bus l2rpn_case14_sandbox). Useful for interpretability and deployment-cost hypotheses, but not sufficient by itself to support large-grid SOTA claims.

Core Claim

A PPO topology-control teacher trained on Grid2Op l2rpn_case14_sandbox can be distilled into compact tree-based surrogates that outperform the teacher on held-out closed-loop reward and survival metrics.
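
As a minimal sketch of the distillation step, the code below fits a small CART-style tree to (observation, teacher action) pairs and measures held-out action agreement. Everything in it is an assumption for illustration: `teacher_policy` is a toy rule standing in for the PPO teacher, the two features (`rho_max`, `load`) are hypothetical, and the pure-Python tree is not the paper's surrogate implementation. Agreement with the teacher is also a weaker check than the closed-loop reward and survival metrics the paper reports.

```python
import random
from collections import Counter


def teacher_policy(obs):
    """Hypothetical stand-in for the PPO teacher: observation -> action id."""
    rho_max, load = obs
    if rho_max > 0.9:
        return 2 if load > 0.5 else 1  # two made-up topology actions
    return 0                           # action 0 = do nothing


def gini(labels):
    """Gini impurity of a non-empty list of action labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [a for row, a in zip(X, y) if row[f] <= t]
            right = [a for row, a in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy depth-limited tree; leaves hold the majority teacher action."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [(row, a) for row, a in zip(X, y) if row[f] <= t]
    right = [(row, a) for row, a in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        fit_tree([r for r, _ in left], [a for _, a in left], depth + 1, max_depth),
        fit_tree([r for r, _ in right], [a for _, a in right], depth + 1, max_depth),
    )


def predict(tree, obs):
    """Walk internal nodes (tuples) until a leaf (an action id) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if obs[f] <= t else right
    return tree


rng = random.Random(0)


def sample_obs():
    # Synthetic observations; a real pipeline would log teacher rollouts instead.
    return (rng.uniform(0.0, 1.2), rng.uniform(0.0, 1.0))


train = [sample_obs() for _ in range(500)]
tree = fit_tree(train, [teacher_policy(o) for o in train])

held_out = [sample_obs() for _ in range(500)]
fidelity = sum(predict(tree, o) == teacher_policy(o) for o in held_out) / len(held_out)
print(f"held-out action fidelity: {fidelity:.3f}")
```

In a real setup the training pairs would come from teacher rollouts in the environment, and the surrogate would then be scored in closed loop on held-out episodes rather than by action agreement alone.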

L2RPN / Grid2Op Notes

The paper matters for operator-auditable control: the distilled decision tree exposes explicit rules over topology variables and makes inference substantially cheaper than the neural teacher. Its strongest results, however, are on the 14-bus sandbox, not on larger L2RPN environments.

Action-Time-Series / World-Model Notes

This is policy distillation rather than dynamics modeling. It is relevant to world-model deployment as a possible interpretable action head or safety-monitor companion after a learned proposal model has been trained.
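
One way to picture the safety-monitor role is a wrapper that compares each action from the learned proposal model with the action the distilled rules would take, and flags disagreements. This is only a sketch under stated assumptions: `surrogate_rule`, its thresholds, the two features, and the choice to execute the surrogate's action on disagreement are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Obs = Sequence[float]


@dataclass
class MonitorDecision:
    action: int   # action that will actually be executed
    agreed: bool  # True if the proposal matched the surrogate
    reason: str   # human-readable note for the operator log


def surrogate_rule(obs: Obs) -> int:
    """Hypothetical distilled rules, written as the if/else a small tree encodes."""
    rho_max, load = obs
    if rho_max <= 0.9:
        return 0  # do nothing
    return 1 if load <= 0.5 else 2


def monitored_action(
    obs: Obs,
    proposal_model: Callable[[Obs], int],
    surrogate: Callable[[Obs], int] = surrogate_rule,
) -> MonitorDecision:
    """Check a proposed action against the interpretable surrogate."""
    proposed = proposal_model(obs)
    expected = surrogate(obs)
    if proposed == expected:
        return MonitorDecision(proposed, True, "proposal matches surrogate rule")
    # Assumed policy: on disagreement, run the auditable action and flag it.
    note = f"proposal {proposed} differs from surrogate {expected}; flagged"
    return MonitorDecision(expected, False, note)


# Usage with a stand-in proposal model that always proposes action 2.
decision = monitored_action((1.0, 0.8), lambda obs: 2)
print(decision)
```

Other disagreement policies, such as deferring to a human operator or keeping the proposal but logging the conflict, fit the same interface. Which one is appropriate depends on how much the surrogate is trusted.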

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | adjacent | Converts a neural action policy into inspectable rules over grid state. | Does not model consequences of candidate actions. |
| Benchmark level and deployment auditability | partially closes | Decision-tree rules and feature importances are operator-facing. | Deterministic action policies, transient overloads, and topology-specific generalization remain risks. |
| Benchmark hygiene | adjacent | Closed-loop validation over held-out episodes is reported. | Small environment limits SOTA interpretation. |