Interpretable Policy Distillation for Power Grid Topology Control
Source
- Raw Markdown: interpretable-policy-distillation-grid-control-2026
- Rendered / retrieved PDF: paper_interpretable-policy-distillation-grid-control-2026.pdf
- External source: https://arxiv.org/abs/2606.00561
Publication And Credibility
- Paper date: arXiv published 2026-05-30.
- Venue/status: arXiv preprint.
- Credibility: Very recent preprint on a small Grid2Op environment. Useful for interpretability and deployment-cost hypotheses, but not sufficient by itself for large-grid SOTA claims.
Core Claim
A PPO topology-control teacher trained on Grid2Op l2rpn_case14_sandbox can be distilled into compact tree-based surrogates that outperform the teacher on held-out closed-loop reward and survival metrics.
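The distillation step behind this claim can be sketched as behavior cloning: query the teacher on sampled states, then fit a small tree to the resulting (state, action) pairs. The sketch below is illustrative only. `teacher_policy`, the two features (`max_rho`, `hour`), the action ids, and the hand-rolled CART-style fit are all assumptions made for this note, not the paper's teacher, feature set, or training code. It measures action agreement with the teacher (fidelity), which is a different quantity from the closed-loop reward and survival metrics the paper reports.

```python
import random
from collections import Counter

def teacher_policy(state):
    """Hypothetical stand-in for the PPO teacher (not the paper's policy)."""
    max_rho, hour = state
    if max_rho < 0.9:
        return 0                      # do nothing
    return 1 if hour < 12 else 2      # two illustrative topology actions

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy CART-style fit; leaves are action ids, nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2         # candidate threshold between neighbors
            left = [a for x, a in zip(X, y) if x[f] <= t]
            right = [a for x, a in zip(X, y) if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                  # no feature has two distinct values
        return majority
    _, f, t = best
    left = [(x, a) for x, a in zip(X, y) if x[f] <= t]
    right = [(x, a) for x, a in zip(X, y) if x[f] > t]
    return (f, t,
            fit_tree([x for x, _ in left], [a for _, a in left], depth + 1, max_depth),
            fit_tree([x for x, _ in right], [a for _, a in right], depth + 1, max_depth))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's action id."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def sample_states(n):
    # Synthetic states: (max line loading, hour of day). Not Grid2Op data.
    return [(random.uniform(0.5, 1.2), random.randrange(24)) for _ in range(n)]

random.seed(0)
train = sample_states(400)
tree = fit_tree(train, [teacher_policy(s) for s in train])

held_out = sample_states(200)
fidelity = sum(predict(tree, s) == teacher_policy(s) for s in held_out) / len(held_out)
print(f"held-out fidelity to teacher: {fidelity:.3f}")
```

High fidelity alone does not imply the surrogate matches or beats the teacher in closed loop, since small disagreements can compound over an episode. That is why the paper's closed-loop evaluation is the relevant evidence for the claim.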
L2RPN / Grid2Op Notes
The paper matters for operator-auditable control because the distilled decision tree exposes explicit rules over topology variables and makes inference substantially cheaper. Its strongest results, however, are on the 14-bus sandbox, not on large L2RPN environments.
Action-Time-Series / World-Model Notes
This is policy distillation rather than dynamics modeling. It is relevant to world-model deployment as a possible interpretable action head or safety-monitor companion after a learned proposal model has been trained.
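One way to read the "safety-monitor companion" idea is as a wrapper that compares each action proposed by a learned model against the distilled rules and flags disagreements for review. The following is a minimal sketch under that reading. The `Decision` record, the `tree_rules` thresholds, the observation keys, and the `override` switch are hypothetical names introduced here for illustration, and none of them come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: int      # action actually sent to the environment
    flagged: bool    # True when the proposal disagreed with the rules
    rule: str        # human-readable rule that fired

def tree_rules(obs):
    """Hypothetical distilled rules; thresholds are illustrative only."""
    if obs["max_rho"] <= 0.9:
        return 0, "max_rho <= 0.9 -> do nothing"
    if obs["hour"] < 12:
        return 1, "max_rho > 0.9 and hour < 12 -> action 1"
    return 2, "max_rho > 0.9 and hour >= 12 -> action 2"

def monitor(obs, proposed_action, override=False):
    """Compare a proposal with the tree; optionally fall back to the tree."""
    tree_action, rule = tree_rules(obs)
    disagree = proposed_action != tree_action
    action = tree_action if (disagree and override) else proposed_action
    return Decision(action=action, flagged=disagree, rule=rule)

# Agreement: the proposal passes through unflagged.
ok = monitor({"max_rho": 0.55, "hour": 9}, proposed_action=0)
# Disagreement with override: the tree's action replaces the proposal.
alert = monitor({"max_rho": 0.97, "hour": 15}, proposed_action=0, override=True)
print(ok)
print(alert)
```

Keeping `override` off by default leaves the proposal model in control and uses the tree only to produce an auditable flag. Whether the tree should ever override the proposal is a deployment decision that this note does not settle.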
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | adjacent | Converts a neural action policy into inspectable rules over grid state. | Does not model consequences of candidate actions. |
| Benchmark level and deployment auditability | partially closes | Decision-tree rules and feature importances are operator-facing. | Deterministic action policies, transient overloads, and topology-specific generalization remain risks. |
| Benchmark hygiene | adjacent | Closed-loop validation over held-out episodes is reported. | Small environment limits SOTA interpretation. |
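The closed-loop validation noted in the table amounts to rolling each policy out on episodes it was not fitted on and recording total reward and survival length. A rough sketch of that loop follows. `ToyGridEnv` is a hypothetical one-variable stand-in written for this note, not the Grid2Op API, and its dynamics, reward, and the two policies are invented for illustration only.

```python
import random

class ToyGridEnv:
    """Hypothetical stand-in for a grid environment (not the Grid2Op API)."""

    def __init__(self, seed, horizon=50):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0
        self.rho = 0.7

    def reset(self):
        self.t = 0
        self.rho = 0.7                # initial max line loading
        return self.rho

    def step(self, action):
        # Loading drifts upward on average; action 1 relieves it.
        drift = self.rng.uniform(-0.05, 0.10)
        relief = 0.15 if action == 1 else 0.0
        self.rho = max(0.0, self.rho + drift - relief)
        self.t += 1
        overloaded = self.rho > 1.0
        done = overloaded or self.t >= self.horizon
        reward = 0.0 if overloaded else 1.0 - self.rho
        return self.rho, reward, done

def evaluate(policy, seeds, horizon=50):
    """Return (mean episode reward, mean steps survived) over the seeds."""
    rewards, survival = [], []
    for seed in seeds:
        env = ToyGridEnv(seed, horizon)
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        rewards.append(total)
        survival.append(env.t)
    return sum(rewards) / len(seeds), sum(survival) / len(seeds)

def do_nothing(obs):
    return 0

def tree_policy(obs):
    # Illustrative one-rule surrogate: act only when loading is high.
    return 1 if obs > 0.9 else 0

held_out_seeds = range(1000, 1020)    # kept disjoint from any fitting seeds
noop_reward, noop_survival = evaluate(do_nothing, held_out_seeds)
tree_reward, tree_survival = evaluate(tree_policy, held_out_seeds)
print(f"do-nothing: reward={noop_reward:.2f}, survival={noop_survival:.1f}")
print(f"tree:       reward={tree_reward:.2f}, survival={tree_survival:.1f}")
```

Reporting survival alongside reward matters because a policy can accumulate reward early and still end the episode in an overload. The toy numbers printed here say nothing about the paper's results.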