A World Model Based Reinforcement Learning Architecture for Autonomous Power System Control
Source
- Raw Markdown: world-model-autonomous-power-system-control-2021
- KTH DiVA metadata page checked: https://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A1637952
- DBLP record: https://dblp.org/rec/conf/smartgridcomm/TarleBLNI21
- KTH doctoral-thesis PDF containing included-paper summary: https://kth.diva-portal.org/smash/get/diva2%3A1996061/FULLTEXT01.pdf
- IEEE DOI landing page: https://doi.org/10.1109/SmartGridComm51999.2021.9632332
Publication And Credibility
- Paper date: 2021; SmartGridComm 2021, pp. 364-370, DOI 10.1109/SmartGridComm51999.2021.9632332.
- Venue/status: IEEE conference paper.
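- Credibility: Peer-reviewed IEEE SmartGridComm paper, but it is not a Grid2Op/L2RPN study and is more than one year old. No arXiv preprint or open standalone full text was found, so these notes rest on the metadata records and the thesis summary listed under Source. Treat it as context evidence only.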
Core Claim
WMAP is a model-based RL architecture that learns an internal world model for autonomous power-system control and includes a safety shield that can ask a human operator for guidance under high uncertainty.
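The shield idea can be made concrete with a small sketch. Because the full text was not retrieved, this is not the paper's implementation: measuring uncertainty as disagreement within an ensemble of learned one-step models is an assumption, and every name here (`ensemble_uncertainty`, `shielded_action`, `ask_operator`, `threshold`) is hypothetical. State and action are scalars purely to keep the example short.

```python
from statistics import pstdev
from typing import Callable, Sequence, Tuple

# Hypothetical type: a learned one-step world model mapping
# (state, action) to a predicted next state.
Model = Callable[[float, float], float]


def ensemble_uncertainty(models: Sequence[Model], state: float, action: float) -> float:
    """Assumed uncertainty proxy: spread of the ensemble's next-state predictions."""
    predictions = [model(state, action) for model in models]
    return pstdev(predictions)


def shielded_action(
    models: Sequence[Model],
    state: float,
    proposed_action: float,
    threshold: float,
    ask_operator: Callable[[float, float], float],
) -> Tuple[float, str]:
    """Pass the agent's action through unless the world model is too uncertain.

    Returns the action to apply and who chose it ("agent" or "human").
    """
    if ensemble_uncertainty(models, state, proposed_action) > threshold:
        # High uncertainty: defer to the human operator for guidance.
        return ask_operator(state, proposed_action), "human"
    return proposed_action, "agent"
```

In this sketch a low `threshold` sends more decisions to the operator, which moves the system toward a decision-support mode, while a high `threshold` lets the agent act autonomously more often.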
L2RPN / Grid2Op Notes
This is not a Grid2Op topology-control paper. Its case study is FACTS setpoint control on the IEEE 14-bus system. It belongs in the wiki as a precedent for world models in power systems, not as L2RPN SOTA evidence.
Action-Time-Series / World-Model Notes
The paper is useful because it uses the world-model vocabulary directly in power systems and combines learned dynamics with safety shielding and decision-support modes. It does not close the Grid2Op latent action-conditioned world-model gap.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| World-model design | adjacent | Shows model-based RL plus safety shield in power systems before the recent Grid2Op wave. | Not Grid2Op, no modern multi-step TSFM benchmark, no full text retrieved. |
| Safety and human oversight | adjacent | Shield can request human guidance or operate as decision support. | Needs current replication and larger-grid evidence. |
| Benchmark hygiene | context | IEEE 14-bus case study is a useful proof of concept. | Not current SOTA and not directly comparable with L2RPN. |