LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration

Source

Raw Markdown: llm-guided-safe-rl-grid-topology-2026
Rendered / retrieved PDF: paper_llm-guided-safe-rl-grid-topology-2026.pdf
External source: https://arxiv.org/abs/2603.14018

Publication And Credibility

Paper date: arXiv published 2026-03-14.
Venue/status: arXiv preprint.
Credibility: Recent exploratory preprint. It is worth tracking because it tests LLM-guided safe RL on 36-bus and 118-bus Grid2Op benchmarks, but the LLM component and the safety claims need careful reproduction before the results are treated as state of the art.

Core Claim

The paper combines Safety-SAC with a knowledge-based Safety-LLM module that refines unsafe or suboptimal transitions and inserts safer refinements into the RL replay buffer.
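A minimal sketch of the replay-buffer side of this idea, assuming nothing about the paper's actual implementation. The names `Transition`, `ReplayBuffer`, `store_with_refinement`, `cost_limit`, `reward_floor`, and `stub_refiner` are all hypothetical, and the stub stands in for the LLM call. The provenance tag is there so that refined transitions can be counted separately from agent transitions.

```python
import random
from collections import deque
from dataclasses import dataclass, replace
from typing import Callable, Deque, List, Optional


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: int
    reward: float
    cost: float              # safety cost attached to this step
    next_state: tuple
    source: str = "agent"    # provenance tag: "agent" or "refined"


class ReplayBuffer:
    """Fixed-capacity FIFO buffer that keeps provenance for later accounting."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        self._items.append(t)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        items = list(self._items)
        return rng.sample(items, min(batch_size, len(items)))

    def count(self, source: str) -> int:
        return sum(1 for t in self._items if t.source == source)


# A refiner maps a flagged transition to a replacement, or None if it has none.
Refiner = Callable[[Transition], Optional[Transition]]


def store_with_refinement(
    buffer: ReplayBuffer,
    t: Transition,
    refiner: Refiner,
    cost_limit: float,
    reward_floor: float,
) -> bool:
    """Store t; if it looks unsafe or suboptimal, also store a refined copy.

    Returns True when a refined transition was inserted.
    The two thresholds are illustrative assumptions, not values from the paper.
    """
    buffer.add(t)
    if t.cost > cost_limit or t.reward < reward_floor:
        refined = refiner(t)
        if refined is not None:
            buffer.add(replace(refined, source="refined"))
            return True
    return False


def stub_refiner(t: Transition) -> Optional[Transition]:
    """Placeholder for an LLM call (assumption): proposes action 0.

    In a real pipeline the reward, cost, and next state of the proposal
    would have to come from the simulator, not from the refiner itself.
    """
    return replace(t, action=0, cost=0.0)


buf = ReplayBuffer(capacity=10)
safe = Transition((0,), 1, 1.0, 0.0, (1,))
unsafe = Transition((1,), 2, 0.5, 3.0, (2,))
store_with_refinement(buf, safe, stub_refiner, cost_limit=1.0, reward_floor=0.0)
store_with_refinement(buf, unsafe, stub_refiner, cost_limit=1.0, reward_floor=0.0)
```

Tagging each entry with its source keeps the original unsafe transition in the buffer alongside the refined one, which makes the share of LLM-derived data straightforward to report.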

L2RPN / Grid2Op Notes

The reported experiments use IEEE 36-bus and 118-bus Grid2Op benchmarks and compare against SAC, ACE, and safety-enhanced variants on reward, survival time, overloads, voltage violations, and safety costs.
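For orientation, overloads and voltage violations of the kind listed above could be folded into a scalar safety cost roughly as follows. This is a sketch under stated assumptions, not the paper's cost definition: the function names, the per-unit voltage band (0.95 to 1.05), the loading limit of 1.0, and the weights are all illustrative. The `rho` argument is meant as a per-line loading ratio (flow divided by thermal limit), similar in spirit to what Grid2Op observations expose.

```python
from typing import Sequence


def thermal_cost(rho: Sequence[float], limit: float = 1.0) -> float:
    """Total line loading in excess of the limit (0.0 if no line is overloaded)."""
    return sum(max(0.0, r - limit) for r in rho)


def voltage_cost(
    v_pu: Sequence[float], v_min: float = 0.95, v_max: float = 1.05
) -> float:
    """Total per-unit voltage deviation outside the band [v_min, v_max]."""
    return sum(max(0.0, v_min - v, v - v_max) for v in v_pu)


def safety_cost(
    rho: Sequence[float],
    v_pu: Sequence[float],
    w_thermal: float = 1.0,
    w_voltage: float = 1.0,
) -> float:
    """Weighted sum of thermal and voltage violations (weights are assumptions)."""
    return w_thermal * thermal_cost(rho) + w_voltage * voltage_cost(v_pu)


# One overloaded line (1.2) and two out-of-band bus voltages (0.90, 1.10).
example_cost = safety_cost(rho=[0.5, 1.2, 1.0], v_pu=[1.0, 0.90, 1.10])
```

The weighting is exactly the kind of choice that can change method rankings, so any reproduction should report the thermal and voltage terms separately as well as the combined cost.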

Action-Time-Series / World-Model Notes

This is not a world model. It is an LLM-guided transition-refinement and exploration-shaping layer for safe RL. Its value for the wiki is as a cautionary example of a current research branch: language reasoning may help with action proposal, but the actual dynamics still come from the simulator/environment.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	adjacent	Uses control actions and safety-cost signals in Grid2Op.	No learned transition model or candidate-action rollout interface.
Safety and rare events	partially closes	Reformulates voltage and thermal violations into safety costs.	Need audits of LLM hallucination, prompt stability, and reproducibility.
Context interface	adjacent	Natural-language domain knowledge is used as guidance.	Needs structured action/state schemas before TSFM transfer.
Benchmark hygiene	warning	LLM-guided transition refinement depends on invocation frequency, reward threshold, LLM choice, training-only compute, simulator validation, and metric weighting.	Safety ranking differs between 36-bus and 118-bus results; prompt/model/version ablations and replay-buffer accounting are needed.

Limitations / Gotchas

Do not treat this as a dynamics model. The LLM refines selected unsafe or suboptimal transitions, then the simulator/environment supplies validation.
The LLM component is training-time exploration and replay-buffer shaping, not a runtime safety certificate.
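The refine-then-validate split described above can be sketched as a gate in which a proposed action only counts if the simulator confirms it. Everything here is hypothetical: `validate_proposal`, the `StepFn` signature, and the one-line `toy_simulate` grid are illustrative stand-ins, not the paper's interface or a Grid2Op API.

```python
from typing import Callable, Optional, Tuple

State = Tuple[float, ...]
# A step function returns (next_state, reward, safety_cost) for a state-action pair.
StepFn = Callable[[State, int], Tuple[State, float, float]]


def validate_proposal(
    state: State,
    original_action: int,
    proposed_action: int,
    simulate: StepFn,
) -> Optional[Tuple[int, State, float, float]]:
    """Accept a proposed action only if the simulator reports a strictly lower cost.

    Returns (action, next_state, reward, cost) taken from the simulator,
    or None when the proposal is no safer than the original action.
    """
    _, _, base_cost = simulate(state, original_action)
    next_state, reward, cost = simulate(state, proposed_action)
    if cost < base_cost:
        return proposed_action, next_state, reward, cost
    return None


def toy_simulate(state: State, action: int) -> Tuple[State, float, float]:
    """Hypothetical one-line grid: action 1 relieves 0.3 of loading, action 0 does nothing."""
    loading = state[0] - (0.3 if action == 1 else 0.0)
    cost = max(0.0, loading - 1.0)   # overload above a loading ratio of 1.0
    return (loading,), -cost, cost


accepted = validate_proposal((1.2,), original_action=0, proposed_action=1, simulate=toy_simulate)
rejected = validate_proposal((1.2,), original_action=1, proposed_action=0, simulate=toy_simulate)
```

In this sketch the stored reward, cost, and next state all come from the step function, never from the proposer, which is the reason the LLM layer should not be read as a dynamics model or as a runtime guarantee.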
