Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic

Source

Raw Markdown: winning-l2rpn-afterstate-actor-critic-2021
Rendered / retrieved PDF: paper_winning-l2rpn-afterstate-actor-critic-2021.pdf
External source: https://openreview.net/forum?id=LmUJqB1Cz8
Additional source: https://openreview.net/pdf?id=LmUJqB1Cz8
Official L2RPN reference list: https://l2rpn.chalearn.org/papers-references

Publication And Credibility

Paper date: 2021-01-12; OpenReview last modified 2023-05-05.
Venue/status: ICLR 2021 Spotlight, published on OpenReview.
Credibility: Published ICLR 2021 Spotlight paper, and OpenReview still lists it with that status. Although it is more than a year old, it remains a credible reference for the challenge-winning approach.

Core Claim

The paper presents the winning agent of the WCCI 2020 L2RPN challenge: a hierarchical, off-policy actor-critic method built on an afterstate representation.

L2RPN / Grid2Op Notes

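The method decomposes the very large grid topology action space, evaluates afterstates (the grid configuration immediately after an action is applied, before the environment's stochastic response), and uses graph neural networks for value approximation so that it can handle grid states of realistic scale.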
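
The afterstate idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `GridState` fields, the bus-assignment encoding, and `toy_value`, which is a hand-written stand-in for a learned critic and not the paper's graph neural network. The sketch only shows the general pattern of scoring each candidate topology by the value of the afterstate it would produce.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical encoding: one bus id (1 or 2) per grid element.
Topology = Tuple[int, ...]


@dataclass(frozen=True)
class GridState:
    topology: Topology
    injections: Tuple[float, ...]  # hypothetical net injection per element


def afterstate(state: GridState, target: Topology) -> GridState:
    """Deterministic part of the transition: the topology switches, while
    injections stay as observed because the environment has not moved yet."""
    if len(target) != len(state.topology):
        raise ValueError("target topology must cover every element")
    return GridState(topology=target, injections=state.injections)


def toy_value(state: GridState) -> float:
    """Stand-in for a learned critic (NOT the paper's GNN): prefers
    topologies whose per-bus net injections are small."""
    totals = {1: 0.0, 2: 0.0}
    for bus, inj in zip(state.topology, state.injections):
        totals[bus] += inj
    return -(totals[1] ** 2 + totals[2] ** 2)


def greedy_topology(
    state: GridState,
    candidates: Sequence[Topology],
    value_fn: Callable[[GridState], float],
) -> Topology:
    """Score each candidate by the value of the afterstate it produces."""
    return max(candidates, key=lambda t: value_fn(afterstate(state, t)))


state = GridState(topology=(1, 1, 1, 1), injections=(1.5, -1.0, 1.0, -0.5))
candidates = [(1, 1, 1, 1), (1, 2, 1, 2), (1, 1, 2, 2)]
best = greedy_topology(state, candidates, toy_value)
```

Because the value function sees the afterstate rather than a (state, action) pair, two different actions that lead to the same topology share one value estimate, which is one reason afterstates are attractive when the action space is large.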

Action-Time-Series Notes

This source is useful when Grid2Op is treated as an action-conditioned graph time-series environment:

power-grid observations + topology / redispatch / storage control input + scenario context
  -> next grid observations + safety/cost outcome
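
One way to make this mapping concrete is a typed record per time step. The sketch below is an assumption-laden illustration: the `Transition` class, its field names, and `to_transitions` are hypothetical and do not come from Grid2Op or the paper. It only shows how logged observations, control inputs, and outcomes could be aligned into (input, target) pairs for an action-conditioned model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass(frozen=True)
class Transition:
    # Left-hand side of the mapping.
    observation: Dict[str, Any]
    control_input: Dict[str, Any]     # topology / redispatch / storage
    scenario_context: Dict[str, Any]
    # Right-hand side of the mapping.
    next_observation: Dict[str, Any]
    outcome: Dict[str, float]         # safety / cost signals


def to_transitions(
    observations: Sequence[Dict[str, Any]],
    controls: Sequence[Dict[str, Any]],
    outcomes: Sequence[Dict[str, float]],
    scenario_context: Dict[str, Any],
) -> List[Transition]:
    """Align a logged episode: control t moves observation t to t + 1."""
    if not (len(observations) == len(controls) + 1 == len(outcomes) + 1):
        raise ValueError("need N + 1 observations for N controls and N outcomes")
    return [
        Transition(
            observation=observations[t],
            control_input=controls[t],
            scenario_context=scenario_context,
            next_observation=observations[t + 1],
            outcome=outcomes[t],
        )
        for t in range(len(controls))
    ]


# Hypothetical three-step log with made-up field names and values.
observations = [{"max_rho": 0.7}, {"max_rho": 0.9}, {"max_rho": 0.6}]
controls = [{"topology": "no_op"}, {"topology": "split_sub_4"}]
outcomes = [{"cost": 1.0}, {"cost": 1.5}]
transitions = to_transitions(observations, controls, outcomes, {"season": "winter"})
```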

The terminology distinction matters. Topology changes, redispatching, curtailment, and storage commands are actions or control inputs when an agent chooses them. Line failures, maintenance outages, weather-driven renewable shifts, and demand variation are events or exogenous variables unless they are deliberately controlled by the experimenter.
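
The distinction can be encoded as channel metadata, so that a dataset records which variables were chosen and which merely happened. The following is a minimal sketch under stated assumptions: the channel names, `Role`, and `resolve_roles` are hypothetical labels invented for this note, not part of any Grid2Op schema.

```python
from enum import Enum
from typing import Dict, Iterable


class Role(Enum):
    CONTROL = "control"      # chosen by an agent or experimenter
    EXOGENOUS = "exogenous"  # happens to the system


# Default roles following the distinction drawn above (names are hypothetical).
DEFAULT_ROLES: Dict[str, Role] = {
    "topology_change": Role.CONTROL,
    "redispatch": Role.CONTROL,
    "curtailment": Role.CONTROL,
    "storage_command": Role.CONTROL,
    "line_failure": Role.EXOGENOUS,
    "maintenance_outage": Role.EXOGENOUS,
    "renewable_shift": Role.EXOGENOUS,
    "demand_variation": Role.EXOGENOUS,
}


def resolve_roles(experimenter_controlled: Iterable[str] = ()) -> Dict[str, Role]:
    """Return channel roles, promoting any channel the experimenter
    deliberately controls (e.g. injected line failures) to CONTROL."""
    overrides = set(experimenter_controlled)
    unknown = overrides.difference(DEFAULT_ROLES)
    if unknown:
        raise KeyError(f"unknown channels: {sorted(unknown)}")
    return {
        name: Role.CONTROL if name in overrides else role
        for name, role in DEFAULT_ROLES.items()
    }


default_roles = resolve_roles()
stress_test_roles = resolve_roles(["line_failure"])
```

In the stress-test configuration, line failures are scripted by the experimenter, so they are treated as control inputs, while demand variation remains exogenous.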

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Causal structure, counterfactuals, and control	partially closes	Useful for TSFM control interfaces because it separates observation state, topology control input, intermediate afterstate, and long-horizon operational outcome.	It is a policy/control paper, not a learned transition model that predicts next grid trajectories under arbitrary candidate interventions.
Context interface: topology and channel context	partially closes	Power-grid state is naturally graph-structured and tied to physical assets, limits, and scenario metadata.	Needs a reusable schema that a general TSFM can consume across grids and non-grid operational systems.
Benchmark level	adjacent	L2RPN/Grid2Op provides simulator-backed trajectories with explicit controls and outcomes.	TSFM-ready comparisons require pinned environment versions, action sets, reward definitions, and train/test scenario splits.
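
The pinning requirement in the last row could be captured in a small specification record. This is a sketch, assuming a hypothetical `BenchmarkSpec` class with placeholder values; none of the names or version strings below refer to a real Grid2Op release or scenario set. The fingerprint is simply a hash over the pinned fields, so two runs can be compared only when their fingerprints match.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkSpec:
    env_name: str
    env_version: str
    action_set: Tuple[str, ...]
    reward_name: str
    train_scenarios: Tuple[str, ...]
    test_scenarios: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Guard against leakage between the train and test splits.
        overlap = set(self.train_scenarios) & set(self.test_scenarios)
        if overlap:
            raise ValueError(f"scenarios in both splits: {sorted(overlap)}")

    def fingerprint(self) -> str:
        """Stable SHA-256 hash over every pinned field."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder values only; substitute the actual pinned settings.
spec = BenchmarkSpec(
    env_name="example_grid_env",
    env_version="0.0.0",
    action_set=("topology", "redispatch"),
    reward_name="example_reward",
    train_scenarios=("scenario_000", "scenario_001"),
    test_scenarios=("scenario_002",),
)
other = replace(spec, reward_name="another_reward")
```

Changing any pinned field, such as the reward definition in `other`, yields a different fingerprint, which flags results that should not be compared directly.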
