MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations

Source

Publication And Credibility

  • Paper date: published on OpenReview 2026-01-26; last modified 2026-05-06.
  • Venue/status: ICLR 2026 Poster on OpenReview.
  • Credibility: Tier-1 conference publication with authors from MIT, RTE, INRIA/LISN, INSA Rouen, and Northeastern. No arXiv version was found during a targeted search, so OpenReview is the primary ingested source.

Core Claim

MARL2Grid-TR turns Grid2Op-style transmission-grid operation into a configurable multi-agent benchmark with topology control, redispatching, partial observability, expert-informed heuristics, and hard safety constraints.

L2RPN / Grid2Op Notes

This is the strongest recent benchmark evidence that the L2RPN/Grid2Op ecosystem is moving beyond a single global agent. It treats decentralized substations and generators as first-class control loci and reports that current MARL methods struggle under realistic power-grid constraints.

Action-Time-Series / World-Model Notes

For action-conditioned time series, the important contract is: (local or global grid observation, agent scope, topology or redispatch action) -> (next grid state, safety/cost outcome). This pushes the wiki toward multi-agent and partial-observation world models rather than only single-controller rollouts.
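
The contract above can be sketched as a small typed record. This is a minimal illustration only, not the MARL2Grid-TR or Grid2Op API: every name here (AgentScope, GridAction, Transition, in_scope, local_view) and every field layout is a hypothetical assumption chosen to mirror the wording of the contract.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class AgentScope:
    """Hypothetical: the grid elements one agent is allowed to control."""
    agent_id: str
    substations: Tuple[int, ...] = ()
    generators: Tuple[int, ...] = ()


@dataclass
class GridAction:
    """Hypothetical: a topology change and/or a redispatch request."""
    # substation id -> target bus configuration id (topology control)
    set_bus: Dict[int, int] = field(default_factory=dict)
    # generator id -> change in active power, in MW (redispatching)
    redispatch: Dict[int, float] = field(default_factory=dict)


@dataclass
class Transition:
    """One logged step: (observation, scope, action) -> (next state, outcome)."""
    observation: Dict[str, float]   # local or global view before acting
    scope: AgentScope
    action: GridAction
    next_state: Dict[str, float]    # grid state after the simulator step
    cost: float                     # operating cost assigned to the step
    safety_violation: bool          # True if a hard constraint was broken


def in_scope(scope: AgentScope, action: GridAction) -> bool:
    """Check that an action only touches elements inside the agent's scope."""
    return (set(action.set_bus) <= set(scope.substations)
            and set(action.redispatch) <= set(scope.generators))


def local_view(global_obs: Dict[str, float],
               visible_keys: Iterable[str]) -> Dict[str, float]:
    """Model partial observability by keeping only the visible channels."""
    visible = set(visible_keys)
    return {k: v for k, v in global_obs.items() if k in visible}
```

A world model trained on such records would take the first three fields as input and predict the last three. Keeping the scope explicit in the record is one way to let the same model serve both single-controller and multi-agent rollouts.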

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | Adds multi-agent control and partial observability to Grid2Op benchmark design. | Still needs TSFM-ready logged trajectories, candidate-action rollouts, and standardized world-model baselines. |
| Context interface: topology and channel context | partially closes | Agent scopes, substations, generators, and grid topology are explicit context. | Needs a reusable schema for non-grid operational systems. |
| Benchmark hygiene | partially closes | ICLR benchmark paper, OpenReview artifact, and configurable task axes. | Model comparisons remain sensitive to action spaces, safety filters, and simulator-query budgets. |