MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations
Source
- Raw Markdown: marl2grid-tr-2026
- Retrieved PDF: paper_marl2grid-tr-2026.pdf
- OpenReview PDF: https://openreview.net/pdf?id=mpAMH1OyMO
- External source: https://openreview.net/forum?id=mpAMH1OyMO
Publication And Credibility
- Paper date: published on OpenReview 2026-01-26; last modified 2026-05-06.
- Venue/status: ICLR 2026 Poster on OpenReview.
- Credibility: Tier-1 conference publication from MIT, RTE, INRIA/LISN, INSA Rouen, and Northeastern authors. No arXiv version was found during targeted search, so OpenReview is the primary ingested source.
Core Claim
MARL2Grid-TR turns Grid2Op-style transmission-grid operation into a configurable multi-agent benchmark with topology control, redispatching, partial observability, expert-informed heuristics, and hard safety constraints.
L2RPN / Grid2Op Notes
This is the strongest recent benchmark evidence that the L2RPN/Grid2Op ecosystem is moving beyond a single global agent. It treats decentralized substations and generators as first-class control loci and reports that current MARL methods struggle under realistic power-grid constraints.
Action-Time-Series / World-Model Notes
For action-conditioned time series, the key contract is a transition mapping: (local or global grid observation, agent scope, topology or redispatch action) -> (next grid state, safety/cost outcome). This pushes the wiki toward multi-agent, partially observed world models rather than single-controller rollouts alone.
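One way to make this contract concrete is a small typed record per transition, sketched below. Every name here (`AgentScope`, `GridAction`, `Transition`, `action_in_scope`, `local_view`) is hypothetical and chosen for illustration. None of it is taken from the MARL2Grid-TR or Grid2Op APIs, and the field layout is an assumption about what a logged trajectory might need, not the paper's schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class AgentScope:
    """Hypothetical: the grid elements one agent is allowed to control."""
    agent_id: str
    substations: FrozenSet[int]
    generators: FrozenSet[int]


@dataclass
class GridAction:
    """Hypothetical action record covering both action families."""
    set_bus: Dict[int, Tuple[int, ...]]  # substation id -> bus per element (topology)
    redispatch_mw: Dict[int, float]      # generator id -> MW change (redispatch)


@dataclass
class Transition:
    """One logged step: (observation, scope, action) -> (next state, outcome)."""
    observation: Dict[int, float]  # line id -> loading ratio seen by the agent
    scope: AgentScope
    action: GridAction
    next_state: Dict[int, float]   # line id -> loading ratio after the step
    cost: float
    safety_violation: bool


def action_in_scope(scope: AgentScope, action: GridAction) -> bool:
    """True if the action only touches elements owned by this agent."""
    return (set(action.set_bus) <= scope.substations
            and set(action.redispatch_mw) <= scope.generators)


def local_view(line_loads: Dict[int, float],
               line_ends: Dict[int, Tuple[int, int]],
               scope: AgentScope) -> Dict[int, float]:
    """Partial observation: keep lines with an endpoint in the agent's substations."""
    return {line: load for line, load in line_loads.items()
            if scope.substations & set(line_ends[line])}
```

A world model trained on such records would condition on `observation`, `scope`, and `action` and predict `next_state`, `cost`, and `safety_violation`. Keeping `scope` explicit in the record is what lets the same schema cover both the single-global-agent case (a scope containing every substation and generator) and the decentralized one.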
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | Adds multi-agent control and partial observability to Grid2Op benchmark design. | Still needs TSFM-ready logged trajectories, candidate-action rollouts, and standardized world-model baselines. |
| Context interface: topology and channel context | partially closes | Agent scopes, substations, generators, and grid topology are explicit context. | Needs a reusable schema for non-grid operational systems. |
| Benchmark hygiene | partially closes | ICLR benchmark paper, OpenReview artifact, and configurable task axes. | Model comparisons remain sensitive to action spaces, safety filters, and simulator-query budgets. |