MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Source
- Raw Markdown: paper_mesanet-2025.md
- PDF: paper_mesanet-2025.pdf
- Preprint: arXiv 2506.05233
Core Claim
MesaNet is built from a chunkwise-parallelizable Mesa layer. At every time step, the layer uses the conjugate gradient method to solve an in-context regression objective to near optimality.
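The sketch below shows what "solving the in-context regression at each time step with conjugate gradients" means mechanically. It assumes the objective is a ridge regression from keys to values over all tokens seen so far, so the readout is G_t (H_t + λI)^{-1} q_t with H_t = Σ k_i k_iᵀ and G_t = Σ v_i k_iᵀ. It omits any gating or forgetting and all chunkwise parallelism. The names `conjugate_gradient`, `mesa_outputs`, `lam`, and `num_cg_steps` are illustrative and are not taken from the paper's code.

```python
import numpy as np

def conjugate_gradient(A, b, num_steps, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A with plain conjugate gradients."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(num_steps):
        if np.sqrt(rs_old) < tol:  # already converged; also avoids a 0/0 step
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def mesa_outputs(keys, values, queries, lam=1.0, num_cg_steps=None):
    """Sequential per-step ridge-regression readout (assumed form; no gating, no chunking)."""
    T, d_k = keys.shape
    d_v = values.shape[1]
    if num_cg_steps is None:
        # CG is exact in at most d_k steps in exact arithmetic.
        num_cg_steps = d_k
    H = np.zeros((d_k, d_k))   # running sum of k_i k_i^T
    G = np.zeros((d_v, d_k))   # running sum of v_i k_i^T
    outputs = np.zeros((T, d_v))
    for t in range(T):
        k, v, q = keys[t], values[t], queries[t]
        H += np.outer(k, k)
        G += np.outer(v, k)
        # Solve (H + lam I) x = q, then read out with G.
        x = conjugate_gradient(H + lam * np.eye(d_k), q, num_cg_steps)
        outputs[t] = G @ x
    return outputs
```

Capping `num_cg_steps` below `d_k` trades solution accuracy for inference compute. This is the knob that makes the per-step solve "close to" optimal rather than exact.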
Relevance To This Wiki
It belongs to the memory-as-optimization branch: inference compute is spent solving local sequential optimization problems inside the model rather than only applying fixed layers.
Limitations
The main evidence comes from language modeling and long-context tasks. The extra inference FLOPs spent on the per-step solves are an inherent cost of the method, so comparisons against baselines need matched compute budgets.
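To see why matched budgets matter, here is a rough per-token readout cost comparison. It assumes the CG-based readout sketched for the core claim and counts one multiply-add as two FLOPs. Each CG step is charged one d_k × d_k matrix-vector product plus about five length-d_k vector operations. State updates are ignored. The function names and the sizes in the example are hypothetical and are not figures from the paper.

```python
def cg_readout_flops(d_k, d_v, cg_steps):
    """Rough FLOPs for one CG-based readout: cg_steps CG iterations, then G @ x."""
    per_step = 2 * d_k * d_k + 10 * d_k  # one matvec plus ~5 vector ops (dots, axpys)
    return cg_steps * per_step + 2 * d_v * d_k

def linear_readout_flops(d_k, d_v):
    """Rough FLOPs for a linear-attention readout S @ q with S of shape (d_v, d_k)."""
    return 2 * d_v * d_k

mesa = cg_readout_flops(d_k=128, d_v=128, cg_steps=15)  # hypothetical sizes
linear = linear_readout_flops(d_k=128, d_v=128)
print(mesa, linear, round(mesa / linear, 1))
```

Under these assumptions, the CG readout costs about 16.6 times the linear-attention readout per token. A fair baseline would therefore spend comparable FLOPs elsewhere, for example on extra depth or width.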
Foundation TSFM Relevance
Adjacent to dynamic compute and test-time adaptation for time-series systems, where spending extra inference compute may be worthwhile for rare regimes or high-uncertainty windows.
Links Into The Wiki
- MesaNet
- Looped Transformers And Test-Time Memory
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Foundation Time-Series Model Research Agenda
Open Questions
- What matched-budget baseline should this source be compared against: unique-depth Transformer layers, recurrent state, explicit memory, or extra inference steps?
- Which claims transfer from token-sequence reasoning to multivariate time-series state tracking, event streams, or action-conditioned world models?