MiniMax Sparse Attention
Summary
MiniMax Sparse Attention (MSA) is a learned block-sparse softmax attention mechanism built on Grouped-Query Attention (GQA). A lightweight Index Branch selects the Top-k key-value blocks independently for each GQA group, and the Main Branch computes exact softmax attention over only the selected blocks.
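The two-branch flow described above can be sketched for a single decode step in plain Python. This is a minimal illustration under stated assumptions, not the MSA implementation: the block scoring rule (mean of the group's query heads dotted with mean-pooled block keys) is a stand-in for the learned Index Branch, and the names `select_blocks`, `sparse_attention`, and `msa_decode_step` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def select_blocks(group_queries, keys, block_size, top_k):
    """Index Branch stand-in: pick top_k KV blocks for one GQA group."""
    # Assumption: one summary query per group (mean over its query heads).
    q_summary = mean_vec(group_queries)
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block_keys = keys[b * block_size:(b + 1) * block_size]
        # Assumption: a block is represented by its mean-pooled keys.
        scores.append(dot(q_summary, mean_vec(block_keys)))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])

def sparse_attention(query, keys, values, blocks, block_size):
    """Main Branch stand-in: exact softmax over tokens in selected blocks."""
    idx = [t for b in blocks
           for t in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[t]) * scale for t in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * values[t][d] for w, t in zip(weights, idx)) / z
            for d in range(dim)]

def msa_decode_step(queries_by_group, kv_by_group, block_size, top_k):
    """Select blocks once per GQA group, then attend with every head in it."""
    outputs = []
    for group_queries, (keys, values) in zip(queries_by_group, kv_by_group):
        blocks = select_blocks(group_queries, keys, block_size, top_k)
        outputs.append([sparse_attention(q, keys, values, blocks, block_size)
                        for q in group_queries])
    return outputs
```

Because selection happens once per group, all query heads that share a KV head read the same blocks. Tokens outside the selected blocks contribute nothing to the softmax, which is where both the savings and the risk of dropped context come from.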
Role In The Wiki
MSA is the current long-context sparse-attention anchor for this KB. Its value is the combined algorithm-plus-kernel claim: content-dependent block selection can keep much of dense GQA quality while making 1M-context prefill and decode materially cheaper.
For time-series and world-model work, MSA is an architecture analogy, not direct TSFM evidence. It suggests a way to route compute to relevant history spans, but the unselected spans are invisible to the layer. Any transfer to telemetry, graph time series, or action-conditioned streams MUST include preservation probes for rare regimes, event timing, exogenous variables, and action history.
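One simple form such a preservation probe could take is a coverage check: given the blocks a layer selected and the timesteps of known rare events, measure how many events remain visible. The sketch below is illustrative only. The function name `event_coverage` and the choice to return 1.0 when there are no events are assumptions, and a real probe would also need to cover exogenous variables and action history.

```python
def event_coverage(selected_blocks, block_size, event_positions):
    """Fraction of event timesteps that fall inside a selected block."""
    if not event_positions:
        return 1.0  # assumption: with no events, coverage is trivially full
    selected = set(selected_blocks)
    covered = sum(1 for t in event_positions if t // block_size in selected)
    return covered / len(event_positions)

# Hypothetical example: blocks 0 and 3 selected, block size 4,
# rare events at timesteps 1, 5, and 13.
coverage = event_coverage([0, 3], 4, [1, 5, 13])
```

Here the event at timestep 5 sits in block 1, which was not selected, so the layer cannot see it. Tracking this fraction per layer and per regime gives a first-pass signal on whether sparse selection is discarding the history that matters.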
Official Artifacts
- Paper: MiniMax Sparse Attention
- Code: MiniMax-AI/MSA
- Released model using MSA: MiniMaxAI/MiniMax-M3