MiniMax Sparse Attention

Summary

MiniMax Sparse Attention (MSA) is a learned block-sparse softmax attention mechanism built on Grouped-Query Attention (GQA). A lightweight Index Branch selects the Top-$k$ key-value blocks independently for each GQA group, and the Main Branch computes exact softmax attention over the selected blocks.
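The two-branch structure can be illustrated with a minimal pure-Python sketch for one GQA group and one query position. This is not the released MSA implementation: the index scoring rule used here (mean query of the group dotted with the mean key of each block) is an assumption made for illustration, and the names `select_blocks` and `sparse_group_attention` are hypothetical. Only the overall shape follows the summary: one Top-$k$ block selection per group, then exact softmax over the selected keys for each query head.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_blocks(group_queries, keys, block_size, top_k):
    """Hypothetical index branch: score every key block once per GQA group."""
    dim = len(keys[0])
    # Assumption: the group's query heads are summarized by their mean.
    q_mean = [sum(q[d] for q in group_queries) / len(group_queries)
              for d in range(dim)]
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        # Assumption: a block is summarized by its mean key.
        k_mean = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(dot(q_mean, k_mean))
    ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])  # block ids in positional order

def sparse_group_attention(group_queries, keys, values, block_size, top_k):
    """Main branch: exact softmax attention restricted to the selected blocks."""
    blocks = select_blocks(group_queries, keys, block_size, top_k)
    idx = [i for b in blocks
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in group_queries:  # all heads in the group share the same blocks
        weights = softmax([scale * dot(q, keys[i]) for i in idx])
        outputs.append([sum(w * values[i][d] for w, i in zip(weights, idx))
                        for d in range(len(values[0]))])
    return blocks, outputs
```

When `top_k` covers every block, the sketch reduces to dense attention over the shared key-value head, which is a convenient sanity check for any selection rule plugged into `select_blocks`.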

Role In The Wiki

MSA is the current long-context sparse-attention anchor for this KB. Its value is the combined algorithm-plus-kernel claim: content-dependent block selection can keep much of dense GQA quality while making 1M-context prefill and decode materially cheaper.
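As a rough illustration of where the savings come from, the helper below computes the fraction of keys the Main Branch reads per query relative to dense attention. The function name and the example values for context length, block size, and $k$ are hypothetical and are not taken from the paper; the cost of the Index Branch itself is deliberately left out because its design is not described here.

```python
def main_branch_fraction(context_len, block_size, top_k):
    """Share of keys read per query by the main branch, versus dense attention."""
    num_blocks = -(-context_len // block_size)  # ceiling division
    selected_keys = min(top_k, num_blocks) * block_size
    # The last block may be partial, so never count more keys than exist.
    return min(selected_keys, context_len) / context_len

# Hypothetical setting: ~1M tokens, 128-token blocks, 64 blocks selected.
fraction = main_branch_fraction(1_048_576, 128, 64)
```

For short contexts where `top_k` already covers every block, the fraction is 1.0 and the sparse path offers no saving over dense attention.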

For time-series and world-model work, MSA is an architecture analogy, not direct TSFM evidence. It suggests a way to route compute to relevant history spans, but the unselected spans are invisible to the layer. Any transfer to telemetry, graph time series, or action-conditioned streams MUST include preservation probes for rare regimes, event timing, exogenous variables, and action history.
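One simple form such a preservation probe could take is a coverage check: given the block ids a layer selected and the positions of known rare events, measure how many events remain visible. This is a hypothetical sketch; `event_coverage` and its arguments are illustrative names, and a real probe would also need to be run per layer, per GQA group, and per query position.

```python
def event_coverage(selected_blocks, event_positions, block_size):
    """Fraction of labeled event positions that fall inside selected blocks."""
    if not event_positions:
        return 1.0  # nothing to preserve, so nothing was lost
    selected = set(selected_blocks)
    hits = sum(1 for pos in event_positions if pos // block_size in selected)
    return hits / len(event_positions)

# Blocks 0 and 3 selected, 4 tokens per block, events at steps 1, 5, and 13.
coverage = event_coverage([0, 3], [1, 5, 13], block_size=4)
```

A coverage value well below 1.0 on rare regimes or action-history tokens would indicate that the selection is dropping exactly the spans the transfer needs to keep.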

Official Artifacts

Paper: MiniMax Sparse Attention
Code: MiniMax-AI/MSA
Released model using MSA: MiniMaxAI/MiniMax-M3

Evidence
