MiniMax-M3
Summary
MiniMax-M3 is an open-weight, native multimodal mixture-of-experts (MoE) model from MiniMax, built on MiniMax Sparse Attention (MSA). The linked Hugging Face model card describes about 428B total parameters, about 23B activated parameters, image, video, and text input, 1M-context support, and coding and agentic usage.
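To make the total versus activated parameter counts concrete, the short sketch below computes the share of parameters used per token. It uses only the approximate figures quoted above; the function name is illustrative and not taken from any MiniMax code.

```python
# Approximate figures as quoted from the model card summary above.
TOTAL_PARAMS_B = 428   # about 428B total parameters
ACTIVE_PARAMS_B = 23   # about 23B activated parameters per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters activated per token in a sparse MoE model."""
    return active_b / total_b


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"activated share: {frac:.1%}")  # activated share: 5.4%
```

Roughly one parameter in nineteen is active for any given token, which is why per-token compute tracks the 23B figure while weight storage still tracks the 428B figure.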
Role In The Wiki
MiniMax-M3 is release-level evidence that MSA is deployed in a production/open-weight multimodal model stack rather than existing only in a standalone attention paper. It belongs alongside encoder-free/native multimodal releases and sources on long-context serving efficiency.
For the foundation time-series agenda, MiniMax-M3 is not evidence for numeric time-series modeling. Its transferable lesson is that long-context, multimodal, agentic models increasingly make context length a serving-systems problem as much as an architecture problem.
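A back-of-envelope sketch shows why context length becomes a serving problem. The layer count, KV-head count, head dimension, and precision below are hypothetical placeholders, not MiniMax-M3's configuration, and the formula is the generic dense key/value cache estimate; it says nothing about how MSA or MiniMax-M3 actually store the cache.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Generic dense KV-cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
HYPOTHETICAL = dict(layers=60, kv_heads=8, head_dim=128, bytes_per_value=2)

for tokens in (8_000, 128_000, 1_000_000):
    size_gb = kv_cache_bytes(tokens, **HYPOTHETICAL) / 1e9
    print(f"{tokens:>9,} tokens -> {size_gb:8.2f} GB per sequence")
```

Under these assumed dimensions a single 1M-token sequence needs about 246 GB of cache, so memory capacity and bandwidth, batching, and cache placement end up mattering as much as the attention design itself.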
Official Artifacts
- Hugging Face model: MiniMaxAI/MiniMax-M3
- GitHub model repo: MiniMax-AI/MiniMax-M3
- Attention/kernel code: MiniMax-AI/MSA
- Paper: MiniMax Sparse Attention
License Note
The model weights are released under the MiniMax Community License, which attaches conditions to commercial use. Treat MiniMax-M3 as open-weight, not as a standard Apache-2.0 or MIT model release.