Mamba-3, a new open-source state space model, was released in March 2026 with claims of surpassing the transformer architecture on key metrics — nearly 4% better language-modeling performance with dramatically reduced latency.
The model introduces three technical advances: Exponential-Trapezoidal Discretization for a more expressive recurrence, Complex-Valued SSMs with the 'RoPE Trick' for better state tracking, and Multi-Input Multi-Output (MIMO) decoding to boost arithmetic intensity without slowing inference.
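To make the discretization idea concrete, here is a rough sketch of the *classic* trapezoidal (bilinear/Tustin) rule applied to a scalar linear system dh/dt = a·h + b·x. This is illustrative only: the parameter values are hypothetical, and Mamba-3's actual "exponential-trapezoidal" rule differs in detail — the point is just that a trapezoidal scheme averages the input over both endpoints of each step, rather than holding it constant as simpler discretizations do.

```python
import numpy as np

def trapezoidal_ssm_scan(x, a=-0.5, b=1.0, dt=0.1):
    """Toy scalar SSM scan using the classic trapezoidal (bilinear)
    discretization of dh/dt = a*h + b*x(t).

    Hypothetical parameters; NOT the exact Mamba-3 recurrence, which
    the paper calls 'exponential-trapezoidal'.
    """
    # Discretize once (a and b are time-invariant in this toy version).
    denom = 1.0 - 0.5 * dt * a
    a_bar = (1.0 + 0.5 * dt * a) / denom   # discrete state-transition coefficient
    b_bar = 0.5 * dt * b / denom           # weight on each input endpoint

    h, x_prev, ys = 0.0, 0.0, []
    for x_t in x:
        # Trapezoidal rule: average the input at both ends of the step.
        h = a_bar * h + b_bar * (x_prev + x_t)
        ys.append(h)
        x_prev = x_t
    return np.array(ys)

# Constant input drives the state toward its steady value -b/a = 2.0.
y = trapezoidal_ssm_scan(np.ones(8))
```

For a stable system (a < 0) and constant input, the state rises monotonically toward the continuous-time fixed point, which is the sanity check that the two-endpoint update is behaving like an integrator rather than a simple decay.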
Mamba-3 achieves perplexity comparable to Mamba-2 while using only half the state size, and completes long-sequence tasks up to 7x faster than transformer models on the same NVIDIA H100 GPUs.
Unlike transformers, whose attention compares every token with every other token (quadratic cost in sequence length), state space models maintain a compact, fixed-size internal state, so compute and memory grow only linearly with the sequence — making them fundamentally cheaper and faster for long inputs. Mamba-3's contribution is achieving this without the accuracy tradeoffs that plagued earlier SSM approaches.
The research comes from a team including Albert Gu and Tri Dao (creators of the original Mamba) along with Princeton and CMU collaborators. The project is released under the Apache 2.0 license, with full model code on GitHub.
While Mamba-3 isn't replacing transformers overnight, it establishes a clear trajectory: for summarization, long-document processing, and latency-sensitive applications, state space models are becoming the more efficient choice.