On March 16, 2026, Mistral released Small 4, a 119B-parameter Mixture-of-Experts model under the Apache 2.0 license that unifies capabilities previously split across four separate Mistral models.
Small 4 is the first Mistral model to combine instruction following (Mistral Small), reasoning (Magistral), multimodal understanding (Pixtral), and agentic coding (Devstral) into a single architecture.
The MoE design uses 128 experts, of which only 4 are activated per token, so each forward pass touches just 6.5 billion active parameters despite the 119B total. The result is 40% lower latency and three times the throughput of Mistral Small 3.
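The efficiency argument rests on top-k routing: a gating network scores all experts, but only the highest-scoring few run for each token. A minimal sketch of that selection, using the 128/4 figures quoted above (the gate scores and layer sizes here are illustrative stand-ins, not Small 4's actual router):

```python
import math

# Toy top-k MoE router: gate scores for all experts are ranked, and only
# the top_k highest-scoring experts execute, so per-token compute scales
# with top_k rather than with the total expert count.
def route(gate_scores, top_k):
    """Return the indices of the top_k experts for one token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:top_k]

num_experts, top_k = 128, 4                  # figures quoted for Small 4
scores = [math.sin(i * 0.37) for i in range(num_experts)]  # stand-in gate logits
active = route(scores, top_k)
assert len(active) == top_k                  # only 4 of 128 experts fire

# Fraction of expert parameters touched per token: 4/128 = 3.125%
print(f"{top_k / num_experts:.4%} of expert parameters active per token")
```

This is why the active-parameter count can sit far below the total: the dense, shared layers always run, but only 4/128 of the expert weights are exercised for any given token.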
On benchmarks, Small 4 matches or surpasses peers such as GPT-OSS 120B and Qwen models across reasoning, coding, and multimodal tasks, a remarkable result given its efficiency advantage.
The model is available through the Mistral API, AI Studio, Hugging Face, and Nvidia's NIM containers, with support for vLLM and llama.cpp inference frameworks.
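For the self-hosted path, vLLM exposes an OpenAI-compatible server out of the box. A sketch of launching it, assuming a hypothetical Hugging Face model ID (check Mistral's Hugging Face organization for the actual repository name, and size tensor parallelism to your GPU count):

```shell
# Hypothetical model ID -- the real repository name may differ.
vllm serve mistralai/Mistral-Small-4 \
  --tensor-parallel-size 4 \
  --max-model-len 32768
```

Once running, any OpenAI-client-compatible tooling can point at the local endpoint, which is what makes the on-premises deployments discussed below practical.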
The Apache 2.0 license lets developers use, modify, and commercially deploy the model with minimal restrictions (attribution and notice requirements still apply), positioning Small 4 as a compelling alternative to closed-source frontier models for organizations that need on-premises or self-hosted deployments.