The voice and music AI space saw three major launches in a single week.
Suno 5.5, released March 26, introduces voice cloning for Pro and Premier subscribers. To verify that the voice is yours, you record a live sample reading a randomly chosen phrase, and Suno then generates songs using your voice. It also adds Custom Models: upload your original tracks and Suno tunes v5.5 to your musical style (up to 3 models). A new 'My Taste' feature learns your genre and mood preferences over time.
Mistral released Voxtral TTS on March 26. It is an open-source, 4-billion-parameter text-to-speech model that runs on consumer hardware, including laptops and some mobile devices. It achieves a 90 ms time-to-first-audio and a 6x real-time factor, meaning it generates audio about six times faster than playback speed. The standout feature is voice cloning from as little as 3 seconds of audio, capturing accent, inflections, and natural fillers like 'ums' and 'ahs.' It supports 9 languages and competes directly with ElevenLabs and Deepgram.
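To make the speed figures concrete, here is a rough back-of-the-envelope sketch. It assumes "6x real-time factor" means audio is produced six times faster than it plays back, and it treats the 90 ms time-to-first-audio as the delay before a streamed response starts playing. The function name and the simple linear model are illustrative assumptions, not part of any Mistral API.

```python
# Figures quoted in the article; the model below is a simplifying assumption.
TIME_TO_FIRST_AUDIO_S = 0.090   # 90 ms before the first audio chunk arrives
SPEEDUP_VS_REAL_TIME = 6.0      # assumed reading of "6x real-time factor"


def synthesis_seconds(audio_seconds: float,
                      speedup: float = SPEEDUP_VS_REAL_TIME) -> float:
    """Estimate compute time needed to generate a clip of the given length."""
    if audio_seconds < 0 or speedup <= 0:
        raise ValueError("audio_seconds must be >= 0 and speedup must be > 0")
    return audio_seconds / speedup


if __name__ == "__main__":
    clip = 30.0  # seconds of speech to generate
    print(f"Full {clip:.0f}s clip ready after ~{synthesis_seconds(clip):.1f}s")
    print(f"Streaming playback can begin after ~{TIME_TO_FIRST_AUDIO_S * 1000:.0f} ms")
```

Under these assumptions, a 30-second clip takes about 5 seconds to generate in full, while a streaming client would start hearing audio after roughly 90 ms. Real performance depends on the hardware and the text.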
Smallest.ai launched Lightning V3 on March 27. It scores 3.89 MOS (Mean Opinion Score), outperforming OpenAI, Cartesia, and ElevenLabs on quality benchmarks. It supports 15 languages, offers voice cloning in under 10 seconds, and is optimized for conversational voice agents with natural-sounding pauses and disfluencies. Pricing starts at roughly $0.25 per 10K characters.
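As a rough illustration of what that pricing implies, the sketch below estimates cost from character count. It assumes the quoted starting rate of $0.25 per 10,000 characters applies linearly with no minimums or tiers, which the article does not confirm. The function name is hypothetical and unrelated to any Smallest.ai SDK.

```python
USD_PER_10K_CHARS = 0.25  # starting rate quoted in the article


def estimated_cost_usd(num_chars: int,
                       usd_per_10k_chars: float = USD_PER_10K_CHARS) -> float:
    """Estimate synthesis cost, assuming simple linear per-character pricing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 10_000 * usd_per_10k_chars


if __name__ == "__main__":
    for chars in (10_000, 1_000_000):
        print(f"{chars:>9,} characters -> ${estimated_cost_usd(chars):.2f}")
```

At that rate, one million characters of synthesized speech would cost about $25, assuming the linear model holds.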
The common thread: voice AI is becoming commoditized. Open-source models can now match or beat proprietary ones, voice cloning requires only seconds of audio, and prices keep dropping.