Deepgram Unveils Flux, First Conversational Speech Model

Flux Solves the Biggest Problem in Voice AI Agents: Interruptions. Deepgram Now One Step Closer to Passing the Audio Turing Test

Deepgram, the world’s most realistic and real-time Voice AI platform, announced the launch of Flux at VapiCon 2025. Flux is the world’s first conversational speech recognition (CSR) model designed specifically for real-time voice agents. Unlike traditional automatic speech recognition (ASR), which was built for transcription use cases such as captions or meeting notes, Flux is trained to understand the nuances of dialogue. It doesn’t just capture what was said; it knows when a speaker has finished, when to respond, and how to keep the flow of conversation natural and engaging.

The global voice AI agents market is projected to reach nearly $47.5 billion by 2034, growing at a compound annual rate of about 34.8%. This growth is driven primarily by the enterprise shift toward automated customer self-service, smarter agent-assist tools, and embedded conversational experiences across industries. But traditional speech-to-text (STT) systems weren’t designed to participate in live dialogue. To recreate conversational flow, developers have been forced to piece together transcription, voice activity detection, and turn-taking logic, a patchwork that leads to latency, errors, and frustrating user experiences.

Flux eliminates these problems by embedding turn-taking directly into recognition. It transforms speech recognition from simply transcribing words to modeling the flow of dialogue itself. This provides developers with the tools to build responsive, human-like voice agents without the complexity of workaround code or endless threshold tuning.

What Flux Delivers:

Embedded turn-taking intelligence – Conversation-aware recognition that handles timing inside the model itself, with context-aware turn detection and native barge-in handling for fluid exchanges.
Lightning-fast performance – Ultra-low latency where it matters most, with ~260 ms end-of-turn detection, plus distinct events that support eager response generation before a turn is complete.
Simpler development – Turn-complete transcripts and structured conversational cues replace fragile client-side logic, so teams can ship production-ready agents in weeks, not months.
Enterprise-ready scalability – Nova-3-level accuracy, GPU-efficient concurrency with 100+ streams per GPU, and predictable costs that avoid the hidden overhead of bolted-on systems.


“At Vapi, our mission has always been to give engineering teams a platform to build their conversational front door,” said Jordan Dearsley, Founder and CEO, Vapi. “Deepgram’s launch of Flux is a perfect example of that vision coming to life. By embedding turn-taking directly into recognition, Flux solves one of the hardest challenges in conversational AI. We’re thrilled Deepgram chose VapiCon to introduce this breakthrough, and we can’t wait to see the incredible voice agents developers create with it.”

“Flux redefines what speech recognition can do for real-time AI,” said Scott Stephenson, CEO and Co-Founder, Deepgram. “For decades, ASR was built to listen and record. Flux is different: it listens, understands, and guides conversations with human-like timing. It’s the foundation voice agents have been waiting for and is our latest milestone toward solving the Audio Turing Test.”

“At Lindy, our mission is to build the world’s most capable AI employees, and voice is a big part of this,” said Flo Crivello, Founder and CEO, Lindy. “Deepgram has been our partner of choice since the earliest days, and Flux brings things to the next level: there is simply nothing coming close on the market in terms of latency or conversation awareness. It’s enabled us to deliver the smoothest, most natural, interruption-free conversations for our customers.”

Source: Business Wire