Why Voice Matters for Sales Training
Text-based roleplay teaches you what to say. Voice-based roleplay teaches you how to say it. In sales, delivery matters as much as content: the pause before a key number, the confident tone when handling an objection, the ability to think on your feet when a prospect goes off-script. These are skills you can only build by speaking out loud, in real time, under pressure.
Until now, practicing these skills required a human partner. Today, we’re changing that with full-duplex voice chat: a real-time, bidirectional voice experience that feels like talking to an actual person, not dictating to a machine.
What Full-Duplex Means
Most voice AI systems work in half-duplex mode: you speak, the system processes, then it responds. There’s a clear turn-taking rhythm that feels nothing like a real conversation. Real conversations overlap. People interrupt. They react with verbal cues while the other person is still talking.
Full-duplex voice chat eliminates the walkie-talkie dynamic. The AI listens and speaks simultaneously, just like a human would. It can interject with a clarifying question mid-sentence. It produces natural filler responses (“mm-hmm,” “interesting,” “right”) while you’re making your pitch. And it supports barge-in, meaning the AI can interrupt you the way a skeptical buyer would, forcing you to adapt and recover in the moment.
Contextual Fillers and Natural Flow
One of the subtle details that makes our voice mode feel real is contextual filler generation. When you’re explaining a complex feature, the AI doesn’t sit in dead silence. Instead, it produces brief acknowledgments that match the context: a surprised “oh, really?” when you share an unexpected data point, or a thoughtful “I see” when you’re building a logical argument. These aren’t random. They’re generated by our Character Agent based on the persona’s communication style and the conversation’s emotional trajectory.
How It Works
Under the hood, voice chat uses a unified real-time pipeline. Audio streams bidirectionally through a WebSocket connection, with speech-to-text and text-to-speech running in parallel rather than sequentially. Our agentic engine processes the transcribed input through the same six-agent architecture used for text conversations (Knowledge Retrieval, Character, Guardrail, and the rest), ensuring that every spoken response is grounded in your organization’s actual knowledge base.
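To make "in parallel rather than sequentially" concrete, here is a minimal Python sketch of the general idea, not our production code. Each stage runs as its own concurrent task and hands work to the next through a queue, so a later chunk can be transcribed while an earlier one is already being spoken. The functions `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for speech-to-text, the agentic engine, and text-to-speech, and strings stand in for audio chunks.

```python
import asyncio
from typing import Callable

# Hypothetical stand-ins: real stages would call STT, the agent engine, and TTS.
def transcribe(chunk: str) -> str:
    return f"text({chunk})"

def generate_reply(text: str) -> str:
    return f"reply({text})"

def synthesize(reply: str) -> str:
    return f"audio({reply})"

async def stage(transform: Callable[[str], str],
                inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Apply transform to each item as it arrives; None marks end of stream."""
    while (item := await inbox.get()) is not None:
        await outbox.put(transform(item))
    await outbox.put(None)  # pass the end-of-stream marker downstream

async def run_pipeline(chunks: list[str]) -> list[str]:
    mic, texts, replies, speaker = (asyncio.Queue() for _ in range(4))
    # All three stages are alive at once, each waiting on its own queue.
    tasks = [
        asyncio.create_task(stage(transcribe, mic, texts)),
        asyncio.create_task(stage(generate_reply, texts, replies)),
        asyncio.create_task(stage(synthesize, replies, speaker)),
    ]
    for chunk in chunks:
        await mic.put(chunk)
    await mic.put(None)
    spoken = []
    while (frame := await speaker.get()) is not None:
        spoken.append(frame)
    await asyncio.gather(*tasks)
    return spoken
```

Calling `asyncio.run(run_pipeline(["hello", "there"]))` returns one synthesized frame per input chunk, in order. In a real deployment the first and last queues would be fed by and drained to the WebSocket connection.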
Latency targets are aggressive: under 800 ms for standard responses, with contextual fillers delivered in under 300 ms to maintain conversational flow. The result feels remarkably natural, even over a standard internet connection.
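As an illustration of how targets like these could be checked against measured sessions, the sketch below compares each turn's latency to its budget. The two budget values come from the targets above; the `Turn` type, the `kind` labels, and the function names are assumptions made up for this example.

```python
from dataclasses import dataclass

RESPONSE_BUDGET_MS = 800  # target for standard responses
FILLER_BUDGET_MS = 300    # target for contextual fillers

@dataclass
class Turn:
    kind: str          # "response" or "filler" (hypothetical labels)
    latency_ms: float  # measured time from end of user speech to first audio

def within_budget(turn: Turn) -> bool:
    """True when the turn came in strictly under its latency target."""
    budget = FILLER_BUDGET_MS if turn.kind == "filler" else RESPONSE_BUDGET_MS
    return turn.latency_ms < budget

def budget_misses(turns: list[Turn]) -> list[Turn]:
    """Return the turns that missed their target, in original order."""
    return [t for t in turns if not within_budget(t)]
```

A filler measured at 250 ms passes, while a standard response measured at 900 ms would appear in the `budget_misses` list.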
Getting Started
Full-duplex voice chat is available today for all PersonaTrain customers on Growth and Enterprise plans. No additional setup is required: if you’ve already uploaded your knowledge base and configured scenarios, voice mode works immediately with the same content.
Voice chat is just one mode. Learn about all three interaction modes, including our photorealistic video avatars.
To try it, open any active scenario and click the microphone icon in the conversation interface. The system will prompt you to grant microphone access, and you’re off. We recommend starting with a familiar scenario in text mode first, then switching to voice to experience the difference. Your training managers can review transcripts and voice session analytics from the same dashboard they already use.
Ready to See PersonaTrain in Action?
Book a personalized demo and see how PersonaTrain transforms your team's training with AI that knows your business.