Why voice is the strictest channel in CX
The emotional weight has a direct technical consequence. A two-second delay in a chat response is invisible. On a phone call, it is the moment a caller starts wondering if the line dropped. A generic waiting sound in a chat widget is mildly annoying. On a phone call, on loop, it signals immediately that no one really thought about this experience. A wrong answer in chat can be corrected in the next message. On a phone call, it is the last thing the caller hears before they ask for a human.
Every requirement that is forgiving on other channels is strict on voice. Response speed. Turn-taking. What the caller hears while the AI is thinking. How your brand sounds. Whether the AI can actually do something, or only answer questions. Any single failure is enough to make the conversation feel broken and send that high-value caller straight to a human, or worse, to a competitor.
Most voice AI products treat this as a feature list. We treated it as a rebuild.
What we rebuilt in HALO Voice
A new speech pipeline. We rebuilt speech-to-text and text-to-speech from the ground up. Transcription is faster and more accurate. Speech output is smoother and more consistent. Time-to-first-word was a deliberate priority, because the gap between a caller finishing their sentence and the AI starting its response is felt on voice in a way it simply is not anywhere else.
Intelligent turn detection. HALO Voice identifies when a caller has finished speaking and responds at natural intervals. No talking over each other. No silences that make the caller wonder if the system is still working. The conversation moves the way a real one does, because the AI knows when it is its turn.
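To make the idea of turn detection concrete, here is a minimal sketch of the simplest baseline: declaring the end of a turn after a fixed stretch of silence. It does not describe how HALO Voice detects turns, which this article does not specify. The function name, the 20 ms frame size, and the 600 ms silence threshold are all assumptions chosen for illustration.

```python
def end_of_turn_index(frames, frame_ms=20, silence_ms=600):
    """Return the index of the frame where the caller's turn is considered over.

    `frames` is a sequence of booleans, one per audio frame, where True means
    speech was detected in that frame. All names and thresholds here are
    illustrative assumptions, not HALO Voice settings.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # a short pause mid-sentence resets the count
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None  # caller has not finished (or has not started) speaking
```

A threshold that is too short makes the agent talk over the caller, and one that is too long produces the dead air described above, which is why a fixed value like this is only a starting point.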
Contextual filler audio and the Audioverse. When the AI is processing, the caller is waiting. We replaced generic processing sounds with contextual filler phrases that match what the agent is actually doing. During a knowledge search: "Let me see what I can find." During tool execution: "Hang on, almost got it." The Audioverse adds subtle ambient background sound that mimics a real customer service environment. It is the difference between a caller feeling like they reached someone and feeling like they reached something.
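As a rough illustration of contextual filler, the sketch below picks a phrase based on what the agent is currently doing. The two phrases come from the examples above. The activity names, the fallback phrase, and the `pick_filler` function are hypothetical and do not reflect HALO's configuration format.

```python
import random

# Hypothetical mapping from agent activity to filler phrases.
FILLER_PHRASES = {
    "knowledge_search": ["Let me see what I can find."],
    "tool_execution": ["Hang on, almost got it."],
}
DEFAULT_FILLER = "One moment, please."  # assumed fallback, not from HALO


def pick_filler(activity, last_phrase=None):
    """Choose a filler phrase for the current activity.

    Avoids repeating `last_phrase` when another option exists, so the
    caller does not hear the same line on loop.
    """
    candidates = FILLER_PHRASES.get(activity, [DEFAULT_FILLER])
    fresh = [p for p in candidates if p != last_phrase]
    return random.choice(fresh or candidates)
```

Adding more phrases per activity gives the repeat-avoidance step something to work with.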
ElevenLabs and the Lexicon. We moved to ElevenLabs as the primary voice provider for HALO, deprecating Azure TTS. The difference in naturalness and expressiveness is significant. Voice personas are configurable per language and per environment, previewable before deployment. The Lexicon gives you exact control over pronunciation: prevent a brand name from being translated, set an alias for a URL or abbreviation, spell out an acronym letter by letter. Small details that add up to an experience that sounds designed, not generated.
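The alias and spell-out rules can be pictured as a text rewrite applied before speech synthesis. The sketch below is an assumption-laden illustration, not the Lexicon's real format: the rule structure, the `apply_lexicon` function, and the example terms are all made up for this example.

```python
import re

# Hypothetical rules; the real Lexicon format is not shown in this article.
LEXICON = [
    {"term": "CX", "mode": "spell"},  # read letter by letter
    {"term": "example.com/help", "mode": "alias", "say": "our help page"},
]


def apply_lexicon(text, rules):
    """Rewrite `text` so each term is spoken the way its rule specifies."""
    for rule in rules:
        # Match the term only when it is not part of a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(rule["term"]) + r"(?!\w)")
        if rule["mode"] == "spell":
            spoken = " ".join(rule["term"])  # "CX" becomes "C X"
        else:
            spoken = rule["say"]
        # A function replacement avoids backslash escapes being interpreted.
        text = pattern.sub(lambda _m, s=spoken: s, text)
    return text
```

For example, `apply_lexicon("Visit example.com/help for CX tips.", LEXICON)` returns `"Visit our help page for C X tips."`.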
Agentic tool integration. HALO Voice connects natively to the same agents, tools, and knowledge base that power the rest of HALO. The voice agent executes real workflows: booking, looking up account information, routing, escalating. Not just retrieving pre-written answers. DTMF (keypad tone) input support extends this to callers who need to enter precise numeric information.
WhatsApp Calling. Customers want to call on their own terms. For a growing share of them, that means WhatsApp, not the dial pad. HALO Voice handles both. A WhatsApp call reaches the same voice agent, with the same speech pipeline, the same turn detection, the same tools, the same context. No separate setup, no degraded experience, no second-class channel. Whether the caller dials your number or taps the call button in a WhatsApp thread they already had open, the conversation works the same way.
The part that changes everything
All of this makes HALO Voice a better voice AI. But there is one requirement no amount of speech quality or filler audio can meet on its own: the caller should not have to repeat themselves.
The caller who already tried other channels almost certainly has a history with your business. A WhatsApp message sent last week. An order placed yesterday. A complaint logged this morning. If the voice AI does not know any of that, every other improvement is undermined the moment it asks them to start over.
This is where HALO is built differently. HALO Voice is not a standalone voice product. It is the same agent that handles your WhatsApp messages, your chat, and your other messaging channels, now picking up the phone. Same knowledge base. Same tools. Same customer data layer. Context set in any channel is automatically available in every other one, in both directions. The voice agent knows who it is talking to before the first word is spoken, because it has already been talking to them.
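One way to picture a shared customer data layer is a single history keyed by customer rather than by channel. The sketch below is a toy in-memory model under that assumption. The class names, fields, and example data are hypothetical and say nothing about how HALO stores context.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    channel: str  # e.g. "whatsapp", "chat", "voice"
    summary: str


class CustomerContext:
    """Toy store: every channel writes to and reads from the same history."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, customer_id, channel, summary):
        self._events[customer_id].append(Event(channel, summary))

    def history(self, customer_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._events.get(customer_id, []))


# A message sent on WhatsApp is visible when the same customer calls.
store = CustomerContext()
store.record("cust-1", "whatsapp", "Asked about a delayed order")
store.record("cust-1", "voice", "Called to follow up")
previous = [e.summary for e in store.history("cust-1") if e.channel != "voice"]
```

Because the key is the customer and not the channel, the voice agent reads the same record the messaging agent wrote, and the reverse holds too.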
That is what separates a voice automation project from a genuine CX advantage. Not a voice AI that sounds natural in isolation. An agent that already knows the customer, on whichever channel they reach for next.
The channel worth getting right
Voice has been treated as a legacy channel for too long. Too expensive to scale, too risky to automate, too complex to modernize. The result is that most businesses have invested heavily in every other channel and left their most emotionally charged touchpoint running on infrastructure from a decade ago.
The customers who call are already your most engaged ones. They needed help badly enough to pick up the phone. How that call goes shapes how they feel about your business far more than any chat interaction will. The technology to handle those calls well exists now. The question is whether the platform behind it is built to match.
That is what HALO is.