If you are a developer, “best voice-agent tool” is the wrong question. The right one is “how much of this do I want to build myself?” That single decision splits the whole market, and every tool below sits somewhere on the line it draws.
Here is the line. At one end is the managed call platform: you hand it a prompt and a phone number, it wires the speech recogniser, the language model, the voice and the carrier together, and you get a working agent in an afternoon. At the other end are the raw building blocks: an open-source framework or a bare speech-to-speech model, where you assemble every piece yourself, pay each provider directly, and own the result completely. Neither end is better. They are answers to different questions about how much control you want and how much time you have.
So this is not a list of “the most popular voice tools”. It is a list of the five worth shortlisting if you are the one writing the code, ranked by how well each serves a developer building their own agent, and arranged so you can see where each sits on that build-versus-buy line. Let me tell you what I weighed, then walk through each one.
What actually matters when you are the one building
Five things, in rough order of how often they decide it for a developer.
- How much is assembled for you. A managed platform hands you a working loop: speech in, model, voice out, phone line, all connected. A framework hands you the wiring and lets you choose every part. The first gets you live faster. The second gives you control the first cannot. Knowing which you want is most of this decision.
- The pricing model, not just the price. These tools bill in four different shapes. Per-minute on a managed platform. Audio tokens on a raw model. Agent-minutes on a hosted framework. Free open-source plus whatever your providers charge. The shape decides how predictable your bill is, and it matters more than a cent or two on the headline.
- Control over the components. Can you swap the voice, use a cheaper model for simple questions, bring your own speech recogniser? On a building block, yes, all of it. On a managed platform, sometimes, within limits. The more you want to tune, the further toward the open end you belong.
- Whether the phone line is included. Some tools ship telephony in the box. Others expect you to bolt on Twilio yourself and pay it separately. That is a small job for a developer, but it is a job, and it changes the all-in cost, so know which camp your pick is in.
- What you own at the end. Self-hosting an open-source framework means the stack is yours, with no platform that can change its pricing or shut down underneath you. The trade is that running production voice infrastructure is real engineering work you now carry.
No single tool tops all five, so the ranking below is really about which trade-off suits you. A managed platform wins on speed-to-working-agent and gives up a little flexibility and the cheapest floor. A raw building block wins on control and price and asks for your engineering time in return. Read each entry for where it sits on that line, not just for the number next to its name.
How I ranked these
The order below is my editorial read of fit for a developer building a custom agent, best first. It is not the raw score from the tables, because “best for developers” is about the right trade between convenience and control, not an all-round average. I have weighted it toward the typical developer, the one who wants to ship something working soon and keep the option to tune it later, rather than the systems engineer who wants to own every byte from day one. If that second description is you, read the list bottom to top.
One disclosure up front. Some of these tools run affiliate programmes we may earn from, and Deepgram is the one with a confirmed programme here. The ranking is not for sale, no vendor saw this page before it went live, and if a tool ever pays to appear it will be labelled sponsored and kept out of the ranked positions, so a paid slot can never pass for an earned one. We have not placed our own timed test calls yet either, so there are no Voxrater latency numbers here, just sourced prices, features and an honest opinion. The order is the one I would give a developer friend who asked, with nothing else weighing on it.
1. Vapi: the fastest route to a working agent that you still control
If you want to be making real calls this week without giving up the ability to swap parts later, start here. Vapi sits at the managed end of the line, but it is the managed option built for developers rather than for no-code agencies, and that is why it tops the list for most people writing code. It runs the call, and you choose and pay for the pieces. Want to swap Deepgram for Azure on the speech recogniser, or run a cheaper model on the easy questions? You can, and every piece shows up on the bill.
The pricing model is honest about this. Vapi charges $0.05 a minute to host the call, and that is the only number Vapi actually sets. The speech-to-text, the model and the voice are billed straight through from whoever you plug in, at their own rates, with no Vapi markup when you bring your own keys. So the headline is the floor, not the bill, and the all-in lands anywhere from about $0.05 to $0.30 a minute depending on what you wire in. For a developer, that pass-through model is the appeal: you tune the cost by tuning your component choices, rather than paying for a bundle you did not assemble.
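To make the pass-through arithmetic concrete, here is a minimal sketch in Python. Only the $0.05 hosting fee comes from the pricing described above. The three provider rates are placeholder figures I have invented for illustration, so swap in the numbers from your own providers' pricing pages before trusting the result.

```python
# Sketch of a pass-through bill: one platform fee plus each provider's own rate.
VAPI_HOSTING_PER_MIN = 0.05  # the only rate Vapi sets, per the paragraph above


def all_in_per_minute(stt: float, llm: float, tts: float,
                      hosting: float = VAPI_HOSTING_PER_MIN) -> float:
    """Return the all-in cost per call minute, in dollars."""
    return round(hosting + stt + llm + tts, 4)


# Hypothetical provider rates in $/min, invented for illustration only.
lean_stack = all_in_per_minute(stt=0.01, llm=0.01, tts=0.02)
premium_stack = all_in_per_minute(stt=0.02, llm=0.08, tts=0.10)

print(f"lean: ${lean_stack:.2f}/min, premium: ${premium_stack:.2f}/min")
```

With these made-up rates the two stacks come out at $0.09 and $0.25 a minute, both inside the range quoted above. The platform fee is the same in both; the difference is entirely in what you chose to plug in.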
It carries the operational kit you would otherwise build yourself: bring your own phone-number supplier over SIP, hand a live call to a human with the AI’s summary attached, launch whole outbound campaigns, and let other AI tools trigger calls through its MCP connection (that is Model Context Protocol, the standard way to let one AI tool call another). Concurrency starts at 10 simultaneous lines, then runs about $10 per extra line a month. The compliance paperwork is there too: SOC 2 Type II, GDPR and PCI DSS v4.0.1 at the platform layer, with HIPAA a $2,000 a month add-on if you need it.
The honest catch is that Vapi expects you to assemble the pieces, which is why it scores lowest of this group on ease of use. The flexibility is the whole product. For a developer that is the point. For a non-technical buyer it is more wiring than they wanted, and a fully bundled platform would suit them better. But measured against the other tools on this page, Vapi is the one that gets a developer to a working, controllable agent with the least time spent on plumbing.
Pick Vapi if you want a working agent soon, you still want to choose and price each component, and you would rather tune a pass-through bill than assemble a framework from scratch. Read the full Vapi review for the pass-through pricing detail.
2. LiveKit: the strongest open-source base, with the phone line built in
LiveKit is where I would send a developer who wants to own the whole pipeline on open-source foundations. It sits firmly at the building-block end, but it is the most complete base of the lot, and that is what earns it second place. LiveKit is the real-time transport that carries the audio between caller and agent with low enough delay that the call feels like a call, plus an Agents framework that lets you wire your own speech-to-text, model and voice on top. OpenAI uses LiveKit to move the audio for ChatGPT’s Advanced Voice, which tells you the scale it runs at.
There are two ways to run it, and the pricing model follows. Self-host the open-source core (the server and the Agents framework, both Apache 2.0 licensed) and it costs you nothing beyond the servers you run them on plus the providers you bring. Or use LiveKit Cloud, the managed service, which meters the agent session that orchestrates each call at about $0.01 a minute past the included allowance, with small charges for WebRTC participant minutes and bandwidth on top. The speech, model and voice are billed by those providers, not by LiveKit, usually another $0.06 to $0.15 a minute, so a Cloud voice agent lands around $0.08 to $0.20 a minute all-in, most of which is the providers you chose rather than LiveKit’s cut.
The feature that lifts it above the other open-source pick is telephony. Plenty of building-block tools leave you to bolt on Twilio. LiveKit ships inbound and outbound phone support over SIP in the box, with its own US local inbound at about $0.01 a minute and cheaper third-party SIP options. The framework also handles the awkward real-time problems for you: working out when the caller has stopped talking, handling interruptions when they talk over the agent, streaming audio through the pipeline. The plans run from a free Build tier (no card, 1,000 agent minutes) up through Ship at $50 a month and Scale at $500. One compliance catch: the HIPAA Business Associate Agreement and region pinning begin on the Scale tier, so a healthcare build cannot use the free plan and stay compliant.
What it deliberately does not give you is the speech, the brain or the voice. You bring those, plugged in as provider plugins. That is the flexibility and the homework in one sentence. Nobody locks you to a house voice, and nobody hands you a working stack either.
Pick LiveKit if you want to own the whole voice pipeline on open-source foundations, you value telephony being built in rather than bolted on, and self-hosting at no platform cost appeals. The LiveKit review covers the Cloud tiers and the self-host split.
3. Pipecat: vendor-neutral framework, free to self-host
Pipecat is the other open-source pick, and the reason it sits just behind LiveKit rather than level with it comes down to what is built in. It is two things that share a name, and getting them straight is the whole story. The first is the Pipecat framework, an open-source Python project from Daily (the audio and video infrastructure company) that wires together the parts a voice agent needs: speech-to-text, a model, a voice, and the transport that carries the call. It has roughly 12,600 GitHub stars and a BSD-2-Clause licence, so it is genuinely free and you can use it commercially. The second is Pipecat Cloud, Daily’s managed hosting that runs your agent at scale.
The pricing model mirrors that split. Self-hosting the framework costs nothing in Pipecat fees; you run it on your own server and pay only your chosen providers. Pipecat Cloud bills per running agent: the base size, agent-1x, is $0.01 a minute while active and a tiny $0.0005 a minute when reserved (one-twentieth of the active rate, to keep instances warm so calls connect instantly), with bigger agents at $0.02 and $0.03. Telephony is billed on top: a Daily phone number over the normal network is $0.018 a minute, a SIP connection to your own supplier is $0.005. Put it together for a typical phone agent and a realistic all-in lands around $0.03 to $0.20 a minute depending heavily on the models you pick.
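Here is a rough sketch of how those Pipecat Cloud line items stack up over a month. The four rates are the ones quoted above. Everything in the example call is an assumption of mine: the 10,000 call minutes, the 43,200 reserved minutes (one instance kept warm for 30 days) and the $0.08 combined provider rate are all hypothetical. How reserved time is counted while an agent is active is not something I have confirmed, so the function simply takes reserved minutes as an input rather than deriving them.

```python
# Pipecat Cloud rates quoted above, in $/min.
AGENT_1X_ACTIVE = 0.01
AGENT_1X_RESERVED = 0.0005
DAILY_NUMBER = 0.018   # Daily phone number over the normal network
OWN_SIP = 0.005        # SIP connection to your own supplier


def monthly_bill(active_min: float, reserved_min: float,
                 telephony_rate: float, provider_rate: float) -> dict:
    """Break a month's bill into its parts, in dollars.

    provider_rate is the combined $/min for speech-to-text, model and voice.
    """
    if active_min <= 0:
        raise ValueError("active_min must be positive")
    parts = {
        "agent_active": active_min * AGENT_1X_ACTIVE,
        "agent_reserved": reserved_min * AGENT_1X_RESERVED,
        "telephony": active_min * telephony_rate,
        "providers": active_min * provider_rate,
    }
    parts["total"] = sum(parts.values())
    parts["per_call_minute"] = parts["total"] / active_min
    return {name: round(value, 4) for name, value in parts.items()}


# Hypothetical month: 10,000 call minutes, one warm instance for 30 days,
# a Daily phone number, and an invented $0.08/min for the providers.
bill = monthly_bill(active_min=10_000, reserved_min=43_200,
                    telephony_rate=DAILY_NUMBER, provider_rate=0.08)
for name, value in bill.items():
    print(f"{name:>16}: {value}")
```

On those inputs the providers are $800 of a roughly $1,100 bill, which is the pattern to expect at the building-block end: the Pipecat Cloud lines are the small part.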
The appeal for a developer is that Pipecat is vendor-neutral on purpose. It does not give you a voice, a model or a speech recogniser of its own. You choose every component: Deepgram or Whisper for transcription, OpenAI or Anthropic for the brain, ElevenLabs or Cartesia for the voice. Pipecat is the wiring that makes them talk to each other in real time. On compliance, Pipecat Cloud is in a stronger spot than most open-source tooling because Daily sits underneath it: Daily’s docs state it runs on HIPAA-compliant infrastructure and that Daily can sign a single Business Associate Agreement covering Pipecat Cloud, the WebRTC layer and transcription. We have left SOC 2 unticked, because Daily describes it as on the roadmap rather than as a published certificate.
So why behind LiveKit? Two honest reasons. Pipecat is Python-only, so if your team is not comfortable in Python it is the wrong starting point. And it leans on Daily or a third party for telephony and transport rather than shipping the same depth of built-in real-time stack. For a Python developer who wants maximum control with no platform markup, that is a fair trade and Pipecat is one of the best starting points going. For a developer who wants the transport and telephony handled in the same framework, LiveKit is the cleaner base.
Pick Pipecat if you have a Python developer, you want a free vendor-neutral framework with no platform markup, and a cheap managed cloud waiting for when you outgrow your own servers. The Pipecat review has the framework-versus-cloud breakdown.
4. OpenAI Realtime: the best raw voice, if you build everything around it
The Realtime API is the rawest building block on this list, and it earns its place on voice quality alone. It is not a product you log into and configure. It is the engine underneath one. OpenAI gives you a single speech-to-speech model, meaning one model that hears the caller, works out what to say, and speaks back, without bolting together a separate speech-to-text step, a model and a voice. That tight loop is why it sounds so good. The current flagship is gpt-realtime-2, and its voices, especially the two newer ones called marin and cedar, are about as natural as anything you can buy today. It speaks 32-plus languages with native prosody and does tool calling and MCP, so it can look things up and take actions mid-call.
The pricing model is the thing that trips developers up, so let me show the workings. Billing is by audio tokens, not by the minute. OpenAI’s pricing page lists gpt-realtime-2 at $32.00 per 1M audio input tokens ($0.40 per 1M if cached) and $64.00 per 1M audio output tokens, with a cheaper gpt-realtime-mini at $10.00 in and $20.00 out. The encoding rule is one token per 100 milliseconds of caller audio and one per 50 milliseconds of agent audio. A minute split roughly half caller, half agent is only about $0.05 in raw audio. But that is the floor, not the bill, because your system prompt and the whole conversation so far get re-sent as text tokens on every turn, and a chatty agent racks those up fast. An independent breakdown across 11 call profiles landed at roughly $0.18 to $0.46 a minute with no caching, dropping to about $0.05 to $0.10 once prompt caching is on. So if you build on this, build caching in from day one.
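The audio-token workings above can be sketched in a few lines of Python. This covers only the raw audio floor, using the per-million prices and the encoding rule quoted in the paragraph. It deliberately leaves out the text tokens for the system prompt and conversation history, and any caching discount, because I have not quoted text-token rates here. Treat it as the floor, not a bill estimate.

```python
# Encoding rule quoted above: how much audio one token covers.
MS_PER_INPUT_TOKEN = 100   # caller audio
MS_PER_OUTPUT_TOKEN = 50   # agent audio


def raw_audio_cost(caller_seconds: float, agent_seconds: float,
                   in_price_per_m: float = 32.00,
                   out_price_per_m: float = 64.00) -> float:
    """Dollar cost of the audio tokens alone (no text tokens, no caching).

    Default prices are the gpt-realtime-2 rates per 1M audio tokens.
    """
    input_tokens = caller_seconds * 1000 / MS_PER_INPUT_TOKEN
    output_tokens = agent_seconds * 1000 / MS_PER_OUTPUT_TOKEN
    cost = (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000
    return round(cost, 4)


# One minute of conversation, split half caller and half agent.
flagship = raw_audio_cost(30, 30)
mini = raw_audio_cost(30, 30, in_price_per_m=10.00, out_price_per_m=20.00)
print(f"gpt-realtime-2: ${flagship}/min, gpt-realtime-mini: ${mini}/min")
```

Thirty seconds of caller audio is 300 input tokens and thirty seconds of agent audio is 600 output tokens, which gives $0.048 on the flagship and $0.015 on the mini model. Note that the agent's side costs four times the caller's side for the same duration, so a talkative agent moves the floor more than a talkative caller does.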
The reason it sits at number four rather than higher for a developer is how much you assemble. There is no dashboard, no flow builder, no campaign manager, and no phone line. You bring your own telephony, almost always Twilio at about $0.014 a minute on top, and you write the code that connects a call to the model. For a developer that is a day or two of work, not a blocker, but it is more building than any tool above. On compliance it inherits the OpenAI API platform’s posture: SOC 2 Type 2, a HIPAA Business Associate Agreement available, GDPR with a Data Processing Addendum, plus ISO 27001 and 27701.
The honest framing: the voice is the best reason to choose this, and the token pricing plus the do-it-yourself assembly are the reasons plenty of teams pair it with a framework instead of using it bare. In fact both LiveKit and Pipecat can run the Realtime model inside their pipelines, which is often the smarter way to meet it: the OpenAI voice, with a framework handling the plumbing.
Pick OpenAI Realtime if voice quality is the thing you will not compromise on, you are happy to write the agent loop yourself, and you can model a token-based bill rather than read a per-minute rate off a page. The Realtime review shows the full token-to-minute maths.
5. Deepgram: the cheapest unified runtime, bring your own phone line
Deepgram is the budget pick for a developer, and it places last here not because it is weak but because it is the narrowest fit for the general “build a voice agent” job. It started as a speech-to-text company, and it shows in the pricing. The Voice Agent API takes the three pieces a phone agent needs, the speech-to-text, the model and the voice, and bills them as one number instead of three. On the Standard tier that number is about $0.075 a minute, among the lowest all-in rates of any serious platform. For a developer who wants a cheap, certified, unified runtime to build on, that is genuinely hard to beat.
The pricing model is the simplest on this page: one bundled per-minute rate for the runtime, with the option to go cheaper still. The cheaper tiers let you bring your own model or voice (ElevenLabs, Cartesia and AWS Polly all plug straight in), dropping Deepgram’s cut to about $0.041 a minute on the Growth plan, though you then pay the outside provider on top, so the saving is smaller than it first looks. Where Deepgram is unusually strong is the compliance paperwork: its trust page states SOC 2 Type 1 and Type 2, a HIPAA Business Associate Agreement on request, plus GDPR and PCI. That is more than several flashier competitors can show in writing, and the BAA is on request rather than gated behind a four-figure add-on.
The trade you make for the price is the phone line. Deepgram does not bring its own telephony, so you wire up Twilio yourself and pay it separately, around $0.014 a minute on top, and Twilio is then a supplier you manage. There is no built-in warm transfer, no batch calling, no SIP trunking of its own. The other limit is reach: seven languages and no documented voice cloning, so if you need to sound local in twenty markets, this is not the tool.
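To see why the bring-your-own saving is smaller than it first looks, here is a small sketch using the three rates quoted in this section. One caveat is mine, not Deepgram's: the $0.075 and $0.041 figures are quoted on different plans, and I am treating them as directly comparable for the sake of the arithmetic. The $0.025 outside-provider rate in the example is hypothetical.

```python
DEEPGRAM_BUNDLED = 0.075   # Standard tier: speech, model and voice in one rate
DEEPGRAM_BYO = 0.041       # Growth plan with your own model or voice
TWILIO = 0.014             # phone line, paid separately


def bundled_all_in() -> float:
    """Bundled runtime plus the phone line, in $/min."""
    return round(DEEPGRAM_BUNDLED + TWILIO, 4)


def byo_all_in(outside_rate: float) -> float:
    """Bring-your-own runtime, outside provider and phone line, in $/min."""
    return round(DEEPGRAM_BYO + outside_rate + TWILIO, 4)


def byo_break_even() -> float:
    """Outside-provider rate at which bring-your-own stops being cheaper."""
    return round(DEEPGRAM_BUNDLED - DEEPGRAM_BYO, 4)


print(f"bundled + Twilio: ${bundled_all_in():.3f}/min")
print(f"BYO at a hypothetical $0.025 provider: ${byo_all_in(0.025):.3f}/min")
print(f"BYO only wins below ${byo_break_even():.3f}/min for the outside provider")
```

On these figures the bundled route is $0.089 a minute with the phone line, and bring-your-own only comes out ahead if the outside provider charges less than $0.034 a minute.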
The honest reason it ranks last for the general developer job is scope. Deepgram is a runtime, not a framework. It gives you the cheap unified engine, but it does the least to help you assemble the agent around it: no transport stack like LiveKit, no orchestration framework like Pipecat, no managed call platform like Vapi. For a developer whose main constraint is cost per minute and who is happy to do the assembly, that price plus that compliance is a strong combination. For one who wants help building the agent, the tools above do more.
Pick Deepgram if you are building your own agent, you want the cheapest unified runtime with a clean BAA on request, and you are happy to bolt on Twilio and assemble the rest yourself. The Deepgram review covers the bring-your-own options.
The build-versus-buy reality check
Here is the part developers skip until it bites, and it deserves saying plainly. The further you move toward the building-block end, the more of the agent you own, and ownership cuts both ways. Self-hosting LiveKit or Pipecat costs nothing in platform fees, and the stack is yours with no vendor that can change its pricing underneath you. But running production voice infrastructure is real work: autoscaling, recovery, telephony plumbing, monitoring, all of it now your job. The cheapest headline on this page can become the most expensive build once you count the engineering hours.
The managed end has the mirror-image catch. Vapi gets you live fast and handles the plumbing, but you are building on a platform whose pricing and roadmap you do not control, and whose per-minute fee sits on top of every call forever. Neither trade is wrong. The mistake is picking the end without pricing the trade. A solo developer shipping a side project almost always wants the managed end, because the scarce resource is time, not money. A team building a product they will run at scale for years often wants the open end, because the control and the absent platform fee start to pay back once volume climbs and the engineering cost is amortised.
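Pricing that trade can be as simple as a break-even calculation. The sketch below is a deliberately crude model built on two assumptions of mine: provider and telephony costs are the same whichever end you pick, and self-hosting adds a fixed monthly cost for servers and amortised engineering time. The $0.05 platform fee is Vapi's hosting rate from earlier, and the $4,000 a month is a hypothetical figure, not a measured one.

```python
def break_even_minutes(platform_fee_per_min: float,
                       self_host_fixed_monthly: float) -> int:
    """Monthly call minutes above which self-hosting is cheaper.

    Assumes provider and telephony costs are identical on both sides,
    so only the platform fee and the fixed self-hosting cost differ.
    """
    if platform_fee_per_min <= 0:
        raise ValueError("platform fee must be positive")
    return round(self_host_fixed_monthly / platform_fee_per_min)


# $0.05/min platform fee versus a hypothetical $4,000/month to run it yourself.
threshold = break_even_minutes(0.05, 4_000)
print(f"Self-hosting pays back above about {threshold:,} minutes a month")
```

With those inputs the crossover sits at about 80,000 minutes a month. A side project will never get near that, which is the solo-developer case above in one number. A product at scale can pass it quickly, and the fixed cost is the input most worth estimating honestly.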
One more thing that is easy to miss. These tools are not all mutually exclusive. The Realtime model runs inside LiveKit and Pipecat. Deepgram’s speech-to-text plugs into both frameworks and into Vapi. So the real question is often not “which one” but “which managed platform or framework, with which components inside it”, and the answer to the second half is its own decision.
Before you commit, build this
Whichever way you lean, do not commit to a tool off a polished demo and a pricing page, because the demo is the vendor’s happy path and the pricing page is the floor. Spend a day and a small budget building the smallest real thing instead.
- Stand up a single working call. One agent, one prompt, one phone number you control. Time how long it takes you to get from signup to a call that connects and answers a question. That number, more than any feature list, tells you where the tool really sits on the build-versus-buy line for your skills.
- Read the bill after, not the quote before. Run a handful of real calls and check the invoice against the headline. On a token-priced tool like OpenAI Realtime, watch what your prompt length does to the cost. On a pass-through tool like Vapi, watch what your component choices add. On a framework, add up the provider bills you now owe directly.
- Test the part you will lean on hardest. If you need warm transfer, build it and confirm it reaches a human with context. If you need to swap a component, swap it and see how much code it touches. If you self-host, deploy it and see what breaks at the edges.
- Decide who owns the stack in a year. If it is you, the open end is fine and cheap. If it is a small team that will not have time to babysit infrastructure, a managed platform earns its fee.
That day of building will tell you more than any roundup, this one included. We will publish our own timed call results against the same scenarios when the test rig ships, and if they contradict what a vendor told you, the measured numbers win.
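For the "read the bill after" check, a few lines of Python are enough to turn a test run into a comparable number. This is only a sketch: the function names are my own, and the 42 minutes and $6.30 in the example are hypothetical. Add up every invoice the run produced, platform and providers and telephony alike, before you divide.

```python
def effective_rate(invoice_total: float, connected_minutes: float) -> float:
    """What a connected call minute actually cost, in dollars."""
    if connected_minutes <= 0:
        raise ValueError("need at least some connected minutes")
    return invoice_total / connected_minutes


def headline_multiple(invoice_total: float, connected_minutes: float,
                      headline_rate: float) -> float:
    """How many times the advertised per-minute rate you really paid."""
    return effective_rate(invoice_total, connected_minutes) / headline_rate


# Hypothetical test run: 42 minutes of calls, $6.30 across every invoice,
# measured against a $0.05/min headline.
rate = effective_rate(6.30, 42)
multiple = headline_multiple(6.30, 42, headline_rate=0.05)
print(f"effective: ${rate:.3f}/min, {multiple:.1f}x the headline")
```

In this made-up run the effective rate is $0.15 a minute, three times the headline. The multiple is the figure to carry into a cost model, because it already includes whatever your prompt length and component choices did to the bill.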
Who I left off, and why
You will notice some big names missing. Synthflow, Retell and Bland are not here, and that is deliberate. They are excellent platforms, but they sit at the no-code or turnkey end and hand a non-technical buyer a finished agent, which is a different shortlist from this one. A developer who wants to build will feel the ceiling of a no-code builder fast, so those platforms belong on the outbound-sales roundup and category pages, not here. ElevenLabs, Cartesia and Murf are narration and text-to-speech engines, brilliant at turning a script into a voice, and several of them are providers you plug into the tools above rather than alternatives to them.
I have also kept our own site off this list, and I always will. A directory that ranks itself into its own “best” roundups has told you everything you need to know about how much to trust it. The only names here are tools you would actually build on.
Bottom line
There is no single winner, because “best for developers” genuinely depends on how much you want to build. But for the typical developer, the one who wants a working agent soon and the freedom to tune it later, I will commit: start with Vapi. It gets you to a controllable agent faster than anything else here, and the pass-through pricing means you are not locked out of swapping components when you need to.
After that, let the shape of your project break the tie.
- A developer who wants a working agent fast, with components they can still swap: Vapi.
- A team that wants to own the whole pipeline on open-source, with telephony built in: LiveKit.
- A Python team that wants a free vendor-neutral framework and no platform markup: Pipecat.
- A build where voice quality is the thing you will not compromise on: OpenAI Realtime, inside a framework rather than bare.
- A cost-driven build that wants the cheapest unified runtime and a clean BAA: Deepgram, with Twilio bolted on.
None of these is a wrong answer. They are answers to different versions of the same question, which is the one you should answer first: how much of this do you actually want to build?
Start with the Vapi review if you want to ship fast, read the LiveKit and Pipecat profiles if you would rather own the stack, and put your real call volume through the cost calculator before you commit, because at the building-block end the provider bills, not the platform headline, are the part of this decision that shows up on the invoice.