
Building a Human-Like AI Outbound Calling Agent at ~$0.01/Minute - Part 1

So why did this startup come to me in the first place?

Because they were quietly burning an enormous amount of money every single day.

They were running outbound sales at scale, employing close to 200 call center agents whose entire job was to dial leads, repeat the same conversations, and move prospects through a fairly predictable script. It worked—but it was expensive. Daily lead processing costs were piling up, and every attempt to scale meant hiring more people, managing more shifts, and absorbing more overhead. As a startup, that trajectory just wasn’t sustainable.

Naturally, they looked at existing AI voice solutions. Tools like Retell were impressive, polished, and powerful—but when they ran the numbers, the cost structure didn’t make sense for their volume. At the scale they needed, those platforms were simply too expensive. So instead of buying a solution, they asked a more dangerous question: What if we built our own?

That’s when they came to me with the real challenge.

They didn’t just want an AI caller. They wanted an AI-powered outbound calling agent that could handle 50+ concurrent calls, sound indistinguishable from a human, respond fast enough to stay inside the 300–500ms window beyond which pauses feel unnatural, and run at around $0.01 per minute all-in: telephony, speech recognition, LLM reasoning, and text-to-speech included. This wasn’t a research project or a demo; this was meant to replace a meaningful chunk of a real call center operation.

Once you frame it that way, the constraints make sense—and they’re brutal.

Concurrency was non-negotiable because one AI agent that can only handle a single call is useless at this scale. Voice quality had to be human-level because outbound sales is unforgiving—people hang up the moment something feels off. Latency had to stay under half a second because conversational timing is everything. And cost had to stay absurdly low because the entire point was to beat human labor and existing AI platforms on price.
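To see how tight the cost constraint really is, it helps to sketch the per-minute budget. The rates below are hypothetical placeholders, not quoted provider prices; the point is that four metered services have to fit inside a single cent per minute, so every component gets a fraction of that penny.

```python
# Back-of-envelope per-minute cost budget for the pipeline.
# All rates are ASSUMED placeholders; substitute real provider
# pricing to check whether the $0.01/min target actually holds.
budget = {
    "telephony": 0.0040,  # outbound PSTN leg, per minute (assumed)
    "stt": 0.0025,        # streaming transcription, per minute (assumed)
    "llm": 0.0020,        # short prompts + reused context, per minute (assumed)
    "tts": 0.0015,        # synthesized speech, per minute (assumed)
}

total = sum(budget.values())
print(f"all-in: ${total:.4f}/min")

for name, rate in budget.items():
    print(f"{name:>10}: ${rate:.4f}/min ({rate / total:.0%} of budget)")
```

With these placeholder numbers the budget lands exactly on $0.01/min, and it makes the trade-offs concrete: telephony dominates, which is why the choice of carrier ends up mattering more than any single AI provider.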

My first instinct, like many developers, was Twilio. I’ve used it extensively, and it’s usually the default choice. But once I started modeling 50+ simultaneous calls, streaming audio both ways, and layering on real-time speech and LLM processing, the cost curve got ugly fast. Twilio is great—but at this scale, familiarity becomes expensive.

That’s when Telnyx entered the picture.

Telnyx offered significantly cheaper call rates, native bi-directional real-time media streaming, and much better low-level control over audio—exactly what you need when you’re building a tight, real-time voice pipeline. After running the numbers and reviewing the APIs, the decision stopped being emotional and became purely architectural. For this use case, Telnyx was simply the better foundation.

From there, the system design almost assembled itself. Telnyx handled outbound calls and streamed audio in real time. ElevenLabs transcribed the recipient’s voice with low enough latency to keep the conversation feeling natural. OpenAI acted as the brain, turning those transcripts into context-aware responses while maintaining conversational state. ElevenLabs then converted those responses back into ultra-realistic speech, which streamed straight back into the live call. Nothing fancy on its own—but carefully orchestrated.
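The per-call loop described above can be sketched as a chain of awaitable stages. The three provider functions here are stubs standing in for the real Telnyx, ElevenLabs, and OpenAI streaming SDKs, whose actual APIs differ; the structure of one conversational turn is what matters.

```python
import asyncio

# Sketch of one conversational turn: audio in -> transcript -> reply -> audio out.
# transcribe/think/speak are STUBS for the real provider SDKs (assumed shapes).

async def transcribe(audio_chunk: bytes) -> str:
    """Stub for streaming STT: audio frames in, transcript text out."""
    await asyncio.sleep(0)  # stands in for network I/O
    return "caller said something"

async def think(transcript: str, history: list[str]) -> str:
    """Stub for the LLM: transcript plus conversation state in, reply out."""
    history.append(transcript)  # maintain conversational state per call
    await asyncio.sleep(0)
    return f"reply to: {transcript}"

async def speak(text: str) -> bytes:
    """Stub for TTS: reply text in, audio bytes to stream back into the call."""
    await asyncio.sleep(0)
    return text.encode()

async def handle_turn(audio_chunk: bytes, history: list[str]) -> bytes:
    """One full turn of the pipeline; each stage awaits, never blocks."""
    transcript = await transcribe(audio_chunk)
    reply = await think(transcript, history)
    return await speak(reply)
```

Because every stage is an `await` rather than a blocking call, a single event loop can interleave this turn logic across many live calls at once.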

The real unlock wasn’t any single provider. It was making everything asynchronous. Each call ran independently, with no blocking operations and no shared bottlenecks, which allowed dozens of conversations to happen at once without latency spikes or runaway costs. That’s what made fifty concurrent calls feasible, and what made the penny-per-minute target realistic.
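A minimal sketch of that fan-out, using a stub call handler that just sleeps to simulate I/O-bound work: fifty independent coroutines overlap on one event loop instead of queueing behind each other.

```python
import asyncio
import time

# Each call is its own coroutine with its own state; nothing is shared,
# so 50 calls overlap instead of running back to back. handle_call is a
# stub whose sleep stands in for streaming audio, STT, LLM, and TTS I/O.

async def handle_call(call_id: int) -> int:
    await asyncio.sleep(0.1)  # simulated I/O-bound call work
    return call_id

async def run_batch(n: int = 50) -> list[int]:
    # One task per call; no locks, no blocking work on the event loop.
    return await asyncio.gather(*(handle_call(i) for i in range(n)))

start = time.monotonic()
results = asyncio.run(run_batch())
elapsed = time.monotonic() - start
print(f"{len(results)} calls in {elapsed:.2f}s")  # roughly 0.1s, not 50 x 0.1s
```

The same shape holds with real providers: as long as every stage awaits network I/O rather than blocking, adding a fifty-first call costs almost nothing in latency.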

That’s where this part of the story ends. The startup had a path forward, a viable architecture, and a way out of their call-center cost spiral. In the next chapter, I’ll talk about what actually broke during development, how we pushed latency down into human territory, and how this entire system went from idea to working prototype in under a week.

on February 4, 2026
The interesting part isn’t the stack — it’s how the cost constraint reshaped the architecture. Once you anchor on $0.01/min, everything else (async, Telnyx, no blocking calls) becomes inevitable. Curious which assumption broke first when you actually shipped?
