We focus on the AI voice agent niche. To validate the market and our ideas, we work as freelancers.
We have delivered 10+ voice agents using different tools (Bland, VAPI, Retell) for different use cases: AI receptionist, lead qualification, call center, and so on. We learned a lot about AI voice agents and gained real experience.
TLDR of our observations:
- Only about 20% of the AI voice agents we build actually get used by our customers. We only got two use cases working: the first is operator training and the second is AI receptionist. The other 80% just go nowhere. It is sad; the technology doesn't feel ready for even moderately complicated use cases.
- The devil is in the user requirements. Writing the prompt is easy, but handling the different requirements can take huge effort. For the AI receptionist case, the most important feature is warm transfer to different stakeholders, and if a stakeholder doesn't answer, the agent should take control of the call again. We spent a month and a half building that and making it work.
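To make that warm-transfer requirement concrete, here is a minimal sketch of the fallback logic: try each stakeholder in turn, and if nobody picks up, the agent resumes the call itself. The `dial` function is a hypothetical stand-in for a real telephony API, not any provider's actual SDK.

```python
# Sketch of the warm-transfer fallback described above. `dial` is a
# stand-in for a real telephony call; only the control flow matters here.

RING_TIMEOUT_S = 20  # how long to let a stakeholder's phone ring

def dial(stakeholder: str, timeout_s: int) -> bool:
    """Stand-in: returns True if the stakeholder picked up in time."""
    return stakeholder == "front_desk"  # stubbed for illustration

def warm_transfer(stakeholders: list[str]) -> str:
    for person in stakeholders:
        if dial(person, RING_TIMEOUT_S):
            return f"transferred_to:{person}"
    # Nobody answered: the agent takes control of the call again.
    return "agent_resumed"

print(warm_transfer(["sales", "front_desk"]))  # transferred_to:front_desk
print(warm_transfer(["sales", "manager"]))     # agent_resumed
```

The hard part in practice is not this loop but everything around it (hold music, announcing the caller, timing out mid-ring), which is why it took so long.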
- Testing is extremely hard. Our approach is manual testing: there are so many corner cases that we have to call the AI phone agent ourselves every time we change a prompt. These tools do offer automated testing, but it can't cover many of the corner cases.
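Even a partial scripted suite can catch regressions between manual call sessions. A minimal sketch, assuming a stubbed `agent_reply` that you would replace with a real call into your platform's simulation endpoint (the function names here are hypothetical):

```python
# Minimal regression harness for prompt changes: each corner case is a
# (caller utterance, expected action prefix) pair. The agent is a stub.

def agent_reply(utterance: str) -> str:
    """Stub agent: replace with a real call to your voice platform."""
    if "transfer" in utterance.lower():
        return "TRANSFER:front_desk"
    return "ANSWER"

SCENARIOS = [
    ("Can you transfer me to billing?", "TRANSFER"),
    ("What are your opening hours?", "ANSWER"),
]

def run_suite() -> list[str]:
    """Returns the utterances whose expected action no longer matches."""
    failures = []
    for utterance, expected in SCENARIOS:
        if not agent_reply(utterance).startswith(expected):
            failures.append(utterance)
    return failures

print(run_suite())  # [] means every scripted corner case still passes
```

This doesn't replace manual calls for audio-level issues (barge-in, latency, mishearing), but it shrinks the set of things you have to re-verify by ear.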
We will keep hustling.
Do you think there’s a case for creating a sort of “voice UX canon”? Like, repeatable patterns or blueprints for common conversational bottlenecks? It seems like the community could benefit from a shared library of well-tested prompt + logic flows (especially for transfers, fallback loops, etc.), the way we do for UI components. Especially as I assume we will get to a world where a V0 for voice interactions exists.
Scribbles that down in a notebook for later
Good point. An AI voice agent has two major components: conversation flow and automation.
Conversation flow has two parts: domain-specific (like ask question 1, then question 2) and generic (like capturing a phone number). I think the generic part can go into a shared library. For example, how to say a phone number back: instead of saying "one hundred", say "one zero zero".
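That generic "say a phone number" behavior is small enough to sketch directly: read the digits individually, never as a composite number.

```python
# Shared-library component: render a phone number digit by digit
# ("one zero zero"), never as "one hundred".

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def speak_phone_number(number: str) -> str:
    # Ignore separators like dashes and spaces; speak only the digits.
    return " ".join(DIGIT_WORDS[ch] for ch in number if ch.isdigit())

print(speak_phone_number("100"))       # one zero zero
print(speak_phone_number("555-0137"))  # five five five zero one three seven
```

A production version would also handle pacing hints for the TTS engine, but the point is that this logic is identical across every domain and belongs in a shared library.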
As for the automation part, it can definitely be reused. Make, Zapier, and n8n already provide a lot of templates for this.
But couldn't you break down the domain-specific parts into repeatable patterns too? For instance, as a business owner I always want my responses to have a specific structure. Let's pull an example: if it's a B2B sales call, I might want my initial response to the user's query to always be structured as a "Hero's Journey" that tells the user the story of how our product can make them the hero within their own company. That can be broken up into a few component parts. Now, I can go and write that prompt myself (setting up the "logic" within the agent), but that takes time and effort.
At a high level, there are two major reusable layers. The first is conversational patterns, which covers both:
Generic prompts – phone number capture, error recovery, confirmations.
Structured narrative patterns – your example of the Hero’s Journey for B2B sales is spot-on. That structure could be modularized.
A "Hero's Journey" module might include:
Call to Adventure: “What if you could cut onboarding time in half?”
Mentor Appears: “We’ve worked with teams just like yours to overcome that exact pain.”
The Transformation: “By integrating X, companies see Y outcome.”
The Return: “And now, you’re the one who brings the solution back to your team.”
Once defined, that narrative arc becomes a reusable conversational component—just like a UI card component that adapts to content.
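As a sketch of what "modularized" could mean in practice: the arc is a fixed sequence of named beats, and only the domain content changes. Everything below (the class name, the beat names, the `product`/`outcome` parameters) is illustrative, not an existing library.

```python
# Hero's Journey as a reusable conversational component: a fixed
# four-beat arc filled with domain-specific content, like a UI card
# component adapting to its props.

from dataclasses import dataclass

@dataclass
class NarrativeBeat:
    name: str
    line: str

def heros_journey(product: str, outcome: str) -> list[NarrativeBeat]:
    return [
        NarrativeBeat("call_to_adventure",
                      "What if you could cut onboarding time in half?"),
        NarrativeBeat("mentor_appears",
                      "We've worked with teams just like yours to solve that."),
        NarrativeBeat("transformation",
                      f"By integrating {product}, companies see {outcome}."),
        NarrativeBeat("the_return",
                      "And now, you're the one who brings the solution back."),
    ]

for beat in heros_journey("X", "Y outcome"):
    print(f"{beat.name}: {beat.line}")
```

The agent's prompt would then be generated from these beats, so swapping the narrative structure (say, to a problem/agitate/solve arc) means swapping one component, not rewriting the prompt.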
The second layer is automations: integrations that send SMS, log CRM notes, route escalations. This part is already halfway solved by Zapier/Make/n8n, but it lacks tight coupling to voice-first logic flows. Imagine being able to attach a “Send Hero Follow-Up Email” automation to your Hero’s Journey block with one click.
Where This Goes:
Eventually, we’ll reach a “V0” layer for voice just like we have for web apps—pre-built blocks for onboarding, scheduling, lead capture, etc. Teams won’t need to start from scratch every time. Instead, they’ll assemble flows from pre-tested narrative patterns and logic blocks that are proven to work for their domain.
Imagine a library of these "Conversational Components":
Capture.PhoneNumber.Standard
Sales.B2B.HerosJourney.Intro
Support.Troubleshooting.FallbackLoop
HR.CandidateScreening.ThreeStepIntro
Plug them in. Customize lightly. Deploy.
This is less about templates and more about a design system for voice—where story structure, tone, fallback logic, and actions are all abstracted into composable elements. The time savings would be massive. The UX consistency? Even better.
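One way to picture that design system is a registry keyed by the dotted component names above, from which a call flow is assembled out of pre-tested blocks. This is purely a sketch; the names and the string-valued "prompt" payloads are placeholders for whatever a real component (prompt text, fallback logic, attached automations) would contain.

```python
# Illustrative registry of conversational components, keyed by the
# dotted names from the list above. Values are simplified to strings.

REGISTRY: dict[str, str] = {}

def register(name: str, prompt: str) -> None:
    REGISTRY[name] = prompt

register("Capture.PhoneNumber.Standard",
         "Read digits back one at a time and confirm.")
register("Sales.B2B.HerosJourney.Intro",
         "Open with the four-beat Hero's Journey arc.")

def assemble(flow: list[str]) -> list[str]:
    """Plug them in, customize lightly, deploy."""
    return [REGISTRY[name] for name in flow]

print(assemble(["Sales.B2B.HerosJourney.Intro",
                "Capture.PhoneNumber.Standard"]))
```

The namespacing matters: `Capture.*` components are domain-agnostic, while `Sales.B2B.*` components carry domain assumptions, which mirrors the generic/domain-specific split discussed earlier.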
I like your composable elements idea. I am thinking that inbound calls can follow a predefined flow, since those callers tend to be more cooperative.
One callout is that conversations with real humans don't always follow a predefined script (especially on outbound sales calls). Let's say users interrupt and ask their own questions mid-flow.
Am I right that we would need to build different components to handle different corner cases? I am debating: if we have many components, will the customers who use the library get confused?
Would love to hear more about the different requirements you encountered.
Makes sense