We focus on the AI voice agent niche. To validate the market and our ideas, we have been working as freelancers.
We have delivered 10+ voice agents using different tools (Bland, VAPI, Retell) for different use cases, such as AI receptionist, lead qualification, and call center. We learned a lot about AI voice agents along the way.
TLDR of our observations:
We will keep hustling.
Do you think there’s a case for creating a sort of “voice UX canon”? Like, repeatable patterns or blueprints for common conversational bottlenecks? Seems like the community could benefit from a shared library of well-tested prompt + logic flows (especially for transfers, fallback loops, etc.), the way we have for UI components. Especially as I assume we will get to a world where a V0 for voice interactions exists.
Scribbles that down in a notebook for later
Good point. An AI voice agent has two major components:
Conversation flow: this has two parts, domain-specific (like ask question 1, then question 2) and generic (like capturing a phone number). I think the generic part can be put into a shared library. For example, how to say a phone number: instead of saying "one hundred", say "one zero zero".
Automation: this part can definitely be reused. I think Make, Zapier, and n8n already provide a lot of templates for this.
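To make the phone number example concrete, here is a minimal sketch of what such a generic helper might look like. The function name `speak_digits` and the `group_size` parameter are made up for illustration; they are not from any of the tools mentioned above.

```python
# Hypothetical helper: spell a number digit by digit before handing it to TTS.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def speak_digits(raw: str, group_size: int = 3) -> str:
    """Return the digits of `raw` as words, with commas as short pauses between groups."""
    # Keep only ASCII digits; dashes, spaces, and parentheses are dropped.
    words = [DIGIT_WORDS[ch] for ch in raw if ch in DIGIT_WORDS]
    groups = [
        " ".join(words[i:i + group_size])
        for i in range(0, len(words), group_size)
    ]
    return ", ".join(groups)

print(speak_digits("100"))       # one zero zero
print(speak_digits("555-0100"))  # five five five, zero one zero, zero
```

The grouping is only a guess at what sounds natural; a real agent would probably tune the pauses per country format.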
But couldn't you even break down the domain-specific parts into repeatable patterns? For instance, as a business owner I always want my response to have a specific structure. Let's pull an example: if it's a B2B sales call, I might want my first response to the user's query to always be structured as a "Hero's Journey" that tells the user the story of how our product can make them the hero within their own company. That can be broken up into a few component parts. Now, I can go and write that prompt myself (setting up the "logic" within the agent), but that takes time and effort.
At a high level, there are two major reusable layers:
The first is the conversational layer, which covers both:
Generic prompts – like phone number capture, error recovery, confirmations.
Structured narrative patterns – for instance, your example of the Hero’s Journey for B2B sales is spot-on. That structure could be modularized.
A "Hero's Journey" module might include:
Call to Adventure: “What if you could cut onboarding time in half?”
Mentor Appears: “We’ve worked with teams just like yours to overcome that exact pain.”
The Transformation: “By integrating X, companies see Y outcome.”
The Return: “And now, you’re the one who brings the solution back to your team.”
Once defined, that narrative arc becomes a reusable conversational component—just like a UI card component that adapts to content.
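As a rough sketch of that idea, the four stages could live in a small data structure with slots that get filled per customer. Everything here (the `HerosJourney` class, the `render` method, the slot names `pain`, `product`, `outcome`) is hypothetical and only meant to show the shape of such a component.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HerosJourney:
    """Hypothetical narrative component: four stages, each a template string."""
    call_to_adventure: str
    mentor_appears: str
    transformation: str
    the_return: str

    def render(self, **slots: str) -> str:
        # Fill the {placeholders} in each stage and join them in story order.
        stages = [
            self.call_to_adventure,
            self.mentor_appears,
            self.transformation,
            self.the_return,
        ]
        return " ".join(stage.format(**slots) for stage in stages)

intro = HerosJourney(
    call_to_adventure="What if you could cut {pain} in half?",
    mentor_appears="We've worked with teams just like yours to overcome that exact pain.",
    transformation="By integrating {product}, companies see {outcome}.",
    the_return="And now, you're the one who brings the solution back to your team.",
)

script = intro.render(pain="onboarding time", product="X", outcome="Y outcome")
print(script)
```

The same object can be rendered with different slot values for each client, which is the "adapts to content" part of the UI card analogy.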
The second is the automation layer. These are integrations: send SMS, log CRM notes, route escalations. This part is already halfway solved by Zapier/Make/n8n, but it lacks tight coupling to voice-first logic flows. Imagine being able to attach a “Send Hero Follow-Up Email” automation to your Hero’s Journey block with one click.
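One possible way to picture that coupling is a block that carries its own list of actions to run when it finishes. This is only a sketch under assumptions: `Block`, `attach`, and `send_hero_follow_up_email` are invented names, and the action just appends to a list where a real one would call a webhook in an automation tool.

```python
from dataclasses import dataclass, field
from typing import Callable

# An action receives the call context (a plain dict here) and returns nothing.
Action = Callable[[dict], None]

@dataclass
class Block:
    """Hypothetical conversational block with automations attached to it."""
    name: str
    on_complete: list[Action] = field(default_factory=list)

    def attach(self, action: Action) -> "Block":
        self.on_complete.append(action)
        return self  # allow chaining

    def complete(self, call_context: dict) -> None:
        # Run every attached automation once the block has finished.
        for action in self.on_complete:
            action(call_context)

outbox: list[str] = []

def send_hero_follow_up_email(ctx: dict) -> None:
    # Stand-in for a real webhook call; it only records what would be sent.
    outbox.append(f"follow-up email queued for {ctx['email']}")

block = Block("Sales.B2B.HerosJourney.Intro").attach(send_hero_follow_up_email)
block.complete({"email": "lead@example.com"})
print(outbox)
```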
Where This Goes:
Eventually, we’ll reach a “V0” layer for voice just like we have for web apps—pre-built blocks for onboarding, scheduling, lead capture, etc. Teams won’t need to start from scratch every time. Instead, they’ll assemble flows from pre-tested narrative patterns and logic blocks that are proven to work for their domain.
Imagine a library of these "Conversational Components":
Capture.PhoneNumber.Standard
Sales.B2B.HerosJourney.Intro
Support.Troubleshooting.FallbackLoop
HR.CandidateScreening.ThreeStepIntro
Plug them in. Customize lightly. Deploy.
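A minimal sketch of how such a library might be wired up, assuming a simple name-to-function registry. The `component` decorator, `REGISTRY`, `build_prompt`, and the prompt wording are all illustrative assumptions, not an existing package.

```python
from typing import Callable

# Maps a dotted component name to a function that returns prompt text.
REGISTRY: dict[str, Callable[..., str]] = {}

def component(name: str):
    """Register a prompt-producing function under a dotted name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = fn
        return fn
    return decorator

@component("Capture.PhoneNumber.Standard")
def capture_phone(**_: str) -> str:
    return "Ask for the caller's phone number, then read it back one digit at a time to confirm."

@component("Support.Troubleshooting.FallbackLoop")
def fallback_loop(max_retries: str = "2", **_: str) -> str:
    return (
        "If the caller's answer is unclear, rephrase and ask again, "
        f"up to {max_retries} times, then offer a transfer."
    )

def build_prompt(names: list[str], **overrides: str) -> str:
    # "Plug them in" = look up each block; "customize lightly" = pass overrides.
    return "\n".join(REGISTRY[name](**overrides) for name in names)

prompt = build_prompt(
    ["Capture.PhoneNumber.Standard", "Support.Troubleshooting.FallbackLoop"],
    max_retries="3",
)
print(prompt)
```

An unknown name raises `KeyError` here, which seems like reasonable behavior for a typo in a component name.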
This is less about templates and more about a design system for voice—where story structure, tone, fallback logic, and actions are all abstracted into composable elements. The time savings would be massive. The UX consistency? Even better.
I like your composable elements idea. I am thinking that inbound calls can follow a predefined flow, since callers are more cooperative.
One callout is that a conversation with a real human doesn't always follow a predefined script (especially an outbound sales call). Let's say users ask questions
Do I understand correctly that we need to build different components to handle different corner cases? I am wondering whether, if we have many components, the customers who use the library will get confused.
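One way to think about the off-script case without adding a component per corner case: keep the scripted steps as they are and wrap them in a single interruption handler. The sketch below is only an illustration. The script lines, the FAQ answers, and the keyword matching are all made up, and a real agent would use the LLM or an intent classifier instead of substring checks.

```python
# Hypothetical scripted flow; step i is the question currently being asked.
SCRIPT = [
    "Ask for the caller's name.",
    "Ask which product they use.",
    "Offer a demo slot.",
]

# Hypothetical side questions the caller may raise at any step.
FAQ = {
    "price": "Pricing depends on call volume; I can send details after this call.",
    "human": "I can transfer you to a teammate at any time.",
}

def next_turn(step: int, caller_said: str) -> tuple[int, str]:
    """Return (new step index, what the agent should do next)."""
    lowered = caller_said.lower()
    for keyword, answer in FAQ.items():
        if keyword in lowered:
            # Answer the side question, then repeat the current step.
            return step, f"{answer} {SCRIPT[step]}"
    # Otherwise treat the utterance as an answer and move on (stay on the last step).
    nxt = min(step + 1, len(SCRIPT) - 1)
    return nxt, SCRIPT[nxt]

print(next_turn(0, "What's the price?"))
print(next_turn(0, "I'm Sam"))
```

With this shape, the library user sees one flow plus one handler instead of many small components.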
Would love to hear more about the different requirements you encountered.
Like:
Makes sense