AI Voice Agent Building Experience as a contractor

We focus on the AI voice agent niche. To validate the market and our ideas, we have been working as freelancers.

We have delivered 10+ voice agents using different tools (Bland, VAPI, Retell) for different use cases, such as AI receptionist, lead qualification, and call center. We learned a lot about AI voice agents along the way.

TLDR of our observations:

  1. Only about 20% of the AI voice agents we built actually get used by our customers. Only two use cases worked: operator training and AI receptionist. The other 80% went nowhere, which is sad. Our feeling is that the technology is not there yet for even slightly complicated use cases.
  2. The devil is in the user requirements. Writing a prompt is easy, but handling the different requirements can take a huge effort. For the AI receptionist case, the most important thing is doing a warm transfer to different stakeholders. If a stakeholder doesn't answer, the agent should take control of the call again. We spent a month and a half building this and making it work.
  3. Testing is extremely hard. Our approach is manual testing. Because there are many corner cases, we have to manually call the AI phone agent every time we change a prompt. We know these tools offer automated testing, but it can't cover a lot of the corner cases.
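The warm-transfer behaviour in point 2 can be sketched as plain decision logic. This is only a minimal sketch under assumptions, not what we or any of the tools actually run: `dial` is a hypothetical callback that returns "accepted", "rejected", or "no_answer" for one number, and the phone numbers and agent lines are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class TransferResult:
    connected_to: Optional[str]  # number that accepted, or None if nobody did
    attempts: List[str]          # numbers tried, in order


def warm_transfer(targets: Sequence[str], dial: Callable[[str], str]) -> TransferResult:
    """Try each stakeholder in order and stop at the first one who accepts.

    `dial` is a hypothetical callback (an assumption of this sketch) that
    returns "accepted", "rejected", or "no_answer" for a single number.
    """
    attempts: List[str] = []
    for number in targets:
        attempts.append(number)
        if dial(number) == "accepted":
            return TransferResult(connected_to=number, attempts=attempts)
    # Nobody picked up or everyone rejected: the agent takes the call back.
    return TransferResult(connected_to=None, attempts=attempts)


def next_agent_line(result: TransferResult) -> str:
    """What the agent says to the caller once the transfer attempt is over."""
    if result.connected_to is not None:
        return "Connecting you now."
    return "Nobody is available right now. Can I take a message?"
```

Because `dial` is injected, the no-answer and rejected cases from point 3 can be exercised with fake outcomes instead of a real phone call. It does not cover STT errors or whether the LLM follows the prompt.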

We will keep hustling.

on April 24, 2025
  1. 1

    Do you think there’s a case for creating a sort of “voice UX canon”? Like, repeatable patterns or blueprints for common conversational bottlenecks? Seems like the community could benefit from a shared library of well-tested prompt + logic flows (especially for transfers, fallback loops, etc.), the way we do for UI components. Especially as I assume we will get to a world where a V0 for voice interactions exists.

    Scribbles that down in a notebook for later

    1. 1

      Good point. An AI voice agent has two major components:

      1. Conversation flow with prompts.
      2. Automation, like sending SMS or email, updating the CRM, or transferring calls.

      The conversation flow has two parts: domain-specific (like ask question 1, then question 2) and generic (like capturing a phone number). I think the generic part can be put into a shared library. For example, how to say a phone number: instead of saying "one hundred", say "one zero zero".

      The automation part can definitely be reused. I think Make, Zapier, and n8n already provide a lot of templates for this.
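The phone number example is the kind of generic piece that fits in a shared library. A minimal sketch, assuming the text sent to the TTS engine is plain words; the function name `speak_digits` and the grouping of three digits per pause are illustrative choices, not something any of the tools prescribes.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def speak_digits(number: str, group_size: int = 3) -> str:
    """Spell a number digit by digit, with a comma pause between groups.

    Non-digit characters such as dashes, spaces, and brackets are ignored.
    """
    words = [DIGIT_WORDS[c] for c in number if c in DIGIT_WORDS]
    groups = [
        " ".join(words[i:i + group_size])
        for i in range(0, len(words), group_size)
    ]
    return ", ".join(groups)


print(speak_digits("100"))       # one zero zero
print(speak_digits("555-0100"))  # five five five, zero one zero, zero
```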

      1. 1

        But couldn't you even break down the domain-specific parts into repeatable patterns? For instance, as a business owner I always want my responses to have a specific structure. Let's pull an example: if it's a B2B sales call, I might want my first response to the user's query to always be structured as a "Hero's Journey" that tells the user the story of how our product can make them the hero within their own company. That can be broken up into a few component parts. Now, I can go and write that prompt myself (setting up the "logic" within the agent), but that takes time and effort.

        At a high level, there are two major reusable layers:

        1. Conversational Building Blocks (Prompt Design)
          These are both:
          Generic prompts – like phone number capture, error recovery, confirmations.
          Structured narrative patterns – for instance, your example of the Hero’s Journey for B2B sales is spot-on. That structure could be modularized.

        A "Hero's Journey" module might include:
        Call to Adventure: “What if you could cut onboarding time in half?”
        Mentor Appears: “We’ve worked with teams just like yours to overcome that exact pain.”
        The Transformation: “By integrating X, companies see Y outcome.”
        The Return: “And now, you’re the one who brings the solution back to your team.”

        Once defined, that narrative arc becomes a reusable conversational component—just like a UI card component that adapts to content.

        2. Automation Routines (Back-End Logic)
          These are integrations—send SMS, log CRM notes, route escalations. This part is already halfway solved by Zapier/Make/N8N, but it lacks tight coupling to voice-first logic flows. Imagine being able to attach a “Send Hero Follow-Up Email” automation to your Hero’s Journey block with one click.

        Where This Goes:
        Eventually, we’ll reach a “V0” layer for voice just like we have for web apps—pre-built blocks for onboarding, scheduling, lead capture, etc. Teams won’t need to start from scratch every time. Instead, they’ll assemble flows from pre-tested narrative patterns and logic blocks that are proven to work for their domain.

        Imagine a library of these "Conversational Components":

        Capture.PhoneNumber.Standard
        Sales.B2B.HerosJourney.Intro
        Support.Troubleshooting.FallbackLoop
        HR.CandidateScreening.ThreeStepIntro
        Plug them in. Customize lightly. Deploy.

        This is less about templates and more about a design system for voice—where story structure, tone, fallback logic, and actions are all abstracted into composable elements. The time savings would be massive. The UX consistency? Even better.
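One way to picture such a library is a registry keyed by the dotted names above. This is only a sketch under assumptions: the `component` decorator, `REGISTRY`, and `build_prompt` are hypothetical helpers, and the prompt text inside each component is placeholder wording.

```python
from typing import Callable, Dict, Iterable

# Maps a dotted component name to a function that returns prompt text.
REGISTRY: Dict[str, Callable[..., str]] = {}


def component(name: str):
    """Register a prompt-building function under a dotted name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = fn
        return fn
    return register


@component("Capture.PhoneNumber.Standard")
def capture_phone(**_) -> str:
    return ("Ask for the caller's phone number, then read it back "
            "digit by digit to confirm.")


@component("Sales.B2B.HerosJourney.Intro")
def heros_journey_intro(product: str = "our product", **_) -> str:
    return ("Open with a challenge the caller faces, then explain how "
            f"{product} helps them solve it for their team.")


def build_prompt(names: Iterable[str], **params) -> str:
    """Assemble one prompt from the named components, one per line."""
    return "\n".join(REGISTRY[name](**params) for name in names)


print(build_prompt(
    ["Sales.B2B.HerosJourney.Intro", "Capture.PhoneNumber.Standard"],
    product="Acme Onboarding",
))
```

Keeping each component a plain function means "customize lightly" is just passing parameters, and each block can be tested on its own.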

        1. 1

          I like your composable elements idea. I am thinking that inbound calls can follow some predefined flow, as callers are more cooperative.

          One callout is that a conversation with a real human doesn't always follow a predefined script (especially an outbound sales call). Let us say users will:

          1. ask who you are
          2. raise objections, which is common in sales.

          Do I understand correctly that we need to build different components to handle different corner cases? I am wondering whether, if we have many components, the customers (who use the library) will get confused.

  2. 1

    Would love to hear more about the different requirements you encountered.

    1. 1

      Like:

      1. Addresses for ambulance and home service businesses. An address is super hard to get right given the STT. We even use Google Maps as a fallback to confirm, but the experience is jarring.
      2. Handling no-answer and letting the transfer number accept or reject the transferred call in the AI receptionist. Also, transfers are only allowed during office hours. The LLM just doesn’t follow the prompt.
      3. Asking more than 10 questions and saving each one to the CRM.
      4. Fuzzy search against the database to find a matching agent for trucking dispatch.
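For the fuzzy search in item 4, a minimal sketch using Python's standard `difflib`, assuming the agent names fit in memory. The agent names and the 0.6 cutoff are made up for illustration; a real dispatch database would likely need a proper search index.

```python
import difflib
from typing import List, Optional


def find_agent(heard: str, agents: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the agent name closest to what the STT heard, or None.

    Matching is case-insensitive. `cutoff` is the minimum similarity
    (0 to 1) that difflib requires before it reports a match.
    """
    by_lower = {name.lower(): name for name in agents}
    matches = difflib.get_close_matches(
        heard.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


agents = ["Jon Smith Trucking", "Acme Freight", "Blue Ridge Haulers"]
print(find_agent("john smith trucking", agents))  # Jon Smith Trucking
```

When nothing clears the cutoff, the function returns `None`, so the agent can ask the caller to repeat or spell the name instead of guessing.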
      1. 1
        1. My experience has been to send an SMS during or after the call and then have the user correct the address using the Google Maps API with a type-ahead. Works pretty well, but agree, address and email are hard to do over voice. Surnames get completely mangled. I think you have to capture it via web to ultimately confirm and get it right.
        2. Tools like CallRail can handle this, but that is only useful if the client has a system like that which can handle multi-step call routing, round robin, routing based on timing, etc.
        3. Tools like Retell and Vapi are starting to handle this well with Post Call Analysis, where you define your JSON variables and then write a prompt for each variable so the LLM knows what you want to put in that variable.
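The variable-plus-prompt idea in item 3 can be sketched generically. This is not Retell's or Vapi's actual configuration format; the field names, the hint wording, and both helper functions are assumptions for illustration.

```python
import json
from typing import Dict

# One hint per variable, telling the LLM what belongs in it (hypothetical fields).
FIELDS: Dict[str, str] = {
    "caller_name": "Full name the caller gave, or null if not stated.",
    "callback_number": "Phone number to call back, digits only.",
    "reason": "One-sentence summary of why they called.",
}


def build_extraction_prompt(transcript: str, fields: Dict[str, str]) -> str:
    """Build a post-call prompt asking for one JSON object with the given keys."""
    lines = [f'- "{name}": {hint}' for name, hint in fields.items()]
    return (
        "Read the call transcript and reply with one JSON object "
        "containing exactly these keys:\n"
        + "\n".join(lines)
        + "\n\nTranscript:\n"
        + transcript
    )


def parse_extraction(raw: str, fields: Dict[str, str]) -> Dict[str, object]:
    """Parse the LLM's JSON reply, keeping only the expected keys."""
    data = json.loads(raw)
    missing = [key for key in fields if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {key: data[key] for key in fields}
```

The parsed dictionary is what would then be written to the CRM, one key per question. Checking for missing keys before saving catches the calls where the LLM skipped a variable.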