
I thought building AI agents was the hard part. It wasn’t.

A few weeks ago, I started experimenting with turning APIs into AI agents.

At first, it felt easy:

  • define some tools
  • connect an LLM
  • done

But the moment I tried using it with a real API, everything broke.

Not in obvious ways — in subtle, frustrating ones.


What actually goes wrong

1. The agent sends incomplete payloads

You ask it to create something → it forgets required fields → API rejects it.
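One way to catch this before the API ever sees the request is a pre-flight check on the payload the agent built. Below is a minimal sketch assuming a hand-written schema dict; `CREATE_CUSTOMER_SCHEMA`, its field names and `validate_payload` are hypothetical, and a real setup might use JSON Schema or Pydantic instead.

```python
# Hypothetical schema for a "create customer" tool; field names are illustrative.
CREATE_CUSTOMER_SCHEMA = {
    "required": ["name", "email"],
    "optional": ["phone"],
}


class PayloadError(ValueError):
    """Raised when a payload is rejected before it reaches the API."""


def validate_payload(payload: dict, schema: dict) -> dict:
    """Return the payload unchanged if it is complete, else raise PayloadError."""
    # A field counts as missing if it is absent, None, or an empty string.
    missing = [
        field for field in schema["required"]
        if payload.get(field) in (None, "")
    ]
    # Fields the schema does not know about are usually a sign of a guess.
    allowed = set(schema["required"]) | set(schema["optional"])
    unknown = sorted(set(payload) - allowed)
    if missing or unknown:
        raise PayloadError(
            f"missing required fields: {missing}; unknown fields: {unknown}"
        )
    return payload
```

The error message is meant to be handed back to the agent as the tool result, so it can retry with the missing field filled in instead of sending a request that is certain to be rejected.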


2. It “hallucinates success”

The API returns an error, but the agent says:

“Done!”

No idea what actually happened.


3. Debugging is a nightmare

Once the agent chains multiple actions:

  • which step failed?
  • what was sent?
  • what was the response?

Good luck figuring it out.


4. Auth becomes messy fast

Headers, tokens, scopes…

You end up writing more glue code than actual logic.


The realization

The hard part isn’t:

getting an LLM to call an API

It’s:

making that interaction reliable enough to trust in production


Curious how others are handling this

If you’ve built anything with:

  • LangChain
  • LangGraph
  • custom agent setups

How are you dealing with:

  • observability?
  • error handling?
  • safety for state-changing actions?

Would love to learn how others approached this.

on April 12, 2026

    The observability gap is the one that scales worst IMO. The confirmation pattern for mutations is solid, but once chains get past 3 steps you also need structured tracing.

    What worked for us: log every tool invocation as a structured event — tool name, input, raw response, status code, latency. When step 4 of 7 fails, you replay the chain from logs instead of guessing. It's basically the OpenTelemetry span pattern applied to agent tool calls. Without it you're debugging blind.
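    A minimal sketch of that per-call event log, assuming each tool is a plain Python function that returns (status_code, raw_response). `run_chain`, `first_failure` and the JSON-lines output are hypothetical illustrations of the span idea, not the OpenTelemetry API.

```python
import json
import time
from typing import Any, Callable, Optional

# Assumed tool shape: input dict -> (status_code, raw_response).
ToolFn = Callable[[dict], tuple[int, Any]]


def _is_failure(status: Optional[int]) -> bool:
    # None means the tool raised instead of returning a status.
    return status is None or not 200 <= status < 300


def run_chain(steps: list[tuple[str, ToolFn, dict]], log: list[dict]) -> bool:
    """Run (tool_name, tool_fn, tool_input) steps in order.

    Appends one structured event per call to `log` and stops at the
    first failed step. Returns True only if every step succeeded.
    """
    for index, (name, fn, tool_input) in enumerate(steps, start=1):
        start = time.perf_counter()
        try:
            status, raw = fn(tool_input)
        except Exception as exc:  # a crashing tool is still a logged event
            status, raw = None, repr(exc)
        event = {
            "step": index,
            "tool": name,
            "input": tool_input,
            "status": status,
            "raw_response": raw,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        }
        log.append(event)
        # One JSON line per call: easy to grep, ship, or replay later.
        print(json.dumps(event, default=str))
        if _is_failure(status):
            return False
    return True


def first_failure(log: list[dict]) -> Optional[dict]:
    """Answer "which step failed, what was sent, what came back?" from the log."""
    return next((e for e in log if _is_failure(e["status"])), None)
```

    Because every event carries the exact input and raw response, a failed run can be inspected (or re-run step by step) from the log alone.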

    On hallucinated success specifically — the deeper fix is treating every API response as untrusted input. Parse both the HTTP status AND the response body before reporting success. Some APIs return 200 with an error nested in the JSON. If you only gate on status code, the agent says "Done!" when nothing happened.
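    A sketch of gating success on both the status code and the body. The in-body error conventions checked here (an "error"/"errors" key, or "success": false) are assumptions; real APIs differ, so the checks need adjusting per API.

```python
import json


def interpret_response(status_code: int, body: str) -> dict:
    """Decide success from BOTH the HTTP status and the parsed body."""
    # 1. Non-2xx is always a failure, whatever the body says.
    if not 200 <= status_code < 300:
        return {"ok": False, "reason": f"HTTP {status_code}", "body": body}
    # 2. Treat the body as untrusted input: it must parse.
    try:
        data = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return {"ok": False, "reason": "response body is not valid JSON", "body": body}
    # 3. Some APIs return 200 with the error nested in the JSON.
    if isinstance(data, dict):
        if data.get("error") or data.get("errors"):
            return {"ok": False, "reason": "error reported in body", "body": data}
        if data.get("success") is False:
            return {"ok": False, "reason": "body says success=false", "body": data}
    return {"ok": True, "body": data}
```

    The agent then receives `ok` and `reason` as the tool result, so any "Done!" it reports is grounded in a checked outcome rather than in the model's own summary.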

    For auth — wrapping each API as a self-contained tool with auth baked in eliminated most of our glue code. The agent calls create_customer(name, email) and never sees tokens. The tool handles headers, refresh, retry internally. Keeps credentials out of conversation history too, which matters more than people realize.
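    A sketch of that shape: the agent only ever sees `create_customer(name, email)`, while the token lookup, headers and a single refresh-and-retry on 401 live inside the tool. The base URL, the `/customers` endpoint, the `get_token(force_refresh)` provider and `CustomerTool` itself are hypothetical; the transport is injectable so it can be faked in tests.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

# Transport: (method, url, headers, body_bytes) -> (status, body_text)
Transport = Callable[[str, str, dict, bytes], tuple[int, str]]


def urllib_transport(method: str, url: str, headers: dict, body: bytes) -> tuple[int, str]:
    """Default transport using only the standard library."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a body
        return err.code, err.read().decode()


class CustomerTool:
    """Exposes create_customer(name, email); credentials never leave this class."""

    def __init__(self, base_url: str, get_token: Callable[[bool], str],
                 transport: Transport = urllib_transport):
        self._base_url = base_url.rstrip("/")
        self._get_token = get_token  # get_token(force_refresh) -> bearer token
        self._transport = transport

    def create_customer(self, name: str, email: str) -> tuple[int, str]:
        body = json.dumps({"name": name, "email": email}).encode()
        status, text = self._send("POST", "/customers", body, refresh=False)
        if status == 401:  # token likely expired: refresh once, then retry
            status, text = self._send("POST", "/customers", body, refresh=True)
        return status, text

    def _send(self, method: str, path: str, body: bytes, refresh: bool) -> tuple[int, str]:
        headers = {
            "Authorization": f"Bearer {self._get_token(refresh)}",
            "Content-Type": "application/json",
        }
        return self._transport(method, self._base_url + path, headers, body)
```

    Only the method's name and arguments go into the tool definition the model sees, so tokens never appear in the prompt or the conversation history.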


      Yeah, this is exactly the direction we ended up taking.

      We treat every tool call as a structured event (basically span-like), so you get full tracing + replay out of the box. Makes debugging chains way less painful.

      Also fully agree on the “untrusted response” point — we validate both status and payload before considering anything successful, otherwise it’s just false positives.

      Feels like these should be defaults honestly, not things you have to build yourself.


    This is painfully accurate.

    I went through the exact same thing a couple months ago. Getting the agent to call an API is easy — getting it to do it correctly and consistently is where everything falls apart.

    The “hallucinated success” point especially hit home. I’ve had cases where the API clearly returned a 400, and the agent still confidently replied like everything worked. That’s honestly scary if you’re thinking about production use.

    What helped a bit on my side:

      • Strict schema validation before sending requests (basically rejecting incomplete payloads early)
      • Wrapping every call with explicit success/error checks instead of trusting the model
      • Logging everything (raw input, constructed payload, response), even though it gets noisy fast

    But even with that, once you start chaining steps, it becomes really hard to reason about what’s happening.

    For safety, I’ve been defaulting to:

    anything that mutates state = requires confirmation

    Feels clunky UX-wise, but safer.
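    A minimal sketch of that rule as a gate in front of every tool call. The `Tool` dataclass, the `mutates_state` flag and the `confirm` callback are hypothetical names; in a chat UI `confirm` would be a button, and in a terminal it could be something like `lambda name, args: input(f"Run {name} with {args}? [y/N] ").strip().lower() == "y"`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    mutates_state: bool  # True for create/update/delete style calls


def call_tool(tool: Tool, args: dict, confirm: Callable[[str, dict], bool]) -> dict:
    """Run a tool, but ask a human first if it mutates state.

    Read-only tools run immediately; a declined mutation returns an
    explicit failure so the agent cannot report it as done.
    """
    if tool.mutates_state and not confirm(tool.name, args):
        return {"ok": False, "reason": f"user declined {tool.name}"}
    return {"ok": True, "result": tool.fn(**args)}
```

    Marking tools once at registration keeps the confirmation step out of the prompt, so the model cannot talk its way past it.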

    Still feels like we’re missing a solid abstraction layer here. Curious if anyone has found a cleaner way without rebuilding half the stack themselves.


      Yeah, 100% — went through the exact same pain.

      The “hallucinated success” is honestly the worst part. It looks like it worked, but nothing actually happened… super risky in production.



    I know a few software engineers who've built and deployed AI agents that faced similar issues. They'd likely be happy to answer any questions you have about their approaches to observability and error handling.
