
I saw Indie Hackers getting $1,000 AI bills — so I built a safety‑first LLM gateway (and you can steal the patterns)

I haven’t personally lost $1k to a runaway agent. But I’ve spent weeks watching the AI boom and noticing a terrifying pattern.

People are great at prompts, but they are often terrible at infrastructure.

I see builders shipping apps with raw API keys and "hope" as their only circuit breaker. I look at these architectures and I see the gaps: the lack of idempotency, the in-memory counters that reset on deploy, and the worker loops that have no kill switch.

I didn't build KeelStack as a "passion project" for myself. I built it because I saw an unaddressed engineering flaw in the market: The Token Bleed.

Before the pitch, here are the four patterns I baked into the engine to stop that bleed. Steal them for your own stack, or just use the one I already built.


4 patterns every AI wrapper needs (but rarely ships)

1. Per‑user token budgets (that survive restarts)

In‑memory counters reset on every deploy. I’ve seen teams overbill users for exactly that reason.

How you can use this tomorrow:
Store usage in Redis or Postgres with a rolling hour window. Before calling the LLM, run canSpend(userId, estimatedTokens). If false, reject the request – no API call, no bill.
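Here's a minimal sketch of that check. The post doesn't ship code, so everything below is an assumption: `TokenBudget` and `can_spend` are hypothetical names, and SQLite stands in for the Redis/Postgres store (the point is just that the counter lives outside process memory, so it survives a deploy).

```python
import sqlite3
import time

HOUR = 3600

class TokenBudget:
    """Rolling-hour token budget in SQLite (stand-in for Redis/Postgres)."""

    def __init__(self, db_path: str, hourly_limit: int):
        # Point db_path at a file (or a real DB) in production so the
        # counter persists across restarts; ":memory:" is demo-only.
        self.db = sqlite3.connect(db_path)
        self.limit = hourly_limit
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS usage (user_id TEXT, tokens INTEGER, ts REAL)"
        )

    def can_spend(self, user_id: str, estimated_tokens: int) -> bool:
        # Sum only the last rolling hour of usage.
        cutoff = time.time() - HOUR
        (used,) = self.db.execute(
            "SELECT COALESCE(SUM(tokens), 0) FROM usage WHERE user_id = ? AND ts > ?",
            (user_id, cutoff),
        ).fetchone()
        return used + estimated_tokens <= self.limit

    def record(self, user_id: str, tokens: int) -> None:
        self.db.execute(
            "INSERT INTO usage VALUES (?, ?, ?)", (user_id, tokens, time.time())
        )
        self.db.commit()

budget = TokenBudget(":memory:", hourly_limit=10_000)
budget.record("user-1", 9_500)
print(budget.can_spend("user-1", 400))    # True: 9_900 <= 10_000
print(budget.can_spend("user-1", 1_000))  # False: reject before the API call
```

The key property: the budget check happens before the LLM call, so a rejected request costs nothing.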

2. Per‑job circuit breakers

A summarization job runs for 20 minutes. Halfway through it goes haywire and starts generating millions of tokens. Without a kill switch, you pay for all of it.

How you can use this tomorrow:
Attach a token limit to each background job. The worker checks periodically – over budget? Cancel the stream immediately.
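A sketch of that worker loop, assuming a generator stands in for the LLM's token stream and `BudgetExceeded` is a made-up exception name. In a real gateway, tripping the breaker would also close the underlying HTTP stream so the provider stops generating (and billing).

```python
class BudgetExceeded(Exception):
    pass

def run_job(stream, token_limit: int, check_every: int = 100):
    """Consume a token stream, cancelling once the job's budget is spent."""
    spent = 0
    collected = []
    for token in stream:
        collected.append(token)
        spent += 1
        # Check periodically, not on every token, to keep overhead low.
        if spent % check_every == 0 and spent >= token_limit:
            raise BudgetExceeded(f"job cancelled after {spent} tokens")
    return collected

# A runaway generator standing in for a haywire summarization job:
runaway = (f"tok{i}" for i in range(1_000_000))
try:
    run_job(runaway, token_limit=500)
except BudgetExceeded as e:
    print(e)  # job cancelled after 500 tokens
```

Note the breaker is per-job, not per-user: a 20-minute job gets its own ceiling regardless of how much budget the user has left.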

3. Idempotency for every mutating request

Stripe retries a webhook. A user double‑clicks “Generate.” Your chatbot processes the same prompt twice, bills twice.

How you can use this tomorrow:
Generate an idempotency key per request (or accept one from the client). Store the first response; return it for duplicates. No extra LLM calls.
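A minimal sketch of that middleware. Assumptions: an in-memory dict stands in for Redis (where you'd use SETEX with the TTL), and `IdempotencyCache` / `get_or_call` are illustrative names, not a real API.

```python
import hashlib
import time

class IdempotencyCache:
    """Store the first response per key; return it for duplicates."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (response, expires_at); Redis SETEX in production

    def get_or_call(self, key: str, call):
        now = time.time()
        hit = self.store.get(key)
        if hit and hit[1] > now:
            return hit[0]  # duplicate request: no extra LLM call
        response = call()
        self.store[key] = (response, now + self.ttl)
        return response

calls = 0
def fake_llm():
    global calls
    calls += 1
    return "summary"

cache = IdempotencyCache()
# Derive a key from user + action + payload when the client doesn't send one.
key = hashlib.sha256(b"user-1:generate:prompt-v1").hexdigest()
print(cache.get_or_call(key, fake_llm))  # summary
print(cache.get_or_call(key, fake_llm))  # summary (served from cache)
print(calls)                             # 1 — the double-click never hit the model
```

Same pattern handles the Stripe retry: the webhook's event ID is a ready-made idempotency key.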

4. Crash‑recoverable job queues

A worker dies mid‑stream. The job is lost – but the LLM keeps streaming tokens. Orphaned billing loops are real.

How you can use this tomorrow:
Use a durable queue (Redis) with atomic claiming. A Lua script ensures only one worker touches a given job. Recover orphans by resetting jobs stuck in “processing” for too long.
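Here's the shape of that claim-and-recover logic. Big caveat: this is an in-process sketch where a `threading.Lock` plays the role the Redis Lua script plays across machines (one atomic check-and-set, so two workers can't both win a job). The class and method names are hypothetical.

```python
import threading
import time

class JobQueue:
    """In-process stand-in for a durable queue with atomic claiming."""

    def __init__(self, stale_after: float = 60.0):
        self.lock = threading.Lock()  # plays the role of the Redis Lua script
        self.jobs = {}                # job_id -> {"status", "claimed_at"}
        self.stale_after = stale_after

    def enqueue(self, job_id: str):
        with self.lock:
            self.jobs[job_id] = {"status": "pending", "claimed_at": None}

    def claim(self, job_id: str) -> bool:
        """Atomically move pending -> processing; only one worker wins."""
        with self.lock:
            job = self.jobs.get(job_id)
            if job is None or job["status"] != "pending":
                return False
            job["status"] = "processing"
            job["claimed_at"] = time.time()
            return True

    def recover_orphans(self) -> list:
        """Reset jobs stuck in 'processing' past the staleness window."""
        now = time.time()
        recovered = []
        with self.lock:
            for job_id, job in self.jobs.items():
                if (job["status"] == "processing"
                        and now - job["claimed_at"] > self.stale_after):
                    job["status"] = "pending"
                    job["claimed_at"] = None
                    recovered.append(job_id)
        return recovered

q = JobQueue(stale_after=0.1)
q.enqueue("job-1")
print(q.claim("job-1"))      # True: first worker wins the job
print(q.claim("job-1"))      # False: second worker is locked out
time.sleep(0.2)              # worker "dies" mid-stream
print(q.recover_orphans())   # ['job-1'] — back to pending for another worker
```

In production you'd also cancel the orphan's upstream LLM stream when recovering it, or the billing loop the post warns about keeps running.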


The hard truth

Vibe coding works for MVPs. But the moment you have paying users, “vibe” doesn’t stop an $800 bill at 2am.

Most indie hackers don’t want to learn Redis Lua scripts or debug distributed state machines. They want to build features. That’s fine.

You have two options:

  1. DIY – Take the patterns above, roll your own budget tracker, job queue, and idempotency middleware. Expect a few weeks of debugging race conditions across replicas.

  2. Grab a gateway – Use something that already has these patterns baked in.

I chose option 2 after watching others struggle. I built KeelStack so I don’t have to write Lua scripts ever again. It’s open‑source, budget‑aware, and drops in front of OpenAI, Anthropic, or any model.


What KeelStack gives you (out of the box)

  • Per‑user/hourly token budgets (persistent, global)
  • Per‑job circuit breakers with auto‑cancel
  • Idempotency middleware (24h TTL by default)
  • Redis job queue with atomic claiming + orphan recovery

You don’t write Lua. You don’t debug race conditions. You just set a budget and go back to building features.


The honest CTA

I’m sharing this on Indie Hackers because long‑form posts rank well – but also because I genuinely want fewer people to wake up to a nightmare bill.

Feel free to copy the logic from this post and roll your own. The patterns are universal.

Or, if you’d rather not spend two weeks debugging distributed state, check out what I built:

👉 KeelStack – Safety‑first LLM gateway

No fake $1k story. Just real patterns, real code, and a tool that makes them easy.

– Built for indie hackers, by someone who watches the space


on April 3, 2026

    The $1k bill stories usually share one thing: no hard limit, just trust that usage stays reasonable.

    I've been running several AI-powered bots for a few months and the thing that helped most wasn't prompt optimization — it was picking tools with predictable cost ceilings. Groq's free tier for lightweight tasks, flat-rate plan for heavier reasoning. Total comes to under $1/month for the automated side.

    The other thing: I had an API key sitting exposed longer than it should have. Nothing happened, but that was luck. Rotating it out was the fix, not the architecture.

    Small ops don't need a full gateway. But a spending cap and key hygiene go a long way.


      Yeah, you're not wrong.

      For a small bot on free tier + spending cap + key rotation? That's genuinely enough. I'd be overbuilding if I said everyone needs a gateway.

      The only reason I built KeelStack is because I kept seeing people cross that invisible line – multiple users, background jobs, webhooks – and suddenly the simple stuff stops working. That's where the $1k bills come from.

So yeah, different scales. Appreciate you calling it out. And glad you caught that key before anything happened – luck counts for something.
