
Why My Claude Costs Kept Rising (Even When Output Didn’t): 3 Token Leakage Patterns in Agent Workflows

I used to assume rising AI cost meant one of two things:

  1. model pricing changed, or
  2. I was simply using AI more.

But after debugging real Claude + agent workflows, I kept seeing a third pattern:

Token leakage — spend goes up, useful output doesn’t.

If you’re building solo products or shipping fast with agents, this is probably happening to you too.

This post is a practical breakdown of the 3 leakage paths I’ve seen most, plus what actually worked to stop them.


1) Duplicate calls: same intent, multiple executions

This was the biggest silent cost driver.

Typical triggers:

  • manual run + background automation run
  • same task executed across multiple agent windows
  • one step fails, whole pipeline reruns

What it looked like in logs:

  • near-identical prompts in short windows
  • request count rising faster than useful outputs
  • repeated outputs with little incremental value

What reduced spend quickly:

  • add an idempotency key per task
  • retry failed node only (not full-chain rerun)
  • short-window dedupe (e.g. 30–120s)
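
To make the idempotency key and short-window dedupe concrete, here is a minimal Python sketch. The class name `DedupeWindow`, the key format (a SHA-256 hash of task name plus payload), and the in-memory dict are all assumptions for illustration; a multi-process setup would likely need a shared store instead.

```python
import hashlib
import json
import time


class DedupeWindow:
    """Remembers recent task keys and rejects repeats inside a time window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._seen = {}     # idempotency key -> time the task was first accepted

    @staticmethod
    def make_key(task_name, payload):
        # Same task name + same payload gives the same key, regardless of dict order.
        canonical = json.dumps({"task": task_name, "payload": payload}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def should_run(self, key):
        now = self.clock()
        # Drop keys that have aged out of the window.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self.window_seconds}
        if key in self._seen:
            return False  # same intent already executed recently: skip the call
        self._seen[key] = now
        return True
```

The idea is to call `should_run(key)` right before the model request, so a manual run and a background run of the same task within the window only spend tokens once.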

2) Context bloat: every request carries too much history

I thought “more context = better quality.”
Sometimes true. Often expensive noise.

Typical triggers:

  • long conversations never segmented
  • full history passed “just in case”
  • prompts growing over time without cleanup

Signals:

  • input tokens keep growing as a share of total tokens
  • later turns cost much more than early turns
  • quality gain is marginal compared to token increase

Fixes that worked:

  • split sessions by task boundary
  • summarize every N turns instead of full-history carry
  • prompt layering: fixed rules + task goal + minimal context
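
The summarize-and-layer idea can be sketched roughly like this in Python. The function names, the `keep_last` parameter, and the truncation-based `naive_summary` are hypothetical stand-ins; in practice the summary would more likely come from a cheap model call, which is passed in here as the optional `summarize` callable.

```python
def naive_summary(turns, per_turn_chars=40):
    # Placeholder summarizer: keeps only the start of each older turn.
    return " | ".join(turn[:per_turn_chars] for turn in turns)


def build_prompt(rules, goal, history, keep_last=4, summarize=None):
    """Layer the prompt: fixed rules + task goal + minimal context."""
    cut = max(len(history) - keep_last, 0)
    older, recent = history[:cut], history[cut:]

    parts = [rules, "Goal: " + goal]
    if older:
        # Older turns are carried as a short summary, not verbatim.
        summary = (summarize or naive_summary)(older)
        parts.append("Summary of earlier turns: " + summary)
    parts.extend(recent)  # only the last few turns are sent in full
    return "\n\n".join(parts)
```

With a layout like this, input size depends mostly on `keep_last` and the summary length, so later turns should stop costing much more than early ones.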

3) Retry storms: minutes can burn a day’s budget

When upstream gets unstable, bad retry logic becomes expensive fast.

Typical triggers:

  • unbounded retries
  • timeouts set too short, so slow requests get re-sent and pile up
  • no separation between retryable and non-retryable errors

Signals:

  • short request bursts + clustered errors
  • sharp cost spike without matching productivity spike

Fast mitigation:

  • exponential backoff + jitter
  • max retry count and retry time budget
  • explicit error classes (retryable / non-retryable / degradable)
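
The three mitigations fit together in one small wrapper. This is a sketch under assumptions, not a drop-in client: the exception class names, the default limits, and the `fallback` hook for degradable errors are all hypothetical, and real code would map the provider's actual error types onto these classes.

```python
import random
import time


class RetryableError(Exception):
    """Transient failure (e.g. timeout, rate limit): worth retrying."""


class NonRetryableError(Exception):
    """Permanent failure (e.g. invalid request): retrying only burns tokens."""


class DegradableError(Exception):
    """Failure where a cheaper fallback result is acceptable."""


def call_with_retry(fn, max_retries=3, base_delay=0.5, max_delay=8.0,
                    time_budget=30.0, fallback=None,
                    sleep=time.sleep, clock=time.monotonic):
    start = clock()
    attempt = 0
    while True:
        try:
            return fn()
        except NonRetryableError:
            raise  # never retry permanent failures
        except DegradableError:
            if fallback is None:
                raise
            return fallback()  # degrade instead of re-sending
        except RetryableError:
            attempt += 1
            if attempt > max_retries:
                raise  # retry count exhausted
            # Exponential backoff with full jitter, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            if clock() - start + delay > time_budget:
                raise  # retry time budget exhausted
            sleep(delay)
```

The count cap and the time budget are both hard stops, so a flaky upstream can cost at most `max_retries` extra calls per task.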

The minimal data I now track (to debug in 5–10 minutes)

I stopped trying to build a huge dashboard first.
This small schema was enough to find most issues:

  • timestamp
  • task/conversation ID
  • model/provider
  • input/output tokens
  • status/error type
  • retry count
  • latency

With this, I can usually answer:

  • Is this duplicate execution?
  • Is context size drifting?
  • Is retry policy causing spikes?
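
As a rough illustration, the schema and the three questions map onto a record type and three small checks. The field names, the 120-second window, and the helper functions are assumptions for this sketch, and it treats `task_id` as one unit of work that should normally run once.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    timestamp: float      # seconds since epoch
    task_id: str          # task or conversation ID
    model: str            # model/provider
    input_tokens: int
    output_tokens: int
    status: str           # "ok" or an error type
    retry_count: int
    latency_ms: float


def duplicate_tasks(records, window_seconds=120):
    """Task IDs whose first attempts ran more than once within the window."""
    last_seen, dupes = {}, set()
    first_attempts = [r for r in records if r.retry_count == 0]
    for r in sorted(first_attempts, key=lambda r: r.timestamp):
        prev = last_seen.get(r.task_id)
        if prev is not None and r.timestamp - prev <= window_seconds:
            dupes.add(r.task_id)
        last_seen[r.task_id] = r.timestamp
    return dupes


def input_growth(records):
    """Mean input tokens in the later half of calls divided by the earlier half."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    half = len(ordered) // 2
    if half == 0:
        return 1.0
    early = sum(r.input_tokens for r in ordered[:half]) / half
    late = sum(r.input_tokens for r in ordered[half:]) / (len(ordered) - half)
    return late / early if early else float("inf")


def retry_token_share(records):
    """Fraction of all tokens spent on calls that were retries."""
    total = sum(r.input_tokens + r.output_tokens for r in records)
    retried = sum(r.input_tokens + r.output_tokens
                  for r in records if r.retry_count > 0)
    return retried / total if total else 0.0
```

A non-empty `duplicate_tasks` result points at duplicate execution, an `input_growth` well above 1 suggests context drift, and a high `retry_token_share` suggests the retry policy is driving the spike.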

Practical sequence that worked best

I used to over-engineer too early. That was a mistake.

Now I follow this order:

  1. Stop loss first
    cap retries, reduce context window, disable suspicious triggers

  2. Find top contributors
    rank tasks by how much of the excess spend they account for, fix the biggest 20% first

  3. Turn fixes into rules
    dedupe, retry boundaries, session segmentation
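
Step 2 can be approximated with a few lines of Python. This sketch ranks tasks by raw token totals, which is a simplification of "anomaly contribution" (it does not subtract a baseline); the function name and the `(task_id, total_tokens)` input shape are assumptions.

```python
import math
from collections import defaultdict


def top_contributors(calls, fraction=0.2):
    """Return the top `fraction` of tasks by token spend, biggest first.

    calls: iterable of (task_id, total_tokens) pairs, one per request.
    """
    totals = defaultdict(int)
    for task_id, tokens in calls:
        totals[task_id] += tokens
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]
```

Usage would look like `top_contributors([("ingest", 9000), ("chat", 1200), ("ingest", 4000)])`, which puts the heaviest task at the front of the fix list.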

This shifted cost management from “monthly surprise” to “same-day containment.”


What I’m building around this

I’m building around a simple operational loop:

detect → attribute → contain

Not “another pretty dashboard,” but a way to make cost control part of everyday workflow.

If you want to try the exact setup I’m using, here’s the install command for macOS/Linux:

curl -fsSL https://aikeylabs.com/zh/i/ih02 | sh

If you’re testing it, I’d love your feedback on where cost leaks show up first in your workflow (duplicate calls, context bloat, or retry storms).

Posted to AI Tools on May 21, 2026