I used to assume rising AI cost meant one of two things:
But after debugging real Claude + agent workflows, I kept seeing a third pattern:
Token leakage — spend goes up, useful output doesn’t.
If you’re building solo products or shipping fast with agents, this is probably happening to you too.
This post is a practical breakdown of the 3 leakage paths I’ve seen most, plus what actually worked to stop them.
This was the biggest silent cost driver.
Typical triggers:
What it looked like in logs:
What reduced spend quickly:
I thought “more context = better quality.”
Sometimes true. Often expensive noise.
Typical triggers:
Signals:
Fixes that worked:
When upstream gets unstable, bad retry logic becomes expensive fast.
Typical triggers:
Signals:
Fast mitigation:
I stopped trying to build a huge dashboard first.
This small schema was enough to find most issues:
With this, I can usually answer:
I used to over-engineer too early. That was a mistake.
Now I follow this order:
Stop loss first
cap retries, reduce context window, disable suspicious triggers
Find top contributors
sort by anomaly contribution, fix biggest 20% first
Turn fixes into rules
dedupe, retry boundaries, session segmentation
This shifted cost management from “monthly surprise” to “same-day containment.”
I’m building around a simple operational loop:
detect → attribute → contain
Not “another pretty dashboard,” but a way to make cost control part of everyday workflow.
If you want to try the exact setup I’m using, here’s the install command for macOS/Linux:
curl -fsSL https://aikeylabs.com/zh/i/ih02 | sh
If you’re testing it, I’d love your feedback on where cost leaks show up first in your workflow (duplicate calls, context bloat, or retry storms).