I started using a simpler AI workflow metric:
tokens to first useful diff.
Not total tokens for the day. Not which model was cheapest on paper. The useful question is: how many tokens did it take before the session produced a change I would actually keep?
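The metric is easy to compute if you log each model call along with whether its diff was kept. A minimal sketch, assuming a hypothetical log of (tokens, diff_kept) pairs (TokenBar's actual data model may differ):

```python
def tokens_to_first_useful_diff(events):
    """Cumulative tokens spent up to and including the first kept diff.

    `events` is a list of (tokens_used, diff_kept) tuples, one per
    model call. Returns None if the session never produced a keeper.
    """
    total = 0
    for tokens, diff_kept in events:
        total += tokens
        if diff_kept:
            return total
    return None

# Example session: two throwaway responses, then a diff worth keeping.
session = [(1200, False), (900, False), (1500, True), (700, True)]
print(tokens_to_first_useful_diff(session))  # 3600
```

Note that tokens after the first keeper do not count against the metric; it deliberately measures time-to-first-value, not total session cost.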
When that number climbs, it usually means something in the workflow has gone wrong: the context is bloated, the session has drifted, or the model is a poor fit for the task.
The painful part is that you usually notice this after the bill, after the rate limit, or after the session gets slow.
That is why I built TokenBar. It is a small macOS menu bar token counter for AI work. The point is not to make people obsess over every token. It is to make token burn visible while there is still time to change the workflow.
A tiny rule I use now:
If tokens are climbing but the diff is not getting closer, I stop adding prompts. I either trim the context, restart the session, or switch models.
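That rule can be mechanized. A sketch of the stall check, under assumed inputs: a running list of cumulative token counts and a rough 0-to-1 estimate of how close the diff is (both names are hypothetical, not part of TokenBar):

```python
def should_stop(token_counts, progress_scores, window=3):
    """Flag a stalled session: tokens keep climbing while the
    progress signal has not improved over the last `window` prompts.

    `token_counts`    -- cumulative tokens after each prompt.
    `progress_scores` -- 0-1 estimate of how close the diff is.
    """
    if len(token_counts) < window + 1:
        return False  # not enough history to judge
    tokens_climbing = token_counts[-1] > token_counts[-1 - window]
    no_progress = max(progress_scores[-window:]) <= progress_scores[-1 - window]
    return tokens_climbing and no_progress

# Tokens rising, progress flat for three prompts -> time to trim,
# restart, or switch models.
print(should_stop([1000, 2500, 4200, 6100, 8000],
                  [0.2, 0.5, 0.5, 0.5, 0.45]))  # True
```

The window keeps one unlucky prompt from triggering the rule; three flat responses in a row is a much stronger signal than one.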
Postmortem dashboards are useful for accounting. Live token visibility is useful for behavior.
Curious how other founders measure AI tool waste. Tokens per task? Cost per feature? Time saved?