Most of my AI token waste happens after the hard part is done

by JohnMadison

I kept assuming the expensive part of an AI workflow was the hard prompt.

It usually wasn't.

The bigger leak was what happened after I got unstuck:

The last 20 percent of the task was often burning as many tokens as the first 80 percent.

What changed for me was watching token usage live instead of checking a dashboard later.

A few habits came out of that:

Switch down once the hard reasoning step is over.
If the model has already found the bug or the plan, I do not need frontier-model pricing for renaming, formatting, or wrap-up work.

Restart when the chat becomes a scratchpad.
Once a session turns into pasted logs plus abandoned branches, it stops being context and starts being luggage.

Track tokens per finished task, not per day.
A daily total does not tell me much. A single bugfix that costs as much as a feature writeup tells me a lot.

Treat context windows as a budget, not as capacity.
Just because the model can hold more does not mean the extra context is helping.
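
To make the per-task habit concrete, here is a rough sketch of a tiny ledger that records token counts per task and per phase. This is only an illustration, not how TokenBar works. The model names ("big-model", "small-model"), the prices, the task names, and the token counts are all made up; swap in your provider's real rates and your own numbers.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in USD per million tokens (assumption, not real rates).
PRICE_PER_MILLION = {"big-model": 15.0, "small-model": 1.0}


@dataclass
class TaskLedger:
    """Collects token usage per task, split into phases such as
    "reasoning" (the hard part) and "wrap-up" (renaming, formatting)."""

    calls: list = field(default_factory=list)

    def record(self, task, phase, model, tokens):
        # One entry per model call; tokens is the total for that call.
        self.calls.append((task, phase, model, tokens))

    def tokens_by_task(self):
        # Total tokens per finished task, regardless of day or model.
        totals = defaultdict(int)
        for task, _phase, _model, tokens in self.calls:
            totals[task] += tokens
        return dict(totals)

    def cost_by_phase(self, task):
        # Approximate USD cost of one task, grouped by phase.
        costs = defaultdict(float)
        for name, phase, model, tokens in self.calls:
            if name == task:
                costs[phase] += tokens / 1_000_000 * PRICE_PER_MILLION[model]
        return dict(costs)


ledger = TaskLedger()

# Task A: stayed on the expensive model for the wrap-up work.
ledger.record("fix-login-bug", "reasoning", "big-model", 40_000)
ledger.record("fix-login-bug", "wrap-up", "big-model", 30_000)

# Task B: switched down once the plan was clear.
ledger.record("write-feature-doc", "reasoning", "big-model", 20_000)
ledger.record("write-feature-doc", "wrap-up", "small-model", 30_000)

print(ledger.tokens_by_task())
print(ledger.cost_by_phase("fix-login-bug"))
print(ledger.cost_by_phase("write-feature-doc"))
```

With these invented numbers, both tasks spend the same 30,000 tokens on wrap-up, but the one that stayed on the expensive model pays far more for it. Splitting by phase is a design choice: it is what exposes the "last 20 percent" cost that a daily total hides.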

I built TokenBar for this because I wanted the number visible while I work, right in the macOS menu bar.

Curious how other people handle this:
Do you explicitly switch models mid-task, or do you mostly stay in one lane until the session ends?