I kept assuming the expensive part of an AI workflow was the hard prompt.
It usually wasn't.
The bigger leak was what happened after I got unstuck:
That last 20 percent of the task was often burning tokens like the first 80 percent.
What changed for me was watching token usage live instead of checking a dashboard later.
A few habits came out of that:
Switch down once the hard reasoning step is over
If the model has already found the bug or the plan, I do not need frontier-model pricing for renaming, formatting, or wrap-up work.
Restart when the chat becomes a scratchpad
Once a session turns into pasted logs plus abandoned branches, it stops being context and starts being luggage.
Track tokens per finished task, not per day
A daily total does not tell me much. A single bugfix that costs as much as a feature writeup does.
Treat context windows like budget, not capacity
Just because the model can hold more does not mean the extra context is helping.
I built TokenBar for this because I wanted the number visible while I work, right in the macOS menu bar.
Curious how other people handle this:
Do you explicitly switch models mid-task, or do you mostly stay in one lane until the session ends?