While building TokenBar, I noticed something annoying about my own dev workflow.
I kept treating token usage like a pricing problem.
But a lot of my spend was really a retry problem.
A chat would get bloated. The task would get fuzzier. The model would miss once, then I would clarify, rerun, paste in more logs, switch models, and try again.
The original prompt was not that expensive.
The loop was.
That changed how I think about context windows.
Bigger context windows are useful, but they also make it easier to keep a mediocre session alive for too long. You can keep stuffing in code, tool output, and old decisions instead of resetting the conversation and stating the task cleanly.
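To make the retry cost concrete, here is a rough back-of-the-envelope sketch. It assumes a chat where the full history is resent as input on every turn, with no prompt caching, and it ignores output tokens. The turn sizes are made-up numbers for illustration, not measurements from my own sessions.

```python
from itertools import accumulate

def input_tokens_per_turn(added_per_turn):
    """Input tokens sent on each turn, assuming the whole history is resent.

    added_per_turn[i] is how many new tokens turn i adds to the context
    (the prompt, pasted logs, the previous reply, and so on).
    """
    return list(accumulate(added_per_turn))

# Hypothetical session: a 2,000-token prompt followed by four retries,
# each one pasting in more logs and clarifications.
added = [2_000, 1_500, 4_000, 3_000, 3_500]

per_turn = input_tokens_per_turn(added)
total = sum(per_turn)

print(per_turn)  # [2000, 3500, 7500, 10500, 14000]
print(total)     # 37500
```

Under those assumptions the first prompt costs 2,000 input tokens, but the session as a whole costs 37,500, because every retry pays again for everything that came before it.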
What helped me was seeing token usage live while I worked.
Not because I obsess over every cent.
Because the meter became a workflow signal.
If token usage jumped fast while progress stayed flat, it usually meant one of three things:
That is the behavior change I wanted, and it is a big part of why I built TokenBar for macOS.
Live token visibility has been more useful for debugging my workflow than any after-the-fact spend report.
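The "usage jumps while progress stays flat" signal can be approximated with a small check like the one below. This is only a sketch of the idea, not how TokenBar works. The `Turn` record, the `made_progress` flag (something you would have to judge and set yourself), and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    tokens: int          # tokens used by this turn (input + output)
    made_progress: bool  # hypothetical manual flag: did this turn move the task forward?

def should_reset(turns, window=3, token_budget=20_000):
    """Return True when the last `window` turns made no progress
    but together used more than `token_budget` tokens."""
    if len(turns) < window:
        return False  # not enough history to judge
    recent = turns[-window:]
    stalled = not any(t.made_progress for t in recent)
    spent = sum(t.tokens for t in recent)
    return stalled and spent > token_budget

# Hypothetical session: one useful turn, then three expensive misses.
session = [
    Turn(tokens=3_000, made_progress=True),
    Turn(tokens=8_000, made_progress=False),
    Turn(tokens=9_000, made_progress=False),
    Turn(tokens=12_000, made_progress=False),
]

print(should_reset(session))  # True
```

When a check like this fires, the cheap move is usually to start a fresh conversation and restate the task, rather than paste in one more log.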