Building TokenBar changed one thing in how I work with LLMs:
I stopped treating a bigger context window like free insurance.
Once the model could hold more, I got lazier.
I pasted old specs just in case.
I left tool output in the thread.
I kept retrieval chunks that were only maybe relevant.
The session still worked, so I assumed the workflow was fine.
It was not.
The first problem was not cost.
The first problem was drag.
Responses got slower.
Deciding which model fit the task got fuzzier.
I spent more time deciding whether to keep going or restart.
By the time the bill looked bad, the workflow had already become messy.
Watching tokens live made that obvious in a way dashboards never did.
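What "watching live" means can be sketched roughly. This is not TokenBar's actual implementation; it is a minimal illustration using the common rule-of-thumb of roughly four characters per token, with a hypothetical `ContextMeter` class that prints a running estimate as fragments are added to a prompt:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text.

    Real tools use the model's actual tokenizer; this heuristic is only
    good enough to make the growth trend visible.
    """
    return max(1, len(text) // 4)


class ContextMeter:
    """Accumulates prompt fragments and reports a running token estimate."""

    def __init__(self, budget: int = 8000):
        self.budget = budget  # assumed context budget, in tokens
        self.total = 0

    def add(self, fragment: str, label: str = "") -> int:
        tokens = estimate_tokens(fragment)
        self.total += tokens
        pct = 100 * self.total / self.budget
        print(f"+{tokens:>5} tok  {pct:5.1f}% of budget  {label}")
        return self.total


meter = ContextMeter(budget=8000)
meter.add("Describe the current task..." * 10, "task")
meter.add("Old spec pasted just in case..." * 200, "stale spec")
meter.add("Tool output left in the thread..." * 100, "tool output")
```

Run it and the "just in case" lines dwarf the task itself long before anything breaks, which is exactly the drag a dashboard after the fact never shows.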
A few rules I use now:
Paste only the spec the current task needs, never old ones just in case.
Clear tool output from the thread once it has done its job.
Drop retrieval chunks that are only maybe relevant.
That is basically why I built TokenBar for Mac.
Not to produce another usage report.
To make token growth visible while I am still able to change what I am doing.