Building TokenBar changed one thing in how I work with LLMs:
I stopped treating a bigger context window like free insurance.
Once the model could hold more, I got lazier.
I pasted old specs just in case.
I left tool output in the thread.
I kept retrieval chunks that were only maybe relevant.
The session still worked, so I assumed the workflow was fine.
It was not.
The first problem was not cost.
The first problem was drag.
Responses got slower.
Deciding which model fit the task got fuzzier.
I spent more time deciding whether to keep going or restart.
By the time the bill looked bad, the workflow had already become messy.
Watching tokens live made that obvious in a way dashboards never did.
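What "watching live" means can be sketched roughly. This is not TokenBar's actual implementation; it is a minimal illustration using the common rule-of-thumb of roughly four characters per token, with a hypothetical `ContextMeter` class that prints a running estimate as fragments are added to a prompt:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text.

    Real tools use the model's actual tokenizer; this heuristic is only
    good enough to make the growth trend visible.
    """
    return max(1, len(text) // 4)


class ContextMeter:
    """Accumulates prompt fragments and reports a running token estimate."""

    def __init__(self, budget: int = 8000):
        self.budget = budget  # assumed context budget, in tokens
        self.total = 0

    def add(self, fragment: str, label: str = "") -> int:
        tokens = estimate_tokens(fragment)
        self.total += tokens
        pct = 100 * self.total / self.budget
        print(f"+{tokens:>5} tok  {pct:5.1f}% of budget  {label}")
        return self.total


meter = ContextMeter(budget=8000)
meter.add("Describe the current task..." * 10, "task")
meter.add("Old spec pasted just in case..." * 200, "stale spec")
meter.add("Tool output left in the thread..." * 100, "tool output")
```

Run it and the "just in case" lines dwarf the task itself long before anything breaks, which is exactly the drag a dashboard after the fact never shows.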
A few rules I use now:
Paste only the spec the current task needs, never old ones just in case.
Clear tool output from the thread once it has done its job.
Drop retrieval chunks that are only maybe relevant.
That is basically why I built TokenBar for Mac.
Not to produce another usage report.
To make token growth visible while I am still able to change what I am doing.