I shipped a Claude Code workflow last month that was supposed to log into a SaaS dashboard, scrape a table, and email the result. Stage one of testing, it ran beautifully. Stage two — same task, longer dashboard — it died at "context limit reached" forty turns in.
I checked the usage dashboard. 180,000 input tokens. For a workflow that, written by hand, would be a 30-line Playwright script.
That's when I started auditing where the tokens were actually going. What I found surprised me, and I think it'll surprise you too if you're using Playwright MCP or Chrome DevTools MCP with Claude Code, Cursor, Codex, or any of the agentic IDEs.
The thing nobody tells you about browser MCP
A typical MCP server — say a database adapter or a Git wrapper — returns small JSON results. A few hundred tokens per tool call. The agent reads the result, reasons about it, and moves on.
A browser MCP server returns screenshots. Or accessibility trees. Or DOM dumps. The "tool result" for a single browser action can be 1,600-2,400 input tokens just for the image, before the model has done any reasoning.
That's already 5-10× the cost of a regular MCP server. But the part that breaks the math is what happens next: every screenshot stays in conversation history forever. Turn 1's screenshot is still in context on turn 40, getting re-sent to the model on every single API call.
So the cost isn't 2,400 tokens per screenshot. It's 2,400 × (number of remaining turns). By turn 40 of a session with 40 screenshots, you're paying 96,000 tokens every single turn just to re-read images the model already acted on 30 turns ago.
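This is why MCP token usage feels non-linear in browser tasks. The per-turn cost grows with the number of screenshots taken so far, so the cumulative bill grows roughly with the square of task length: doubling the task roughly quadruples the bill.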
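The replay arithmetic can be sketched in a few lines. This is a simplified model, assuming one screenshot per turn at a flat 2,400 tokens and nothing ever pruned from history; the function names are illustrative, not part of any MCP tooling.

```python
SCREENSHOT_TOKENS = 2_400  # upper end of the per-image range quoted above


def replay_tokens_on_turn(turn: int, per_image: int = SCREENSHOT_TOKENS) -> int:
    """Image tokens re-sent on a given turn (1-indexed), one screenshot per turn."""
    return per_image * turn


def cumulative_image_tokens(turns: int, per_image: int = SCREENSHOT_TOKENS) -> int:
    """Total image tokens billed across a whole session of `turns` turns."""
    return sum(replay_tokens_on_turn(t, per_image) for t in range(1, turns + 1))


for n in (20, 40, 80):
    print(n, replay_tokens_on_turn(n), cumulative_image_tokens(n))
```

Under these assumptions a 40-turn session bills about 1.97M image tokens in total and an 80-turn session about 7.78M, a ratio just under 4×.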
What I tried that didn't work
First instinct: write better prompts. "Be concise. Don't take screenshots unless necessary."
The model agreed and kept taking screenshots anyway. Of course it did: Playwright MCP exposes 25+ tools, and the lowest-effort, most foolproof one is browser_screenshot. The model picks it because it always works.
Second instinct: increase context window. Anthropic shipped 1M-token context for Claude — surely that solves it?
It made the symptom rarer but the dollar cost worse. A bigger window doesn't make tokens cheaper; it just lets you send more of them on every call. At $3/M for input on Sonnet, a turn near the million-token ceiling costs about $3 in input alone. The agent finishes the task and presents you with a $40 bill.
Third instinct: rebuild the whole thing without MCP. This worked but threw the baby out with the bathwater. MCP is genuinely useful when the agent is exploring: when it doesn't know the page structure and needs to react turn-by-turn.
What actually worked
Four patterns, from quickest fix to deepest savings:
Disable unused MCP tools. Playwright MCP defaults to 25+ tools. Most of my workflows used 5. Cutting from 25 to 5 saved 4,000-9,000 tokens per turn just from schema overhead. Easy win, took 5 minutes.
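To get a feel for the schema overhead before touching any config, a rough estimate like the one below can help. It is a sketch only: the tool definitions are placeholders rather than real Playwright MCP schemas, and the 4-characters-per-token figure is a common rule of thumb, not an exact tokenizer count.

```python
import json


def estimate_tokens(obj) -> int:
    """Rough heuristic: about 4 characters of JSON per token."""
    return len(json.dumps(obj)) // 4


def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tool definitions the workflow actually calls."""
    return [t for t in tools if t["name"] in allowed]


# Placeholder schemas: 25 tools with 800-character descriptions each.
tools = [
    {"name": f"tool_{i}", "description": "d" * 800, "inputSchema": {"type": "object"}}
    for i in range(25)
]
allowed = {"tool_0", "tool_1", "tool_2", "tool_3", "tool_4"}

before = estimate_tokens(tools)
after = estimate_tokens(filter_tools(tools, allowed))
print(f"schema tokens per turn: {before} -> {after} (saved {before - after})")
```

How tools are actually disabled depends on the client and server in use, so check their documentation for the supported allowlist or deny mechanism.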
Stop using screenshots as confirmation. "Did the form submit?" Don't take a screenshot. Use a selector check. 12 tokens vs 2,400. The agent doesn't need vision for this — it needs a boolean.
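A selector check of this kind might look like the sketch below. It assumes `page` behaves like a Playwright sync-API Page, where `page.locator(selector).count()` returns the number of matching elements; the function names and the `submitted=` result format are made up for illustration.

```python
def element_present(page, selector: str) -> bool:
    """Return True if at least one element matches `selector`.

    `page` is expected to behave like a Playwright sync-API Page:
    page.locator(selector).count() returns the number of matches.
    """
    return page.locator(selector).count() > 0


def confirmation_for_agent(page, selector: str) -> str:
    """The entire tool result handed back to the model: one short line."""
    return f"submitted={element_present(page, selector)}"
```

Note that `count()` does not wait for elements to appear, so a page that renders its confirmation asynchronously may need an explicit wait before the check.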
Move repeatable workflows out of MCP into a CLI. This was the biggest unlock. MCP for exploration, CLI for execution. Once I knew what the workflow was, I committed it to a single CLI invocation that ran headlessly and returned one consolidated result. 40 turns of MCP collapsed into 1 turn of CLI. Token bill went from 80-150K per run to 5-15K.
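As a rough illustration of the "one invocation, one consolidated result" shape, here is a minimal sketch using Playwright's Python sync API. It is not the author's script: the login step is omitted, the default row selector and the JSON layout are assumptions, and cells are assumed to be tab-separated in each row's text.

```python
import argparse
import json


def consolidate(rows: list[str], max_rows: int = 50) -> str:
    """Pack scraped rows into one compact JSON string for the agent."""
    cells = [r.split("\t") for r in rows[:max_rows]]
    return json.dumps(
        {"row_count": len(rows), "truncated": len(rows) > max_rows, "rows": cells},
        separators=(",", ":"),
    )


def scrape_rows(url: str, row_selector: str) -> list[str]:
    """Load the page headlessly and return the text of each matching row."""
    # Imported lazily so consolidate() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        rows = page.locator(row_selector).all_inner_texts()
        browser.close()
    return rows


def main(argv=None) -> None:
    """Entry point: parse arguments, scrape once, print one JSON line."""
    parser = argparse.ArgumentParser(description="Scrape a table in one call.")
    parser.add_argument("url")
    parser.add_argument("--rows", default="table tr", help="CSS selector for rows")
    args = parser.parse_args(argv)
    print(consolidate(scrape_rows(args.url, args.rows)))
```

Wired to a `__main__` guard that calls `main()`, the agent runs this once and reads a single JSON line back, instead of carrying forty turns of browser state in its context.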
Compile the most-run workflows into skills. For things I ran weekly — competitor pricing scrapes, lead enrichment — I built them as skills the agent calls by name. The agent's context never sees the browser state at all. 200-2K tokens per run.
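The "call it by name, get one line back" idea can be sketched as a tiny registry. This is a hypothetical illustration of the pattern, not the skill packaging format of Claude Code or any other tool; `SKILLS`, `skill`, and `run_skill` are invented names, and the scrape itself is replaced by a stand-in so the example runs without a browser.

```python
from typing import Callable

# Hypothetical registry mapping a skill name to a function that does all
# of its work internally and returns only a short text summary.
SKILLS: dict[str, Callable[..., str]] = {}


def skill(name: str):
    """Decorator that registers a workflow under a name the agent can call."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        SKILLS[name] = fn
        return fn
    return register


@skill("competitor_pricing")
def competitor_pricing(prices: dict[str, float]) -> str:
    # Stand-in for a real scrape: prices are passed in so the sketch
    # runs without a browser.
    cheapest = min(prices, key=prices.get)
    return f"{len(prices)} competitors checked; cheapest: {cheapest} at {prices[cheapest]:.2f}"


def run_skill(name: str, **kwargs) -> str:
    """What the agent sees: one call in, one line out."""
    return SKILLS[name](**kwargs)


print(run_skill("competitor_pricing", prices={"acme": 19.0, "globex": 24.5}))
```

The point of the shape is that screenshots, DOM, and intermediate steps never enter the conversation; only the summary string does.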
The decision tree I use now: explore with MCP, productize with CLI, scale with skills. Each step takes me from "expensive but flexible" to "cheap but specific."
The numbers from one week of real use
Same workflow, three implementations:
Running this 50×/week:
For a workflow I built in a weekend.
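For a rough sense of scale, the per-run ranges quoted earlier can be multiplied out. This is a back-of-envelope sketch, not measured data: it assumes those ranges are billed input tokens, prices them at the $3/M Sonnet input rate mentioned above, and ignores output tokens entirely.

```python
RUNS_PER_WEEK = 50
INPUT_PRICE_PER_MTOK = 3.00  # USD, the Sonnet input rate quoted earlier

# (low, high) input tokens per run, from the ranges given above.
PER_RUN = {
    "mcp": (80_000, 150_000),
    "cli": (5_000, 15_000),
    "skill": (200, 2_000),
}


def weekly(tokens_per_run: int) -> tuple[int, float]:
    """Return (tokens per week, input cost in USD per week)."""
    total = tokens_per_run * RUNS_PER_WEEK
    return total, total / 1_000_000 * INPUT_PRICE_PER_MTOK


for name, (low, high) in PER_RUN.items():
    (lo_tok, lo_usd), (hi_tok, hi_usd) = weekly(low), weekly(high)
    print(f"{name}: {lo_tok:,}-{hi_tok:,} tokens/week, ${lo_usd:.2f}-${hi_usd:.2f}")
```

Under those assumptions the MCP version lands at 4.0-7.5M tokens a week (about $12-22.50 of input), the CLI version at 250-750K (about $0.75-2.25), and the skill version at 10-100K (about $0.03-0.30).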
What I learned
Most "reduce MCP token usage" advice online is generic — be concise, use dynamic toolsets. None of it explained the specific reason browser MCP is 5-10× worse than other kinds. The reason is screenshot replay in conversation history, and once you see it, every other optimization is downstream of that one insight.
If you're hitting context limits on browser tasks with Claude Code, Cursor, or Codex, the fastest experiment is to take one repeated MCP sequence and rewrite it as a single CLI call. You'll see the bill drop the same day.
Full writeup with comparison tables, config snippets, and the math is on the BrowserAct blog if you want to dig in:
https://www.browseract.com/blog/reduce-mcp-token-usage-browser-automation
Curious how others are handling this — anyone else hitting context-window failures on browser MCP, or have a different pattern that's working for you?