Last week we almost shipped a Claude skill that would have lost us money on every single run.
Quick context: we build BrowserAct, browser automation for AI agents. When Anthropic shipped Agent Skills, we did what every B2B team does: built a skill, smoke-tested it, and queued it for release.
Then one of us asked the annoying question: "Does this actually save tokens, or does it just feel more structured?"
So we ran the same Google Trends extraction twice. Same Claude Opus 4.7. Same browser tool. Same prompt.
No skill: 11 minutes, $3.15 in tokens, and a hallucinated answer.
With a real browser skill: 7 minutes, $1.20, actual Google Trends data.
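For anyone who wants the gap as numbers, here is a quick back-of-the-envelope calculation using only the two figures above. It is one pair of runs, not a benchmark, and the variable names are just illustrative.

```python
# Figures taken from the two runs described above (a single pair of runs).
no_skill = {"minutes": 11, "cost_usd": 3.15}
with_skill = {"minutes": 7, "cost_usd": 1.20}


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


cost_saved = round(no_skill["cost_usd"] - with_skill["cost_usd"], 2)
cost_cut = round(pct_reduction(no_skill["cost_usd"], with_skill["cost_usd"]), 1)
time_cut = round(pct_reduction(no_skill["minutes"], with_skill["minutes"]), 1)

print(f"Saved ${cost_saved:.2f} per run ({cost_cut}% less token spend)")
print(f"Run time down {time_cut}%")
# Saved $1.95 per run (61.9% less token spend)
# Run time down 36.4%
```

So on this one task the skill run cost roughly 62% less and finished about 36% faster, before even counting that only one of the two answers was usable.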
The no-skill run didn't fail at the beginning. It failed late, after spending a lot of tokens trying to reason around live-web uncertainty. It got redirected, lost state, and eventually invented a synthetic answer from npm trends and GitHub stars.
The skill run worked because it did not just describe the task. It constrained execution.
That distinction changed how we think about "skills."
A lot of public Claude skill examples are basically markdown prompts with a nice folder structure. Useful, sure. But not a production skill for browser automation.
A real skill needs four things:
Most examples have #1 and maybe #4. The expensive parts are #2 and #3.
That's where token spend hides. Without execution constraints, the agent keeps exploring. It retries the wrong layer. It summarizes intermediate pages. It asks the model to reason around missing browser state instead of using the browser correctly.
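To make "execution constraints" a bit more concrete, here is a minimal sketch of one way to bound an agent's browser run. It is not BrowserAct's implementation or anything from the Agent Skills spec: `ExecutionGuard`, `ConstraintViolation`, and the limits are hypothetical names and values chosen for illustration. The idea is simply to fail fast on an unexpected redirect, a blown step budget, or repeated retries instead of letting the model reason its way around them.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


class ConstraintViolation(Exception):
    """Raised when a run steps outside the limits the skill allows."""


@dataclass
class ExecutionGuard:
    """Hypothetical guard: host allowlist, step budget, per-step retry cap."""

    allowed_hosts: frozenset[str]
    max_steps: int = 20
    max_retries_per_step: int = 2
    steps_taken: int = 0
    retries: dict[str, int] = field(default_factory=dict)

    def check_navigation(self, url: str) -> None:
        # A redirect to an unexpected host stops the run instead of
        # letting the agent improvise from whatever page it landed on.
        host = urlparse(url).hostname or ""
        if host not in self.allowed_hosts:
            raise ConstraintViolation(f"unexpected host: {host!r}")

    def record_step(self) -> None:
        # Hard cap on total steps so exploration cannot run indefinitely.
        self.steps_taken += 1
        if self.steps_taken > self.max_steps:
            raise ConstraintViolation("step budget exhausted")

    def record_retry(self, step_name: str) -> None:
        # Cap retries per named step so the same failure is not repeated.
        self.retries[step_name] = self.retries.get(step_name, 0) + 1
        if self.retries[step_name] > self.max_retries_per_step:
            raise ConstraintViolation(f"too many retries for {step_name!r}")


guard = ExecutionGuard(allowed_hosts=frozenset({"trends.google.com"}), max_steps=3)
guard.check_navigation("https://trends.google.com/trends/explore?q=claude")  # allowed
try:
    guard.check_navigation("https://www.npmjs.com/package/example")
except ConstraintViolation as err:
    print(f"stopped early: {err}")
```

With a guard like this, a redirect away from the expected site ends the run with an explicit error rather than a made-up answer assembled from other sources.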
We wrote up the full breakdown here, including the exact anatomy, cost comparison, and a 5-minute skill build:
→ https://www.browseract.com/blog/what-are-claude-skills-browser-automation
Curious how other founders are thinking about this: are you treating "skills" as prompt packaging, or as execution infrastructure?