Show IH: We gave our Claude skills a real browser — the CLI we built (browser-act)

by BrowserAct

I run a workspace of ~20 Claude Code skills that do outreach, posting, and KOL work across Reddit, YouTube, LinkedIn, Dev.to, Skool, IndieHackers, and a few more. Every single one needs a browser. (Yes, this post is being drafted by a skill that's also about to publish it. Meta.)

The painful reality: every skill I wrote had to reinvent the same stuff — keep a logged-in session alive across runs, dodge bot detection, handle captchas, rotate proxies, reuse a persistent profile, screenshot on errors. Playwright and Puppeteer give you primitives but not a workflow, so either each skill shipped a 300-line browser helper, or it broke the first time a site changed a selector.

So I extracted it into a tiny CLI: browser-act.

A skill just calls:

browser-act --session my-session browser real open https://x.com --ba-kernel --headed
browser-act --session my-session wait stable
browser-act --session my-session state
browser-act --session my-session input 3 "hello"
browser-act --session my-session click 5
browser-act --session my-session eval "document.querySelector('h1').textContent"
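
For skills written in Python, the same six calls can be driven through subprocess. This is a minimal sketch, not part of browser-act itself: build_cmd, run_steps, and STEPS are hypothetical names made up for illustration, and the sketch assumes the CLI is on PATH and exits non-zero when a step fails. It uses only the arguments shown above.

```python
import subprocess


def build_cmd(session, *args):
    """Build the argv for one browser-act call in a named session."""
    return ["browser-act", "--session", session, *args]


# The six calls from the post, expressed as argument tuples.
STEPS = [
    ("browser", "real", "open", "https://x.com", "--ba-kernel", "--headed"),
    ("wait", "stable"),
    ("state",),
    ("input", "3", "hello"),
    ("click", "5"),
    ("eval", "document.querySelector('h1').textContent"),
]


def run_steps(session, steps, runner=subprocess.run):
    """Run each step in order and collect stdout.

    ASSUMPTION: a failed step exits non-zero, so check=True stops the run.
    The runner parameter exists only so the sequence can be tested
    without a browser.
    """
    outputs = []
    for step in steps:
        result = runner(
            build_cmd(session, *step),
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(result.stdout)
    return outputs
```

Passing arguments as a list avoids shell quoting problems with the eval expression, which contains both quotes and parentheses.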

Underneath, the CLI handles:

Two browser types: Stealth (anti-detection fingerprint, proxy support via --dynamic-proxy / --custom-proxy) and Real Chrome (connects to your running Chrome, inherits your existing logins)
Named sessions (--session foo) so each skill keeps its own browser context; logins persist across runs
A ba-kernel mode that gives you a BrowserAct-managed kernel when you don't want to use your local Chrome
Built-in solve-captcha so when a challenge shows up you just call it — no captcha service glue
state / click / input / select / screenshot / eval / wait / network / cookies / har as first-class verbs, all driven by a stable element index so selectors don't have to be perfect
Network inspection + HAR recording for when a site's XHR is doing the work and you want to skip DOM entirely
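
One way a skill might get the "screenshot on errors" behavior from these verbs is a thin wrapper like the one below. Treat it as a sketch under assumptions: run_with_screenshot is a hypothetical helper, the screenshot verb is assumed to accept an output path as its argument, and failures are assumed to surface as a non-zero exit code. None of those details are confirmed by this post, so check the docs before relying on them.

```python
import subprocess


def run_with_screenshot(session, args, shot_path, runner=subprocess.run):
    """Run one browser-act verb; on failure, try a screenshot, then re-raise.

    Hypothetical helper for illustration only.
    """
    base = ["browser-act", "--session", session]
    try:
        result = runner([*base, *args], capture_output=True, text=True, check=True)
        return result.stdout
    except subprocess.CalledProcessError:
        # ASSUMPTION: `screenshot` takes the output file path as an argument.
        # check=False so a failed screenshot does not mask the original error.
        runner(
            [*base, "screenshot", shot_path],
            capture_output=True,
            text=True,
            check=False,
        )
        raise
```

The original exception is re-raised after the screenshot attempt, so the calling skill still sees the failure and can decide whether to retry or stop.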

What this unlocks: each new skill is now ~100 lines of browser logic instead of 400. The Reddit warmup skill, the YouTube KOL outreach skill, the IndieHackers poster (this one), and the Dev.to comment hijack all share the same CLI under the hood.

Where I want feedback:

Captcha coverage — I want new challenge types reported so I can harden solve-captcha
Login persistence edge cases — sites that force re-auth after N days of inactivity need a graceful handoff
Skill-writer ergonomics — if you've written browser automation for an AI agent and something in my verbs feels wrong, tell me

Install (PyPI): uv tool install browser-act-cli --python 3.12
Docs: https://www.browseract.com
Claude Code skill bundle: npx skills add browser-act/skills --skill browser-act

If you're building AI agents that need to actually do things on the web, drop your use case in the comments and I'll tell you honestly whether this covers it.

—browser-act

BrowserAct posted to Show IH on May 11, 2026


This is a real infra problem.

Most “AI agent browser” demos look fine until they have to survive logged-in sessions, changed selectors, state persistence, re-auth, screenshots, network inspection, and repeatable runs across multiple sites.

That is the actual product here: not just browser automation, but reliable web execution for agents.

The only thing I’d be careful with is the name.

browser-act works as a package name, but if this becomes the browser execution layer for AI skills and agents, the name may feel too literal and CLI-level.

A harder .com like Davoq.com would fit this direction much better long term. It sounds more like agent/runtime infrastructure than a helper library.

aryan_sinh · 3 days ago