Every AI agent I've built that touches the web ends up reinventing the same stack:
- puppeteer-core or playwright for automation
- puppeteer-extra-plugin-stealth for bot detection
- 2captcha / anti-captcha for CAPTCHAs

A day of plumbing before the agent logic even starts. The real problem isn't any single plugin; it's that every project rebuilds the same glue.
So I built BrowserAct, a single CLI that collapses all of it.
1. Indexed interactive elements — the feature I missed most
browser-act --session agent state
# Returns URL, title, and numbered interactive elements:
# 0: <a href=...>Sign in</a>
# 1: <input placeholder="email">
# 2: <button>Continue</button>
browser-act --session agent input 1 "user@example.com"
browser-act --session agent click 2
The agent doesn't generate CSS selectors. It reads an indexed list and picks a number. This alone killed most of the brittle-selector problems I had with raw Puppeteer.
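Here is a minimal Python sketch of how an agent might consume that indexed list. It assumes the element lines look like the commented sample above (`<index>: <html snippet>`); the real `state` output may differ, and `parse_indexed_elements` and `find_index` are hypothetical helper names, not part of BrowserAct.

```python
import re
from typing import Dict, Optional

# Assumed line shape, copied from the sample output: "<index>: <html snippet>".
# The actual `state` output format may differ.
_INDEXED_LINE = re.compile(r"^\s*(\d+):\s*(.+?)\s*$")


def parse_indexed_elements(state_text: str) -> Dict[int, str]:
    """Map each element index to its snippet; lines that don't match are skipped."""
    elements: Dict[int, str] = {}
    for line in state_text.splitlines():
        match = _INDEXED_LINE.match(line)
        if match:
            elements[int(match.group(1))] = match.group(2)
    return elements


def find_index(elements: Dict[int, str], needle: str) -> Optional[int]:
    """Return the lowest index whose snippet contains `needle` (case-insensitive)."""
    needle = needle.lower()
    for index in sorted(elements):
        if needle in elements[index].lower():
            return index
    return None


# Sample text mirroring the commented output shown earlier.
sample = """0: <a href=...>Sign in</a>
1: <input placeholder="email">
2: <button>Continue</button>"""

elements = parse_indexed_elements(sample)
email_index = find_index(elements, "email")
continue_index = find_index(elements, "continue")
```

In practice the model would choose the index itself; the substring lookup only stands in for that decision so the sketch stays self-contained.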
2. Three browser modes, one CLI
- browser open <browser_id>: managed stealth browser with proxy built in (--dynamic-proxy with region selection, or --custom-proxy)
- browser real open <url>: your logged-in Chrome via CDP auto-discovery
- browser real open <url> --ba-kernel: bundled Chromium (no host-Chrome dependency)

Switch modes with a flag, not a different library.
3. Stealth and CAPTCHA built in
browser-act stealth-extract <url> # one-shot anti-detection extract
browser-act solve-captcha # auto-solve on current page
browser-act human-assist-url --objective "..." # zlink URL for human-in-the-loop
No separate 2captcha API key, no stealth-plugin version drift.
4. LLM-friendly page data
browser-act --session agent get markdown # page as markdown — feed straight to the model
browser-act --session agent network requests --type xhr --status 200
browser-act --session agent network har start
get markdown is what I needed to stop shipping 500KB HTML blobs into prompts.
5. Session persistence across CLI calls
browser-act --session agent1 navigate https://site.com/login
# ... login flow ...
browser-act --session agent1 cookies export cookies.json
# Later, same session still logged in:
browser-act --session agent1 navigate https://site.com/dashboard
My AI agents are in Python, Go, and sometimes Rust. A CLI means the same five verbs (state / click / input / get markdown / eval) work from any language — no bindings, no version alignment.
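To make the "no bindings" point concrete, here is a rough sketch of the thin wrapper a Python agent could use. It assumes `browser-act` is on `PATH`, writes its result to stdout, and exits non-zero on failure; `build_command` and `run` are hypothetical names for illustration.

```python
import subprocess
from typing import List


def build_command(session: str, *args: str, binary: str = "browser-act") -> List[str]:
    """Assemble the argv for one CLI call, e.g. `browser-act --session agent click 2`."""
    return [binary, "--session", session, *args]


def run(session: str, *args: str, binary: str = "browser-act", timeout: float = 60.0) -> str:
    """Run one CLI call and return its stdout.

    Assumption: the CLI signals failure with a non-zero exit code, which
    check=True turns into subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_command(session, *args, binary=binary),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


# Usage (requires the CLI to be installed):
#   page = run("agent", "get", "markdown")
#   run("agent", "click", "2")
```

The same pattern would map to `os/exec` in Go or `std::process::Command` in Rust, which is why a CLI keeps the agent code nearly identical across languages.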
- --session a1/a2/a3
- solve-captcha cleared Cloudflare Turnstile and hCaptcha on a site I was scraping daily
- get markdown + network requests cut my LLM token bill roughly in half on research agents

GitHub: https://github.com/browser-act/skills/tree/main/browser-act
Curious how others have solved this. Still on puppeteer-extra? Switched to Playwright? Wrote your own wrapper? Would love to compare notes.