I got tired of the puppeteer-extra + stealth + 2captcha setup, so I built a CLI with indexed elements, three browser modes, and built-in CAPTCHA solving.

Every AI agent I've built that touches the web ends up reinventing the same stack:

puppeteer-core or playwright for automation
puppeteer-extra-plugin-stealth for evading bot detection
2captcha / anti-captcha for CAPTCHAs
Your own cookie jar for session persistence
A manual CSS/XPath selector layer, because the agent can't reliably generate selectors
A fallback to real Chrome via CDP for when the stealth plugin breaks

A day of plumbing before the agent logic even starts. The real problem isn't any single plugin — it's that every project rebuilds the same glue.

So I built BrowserAct, a single CLI that collapses all of it.

What's actually in it (everything below is shipped, no roadmap)

1. Indexed interactive elements — the feature I missed most

browser-act --session agent state
# Returns URL, title, and numbered interactive elements:
# 0: <a href=...>Sign in</a>
# 1: <input placeholder="email">
# 2: <button>Continue</button>
browser-act --session agent input 1 "[email protected]"
browser-act --session agent click 2

The agent doesn't generate CSS selectors. It reads an indexed list and picks a number. This alone killed most of the brittle-selector problems I had with raw Puppeteer.
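
On the agent side, picking a number can be as simple as the sketch below. It assumes the state output looks like the sample above (one "N: <tag ...>" line per element); the real output format may differ. parse_state and find_index are hypothetical helpers I made up for illustration, not part of BrowserAct.

```python
import re
from typing import NamedTuple, Optional


class Element(NamedTuple):
    index: int   # the number the agent passes to click / input
    tag: str     # lowercased tag name, e.g. "button"
    html: str    # the raw markup shown on that line


# Assumed line shape, taken from the sample in the post: "<index>: <tag ...>..."
_LINE = re.compile(r"^\s*(\d+):\s*(<([A-Za-z][\w-]*)\b.*)$")


def parse_state(output: str) -> list[Element]:
    """Collect the numbered elements, skipping lines such as URL and title."""
    elements = []
    for line in output.splitlines():
        match = _LINE.match(line)
        if match:
            elements.append(
                Element(int(match.group(1)), match.group(3).lower(), match.group(2))
            )
    return elements


def find_index(elements: list[Element], tag: str, needle: str) -> Optional[int]:
    """Index of the first element with this tag whose markup contains needle."""
    for element in elements:
        if element.tag == tag and needle.lower() in element.html.lower():
            return element.index
    return None


SAMPLE = """\
0: <a href=...>Sign in</a>
1: <input placeholder="email">
2: <button>Continue</button>
"""

elements = parse_state(SAMPLE)
continue_button = find_index(elements, "button", "continue")
```

In practice the model usually picks the index itself from the raw list; a helper like this is mostly useful for deterministic steps such as a known login form.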

2. Three browser modes, one CLI

browser open <browser_id> — managed stealth browser with proxy built in (--dynamic-proxy with region selection, or --custom-proxy)
browser real open <url> — your logged-in Chrome via CDP auto-discovery
browser real open <url> --ba-kernel — bundled Chromium (no host-Chrome dependency)

Switch modes with a flag, not a different library.

3. Stealth and CAPTCHA built in

browser-act stealth-extract <url>              # one-shot anti-detection extract
browser-act solve-captcha                       # auto-solve on current page
browser-act human-assist-url --objective "..."  # zlink URL for human-in-the-loop

No separate 2captcha API key, no stealth-plugin version drift.

4. LLM-friendly page data

browser-act --session agent get markdown       # page as markdown — feed straight to the model
browser-act --session agent network requests --type xhr --status 200
browser-act --session agent network har start

get markdown is what I needed to stop shipping 500KB HTML blobs into prompts.
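
Markdown is smaller than HTML, but long pages can still blow a prompt budget. The sketch below is my own illustration, not a BrowserAct feature: clip_markdown is a hypothetical helper that keeps whole paragraphs from the top of the page until a character budget is used up. A character count is only a rough stand-in for tokens.

```python
def clip_markdown(markdown: str, max_chars: int) -> str:
    """Keep whole paragraphs from the top of the page within max_chars."""
    if len(markdown) <= max_chars:
        return markdown
    kept, used = [], 0
    for block in markdown.split("\n\n"):
        # Every block after the first costs two extra characters for the separator.
        cost = len(block) + (2 if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(block)
        used += cost
    return "\n\n".join(kept)


page = "# Title\n\nFirst paragraph.\n\nSecond paragraph that is long."
clipped = clip_markdown(page, 30)
```

Cutting at paragraph boundaries avoids handing the model a half-finished sentence or a broken table row.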

5. Session persistence across CLI calls

browser-act --session agent1 navigate https://site.com/login
# ... login flow ...
browser-act --session agent1 cookies export cookies.json
# Later, same session still logged in:
browser-act --session agent1 navigate https://site.com/dashboard

On the agent side

My AI agents are in Python, Go, and sometimes Rust. A CLI means the same five verbs (state / click / input / get markdown / eval) work from any language — no bindings, no version alignment.
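
For the Python agents, the whole "binding" is a thin subprocess wrapper. A minimal sketch, assuming browser-act is on PATH and signals failure with a non-zero exit code (I'm treating that as an assumption here); build_cmd and run are hypothetical names, and only the --session flag and verbs shown earlier in this post are used.

```python
import subprocess


def build_cmd(session, *args, binary="browser-act"):
    """Build the argv list for one CLI call in a named session."""
    return [binary, "--session", session, *map(str, args)]


def run(session, *args, timeout=60):
    """Run one command and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code;
    timeout raises TimeoutExpired if the call hangs.
    """
    result = subprocess.run(
        build_cmd(session, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


# Usage (needs the CLI installed, so it is left commented out):
#   run("agent", "click", 2)
#   page = run("agent", "get", "markdown")
```

Passing an argv list instead of a shell string means typed text with quotes or spaces needs no escaping.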

Where I've used it in production

Three agents running concurrently via --session a1/a2/a3
solve-captcha cleared Cloudflare Turnstile and hCaptcha on a site I was scraping daily
get markdown + network requests cut my LLM token bill roughly in half on research agents
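
The concurrent setup in the first item can be sketched as one thread per session. This assumes separate --session names are isolated from each other, which is how I use them; fetch_markdown and fetch_all are hypothetical helpers, and the runner is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def cli(session, *args):
    """Default runner: one browser-act call, returning stdout."""
    result = subprocess.run(
        ["browser-act", "--session", session, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def fetch_markdown(session, url, run=cli):
    """Navigate one session to a URL and return the page as markdown."""
    run(session, "navigate", url)
    return run(session, "get", "markdown")


def fetch_all(jobs, run=cli):
    """jobs maps session name -> URL; returns session name -> markdown."""
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        futures = {
            session: pool.submit(fetch_markdown, session, url, run)
            for session, url in jobs.items()
        }
        # result() re-raises any exception from the worker thread.
        return {session: future.result() for session, future in futures.items()}


# Usage (needs the CLI installed, so it is left commented out):
#   pages = fetch_all({"a1": "https://example.com/a", "a2": "https://example.com/b"})
```

Threads are enough here because each worker spends its time waiting on a subprocess, not running Python code.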

Looking for

Sites where your current stealth setup keeps failing — I want to test against them
Feedback from anyone else who's been stuck rebuilding the same Puppeteer + stealth + captcha triangle
Contributors — source is open

GitHub: https://github.com/browser-act/skills/tree/main/browser-act

Curious how others have solved this. Still on puppeteer-extra? Switched to Playwright? Wrote your own wrapper? Would love to compare notes.