We're the team behind BrowserAct. We just released the browser-act CLI, a command-line tool that handles stealth browsing, captcha solving, and clean data extraction out of the box.
We built it because we kept hitting the same walls with Playwright and Puppeteer: sites blocking headless browsers, selectors breaking after layout changes, and raw HTML burning too many tokens when piped into LLMs.
Two browser modes
Stealth browser — anti-detection fingerprinting, proxy support, persistent login sessions. For sites that block automation:
browser-act browser create "my-scraper" --proxy socks5://host:port
browser-act browser open <browser_id> https://target-site.com
Real Chrome — connects to your running Chrome via CDP, reuses your existing logins. No setup needed:
browser-act browser real open https://dashboard.example.com
Captcha solving
Cloudflare Turnstile, reCAPTCHA — one command:
browser-act solve-captcha
Captcha challenges are where headless scripts usually stall on protected sites. browser-act handles them natively.
Clean data extraction
Pages come back as markdown instead of raw HTML. One product page we tested: 854,044 characters of HTML vs 22,815 of markdown — 37x smaller.
browser-act get markdown # structured text
browser-act get text 5 # text of a specific element
browser-act get value 3 # value of an input field
If you're feeding pages to an LLM, this alone changes your cost math.
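To make the cost math concrete, here is a rough sketch using the two character counts quoted above. The 4-characters-per-token figure is an assumption (a common rule of thumb, not something we measured), so treat the token numbers as an order-of-magnitude estimate only.

```python
# Character counts reported above for one product page.
HTML_CHARS = 854_044
MARKDOWN_CHARS = 22_815

# ASSUMPTION: ~4 characters per token is a rough rule of thumb.
# Real tokenizers vary, and HTML markup may tokenize differently from prose.
CHARS_PER_TOKEN = 4


def reduction_ratio(before: int, after: int) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


def estimated_tokens(chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate from a character count."""
    return chars // chars_per_token


ratio = reduction_ratio(HTML_CHARS, MARKDOWN_CHARS)
print(f"size reduction: {ratio:.1f}x")
print(
    f"estimated tokens: {estimated_tokens(HTML_CHARS):,} (HTML) "
    f"vs {estimated_tokens(MARKDOWN_CHARS):,} (markdown)"
)
```

Swap in your own page sizes and your model's real tokenizer to get numbers you can budget against.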
How it works in practice
The core loop: open → inspect → interact → verify.
browser-act browser open <id> https://example.com/login
browser-act state
# → [3] input "Email", [4] input "Password", [5] button "Sign In"
browser-act input 3 "user@example.com"
browser-act input 4 "password123"
browser-act click 5
browser-act wait stable
browser-act state # re-inspect after page change
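If you script the loop above, you will probably want to look elements up by label instead of hard-coding indices. A minimal sketch, assuming the state output looks like the one-line comment in the example (`[index] kind "label"`); the real output format may differ, and `Element`, `parse_state`, and `find_index` are hypothetical helper names, not part of browser-act:

```python
import re
from typing import NamedTuple


class Element(NamedTuple):
    index: int
    kind: str
    label: str


# ASSUMPTION: each element is printed as [index] kind "label",
# mirroring the example comment above. Adjust for the real format.
ELEMENT_RE = re.compile(r'\[(\d+)\]\s+(\w+)\s+"([^"]*)"')


def parse_state(text: str) -> list[Element]:
    """Extract every [index] kind "label" entry from state output."""
    return [
        Element(int(index), kind, label)
        for index, kind, label in ELEMENT_RE.findall(text)
    ]


def find_index(elements: list[Element], kind: str, label: str) -> int:
    """Return the index of the first element matching kind and label."""
    for element in elements:
        if element.kind == kind and element.label == label:
            return element.index
    raise LookupError(f'no {kind} labeled "{label}"')


sample = '[3] input "Email", [4] input "Password", [5] button "Sign In"'
elements = parse_state(sample)
print(find_index(elements, "button", "Sign In"))
```

Because indices can change after the page updates, re-parse the output of each new `browser-act state` call before interacting again.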
Everything runs from the terminal. Chain with &&, pipe into scripts, schedule with cron.
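If you would rather drive the CLI from a script than from a long `&&` chain, something like the following works as a starting point. It is a sketch under assumptions: it only uses the commands shown in this post, it assumes browser-act exits non-zero when a step fails (which is what `&&` chaining relies on), and `run_cli`, `login_steps`, `run_chain`, and `dry_run` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(args: Sequence[str]) -> int:
    """Run one browser-act command and return its exit code."""
    return subprocess.run(["browser-act", *args]).returncode


def dry_run(args: Sequence[str]) -> int:
    """Print the command instead of running it; always reports success."""
    print(shlex.join(["browser-act", *args]))
    return 0


def login_steps(browser_id: str, url: str, email: str, password: str) -> list[list[str]]:
    # Indices 3, 4, 5 come from the example state output above;
    # on a real page, read them from `browser-act state` first.
    return [
        ["browser", "open", browser_id, url],
        ["state"],
        ["input", "3", email],
        ["input", "4", password],
        ["click", "5"],
        ["wait", "stable"],
        ["state"],
    ]


def run_chain(steps: Sequence[Sequence[str]], runner: Runner = run_cli) -> int:
    """Run steps in order and stop at the first non-zero exit, like `&&`.

    Returns the number of steps that completed successfully.
    """
    for completed, step in enumerate(steps):
        if runner(step) != 0:
            return completed
    return len(steps)


steps = login_steps("<browser_id>", "https://example.com/login", "user@example.com", "password123")
run_chain(steps, runner=dry_run)  # prints the commands without touching a browser
```

Drop the `runner=dry_run` argument to execute the commands for real, and call the script from cron the same way you would any other Python job.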
Install:
uv tool install browser-act-cli --python 3.12
Source and docs: github.com/browser-act/skills
We'd love feedback — what browser automation tasks eat up your time? What would you want a tool like this to handle?
Reducing HTML to Markdown by 37x is a complete game-changer for anyone piping pages into LLMs; the token savings alone make BrowserAct an easy sell. Native stealth and captcha solving out of the box finally make headless automation feel reliable again.
This looks pretty solid, especially the markdown extraction part. That’s a real practical improvement for LLM workflows.
One thing stood out, though: the product itself feels like infrastructure-level tooling, but the “browser-act” naming comes across more like a utility or repo than something you’d build a company around.
For tools solving blocking/captcha/automation at this level, perception plays a bigger role once people start relying on it in production.
Curious if you’re thinking of this as a long-term product/company, or more of a dev tool for now?