I built a tool to check if AI agents can actually use a website. I ran it on 23 well-known SaaS sites, and the average score was 35.7/100

by pystar

Quick context: I'm building Hunch, which turns websites into something AI agents can actually call and act on. Before pitching anyone on that, I wanted to know how bad the actual problem is — so I built a scanner and pointed it at 23 SaaS products people here probably use daily, to see if an AI agent landing on the homepage could find pricing, book something, or get support without a human translating the page first.
Average score: 35.7/100.
Only 4 sites cleared the bar I'd call "ready" — Beehiiv, Plausible, Raycast, Lemon Squeezy. Three landed in "not ready," and one of them was Stripe, at 4/100.
That one surprised me. Stripe's checkout is genuinely good. But the homepage has 2,153 links, 227 buttons, and exactly one form — labeled at 5%. A human glances at a field and infers what it wants from placeholder text and layout. An agent reading raw HTML doesn't get any of that for free. It just sees a blank input with no name attribute.
The pattern repeated across almost every site I scanned, not just the bad ones:

Forms exist, but fields aren't labeled, so there's nothing for an agent to map input to.
Buttons say "Learn More" or "Submit" instead of naming the actual action — average button clarity across all 23 sites was 36.8%.
Only 5 of 23 sites declare their workflows in any structured, machine-readable way at all.
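
For a sense of what the first two checks involve, here is a minimal Python sketch. It is not the scanner's actual code: the `PageAudit` class, the `audit` function, the `VAGUE_BUTTON_TEXT` list, and the sample HTML are all assumptions made up for illustration. It treats a field as labeled only if it has a matching `<label for>`, `aria-label`, or `aria-labelledby`, and it ignores inputs wrapped inside a `<label>`, so it will undercount on some pages.

```python
from html.parser import HTMLParser

# Assumed list of generic button texts (illustrative, not the scanner's list).
VAGUE_BUTTON_TEXT = {"learn more", "submit", "click here", "continue", "ok", "go"}
# Input types that do not need a label of their own.
SKIPPED_INPUT_TYPES = {"hidden", "submit", "button", "reset", "image"}


class PageAudit(HTMLParser):
    """Collects form fields, label targets, and button text from raw HTML."""

    def __init__(self):
        super().__init__()
        self.fields = []            # attribute dict for each form field
        self.label_targets = set()  # ids referenced by <label for="...">
        self.buttons = []           # visible text of each <button>
        self._button_text = None    # text buffer while inside a <button>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and a.get("for"):
            self.label_targets.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            input_type = (a.get("type") or "text").lower()
            if tag == "input" and input_type in SKIPPED_INPUT_TYPES:
                return
            self.fields.append(a)
        elif tag == "button":
            self._button_text = []

    def handle_data(self, data):
        if self._button_text is not None:
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_text is not None:
            # Collapse whitespace so "  Start\n trial " becomes "Start trial".
            self.buttons.append(" ".join("".join(self._button_text).split()))
            self._button_text = None


def audit(html):
    """Return the share of labeled fields and of clearly named buttons."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    labeled = sum(
        1 for f in parser.fields
        if f.get("aria-label")
        or f.get("aria-labelledby")
        or f.get("id") in parser.label_targets
    )
    clear = sum(
        1 for text in parser.buttons
        if text and text.lower() not in VAGUE_BUTTON_TEXT
    )
    return {
        "field_label_rate": labeled / len(parser.fields) if parser.fields else None,
        "button_clarity": clear / len(parser.buttons) if parser.buttons else None,
    }


# Hypothetical page fragment: one labeled field, one placeholder-only field.
SAMPLE = """
<form>
  <label for="email">Work email</label>
  <input id="email" name="email" type="email">
  <input type="text" placeholder="Company">
  <button>Submit</button>
  <button>Start free trial</button>
</form>
"""

print(audit(SAMPLE))  # {'field_label_rate': 0.5, 'button_clarity': 0.5}
```

The placeholder-only input is the case from the Stripe example: a person reads "Company" and moves on, while this check sees a field with no label to map input to.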

None of this is a design problem. These are all perfectly usable, often well-designed sites — for humans. The gap only shows up the moment something other than a person tries to read the page.
How it's scored, for anyone who wants to poke holes (please do): the scanner crawls up to 7 pages per site, using a real browser render where possible and falling back to raw server HTML otherwise, which is what most agents actually get when they fetch a page. Points come from page coverage, form/button quality, structured action metadata, and whether high-intent surfaces (pricing, booking, checkout) exist at all. Penalties stack for things like sub-50% form labeling.
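
To make the shape of that scoring concrete, here is a rough Python sketch of a points-plus-stacking-penalties score. Only the four point sources, the 7-page cap, and the sub-50% labeling penalty come from the description above. The weights, the penalty sizes, the button penalty, and the names `SiteSignals` and `readiness_score` are invented for illustration and are not the scanner's real values.

```python
from dataclasses import dataclass

MAX_PAGES = 7  # pages scanned per site, as described above

# Hypothetical weights that sum to 100 (not the scanner's real weights).
WEIGHTS = {
    "coverage": 20,
    "forms": 25,
    "buttons": 15,
    "structured": 20,
    "high_intent": 20,
}
# Hypothetical penalty sizes. Penalties stack: each one that applies is subtracted.
LOW_LABEL_PENALTY = 10   # form labeling under 50%
LOW_BUTTON_PENALTY = 5   # assumed second penalty, to show the stacking
HIGH_INTENT_SURFACES = 3  # pricing, booking, checkout


@dataclass
class SiteSignals:
    pages_scanned: int            # 0..MAX_PAGES
    field_label_rate: float       # 0.0..1.0
    button_clarity: float         # 0.0..1.0
    has_structured_actions: bool  # any machine-readable workflow declaration
    high_intent_found: int        # 0..HIGH_INTENT_SURFACES


def readiness_score(s: SiteSignals) -> int:
    """Combine the signals into a 0-100 score, then apply stacked penalties."""
    points = (
        WEIGHTS["coverage"] * min(s.pages_scanned, MAX_PAGES) / MAX_PAGES
        + WEIGHTS["forms"] * s.field_label_rate
        + WEIGHTS["buttons"] * s.button_clarity
        + WEIGHTS["structured"] * (1 if s.has_structured_actions else 0)
        + WEIGHTS["high_intent"] * s.high_intent_found / HIGH_INTENT_SURFACES
    )
    if s.field_label_rate < 0.5:
        points -= LOW_LABEL_PENALTY
    if s.button_clarity < 0.5:
        points -= LOW_BUTTON_PENALTY
    return max(0, min(100, round(points)))


# A made-up site: fully crawled, but with weak labels and vague buttons.
example = SiteSignals(
    pages_scanned=7,
    field_label_rate=0.05,
    button_clarity=0.30,
    has_structured_actions=False,
    high_intent_found=2,
)
print(readiness_score(example))  # 24
```

With stacking, a site can collect most of the coverage and surface points and still land low, because weak labeling costs it twice: once in the forms component and again in the penalty.
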
Caveat I'll volunteer before anyone else does: it's a heuristic scanner, not a manual audit, and it can miscount. On one site, part of what it flagged as "forms" was actually a cookie consent banner. Treat the scores as directional, not gospel.
Made the scanner free to run on your own site, no signup — partly because I think the finding is more interesting than the pitch, and partly because yeah, it's also how people find Hunch. Not going to pretend otherwise. https://hunchbank.com/agent-readiness-audit
Curious if anyone's run into this from the other side — building or selling an agent that's tried to use a site and just... couldn't.

pystar on June 28, 2026

Stripe at 4/100 tracks with what we see running agents on real tasks: the page is fine, what's missing is the implicit contract a human reads off the layout. The "understand the page" vs "complete the action" split bit us hardest, since an agent that parses everything stalls the moment one unlabeled field gates the flow. Did the 4 that passed share a structured-action pattern, or cleaner HTML?

theuniverseson · 7 hours ago

That’s really interesting!

Shakermover2004 · 8 hours ago

We see the same thing building agents on the Cloudify side: the wall is almost never the checkout, it's unlabeled forms and auth gates that a human infers and an agent can't. The hard part for Hunch isn't the score, it's that no site owner rewrites HTML for a sliver of agent traffic until there's revenue attached to it. So I'd lead with the side that already feels the pain (the agent builders), or sell "be the default answer when an agent asks for X in your category" to the one company that wants to own that category, and the runtime layer sells itself from there.

GregoryScottHenson · 8 hours ago

35.7/100 is rough. What's the biggest blocker — poor semantic HTML, missing labels, or something else? This feels like a wake-up call for the accessibility + AI usability intersection that not enough teams are paying attention to.

Sandy_0517 · 15 hours ago

The Stripe example is the part that makes this click for me. A site can be excellent for humans and still pretty hostile to agents because so much meaning lives in layout, labels, and visual context. I’d be curious to see the scores split by “can understand the page” vs “can actually complete the action.” Those feel like two different failure modes.

Falaq_ai · a day ago

This is a useful angle. I’ve been noticing something similar with coding agents: success depends less on the model alone and more on whether the environment gives it clean state.

For coding, the equivalent problem is context hygiene. If the agent inherits stale assumptions, old failed attempts, or unclear next steps, it can look “dumb” even when the model is capable.

Curious: in your tests, did failures mostly come from UI/navigation issues, or from the agent not understanding what state/action it was supposed to be in?

roshandxt · a day ago

I've run into this from the revenue-path side. The useful split for me is: can the agent understand the page, and can it complete a high-intent path without ambiguity? Pricing, checkout, support, and booking fail for different reasons, so a single readiness score is good for discovery, but the fix list needs to map each issue to a business action. Your caveat about directionality is the right framing.

miz27 · a day ago

Worth splitting two problems that get blended here. Yours is operability: can the agent reach the page and act. Upstream of that is whether the AI recommending sites even surfaces you. A perfectly structured page still won't get pulled into a Perplexity or ChatGPT answer if it isn't ranking high enough to be in the pool they quote from. Different failure, different fix.

On the bot-detection point above, that's the bigger killer than markup in my experience. Geo and bot gating means an agent often eats a 403 before it ever sees the DOM, so a scanner reads the page fine while a real run dies at the door.

sablekithq · a day ago

"Human-optimized vs. agent-readable" is a great way to frame this. We spend so much time making sites look beautiful for people that we've completely ignored how machines parse them. Awesome insight and a great wake-up call for SaaS devs.

DineshRegar · a day ago

35.7 is rough but not surprising. Most sites are built for human eyes, not for an agent that needs structure and stable selectors. Was the biggest failure mode auth and login walls, or unreadable DOM/semantics? My guess is login and bot-detection kill more agent runs than bad markup, but curious what your data actually showed. Nice benchmark idea.

alex_thryvate · a day ago

This is a really interesting way to frame the problem — most sites are “human-optimized,” not “agent-readable,” and people don’t notice the gap until you test it like this.

The Stripe example actually makes sense. Great UX for humans often means implicit context (layout, placeholders, visual hierarchy), which completely breaks for agents reading raw structure.

The button labeling point is also underrated. “Learn more” works visually, but for an agent it’s basically meaningless without context.

Feels similar to early SEO days — sites looked fine, but weren’t structured for machines. Now we might be heading toward “agent SEO” or whatever this becomes.

Curious how much of this can realistically be fixed with better HTML/semantics vs needing a whole new layer (like your approach).

quill_ai · a day ago

I built a tool to check if AI agents can actually use a website...

OctopusX · a day ago

The Stripe result is interesting, but the bigger takeaway is that most sites were built assuming a human is the user. If AI agents become a real traffic source, "agent-readable" could end up being as fundamental as mobile-friendly or SEO. Feels like we're still treating it as a nice-to-have instead of a new interface.

aryan_sinh · 3 days ago

This is a fascinating view into the agent readability gap. The 35.7/100 average tells the story: sites optimize for human perception (visual hierarchy, implied context) and ignore the structured signals agents need. The 4 "ready" sites (Beehiiv, Plausible, Raycast, Lemon Squeezy) are probably worth studying as benchmarks. Did you find any common patterns in their implementations, like universal form labeling, explicit action semantics, or machine-readable metadata adoption? Curious if site redesigns to improve agent readability end up improving human UX too.

galdayan · 3 days ago