I ran my tool against 50 real sites before launching. SaaS products, hosting platforms, tech publications, website builders. Stripe, Notion, Shopify, Vercel, ProductHunt, Framer. I expected most would score fine and I'd find a few interesting edge cases. 20 of them scored zero.
Why I built this
I work in development, mostly .NET. SEO has honestly been the same conversation for as long as I can remember. Then AI crawlers showed up, and for the first time in years there's something genuinely new happening. Google's June 2025 core update pruned an estimated 15-20% of the index. More pages than ever are stuck in "crawled but not indexed" limbo. Site owners are worried about where their traffic went, blocking bots they shouldn't be, missing ones they don't know exist. The tooling hasn't caught up.
The specific gap that pulled me in: Google Search Console has a status called "Crawled, currently not indexed." Google visited your page, decided it wasn't worth keeping, and left. No details, no explanation. Existing tools check whether you're indexed (yes or no), but when you already know the answer is no, that tells you nothing. You need to know what's actually wrong.
So I built IsItIndexed. Paste a URL, get 13 diagnostic checks covering technical blockers, content quality, and AI bot visibility. No account, no signup.
What 50 sites taught me
The gap between indexability and AI readiness turned out to be the interesting part. Most sites are wide open to every AI crawler but fail basic Google checks.
Some standouts:
ProductHunt.com scored 0 for indexability. The homepage returned a 403, had noindex/nofollow, no canonical tag, no sitemap, and 3 words of content. A launch platform whose own homepage is effectively unlaunchable in search.
Framer.com scored 0 for indexability while shipping 2.74MB of HTML and 564 images without lazy loading. The heaviest homepage in the dataset came from a design tool.
Kit.com (ConvertKit) scored 0 for indexability, 98 for AI readiness. HTTP 403, 20 words, no sitemap. An email marketing platform that blocks its own homepage from crawlers.
Perplexity.ai scored 0 for indexability, 98 for AI readiness. HTTP 403, noindex, 3 words. An AI search company whose own pages block indexing.
blog.cloudflare.com scored 0 for indexability, 98 for AI readiness. 1,269 words but zero H1 tags and 58 images without lazy loading. A tech blog with plenty of content that still fails on structure.
Best in the whole sample: Plausible.io at 86/100 indexability, 96 AI readiness. 1,302 words, 65KB HTML, 6ms response time.
The AI visibility part
This is what I couldn't find in any existing tool. Every major AI company runs separate bots for different jobs. OpenAI has GPTBot for training data, OAI-SearchBot for search indexing, and ChatGPT-User for live page visits. Anthropic and Google have similar splits.
They're controlled separately in robots.txt, but most site owners either block everything or block nothing. There's a useful middle ground: block training, keep search and retrieval open. Your content stays out of model training, but ChatGPT and Perplexity can still cite you.
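For the OpenAI bots named above, that middle ground is a few stanzas in robots.txt. This is a sketch, not a recommendation for every site; the user-agent strings are the ones OpenAI documents, and each stanza applies independently:

```
# Block model training, keep search indexing and live retrieval open.

# OpenAI training crawler: blocked
User-agent: GPTBot
Disallow: /

# OpenAI search indexing: allowed, so ChatGPT search can cite you
User-agent: OAI-SearchBot
Allow: /

# Live page visits on behalf of a ChatGPT user: allowed
User-agent: ChatGPT-User
Allow: /

# Everything else unchanged
User-agent: *
Allow: /
```

Anthropic and Google publish equivalent tokens for their training/retrieval split, so the same pattern extends to them.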
In my sample, 88% allow all AI bots with no restrictions at all. Only 3 out of 50 block training bots, and one of those (TechCrunch) also blocks retrieval by mistake. Zero sites use AI-specific meta tags like noai. The "71% accidentally block everything" stat that gets passed around came from a BuzzStream study of major news publishers, not general sites. I'd rather show what I actually measured.
AI citations send less traffic than Google right now, but the clicks tend to be more qualified. Getting this balance right is worth the five minutes it takes to check.
Where things stand
Launched earlier this week. A handful of visitors, mostly me testing. One unknown user ran a scan on what looked like a test domain. The pipeline held up. Revenue: $0.
4 checks are free. Full 13-check report is $4.99. For context, SEO agencies charge $500+ for audits covering similar ground, and the cheapest monthly tools start around $50. Built with Next.js, TypeScript, and cheerio. Rule-based checks, no LLM anywhere. Reports get a permanent URL you can bookmark, and you can export them as text or PDF.
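To give a feel for what "rule-based, no LLM" means in practice, here's a hypothetical sketch of one check in the style the tool could use: detecting a noindex robots meta tag. The real checks run on cheerio; this version parses with a regex so it's dependency-free, and the `CheckResult` shape is made up for illustration.

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
  detail: string;
}

// Scan the raw HTML for <meta name="robots"> tags and flag any
// noindex directive, regardless of attribute order inside the tag.
function checkRobotsMeta(html: string): CheckResult {
  const metaTags = html.match(/<meta\s[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    const isRobots = /name\s*=\s*["']robots["']/i.test(tag);
    const content = tag.match(/content\s*=\s*["']([^"']*)["']/i)?.[1] ?? "";
    if (isRobots && /noindex/i.test(content)) {
      return {
        name: "robots-meta",
        passed: false,
        detail: `Blocking directive found: ${content}`,
      };
    }
  }
  return { name: "robots-meta", passed: true, detail: "No noindex directive" };
}
```

Every check is a pure function over the fetched HTML, which is what makes the reports deterministic and cheap to run.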
What I'd like to know
If you've dealt with "crawled, currently not indexed" on your own site, what ended up fixing it? And if you try scanning your URL, I'd like to hear whether the report matched what you already knew or caught something you missed.
This was actually a pretty interesting read.
I didn’t expect so many well-known sites to score zero for indexability. Especially ProductHunt and Framer.
The gap between AI crawler access and Google indexability is also surprising. It feels like a lot of teams are focusing on AI visibility now but forgetting the basics of search.
Curious if you think Google will start changing how indexing works now that AI crawlers are becoming more important?
Thanks, glad it landed.
On your question, I think the indexing changes are already happening, just not framed that way. The June 2025 core update was massive. Google pruned an estimated 15-20% of the index. The official line is "quality," but the timing lines up with AI search pulling traffic away from Google. A smaller, higher-quality index is cheaper to maintain and harder for AI competitors to scrape value from.
What I'd actually watch for:
Structured data mattering more. Google's been pushing JSON-LD for years, but now it feeds their AI Overviews directly. Sites with good schema markup give Google machine-readable answers without needing to parse the page. 26 out of 50 sites in my sample had zero structured data. That's a gap that's going to cost them.
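For anyone unfamiliar, this is roughly what that check is looking for: a JSON-LD block in the page head. The type and field values below are invented for illustration; schema.org defines the vocabulary.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What 50 sites taught me about indexing",
  "datePublished": "2025-06-15",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
</script>
```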
The "indexed but invisible" problem. You can be in the index and still lose traffic to AI Overviews. Being indexed used to mean you'd show up in results. Now it means Google might summarize your content and keep the click. That changes the calculus for a lot of site owners.
Google-Extended vs Googlebot. Google already split these. You can block Gemini training without affecting search indexing. Most sites haven't noticed yet. Only a handful in my sample made any distinction.
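That split is a two-line robots.txt change. Worth noting that Google-Extended is a control token, not a separate crawler; Googlebot still fetches the pages, Google just won't use them for Gemini training:

```
# Opt out of Gemini training without touching search indexing.
User-agent: Google-Extended
Disallow: /
```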
The short version: I think "indexing" as a concept is getting more layered. It used to be binary: you're in or you're out. Now there's in-but-summarized, in-but-deprioritized, crawled-but-rejected. The tools haven't caught up to those distinctions, which is basically why I built this.
What's your site? Happy to run a scan if you're curious how it scores.