We built hundreds of SEO landing pages with an AI agent. Here's what Google indexed and what it ignored.

At Inithouse, a studio shipping a growing portfolio of products in parallel, we run a product called Watching Agents. It deploys AI agents that track predictions about the future: each agent watches a question, builds hypotheses, collects evidence in real time, and alerts when something shifts.

Every public agent gets its own page. That means every prediction question becomes a potential search landing page, complete with structured data, evidence sources, probability scores, and FAQPage schema. We thought this was a clean programmatic SEO play. We were partly right.

Here's what we learned after several months of watching Google's crawler interact with hundreds of these pages.

The setup

Each public agent page on watchingagents.com follows the same template:

/agent/{slug}
  - H1: the prediction question
  - Probability score (updated by the agent)
  - Evidence panel: 3-8 sources with dates, snippets, relevance
  - Trend indicator (rising/falling/stable)
  - FAQPage schema (3-5 auto-generated Q&A pairs)
  - Internal links to related agents in the same topic cluster

We organized agents into topical clusters: AI and automation, climate and energy, geopolitics, crypto, health tech. Within each cluster, agents link to each other, creating a tight internal linking graph. The idea was to build a topical authority signal one cluster at a time.

What indexed fast

Pages with specific entities and dates in the question performed dramatically better in Google's crawl queue. A question like "Will NVIDIA hit $200 by Q3 2026?" got discovered and indexed within days. Same for "Will the EU AI Act enforcement trigger mass compliance spending in 2025?"

The pattern: named entities (companies, legislation, people) plus a time bound gave Google enough signal that this page contained something fresh and specific.
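
The time-bound half of that pattern is simple enough to approximate with a regex. The following is a minimal sketch under stated assumptions: the function name has_time_bound is hypothetical, and it only looks for a four-digit year between 2000 and 2099 or a quarter such as Q3. Detecting named entities needs more than pattern matching, so it is deliberately left out.

```shell
# Hypothetical pre-filter: succeeds when the question mentions a year
# (2000-2099) or a quarter (Q1-Q4) as a standalone token.
# It does NOT check for named entities.
has_time_bound() {
  printf '%s\n' "$1" \
    | grep -Eq '(^|[^0-9])20[0-9]{2}([^0-9]|$)|(^|[^A-Za-z0-9])Q[1-4]([^A-Za-z0-9]|$)'
}
```

A draft pipeline could call it as has_time_bound "$QUESTION" || echo "add a timeframe" before a question is considered for publishing.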

Pages in clusters with five or more interlinked agents also indexed faster than isolated pages. The topical cluster approach worked as expected here; Google treated the cluster as a coherent section of the site rather than a pile of orphan URLs.

What Google ignored

Generic, entity-free questions were found by Google but sat in "discovered, currently not indexed" limbo for weeks. Examples: "Will AI replace most jobs?" or "Is remote work going to stay?" These pages had the same template, the same schema, the same internal linking. But from Google's perspective, they added nothing that thousands of existing pages didn't already cover.

We also hit a painful technical issue. Our SPA (built on Lovable/React) had a meta tag rendering bug: some pages served "Loading..." as the title tag to Googlebot instead of the actual question. The crawler saw the page, read the title, and concluded there was nothing worth indexing. The indexation rate for affected pages dropped to single-digit percentages.

This wasn't a content problem. It was an infrastructure problem wearing a content mask. We spent weeks reviewing question quality and rewriting FAQPage schema before we realized the crawler was seeing a placeholder where the page title should have been.

# What Googlebot saw on affected pages:
<title>Loading...</title>
<meta name="description" content="">

# What users saw after JS hydration:
<title>Will NVIDIA hit $200 by Q3 2026? | Watching Agents</title>
<meta name="description" content="AI-tracked prediction with live evidence...">

That SPA rendering gap cost us months of potential indexation. We saw the same pattern across other products in our portfolio; at Be Recommended, our AI visibility reporting tool, a similar SPA meta issue left dozens of pages in "crawled, not indexed" state despite solid content.

The decision framework we use now

After this experience, we evaluate every new agent page against a simple checklist before marking it public:

Named entity? The question must reference a specific company, person, technology, or regulation. Generic "will X happen" without entities stays draft-only.
Time bound? Open-ended predictions without a date don't index well. We add explicit timeframes.
Cluster depth? We don't publish a page unless its topic cluster already has three or more indexed pages. Orphan pages sit in queue.
Title renders server-side? After the SPA bug, we added a pre-publish check that curls the page with a Googlebot user-agent string and verifies the title tag contains the actual question. If it returns "Loading..." or empty, the page doesn't go public.

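# Pre-publish indexability check (needs GNU grep for -P; set SLUG first).
# Exits non-zero when the title is missing, empty, or still "Loading...".
curl -sf -A "Mozilla/5.0 (compatible; Googlebot/2.1)" \
  "https://watchingagents.com/agent/$SLUG" \
  | grep -oP '<title[^>]*>\K[^<]+' \
  | grep -v '^Loading'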
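
The cluster-depth item in the checklist can be scripted in the same spirit. The sketch below assumes a hypothetical tab-separated export with one "slug<TAB>cluster" line per indexed page. Both that format and the function name ready_clusters are illustrative assumptions, not a description of any real tooling.

```shell
# Hypothetical input on stdin: one "slug<TAB>cluster" line per indexed page.
# Prints, in sorted order, the clusters that already have at least MIN
# indexed pages (MIN defaults to 3, matching the checklist threshold).
ready_clusters() {
  awk -F '\t' -v min="${1:-3}" '
    { count[$2]++ }
    END { for (c in count) if (count[c] >= min) print c }
  ' | sort
}
```

A new page would then be published only if its cluster appears in the output; anything else waits until the cluster catches up.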

Cross-portfolio learning

The indexation struggle at Watching Agents taught us something we've applied across the entire Inithouse portfolio. Programmatic SEO works, but only when two conditions hold simultaneously: the content must be genuinely unique (not template-with-rotating-noun), and the technical delivery must be bulletproof, with the real title and meta tags already present in the server-rendered HTML.

We measured a stark contrast. At Be Recommended, pages that passed the SSR check indexed at a much higher rate than those that didn't, holding content quality constant. The rendering layer matters as much as the content layer. Maybe more, because bad rendering is invisible to the team reviewing drafts in a browser (where JS hydration makes everything look fine).

This also changed how we think about scale. Our original plan was to push toward 500+ agent pages fast. We pulled back. Fewer pages that actually index beat a large sitemap that Google ignores. We now batch new agents in clusters of ten, wait for indexation confirmation, then proceed.
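
Batching like this is easy to script. A minimal sketch, assuming slugs arrive one per line and contain no spaces or quotes; batch_slugs is a hypothetical name, and waiting for indexation confirmation between batches stays a manual step here.

```shell
# Hypothetical helper: read slugs (one per line) on stdin and print them
# in batches of N (default 10), one space-separated batch per output line.
batch_slugs() {
  xargs -n "${1:-10}"
}
```

Each output line is then one publishing batch, to be released only after the previous batch shows up as indexed.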

What we'd do differently

If we started over:

SSR from day one. We'd either use a framework with built-in SSR or implement a prerendering layer before launching any public pages. The meta tag bug cost us more than any content decision.

Start with entity-dense clusters. Instead of spreading across AI, climate, crypto, and geopolitics simultaneously, we'd pick one cluster with high entity density (named companies, dated events) and go deep. Build an authority signal in one vertical before expanding.

Instrument crawl behavior early. We added GSC monitoring and server log analysis too late. By the time we noticed the "Loading..." title problem, it had been happening for weeks. A simple daily cron checking <title> tags via curl would have caught it on day one.
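
For illustration, such a daily check might look like the sketch below. It reuses the URL pattern and user-agent string from the pre-publish check; the function names and the idea of feeding it a slug list on stdin are assumptions. It inspects raw HTML only, so it catches the placeholder-title case but says nothing about what Google renders after executing JavaScript.

```shell
# Sketch of a daily title check. BASE_URL follows the article's URL pattern;
# the function names and the slug list on stdin are hypothetical.
BASE_URL="https://watchingagents.com/agent"

# Fetch raw HTML without executing JavaScript, as the pre-publish check does.
fetch_html() {
  curl -s -A "Mozilla/5.0 (compatible; Googlebot/2.1)" "$1"
}

# Extract the <title> text from HTML on stdin, even if it spans several lines.
# Leading whitespace inside the tag is dropped.
extract_title() {
  tr '\n' ' ' | sed -n 's/.*<title[^>]*>[[:space:]]*\([^<]*\)<\/title>.*/\1/p'
}

# Read slugs (one per line) on stdin; print "BAD <slug>" for every page whose
# title is missing, empty, or still starts with the "Loading" placeholder.
report_bad_titles() {
  while IFS= read -r slug; do
    [ -n "$slug" ] || continue
    title=$(fetch_html "$BASE_URL/$slug" | extract_title)
    case "$title" in
      ''|Loading*) printf 'BAD %s\n' "$slug" ;;
    esac
  done
}
```

Saved as a script that ends with report_bad_titles < slugs.txt, it could be scheduled with a crontab entry such as 0 6 * * * /path/to/check_titles.sh, where both the file name and the path are placeholders.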

Fewer pages, tighter quality gate. Programmatic SEO tempts you to generate volume. The pages that performed were the ones where the AI agent had genuinely differentiated evidence, not just a rephrased question and three generic news links.

Where we are now

We're still running Watching Agents and still publishing new agent pages, but at a slower, more deliberate pace. Each page goes through the pre-publish checklist. Clusters are growing organically rather than being force-filled.

The bet on public agents as a search distribution channel is intact. At Inithouse, we treat each product as a learning lab. What we learned about SPA rendering and programmatic SEO at Watching Agents has already saved us time at Be Recommended and across the portfolio.

Programmatic SEO with AI-generated pages still works. But "generate 500 pages and hope Google indexes them" does not. The pages that work are the ones built with the same care you'd give a hand-written blog post, just produced at a pace that only an AI agent can sustain.

If you're running your own programmatic SEO experiments, we're curious what your indexation rates look like. Are you seeing the same entity-density pattern?

watchingagents.com


Interesting write-up.

One thing I'd be careful with:

The technical issue was real, but I'd be hesitant to assume the indexing lesson is primarily a rendering lesson.

Sometimes fixing the obvious bottleneck creates a lot of confidence around a conclusion that hasn't actually been tested yet.

That's one of those decisions that can quietly shape the entire SEO strategy from here.

aryan_sinh · 2 days ago