
Show IH: I crawled 46 million websites to build a B2B lead tool - here's what I learned

90 days ago I decided to build something I personally needed as a growth person: a way to find companies by the software they use.

The idea is simple. If you're selling to e-commerce brands, you want a list of stores using Shopify. If you're an agency that migrates WordPress sites, you want WordPress sites. Today that kind of data is either locked behind expensive enterprise tools or doesn't exist at all.

So I built TechSpy.

What it does

You search by technology, get a list of real companies using it, filter, export.

That's it. No fluff.
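
To make the search, filter, export flow concrete, here is a minimal Python sketch. It is an illustration under assumptions, not TechSpy's actual code or API: the sample records, the field names (`domain`, `country`, `technologies`), and the functions `search` and `export_csv` are all hypothetical.

```python
import csv
import io

# Made-up sample records; the real product queries a crawled database.
COMPANIES = [
    {"domain": "example-store.com", "country": "US", "technologies": ["Shopify", "Klaviyo"]},
    {"domain": "example-blog.org", "country": "DE", "technologies": ["WordPress"]},
    {"domain": "example-shop.de", "country": "DE", "technologies": ["Shopify"]},
]


def search(records, technology, country=None):
    """Return records that use `technology`, optionally filtered by country."""
    results = []
    for record in records:
        if technology not in record["technologies"]:
            continue
        if country is not None and record["country"] != country:
            continue
        results.append(record)
    return results


def export_csv(rows):
    """Serialize matching records to CSV text, one company per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain", "country", "technologies"])
    for row in rows:
        writer.writerow([row["domain"], row["country"], ";".join(row["technologies"])])
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_csv(search(COMPANIES, "Shopify", country="DE")))
```

Joining the technologies with `;` keeps each company on a single CSV row, which is usually what a sales tool import expects.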

The build

I'm a solo founder. My "co-founder" was Claude - an AI coding agent. I used it for ~90% of the code. My job was architecture decisions, product sense, and saying no to features.

The hardest technical problem: scale. To make the product useful, I needed a lot of data. So I built a distributed web crawler:

  • 6 crawler workers across 2 servers (EU + US)
  • 2 Playwright workers for JS-heavy sites
  • PostgreSQL with ~46M domain records
  • Custom tech fingerprinting (1,300+ technologies detected)
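
For anyone unfamiliar with tech fingerprinting, the core idea can be sketched in a few lines of Python. This is a simplified illustration, not the crawler's real rule set: the `SIGNATURES` table and the `detect_technologies` function are assumptions, and a production fingerprinter would also look at response headers, cookies, and script contents.

```python
import re

# Hypothetical signature table: each technology maps to regex patterns
# that tend to appear in the HTML of sites using it. Illustrative only.
SIGNATURES = {
    "Shopify": [r"cdn\.shopify\.com"],
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
}


def detect_technologies(html):
    """Return the sorted names of technologies whose patterns match `html`."""
    found = []
    for tech, patterns in SIGNATURES.items():
        if any(re.search(pattern, html) for pattern in patterns):
            found.append(tech)
    return sorted(found)


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="https://cdn.shopify.com/s/files/theme.css">'
    print(detect_technologies(page))
```

Scaling this to 1,300+ technologies is mostly a matter of growing the signature table and keeping false positives down.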

The crawler runs 24/7 and adds ~130K enriched sites per week.

Where I am now

  • 46M domains indexed
  • 557,000+ active sites with technology data
  • 1,300+ technologies fingerprinted
  • Payments live (Stripe), free tier available
  • First paying users
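
A quick back-of-the-envelope check on those numbers, as a Python sketch. It only uses the figures quoted above and assumes the enrichment rate stays constant and that every indexed domain is a live, enrichable site, which is almost certainly too generous.

```python
# Figures quoted in the post.
DOMAINS_INDEXED = 46_000_000
ENRICHED_SITES = 557_000
ENRICHED_PER_WEEK = 130_000

# Share of indexed domains that already carry technology data.
coverage = ENRICHED_SITES / DOMAINS_INDEXED

# Weeks of crawling at the current rate that the enriched set represents.
weeks_of_crawling = ENRICHED_SITES / ENRICHED_PER_WEEK

# Weeks needed to enrich everything else (assumption: constant rate,
# and every remaining domain turns out to be an active site).
weeks_remaining = (DOMAINS_INDEXED - ENRICHED_SITES) / ENRICHED_PER_WEEK

if __name__ == "__main__":
    print(f"Coverage so far: {coverage:.2%}")
    print(f"Equivalent weeks of crawling: {weeks_of_crawling:.1f}")
    print(f"Weeks to enrich the rest: {weeks_remaining:.0f}")
```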

What I'm figuring out

Distribution is the hard part. The product works - I've validated it myself. Now I need to get it in front of SDRs and growth teams who would actually pay for it.

The channel I'm testing right now: industry research reports. Since the crawler collects real tech stack data, I can publish things like "we scanned the top 50 US furniture brands - here's which ecommerce platforms, email tools, and ad tech they actually use." I just published the first one. No idea yet if this is the right bet.

The plan is to do more of these across verticals - fashion, beauty, health, retail. But before going further I want to figure out the distribution side, because writing research no one reads is just expensive busywork.

Two questions for IH:

  1. How do you choose what to research and write about? Is it audience-first (find the community, then pick a topic they care about), or topic-first (find something interesting in the data, then find who'd care)?

  2. If you've used research/data reports for distribution - what actually moved the needle? Trade press outreach? Posting in industry communities? Reaching out directly to the brands you mentioned in the report?

techspy.pro

on May 20, 2026
  1.

    This is a strong B2B wedge because the product already has something most lead tools struggle to prove: proprietary data. Crawling 46M sites, fingerprinting 1,300+ technologies, and turning that into searchable buyer lists is not just “lead gen.” It is sales intelligence built around actual tech adoption signals.

    The distribution angle should probably lean harder into that. Reports are useful, but the real buyer trigger is not “interesting research.” It is “show me companies using X stack right now so I can sell to them before everyone else does.” SDRs, agencies, and growth teams care about timing, stack change, and buying context more than broad market reports.

    The one thing I would not ignore is the name. TechSpy explains the feature, but it also makes the product feel smaller and more scrappy than the data layer underneath. If this becomes a serious B2B intelligence product, the name needs to feel more enterprise-grade and less like a lightweight scraping tool. Beryxa.com would fit that direction much better because it sounds more like a durable data/intelligence platform, not just a tech lookup utility.

  2.

    I think your biggest challenge now is not the crawler or the data quality. It’s making the product feel tied to revenue instead of “interesting data.”

    Most sales teams won’t buy because you have 46M domains indexed.
    They’ll buy if they can say:
    “this helped us find better leads.”

    So I’d focus less on broad reports and more on highly actionable stuff.

    For example:
    “companies that recently switched from Magento to Shopify”
    is way more valuable than:
    “top ecommerce stacks in fashion.”

    One is curiosity content.
    The other is pipeline.

    I’d also narrow the positioning more.
    Right now it still sounds like a database product.
    But the stronger angle is probably:
    “find companies likely to buy based on infrastructure changes.”

    That’s a much easier sell to agencies and outbound teams.

    Also, I’d test integrations early.
    If this sits inside workflows people already use, it becomes much stickier than trying to get people to regularly visit another dashboard.
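
The "recently switched from Magento to Shopify" signal suggested above could, in principle, be derived by comparing two crawl snapshots. Here is a minimal Python sketch of that idea. The snapshot format (a dict mapping each domain to a set of detected technologies) and the `find_switches` function are assumptions for illustration, not something the post describes.

```python
def find_switches(previous, current, from_tech, to_tech):
    """Return domains that used `from_tech` before and now use `to_tech` instead.

    `previous` and `current` map domain -> set of detected technologies
    (a hypothetical snapshot format).
    """
    switched = []
    for domain, old_stack in previous.items():
        new_stack = current.get(domain, set())
        if from_tech in old_stack and to_tech in new_stack and from_tech not in new_stack:
            switched.append(domain)
    return sorted(switched)


if __name__ == "__main__":
    # Made-up snapshots from two crawl dates.
    last_month = {
        "example-a.com": {"Magento", "Klaviyo"},
        "example-b.com": {"Magento"},
        "example-c.com": {"Shopify"},
    }
    this_month = {
        "example-a.com": {"Shopify", "Klaviyo"},
        "example-b.com": {"Magento"},
        "example-c.com": {"Shopify"},
    }
    print(find_switches(last_month, this_month, "Magento", "Shopify"))
```

Requiring the old technology to be gone from the new snapshot avoids flagging sites that are merely mid-migration, though whether that is the right rule depends on how the leads will be used.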
