
How to scrape a JavaScript page client-side (aka how does Heap’s Event Visualizer do this)?

I’m interested in how it’s possible to display another webpage inside a React webpage and highlight parts of that inner page (e.g. overlay certain divs, or react on mouseover). As an example, here is a video of the Heap.io Event Visualizer, where this exact behaviour is happening: https://heap.io/blog/company/the-event-visualizer.

I don’t think I can use an iframe. Since I want the main page to be a regular webpage, I also don’t want to rely on browser extensions or on Puppeteer, which downloads a Chromium browser (~300MB).

What options do I have? In particular, which options might work for pages that require authentication? Assume every page I want to render inside the React component is a mixture of HTML & JS (i.e. none are pure HTML).

  1. Without looking into it deeply, I'd guess Heap is able to do that because the site owner has already embedded Heap's JS on their site, so Heap's JS can modify the page arbitrarily. Without cooperation from the site owner, you'd have to use something like a browser extension, Puppeteer, or maybe ScrapingBee.
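
To make the "embedded JS can modify the page" idea concrete, here is a rough sketch of what a script running inside the target page might do: outline whichever element the mouse is over and report a CSS selector for it. This is only a guess at the general approach, not Heap's actual code. The names `ElementLike`, `DocumentLike`, `selectorFor` and `installHighlighter` are made up for illustration, and the small interfaces stand in for the real DOM types so the logic can also run outside a browser.

```typescript
// Hypothetical sketch: none of these names come from Heap's code.

// Minimal structural stand-ins for the DOM types used below.
interface ElementLike {
  tagName: string;
  id: string;
  parentElement: ElementLike | null;
  children: ArrayLike<ElementLike>;
  style: { outline: string };
}

interface HoverEvent {
  target: unknown;
}

interface DocumentLike {
  addEventListener(type: string, listener: (event: HoverEvent) => void): void;
}

// Build a CSS selector for `el`, walking up until an ancestor with an id is found.
// Assumption: ids contain no characters that would need escaping in a selector.
function selectorFor(el: ElementLike): string {
  const parts: string[] = [];
  let node: ElementLike | null = el;
  while (node !== null) {
    if (node.id !== "") {
      parts.unshift(`#${node.id}`);
      break;
    }
    const tag = node.tagName.toLowerCase();
    const parent: ElementLike | null = node.parentElement;
    if (parent === null) {
      parts.unshift(tag);
    } else {
      // :nth-child is 1-based and counts all element siblings.
      const index = Array.from(parent.children).indexOf(node) + 1;
      parts.unshift(`${tag}:nth-child(${index})`);
    }
    node = parent;
  }
  return parts.join(" > ");
}

// Outline the hovered element and pass its selector to `onHover`.
function installHighlighter(
  doc: DocumentLike,
  onHover: (selector: string) => void
): void {
  let current: ElementLike | null = null;
  doc.addEventListener("mouseover", (event) => {
    const target = event.target as ElementLike | null;
    if (target === null || typeof target.tagName !== "string") return;
    // Clear the outline on the previously hovered element.
    if (current !== null) current.style.outline = "";
    target.style.outline = "2px solid orange";
    current = target;
    onHover(selectorFor(target));
  });
}
```

In a real page you would pass the browser's `document` (with a cast, since the interfaces above are simplified) and send the selector somewhere useful instead of a plain callback. The important part is that this only works because the script is already running inside the page, which is exactly the cooperation from the site owner mentioned above.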
