Developers February 20, 2021

Capturing a section of a web page

torpedo77

Is there a way to capture a section of a web page and have it update live? The idea is something like scraping the visual part of the page.

  1.

    It took me a month to build a tool to capture sections of web pages as screenshots.

    Here's the link for this tool:
    https://aiapi.me
    You can enter a URL there to take a screenshot directly.

    Alternatively, you can call the screenshot API from the command line like this:
    curl 'https://aiapi.me/api/v1/screenshot?url=indiehackers.com&clip_x=55&clip_y=50&clip_width=1200&clip_height=470'

    The command above captures a section of the Indie Hackers landing page.

    The clip_x, clip_y, clip_width, and clip_height parameters define the section of interest.
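As a sketch of how the clip parameters fit together, the TypeScript below assembles the same request URL as the curl example. The endpoint and parameter names come from that example; treating the four values as pixel offsets and sizes is an assumption, and the service itself may have changed since this thread was posted. The helper names are hypothetical.

```typescript
// Sketch: build the screenshot request shown in the curl example.
// Endpoint and parameter names are copied from that example; treating the
// clip values as pixel offsets/sizes is an assumption.
interface ClipRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function isWholeNumber(v: number): boolean {
  return isFinite(v) && Math.floor(v) === v;
}

function buildScreenshotUrl(target: string, clip: ClipRect): string {
  const values = [clip.x, clip.y, clip.width, clip.height];
  // Reject fractional or negative values before making a request.
  if (!values.every((v) => isWholeNumber(v) && v >= 0)) {
    throw new Error("clip values must be non-negative integers");
  }
  if (clip.width === 0 || clip.height === 0) {
    throw new Error("clip width and height must be greater than zero");
  }
  const url = new URL("https://aiapi.me/api/v1/screenshot");
  // searchParams percent-encodes the target if it contains ?, & or similar.
  url.searchParams.set("url", target);
  url.searchParams.set("clip_x", String(clip.x));
  url.searchParams.set("clip_y", String(clip.y));
  url.searchParams.set("clip_width", String(clip.width));
  url.searchParams.set("clip_height", String(clip.height));
  return url.toString();
}

// Same request as the curl example.
const requestUrl = buildScreenshotUrl("indiehackers.com", {
  x: 55,
  y: 50,
  width: 1200,
  height: 470,
});
console.log(requestUrl);
```

The resulting URL can be passed to curl or fetch as-is; validating the rectangle up front just saves a round trip for obviously bad input.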

  2.

    Thanks for all the help. So far I haven't found a tool that does the job, or at least not one that doesn't require a fair bit of effort to customize further.
    Another way to describe it: it's almost like "iframing" part of someone else's web page into your own page, but I thought iframes were kinda deprecated...

  3.

    Hey torpedo77,

    Based on the comments, you may find my article interesting:

    https://blog.hyper63.com/capturing-full-page-screenshots-with-puppeteer-and-arc-codes/

    You can also look at the npm module: https://www.npmjs.com/package/puppeteer-full-page-screenshot

    The code is straightforward, and it would be easy to build something similar adapted to your use case.

    Also, check out the Jimp npm module; it gives you a lot of image-processing tools:

    https://github.com/oliver-moran/jimp#readme

    Cheers
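The usual way to capture a page taller than the browser window is to scroll one viewport at a time, screenshot each position, and stitch the pieces together (an image library such as Jimp can do the stitching). I haven't verified that puppeteer-full-page-screenshot works exactly like this, so treat the following as a generic sketch: it only plans the scroll offsets and slice heights, assuming the page height and viewport height are known in pixels. The function name is hypothetical.

```typescript
// Generic scroll-and-stitch plan for a full-page screenshot.
interface Slice {
  scrollY: number; // where to scroll before taking this screenshot
  cropTop: number; // rows to discard from the top of this screenshot
  height: number;  // rows this slice contributes to the stitched image
}

function planSlices(pageHeight: number, viewportHeight: number): Slice[] {
  if (pageHeight <= 0 || viewportHeight <= 0) {
    throw new Error("heights must be positive");
  }
  const slices: Slice[] = [];
  for (let top = 0; top < pageHeight; top += viewportHeight) {
    const height = Math.min(viewportHeight, pageHeight - top);
    // The browser cannot scroll past pageHeight - viewportHeight, so the last
    // screenshot starts higher and its overlapping top rows are cropped off.
    const scrollY = Math.min(top, Math.max(0, pageHeight - viewportHeight));
    slices.push({ scrollY, cropTop: top - scrollY, height });
  }
  return slices;
}

console.log(planSlices(2500, 1000));
```

Stacking each slice's rows (after dropping cropTop rows from the top) reproduces the full page height with no duplicated rows.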

  4.

    What do you mean by "get a live update"? As others have mentioned, Puppeteer can take a screenshot of a specific element. If it's just a one-off, you can inspect the element in Chrome DevTools, right-click it, and choose "Capture node screenshot".

    1.

      Suppose I like a chart on a certain web page and I'd like to "copy" the chart into my own page. Can I use Puppeteer to do that? By "live update" I mean that the chart on my page updates visually whenever the original page updates.

      1.

        No, that is not something Puppeteer can do. You are probably not going to find a practical way to do this. Aside from the technical complexity (search for "CORS"), this isn't really an appropriate use of another site's data. If they explicitly allow such use, they probably already have a supported way to embed their content in other sites (e.g. a widget or iframe).

  5.

    I actually had an idea for a service like this last week. Seems useful.

    1.

      ... and probably unreliable, and may be illegal.

      1.

        Can you explain why you think that?

        1.

          Let's assume that you've worked out the screen-scraping technique by reviewing the HTML the target site produces. I've done this myself in the past. For example, I wanted to buy something from Best Buy that had been sold out for months, so I wrote a little screen scraper that checked the Best Buy site every 15 minutes or so and looked for the "Buy Now" button. If the button said "Out of stock", the app would move on, but if it saw "Buy Now", it would send me a text message and I'd jump on and buy. It worked fine until Best Buy changed the ID of the button in their code. No one told me, so my app wasn't finding the button for a good month before I realized it.

          And that is the crux of the problem: unless you're some sort of partner of the organization that owns the chart, or the organization knows you're scraping them and will alert you to any changes on their end, your scraping is unreliable at best, or you have to babysit it on a daily basis.

          The other aspect is that sites own their content and frown upon people or systems linking to their resources. Essentially you're using their bandwidth and features without paying for them or giving them credit, and they may well have a problem with that. It's similar to the gray area of jumping on your neighbor's unsecured Wi-Fi just because it's there instead of paying for your own. Shady.
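The Best Buy story points at a cheap mitigation: make the scraper treat "I can't find what I expected" as its own result and alert on it, rather than lumping it in with "out of stock". A minimal sketch follows, using plain string search to stay dependency-free (a real scraper would use an HTML parser). The button id is a hypothetical stand-in, and the function names are illustrative only.

```typescript
// Stock check that reports a changed page instead of silently
// treating it as "out of stock".
type StockStatus = "in_stock" | "out_of_stock" | "unrecognized";

// Hypothetical id: the real button id on any given site will differ and may change.
const BUTTON_ID = "add-to-cart-button";

function checkStock(html: string, buttonId: string = BUTTON_ID): StockStatus {
  const idPos = html.indexOf(`id="${buttonId}"`);
  if (idPos === -1) {
    return "unrecognized"; // the element we rely on is gone
  }
  // Read the text between the end of the opening tag and the next closing tag.
  const tagEnd = html.indexOf(">", idPos);
  if (tagEnd === -1) {
    return "unrecognized";
  }
  const closeTag = html.indexOf("</", tagEnd);
  if (closeTag === -1) {
    return "unrecognized";
  }
  const label = html.slice(tagEnd + 1, closeTag).trim().toLowerCase();
  if (label === "buy now") return "in_stock";
  if (label === "out of stock") return "out_of_stock";
  return "unrecognized"; // the button exists but its wording changed
}

// Notify on a buying opportunity and on anything the scraper can't parse,
// so a renamed button shows up within one polling cycle, not a month later.
function shouldNotify(result: StockStatus): boolean {
  return result !== "out_of_stock";
}

console.log(checkStock('<button id="add-to-cart-button">Out of stock</button>'));
```

This doesn't make scraping reliable; it only shortens the time between the site changing and you finding out.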

  6.

    A screenshot of a part of the page? Sure, you can use a headless browser that loads the page, takes a screenshot and saves it somewhere.

    Puppeteer and WebDriver are two tools you can use to control a headless browser on your server.

    1.

      Thanks, but how can a headless browser let the user first select the region of the page to capture/track?

      1.

        Something like this, off the top of my head:

        1. The client sends the URL to the server.
        2. The server takes a full-page screenshot with a headless browser and sends it back to the client.
        3. The client selects part of the image using a selection UI built with JavaScript.
        4. The client sends the coordinates of the selected area to the server.
        5. The server saves the coordinates, crops the image, and saves the result.
        6. The server starts a background job that periodically takes the screenshot and crops it, based on the saved URL and coordinates.

        But since I don't know your exact vision, you'll probably need to adjust some parts :)
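Steps 5 and 6 boil down to storing a URL plus a crop rectangle and re-running the capture on a schedule, which is also the closest you'll get to the "live update" asked about earlier in the thread. A rough sketch follows; CaptureJob, CaptureFn, runDueJobs, and the names in the timer comment are all hypothetical, and the actual browser work is left to the injected capture function.

```typescript
// Sketch of steps 5 and 6: stored capture jobs plus a scheduler tick.
interface CaptureJob {
  url: string;
  clip: { x: number; y: number; width: number; height: number };
  intervalMs: number;       // how often this region should be refreshed
  lastRunMs: number | null; // null until the first capture
}

// A real implementation might open the URL in a headless browser, call
// Puppeteer's page.screenshot({ clip }) and store the image; here it's injected.
type CaptureFn = (url: string, clip: CaptureJob["clip"]) => void;

// Runs every job whose interval has elapsed and returns how many ran.
function runDueJobs(jobs: CaptureJob[], nowMs: number, capture: CaptureFn): number {
  let ran = 0;
  for (const job of jobs) {
    const due = job.lastRunMs === null || nowMs - job.lastRunMs >= job.intervalMs;
    if (due) {
      capture(job.url, job.clip);
      job.lastRunMs = nowMs;
      ran++;
    }
  }
  return ran;
}

// In a server process you would call this on a timer, for example:
// setInterval(() => runDueJobs(savedJobs, Date.now(), captureWithBrowser), 60000);
```

Real captures are asynchronous, so a production version would also want to stop a slow capture from overlapping the next run of the same job; that is left out to keep the sketch short.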

      2.

        You could take a screenshot of the whole page and then let the user select which part of the screenshot they want to keep, either with a visual image editor or by giving your script the coordinates to keep (e.g. (0px, 0px) -> (256px, 256px)), and then use sharp to post-process the image server-side.
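Going from two corners like (0px, 0px) -> (256px, 256px) to the options object that sharp's extract() accepts (left, top, width, height) takes a little normalization, since users can drag in any direction or past the image edge. This sketch assumes the second corner is exclusive, so that example yields a 256 by 256 crop; the function and type names are hypothetical.

```typescript
// Normalize a user's two-corner selection into a sharp extract() region.
interface Corner {
  x: number;
  y: number;
}

// Shape of the options object passed to sharp's extract().
interface ExtractRegion {
  left: number;
  top: number;
  width: number;
  height: number;
}

function toExtractRegion(a: Corner, b: Corner, imageWidth: number, imageHeight: number): ExtractRegion {
  // Clamp both corners to the image so a drag that overshoots the edge still works.
  const clampX = (v: number) => Math.min(Math.max(Math.round(v), 0), imageWidth);
  const clampY = (v: number) => Math.min(Math.max(Math.round(v), 0), imageHeight);
  const x1 = clampX(a.x);
  const x2 = clampX(b.x);
  const y1 = clampY(a.y);
  const y2 = clampY(b.y);
  // The user may drag in any direction, so order the corners first.
  const left = Math.min(x1, x2);
  const top = Math.min(y1, y2);
  const width = Math.abs(x2 - x1);
  const height = Math.abs(y2 - y1);
  if (width === 0 || height === 0) {
    throw new Error("selection is empty after clamping to the image");
  }
  return { left, top, width, height };
}

console.log(toExtractRegion({ x: 0, y: 0 }, { x: 256, y: 256 }, 1280, 2000));
```

With sharp installed, the result would be used as sharp("full.png").extract(region).toFile("crop.png"); the file names are placeholders.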

        1.

          Thanks, I'm actually looking for a tool or online service that does this.

          1.

            There are services that take the screenshot; you can add the cropping on top.
            They're usually products built around UI/frontend testing...

            1.

              Also, depending on your needs, it might be more performant to recreate the visual yourself (if it's not totally arbitrary).

              I also think there could be a trick with an iframe and its viewport that might achieve the same thing.
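One version of that iframe trick: put the iframe inside a fixed-size wrapper with overflow: hidden and shift the iframe up and left so only the region of interest is visible. Because it is the live page, it updates on its own. The caveats are real, though: it only works if the site allows framing (X-Frame-Options or a CSP frame-ancestors directive can block it), the same-origin policy keeps you from reaching into the framed page, and any change to the site's layout moves the region. The helper below only generates the markup; its name and fields are hypothetical.

```typescript
// Generate markup that shows only one rectangle of another page via an iframe.
interface FrameCrop {
  src: string;        // page to embed
  x: number;          // left edge of the region to show, in CSS pixels
  y: number;          // top edge of the region to show
  width: number;      // visible width
  height: number;     // visible height
  pageWidth: number;  // width to render the embedded page at (affects its layout)
  pageHeight: number; // height to render it at; must be at least y + height
}

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

function croppedIframeHtml(c: FrameCrop): string {
  // The wrapper is the "window": only width x height pixels are visible.
  // The iframe is shifted up and left so the region of interest sits in it.
  return (
    `<div style="position:relative;overflow:hidden;width:${c.width}px;height:${c.height}px">` +
    `<iframe src="${escapeAttr(c.src)}" ` +
    `style="position:absolute;left:${-c.x}px;top:${-c.y}px;border:0;` +
    `width:${c.pageWidth}px;height:${c.pageHeight}px"></iframe>` +
    `</div>`
  );
}

console.log(
  croppedIframeHtml({ src: "https://example.com/chart", x: 55, y: 50, width: 1200, height: 470, pageWidth: 1400, pageHeight: 1200 })
);
```

If the site sends headers that forbid framing, the browser will show an empty or error frame and no markup change on your side will fix that.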
