
How I extracted an account's Twitter followers without paying for Twitter's API

TL;DR: I wasted hours and hours trying to be clever with Chrome extensions, ChatGPT, etc., and in the end spent a few minutes writing a script by hand.

Why do this?

I'm trying to ramp up cold outreach for https://criteria.sh and part of that is building up prospect lists.

Finding followers of large Twitter accounts who tweet about the problem your product solves is a good way to find people who are 1) in the space you're in and 2) actively trying to learn and improve their processes, practices and hopefully tooling.

Since my product is a collaborative API design platform, I'm looking for people already following API design thought leaders.

Failed approaches

I really thought this was a solved problem, and part of my reason for posting this to Indie Hackers is to check whether I missed something obvious and made life too complicated for myself.

  1. Chrome extensions

I already use the Snov.io extension to scrape contact data from LinkedIn. Naturally, I thought a similar thing might exist for Twitter.

The only ones I could find required payment to remove their limits.

Here's an example:
https://chrome.google.com/webstore/detail/twitter-follower-scraper/lddmglgbjpbjboepjaccfcpbjkjalngg

I didn't want to pay for something where I didn't know if it would work, especially since Twitter changed their API policy in March.

  2. Phantom Buster

https://phantombuster.com

This one actually looked promising, but sadly the Twitter Follow Collector "phantom" maxed out at 158 followers when it should have found over 1900.

It seems I'm not the only one:
https://www.reddit.com/r/phantombuster/comments/11n57yr/twitter_follower_collector_phantom_not_working/

  3. Scraping using code written by ChatGPT

So I figured that all these tools rely on the Twitter API and are now getting rate limited, which meant I'd have to scrape the web page itself.

I don't know how to scrape web pages so I asked ChatGPT to write me some code. Initially I was impressed as it seemed to include plausible XPath selectors for Twitter's HTML structure.

What I couldn't work out was how to get the Selenium driver to authenticate as me so it could access the page. Even iterating on the prompt a few times with this requirement and modifying the code accordingly didn't seem to work.
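
For reference, here's a minimal sketch of the kind of setup I was trying to get ChatGPT to produce, using the selenium-webdriver package for Node and reusing a Chrome profile that's already logged in to Twitter (the profile path and account URL are placeholders):

const { Builder } = require("selenium-webdriver");
const chrome = require("selenium-webdriver/chrome");

async function openFollowersPage() {
  // Point Chrome at an existing, already-authenticated profile instead of
  // trying to script the login flow. Placeholder path - use your own.
  const options = new chrome.Options().addArguments(
    "--user-data-dir=/path/to/your/chrome/profile"
  );
  const driver = await new Builder()
    .forBrowser("chrome")
    .setChromeOptions(options)
    .build();
  await driver.get("https://twitter.com/some_account/followers");
  return driver;
}

I never got this working reliably, so treat it as a starting point; note that Chrome generally can't open a profile directory that's already in use by another running instance, so close Chrome first.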

  4. Chrome extension scraper

There's an extension called Simplescraper, which can apparently turn websites into structured JSON by automatically inspecting the elements you click and constructing appropriate selectors for them.

The infinite scroll mode didn't work with Twitter's implementation, and in any case I couldn't figure out how to extract the data I wanted cleanly - too many false positives.

What worked

From the ChatGPT code I got a sense of how to target the elements I wanted, but I used the element inspector to make sure. Then I wrote some JS functions in the console to extract the handle, name and bio using native Web APIs.

This code builds a map of each follower's name and bio, keyed by their handle:

// Find every follower card on the page; Twitter tags each one with
// data-testid="UserCell".
function getCellElements() {
  return [...document.getElementsByTagName("div")].filter((element) => {
    return element.getAttribute("data-testid") === "UserCell";
  });
}

function currentEntries() {
  return Object.fromEntries(
    getCellElements().map((element) => {
      // Links with this (generated) class string are the display name and
      // the handle, in that order. Verify the string in the element
      // inspector first - Twitter's generated class names change over time.
      const linkTexts = [...element.getElementsByTagName("a")]
        .filter(
          (linkElement) =>
            linkElement.getAttribute("class") ===
            "css-4rbku5 css-18t94o4 css-1dbjc4n r-1loqt21 r-1wbh5a2 r-dnmrzs r-1ny4l3l"
        )
        .map((linkElement) => linkElement.text);
      // The div with this class string holds the bio text.
      const divTexts = [...element.getElementsByTagName("div")]
        .filter(
          (divElement) =>
            divElement.getAttribute("class") ===
            "css-901oao r-18jsvk2 r-37j5jr r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-1h8ys4a r-1jeg54m r-qvutc0"
        )
        .map((divElement) => divElement.innerText);
      // Key each entry by handle, storing the name and bio.
      return [linkTexts[1], { name: linkTexts[0], bio: divTexts[0] }];
    })
  );
}
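
A caveat on the code above: the long generated class strings are the fragile part, since Twitter rotates them between deployments. If they've changed by the time you try this, here's an untested sketch that recovers at least the handles without any class names, assuming the first link in each cell points at the follower's profile (/handle):

// Untested sketch: extract handles without relying on generated class
// names, assuming each UserCell's first link has href="/handle".
function currentHandles() {
  return [...document.querySelectorAll('div[data-testid="UserCell"]')].map(
    (cell) => {
      const href = cell.querySelector("a")?.getAttribute("href") ?? "";
      return "@" + href.slice(1); // "/jack" -> "@jack"
    }
  );
}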

This only extracts what's currently rendered on the page - Twitter virtualises the list for scroll performance, so only a subset of the followers is in the DOM at any moment.

This code adds what's currently loaded to a global variable every second.

// Accumulates every follower seen so far, keyed by handle.
const allFollowers = {};

// Merge whatever is currently rendered into the running total. Handles act
// as unique keys, so cells seen twice are overwritten, not duplicated.
function addToAllFollowers() {
  const entries = currentEntries();
  Object.assign(allFollowers, entries);
  console.log(
    `Saw ${Object.keys(entries).length} on screen; total so far is ${
      Object.keys(allFollowers).length
    }.`
  );
}

setInterval(addToAllFollowers, 1000);

Then I just scrolled normally to the bottom and watched the logs until the number stopped going up.
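
If you'd rather not scroll by hand, an untested sketch like this should do it for you, stopping once the page height stops growing:

// Untested sketch: auto-scroll to the bottom of the followers list.
// Stops once the page height has been stable for 5 consecutive ticks,
// i.e. no more followers are being loaded.
let lastHeight = 0;
let stableTicks = 0;
const scroller = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight);
  if (document.body.scrollHeight === lastHeight) {
    stableTicks += 1;
    if (stableTicks >= 5) clearInterval(scroller);
  } else {
    stableTicks = 0;
    lastHeight = document.body.scrollHeight;
  }
}, 1500);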

Finally, this code writes everything out as one giant tab-separated string, which I copied over to Google Sheets:

// One row per follower: handle, name, bio, separated by tabs. Tabs and
// newlines inside the values are flattened to spaces so they don't break rows.
console.log(
  Object.entries(allFollowers)
    .map(
      ([key, { name, bio }]) =>
        `${key}\t${name.replaceAll("\t", " ")}\t${
          bio ? bio.replaceAll("\n", " ").replaceAll("\t", " ") : ""
        }`
    )
    .join("\n")
);
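
If the console truncates the output, Chrome DevTools also has a copy() console utility (not a standard Web API) that puts the value straight on the clipboard:

// Same expression as above, but copied to the clipboard via Chrome
// DevTools' copy() console utility instead of logged.
copy(
  Object.entries(allFollowers)
    .map(
      ([key, { name, bio }]) =>
        `${key}\t${name.replaceAll("\t", " ")}\t${
          bio ? bio.replaceAll("\n", " ").replaceAll("\t", " ") : ""
        }`
    )
    .join("\n")
);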

Conclusion

The approach that worked best was doing it the completely manual way without any outside tools.

This was very surprising to me, given the amount of hyped tech there is out there now.

Posted to Growth on April 19, 2023

Comments

    This is awesome. Did you ever add on to this? I am working on something similar and would love to get your insight.
