Photo of Channing Allen

The web is rapidly becoming more closed in response to AI. A new report from the Data Provenance Initiative finds that a significant number of organizations feel threatened by generative AI and are taking measures to wall off their content. But this trend won't stop at limiting big companies like OpenAI, according to Shayne Longpre, the report's lead author:

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics, and noncommercial entities.”

In other words, this affects everyone. Founders whose products rely on crawling or scraping other websites should be on notice.

Read Longpre's full interview here.

  1. 4

    Yeah, I've noticed this ever since Twitter and Reddit started charging exorbitant amounts of money for their APIs. I totally get it, but it's also frustrating too that a lot of the open data on the internet will just cost more money now going forward.

    1. 1

      it's also frustrating too that a lot of the open data on the internet will just cost more money now going forward

      Totally. But usually big systemic changes like this follow patterns of creative destruction where old good things are lost but new good things are found. In this case, AI technology will probably 100x the net amount of open data on the internet by amplifying everybody's ability to produce and distribute code and content.

  2. 3

    It's been a while since I've been bullish on Twitter, but they're sitting on a particularly amazing source of data, with so many millions of short conversations happening every day. And they're investing heavily in their AI, too. Might be rosy for them going forward.

    1. 1

      they're sitting on a particularly amazing source of data, with so many millions of short conversations happening every day.

      What makes this especially potent is that not all conversations are created equal. The short convos happening on X are much higher signal than elsewhere insofar as they 1) involve a disproportionately large number of the world's most influential people, and 2) represent the original source of news, memes, and movements, which then flow downstream to other platforms and publications, like Google and news websites, which generally publish secondhand reports.

  3. 1

    Yeah, I've noticed this ever since Twitter and Reddit started charging exorbitant amounts of money for their APIs. I totally get it, but it's also frustrating too that a lot of the open data on the internet will just cost more money now going forward.

  4. 1

    While web scraping is legal, I don't see any significant change in the trend.
