Company

August 8, 2024

60,000 AI-generated news articles are published every day

Bradley Emi

CTO Pangram Labs

Pangram Labs found 6.96% of 857,434 news articles from July 1, 2024, were AI-generated, underscoring the need for better detection to prevent fraud and disinformation.

At NewsCatcher, we understand the importance of maintaining the integrity of news in a time of increased inauthentic content. We were thrilled when Pangram Labs reached out proposing a collaboration to shed light on the prevalence of AI-generated content and its impact on the news industry.

Read about their experiment setup using NewsCatcher and their findings below.

Overview

News is a $150BN industry employing thousands of reporters and journalists, whose articles receive billions of views. With the rise of large language models, many lower-quality news sites, and some bad actors, have leaned on AI to generate content cheaply, quickly, and at scale. Because AI cannot fill a journalist's role, these sites are limited to repeating information from the model's training data or stealing and rephrasing other outlets' articles.

Inauthentic content has also been shown to be less desirable to online readers and less visited by them. In a recent blog post, we cited research by NP Digital which found that online readers preferred and prioritized human-written articles. Specifically:

Readers spent 93% more time on page with human-written content than on purely AI-generated content.
Readers were 3.6x more likely on average to visit human articles than those that were AI generated.

These AI publications exist mainly to siphon traffic and potential ad revenue away from authentic news content. They are part of a growing content farming operation that captured 21% of ad impressions and more than $10BN in 2023.

Given the threat and potential damage posed by this rise of inauthentic news, we wanted to quantify the actual scale of the problem. We collaborated with NewsCatcher to classify one day's sample of globally published news.

Experiment Setup

We began by compiling a collection of the news published worldwide on July 1, 2024.

NewsCatcher’s API is the most exhaustive source of daily global news articles, covering over 75,000 sources and serving large enterprise organizations. Their technology allowed us to query the full text of articles published around the world, written in different languages and covering a broad range of topics.

Using NewsCatcher, we collected the news published on that one day. From this data dump, we analyzed 857,434 articles from 26,675 online publishers, which we treat as a representative sample of daily published news.

Detection Approach

After sourcing the articles, we ran our Pangram Text classifier to determine which articles were AI-generated. Pangram Text is the industry leader in classification accuracy (over 30x more accurate than the next leading commercial solution), with a strong commitment to a low false positive rate. In our technical report, we show that our false positive rate on news is only 0.001%, which lets us be confident that an article we predict as AI truly is AI-generated. Our classifier takes in a document or a piece of text and returns a prediction of the likelihood that it was generated by an LLM. For a web page, we would normally have to post-process and clean the page’s content to isolate the article text, but with NewsCatcher we were able to pull the cleaned text directly and run inference with our text classifier.
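To put that false positive rate in perspective, here is a rough back-of-the-envelope sketch. It assumes the 0.001% rate from the technical report applies uniformly to this dataset, and it takes the worst case in which every one of the 857,434 articles was human-written. The function name is ours, chosen for illustration only.

```python
def expected_false_positives(human_articles: int, false_positive_rate: float) -> float:
    """Expected number of human-written articles wrongly flagged as AI."""
    return human_articles * false_positive_rate


# Figures quoted in this article.
TOTAL_ARTICLES = 857_434
FALSE_POSITIVE_RATE = 0.001 / 100  # 0.001% written as a fraction

# Worst case (assumption): treat every article in the sample as human-written.
upper_bound = expected_false_positives(TOTAL_ARTICLES, FALSE_POSITIVE_RATE)
print(f"Expected false positives, at most: {upper_bound:.1f}")
```

Under those assumptions, fewer than ten articles in the whole sample would be expected to be flagged by mistake.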

Distribution of our predictions on a log scale. We use a log scale to show that predictions near 0 or 1 are 100-1000x more common than predictions in the middle of the spectrum.
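As a small illustration of why a log scale helps here, the sketch below bins classifier scores into equal-width buckets and reports both the raw count and its base-10 logarithm. The scores are synthetic and made up for the example, not our actual predictions, and the helper name is hypothetical.

```python
import math


def log_histogram(scores, bins=10):
    """Bin scores in [0, 1] into equal-width bins; return (count, log10(count)) per bin."""
    counts = [0] * bins
    for score in scores:
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    # log10 is undefined for empty bins, so report None there.
    return [(c, math.log10(c) if c > 0 else None) for c in counts]


# Synthetic scores (assumption): most predictions sit near 0 or 1.
scores = [0.01] * 1000 + [0.99] * 1000 + [0.45]
for i, (count, log_count) in enumerate(log_histogram(scores)):
    print(f"bin {i}: count={count}, log10={log_count}")
```

On a linear axis the single mid-range prediction would be invisible next to the bins holding a thousand each; on a log axis the two differ by three units and both remain readable.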

We then aggregated each publisher's articles and bucketed publishers by the share of their articles labeled as AI. The bucketing framework is as follows:

If a publisher had less than 10% of their articles labeled as AI, that publisher would be considered a human publisher
If a publisher had between 10% and 50% of their articles labeled as AI, that publisher would be considered a minor AI publisher
If a publisher had between 50% and 80% of their articles labeled as AI, that publisher would be considered a major AI publisher
If a publisher had over 80% of their articles labeled as AI, that publisher would be considered a fully AI-generated publisher
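The bucketing rules above can be sketched as a small function. This is a minimal illustration rather than our production code: the function name and label strings are hypothetical, and the handling of the exact 50% and 80% boundaries (assigned to the lower bucket) is an assumption, since the rules as stated leave those edges open.

```python
def classify_publisher(ai_articles: int, total_articles: int) -> str:
    """Bucket a publisher by the share of its articles labeled as AI."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    if not 0 <= ai_articles <= total_articles:
        raise ValueError("ai_articles must be between 0 and total_articles")

    # Compare in integer arithmetic to avoid floating-point edge cases.
    scaled = 100 * ai_articles
    if scaled < 10 * total_articles:
        return "human"
    if scaled <= 50 * total_articles:  # assumption: exactly 50% counts as minor
        return "minor AI"
    if scaled <= 80 * total_articles:  # "over 80%" is fully AI, so exactly 80% is major
        return "major AI"
    return "fully AI-generated"


print(classify_publisher(3, 100))   # human
print(classify_publisher(25, 100))  # minor AI
print(classify_publisher(60, 100))  # major AI
print(classify_publisher(95, 100))  # fully AI-generated
```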

Aggregate Statistics

Of the total articles sampled, we found that:

59,653 articles were classified as AI, representing 6.96% of the article set.
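For readers who want to check the arithmetic, the short snippet below reproduces the percentage from the two counts reported in this article. The variable names are ours.

```python
# Counts reported in this article.
total_articles = 857_434
ai_articles = 59_653

# Share of the sample classified as AI-generated.
share = ai_articles / total_articles
print(f"{share:.2%}")  # 6.96%

# Rounded to the nearest thousand, this is the headline figure.
print(round(ai_articles, -3))  # 60000
```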

The breakdown of online publishers

Publishers organized by how much AI content they publish

We then looked at the AI classifications across key features, including the language the article was written in, the country where it was published, and the topic it covered, as well as any special political relevance.

Countries with the highest frequency of AI articles (minimum 100 articles)

Graph of AI articles produced by country (percentage of total news articles written by country)

We notice in general that Ghana is a rather strong outlier in terms of AI-generated content. While the overall frequency is lower, India is also a major publisher of AI-generated content, which should not be surprising given the impact of deepfakes on the recent Indian election.
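A per-group breakdown like the one above can be computed roughly as follows. This is a sketch under assumptions: the record layout (a dict with a group field and an `is_ai` flag), the function name, and the toy data are all hypothetical, and the placeholder codes "AA", "BB", and "CC" do not refer to real countries. The `min_articles` default mirrors the 100-article minimum used for the country chart.

```python
from collections import Counter


def ai_frequency_by_group(records, field, min_articles=100):
    """Return (group, AI share) pairs, highest share first, for groups with enough articles."""
    totals = Counter()
    flagged = Counter()
    for record in records:
        group = record[field]
        totals[group] += 1
        if record["is_ai"]:
            flagged[group] += 1
    # Drop small groups so a handful of articles cannot dominate the ranking.
    rates = {g: flagged[g] / n for g, n in totals.items() if n >= min_articles}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only (not real results).
records = (
    [{"country": "AA", "is_ai": i < 30} for i in range(100)]
    + [{"country": "BB", "is_ai": i < 5} for i in range(200)]
    + [{"country": "CC", "is_ai": True} for _ in range(10)]
)
for country, rate in ai_frequency_by_group(records, "country"):
    print(f"{country}: {rate:.1%}")
```

In the toy data, "CC" has a 100% AI share but only ten articles, so it is excluded. The same function would work for topic or language by passing a different field name.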

AI Frequency by Topic

Graph of AI articles produced by topic (percentage of total news articles written on each topic)

We notice that beauty (sponsored articles) and tech and business (crypto scams) are topics with especially high shares of AI articles. Somewhat surprisingly, politics tends to be lower than average when it comes to AI articles: we think this is because advertisers tend to avoid political news sites due to brand safety risks, lowering the incentive for publishers to produce made-for-advertising political content.

What does AI “news” look like?

We identify several categories of AI news articles: made-for-advertising sites (MFAs), sponsored articles, fraud, and disinformation.

Made for Advertising

A site whose only purpose is to serve ads rather than deliver legitimate content is called an “MFA”: a made-for-advertising site. Here’s an example of an MFA:

As we can see, above the fold of the website, there is no actual content other than the title, and there are 8 display ads clamoring for the user’s attention. The AI content below is not really meant to be read: it is just there to attract visitors to the site and soak up ad revenue before they bounce, which they typically do immediately. Advertisers are often not even aware that they are advertising on these sites: the programmatic nature of digital advertising means that this ad real estate is bought and sold in a matter of milliseconds by automated bidding algorithms. Companies like Jounce Media help advertisers avoid wasting their budget on sites like this, and are part of a group of companies called “Supply Chain Optimizers”.

Jounce defines three key characteristics of an MFA:

Paid Traffic: sites that have little to no organic audience and are reliant on visits from clickbait ads from other sites.
Aggressive Monetization: Through high ad load and rapidly auto-refreshing placements, these publishers capture an arbitrage opportunity through the bidding markets but at the cost of a hostile user experience.
Superficial KPIs: These sites score highly on vanity metrics like viewability and video completion rates, but Jounce’s research shows that ads on MFAs do not actually affect buyers’ purchasing decisions.

To summarize, MFAs steal ad traffic from sites with legitimate content in order to offer advertising inventory cheaply. They deliver vanity metrics to programmatic ad campaigns without providing any useful content or any actual ROI for advertisers. They litter the internet and make for a hostile user experience for the average internet consumer.

Although there is no concrete metric that defines an MFA, we estimate that MFAs make up about 50% of the AI-generated content online.

Paid-For/Sponsored Content

Some news on the Internet can be bought as a means to advertise a product, while masquerading as actual content written by an influencer or a legitimate review publication. We noticed that beauty was one of the topics with the highest frequency of AI-generated content. When we dug into the data, we found that many of the “news” articles under the beauty topic are simply sponsored articles like this one:

AI wrote this low-quality sponsored content

Many copywriters are resorting to AI to write these low-quality sponsored articles, because the goal is to sell the placement rather than to produce an authentic review.

Scams

Crypto scammers use AI to pump out content at a high velocity

We notice a lot of run-of-the-mill scam campaigns generated with AI as well. In particular, crypto scams seem to be very commonplace, and are even promoted on reputable sites such as Medium.

Disinformation

While we find that AI is typically less prevalent in political news (in large part because many advertisers avoid political news due to brand safety risk), AI is a growing component of disinformation campaigns. NewsGuard has an AI tracking center that provides detailed, up-to-date tracking of AI-enabled disinformation.

Unlike the other forms of deception that we see bad actors using AI for, the point of these articles is actually to get people to read the content. Typically, the purpose of these campaigns is to change public sentiment or opinion on a particular topic.

As the US election approaches in November, we can only expect this kind of AI abuse to continue.

Summary

About 7% of the world’s daily news as of July 2024 is likely generated by AI.
West Africa and South Asia are outliers when it comes to the amount of AI content published.
Beauty, tech, and business have the highest proportion of AI content, while politics and opinion have the lowest.
AI content is usually associated with some kind of malintent or deceptive behavior. MFAs attempt to deceive advertisers into believing that low-quality ad space is actually premium. Sponsored content is not necessarily deceptive, but it is not genuinely authentic either and should not be mistaken for a real consumer review. Scams and disinformation genuinely threaten Internet users, and the potential harm these sites cause is obvious.

Want to learn more about our map of AI content across the web, or our AI blocklist for advertisers? Get in touch at info@pangramlabs.com!

