100k+ Rows Topic Labeled News Dataset - NewsCatcher

All `topics` have 15k articles except for SCIENCE, which has 3,774. The articles were published by thousands of different news websites.

Find the dataset on:

We're the NewsCatcher team: we collect and index news articles, and we provide a News API for finding relevant news data.

We contribute a lot to the open-source community by sharing our work (find other links at the bottom of the description).

Dataset

We collected over 100k articles (108,774 in total) across 8 different news topics:

topic          |  articles
BUSINESS       |     15000
ENTERTAINMENT  |     15000
HEALTH         |     15000
NATION         |     15000
SCIENCE        |      3774
SPORTS         |     15000
TECHNOLOGY     |     15000
WORLD          |     15000

The articles were published during the first half of August 2020.
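To check the per-topic counts yourself, a short script along the following lines may help. It is only a sketch under assumptions: the column name `topic`, the `;` delimiter, and the helper `count_topics` are illustrative and not confirmed by this post, so adjust them to match the file you download. The rows in the sample are made up so that the example runs without the real file.

```python
import csv
import io
from collections import Counter


def count_topics(csv_file, topic_column="topic", delimiter=";"):
    """Count rows per topic label in an open CSV file.

    `topic_column` and `delimiter` are assumptions about the file layout.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return Counter(row[topic_column] for row in reader)


# Tiny in-memory sample standing in for the real file (made-up rows).
sample = io.StringIO(
    "topic;title\n"
    "SCIENCE;Example headline one\n"
    "SPORTS;Example headline two\n"
    "SCIENCE;Example headline three\n"
)

counts = count_topics(sample)

# Print the counts in the same "TOPIC | count" layout as the table above.
for topic, n in counts.most_common():
    print(f"{topic:<15}| {n:>9}")
```

To run it against the downloaded dataset, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is wherever you saved the CSV.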


Other Useful Links

newscatcher Python package - Programmatically collect normalized news from (almost) any website.

pygooglenews - If Google News had a Python library

Support Us

The best thing you can do for us is to let people know about our News API.

Need a bigger dataset?

Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com
