Discover how Transparency International uses the NewsCatcher API to uncover corruption cases from thousands of news articles.

One of the significant uses of news data is to track events of particular interest. For example, venture capitalists and financial investors can use the news to monitor market events or receive updates about their portfolio items. Administrators can track events in the news to study and act on issues of public interest, such as crimes and infrastructure lapses.

Transparency International UN is a global coalition fighting against corruption. They work with stakeholders such as policymakers and the public to strengthen anti-corruption laws and enable the reporting of corruption. A significant product of their work is the Corruption Perceptions Index (CPI), which ranks 180 countries by their perceived levels of public sector corruption. Media monitoring is a key component of the research behind their work, and they use the NewsCatcher Events API to detect events pertaining to ‘acts of corruption’ from the hundreds of thousands of news articles published every day.

Extracting event information from news data involves using large language models (LLMs) and natural language processing techniques to turn a free-text news article into structured data. In the context of corruption, this means extracting specific fields such as the accused parties, the timeline of the event, and the amounts involved. With NewsCatcher, all this extraction work is done on our end, and the results are readily available via an API.

In this tutorial, let’s look at how we can detect events from news articles using NewsCatcher’s Events Intelligence API. We’ll be detecting ‘acts of corruption’ events as an example. All the code snippets are presented in Python, but because the API is a REST API, the same requests can be made from any programming language.

Basic Setup

For this tutorial, we’ll need the following:

  • NewsCatcher API Endpoint: Base URL for all API requests
  • NewsCatcher API Key: For authentication, using the x-api-token HTTP header
  • A Python (3.x) installation with the requests library installed

Let’s put these at the start of the code:
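A minimal sketch of such a setup is shown here. The base URL is a placeholder, and the environment variable names and the pretty helper are hypothetical names chosen for illustration; only the x-api-token header name comes from the list above.

```python
import json
import os

# Assumption: both values are read from environment variables with these
# hypothetical names. The placeholder URL must be replaced with the base URL
# that comes with your subscription.
API_BASE_URL = os.environ.get("NEWSCATCHER_API_URL", "https://your-api-host.example")
API_KEY = os.environ.get("NEWSCATCHER_API_KEY", "YOUR_API_KEY")

# The API key travels in the x-api-token HTTP header on every request.
HEADERS = {"x-api-token": API_KEY}


def pretty(data):
    """Return a JSON-compatible object as an indented string for printing."""
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Reading the key from the environment keeps it out of the source file, which matters once the script is shared or committed.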

We also imported the built-in json module to pretty-print the JSON outputs.

Getting Subscription Summary

NewsCatcher provides a subscription summary endpoint for conveniently viewing the status of our subscription. The summary includes, among other things, the number of API calls assigned and remaining and the event types available to us. To call this endpoint, we need to send a GET request to /api/subscription/:
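One way to make this call is sketched below. The function name, the timeout value, and the optional session argument are illustrative choices rather than part of the API; the endpoint path and the x-api-token header are the ones described in this tutorial.

```python
def get_subscription(base_url, api_key, session=None):
    """GET /api/subscription/ and return the decoded JSON summary.

    `session` is any object with a requests-style `get` method. When it is
    omitted, the requests library itself is used.
    """
    if session is None:
        import requests  # third-party: pip install requests
        session = requests
    response = session.get(
        base_url.rstrip("/") + "/api/subscription/",
        headers={"x-api-token": api_key},
        timeout=30,  # illustrative value, in seconds
    )
    response.raise_for_status()  # raise on HTTP error status codes
    return response.json()
```

Passing the result to json.dumps with indent=2 and printing it shows the summary in a readable form.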

This code prints the below output resulting from the API call:

In the output, we can see the various fields related to our subscription, such as whether it is active, the rate limit, and the number of calls assigned and remaining. There is also an additional_info field, which lists the allowed_events we have access to, and this list includes acts_of_corruption. We’ll use this event type in the following steps to get the events from the API.
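Before querying, it can be useful to check the allowed event types in code. The helper below is a small sketch that assumes allowed_events is a list nested directly under additional_info in the summary; that nesting is inferred from the description above and may differ in the real response.

```python
def allowed_events(summary):
    """Return the event types listed in a subscription summary.

    Assumption: the summary is a dict in which "allowed_events" is a list
    nested under "additional_info". Missing keys yield an empty list.
    """
    additional_info = summary.get("additional_info") or {}
    return list(additional_info.get("allowed_events") or [])
```

A membership test against the returned list then confirms that the event type we plan to query is enabled for our key.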

Getting Event Fields

Before we get the events, we can use another NewsCatcher endpoint to see which fields the API provides for each event. This will help us structure our event search calls better. We can check which fields are available for our event type by sending a GET request to /api/events_info/get_event_fields. We’ll pass the event_type as a query parameter with the value act_of_corruption. Let’s look at the code:
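A sketch of this request follows. The function name, timeout, and optional session argument are illustrative; the path, the event_type query parameter, and the header name are taken from this tutorial.

```python
def get_event_fields(base_url, api_key, event_type, session=None):
    """GET the fields available for one event type and return the JSON.

    `session` is any object with a requests-style `get` method. When it is
    omitted, the requests library itself is used.
    """
    if session is None:
        import requests  # third-party: pip install requests
        session = requests
    response = session.get(
        base_url.rstrip("/") + "/api/events_info/get_event_fields",
        headers={"x-api-token": api_key},
        params={"event_type": event_type},  # sent as ?event_type=...
        timeout=30,  # illustrative value, in seconds
    )
    response.raise_for_status()
    return response.json()
```

The event_type argument should be the exact string the API expects for our subscription, for example the act_of_corruption value used above.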

This gives us the following output:

From the above output, we see that 16 fields are available. These include various structured data fields such as event date, extraction date, monetary amounts involved, and accused and victim parties. We use these fields to search and filter events when we use the event search API. We also see some usage examples in the outputs, and for date fields, we see that we can use convenient strings such as now and now-1d instead of providing the exact date strings.
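The relative date strings can be generated rather than typed by hand. The helper below is a sketch; only now and now-1d appear in this tutorial, so extending the pattern to other day counts is an assumption to verify against the usage examples in the fields output.

```python
def days_ago(days):
    """Return a relative date string: "now" for 0, otherwise "now-<days>d".

    Assumption: the API accepts any whole number of days in this pattern,
    not only the "now-1d" example shown in the fields output.
    """
    if days < 0:
        raise ValueError("days must be zero or positive")
    return "now" if days == 0 else f"now-{days}d"
```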

Searching for Events

Finally, let’s look at how to search for events. For this, we’ll send POST requests to /api/events_search/ with our search parameters as the JSON body.

A Basic Query
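A basic query might be sketched as follows. The endpoint and header come from this tutorial, but the event_type key in the request body is an assumption modelled on the query parameter of the fields endpoint, and the function name and timeout are illustrative.

```python
def search_events(base_url, api_key, payload, session=None):
    """POST the search parameters as a JSON body to /api/events_search/.

    `session` is any object with a requests-style `post` method. When it is
    omitted, the requests library itself is used.
    """
    if session is None:
        import requests  # third-party: pip install requests
        session = requests
    response = session.post(
        base_url.rstrip("/") + "/api/events_search/",
        headers={"x-api-token": api_key},
        json=payload,  # serialized to a JSON request body
        timeout=30,  # illustrative value, in seconds
    )
    response.raise_for_status()
    return response.json()


# Assumption: the body selects the event type with an "event_type" key.
# Use the exact event type string enabled for your subscription.
basic_payload = {"event_type": "act_of_corruption"}
```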

The above code will get us a list of events from the API:

The above snippet is a truncated output showing the fields returned, with a single event data point. Along with the event data points, the result contains a message field with the value "Success", indicating that the API call was successful, and count fields showing the number of event data points returned.
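Since the response reports its status in the message field, a script can verify it before processing the events. This is a small sketch; the function name and the choice of RuntimeError are illustrative, and it relies only on the message field described above.

```python
def check_result(result):
    """Return the result unchanged if its message field is "Success".

    Raises RuntimeError otherwise, so a failed search is not mistaken for
    an empty one.
    """
    message = result.get("message")
    if message != "Success":
        raise RuntimeError(f"Events search failed: {message!r}")
    return result
```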

Using Filters

We can use all the fields we saw in the earlier event_fields output. Say we want to get only the events where the victim parties are “Citizens of India”. We can add the corresponding filter as follows:
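The sketch below shows one way such a filter could be added to a search payload. The additional_filters key, the victim_parties field name, and the event_type key are all assumptions made for illustration; the real names should be taken from the event fields output.

```python
import copy


def with_filter(payload, field, value):
    """Return a copy of payload with one extra field filter.

    Assumption: filters live in a dict under a hypothetical
    "additional_filters" key, keyed by field name.
    """
    new_payload = copy.deepcopy(payload)  # leave the caller's dict untouched
    new_payload.setdefault("additional_filters", {})[field] = value
    return new_payload


# "victim_parties" is a hypothetical field name for the victim parties field.
filtered_payload = with_filter(
    {"event_type": "act_of_corruption"},
    "victim_parties",
    "Citizens of India",
)
```

Returning a copy keeps the basic payload reusable, so several filtered variants can be built from it.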

This gives us the below output:

With this additional filter, we get only 7 data points where the victim parties are ‘Citizens of India’.

Getting the News Sources

For any data to be cited or presented, we always need the source of the data. So, NewsCatcher gives us an option to get the original article in which the event was detected. To use this option, we need to set attach_articles_data to true in our JSON payload:
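As a sketch, the flag can be added to any existing search payload as shown here. The attach_articles_data key comes from this tutorial; the helper name and the event_type key in the usage below are illustrative assumptions.

```python
import json


def with_articles(payload):
    """Return a copy of payload that also requests the source articles."""
    return {**payload, "attach_articles_data": True}


# Python's True is serialized as JSON true in the request body.
body = json.dumps(with_articles({"event_type": "act_of_corruption"}))
```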

The above code gives us the list of events with the source article data attached. Let’s see what a single event’s JSON looks like with this:

We can see that the source article from which the event was extracted has been added to the output above. A link, cover media URL and a title have been provided for the article.

Conclusion

In this tutorial, we looked at how to use the NewsCatcher Events Intelligence API to detect events from news data. We did this in three steps:

  1. Used the GET: /api/subscription/ endpoint to see which event types are enabled for us.
  2. Used the GET: /api/events_info/get_event_fields/ endpoint to see which fields are available for the selected event type, so that we can use them for filtering and searching events.
  3. Used the POST: /api/events_search/ endpoint to search for events with filters.

We also looked at how to use filters in the search and how to get the source articles from which the events were extracted.

The NewsCatcher Events Intelligence API is a handy feature that can be used to detect events of interest from thousands of news articles. The backend does all the heavy lifting of parsing the events from the articles, so you can directly proceed with your analysis of events. To get access to the API, visit the pricing page.