News API v3 includes robots.txt compliance information with every article, helping you build applications that respect publisher permissions. This feature provides transparency about whether content adheres to the source website’s automated access guidelines.

What is robots.txt compliance?

Robots.txt files allow website publishers to specify which parts of their site can be accessed by automated tools. News API v3 respects these guidelines by marking each article with its compliance status.

The robots.txt compliance check is performed at the time of article collection and reflects the publisher’s guidelines at that moment.
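For context, a robots.txt file pairs user-agent rules with Allow and Disallow directives. The snippet below is a purely illustrative example of the kind of guidelines a publisher might define:

# Hypothetical robots.txt for a news publisher
User-agent: *
Allow: /news/
Disallow: /premium/

User-agent: SomeBot
Disallow: /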

The robots_compliant field

Each article in an API response includes a boolean robots_compliant field indicating whether the content can be accessed in accordance with the publisher's guidelines. If true, automated access is allowed; if false, it is restricted.

{
  "title": "Revolutionary AI Technology Breakthrough",
  "author": "Jane Smith",
  "domain_url": "techcrunch.com",
  "content": "Scientists have made a groundbreaking discovery...",
  "robots_compliant": true
  // other article field
}
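As a minimal sketch of how an application might act on this field, consider the Python example below. The endpoint URL, the header name, and the assumption that the response body contains an articles list are illustrative placeholders; substitute the values from your News API v3 account and reference documentation.

import requests

# Hypothetical endpoint and header name; substitute your real
# News API v3 base URL and authentication scheme.
API_URL = "https://api.example.com/api/search"
HEADERS = {"x-api-token": "YOUR_API_KEY"}

response = requests.post(API_URL, headers=HEADERS, json={"q": "artificial intelligence"})
response.raise_for_status()

# Assumes the response body contains an "articles" list.
for article in response.json().get("articles", []):
    if article.get("robots_compliant"):
        print(f"OK to use:  {article['title']}")
    else:
        print(f"Restricted: {article['title']}")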

Filtering by compliance

Use the optional robots_compliant parameter to filter results by compliance status. The parameter works with all endpoints that return articles. For example, to return only compliant articles matching a query:

{
  "q": "artificial intelligence",
  "robots_compliant": true
}

If the parameter is omitted, the API returns all articles, with each article's compliance status indicated (the default behavior).
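A corresponding request sketch, reusing the same hypothetical endpoint and header name from the example above:

import requests

# Same hypothetical endpoint and header name as the earlier sketch.
API_URL = "https://api.example.com/api/search"
HEADERS = {"x-api-token": "YOUR_API_KEY"}

# Ask the server to return only robots.txt-compliant articles;
# omit the parameter to receive all articles with their status flagged.
response = requests.post(
    API_URL,
    headers=HEADERS,
    json={"q": "artificial intelligence", "robots_compliant": True},
)
response.raise_for_status()
articles = response.json().get("articles", [])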

Benefits

Legal compliance

Reduce legal risks by respecting publisher-defined access policies and maintaining good relationships with content providers.

Performance optimization

Filter non-compliant content server-side, reducing unnecessary data transfer and improving application performance.
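For comparison, filtering client-side after an unfiltered fetch works but transfers articles you will immediately discard; the server-side parameter avoids that cost. A hypothetical snippet, assuming articles is the list returned by the earlier sketch:

# Client-side fallback: keep only the compliant articles already fetched.
compliant_articles = [a for a in articles if a.get("robots_compliant")]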

Publisher relations

Demonstrate respect for content providers’ access policies, fostering better long-term partnerships.

Transparency

Clear visibility into which content can be safely used, enabling informed decisions about content usage.
