Robots.txt compliance in News API v3
Understand robots.txt compliance fields and parameters in News API v3 to build applications that respect publisher permissions
News API v3 includes robots.txt compliance information with every article, helping you build applications that respect publisher permissions. This feature provides transparency about whether content adheres to the source website’s automated access guidelines.
What is robots.txt compliance?
Robots.txt files allow website publishers to specify which parts of their site can be accessed by automated tools. News API v3 checks each article against these guidelines and marks it with its compliance status.
The robots.txt compliance check is performed at the time of article collection and reflects the publisher’s guidelines at that moment.
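The example uses Python's standard-library urllib.robotparser to evaluate a made-up robots.txt. It shows how robots.txt rules allow or block specific paths. The file contents, the news.example.com URLs, and the ExampleBot user agent are hypothetical. This is not a description of how News API v3 performs its own check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /premium/
Allow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)

# "ExampleBot" and news.example.com are illustrative placeholders.
print(is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", "https://news.example.com/world/story-1"))    # True
print(is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", "https://news.example.com/premium/story-2"))  # False
```

The second URL falls under the Disallow: /premium/ rule, so automated access to it is not permitted.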
The robots_compliant field
Each article object in an API response includes a robots_compliant boolean field indicating whether the content can be safely accessed according to the publisher’s guidelines. If true, automated access is allowed; if false, it is restricted.
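Here is a minimal sketch of reading the field from a response. It assumes a JSON body with the articles in an "articles" array, each carrying a "title" key. Both of those names are assumptions; only robots_compliant comes from this page.

```python
import json

# Hypothetical response body: the "articles" and "title" keys are assumptions;
# robots_compliant is the documented field.
SAMPLE_RESPONSE = """
{
  "articles": [
    {"title": "Story A", "robots_compliant": true},
    {"title": "Story B", "robots_compliant": false}
  ]
}
"""

def split_by_compliance(response_body: str):
    """Partition articles into (compliant, non_compliant) using robots_compliant."""
    articles = json.loads(response_body)["articles"]
    compliant = [a for a in articles if a["robots_compliant"]]
    non_compliant = [a for a in articles if not a["robots_compliant"]]
    return compliant, non_compliant

compliant, restricted = split_by_compliance(SAMPLE_RESPONSE)
print([a["title"] for a in compliant])   # ['Story A']
print([a["title"] for a in restricted])  # ['Story B']
```

Indexing a["robots_compliant"] directly relies on the field being present in every article object.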
Filtering by compliance
Use the optional robots_compliant parameter to filter your API requests. This parameter works with all endpoints that return articles.
Omitted (default): returns all articles, with each article’s compliance status indicated.
true: returns only articles that comply with publisher guidelines.
false: returns only articles flagged as non-compliant.
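The following Python sketch builds request URLs for the three cases. The endpoint URL and the q search parameter are hypothetical placeholders. Serializing the boolean as lowercase true/false is an assumption about the expected query-string format.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real News API v3 URL for your account.
BASE_URL = "https://api.example.com/v3/search"

def build_search_url(query: str, robots_compliant: Optional[bool] = None) -> str:
    """Build a request URL; leave robots_compliant as None to omit the parameter."""
    params = {"q": query}  # "q" is an assumed search parameter name
    if robots_compliant is not None:
        # Assumed wire format: lowercase "true" / "false".
        params["robots_compliant"] = "true" if robots_compliant else "false"
    return f"{BASE_URL}?{urlencode(params)}"

print(build_search_url("climate"))
# https://api.example.com/v3/search?q=climate
print(build_search_url("climate", robots_compliant=True))
# https://api.example.com/v3/search?q=climate&robots_compliant=true
```

Using None as the default keeps the omitted case distinct from false. A falsy check such as `if robots_compliant:` would silently drop an explicit false filter.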
Benefits
Legal compliance
Reduce legal risks by respecting publisher-defined access policies and maintaining good relationships with content providers.
Performance optimization
Filter non-compliant content server-side, reducing unnecessary data transfer and improving application performance.
Publisher relations
Demonstrate respect for content providers’ access policies, fostering better long-term partnerships.
Transparency
Clear visibility into which content can be safely used, enabling informed decisions about content usage.
Implementation considerations
Always present in responses
The robots_compliant field is included in every article object, ensuring consistent access to compliance information regardless of filtering.
Optional parameter
You can include the robots_compliant parameter in requests to filter results, or omit it to receive all content with compliance flags.
Server-side filtering
When you use the parameter, filtering happens at the API level, improving performance by reducing unnecessary data transfer.
Boolean values only
The parameter accepts only true (show only compliant content) or false (show only non-compliant content). It can also be omitted entirely.
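Because the parameter accepts only booleans, a client can validate it before sending a request. The sketch uses a hypothetical helper, robots_compliant_param. It maps None to an omitted parameter and rejects non-boolean values such as "yes" or 1. The lowercase string serialization is again an assumption.

```python
from typing import Optional

def robots_compliant_param(value: Optional[bool]) -> dict:
    """Hypothetical helper returning query params for the robots_compliant filter.

    None        -> {} (parameter omitted, all articles returned)
    True/False  -> {"robots_compliant": "true"} / {"robots_compliant": "false"}
    Other types -> TypeError, since the parameter is boolean-only.
    """
    if value is None:
        return {}
    # isinstance(1, bool) is False, so integers are rejected as well as strings.
    if not isinstance(value, bool):
        raise TypeError(f"robots_compliant must be a bool or None, got {type(value).__name__}")
    return {"robots_compliant": "true" if value else "false"}

for v in (None, True, False):
    print(v, robots_compliant_param(v))
# None {}
# True {'robots_compliant': 'true'}
# False {'robots_compliant': 'false'}

try:
    robots_compliant_param("yes")
except TypeError as err:
    print("rejected:", err)
```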