# NLP features

> NLP fields available on indexed articles and how to use them in News API requests.

As part of the NewsCatcher processing pipeline, each article is enriched with
NLP data before it is indexed: theme classification, sentiment scores, named
entities, content tags, and vector embeddings. News API exposes these fields in
the response when you set `include_nlp_data` to `true`.
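The sketch below shows a search request body with NLP enrichment turned on. Only `include_nlp_data` comes from this page. The `q` query parameter and the JSON body shape are assumptions, and the endpoint URL and authentication header are left out; check the API reference for the exact request format.

```python
import json


def build_search_body(query: str, include_nlp: bool = True) -> dict:
    """Build a hypothetical News API search request body.

    Only `include_nlp_data` is documented on this page; `q` is assumed
    to be the query parameter of the search endpoint.
    """
    return {
        "q": query,
        # Ask the API to return the `nlp` object on each article.
        "include_nlp_data": include_nlp,
    }


body = build_search_body("renewable energy")
print(json.dumps(body))  # {"q": "renewable energy", "include_nlp_data": true}
```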

<Warning>
  NLP enrichment is available only for articles indexed from July 2023 onward.
  For earlier articles, the API returns `"nlp": {}`.

  To request NLP enrichment for historical articles, contact
  [support@newscatcherapi.com](mailto:support@newscatcherapi.com).
</Warning>
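Older articles return an empty `nlp` object rather than omitting it, so a client can check for emptiness before reading any NLP field. This is a minimal sketch that assumes each article is a parsed JSON dictionary. The helper name and the sample articles are hypothetical.

```python
from typing import Any


def has_nlp_data(article: dict[str, Any]) -> bool:
    """Return True if the article carries NLP enrichment.

    Articles indexed before July 2023 come back with `"nlp": {}`, so an
    empty (or absent) `nlp` object means no enrichment is available.
    """
    nlp = article.get("nlp")
    return isinstance(nlp, dict) and len(nlp) > 0


# Hypothetical sample articles.
old_article = {"title": "Older story", "nlp": {}}
new_article = {"title": "Newer story", "nlp": {"summary": "A short summary."}}
print(has_nlp_data(old_article), has_nlp_data(new_article))  # False True
```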

## How NLP processing works

Processing mode depends on the article's language and determines which response
fields are populated and which are `null`.

**Native processing** applies to English and Arabic articles. NLP runs on the
original text and results appear in the standard `nlp.*` fields.

**Translation-based processing** applies to all other languages. The article is
first translated to English, then NLP runs on that translation. Results appear
in `nlp.translation_*` fields, and the corresponding standard fields are
explicitly `null` rather than absent. To receive translation fields in the
response, set `include_translation_fields` to `true`.
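You can express the language rule as a small lookup that picks which key to read for the NER and summary fields. This sketch assumes the article's language is available as an ISO 639-1 code such as `en` or `ar`. The helper name is hypothetical.

```python
NATIVE_LANGUAGES = {"en", "ar"}  # English and Arabic are processed natively


def nlp_field(language: str, feature: str) -> str:
    """Return the `nlp` key that holds `feature` for an article in `language`.

    Native languages populate the standard key (e.g. `summary`); every other
    language populates the `translation_`-prefixed key instead.
    """
    if language.lower() in NATIVE_LANGUAGES:
        return feature
    return f"translation_{feature}"


print(nlp_field("en", "summary"))  # summary
print(nlp_field("de", "ner_PER"))  # translation_ner_PER
```

The `translation_` keys are only present when the request sets `include_translation_fields` to `true`.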

This distinction matters when you read NER or summary fields. A `null` value in
`nlp.ner_PER` means the article was processed via translation, not that it
contains no person entities; read `nlp.translation_ner_PER` instead. The same
applies to `nlp.summary` and `nlp.translation_summary`.
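When the article language is not at hand, a client can fall back from the standard field to its `translation_` counterpart whenever the standard value is `null`. This is a minimal sketch that assumes articles are parsed JSON dictionaries. The helper name and sample summaries are hypothetical.

```python
from typing import Any, Optional


def get_nlp_value(article: dict[str, Any], field: str) -> Optional[Any]:
    """Read an NLP field, falling back to its translation-based counterpart.

    A `None` standard field means the article went through translation, so
    the value lives under `translation_<field>` instead. Returns None when
    neither key holds a value (for example, when `nlp` is `{}`).
    """
    nlp = article.get("nlp") or {}
    value = nlp.get(field)
    if value is None:
        # Translation fields appear only with include_translation_fields=true.
        value = nlp.get(f"translation_{field}")
    return value


# Hypothetical sample articles.
translated = {"nlp": {"summary": None, "translation_summary": "Sample translated summary."}}
native = {"nlp": {"summary": "Sample native summary."}}
print(get_nlp_value(translated, "summary"))  # Sample translated summary.
print(get_nlp_value(native, "summary"))      # Sample native summary.
```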

## Available features

| Feature                  | What it produces                                                                                                |
| ------------------------ | --------------------------------------------------------------------------------------------------------------- |
| Theme                    | One or more topic labels per article, for example `Tech` or `Finance`. Filterable with `theme` and `not_theme`. |
| Summary                  | AI-generated article summary. `nlp.summary` for native, `nlp.translation_summary` for translation-based.        |
| Sentiment                | Tone scores from `-1.0` to `1.0` for title and content independently.                                           |
| Named entity recognition | Persons, organizations, locations, and miscellaneous entities with mention counts.                              |
| IPTC tags                | Hierarchical news category tags using the IPTC media topic standard.                                            |
| IAB tags                 | Content category tags using the IAB content taxonomy, used for audience segmentation.                           |
| Custom tags              | Organization-specific taxonomy, private to your API key.                                                        |
| Vector embeddings        | 1024-dimensional semantic vectors for similarity search and clustering.                                         |
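Similarity search and clustering over embeddings commonly use cosine similarity as the distance measure. This dependency-free sketch assumes you have already pulled two 1024-dimensional vectors out of article responses; this page does not name the response field that holds them. The toy vectors below are made up for illustration.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 1024  # dimensionality documented for News API embeddings


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors, in the range [-1, 1]."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dimensional vectors")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
v1 = [1.0] * EMBEDDING_DIM
v2 = [1.0] * (EMBEDDING_DIM // 2) + [0.0] * (EMBEDDING_DIM // 2)
print(round(cosine_similarity(v1, v2), 4))  # 0.7071
```

Values near `1.0` indicate semantically similar articles. For large collections, a vector index or a numerical library is usually preferable to this pure-Python loop.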

## See also

* [Search by entity](/news-api/how-to/search-by-entity)
* [Search in translations](/news-api/how-to/search-in-translations)
* [Clustering news articles](/news-api/guides-and-concepts/clustering-news-articles)
* [Custom tags](/news-api/guides-and-concepts/custom-tags)
* [Subscription plans](/news-api/get-started/subscription-plans)
