POST /api/search_similar
Search similar articles
curl --request POST \
  --url https://v3-api.newscatcherapi.com/api/search_similar \
  --header 'Content-Type: application/json' \
  --header 'x-api-token: <api-key>' \
  --data '{
  "q": "\"supply chain\" AND Amazon NOT China",
  "include_similar_documents": true,
  "similar_documents_number": 5,
  "page_size": 10
}'
{
  "status": "<string>",
  "total_hits": 123,
  "page": 123,
  "total_pages": 123,
  "page_size": 123,
  "articles": [],
  "user_input": {}
}

Authorizations

x-api-token
string
header
required

API Key to authenticate requests.

To access the API, include your API key in the x-api-token header. To obtain your API key, complete the form or contact us directly.
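
As a minimal sketch of the header requirement above, the following prepares (but does not send) a POST request using only the Python standard library. The helper name `build_request` is hypothetical; the URL and header names are taken from the curl sample at the top of this page.

```python
import json
import urllib.request

API_URL = "https://v3-api.newscatcherapi.com/api/search_similar"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request that carries the API key in x-api-token."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-token": api_key,
        },
        method="POST",
    )


# "<api-key>" is a placeholder; substitute your own key.
request = build_request("<api-key>", {"q": "Amazon", "page_size": 10})
# Sending is left to the caller, for example urllib.request.urlopen(request).
```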

Body

application/json

Request body for searching similar articles based on specified criteria such as query, language, country, source, and more.

q
string
required

The keyword(s) to search for in articles. Query syntax supports logical operators (AND, OR, NOT) and wildcards:

  • For exact phrases, use escaped quotes: \"technology news\"
  • Use * for wildcards: technolog* (cannot start with *)
  • Use + to include and - to exclude: +Apple, -Google
  • Boolean operators: technology AND (Apple OR Microsoft) NOT Google
  • Forbidden characters: [ ] / \\ : ^ and URL-encoded equivalents

Note: The API automatically inserts AND operators between standalone terms, so an unquoted string like machine learning is interpreted as machine AND learning. To avoid syntax errors (especially in queries with OR operators), wrap exact phrases in escaped quotes: "\"machine learning\"".

For detailed syntax rules, see Advanced querying.
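
The quoting and forbidden-character rules above could be checked client-side before sending a request. This is a rough sketch, not the API's own validation: `phrase` and `check_query` are hypothetical helpers, the wildcard check only looks at whitespace-separated terms, and URL-encoded equivalents of the forbidden characters are not detected.

```python
import json

# Forbidden characters listed above: [ ] / \ : ^
FORBIDDEN_CHARS = set("[]/\\:^")


def phrase(text: str) -> str:
    """Wrap a multi-word phrase in double quotes so it is matched exactly."""
    return f'"{text}"'


def check_query(q: str) -> str:
    """Raise ValueError for queries that break the rules listed above."""
    bad = sorted(set(q) & FORBIDDEN_CHARS)
    if bad:
        raise ValueError(f"forbidden characters in query: {bad}")
    if any(term.startswith("*") for term in q.split()):
        raise ValueError("a term cannot start with the * wildcard")
    return q


q = check_query(f"{phrase('supply chain')} AND Amazon NOT China")
# json.dumps adds the backslash escapes shown in the examples on this page.
payload = json.dumps({"q": q})
```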

Example:

"\"supply chain\" AND Amazon NOT China"

search_in
string
default:title_content

The article fields to search in. Use a comma-separated string for multiple options, with a maximum of 2 in a single request.

Available options:

  • Standard fields: title, content, summary, title_content
  • Translation fields: title_translated, content_translated, summary_translated, title_content_translated
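
A small sketch of the two-field limit, assuming you want to validate the value locally before sending it. `build_search_in` is a hypothetical helper; the option names are the ones listed above.

```python
SEARCH_IN_OPTIONS = {
    "title", "content", "summary", "title_content",
    "title_translated", "content_translated",
    "summary_translated", "title_content_translated",
}


def build_search_in(*fields: str) -> str:
    """Join one or two known field names into a comma-separated string."""
    if not 1 <= len(fields) <= 2:
        raise ValueError("search_in accepts 1 or 2 fields per request")
    unknown = [f for f in fields if f not in SEARCH_IN_OPTIONS]
    if unknown:
        raise ValueError(f"unknown search_in fields: {unknown}")
    return ", ".join(fields)


search_in = build_search_in("title_content", "title_content_translated")
```

Rejecting a third field locally avoids a round trip that the API would refuse anyway.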
Example:

"title_content, title_content_translated"

include_translation_fields
boolean
default:false

If true, includes English translation fields in the response (title_translated_en, content_translated_en, and NLP translation fields).

Example:

true

include_similar_documents
boolean
default:false

If true, includes similar documents in the response.

Example:

true

similar_documents_number
integer
default:5

The number of similar documents to return.

Example:

10

similar_documents_fields
string
default:title,content

The fields to consider for finding similar documents.

Example:

"title,summary"

predefined_sources

Predefined top news sources per country.

Format: start with the word top, followed by the number of desired sources, and then the two-letter country code ISO 3166-1 alpha-2. Multiple countries with the number of top sources can be specified as a comma-separated string or an array of strings.

Examples:

  • "top 100 US"
  • "top 33 AT"
  • "top 50 US, top 20 GB"
  • ["top 50 US", "top 20 GB"]
Example:
["top 50 US", "top 20 GB"]
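
The "top N CC" format can be assembled programmatically. The sketch below uses a hypothetical helper, `top_sources`, and only checks that the country code is two letters; it does not verify that the code is an assigned ISO 3166-1 alpha-2 code.

```python
import re


def top_sources(count: int, country: str) -> str:
    """Format one predefined_sources entry, e.g. 'top 50 US'."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    if not re.fullmatch(r"[A-Za-z]{2}", country):
        raise ValueError("country must be a two-letter code")
    return f"top {count} {country.upper()}"


predefined_sources = [top_sources(50, "US"), top_sources(20, "gb")]
```

The resulting list can be passed as the predefined_sources value in the request body.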
sources

One or more news sources to narrow down the search. The format must be a domain URL. Subdomains, such as finance.yahoo.com, are also acceptable. To specify multiple sources, use a comma-separated string or an array of strings.

Examples:

  • "nytimes.com, theguardian.com"
  • ["nytimes.com", "theguardian.com"]
Example:
["nytimes.com", "theguardian.com"]
not_sources

The news sources to exclude from the search. To exclude multiple sources, use a comma-separated string or an array of strings.

Examples:

  • "cnn.com, wsj.com"
  • ["cnn.com", "wsj.com"]
Example:
["cnn.com", "wsj.com"]
lang

The language(s) of the search. The only accepted format is the two-letter ISO 639-1 code. To select multiple languages, use a comma-separated string or an array of strings.

Examples:

  • "en,es"
  • ["en", "es"]

To learn more, see Enumerated parameters > Language.

Example:
["en", "es"]
not_lang

The language(s) to exclude from the search. The accepted format is the two-letter ISO 639-1 code. To exclude multiple languages, use a comma-separated string or an array of strings.

Examples:

  • "fr,de"
  • ["fr", "de"]

To learn more, see Enumerated parameters > Language.

Example:
["fr", "de"]
countries

The countries where the news publisher is located. The accepted format is the two-letter ISO 3166-1 alpha-2 code. To select multiple countries, use a comma-separated string or an array of strings.

Examples:

  • "US,CA"
  • ["US", "CA"]

To learn more, see Enumerated parameters > Country.

Example:
["US", "CA"]
not_countries

The publisher location countries to exclude from the search. The accepted format is the two-letter ISO 3166-1 alpha-2 code. To exclude multiple countries, use a comma-separated string or an array of strings.

Examples:

  • "GB,FR"
  • ["GB", "FR"]

To learn more, see Enumerated parameters > Country.

Example:
["GB", "FR"]
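
Because lang, not_lang, countries, and not_countries all accept either a comma-separated string or an array, a client may want to normalize both forms into one. This is an illustrative sketch with a hypothetical helper, `normalize_codes`; it checks only the two-letter shape, not whether a code is actually assigned.

```python
def normalize_codes(values, *, upper: bool) -> list:
    """Turn 'en, es' or ['en', 'es'] into a clean list of two-letter codes."""
    if isinstance(values, str):
        values = values.split(",")
    codes = [v.strip() for v in values if v.strip()]
    for code in codes:
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter code, got {code!r}")
    # Language codes are written in lowercase, country codes in uppercase.
    return [c.upper() if upper else c.lower() for c in codes]


body = {
    "lang": normalize_codes("en, es", upper=False),
    "countries": normalize_codes(["us", "CA"], upper=True),
}
```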
from_
default:7 days ago

The starting point in time to search from. Accepts date-time strings in ISO 8601 format and plain text strings. The default time zone is UTC.

Formats with examples:

  • YYYY-mm-ddTHH:MM:SS: 2024-07-01T00:00:00
  • YYYY-mm-dd: 2024-07-01
  • YYYY/mm/dd HH:MM:SS: 2024/07/01 00:00:00
  • YYYY/mm/dd: 2024/07/01
  • English phrases: 7 days ago, today

Note: By default, applied to the publication date of the article. To use the article's parse date instead, set the by_parse_date parameter to true.

Example:

"2024-07-01T00:00:00.000Z"

to_

The ending point in time to search up to. Accepts date-time strings in ISO 8601 format and plain text strings. The default time zone is UTC.

Formats with examples:

  • YYYY-mm-ddTHH:MM:SS: 2024-07-01T00:00:00
  • YYYY-mm-dd: 2024-07-01
  • YYYY/mm/dd HH:MM:SS: 2024/07/01 00:00:00
  • YYYY/mm/dd: 2024/07/01
  • English phrases: 1 day ago, now

Note: By default, applied to the publication date of the article. To use the article's parse date instead, set the by_parse_date parameter to true.

Example:

"2024-01-01T00:00:00.000Z"

by_parse_date
boolean
default:false

If true, the from_ and to_ parameters use article parse dates instead of published dates. Additionally, the parse_date variable is added to the output for each article object.

Example:

true
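
One way to produce from_ and to_ values in the YYYY-mm-ddTHH:MM:SS form is to compute them in UTC, which matches the documented default time zone. The helper `date_window` below is hypothetical and only illustrates the idea.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # the YYYY-mm-ddTHH:MM:SS form


def date_window(days_back: int, now: Optional[datetime] = None,
                by_parse_date: bool = False) -> dict:
    """Return from_/to_ values covering the last `days_back` days in UTC."""
    end = now if now is not None else datetime.now(timezone.utc)
    if end.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    end = end.astimezone(timezone.utc)
    start = end - timedelta(days=days_back)
    window = {
        "from_": start.strftime(TIME_FORMAT),
        "to_": end.strftime(TIME_FORMAT),
    }
    if by_parse_date:
        # Switches the filter from publication date to parse date.
        window["by_parse_date"] = True
    return window


window = date_window(7, now=datetime(2024, 7, 8, tzinfo=timezone.utc))
```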

published_date_precision
enum<string>

The precision of the published date. There are three types:

  • full: The day and time of an article are correctly identified with the appropriate timezone.
  • timezone unknown: The day and time of an article are correctly identified, but without a timezone.
  • date: Only the day is identified, without an exact time.
Available options:
full,
timezone unknown,
date
Example:

"full"

sort_by
enum<string>
default:relevancy

The sorting order of the results. Possible values are:

  • relevancy: The most relevant results first.
  • date: The most recently published results first.
  • rank: The results from the highest-ranked sources first.
Available options:
relevancy,
date,
rank
Example:

"date"

ranked_only
boolean
default:true

If true, limits the search to sources ranked in the top 1 million online websites. If false, also includes unranked sources, which are assigned a rank of 999999.

Example:

true

from_rank
integer
default:1

The lower bound of the source rank to filter by. A lower rank indicates a more popular source.

Required range: 1 <= x <= 999999
Example:

100

to_rank
integer
default:999999

The upper bound of the source rank to filter by. A lower rank indicates a more popular source.

Required range: 1 <= x <= 999999
Example:

100
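
The rank bounds can be validated together before a request is sent. In this sketch, `rank_filter` is a hypothetical helper, and the check that from_rank does not exceed to_rank is an assumption; this page only documents the 1 to 999999 range for each value.

```python
MAX_RANK = 999_999


def rank_filter(from_rank: int = 1, to_rank: int = MAX_RANK) -> dict:
    """Build the from_rank/to_rank pair after checking the documented range."""
    for name, value in (("from_rank", from_rank), ("to_rank", to_rank)):
        if not 1 <= value <= MAX_RANK:
            raise ValueError(f"{name} must be between 1 and {MAX_RANK}")
    # Assumption: an inverted range is treated as a client-side error.
    if from_rank > to_rank:
        raise ValueError("from_rank must not exceed to_rank")
    return {"from_rank": from_rank, "to_rank": to_rank}


top_100 = rank_filter(to_rank=100)
```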

is_headline
boolean

If true, only returns articles that were posted on the home page of a given news domain.

Example:

true

is_opinion
boolean

If true, returns only opinion pieces. If false, excludes opinion-based articles and returns news only.

Example:

true

is_paid_content
boolean

If false, returns only articles that have publicly available complete content. Some publishers partially block content, so this setting ensures that only full articles are retrieved.

Example:

false

parent_url

The categorical URL(s) to filter your search. To filter your search by multiple categorical URLs, use a comma-separated string or an array of strings.

Examples:

  • "wsj.com/politics,wsj.com/tech"
  • ["wsj.com/politics", "wsj.com/tech"]
Example:
["wsj.com/politics", "wsj.com/tech"]

The complete URL(s) mentioned in the article. For multiple URLs, use a comma-separated string or an array of strings.

Examples:

  • "https://aiindex.stanford.edu/report/, https://www.stateof.ai/"
  • ["https://aiindex.stanford.edu/report/", "https://www.stateof.ai/"]

For more details, see Search by URL.

Example:
["https://aiindex.stanford.edu/report/", "https://www.stateof.ai/"]

The domain(s) mentioned in the article. For multiple domains, use a comma-separated string or an array of strings.

Examples:

  • "who.int, nih.gov"
  • ["who.int", "nih.gov"]

For more details, see Search by URL.

Example:
["who.int", "nih.gov"]
word_count_min
integer

The minimum number of words an article must contain. Use it to exclude very short articles.

Required range: x >= 0
Example:

300

word_count_max
integer

The maximum number of words an article can contain. Use it to exclude very long articles.

Required range: x >= 0
Example:

1000

page
integer
default:1

The page number to scroll through the results. Use for pagination, as a single API response can return up to 1,000 articles.

For details, see How to paginate large datasets.

Required range: x >= 1
Example:

2

page_size
integer
default:100

The number of articles to return per page.

Required range: 1 <= x <= 1000
Example:

50
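
Pagination with page and page_size can be sketched as a generator that requests pages until the total_pages value from the response is reached. `iter_articles` is a hypothetical helper, and `fetch_page` stands for whatever function actually sends the request body and returns the decoded JSON; the fake below is made-up data used only to show the flow.

```python
from typing import Callable, Iterator


def iter_articles(fetch_page: Callable[[dict], dict], body: dict) -> Iterator[dict]:
    """Yield articles from every page, starting at page 1."""
    page = 1
    while True:
        response = fetch_page({**body, "page": page})
        yield from response.get("articles", [])
        if page >= response.get("total_pages", 0):
            break
        page += 1


def fake_fetch(body: dict) -> dict:
    """Stand-in for a real HTTP call; returns two pages of invented titles."""
    pages = {1: ["a", "b"], 2: ["c"]}
    return {
        "page": body["page"],
        "total_pages": 2,
        "articles": [{"title": t} for t in pages[body["page"]]],
    }


titles = [a["title"] for a in iter_articles(fake_fetch, {"q": "x", "page_size": 2})]
```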

include_nlp_data
boolean
default:false

If true, includes an NLP object for each article in the response. This object provides results of NLP analysis, including article theme, summary, sentiment, tags, and named entity recognition if available.

To learn more, see NLP features.

Example:

true

has_nlp
boolean
default:false

If true, filters results to include only articles that have NLP data.

To learn more, see NLP features.

Example:

true

theme

Filters articles based on their general topic, as determined by NLP analysis. To select multiple themes, use a comma-separated string or an array of strings.

Examples:

  • "Finance, Tech"
  • ["Finance", "Tech"]

Note: The theme parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Available options: Business, Economics, Entertainment, Finance, Health, Politics, Science, Sports, Tech, Crime, Financial Crime, Lifestyle, Automotive, Travel, Weather, General.

Example:
["Business", "Finance"]
not_theme

Inverse of the theme parameter. Excludes articles based on their general topic, as determined by NLP analysis. To exclude multiple themes, use a comma-separated string or an array of strings.

Examples:

  • "Crime, Tech"
  • ["Crime", "Tech"]

Note: The not_theme parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Example:
["Crime"]
ner_name
string

The name of a person, organization, location, product, or other named entity to search for. To specify multiple names, use a comma-separated string.

Example: "Tesla, Amazon"

Example:

"Tesla,Amazon"

title_sentiment_min
number

Filters articles based on the minimum sentiment score of their titles.

Range is -1.0 to 1.0, where:

  • Negative values indicate negative sentiment.
  • Positive values indicate positive sentiment.
  • Values close to 0 indicate neutral sentiment.

Note: The title_sentiment_min parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Required range: -1 <= x <= 1
Example:

-0.5

title_sentiment_max
number

Filters articles based on the maximum sentiment score of their titles.

Range is -1.0 to 1.0, where:

  • Negative values indicate negative sentiment.
  • Positive values indicate positive sentiment.
  • Values close to 0 indicate neutral sentiment.

Note: The title_sentiment_max parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Required range: -1 <= x <= 1
Example:

0.5

content_sentiment_min
number

Filters articles based on the minimum sentiment score of their content.

Range is -1.0 to 1.0, where:

  • Negative values indicate negative sentiment.
  • Positive values indicate positive sentiment.
  • Values close to 0 indicate neutral sentiment.

Note: The content_sentiment_min parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Required range: -1 <= x <= 1
Example:

-0.5

content_sentiment_max
number

Filters articles based on the maximum sentiment score of their content.

Range is -1.0 to 1.0, where:

  • Negative values indicate negative sentiment.
  • Positive values indicate positive sentiment.
  • Values close to 0 indicate neutral sentiment.

Note: The content_sentiment_max parameter is only available if NLP is included in your subscription plan.

To learn more, see NLP features.

Required range: -1 <= x <= 1
Example:

0.5
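
The four sentiment parameters share the same -1.0 to 1.0 range, so one helper can build either pair. `sentiment_filter` is a hypothetical name, and rejecting a minimum greater than the maximum is an assumption rather than documented API behavior.

```python
from typing import Optional


def sentiment_filter(field: str, minimum: Optional[float] = None,
                     maximum: Optional[float] = None) -> dict:
    """Build {field}_sentiment_min / {field}_sentiment_max entries."""
    if field not in ("title", "content"):
        raise ValueError("field must be 'title' or 'content'")
    for value in (minimum, maximum):
        if value is not None and not -1.0 <= value <= 1.0:
            raise ValueError("sentiment scores must be between -1.0 and 1.0")
    # Assumption: an inverted range is treated as a client-side error.
    if minimum is not None and maximum is not None and minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    params = {}
    if minimum is not None:
        params[f"{field}_sentiment_min"] = minimum
    if maximum is not None:
        params[f"{field}_sentiment_max"] = maximum
    return params


neutral_titles = sentiment_filter("title", minimum=-0.5, maximum=0.5)
```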

iptc_tags

Filters articles based on International Press Telecommunications Council (IPTC) media topic tags. To specify multiple IPTC tags, use a comma-separated string or an array of strings.

Examples:

  • "20000199, 20000209"
  • ["20000199", "20000209"]

Note: The iptc_tags parameter is only available in the v3_nlp_iptc_tags subscription plan.

To learn more, see IPTC Media Topic NewsCodes.

Example:
["20000199", "20000209"]
not_iptc_tags

Inverse of the iptc_tags parameter. Excludes articles based on International Press Telecommunications Council (IPTC) media topic tags. To specify multiple IPTC tags to exclude, use a comma-separated string or an array of strings.

Examples:

  • "20000205, 20000209"
  • ["20000205", "20000209"]

Note: The not_iptc_tags parameter is only available in the v3_nlp_iptc_tags subscription plan.

To learn more, see IPTC Media Topic NewsCodes.

Example:
["20000205", "20000209"]
custom_tags

Filters articles based on a custom taxonomy that is tailored to your specific needs and accessible only with your API key. To specify tags, use the following pattern:

  • custom_tags.taxonomy=Tag1,Tag2,Tag3, where taxonomy is the taxonomy name and Tag1,Tag2,Tag3 are comma-separated tags. For POST requests, you can also specify tags as an array of strings.

Examples:

  • custom_tags.industry="Manufacturing, Supply Chain, Logistics"
  • "custom_tags.industry": ["Manufacturing", "Supply Chain", "Logistics"]

To learn more, see Custom tags.

Example:
["Tag1", "Tag2", "Tag3"]
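
For POST requests, the custom_tags.taxonomy pattern becomes a body key with an array value. A minimal sketch, with `custom_tags` as a hypothetical helper and industry as the taxonomy name from the examples above:

```python
def custom_tags(taxonomy: str, tags: list) -> dict:
    """Return the body entry for one custom taxonomy."""
    if not taxonomy or not tags:
        raise ValueError("taxonomy and tags must be non-empty")
    return {f"custom_tags.{taxonomy}": list(tags)}


body = {
    "q": "logistics",
    **custom_tags("industry", ["Manufacturing", "Supply Chain", "Logistics"]),
}
```

Each taxonomy is a separate key in the body.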
robots_compliant
boolean

If true, returns only articles/sources that comply with the publisher's robots.txt rules. If false, returns only articles/sources that do not comply with robots.txt rules. If omitted, returns all articles/sources regardless of compliance status.

Example:

true

Response

A successful response containing articles similar to the specified query. If there are no matches, the API returns a failed search response according to the defined schema.

The response model for a successful Search similar request. Response field behavior:

  • Required fields are guaranteed to be present and non-null.
  • Optional fields may be null or undefined if the data point is not present or couldn't be extracted during processing.
  • To access article properties in the articles response array, use array index notation. For example, articles[n].title, where n is the zero-based index of the article object (0, 1, 2, etc.).
  • The nlp property within the article object articles[n].nlp is only available with NLP-enabled subscription plans.

The base response model containing common fields for search operations.

status
string
required

The status of the response.

total_hits
integer
required

The total number of articles matching the search criteria.

page
integer
required

The current page number of the results.

total_pages
integer
required

The total number of pages available for the given search criteria.

page_size
integer
required

The number of articles per page.

articles
Search Similar Article Object · object[]

A list of articles matching the search criteria.

user_input
object

The user input parameters for the request.
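
The response fields above could be read as follows. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, and the sample payload, including the status value, is invented for illustration and does not come from the API.

```python
import json

REQUIRED_FIELDS = ("status", "total_hits", "page", "total_pages", "page_size")


def summarize(response: dict) -> dict:
    """Check the required fields and collect article titles."""
    missing = [key for key in REQUIRED_FIELDS if response.get(key) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # articles is optional, so fall back to an empty list.
    articles = response.get("articles") or []
    return {
        "titles": [article.get("title") for article in articles],
        "has_more": response["page"] < response["total_pages"],
    }


# Invented sample payload; real responses carry many more article fields.
sample = json.loads("""
{
  "status": "ok",
  "total_hits": 2,
  "page": 1,
  "total_pages": 1,
  "page_size": 10,
  "articles": [{"title": "First"}, {"title": "Second"}],
  "user_input": {}
}
""")
summary = summarize(sample)
```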