The Newscatcher Python SDK includes client-side query validation that mirrors the API’s server-side validation. This feature helps you catch invalid query syntax before making API calls.

Overview

Query validation offers these benefits:
  • Immediate feedback: Catch errors without making API calls.
  • Reduced API usage: Avoid requests for invalid queries.
  • Cost savings: Prevent billable API calls for invalid queries.
  • Essential for LLM workflows: Critical when you process thousands of generated queries.
  • Consistent behavior: Match server-side validation exactly.
  • Better developer experience: Get detailed error messages immediately.

Basic usage

Use the validate_query() method to check query syntax before you make API calls:
Basic query validation
from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Validate a query before using it
is_valid, error_message = client.validate_query("machine learning")
if is_valid:
    print("Query is valid!")
else:
    print(f"Invalid query: {error_message}")
The method returns a tuple:
  • is_valid (bool): Whether the query passes validation
  • error_message (str): Detailed error description if validation fails, or an empty string if the query is valid

Automatic validation in SDK methods

Query validation is enabled by default in methods like get_all_articles() and get_all_headlines(). You can control this behavior with the validate_query parameter:
SDK method validation
# Enable validation (default behavior)
articles = client.get_all_articles(
    q="AI OR \"artificial intelligence\"",  # Valid query
    validate_query=True,  # Optional, True by default
    from_="7d"
)

# Disable validation (not recommended)
articles = client.get_all_articles(
    q="some query",
    validate_query=False,  # Skip client-side validation
    from_="7d"
)
When validation is enabled and a query fails validation, the method raises a ValueError with the specific error message.
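
Because an invalid query surfaces as an exception, batch jobs usually want to catch it rather than stop. The following is a minimal sketch, not part of the SDK: fetch_articles_safely is a hypothetical helper name, and it assumes only what is described above, namely that get_all_articles() raises ValueError when client-side validation rejects the query.

```python
from typing import Any, Optional


def fetch_articles_safely(client: Any, query: str, **kwargs: Any) -> Optional[Any]:
    """Call client.get_all_articles() and return None if the query is rejected.

    Hypothetical helper: `client` is expected to be a NewscatcherApi instance
    (or anything exposing a compatible get_all_articles method).
    """
    try:
        return client.get_all_articles(q=query, **kwargs)
    except ValueError as exc:
        # With validation enabled, the SDK raises ValueError for invalid queries.
        # Other ValueErrors would also land here, so log the message for review.
        print(f"Skipping invalid query {query!r}: {exc}")
        return None


# Usage (requires a real client and API key):
# articles = fetch_articles_safely(client, 'AI OR "artificial intelligence"', from_="7d")
```

Returning None keeps the calling loop simple; a stricter application might re-raise or record the failure instead.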

Validation rules

The SDK validates queries using the same rules as the Newscatcher API.

Valid patterns

Single words and terms
"technology"   # Single word
"AI"           # Acronym

Invalid patterns

Forbidden characters
"machine[learning]"       # Square brackets not allowed
"AI/ML"                   # Forward slashes not allowed
"machine:learning"        # Colons not allowed
"data^science"            # Caret symbols not allowed

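To make the forbidden-character rule concrete, here is a small standalone sketch that flags the characters from the examples above. It is illustrative only: FORBIDDEN_CHARS and find_forbidden_chars are hypothetical names, the set covers just the four cases listed here, and the SDK's own validate_query() remains the authoritative check.

```python
from typing import List

# Assumption: only the characters shown in the examples above.
# The SDK's real rule set may include more.
FORBIDDEN_CHARS = set("[]/:^")


def find_forbidden_chars(query: str) -> List[str]:
    """Return the forbidden characters found in `query`, in order of first appearance."""
    found: List[str] = []
    for ch in query:
        if ch in FORBIDDEN_CHARS and ch not in found:
            found.append(ch)
    return found


print(find_forbidden_chars("machine[learning]"))  # ['[', ']']
print(find_forbidden_chars("technology"))         # []
```
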
Understand automatic AND insertion

The API automatically inserts AND operators between standalone terms, which can create validation conflicts with mixed operator levels.

Problem
Common AND insertion conflicts
# ❌ This fails because of automatic AND insertion:
"AI OR artificial intelligence"
# Becomes: "AI OR artificial AND intelligence" (mixed operator levels)

# ❌ Another example:
"startup OR venture capital"
# Becomes: "startup OR venture AND capital" (mixed operator levels)

Solutions
Fix AND insertion conflicts
# ✅ Fix by using exact phrase matching:
"AI OR \"artificial intelligence\""
# Stays as: "AI OR \"artificial intelligence\"" (same level)

# ✅ Or use proper grouping:
"startup OR (venture AND capital)"
# Becomes: "startup OR (venture AND capital)" (properly grouped)
Always wrap multi-word terms in double quotes when you combine them with OR operators. This prevents automatic AND insertion conflicts.
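
When OR queries are assembled programmatically, the quoting rule can be applied at build time. The sketch below is one way to do that; build_or_query is a hypothetical helper, not an SDK function, and it assumes the input terms do not themselves contain embedded double quotes.

```python
from typing import Iterable, List


def build_or_query(terms: Iterable[str]) -> str:
    """Join terms with OR, wrapping multi-word terms in double quotes."""
    parts: List[str] = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # Ignore empty entries.
        already_quoted = len(term) >= 2 and term.startswith('"') and term.endswith('"')
        # A term with more than one word would trigger automatic AND insertion,
        # so turn it into an exact phrase.
        if len(term.split()) > 1 and not already_quoted:
            term = f'"{term}"'
        parts.append(term)
    return " OR ".join(parts)


print(build_or_query(["AI", "artificial intelligence"]))
# AI OR "artificial intelligence"
```

The result can still be passed through validate_query() as a final check before it is sent to the API.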

Validate multiple queries

For applications that process multiple queries (like LLM-generated queries), you can validate them in bulk. This approach works well when you work with:
  • LLM-generated queries that may have syntax issues.
  • User input that needs validation before processing.
  • Batch processing scenarios where you want to filter valid queries first.
Bulk validation is especially valuable for production applications processing thousands of queries, as it prevents costly API calls for invalid queries.
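
A minimal sketch of bulk validation follows. It relies only on the (is_valid, error_message) tuple that validate_query() returns, as described earlier; partition_queries is a hypothetical helper name, and taking the validator as a callable is a design choice made here so the function can be exercised without an API key.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Any callable with the same shape as client.validate_query.
Validator = Callable[[str], Tuple[bool, str]]


def partition_queries(
    queries: Iterable[str], validate: Validator
) -> Tuple[List[str], Dict[str, str]]:
    """Split queries into a list of valid ones and a {query: error_message} dict."""
    valid: List[str] = []
    invalid: Dict[str, str] = {}
    for query in queries:
        is_valid, error_message = validate(query)
        if is_valid:
            valid.append(query)
        else:
            invalid[query] = error_message
    return valid, invalid


# Usage (requires a real client):
# valid, invalid = partition_queries(llm_queries, client.validate_query)
# for query, error in invalid.items():
#     print(f"Rejected {query!r}: {error}")
```

Keeping the error messages alongside the rejected queries makes it possible to feed them back to the LLM or the user for correction.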

Best practices

  1. Use exact phrases for multi-word terms: When you search for specific phrases, always use double quotes to prevent automatic AND insertion conflicts.
  2. Validate LLM-generated queries: Validation saves time and money in applications that process thousands of AI-generated queries.
  3. Group complex queries: Use parentheses to make query logic clear and avoid operator-level conflicts.
