CatchAll transforms your natural language questions into structured, validated records from web sources. Ask questions in plain English and receive organized data with source citations—no need for complex keywords or filters.

What is CatchAll API?

CatchAll provides an end-to-end pipeline for converting natural language queries into structured event data. The system analyzes your question, retrieves relevant content from diverse sources (news sites, government databases, public records, corporate sites), clusters similar information, validates relevance, and extracts structured data tailored to your specific query.

How it works

When you submit a query, CatchAll follows this multi-stage pipeline:
  1. Analyze: Understands your query and generates search queries, validators, and extractors.
  2. Fetch: Retrieves relevant content from web sources.
  3. Cluster: Groups similar content into distinct events.
  4. Validate: Filters clusters to keep only relevant ones.
  5. Extract: Pulls structured data from validated clusters.
  6. Return: Delivers records with citations.
Processing typically takes 10-15 minutes per job.

What to expect

Dynamic schemas

Response schemas are generated uniquely for each query. Field names and structure in the enrichment object vary between jobs—even with identical inputs. What’s guaranteed in every record:
  • record_id
  • enrichment.record_title
  • citations array
What varies:
  • All other fields in enrichment (names, types, structure)
  • Number of records returned
  • Specific content extracted
This flexibility lets CatchAll adapt to any query type without predefined schemas.
See Understanding dynamic schemas to learn how to build integrations that handle variable response structures.
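One way to cope with a per-job schema is to read only the guaranteed fields by name and carry everything else as an opaque mapping. The sketch below is illustrative rather than an official client: `split_record` and the output keys `title` and `dynamic_fields` are hypothetical names chosen for this example.

```python
from typing import Any

# Only record_title is documented as present in every enrichment object.
GUARANTEED_ENRICHMENT_KEYS = {"record_title"}


def split_record(record: dict[str, Any]) -> dict[str, Any]:
    """Separate the guaranteed fields of a record from the query-specific ones."""
    enrichment = record["enrichment"]
    return {
        "record_id": record["record_id"],
        "title": enrichment["record_title"],
        "citations": record["citations"],
        # Everything else is generated per job, so keep it as an opaque mapping.
        "dynamic_fields": {
            key: value
            for key, value in enrichment.items()
            if key not in GUARANTEED_ENRICHMENT_KEYS
        },
    }
```

Downstream code can then look up optional fields with `dynamic_fields.get("revenue")` and handle a missing value, instead of assuming a field name that a later job may not produce.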

Non-deterministic results

Identical queries can produce different results because:
  • LLMs may generate different keywords, validators, and extractors.
  • Different content sources may be retrieved.
  • Field names and structure vary between runs.
  • The number of records extracted can differ significantly.
Build integrations that tolerate this variation rather than relying on repeatable output.

Asynchronous processing

Each query creates a job that processes asynchronously. You receive a job_id to poll for status and retrieve results when complete.

Base URL

For API requests use the following base URL:
https://catchall.newscatcherapi.com

Endpoints

| Endpoint | Method | Description | Use case |
| --- | --- | --- | --- |
| /catchAll/submit | POST | Create a new job | Submit a natural language query to start processing |
| /catchAll/status/{job_id} | GET | Check job status | Monitor processing progress through 12 status stages |
| /catchAll/pull/{job_id} | GET | Get job results | Retrieve structured records when job completes |
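As a small illustration, the three endpoint URLs can be assembled from the base URL as follows. The helper names are hypothetical, and percent-encoding the job ID is a defensive assumption, not a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://catchall.newscatcherapi.com"


def submit_url() -> str:
    """URL for creating a job (POST)."""
    return f"{BASE_URL}/catchAll/submit"


def status_url(job_id: str) -> str:
    """URL for checking job status (GET)."""
    # Encoding the ID is a precaution; UUID-style IDs pass through unchanged.
    return f"{BASE_URL}/catchAll/status/{quote(job_id, safe='')}"


def pull_url(job_id: str) -> str:
    """URL for retrieving job results (GET)."""
    return f"{BASE_URL}/catchAll/pull/{quote(job_id, safe='')}"
```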

Request format

Include your API key in the x-api-key header for each request. All requests must use HTTPS.

Basic request

{
  "query": "Tech company earnings this quarter",
  "context": "Focus on revenue and profit margins",
  "summary_template": "Company [NAME] earned [REVENUE] in [QUARTER]"
}

Request parameters

  • query (string, required): Natural language question describing what to find
  • context (string, optional): Additional context to focus search and extraction
  • summary_template (string, optional): Template to guide record summary formatting. When provided, adds a template_based_summary field to each record

Response format

Job creation response

{
  "job_id": "af7a26d6-cf0b-458c-a6ed-4b6318c74da3"
}

Job status response

{
  "job_id": "af7a26d6-cf0b-458c-a6ed-4b6318c74da3",
  "status": "data_fetched"
}
The full list of job statuses is available in the Job statuses section of this document.

Results response

{
  "job_id": "af7a26d6-cf0b-458c-a6ed-4b6318c74da3",
  "query": "Tech company earnings this quarter",
  "status": "job_completed",
  "processing_time": "15m",
  "sources_count": 59150,
  "total_records": 2865,
  "records": [
    {
      "record_id": "5262823697790152939",
      "enrichment": {
        "record_title": "Oracle Q1 2026 Earnings Exceed Expectations",
        "company_name": "Oracle",
        "quarter_identifier": "Q1 2026",
        "revenue": "$14.9 billion",
        "revenue_change": "up 12%",
        "profit_margin": "42% non-GAAP operating margin",
        "template_based_summary": "Company Oracle earned $14.9 billion in Q1 2026"
      },
      "citations": [
        {
          "title": "Oracle Reports Strong Q1 2026 Results",
          "link": "https://example.com/article",
          "published_date": "2025-09-26 08:54:20"
        }
      ]
    }
  ]
}
The field names in the enrichment object are LLM-generated and may vary even for the same inputs. The example above shows one possible structure for earnings queries.
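Because field names are only known once a job finishes, it can help to inspect a results payload before mapping it into your own data model. The sketch below counts how often each enrichment field appears and finds the newest citation for a record. Both function names are hypothetical, and the date format string is inferred from the single `published_date` value in the example above, so treat it as an assumption.

```python
from collections import Counter
from datetime import datetime
from typing import Optional

# Inferred from the example value "2025-09-26 08:54:20"; not a documented format.
PUBLISHED_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def enrichment_field_counts(results: dict) -> Counter:
    """Count how many records contain each enrichment field name."""
    counts: Counter = Counter()
    for record in results.get("records", []):
        counts.update(record.get("enrichment", {}).keys())
    return counts


def latest_citation_date(record: dict) -> Optional[datetime]:
    """Return the most recent published_date among a record's citations."""
    dates = [
        datetime.strptime(citation["published_date"], PUBLISHED_DATE_FORMAT)
        for citation in record.get("citations", [])
        if citation.get("published_date")
    ]
    return max(dates) if dates else None
```

A field that appears in every record of a job is a reasonable candidate for a column in that job's output. A field that appears in only a few is better kept optional.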

Job statuses

To monitor the progress of your job, use the /catchAll/status/{job_id} endpoint. We recommend polling this endpoint every 30-60 seconds. Jobs move through the following statuses:
| Status | Description | Typical duration |
| --- | --- | --- |
| pending | Job queued, waiting to start | Seconds |
| analysis_started | Beginning query analysis | Seconds |
| analysis_keywords_extracted | Keywords identified from query | 30-60 seconds |
| analysis_enrichments_extracted | Validators and extractors generated | 30-60 seconds |
| analysis_queries_extracted | Search queries created (typically 10 queries) | 30-60 seconds |
| retrieval_dispatched | Queries sent to fetching service | Seconds |
| data_fetched | Articles retrieved from news database | 3-5 minutes |
| clustering_dispatched | Clustering process initiated | Seconds |
| data_grouped | Similar articles clustered | 2-4 minutes |
| enrichment_dispatched | Validation and extraction started | Seconds |
| data_enriched | Structured data extracted from valid clusters | 4-6 minutes |
| job_completed | Job finished, results ready | - |
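A polling loop over these statuses might look like the sketch below. It takes the status lookup as a callable, so no HTTP client is implied. The names `wait_for_completion` and `stage_number` are hypothetical. The one-hour default timeout is an assumption for illustration: it guards against waiting forever on a job that never reaches job_completed.

```python
import time
from typing import Callable, Optional

# The 12 statuses, in the order listed in the table above.
STATUS_ORDER = [
    "pending",
    "analysis_started",
    "analysis_keywords_extracted",
    "analysis_enrichments_extracted",
    "analysis_queries_extracted",
    "retrieval_dispatched",
    "data_fetched",
    "clustering_dispatched",
    "data_grouped",
    "enrichment_dispatched",
    "data_enriched",
    "job_completed",
]


def stage_number(status: str) -> Optional[int]:
    """Return the 1-based position of a status, or None if it is unknown."""
    return STATUS_ORDER.index(status) + 1 if status in STATUS_ORDER else None


def wait_for_completion(
    fetch_status: Callable[[], str],
    poll_interval: float = 30.0,   # lower end of the recommended 30-60 seconds
    timeout: float = 3600.0,       # assumed client-side limit, not an API value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reports job_completed or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "job_completed":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still at {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

In practice `fetch_status` would call the status endpoint and return the `status` field of the response. Injecting `sleep` and `clock` keeps the loop testable without real delays.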

Use cases

CatchAll is designed for applications requiring structured data from unstructured web content:
  • Market intelligence: Track company earnings, M&A activity, product launches.
  • Regulatory monitoring: Follow policy changes, government actions, compliance updates.
  • Business development: Discover partnerships, funding rounds, market entries.
  • Competitive analysis: Monitor competitor activities and announcements.
  • Research automation: Extract structured data for academic or business research.
  • News aggregation: Build topic-specific news applications with structured output.

Beta limitations

These features are planned for implementation after the beta period:
  • Formal error handling and failed status
  • Error response objects with detailed failure information
  • Maximum job duration enforcement
  • Result expiration and cleanup
  • Query deduplication (submitting the same query creates separate jobs)
  • Pagination for large result sets
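Until query deduplication is available, a client can avoid creating duplicate jobs by remembering which job ID it received for each set of request parameters. The sketch below is one possible approach, not part of the API. `JobCache` is a hypothetical class, the cache lives only in memory, and normalizing whitespace and letter case in the query is an assumption about what counts as "the same query".

```python
from typing import Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[str], Optional[str]]


class JobCache:
    """Remember job IDs so identical requests reuse an existing job."""

    def __init__(self, submit: Callable[[dict], str]) -> None:
        # submit takes a request payload and returns the new job_id.
        self._submit = submit
        self._jobs: Dict[CacheKey, str] = {}

    def get_or_submit(
        self,
        query: str,
        context: Optional[str] = None,
        summary_template: Optional[str] = None,
    ) -> str:
        # Collapse whitespace and case so trivially different queries match.
        key = (" ".join(query.split()).lower(), context, summary_template)
        if key not in self._jobs:
            payload = {"query": query}
            if context is not None:
                payload["context"] = context
            if summary_template is not None:
                payload["summary_template"] = summary_template
            self._jobs[key] = self._submit(payload)
        return self._jobs[key]
```

Results are non-deterministic, so reusing a job ID also means reusing that job's particular output. Skip the cache when a fresh run is the goal.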

Get started

  1. Book a demo and get your API key.
  2. Follow the Quickstart guide to make your first request.
  3. Review Understanding dynamic schemas to learn how to handle variable response structures.
  4. Explore the API Reference for detailed endpoint documentation.
For technical support, contact us at support@newscatcherapi.com.