CatchAll generates response schemas dynamically—field names change between jobs, even with identical inputs. This guide shows how to build integrations that handle this variability.
Only 3 fields are guaranteed:
  • record_id
  • record_title
  • citations array
All other fields, returned inside the enrichment object, vary between jobs.

Why schemas vary

Submit the same query twice and you can get different field names. For example, one job might return:
{
  "record_id": "5262823697790152939",
  "record_title": "Oracle Q1 2026 Earnings Exceed Expectations",
  "enrichment": {
    "company_name": "Oracle",
    "revenue": "$14.9 billion",
    "profit_margin": "42%"
  }
}
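Another job for the same query might return the same facts under different names (an illustrative example; the actual names depend on the extractors generated for that job):
{
  "record_id": "...",
  "record_title": "Oracle Q1 2026 Earnings Exceed Expectations",
  "enrichment": {
    "company": "Oracle",
    "total_revenue": "$14.9B",
    "net_margin": "42%"
  }
}
Same data, different field names.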
Why this happens:
  • LLMs generate extractors dynamically for each job
  • Different keywords, validators, and extractors are created
  • Field names are chosen semantically to match content
This is expected behavior, not a bug.

Integration strategies

Store raw data

Preserve the entire enrichment object as JSON:
import json

# db is a placeholder for your datastore client; adapt the insert call to your database
db.insert({
    "record_id": record["record_id"],
    "title": record["record_title"],
    "raw_data": json.dumps(record["enrichment"]),
    "citations": json.dumps(record["citations"])
})
Use when: You need to preserve all data without loss. Query dynamically later.
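A minimal sketch of this strategy using Python's built-in sqlite3 module (the table and column names are illustrative); json_extract, available when SQLite is built with JSON support as in recent Python releases, lets you query fields later without knowing their names up front:
import json
import sqlite3

conn = sqlite3.connect("catchall.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS records ("
    "record_id TEXT PRIMARY KEY, title TEXT, raw_data TEXT, citations TEXT)"
)

# Store the enrichment object untouched
conn.execute(
    "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
    (
        record["record_id"],
        record["record_title"],
        json.dumps(record["enrichment"]),
        json.dumps(record["citations"]),
    ),
)
conn.commit()

# Query dynamically later; missing fields simply return NULL
rows = conn.execute(
    "SELECT title, json_extract(raw_data, '$.revenue') FROM records"
).fetchall()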

Map to canonical fields

Translate variable field names to your fixed schema using pattern matching:
# Map canonical names to substring patterns seen in generated field names
FIELD_PATTERNS = {
    "revenue": ["revenue", "sales", "total_revenue"],
    "profit": ["profit", "margin", "income", "earnings"],
    "quarter": ["quarter", "q", "period"],
}

def normalize_record(record):
    normalized = {"title": record["record_title"]}

    for canonical, patterns in FIELD_PATTERNS.items():
        for key, value in record["enrichment"].items():
            if any(p in key.lower() for p in patterns):
                normalized[canonical] = value
                break

    return normalized
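For example, applying normalize_record to the sample record shown earlier:
record = {
    "record_title": "Oracle Q1 2026 Earnings Exceed Expectations",
    "enrichment": {
        "company_name": "Oracle",
        "revenue": "$14.9 billion",
        "profit_margin": "42%",
    },
}

print(normalize_record(record))
# {'title': 'Oracle Q1 2026 Earnings Exceed Expectations',
#  'revenue': '$14.9 billion', 'profit': '42%'}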
Use when: You have a fixed database schema and need consistent field names.

Process dynamically

Handle all fields without assumptions:
print(f"Title: {record['record_title']}\n")

for key, value in record["enrichment"].items():
    display_name = key.replace("_", " ").title()
    print(f"{display_name}: {value}")
Use when: Your application can display fields without fixed structure (dashboards, search results).

Common patterns

Find fields by pattern

def find_field(enrichment, patterns):
    """Find first field matching any pattern."""
    for key, value in enrichment.items():
        if any(pattern in key.lower() for pattern in patterns):
            return value
    return None

# Usage
revenue = find_field(record["enrichment"], ["revenue", "sales"])
profit = find_field(record["enrichment"], ["profit", "margin", "income"])

Handle multiple possible names

# Check variations with fallback
revenue = (
    record["enrichment"].get("revenue") or
    record["enrichment"].get("total_revenue") or
    record["enrichment"].get("sales") or
    "N/A"
)
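Note that the or chain treats falsy values such as 0 or an empty string as missing. If that matters, a small helper that checks key presence avoids the issue (first_present is an illustrative name, not part of the API):
def first_present(enrichment, candidates, default="N/A"):
    """Return the value of the first candidate key that is present."""
    for key in candidates:
        if key in enrichment:
            return enrichment[key]
    return default

revenue = first_present(record["enrichment"], ["revenue", "total_revenue", "sales"])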

Parse different formats

Revenue can appear as "$14.9 billion", "$14.9B", "14900000000", or 14.9:
import re

# The value may already be numeric (e.g. 14.9), so coerce to string first
revenue_str = str(record["enrichment"].get("revenue", "0"))
numbers = re.findall(r'[\d.]+', revenue_str)
revenue_value = float(numbers[0]) if numbers else 0.0
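The snippet above extracts only the leading number, so "$14.9 billion" and "14900000000" end up on different scales. If you need a single numeric scale, a helper along these lines can expand common magnitude suffixes (a sketch; adjust the multiplier table to the formats you actually see):
import re

# Magnitude words and suffixes to expand (illustrative, extend as needed)
MULTIPLIERS = {"billion": 1e9, "b": 1e9, "million": 1e6, "m": 1e6}

def parse_amount(value):
    """Parse '$14.9 billion', '$14.9B', '14900000000', or 14.9 into a float."""
    text = str(value).lower()
    numbers = re.findall(r'[\d.]+', text)
    if not numbers:
        return 0.0
    amount = float(numbers[0])
    for suffix, factor in MULTIPLIERS.items():
        # Match the suffix right after the number ("14.9b") or as a word ("14.9 billion")
        if re.search(rf'[\d.]\s*{suffix}\b', text):
            return amount * factor
    return amount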

What not to do

Don’t hardcode field names

# ❌ Breaks when field name differs
revenue = record["enrichment"]["revenue"]  # KeyError
# ✅ Handle variations
revenue = record["enrichment"].get("revenue", "N/A")

Don’t validate specific fields

# ❌ Breaks when schema changes
required_fields = ["company_name", "revenue", "quarter"]
for field in required_fields:
    assert field in record["enrichment"]
# ✅ Only check guaranteed fields
assert "record_id" in record
assert "record_title" in record
assert "citations" in record

Don’t assume consistent fields across records

import pandas as pd

# ❌ Assumes all records have same fields
df = pd.DataFrame([
    {"company": r["enrichment"]["company_name"]}
    for r in results["all_records"]
])
# ✅ Handle variable fields
data = []
for record in results["all_records"]:
    row = {"title": record["record_title"]}
    row.update(record["enrichment"])  # Add all fields dynamically
    data.append(row)

df = pd.DataFrame(data)

Test your integration

Submit the same query multiple times to see schema variations:
import requests
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://catchall.newscatcherapi.com"
HEADERS = {"x-api-key": API_KEY}

# Submit same query 3 times
job_ids = []
for i in range(3):
    response = requests.post(
        f"{BASE_URL}/catchAll/submit",
        headers=HEADERS,
        json={"query": "Tech earnings Q3"}
    )
    job_ids.append(response.json()["job_id"])
    print(f"Created job {i+1}: {job_ids[-1]}")

# Wait for jobs to complete
print("\nWaiting for jobs to complete...")
time.sleep(900)  # Wait 15 minutes

# Compare schemas
print("\nComparing schemas across jobs:\n")
for idx, job_id in enumerate(job_ids, 1):
    response = requests.get(
        f"{BASE_URL}/catchAll/pull/{job_id}",
        headers=HEADERS
    )
    results = response.json()

    if results.get("all_records"):
        first_record = results["all_records"][0]["enrichment"]
        field_names = list(first_record.keys())
        print(f"Job {idx} fields: {field_names}")
    else:
        print(f"Job {idx}: No records yet")
Verify your code:
  • Doesn’t crash when field names differ
  • Extracts data from all variations
  • Handles missing fields gracefully
  • Works with both empty and populated enrichment objects
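One way to exercise these checks is a small test that feeds your handling code synthetic records with differing field names (illustrative data; substitute normalize_record with whatever function your integration uses):
def test_handles_schema_variations():
    samples = [
        {"record_title": "A", "enrichment": {"revenue": "$1.2 billion"}},
        {"record_title": "B", "enrichment": {"total_revenue": "$1.2B"}},
        {"record_title": "C", "enrichment": {}},  # empty enrichment
    ]
    for record in samples:
        normalized = normalize_record(record)  # your handling code here
        assert "title" in normalized  # must not crash; title is always present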

Using schema parameter

The schema parameter can influence field naming but doesn’t guarantee specific names:
{
  "query": "Tech earnings",
  "schema": "[COMPANY] earned [REVENUE] in [QUARTER]"
}
What you get:
  • schema_based_summary field is added to enrichment
  • Field names may align with placeholders (company, revenue, quarter)
  • Specific field names are not guaranteed
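For example, reading the summary defensively after submitting with a schema (a sketch that reuses BASE_URL and HEADERS from the test script above; field names beyond schema_based_summary remain unguaranteed):
import requests

response = requests.post(
    f"{BASE_URL}/catchAll/submit",
    headers=HEADERS,
    json={
        "query": "Tech earnings",
        "schema": "[COMPANY] earned [REVENUE] in [QUARTER]"
    }
)
job_id = response.json()["job_id"]

# Later, once the job has completed:
results = requests.get(f"{BASE_URL}/catchAll/pull/{job_id}", headers=HEADERS).json()
for record in results.get("all_records", []):
    summary = record["enrichment"].get("schema_based_summary", "N/A")
    print(f"{record['record_title']}: {summary}")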

See also