CatchAll generates response schemas dynamically with LLMs. Each job creates a new schema, even with identical inputs. This guide explains this non-deterministic behavior and shows you how to build integrations that handle it.

Each job generates a new schema

When you submit a query, the LLM analyzes your inputs (query, context, summary_template) and generates extractors that define the output schema. Because LLMs are non-deterministic, the same inputs can produce different schemas on different runs. Expect the following variations:
  • Field names vary between jobs (even with identical inputs)
  • The number of fields can differ
  • Field structures may change (for example, combined fields versus separate fields)
  • Results can vary significantly
This is expected behavior, not a bug.

Example: Same inputs, different schemas

This example shows how identical inputs can produce different schemas.

Query: “Tech company earnings this quarter”

Job 1 produced:
{
  "record_title": "Oracle Q1 2026 Earnings",
  "company_name": "Oracle",
  "quarter_identifier": "Q1 2026",
  "revenue": "$14.9 billion",
  "revenue_change": "up 12%",
  "profit_margin": "42%"
}
Job 2 produced:
{
  "record_title": "Oracle Q1 2026 Results",
  "company": "Oracle",
  "quarter": "Q1",
  "fiscal_year": "2026",
  "total_revenue": "$14.9B",
  "yoy_growth": "+12%",
  "operating_margin": "42%"
}
The field names differ:
  • company_name versus company
  • quarter_identifier versus quarter + fiscal_year
  • revenue versus total_revenue
  • revenue_change versus yoy_growth
  • profit_margin versus operating_margin
Both schemas are valid. The LLM generated different extractors from the same inputs.
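
To see the drift programmatically, compare the key sets of the two payloads. A minimal sketch using the field names from the illustrative Job 1 and Job 2 records above:

```python
# Field names from the two illustrative jobs above
job1_fields = {"record_title", "company_name", "quarter_identifier",
               "revenue", "revenue_change", "profit_margin"}
job2_fields = {"record_title", "company", "quarter", "fiscal_year",
               "total_revenue", "yoy_growth", "operating_margin"}

shared = job1_fields & job2_fields
print(f"Shared fields: {sorted(shared)}")                  # only record_title survives
print(f"Job 1 only: {sorted(job1_fields - job2_fields)}")
print(f"Job 2 only: {sorted(job2_fields - job1_fields)}")
```

Only the guaranteed record_title overlaps; every dynamic field name diverged.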

Fields that are always present

The following fields appear in every record:
  • record_id
  • enrichment.record_title
  • citations array
If you provide a summary_template, the enrichment object also includes:
  • template_based_summary
All other fields in the enrichment object are generated dynamically for each job.
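
In code, that means only the guaranteed fields are safe to access directly; everything else should be read defensively. A sketch with an illustrative record (the field values are made up):

```python
# Illustrative record shaped like the guaranteed structure described above
record = {
    "record_id": "rec_123",
    "enrichment": {
        "record_title": "Oracle Q1 2026 Earnings",
        "revenue": "$14.9 billion",  # dynamic field; may be named differently next job
    },
    "citations": [],
}

# Always present, so direct access is safe
record_id = record["record_id"]
title = record["enrichment"]["record_title"]
citations = record["citations"]

# Only present when the job included a summary_template, so use .get()
summary = record["enrichment"].get("template_based_summary", "")
```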

Handle variable schemas

Don’t hardcode field names

# ❌ Breaks when field names change
revenue = record["enrichment"]["revenue"]

Check multiple possible names

# ✅ Handles variations
revenue = (
    record["enrichment"].get("revenue") or
    record["enrichment"].get("total_revenue") or
    record["enrichment"].get("quarterly_revenue") or
    "N/A"
)

Process all fields dynamically

# ✅ Works with any schema
enrichment = record["enrichment"]

print(f"Title: {enrichment['record_title']}\n")

for key, value in enrichment.items():
    if key != "record_title":
        display_name = key.replace("_", " ").title()
        print(f"{display_name}: {value}")

Use pattern matching for key fields

# ✅ Find fields by content, not exact name
def find_field(enrichment, patterns):
    """Find first field matching any pattern."""
    for key, value in enrichment.items():
        if any(pattern in key.lower() for pattern in patterns):
            return value
    return None

# Usage
revenue = find_field(record["enrichment"], ["revenue", "sales"])
profit = find_field(record["enrichment"], ["profit", "margin", "income"])

Integration strategies

Strategy 1: Store raw data

Store the entire enrichment object as JSON. This preserves all fields regardless of schema:
import json

db.insert({
    "record_id": record["record_id"],
    "title": record["enrichment"]["record_title"],
    "raw_data": json.dumps(record["enrichment"]),
    "citations": json.dumps(record["citations"])
})
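
The `db` object above is a placeholder. As a concrete sketch of the same idea, here is `sqlite3` standing in for your database (the table layout and record values are illustrative):

```python
import json
import sqlite3

# In-memory stand-in for the `db` above, with raw_data as a JSON text column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (record_id TEXT, title TEXT, raw_data TEXT)")

record = {  # shaped like a CatchAll record; values are illustrative
    "record_id": "rec_123",
    "enrichment": {"record_title": "Oracle Q1 2026 Earnings", "revenue": "$14.9B"},
}
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?)",
    (record["record_id"], record["enrichment"]["record_title"],
     json.dumps(record["enrichment"])),
)

# Every dynamic field survives the round trip, whatever schema the job produced
raw, = conn.execute(
    "SELECT raw_data FROM records WHERE record_id = 'rec_123'"
).fetchone()
restored = json.loads(raw)
```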

Strategy 2: Map to canonical fields

Create a mapping layer between variable field names and your application’s canonical names:
FIELD_PATTERNS = {
    "revenue": ["revenue", "sales", "total_revenue"],
    "profit": ["profit", "margin", "income", "earnings"],
    "quarter": ["quarter", "q", "period"],
}

def normalize_record(enrichment):
    """Extract canonical fields from any schema."""
    normalized = {"title": enrichment["record_title"]}

    for canonical_name, patterns in FIELD_PATTERNS.items():
        for key, value in enrichment.items():
            if any(p in key.lower() for p in patterns):
                normalized[canonical_name] = value
                break

    return normalized

Strategy 3: Use flexible validation

Validate structure, not specific fields:
def is_valid_record(record):
    """Validate record has required structure."""
    # Check structure
    if "enrichment" not in record or "citations" not in record:
        return False

    # Check guaranteed fields only
    if "record_title" not in record["enrichment"]:
        return False

    # All other fields are optional
    return True

Use summary_template as guidance

The summary_template parameter can influence field naming:
{
  "query": "Tech earnings",
  "summary_template": "[COMPANY] earned [REVENUE] in [QUARTER]"
}
This adds a template_based_summary field and may guide the LLM toward similar field names (for example, company, revenue, quarter), but doesn’t guarantee them. Expect the following:
  • template_based_summary is always added
  • Field names may align with your placeholders
  • Specific field names are not guaranteed
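
A defensive read pattern for that field (the enrichment values here are illustrative; only the presence of template_based_summary follows from supplying a summary_template):

```python
# Illustrative enrichment from a job that supplied the template above
enrichment = {
    "record_title": "Oracle Q3 Earnings",
    "template_based_summary": "Oracle earned $14.9 billion in Q3",
    "company": "Oracle",  # the placeholder may have nudged this name; don't rely on it
}

# Guaranteed when a summary_template was sent, but .get() keeps this safe
# for jobs that omitted one
summary = enrichment.get("template_based_summary", enrichment["record_title"])
print(summary)
```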

Test for schema variations

To understand how schemas vary, submit the same query multiple times:
import time
import requests

# Submit same query 3 times
job_ids = []
for i in range(3):
    response = requests.post(url, json={"query": "Tech earnings Q3"})
    job_ids.append(response.json()["job_id"])
    time.sleep(1)

# Wait for completion and compare schemas
for job_id in job_ids:
    # Poll until the job completes (get_results stands in for your own polling helper)
    results = get_results(job_id)

    # Print first record's fields
    first_record = results["records"][0]["enrichment"]
    print(f"\nJob {job_id} fields:")
    print(list(first_record.keys()))
This reveals the range of schema variations you need to handle.

Avoid common mistakes

Don’t expect consistency

# ❌ Assumes all records have same fields
df = pd.DataFrame([
    {"company": r["enrichment"]["company_name"]}
    for r in records
])
# ✅ Handles variable fields
data = []
for record in records:
    row = {"title": record["enrichment"]["record_title"]}
    # Add all other fields dynamically
    row.update(record["enrichment"])
    data.append(row)

df = pd.DataFrame(data)

Don’t use rigid validation

# ❌ Breaks on schema changes
required_fields = ["company_name", "revenue", "quarter"]
for field in required_fields:
    assert field in enrichment
# ✅ Flexible validation
assert "record_title" in enrichment  # Only check guaranteed fields

Don’t assume field types

# ❌ Field format can vary
revenue_float = float(enrichment["revenue"].replace("$", ""))
# ✅ Handle different formats
import re

revenue_str = enrichment.get("revenue", "0")
# Extract numbers from "$14.9B", "14.9 billion", "14900000000", etc.
numbers = re.findall(r'[\d.]+', revenue_str)
revenue_value = float(numbers[0]) if numbers else 0.0
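
If you also need the magnitude, suffixes like “B” or “billion” have to be folded in. A hedged sketch (the suffix table is an assumption about formats you might see, not part of the API):

```python
import re

# Assumed magnitude suffixes; extend this table for the formats your jobs return
SCALES = {"b": 1e9, "billion": 1e9, "m": 1e6, "million": 1e6,
          "k": 1e3, "thousand": 1e3}

def parse_money(text):
    """Parse strings like '$14.9B', '14.9 billion', or '14900000000' into a float."""
    match = re.search(r"([\d,.]+)\s*([a-zA-Z]*)", text)
    if not match:
        return 0.0
    value = float(match.group(1).replace(",", ""))
    scale = SCALES.get(match.group(2).lower(), 1)
    return value * scale

print(parse_money("$14.9B"))
print(parse_money("14.9 billion"))
print(parse_money("14900000000"))
```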

Best practices

To work effectively with CatchAll’s LLM-generated schemas:
  1. Store raw JSON: Always preserve the original enrichment object.
  2. Use pattern matching: Match fields by content, not exact names.
  3. Build mapping layers: Translate variable schemas to your canonical model.
  4. Test with multiple runs: Submit identical queries to see variations.
  5. Document variability: Inform users that schemas change between jobs.
  6. Handle missing fields gracefully: Use .get() with defaults in Python or optional chaining in TypeScript.
The non-deterministic behavior lets CatchAll handle any query type without predefined schemas. Build flexible integrations to leverage this capability while maintaining robust code.