CatchAll generates response schemas dynamically with LLMs. Each job creates a new schema, even with identical inputs. This guide explains this non-deterministic behavior and shows you how to build integrations that handle it.

Each job generates a new schema

When you submit a query, the LLM analyzes your inputs (query, context, summary_template) and generates extractors that define the output schema. Because LLMs are non-deterministic, the same inputs can produce different schemas on different runs. Expect the following variations:
  • Field names vary between jobs (even with identical inputs)
  • The number of fields can differ
  • Field structures may change (for example, combined fields versus separate fields)
  • Results can vary significantly
This is expected behavior, not a bug.

Example: Same inputs, different schemas

This example shows how identical inputs can produce different schemas.

Query: “Tech company earnings this quarter”

Job 1 produced:
{
  "record_title": "Oracle Q1 2026 Earnings",
  "company_name": "Oracle",
  "quarter_identifier": "Q1 2026",
  "revenue": "$14.9 billion",
  "revenue_change": "up 12%",
  "profit_margin": "42%"
}
Job 2 produced:
{
  "record_title": "Oracle Q1 2026 Results",
  "company": "Oracle",
  "quarter": "Q1",
  "fiscal_year": "2026",
  "total_revenue": "$14.9B",
  "yoy_growth": "+12%",
  "operating_margin": "42%"
}
The field names differ:
  • company_name versus company
  • quarter_identifier versus quarter + fiscal_year
  • revenue versus total_revenue
  • revenue_change versus yoy_growth
  • profit_margin versus operating_margin
Both schemas are valid. The LLM generated different extractors from the same inputs.
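
To see the drift programmatically, compare the key sets of the two payloads. A minimal sketch using the field names from the illustrative Job 1 and Job 2 records above:

```python
# Field names from the two illustrative jobs above
job1_fields = {"record_title", "company_name", "quarter_identifier",
               "revenue", "revenue_change", "profit_margin"}
job2_fields = {"record_title", "company", "quarter", "fiscal_year",
               "total_revenue", "yoy_growth", "operating_margin"}

shared = job1_fields & job2_fields
print(f"Shared fields: {sorted(shared)}")                  # only record_title survives
print(f"Job 1 only: {sorted(job1_fields - job2_fields)}")
print(f"Job 2 only: {sorted(job2_fields - job1_fields)}")
```

Only the guaranteed record_title overlaps; every dynamic field name diverged.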

Fields that are always present

The following fields appear in every record:
  • record_id
  • enrichment.record_title
  • citations array
If you provide a summary_template, the enrichment object also includes:
  • template_based_summary
All other fields in the enrichment object are generated dynamically for each job.
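
In code, that means only the guaranteed fields are safe to access directly; everything else should be read defensively. A sketch with an illustrative record (the field values are made up):

```python
# Illustrative record shaped like the guaranteed structure described above
record = {
    "record_id": "rec_123",
    "enrichment": {
        "record_title": "Oracle Q1 2026 Earnings",
        "revenue": "$14.9 billion",  # dynamic field; may be named differently next job
    },
    "citations": [],
}

# Always present, so direct access is safe
record_id = record["record_id"]
title = record["enrichment"]["record_title"]
citations = record["citations"]

# Only present when the job included a summary_template, so use .get()
summary = record["enrichment"].get("template_based_summary", "")
```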

Handle variable schemas

Don’t hardcode field names

# ❌ Breaks when field names change
revenue = record["enrichment"]["revenue"]

Check multiple possible names

# ✅ Handles variations
revenue = (
    record["enrichment"].get("revenue") or
    record["enrichment"].get("total_revenue") or
    record["enrichment"].get("quarterly_revenue") or
    "N/A"
)

Process all fields dynamically

# ✅ Works with any schema
enrichment = record["enrichment"]

print(f"Title: {enrichment['record_title']}\n")

for key, value in enrichment.items():
    if key != "record_title":
        display_name = key.replace("_", " ").title()
        print(f"{display_name}: {value}")

Use pattern matching for key fields

# ✅ Find fields by content, not exact name
def find_field(enrichment, patterns):
    """Find first field matching any pattern."""
    for key, value in enrichment.items():
        if any(pattern in key.lower() for pattern in patterns):
            return value
    return None

# Usage
revenue = find_field(record["enrichment"], ["revenue", "sales"])
profit = find_field(record["enrichment"], ["profit", "margin", "income"])

Integration strategies

Strategy 1: Store raw data

Store the entire enrichment object as JSON. This preserves all fields regardless of schema:
import json

db.insert({
    "record_id": record["record_id"],
    "title": record["enrichment"]["record_title"],
    "raw_data": json.dumps(record["enrichment"]),
    "citations": json.dumps(record["citations"])
})
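
The `db` object above is a placeholder. As a concrete sketch of the same idea, here is `sqlite3` standing in for your database (the table layout and record values are illustrative):

```python
import json
import sqlite3

# In-memory stand-in for the `db` above, with raw_data as a JSON text column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (record_id TEXT, title TEXT, raw_data TEXT)")

record = {  # shaped like a CatchAll record; values are illustrative
    "record_id": "rec_123",
    "enrichment": {"record_title": "Oracle Q1 2026 Earnings", "revenue": "$14.9B"},
}
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?)",
    (record["record_id"], record["enrichment"]["record_title"],
     json.dumps(record["enrichment"])),
)

# Every dynamic field survives the round trip, whatever schema the job produced
raw, = conn.execute(
    "SELECT raw_data FROM records WHERE record_id = 'rec_123'"
).fetchone()
restored = json.loads(raw)
```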

Strategy 2: Map to canonical fields

Create a mapping layer between variable field names and your application’s canonical names:
FIELD_PATTERNS = {
    "revenue": ["revenue", "sales", "total_revenue"],
    "profit": ["profit", "margin", "income", "earnings"],
    "quarter": ["quarter", "q", "period"],
}

def normalize_record(enrichment):
    """Extract canonical fields from any schema."""
    normalized = {"title": enrichment["record_title"]}

    for canonical_name, patterns in FIELD_PATTERNS.items():
        for key, value in enrichment.items():
            if any(p in key.lower() for p in patterns):
                normalized[canonical_name] = value
                break

    return normalized

Strategy 3: Use flexible validation

Validate structure, not specific fields:
def is_valid_record(record):
    """Validate record has required structure."""
    # Check structure
    if "enrichment" not in record or "citations" not in record:
        return False

    # Check guaranteed fields only
    if "record_title" not in record["enrichment"]:
        return False

    # All other fields are optional
    return True

Use summary_template as guidance

The summary_template parameter can influence field naming:
{
  "query": "Tech earnings",
  "summary_template": "[COMPANY] earned [REVENUE] in [QUARTER]"
}
This adds a template_based_summary field and may guide the LLM toward similar field names (for example, company, revenue, quarter), but doesn’t guarantee them. Expect the following:
  • template_based_summary is always added
  • Field names may align with your placeholders
  • Specific field names are not guaranteed
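
A defensive read pattern for that field (the enrichment values here are illustrative; only the presence of template_based_summary follows from supplying a summary_template):

```python
# Illustrative enrichment from a job that supplied the template above
enrichment = {
    "record_title": "Oracle Q3 Earnings",
    "template_based_summary": "Oracle earned $14.9 billion in Q3",
    "company": "Oracle",  # the placeholder may have nudged this name; don't rely on it
}

# Guaranteed when a summary_template was sent, but .get() keeps this safe
# for jobs that omitted one
summary = enrichment.get("template_based_summary", enrichment["record_title"])
print(summary)
```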

Test for schema variations

To understand how schemas vary, submit the same query multiple times:
import time
import requests

# Submit same query 3 times
job_ids = []
for i in range(3):
    response = requests.post(url, json={"query": "Tech earnings Q3"})
    job_ids.append(response.json()["job_id"])
    time.sleep(1)

# Wait for completion and compare schemas
for job_id in job_ids:
    # Poll until the job completes (get_results stands in for your own polling helper)
    results = get_results(job_id)

    # Print first record's fields
    first_record = results["records"][0]["enrichment"]
    print(f"\nJob {job_id} fields:")
    print(list(first_record.keys()))
This reveals the range of schema variations you need to handle.

Avoid common mistakes

Don’t expect consistency

# ❌ Assumes all records have same fields
df = pd.DataFrame([
    {"company": r["enrichment"]["company_name"]}
    for r in records
])
# ✅ Handles variable fields
data = []
for record in records:
    row = {"title": record["enrichment"]["record_title"]}
    # Add all other fields dynamically
    row.update(record["enrichment"])
    data.append(row)

df = pd.DataFrame(data)

Don’t use rigid validation

# ❌ Breaks on schema changes
required_fields = ["company_name", "revenue", "quarter"]
for field in required_fields:
    assert field in enrichment
# ✅ Flexible validation
assert "record_title" in enrichment  # Only check guaranteed fields

Don’t assume field types

# ❌ Field format can vary
revenue_float = float(enrichment["revenue"].replace("$", ""))
# ✅ Handle different formats
import re

revenue_str = enrichment.get("revenue", "0")
# Extract numbers from "$14.9B", "14.9 billion", "14900000000", etc.
numbers = re.findall(r'[\d.]+', revenue_str)
revenue_value = float(numbers[0]) if numbers else 0.0
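
If you also need the magnitude, suffixes like “B” or “billion” have to be folded in. A hedged sketch (the suffix table is an assumption about formats you might see, not part of the API):

```python
import re

# Assumed magnitude suffixes; extend this table for the formats your jobs return
SCALES = {"b": 1e9, "billion": 1e9, "m": 1e6, "million": 1e6,
          "k": 1e3, "thousand": 1e3}

def parse_money(text):
    """Parse strings like '$14.9B', '14.9 billion', or '14900000000' into a float."""
    match = re.search(r"([\d,.]+)\s*([a-zA-Z]*)", text)
    if not match:
        return 0.0
    value = float(match.group(1).replace(",", ""))
    scale = SCALES.get(match.group(2).lower(), 1)
    return value * scale

print(parse_money("$14.9B"))
print(parse_money("14.9 billion"))
print(parse_money("14900000000"))
```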

Best practices

To work effectively with CatchAll’s LLM-generated schemas:
  1. Store raw JSON: Always preserve the original enrichment object.
  2. Use pattern matching: Match fields by content, not exact names.
  3. Build mapping layers: Translate variable schemas to your canonical model.
  4. Test with multiple runs: Submit identical queries to see variations.
  5. Document variability: Inform users that schemas change between jobs.
  6. Handle missing fields gracefully: Use .get() with defaults in Python or optional chaining in TypeScript.
The non-deterministic behavior lets CatchAll handle any query type without predefined schemas. Build flexible integrations to leverage this capability while maintaining robust code.