CatchAll generates response schemas dynamically with LLMs. Each job creates a
new schema, even with identical inputs. This guide explains this
non-deterministic behavior and shows you how to build integrations that handle
it.
Each job generates a new schema
When you submit a query, the LLM analyzes your inputs (`query`, `context`, and
`summary_template`) and generates extractors that define the output schema.
Because LLMs are non-deterministic, the same inputs can produce different
schemas on different runs.
Expect the following variations:
- Field names vary between jobs (even with identical inputs)
- The number of fields can differ
- Field structures may change (for example, combined fields versus separate
fields)
- Values can vary in content and format (for example, "$14.9 billion" versus "$14.9B")
This is expected behavior, not a bug.
This example shows how identical inputs produce different schemas.
Query: “Tech company earnings this quarter”
Job 1 produced:
{
  "record_title": "Oracle Q1 2026 Earnings",
  "company_name": "Oracle",
  "quarter_identifier": "Q1 2026",
  "revenue": "$14.9 billion",
  "revenue_change": "up 12%",
  "profit_margin": "42%"
}
Job 2 produced:
{
  "record_title": "Oracle Q1 2026 Results",
  "company": "Oracle",
  "quarter": "Q1",
  "fiscal_year": "2026",
  "total_revenue": "$14.9B",
  "yoy_growth": "+12%",
  "operating_margin": "42%"
}
The field names differ:
- `company_name` versus `company`
- `quarter_identifier` versus `quarter` + `fiscal_year`
- `revenue` versus `total_revenue`
- `revenue_change` versus `yoy_growth`
- `profit_margin` versus `operating_margin`
Both schemas are valid. The LLM generated different extractors from the same
inputs.
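You can see the divergence concretely by comparing the field names of the two example payloads. The following is a minimal sketch; the `job_1` and `job_2` dicts simply restate the enrichment objects shown above:
# Minimal sketch: compare the field names of the two example enrichments above
job_1 = {
    "record_title": "Oracle Q1 2026 Earnings", "company_name": "Oracle",
    "quarter_identifier": "Q1 2026", "revenue": "$14.9 billion",
    "revenue_change": "up 12%", "profit_margin": "42%",
}
job_2 = {
    "record_title": "Oracle Q1 2026 Results", "company": "Oracle",
    "quarter": "Q1", "fiscal_year": "2026", "total_revenue": "$14.9B",
    "yoy_growth": "+12%", "operating_margin": "42%",
}

shared = set(job_1) & set(job_2)
print("Shared fields:", sorted(shared))              # only 'record_title'
print("Only in job 1:", sorted(set(job_1) - shared))
print("Only in job 2:", sorted(set(job_2) - shared))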
Fields that are always present
The following fields appear in every record:
- `record_id`
- `enrichment.record_title`
- `citations` array

If you provide a `summary_template`, the `enrichment` object also includes:
- `template_based_summary`

All other fields in the `enrichment` object are generated dynamically for each
job.
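Because only these fields are guaranteed, read them directly and treat everything else as optional. A minimal sketch, assuming `record` is one element of a job's results:
# Only the guaranteed structure is accessed directly; everything else is optional
record_id = record["record_id"]
title = record["enrichment"]["record_title"]
citations = record["citations"]  # always an array

# Present only when the job was submitted with a summary_template
summary = record["enrichment"].get("template_based_summary")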
Handle variable schemas
Don’t hardcode field names
# ❌ Breaks when field names change
revenue = record["enrichment"]["revenue"]
Check multiple possible names
# ✅ Handles variations
revenue = (
    record["enrichment"].get("revenue") or
    record["enrichment"].get("total_revenue") or
    record["enrichment"].get("quarterly_revenue") or
    "N/A"
)
Process all fields dynamically
# ✅ Works with any schema
enrichment = record["enrichment"]
print(f"Title: {enrichment['record_title']}\n")

for key, value in enrichment.items():
    if key != "record_title":
        display_name = key.replace("_", " ").title()
        print(f"{display_name}: {value}")
Use pattern matching for key fields
# ✅ Find fields by content, not exact name
def find_field(enrichment, patterns):
    """Find first field matching any pattern."""
    for key, value in enrichment.items():
        if any(pattern in key.lower() for pattern in patterns):
            return value
    return None

# Usage
revenue = find_field(record["enrichment"], ["revenue", "sales"])
profit = find_field(record["enrichment"], ["profit", "margin", "income"])
Integration strategies
Strategy 1: Store raw data
Store the entire `enrichment` object as JSON. This preserves all fields
regardless of schema:
import json

db.insert({
    "record_id": record["record_id"],
    "title": record["enrichment"]["record_title"],
    "raw_data": json.dumps(record["enrichment"]),
    "citations": json.dumps(record["citations"])
})
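The `db.insert` call above is a stand-in for whatever storage you use. As one concrete illustration (an assumption, not part of CatchAll), the same idea with Python's built-in `sqlite3` module could look like this:
import json
import sqlite3

conn = sqlite3.connect("catchall.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS records (
        record_id TEXT PRIMARY KEY,
        title TEXT,
        raw_data TEXT,   -- full enrichment object as JSON
        citations TEXT   -- citations array as JSON
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
    (
        record["record_id"],
        record["enrichment"]["record_title"],
        json.dumps(record["enrichment"]),
        json.dumps(record["citations"]),
    ),
)
conn.commit()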
Strategy 2: Map to canonical fields
Create a mapping layer between variable field names and your application’s
canonical names:
FIELD_PATTERNS = {
    "revenue": ["revenue", "sales", "total_revenue"],
    "profit": ["profit", "margin", "income", "earnings"],
    "quarter": ["quarter", "q", "period"],
}
def normalize_record(enrichment):
    """Extract canonical fields from any schema."""
    normalized = {"title": enrichment["record_title"]}
    for canonical_name, patterns in FIELD_PATTERNS.items():
        for key, value in enrichment.items():
            if any(p in key.lower() for p in patterns):
                normalized[canonical_name] = value
                break
    return normalized
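Applied to the two example schemas from earlier, both variants collapse to the same canonical keys. A brief usage sketch:
# Usage: different field names, same canonical keys
normalized = normalize_record(record["enrichment"])
# Job 1's schema yields {"title": ..., "revenue": "$14.9 billion", "profit": "42%", "quarter": "Q1 2026"}
# Job 2's schema yields {"title": ..., "revenue": "$14.9B", "profit": "42%", "quarter": "Q1"}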
Strategy 3: Use flexible validation
Validate structure, not specific fields:
def is_valid_record(record):
    """Validate record has required structure."""
    # Check structure
    if "enrichment" not in record or "citations" not in record:
        return False

    # Check guaranteed fields only
    if "record_title" not in record["enrichment"]:
        return False

    # All other fields are optional
    return True
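For example, you can filter a results payload down to structurally valid records before further processing (a short sketch assuming the `results` shape used elsewhere in this guide):
# Keep only structurally valid records; field-level differences are fine
valid_records = [r for r in results["records"] if is_valid_record(r)]
print(f"{len(valid_records)} of {len(results['records'])} records passed validation")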
Use summary_template as guidance
The `summary_template` parameter can influence field naming:
{
  "query": "Tech earnings",
  "summary_template": "[COMPANY] earned [REVENUE] in [QUARTER]"
}
This adds a `template_based_summary` field and may guide the LLM toward similar
field names (for example, `company`, `revenue`, `quarter`), but doesn't
guarantee them.
Expect the following:
- `template_based_summary` is always added
- Field names may align with your placeholders
- Specific field names are not guaranteed
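Putting this together, a job submitted with a `summary_template` can rely on `template_based_summary` while still treating template-influenced field names as optional. A minimal sketch; the `url` and `get_results` placeholders stand in for your endpoint and polling helper, as in the next section:
import requests

# Submit a job with a summary_template (url is a placeholder)
response = requests.post(url, json={
    "query": "Tech earnings",
    "summary_template": "[COMPANY] earned [REVENUE] in [QUARTER]"
})
job_id = response.json()["job_id"]

results = get_results(job_id)  # poll until complete (placeholder helper)
for record in results["records"]:
    enrichment = record["enrichment"]
    print(enrichment["template_based_summary"])  # always present with a template
    print(enrichment.get("company", "n/a"))      # may or may not exist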
Test for schema variations
To understand how schemas vary, submit the same query multiple times:
import time
import requests

# Submit same query 3 times
job_ids = []
for i in range(3):
    response = requests.post(url, json={"query": "Tech earnings Q3"})
    job_ids.append(response.json()["job_id"])
    time.sleep(1)

# Wait for completion and compare schemas
for job_id in job_ids:
    # Poll until complete
    results = get_results(job_id)

    # Print first record's fields
    first_record = results["records"][0]["enrichment"]
    print(f"\nJob {job_id} fields:")
    print(list(first_record.keys()))
This reveals the range of schema variations you need to handle.
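To quantify the variation instead of eyeballing the printed key lists, you can also separate the field names that appeared in every run from those that appeared only sometimes (a sketch reusing the `job_ids` list and `get_results` placeholder from the loop above):
# Which field names appeared in every run, and which only in some runs?
field_sets = []
for job_id in job_ids:
    results = get_results(job_id)
    field_sets.append(set(results["records"][0]["enrichment"].keys()))

always_present = set.intersection(*field_sets)
sometimes_present = set.union(*field_sets) - always_present
print("In every run:", sorted(always_present))
print("In some runs only:", sorted(sometimes_present))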
Avoid common mistakes
Don’t expect consistency
import pandas as pd

# ❌ Assumes all records have same fields
df = pd.DataFrame([
    {"company": r["enrichment"]["company_name"]}
    for r in records
])

# ✅ Handles variable fields
data = []
for record in records:
    row = {"title": record["enrichment"]["record_title"]}
    # Add all other fields dynamically
    row.update(record["enrichment"])
    data.append(row)

df = pd.DataFrame(data)
Don’t use rigid validation
# ❌ Breaks on schema changes
required_fields = ["company_name", "revenue", "quarter"]
for field in required_fields:
    assert field in enrichment
# ✅ Flexible validation
assert "record_title" in enrichment # Only check guaranteed fields
Don’t assume field types
# ❌ Field format can vary
revenue_float = float(enrichment["revenue"].replace("$", ""))
# ✅ Handle different formats
import re
revenue_str = enrichment.get("revenue", "0")
# Extract numbers from "$14.9B", "14.9 billion", "14900000000", etc.
numbers = re.findall(r'[\d.]+', revenue_str)
revenue_value = float(numbers[0]) if numbers else 0.0
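Note that extracting the digits alone does not make values comparable: "$14.9B" and "14900000000" would parse to very different numbers. The following is a hedged sketch of one way to normalize common magnitude suffixes; the suffix handling is an assumption, so extend it to match the formats you actually see:
import re

def parse_money(value, default=0.0):
    """Best-effort conversion of strings like "$14.9B" or "14.9 billion" to a float."""
    if value is None:
        return default
    text = str(value).lower().replace(",", "")
    match = re.search(r'[\d.]+', text)
    if not match:
        return default
    number = float(match.group())
    # Assumed magnitude suffixes; adjust for the formats in your data
    if "billion" in text or re.search(r'\d\s*b\b', text):
        number *= 1e9
    elif "million" in text or re.search(r'\d\s*m\b', text):
        number *= 1e6
    return number

print(parse_money("$14.9B"))        # 14900000000.0
print(parse_money("14.9 billion"))  # 14900000000.0
print(parse_money("14900000000"))   # 14900000000.0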
Best practices
To work effectively with CatchAll’s LLM-generated schemas:
- Store raw JSON: Always preserve the original `enrichment` object.
- Use pattern matching: Match fields by content, not exact names.
- Build mapping layers: Translate variable schemas to your canonical model.
- Test with multiple runs: Submit identical queries to see variations.
- Document variability: Inform users that schemas change between jobs.
- Handle missing fields gracefully: Use `.get()` with defaults in Python or
optional chaining in TypeScript.
The non-deterministic behavior lets CatchAll handle any query type without
predefined schemas. Build flexible integrations to leverage this capability
while maintaining robust code.