Build autonomous web search agents and research assistants that can find,
analyze, and synthesize information from millions of web pages using natural
language.
Before you start
- Python 3.12 or later
- CatchAll API key from platform.newscatcherapi.com
- LLM provider credentials (OpenAI, Anthropic, etc.) for agent features
Installation
PyPI

```bash
pip install langchain-catchall langchain-core
```

GitHub (development)

```bash
git clone https://github.com/NewscatcherAPI/langchain-catchall.git
cd langchain-catchall
pip install -e .
pip install langchain-core
```
Quickstart
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

result = client.search("Semiconductor company earnings announcements")
print(f"Found {result.valid_records} records")

for record in result.all_records[:3]:
    print(f"- {record.record_title}")
```
The search() method handles submission, polling, and retrieval
automatically. Jobs typically complete in 10-15 minutes.
CatchAllClient
CatchAllClient wraps the CatchAll Python SDK
with LangChain-friendly patterns: automatic polling, high-level search method,
and pagination handling.
Available as both sync (CatchAllClient) and async (AsyncCatchAllClient).
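The async client is useful when you want several searches in flight at once. A minimal sketch of the concurrency pattern with `asyncio.gather` — here a stub client stands in for `AsyncCatchAllClient` so the example runs without an API key, and the stub's `search` signature is assumed to mirror the sync client's:

```python
import asyncio


class StubAsyncClient:
    """Stand-in for AsyncCatchAllClient so this sketch is self-contained."""

    async def search(self, query: str):
        await asyncio.sleep(0.01)  # simulate network latency
        return {"query": query, "valid_records": 1}


async def run_searches(client, queries):
    # Launch all searches at once and wait for every result
    return await asyncio.gather(*(client.search(q) for q in queries))


results = asyncio.run(run_searches(
    StubAsyncClient(),
    ["AI chip startups", "Cloud security funding"],
))
for r in results:
    print(r["query"], "->", r["valid_records"], "records")
```

With the real async client, the same `gather` pattern lets long-running jobs overlap instead of queuing sequentially.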
Basic usage
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    poll_interval=30,    # Status check interval (default: 30s)
    max_wait_time=2400,  # Timeout (default: 40 min)
)

# Submit and wait for results
result = client.search("AI company acquisitions and mergers")

# Control result limit
result = client.search("Technology funding rounds", limit=50)

# Submit without waiting
result = client.search("FDA drug approvals", wait=False)
job_id = result.job_id  # Retrieve later
```
Jobs: Granular control
For data pipelines or complex workflows:
```python
# Submit job
job_id = client.submit_job("Technology company IPO filings")

# Check status
status = client.get_status(job_id)

# Wait for completion
client.wait_for_completion(job_id)

# Retrieve results
result = client.get_all_results(job_id)
```
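For intuition, `wait_for_completion` boils down to a status-polling loop with a deadline. A generic sketch — `poll_until` and the status strings are illustrative, not SDK names:

```python
import time


def poll_until(get_status, job_id, poll_interval=30, max_wait_time=2400):
    """Poll a job's status until it completes or the timeout is exceeded."""
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait_time}s")


# Demo with a fake status source that completes on the third check
calls = iter(["queued", "running", "completed"])
print(poll_until(lambda _: next(calls), "job-123", poll_interval=0))  # → completed
```

This is why `poll_interval` and `max_wait_time` on the client control both latency and the point at which a `TimeoutError` is raised.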
Monitors
Automate recurring searches with scheduled execution. Monitor methods in CatchAllClient
mirror the CatchAll Python SDK interface directly.
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Create reference job
result = client.search("Technology company acquisitions")

# Create daily monitor
monitor = client.create_monitor(
    reference_job_id=result.job_id,
    schedule="0 9 * * *",  # Daily at 9 AM UTC
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {"Authorization": "Bearer your-token"}
    }
)

# Manage monitors (monitor_id as returned for the created monitor)
monitors = client.list_monitors()
client.enable_monitor(monitor_id)
client.disable_monitor(monitor_id)

# Retrieve results
results = client.pull_monitor_results(monitor_id)
jobs = client.list_monitor_jobs(monitor_id, sort="desc")
```
For detailed monitor configuration, scheduling syntax, and webhook setup, see
the Monitors guide.
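For quick reference, `schedule` takes a standard five-field cron expression (minute, hour, day of month, month, day of week), evaluated in UTC as in the example above:

```text
0 9 * * *      # Daily at 09:00
0 */6 * * *    # Every 6 hours
0 9 * * 1-5    # Weekdays at 09:00
0 8 1 * *      # First day of each month at 08:00
```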
Advanced: Cost optimization
Search once, analyze many times without additional API cost:
```python
import os

from langchain_catchall import CatchAllClient, query_with_llm
from langchain_openai import ChatOpenAI

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Search once (10-15 minutes, costs API credits)
result = client.search("Enterprise software company earnings reports")

# Analyze many times (instant, no additional cost)
answer1 = query_with_llm(result, "Which companies reported highest revenue?", llm)
answer2 = query_with_llm(result, "Compare year-over-year growth rates", llm)
answer3 = query_with_llm(result, "What are the key trends?", llm)
```
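Conceptually, this works because the records are already in memory: each follow-up question only packs the cached records into a new LLM prompt. A rough sketch of that idea — `ask_over_records` and the record fields are illustrative, not the actual `query_with_llm` implementation:

```python
def ask_over_records(records, question, llm):
    """Illustrative: build one prompt over already-fetched records, then ask the LLM."""
    context = "\n".join(f"- {r['title']}" for r in records)
    prompt = f"Records:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


# Demo with a fake LLM that just reports the prompt size
fake_llm = lambda prompt: f"(answer based on {len(prompt)} prompt chars)"
records = [{"title": "Acme Q3 earnings"}, {"title": "Globex Q3 earnings"}]
print(ask_over_records(records, "Which company grew faster?", fake_llm))
```

Since no new search job is submitted, each extra question costs LLM tokens only.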
CatchAllTools
Ready-to-use toolkit for LangGraph agents with built-in caching. The killer
feature: search once, then analyze unlimited times for free (LLM costs only).

| Tool | Duration | Cost | Description |
|---|---|---|---|
| catchall_search_data | 10-15 min | API credits | Initialize new search |
| catchall_analyze_data | Instant | LLM costs only | Query cached results |
Setup
Install the OpenAI integration package if you haven’t already:
```bash
pip install langchain-openai
```
```python
import os

from langchain_openai import ChatOpenAI
from langchain_catchall import CatchAllTools

toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=ChatOpenAI(model="gpt-4o"),
    limit=100,              # Default result limit
    verbose=True,           # Show progress
    initialize_query=True,  # Auto-suggest validators/enrichments
)

tools = toolkit.get_tools()
```
Create agent
```python
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)

agent = create_agent(
    model=llm,
    tools=toolkit.get_tools(),
    system_prompt=CATCHALL_AGENT_PROMPT
)

messages = [HumanMessage(content="Find technology company acquisitions this week")]
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
```
Agent prompt
The CATCHALL_AGENT_PROMPT teaches the agent cost-effective operation by
implementing a two-phase workflow:
- Search phase: uses catchall_search_data to gather the initial dataset, then
stops and waits for user input.
- Analysis phase: uses catchall_analyze_data for all follow-up questions
(filtering, sorting, aggregation, Q&A).
This design prevents expensive repeated searches and maximizes the value of
cached results. The prompt also handles limit parameters when users request
specific result counts (e.g., “top 50”, “limit to 20”).
Conversational pattern
The power of caching: perform one expensive search, then ask unlimited follow-up
questions for free (LLM costs only).
```python
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)
agent = create_agent(model=llm, tools=toolkit.get_tools(), system_prompt=CATCHALL_AGENT_PROMPT)

# Initial search (10-15 min, uses API credits)
messages = [HumanMessage(content="Find corporate headquarters relocations in the US")]
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Filter cached data (instant, LLM costs)
messages.append(HumanMessage(content="Show only California locations"))
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Analyze cached data (instant, LLM costs)
messages.append(HumanMessage(content="What are the top 3 cities?"))
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
```
Error handling
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    max_wait_time=2400
)

try:
    result = client.search("Venture capital funding rounds")
    print(f"Success: {result.valid_records} records")
except TimeoutError as e:
    print(f"Search timed out: {e}")
    # Retry with a narrower query or a longer max_wait_time
except Exception as e:
    print(f"Error: {e}")
    raise
```
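If timeouts are routine in your pipeline, a small retry wrapper with exponential backoff keeps the handling in one place. A generic sketch — `search_with_retry` is illustrative, not part of the package:

```python
import time


def search_with_retry(search_fn, query, retries=3, base_delay=60):
    """Retry a search on TimeoutError, doubling the delay after each attempt."""
    for attempt in range(retries):
        try:
            return search_fn(query)
        except TimeoutError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(base_delay * (2 ** attempt))


# Demo with a fake search that times out once, then succeeds
attempts = []

def flaky_search(query):
    attempts.append(query)
    if len(attempts) < 2:
        raise TimeoutError("too slow")
    return f"results for {query}"

print(search_with_retry(flaky_search, "VC funding", base_delay=0))  # → results for VC funding
```

In practice you would pass `client.search` as `search_fn`; given the 10-15 minute job duration, a `base_delay` of minutes rather than seconds is more realistic.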
Resources