Build autonomous web search agents and research assistants that can find, analyze, and synthesize information from millions of web pages using natural language.

Before you start

  • Python 3.12 or later
  • CatchAll API key from platform.newscatcherapi.com
  • LLM provider credentials (OpenAI, Anthropic, etc.) for agent features

Installation

pip install langchain-catchall langchain-core

Quickstart

import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])
result = client.search("Semiconductor company earnings announcements")

print(f"Found {result.valid_records} records")
for record in result.all_records[:3]:
    print(f"- {record.record_title}")

The search() method handles submission, polling, and retrieval automatically. Jobs typically complete in 10-15 minutes.

CatchAllClient

CatchAllClient wraps the CatchAll Python SDK with LangChain-friendly patterns: automatic polling, high-level search method, and pagination handling. Available as both sync (CatchAllClient) and async (AsyncCatchAllClient).
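The async client isn't shown above; assuming AsyncCatchAllClient mirrors the sync interface, searches can be awaited concurrently with asyncio.gather. The sketch below substitutes a stand-in coroutine for the real client so it stays self-contained and runnable:

```python
import asyncio

# Stand-in for AsyncCatchAllClient.search(); the real client is assumed
# to expose the same interface as the sync CatchAllClient, awaited.
async def fake_search(query: str) -> dict:
    await asyncio.sleep(0.01)  # simulates the long-running job
    return {"query": query, "valid_records": 1}

async def main() -> list[dict]:
    # With the real client this would be roughly:
    #   client = AsyncCatchAllClient(api_key=...)
    #   results = await asyncio.gather(client.search(q1), client.search(q2))
    return await asyncio.gather(
        fake_search("Semiconductor company earnings announcements"),
        fake_search("AI company acquisitions and mergers"),
    )

results = asyncio.run(main())
print([r["query"] for r in results])
```

Both searches run concurrently, so total wall-clock time is bounded by the slowest job rather than the sum of both.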

Basic usage

import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    poll_interval=30,      # Status check interval (default: 30s)
    max_wait_time=2400,    # Timeout (default: 40 min)
)

# Submit and wait for results
result = client.search("AI company acquisitions and mergers")

# Control result limit
result = client.search("Technology funding rounds", limit=50)

# Submit without waiting
result = client.search("FDA drug approvals", wait=False)
job_id = result.job_id  # Retrieve later

Jobs: Granular control

For data pipelines or complex workflows:
# Submit job
job_id = client.submit_job("Technology company IPO filings")

# Check status
status = client.get_status(job_id)

# Wait for completion
client.wait_for_completion(job_id)

# Retrieve results
result = client.get_all_results(job_id)
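The wait_for_completion call above hides a plain poll-until-terminal loop. Here is a minimal stdlib sketch of that pattern; the get_status callable is a stand-in for illustration, not the real SDK method:

```python
import time

def wait_for_completion(get_status, job_id: str, poll_interval: float = 0.01,
                        max_wait_time: float = 1.0) -> str:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait_time}s")

# Fake status source: reports "running" twice, then "completed".
calls = iter(["running", "running", "completed"])
status = wait_for_completion(lambda _job: next(calls), "job-123")
print(status)  # -> completed
```

With the real client, poll_interval and max_wait_time correspond to the constructor parameters shown in Basic usage.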

Monitors

Automate recurring searches with scheduled execution. Monitor methods in CatchAllClient mirror the CatchAll Python SDK interface directly:
import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Create reference job
result = client.search("Technology company acquisitions")

# Create daily monitor
monitor = client.create_monitor(
    reference_job_id=result.job_id,
    schedule="0 9 * * *",  # Daily at 9 AM UTC
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {"Authorization": "Bearer your-token"}
    }
)

# Manage monitors
monitors = client.list_monitors()
monitor_id = monitor.id  # ID of the monitor created above; attribute name may vary by SDK version
client.enable_monitor(monitor_id)
client.disable_monitor(monitor_id)

# Retrieve results
results = client.pull_monitor_results(monitor_id)
jobs = client.list_monitor_jobs(monitor_id, sort="desc")

For detailed monitor configuration, scheduling syntax, and webhook setup, see the Monitors guide.
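The schedule field uses standard 5-field cron syntax (minute, hour, day-of-month, month, day-of-week). As a rough mental model, here is a tiny matcher that handles only `*` and single numbers per field -- enough to read "0 9 * * *" -- and is not the scheduler CatchAll actually runs:

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a datetime against a simplified 5-field cron expression.

    Supports only '*' and single numbers per field. Day-of-week follows
    cron's convention of 0 = Sunday.
    """
    fields = expr.split()
    values = [dt.minute, dt.hour, dt.day, dt.month, dt.isoweekday() % 7]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))

print(cron_matches("0 9 * * *", datetime(2024, 1, 15, 9, 0)))   # True
print(cron_matches("0 9 * * *", datetime(2024, 1, 15, 10, 0)))  # False
```

So "0 9 * * *" fires when minute is 0 and hour is 9, regardless of date: daily at 9 AM UTC, as the comment in the example above says.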

Advanced: Cost optimization

Search once, analyze many times without additional API cost:
import os
from langchain_catchall import CatchAllClient, query_with_llm
from langchain_openai import ChatOpenAI

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Search once (10-15 minutes, costs API credits)
result = client.search("Enterprise software company earnings reports")

# Analyze many times (instant, no additional cost)
answer1 = query_with_llm(result, "Which companies reported highest revenue?", llm)
answer2 = query_with_llm(result, "Compare year-over-year growth rates", llm)
answer3 = query_with_llm(result, "What are the key trends?", llm)

CatchAllTools

Ready-to-use toolkit for LangGraph agents with built-in caching. The key benefit: search once, then analyze the cached results as many times as you like, paying only LLM costs.
Tool                   | Duration  | Cost           | Description
catchall_search_data   | 10-15 min | API credits    | Initialize new search
catchall_analyze_data  | Instant   | LLM costs only | Query cached results
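The caching behavior reduces to a simple pattern: pay for the search once, then serve every analysis from the cache. A stripped-down sketch, where a plain dict stands in for the toolkit's internal cache:

```python
# Minimal sketch of the search-once / analyze-many pattern
# (illustrative only; the real tools cache CatchAll results internally).
_cache: dict[str, list[str]] = {}
search_calls = 0

def search_data(query: str) -> list[str]:
    """Expensive step: hits the API only on a cache miss."""
    global search_calls
    if query not in _cache:
        search_calls += 1  # stands in for a 10-15 minute API job
        _cache[query] = [f"record for {query!r} #{i}" for i in range(3)]
    return _cache[query]

def analyze_data(query: str, question: str) -> str:
    """Cheap step: reuses cached records, so only LLM cost would apply."""
    records = search_data(query)
    return f"{question} -> based on {len(records)} cached records"

search_data("tech acquisitions")                 # first call pays the API cost
analyze_data("tech acquisitions", "Top deals?")  # served from cache
analyze_data("tech acquisitions", "By region?")  # served from cache
print(search_calls)  # -> 1
```

However many analyses run, the expensive search executes exactly once per query.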

Setup

Install the OpenAI integration package if you haven’t already:
pip install langchain-openai

import os
from langchain_openai import ChatOpenAI
from langchain_catchall import CatchAllTools

toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=ChatOpenAI(model="gpt-4o"),
    limit=100,              # Default result limit
    verbose=True,           # Show progress
    initialize_query=True,  # Auto-suggest validators/enrichments
)

tools = toolkit.get_tools()

Create agent

import os
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)

agent = create_agent(
    model=llm,
    tools=toolkit.get_tools(),
    system_prompt=CATCHALL_AGENT_PROMPT
)

messages = [HumanMessage(content="Find technology company acquisitions this week")]
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)

Agent prompt

The CATCHALL_AGENT_PROMPT teaches the agent cost-effective operation by implementing a two-phase workflow:
  1. Search phase: Uses catchall_search_data to gather the initial dataset, then stops and waits for user input.
  2. Analysis phase: Uses catchall_analyze_data for all follow-up questions (filtering, sorting, aggregation, Q&A).
This design prevents expensive repeated searches and maximizes the value of cached results. The prompt also handles limit parameters when users request specific result counts (e.g., “top 50”, “limit to 20”).
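The two-phase policy can be pictured as a tiny state machine. This is illustrative only: in practice the behavior is driven by CATCHALL_AGENT_PROMPT steering the LLM's tool choices, not by code like this:

```python
# Toy controller mirroring the prompt's two-phase policy.
class TwoPhaseAgent:
    def __init__(self):
        self.cached = None  # no dataset yet -> search phase

    def handle(self, request: str) -> str:
        if self.cached is None:
            # Search phase: run the expensive tool once, then stop.
            self.cached = f"dataset for {request!r}"
            return "searched"
        # Analysis phase: every later request reuses the cache.
        return f"analyzed {request!r} against {self.cached}"

agent = TwoPhaseAgent()
first = agent.handle("tech acquisitions this week")
second = agent.handle("show only California")
print(first, "|", second)
```

Only the first request triggers a search; every subsequent turn is routed to analysis of the cached dataset.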

Conversational pattern

The power of caching: perform one expensive search, then ask unlimited follow-up questions, paying only LLM costs with no additional API credits.
import os
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)
agent = create_agent(model=llm, tools=toolkit.get_tools(), system_prompt=CATCHALL_AGENT_PROMPT)

# Initial search (10-15 min, uses API credits)
messages = [HumanMessage(content="Find corporate headquarters relocations in the US")]
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Filter cached data (instant, LLM costs)
messages.append(HumanMessage(content="Show only California locations"))
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Analyze cached data (instant, LLM costs)
messages.append(HumanMessage(content="What are the top 3 cities?"))
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)

Error handling

import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    max_wait_time=2400
)

try:
    result = client.search("Venture capital funding rounds")
    print(f"Success: {result.valid_records} records")
except TimeoutError as e:
    print(f"Search timed out: {e}")
    # Retry with a narrower query or a longer max_wait_time
except Exception as e:
    print(f"Error: {e}")
    raise
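One common recovery strategy is to wrap the search in a retry loop with backoff. This generic helper is not part of langchain-catchall; the flaky function below simulates a search that times out before succeeding:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Retry a callable on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the timeout
            time.sleep(base_delay * 2 ** attempt)

# Fake search that times out twice before succeeding.
outcomes = iter([TimeoutError, TimeoutError, "42 records"])
def flaky_search():
    result = next(outcomes)
    if result is TimeoutError:
        raise TimeoutError("job exceeded max_wait_time")
    return result

print(with_retries(flaky_search))  # -> 42 records
```

In real use, `fn` would be a closure over `client.search(...)`, ideally with a narrower query or a longer max_wait_time on each retry.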

Resources