Build autonomous web search agents and research assistants that can find,
analyze, and synthesize information from millions of web pages using natural
language.
Before you start
- Python 3.12 or later
- CatchAll API key from platform.newscatcherapi.com
- LLM provider credentials (OpenAI, Anthropic, etc.) for agent features
Installation
PyPI

```bash
pip install langchain-catchall langchain-core
```

GitHub (development)

```bash
git clone https://github.com/NewscatcherAPI/langchain-catchall.git
cd langchain-catchall
pip install -e .
pip install langchain-core
```
Quickstart
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

result = client.search("Semiconductor company earnings announcements")
print(f"Found {result.valid_records} records")

for record in result.all_records[:3]:
    print(f"- {record.record_title}")
```
The search() method handles submission, polling, and retrieval
automatically. Jobs typically complete in 10-15 minutes.
CatchAllClient
CatchAllClient wraps the CatchAll Python SDK
with LangChain-friendly patterns: automatic polling, high-level search method,
and pagination handling.
Available as both sync (CatchAllClient) and async (AsyncCatchAllClient).
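The async client is useful when you want several searches in flight at once. A minimal sketch of the concurrency pattern with `asyncio.gather` — here a stub client stands in for `AsyncCatchAllClient` so the example runs without an API key, and the stub's `search` signature is assumed to mirror the sync client's:

```python
import asyncio


class StubAsyncClient:
    """Stand-in for AsyncCatchAllClient so this sketch is self-contained."""

    async def search(self, query: str):
        await asyncio.sleep(0.01)  # simulate network latency
        return {"query": query, "valid_records": 1}


async def run_searches(client, queries):
    # Launch all searches at once and wait for every result
    return await asyncio.gather(*(client.search(q) for q in queries))


results = asyncio.run(run_searches(
    StubAsyncClient(),
    ["AI chip startups", "Cloud security funding"],
))
for r in results:
    print(r["query"], "->", r["valid_records"], "records")
```

With the real async client, the same `gather` pattern lets long-running jobs overlap instead of queuing sequentially.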
Basic usage
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    poll_interval=30,    # Status check interval (default: 30s)
    max_wait_time=2400,  # Timeout (default: 40 min)
)

# Submit and wait for results
result = client.search("AI company acquisitions and mergers")

# Control result limit
result = client.search("Technology funding rounds", limit=50)

# Submit without waiting
result = client.search("FDA drug approvals", wait=False)
job_id = result.job_id  # Retrieve later
```
Jobs: Granular control
For data pipelines or complex workflows:
```python
# Submit job
job_id = client.submit_job("Technology company IPO filings")

# Check status
status = client.get_status(job_id)

# Wait for completion
client.wait_for_completion(job_id)

# Retrieve results
result = client.get_all_results(job_id)
```
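For intuition, `wait_for_completion` boils down to a status-polling loop with a deadline. A generic sketch — `poll_until` and the status strings are illustrative, not SDK names:

```python
import time


def poll_until(get_status, job_id, poll_interval=30, max_wait_time=2400):
    """Poll a job's status until it completes or the timeout is exceeded."""
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait_time}s")


# Demo with a fake status source that completes on the third check
calls = iter(["queued", "running", "completed"])
print(poll_until(lambda _: next(calls), "job-123", poll_interval=0))  # → completed
```

This is why `poll_interval` and `max_wait_time` on the client control both latency and the point at which a `TimeoutError` is raised.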
Monitors
Automate recurring searches with scheduled execution. Monitor methods in CatchAllClient
mirror the CatchAll Python SDK interface directly.
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Create reference job
result = client.search("Technology company acquisitions")

# Create daily monitor
monitor = client.create_monitor(
    reference_job_id=result.job_id,
    schedule="0 9 * * *",  # Daily at 9 AM UTC
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {"Authorization": "Bearer your-token"}
    }
)

# Manage monitors (monitor_id as returned for the created monitor)
monitors = client.list_monitors()
client.enable_monitor(monitor_id)
client.disable_monitor(monitor_id)

# Retrieve results
results = client.pull_monitor_results(monitor_id)
jobs = client.list_monitor_jobs(monitor_id, sort="desc")
```
For detailed monitor configuration, scheduling syntax, and webhook setup, see
the Monitors guide.
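For quick reference, `schedule` takes a standard five-field cron expression (minute, hour, day of month, month, day of week), evaluated in UTC as in the example above:

```text
0 9 * * *      # Daily at 09:00
0 */6 * * *    # Every 6 hours
0 9 * * 1-5    # Weekdays at 09:00
0 8 1 * *      # First day of each month at 08:00
```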
Advanced: Cost optimization
Search once, analyze many times without additional API cost:
```python
import os

from langchain_catchall import CatchAllClient, query_with_llm
from langchain_openai import ChatOpenAI

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Search once (10-15 minutes, costs API credits)
result = client.search("Enterprise software company earnings reports")

# Analyze many times (instant, no additional cost)
answer1 = query_with_llm(result, "Which companies reported highest revenue?", llm)
answer2 = query_with_llm(result, "Compare year-over-year growth rates", llm)
answer3 = query_with_llm(result, "What are the key trends?", llm)
```
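Conceptually, this works because the records are already in memory: each follow-up question only packs the cached records into a new LLM prompt. A rough sketch of that idea — `ask_over_records` and the record fields are illustrative, not the actual `query_with_llm` implementation:

```python
def ask_over_records(records, question, llm):
    """Illustrative: build one prompt over already-fetched records, then ask the LLM."""
    context = "\n".join(f"- {r['title']}" for r in records)
    prompt = f"Records:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


# Demo with a fake LLM that just reports the prompt size
fake_llm = lambda prompt: f"(answer based on {len(prompt)} prompt chars)"
records = [{"title": "Acme Q3 earnings"}, {"title": "Globex Q3 earnings"}]
print(ask_over_records(records, "Which company grew faster?", fake_llm))
```

Since no new search job is submitted, each extra question costs LLM tokens only.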
CatchAllTools
Ready-to-use toolkit for LangGraph agents with built-in caching. The killer
feature: search once, then analyze unlimited times for free (LLM costs only).

| Tool | Duration | Cost | Description |
|---|---|---|---|
| catchall_search_data | 10-15 min | API credits | Initialize new search |
| catchall_analyze_data | Instant | LLM costs only | Query cached results |
Setup
Install the OpenAI integration package if you haven’t already:
```bash
pip install langchain-openai
```
```python
import os

from langchain_openai import ChatOpenAI
from langchain_catchall import CatchAllTools

toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=ChatOpenAI(model="gpt-4o"),
    limit=100,              # Default result limit
    verbose=True,           # Show progress
    initialize_query=True,  # Auto-suggest validators/enrichments
)

tools = toolkit.get_tools()
```
Create agent
```python
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)

agent = create_agent(
    model=llm,
    tools=toolkit.get_tools(),
    system_prompt=CATCHALL_AGENT_PROMPT
)

messages = [HumanMessage(content="Find technology company acquisitions this week")]
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
```
Agent prompt
The CATCHALL_AGENT_PROMPT teaches the agent cost-effective operation by
implementing a two-phase workflow:
- Search phase: uses catchall_search_data to gather the initial dataset, then
stops and waits for user input.
- Analysis phase: uses catchall_analyze_data for all follow-up questions
(filtering, sorting, aggregation, Q&A).
This design prevents expensive repeated searches and maximizes the value of
cached results. The prompt also handles limit parameters when users request
specific result counts (e.g., “top 50”, “limit to 20”).
Conversational pattern
The power of caching: perform one expensive search, then ask unlimited follow-up
questions for free (LLM costs only).
```python
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
toolkit = CatchAllTools(api_key=os.environ["CATCHALL_API_KEY"], llm=llm)
agent = create_agent(model=llm, tools=toolkit.get_tools(), system_prompt=CATCHALL_AGENT_PROMPT)

# Initial search (10-15 min, uses API credits)
messages = [HumanMessage(content="Find corporate headquarters relocations in the US")]
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Filter cached data (instant, LLM costs)
messages.append(HumanMessage(content="Show only California locations"))
response = agent.invoke({"messages": messages})
messages = response["messages"]

# Analyze cached data (instant, LLM costs)
messages.append(HumanMessage(content="What are the top 3 cities?"))
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
```
Error handling
```python
import os

from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    max_wait_time=2400
)

try:
    result = client.search("Venture capital funding rounds")
    print(f"Success: {result.valid_records} records")
except TimeoutError as e:
    print(f"Search timed out: {e}")
    # Retry with a narrower query or a longer max_wait_time
except Exception as e:
    print(f"Error: {e}")
    raise
```
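If timeouts are routine in your pipeline, a small retry wrapper with exponential backoff keeps the handling in one place. A generic sketch — `search_with_retry` is illustrative, not part of the package:

```python
import time


def search_with_retry(search_fn, query, retries=3, base_delay=60):
    """Retry a search on TimeoutError, doubling the delay after each attempt."""
    for attempt in range(retries):
        try:
            return search_fn(query)
        except TimeoutError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(base_delay * (2 ** attempt))


# Demo with a fake search that times out once, then succeeds
attempts = []

def flaky_search(query):
    attempts.append(query)
    if len(attempts) < 2:
        raise TimeoutError("too slow")
    return f"results for {query}"

print(search_with_retry(flaky_search, "VC funding", base_delay=0))  # → results for VC funding
```

In practice you would pass `client.search` as `search_fn`; given the 10-15 minute job duration, a `base_delay` of minutes rather than seconds is more realistic.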
Resources