Research Agent

An autonomous research agent that searches the web, extracts content from pages, fact-checks claims against multiple sources, and cites everything. Runs a ReAct loop until it has a complete answer with sources.

Note:

Rate limits apply. Each research query may trigger 5-15 search API calls. Use a Brave Search or SerpAPI key with sufficient quota.

Agent File Structure

research-agentadd

agent.pyadd

tools.pyadd

config.jsonadd

Setup

Install Dependencies

Install the OpenAI Python client. The agent uses function calling for tool execution.

pip install openai

Set API Keys

Create config.json with your OpenAI and Brave Search keys. Keep this file in .gitignore.

{
  "openai_api_key": "sk-...",
  "brave_api_key": "BSA-...",
  "model": "gpt-4o",
  "max_iterations": 10,
  "temperature": 0.3
}

Note:

Never commit config.json to version control. Add it to .gitignore immediately.

Verify the Agent

Run a test query to confirm everything is wired correctly.

python agent.py --query "What is the current population of Tokyo?"

You should see the agent reason, call web_search, and return a cited answer.

System Prompt

The system prompt defines the agent's behavior, output format, and tool usage rules.

You are a research agent. Your job is to answer questions using web search,
content extraction, and fact-checking. Follow this protocol:

1. THOUGHT: Analyze what you need to find out
2. ACTION: Call one or more tools
3. Observe the results
4. Continue the THOUGHT → ACTION cycle until you have a complete answer
5. FINAL_ANSWER: Provide the answer with numbered citations [#1], [#2], etc.

Rules:
- Always cite specific sources. Never make up facts.
- If two sources conflict, note the discrepancy.
- Prefer primary sources over secondary sources.
- For statistical claims, find at least two corroborating sources.
- If you cannot verify a claim, state that it is unverified.

Tool Definitions

Agent Tools

web_search

Search the web using Brave Search API. Returns title, URL, and snippet for top results.

Values: query: string, count?: int (default 5)

extract_content

Fetch and extract main text content from a URL. Strips navigation, ads, and boilerplate.

Values: url: string

fact_check

Search for corroborating or contradicting evidence for a specific claim.

Values: claim: string

cite_source

Record a source for citation. Use this before FINAL_ANSWER so citations are tracked.

Values: url: string, title: string, snippet: string

Tool Implementation

# tools.py
import json
import requests

BRAVE_API_KEY = None
OPENAI_CLIENT = None
sources = []

def set_keys(brave_key, openai_client):
    global BRAVE_API_KEY, OPENAI_CLIENT
    BRAVE_API_KEY = brave_key
    OPENAI_CLIENT = openai_client

def web_search(query: str, count: int = 5):
    headers = {
        "Accept": "application/json",
        "X-Subscription-Token": BRAVE_API_KEY
    }
    resp = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        params={"q": query, "count": count},
        headers=headers
    )
    data = resp.json()
    results = []
    for r in data.get("web", {}).get("results", []):
        results.append({
            "title": r.get("title"),
            "url": r.get("url"),
            "description": r.get("description")
        })
    return json.dumps(results)

def extract_content(url: str):
    resp = requests.get(url, headers={"User-Agent": "ResearchAgent/1.0"})
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    text = soup.get_text(separator="\n", strip=True)
    return text[:4000]

def fact_check(claim: str):
    return web_search(f'"{claim}" fact check')

def cite_source(url: str, title: str, snippet: str):
    global sources
    sources.append({"url": url, "title": title, "snippet": snippet})
    return f"Source #{len(sources)} recorded"

Agent Initialization

# agent.py
import json
import argparse
import importlib
from openai import OpenAI
import tools as agent_tools

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web using Brave Search API",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "count": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "extract_content",
            "description": "Fetch and extract main text from a URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fact_check",
            "description": "Search for corroborating evidence for a claim",
            "parameters": {
                "type": "object",
                "properties": {
                    "claim": {"type": "string"}
                },
                "required": ["claim"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "cite_source",
            "description": "Record a source for citation",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "title": {"type": "string"},
                    "snippet": {"type": "string"}
                },
                "required": ["url", "title", "snippet"]
            }
        }
    }
]

SYSTEM_PROMPT = """You are a research agent. Your job is to answer questions using
web search, content extraction, and fact-checking. Follow this protocol:

1. THOUGHT: Analyze what you need to find out
2. ACTION: Call one or more tools
3. Observe the results
4. Continue the THOUGHT → ACTION cycle until you have a complete answer
5. FINAL_ANSWER: Provide the answer with numbered citations [#1], [#2], etc.

Rules:
- Always cite specific sources. Never make up facts.
- If two sources conflict, note the discrepancy.
- Prefer primary sources over secondary sources.
- For statistical claims, find at least two corroborating sources.
- If you cannot verify a claim, state that it is unverified."""

def run_agent(query, config):
    client = OpenAI(api_key=config["openai_api_key"])
    agent_tools.set_keys(config["brave_api_key"], client)
    agent_tools.sources = []

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for i in range(config.get("max_iterations", 10)):
        response = client.chat.completions.create(
            model=config.get("model", "gpt-4o"),
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=config.get("temperature", 0.3)
        )

        msg = response.choices[0].message
        messages.append(msg)

        if msg.content and "FINAL_ANSWER:" in msg.content:
            answer = msg.content.split("FINAL_ANSWER:", 1)[1].strip()
            sources_out = agent_tools.sources
            return {"answer": answer, "sources": sources_out}

        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue your research. Use tools or provide FINAL_ANSWER."
            })
            continue

        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            func = getattr(agent_tools, func_name)
            result = func(**func_args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    return {"answer": "Agent reached max iterations.", "sources": agent_tools.sources}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    result = run_agent(args.query, config)
    print("ANSWER:", result["answer"])
    print("\nSOURCES:")
    for i, s in enumerate(result["sources"]):
        print(f"  [#{i+1}] {s['title']}")
        print(f"      {s['url']}")

Walkthrough

Here's how the agent handles the query: "What is the population of Tokyo and what percentage of Japan's total is it?"

Agent receives query

The system prompt defines the ReAct loop. The agent starts with THOUGHT about what data it needs.

Searches for Tokyo population

The agent calls web_search("Tokyo population 2024"). Brave Search returns 5 results with snippets. The agent extracts the population figure (approx. 37 million metro).

{"title": "Tokyo Population 2024", "url": "...", "description": "The metro area population of Tokyo in 2024 is 37,115,000..."}

It calls cite_source to record this.

Searches for Japan population

The agent calls web_search("Japan total population 2024"). Returns 124 million. Records with cite_source.

The agent now has both numbers but wants a second source for each.

Fact-checks the numbers

The agent calls fact_check("Tokyo population 37 million") and fact_check("Japan population 124 million"). Both pass.

Then it calculates the percentage: 37,115,000 / 124,000,000 ≈ 29.9%.

Delivers FINAL_ANSWER

The agent returns:

FINAL_ANSWER: The Tokyo metro area population is approximately 37.1 million [#1][#2],
which is about 29.9% of Japan's total population of 124 million [#3][#4].
This makes Tokyo the most populous metropolitan area in Japan by a significant margin.

Customization

Model Configuration

model

Which OpenAI model to use. gpt-4o recommended for accuracy; gpt-4o-mini for speed/cost.

Values: gpt-4o, gpt-4o-mini

temperature

Lower values (0.1-0.3) produce more deterministic, fact-focused research. Higher values increase creativity but risk hallucination.

Values: 0.0 - 1.0 (default 0.3)

max_iterations

Maximum ReAct loop iterations before the agent is forced to answer. Increase for complex multi-step research.

Values: 1 - 20 (default 10)

Note:

Verified use case: Research Agent works well for market analysis, technology comparison, historical research, and current events. For highly technical topics, pair with the Code Review Agent to analyze source code referenced in research.

Key Takeaway

Research agents work best with 3-5 focused tools. Every additional tool consumes system prompt tokens and increases the chance of the model choosing the wrong one. Start with search + extract + cite, and only add tools when the agent consistently fails to complete tasks without them.

Research Agent Blueprint

Research Agent

Agent File Structure

Setup

Install Dependencies

Set API Keys

Verify the Agent

System Prompt

Tool Definitions

Agent Tools

Tool Implementation

Agent Initialization

Walkthrough

Agent receives query

Searches for Tokyo population

Searches for Japan population

Fact-checks the numbers

Delivers FINAL_ANSWER

Customization

Model Configuration

Related Articles

Skill Packs

SWE-Explore: Why AI Coding Agents Find the File but Miss the Lines That Matter

Hermes Agent Setup Guide

On this page