Research Agent Blueprint
Complete research agent blueprint with web search, content extraction, fact-checking, and citation. Ready-to-run Python implementation with OpenAI function calling.
Research Agent
An autonomous research agent that searches the web, extracts content from pages, fact-checks claims against multiple sources, and cites everything. Runs a ReAct loop until it has a complete answer with sources.
Note:
Rate limits apply. Each research query may trigger 5-15 search API calls, so use a search API key with sufficient quota. The implementation below calls Brave Search; using SerpAPI instead would require rewriting web_search.
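One way to keep a single query from exhausting the quota is to cap the number of search calls per run. The following is a minimal sketch, not part of the blueprint: SearchBudget and budgeted are hypothetical names, and the cap of 15 simply mirrors the upper end of the estimate above.

```python
class SearchBudget:
    """Counts search calls and refuses further calls once a cap is reached."""

    def __init__(self, max_calls: int = 15):
        self.max_calls = max_calls
        self.used = 0

    def try_acquire(self) -> bool:
        """Consume one call and return True if budget remains, else False."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


def budgeted(search_fn, budget: SearchBudget):
    """Wrap a search function so it returns an error string when over budget."""
    def wrapper(*args, **kwargs):
        if not budget.try_acquire():
            # Returned as a tool result so the model can stop searching.
            return '{"error": "search budget exhausted"}'
        return search_fn(*args, **kwargs)
    return wrapper
```

Under these assumptions, a search function would be wrapped once per query, for example budgeted(web_search, SearchBudget(15)), so the model receives an explicit error instead of triggering further API calls.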
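Agent File Structure
agent.py: ReAct loop, tool schemas, system prompt, and command-line entry point
tools.py: the four tool implementations
config.json: API keys and model settings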
Setup
Install Dependencies
Install the OpenAI Python client, plus requests and beautifulsoup4, which tools.py imports. The agent uses function calling for tool execution.
pip install openai requests beautifulsoup4
Set API Keys
Create config.json with your OpenAI and Brave Search keys. Keep this file in .gitignore.
{
"openai_api_key": "sk-...",
"brave_api_key": "BSA-...",
"model": "gpt-4o",
"max_iterations": 10,
"temperature": 0.3
}
Note:
Never commit config.json to version control. Add it to .gitignore immediately.
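If keeping keys in a file is a concern, the config can be merged with environment variables at load time. This is a sketch under assumptions: load_config is a hypothetical helper, OPENAI_API_KEY is the variable name the OpenAI client conventionally reads, and BRAVE_API_KEY is a name chosen here for illustration.

```python
import json
import os


def load_config(path: str = "config.json") -> dict:
    """Load config.json, letting environment variables override the API keys."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    # Environment variable names are assumptions for this sketch.
    overrides = (("openai_api_key", "OPENAI_API_KEY"),
                 ("brave_api_key", "BRAVE_API_KEY"))
    for key, env_name in overrides:
        if os.environ.get(env_name):
            config[key] = os.environ[env_name]
    # Fail early with a clear message instead of a KeyError mid-run.
    missing = [k for k, _ in overrides if not config.get(k)]
    if missing:
        raise ValueError(f"Missing required config keys: {', '.join(missing)}")
    return config
```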
Verify the Agent
Run a test query to confirm everything is wired correctly.
python agent.py --query "What is the current population of Tokyo?"
You should see the agent reason, call web_search, and return a cited answer.
System Prompt
The system prompt defines the agent's behavior, output format, and tool usage rules.
You are a research agent. Your job is to answer questions using web search,
content extraction, and fact-checking. Follow this protocol:
1. THOUGHT: Analyze what you need to find out
2. ACTION: Call one or more tools
3. Observe the results
4. Continue the THOUGHT → ACTION cycle until you have a complete answer
5. FINAL_ANSWER: Provide the answer with numbered citations [#1], [#2], etc.
Rules:
- Always cite specific sources. Never make up facts.
- If two sources conflict, note the discrepancy.
- Prefer primary sources over secondary sources.
- For statistical claims, find at least two corroborating sources.
- If you cannot verify a claim, state that it is unverified.
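The "at least two corroborating sources" rule is enforced only by the prompt. A rough programmatic check could count distinct domains in a search result list. The sketch below assumes results arrive as a JSON list of objects with a url field; distinct_domains and is_corroborated are hypothetical helpers, and counting domains is only a heuristic for source independence.

```python
import json
from urllib.parse import urlparse


def distinct_domains(results_json: str) -> set:
    """Return the set of hostnames found in a JSON list of search results."""
    domains = set()
    for item in json.loads(results_json):
        host = urlparse(item.get("url") or "").hostname
        if host:
            # Treat "www.example.com" and "example.com" as the same source.
            domains.add(host.removeprefix("www."))
    return domains


def is_corroborated(results_json: str, minimum: int = 2) -> bool:
    """True when the results span at least `minimum` distinct domains."""
    return len(distinct_domains(results_json)) >= minimum
```

Two pages on the same site count once here, which matches the intent of the rule better than counting raw results. The sketch uses str.removeprefix, so it needs Python 3.9 or later.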
Tool Definitions
Agent Tools
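web_search: search the web using the Brave Search API
Values: query: string, count?: int (default 5)
extract_content: fetch a URL and extract its main text
Values: url: string
fact_check: search for corroborating evidence for a claim
Values: claim: string
cite_source: record a source for citation
Values: url: string, title: string, snippet: string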
Tool Implementation
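# tools.py
import json

import requests
from bs4 import BeautifulSoup

BRAVE_API_KEY = None
OPENAI_CLIENT = None  # stored for tools that need an LLM call; unused below
sources = []  # citations recorded during the current run


def set_keys(brave_key, openai_client):
    """Inject credentials once at startup."""
    global BRAVE_API_KEY, OPENAI_CLIENT
    BRAVE_API_KEY = brave_key
    OPENAI_CLIENT = openai_client


def web_search(query: str, count: int = 5):
    """Query Brave Search; return a JSON string of title/url/description."""
    headers = {
        "Accept": "application/json",
        "X-Subscription-Token": BRAVE_API_KEY,
    }
    try:
        resp = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": count},
            headers=headers,
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
    except (requests.RequestException, ValueError) as exc:
        # Return the failure as a tool result so the model can react to it.
        return json.dumps({"error": f"search failed: {exc}"})
    results = []
    for r in data.get("web", {}).get("results", []):
        results.append({
            "title": r.get("title"),
            "url": r.get("url"),
            "description": r.get("description"),
        })
    return json.dumps(results)


def extract_content(url: str):
    """Fetch a page and return up to 4000 characters of visible text."""
    try:
        resp = requests.get(
            url, headers={"User-Agent": "ResearchAgent/1.0"}, timeout=15
        )
        resp.raise_for_status()
    except requests.RequestException as exc:
        return f"ERROR: could not fetch {url}: {exc}"
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop non-content elements before extracting text.
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    text = soup.get_text(separator="\n", strip=True)
    return text[:4000]


def fact_check(claim: str):
    """Search for pages discussing the claim; the model judges the results."""
    return web_search(f'"{claim}" fact check')


def cite_source(url: str, title: str, snippet: str):
    """Append a source to the run's citation list and report its number."""
    sources.append({"url": url, "title": title, "snippet": snippet})
    return f"Source #{len(sources)} recorded"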
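Because search APIs enforce rate limits, a tool call may need to be retried after a short wait. The following is a minimal sketch of exponential backoff; with_backoff is a hypothetical helper, and it assumes the wrapped call raises an exception on a rate-limited response (for example via requests' raise_for_status()). It retries on any exception for brevity, whereas a production version would likely retry only on specific errors such as HTTP 429.

```python
import time


def with_backoff(fn, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on exception, wait base_delay * 2**attempt and try again.

    After `retries` failed retries the last exception is re-raised.
    The `sleep` parameter is injectable so the delay can be tested.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

A call would then be wrapped as with_backoff(lambda: some_search_call("Tokyo population")), where some_search_call stands in for whichever function performs the HTTP request.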
Agent Initialization
# agent.py
import json
import argparse
from openai import OpenAI
import tools as agent_tools
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web using Brave Search API",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "count": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "extract_content",
            "description": "Fetch and extract main text from a URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fact_check",
            "description": "Search for corroborating evidence for a claim",
            "parameters": {
                "type": "object",
                "properties": {
                    "claim": {"type": "string"}
                },
                "required": ["claim"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "cite_source",
            "description": "Record a source for citation",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "title": {"type": "string"},
                    "snippet": {"type": "string"}
                },
                "required": ["url", "title", "snippet"]
            }
        }
    }
]
SYSTEM_PROMPT = """You are a research agent. Your job is to answer questions using
web search, content extraction, and fact-checking. Follow this protocol:
1. THOUGHT: Analyze what you need to find out
2. ACTION: Call one or more tools
3. Observe the results
4. Continue the THOUGHT → ACTION cycle until you have a complete answer
5. FINAL_ANSWER: Provide the answer with numbered citations [#1], [#2], etc.
Rules:
- Always cite specific sources. Never make up facts.
- If two sources conflict, note the discrepancy.
- Prefer primary sources over secondary sources.
- For statistical claims, find at least two corroborating sources.
- If you cannot verify a claim, state that it is unverified."""
# Only these names may be invoked by the model; anything else is rejected.
TOOL_FUNCTIONS = {
    "web_search": agent_tools.web_search,
    "extract_content": agent_tools.extract_content,
    "fact_check": agent_tools.fact_check,
    "cite_source": agent_tools.cite_source,
}


def call_tool(name, raw_args):
    """Run one tool call and always return a string for the tool message."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return f"ERROR: unknown tool {name!r}"
    try:
        return str(func(**json.loads(raw_args or "{}")))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names from the model.
        return f"ERROR: bad arguments for {name}: {exc}"


def run_agent(query, config):
    client = OpenAI(api_key=config["openai_api_key"])
    agent_tools.set_keys(config["brave_api_key"], client)
    agent_tools.sources = []  # reset citations from any previous run
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]
    for _ in range(config.get("max_iterations", 10)):
        response = client.chat.completions.create(
            model=config.get("model", "gpt-4o"),
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=config.get("temperature", 0.3)
        )
        msg = response.choices[0].message
        messages.append(msg)
        # Stop as soon as the model emits its final answer.
        if msg.content and "FINAL_ANSWER:" in msg.content:
            answer = msg.content.split("FINAL_ANSWER:", 1)[1].strip()
            return {"answer": answer, "sources": agent_tools.sources}
        # No tool call and no final answer: nudge the model to continue.
        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue your research. Use tools or provide FINAL_ANSWER."
            })
            continue
        # Every tool call must be answered with a matching tool message.
        for tool_call in msg.tool_calls:
            result = call_tool(
                tool_call.function.name, tool_call.function.arguments
            )
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    return {"answer": "Agent reached max iterations.", "sources": agent_tools.sources}


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()
    with open(args.config) as f:
        config = json.load(f)
    result = run_agent(args.query, config)
    print("ANSWER:", result["answer"])
    print("\nSOURCES:")
    for i, s in enumerate(result["sources"], start=1):
        print(f"  [#{i}] {s['title']}")
        print(f"      {s['url']}")
Walkthrough
Here's how the agent handles the query: "What is the population of Tokyo and what percentage of Japan's total is it?"
Agent receives query
The system prompt defines the ReAct loop. The agent starts with THOUGHT about what data it needs.
Searches for Tokyo population
The agent calls web_search("Tokyo population 2024"). Brave Search returns 5 results with snippets. The agent reads the metro-area population (approximately 37.1 million) from a snippet such as:
{"title": "Tokyo Population 2024", "url": "...", "description": "The metro area population of Tokyo in 2024 is 37,115,000..."}
It calls cite_source to record this.
Searches for Japan population
The agent calls web_search("Japan total population 2024"). The results give roughly 124 million, which the agent records with cite_source.
The agent now has both numbers but wants a second source for each.
Fact-checks the numbers
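The agent calls fact_check("Tokyo population 37 million") and fact_check("Japan population 124 million"). fact_check only returns search results, so the agent itself reads them and judges that both figures are corroborated.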
Then it calculates the percentage: 37,115,000 / 124,000,000 ≈ 29.9%.
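The arithmetic in this step can be reproduced directly. The figures below are the ones quoted in the walkthrough, not independently verified values:

```python
tokyo_metro = 37_115_000    # metro-area figure from the search snippet above
japan_total = 124_000_000   # rounded national figure from the walkthrough

share = tokyo_metro / japan_total * 100
print(f"{share:.1f}%")  # prints 29.9%
```

Note that the model performs this calculation itself, since the blueprint has no calculator tool.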
Delivers FINAL_ANSWER
The agent returns:
FINAL_ANSWER: The Tokyo metro area population is approximately 37.1 million [#1][#2],
which is about 29.9% of Japan's total population of 124 million [#3][#4].
This makes Tokyo the most populous metropolitan area in Japan by a significant margin.
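Nothing in the loop guarantees that the [#n] markers in the final answer match the sources recorded through cite_source. A post-check along the following lines could flag mismatches. This is a sketch: check_citations is a hypothetical helper, and it assumes citations use exactly the [#n] format requested in the system prompt.

```python
import re


def check_citations(answer: str, source_count: int) -> dict:
    """Compare [#n] markers in an answer with the number of recorded sources."""
    cited = {int(n) for n in re.findall(r"\[#(\d+)\]", answer)}
    valid = set(range(1, source_count + 1))
    return {
        "dangling": sorted(cited - valid),  # cited but never recorded
        "unused": sorted(valid - cited),    # recorded but never cited
    }
```

For the answer above with four recorded sources, both lists would be empty; with only three recorded sources, [#4] would be reported as dangling.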
Customization
Model Configuration
model
Values: gpt-4o, gpt-4o-mini (default gpt-4o)
temperature
Values: 0.0 - 1.0 (default 0.3)
max_iterations
Values: 1 - 20 (default 10)
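The agent code reads these settings without checking them against the listed ranges. One possible guard is to clamp them before the run. This is a sketch under assumptions: validate_settings is a hypothetical helper, and the bounds are the ones listed above.

```python
def validate_settings(config: dict) -> dict:
    """Return a copy of config with temperature and max_iterations clamped."""
    settings = dict(config)
    temperature = float(settings.get("temperature", 0.3))
    settings["temperature"] = min(max(temperature, 0.0), 1.0)
    iterations = int(settings.get("max_iterations", 10))
    settings["max_iterations"] = min(max(iterations, 1), 20)
    settings.setdefault("model", "gpt-4o")
    return settings
```

Clamping silently corrects out-of-range values; raising an error instead would be a reasonable alternative if misconfiguration should be surfaced to the user.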
Note:
Verified use case: Research Agent works well for market analysis, technology comparison, historical research, and current events. For highly technical topics, pair with the Code Review Agent to analyze source code referenced in research.
Key Takeaway
Research agents work best with 3-5 focused tools. Every additional tool consumes system prompt tokens and increases the chance of the model choosing the wrong one. Start with search + extract + cite, and only add tools when the agent consistently fails to complete tasks without them.