SQL Agent Blueprint
AI agent that converts natural language questions into SQL queries, runs them against SQLite databases, validates results, and explains findings. Self-contained — zero external DBs needed.
SQL Agent
An AI agent that answers business questions by writing and running SQL. It reads your database schema, generates queries from natural language, executes them on SQLite, validates that results make sense, and explains findings in plain English. No external database setup — SQLite is built into Python.
Note:
Point this agent at any SQLite database (or convert CSVs to SQLite with sqlite3 data.db ".import mydata.csv table_name"). It works with existing application databases too — just point database_path at your .db file.
Agent File Structure
Setup
Install Dependencies
The agent only needs the OpenAI client. SQLite is built into Python's standard library.
pip install openai
Create config.json
Point the agent at your SQLite database.
{
"openai_api_key": "sk-...",
"model": "gpt-4o",
"max_iterations": 6,
"database_path": "./data/sales.db",
"max_result_rows": 50
}
Note:
The agent runs read-only queries by default. It will refuse to execute INSERT, UPDATE, DELETE, DROP, or ALTER statements. This is enforced in the system prompt, not at the SQL level — treat it as a safety net, not a guarantee.
Verify
Ask a test question to verify setup.
python agent.py --query "What tables are in the database and how many rows does each have?"
The agent should read the schema and return table names with row counts.
System Prompt
You are a senior data analyst who answers questions by writing and running SQL
queries against a SQLite database. Follow this protocol:
1. THOUGHT: What does the user want to know? What tables and columns are involved?
2. ACTION: Read the database schema to understand available tables, columns, and types
3. Write a SQL query to answer the question
4. Run the query — if it fails, analyze the error and fix the query
5. Validate that the results make sense (row counts, value ranges, no obvious SQL mistakes)
6. Explain the findings in plain English
7. FINAL_ANSWER: The user's question answered, with supporting data
Rules:
- Always read the schema before writing any query — never guess table or column names
- Use ONLY SELECT statements. Never write INSERT, UPDATE, DELETE, DROP, or ALTER
- Limit results with LIMIT 50 unless the user asks for more
- Use CTEs (WITH clauses) for complex queries — they're more readable and debuggable
- If a query returns empty results, check whether the data exists before assuming it doesn't
- Explain what the query does in plain English, not just the SQL
### Tool Definitions
<ParameterGrid title="Agent Tools" items={[
{ param: "read_schema", description: "Read the full database schema: list of tables, column names, types, foreign keys, and indexes.", values: "none" },
{ param: "generate_query", description: "Generate a SQL query from a natural language question and schema. Uses LLM.", values: "question: string" },
{ param: "run_query", description: "Execute a SQL SELECT query on the database. Returns rows as formatted text. Rejects non-SELECT statements.", values: "query: string" },
{ param: "validate_results", description: "Check query output for sanity: row count within expected range, no null surprises, value ranges make sense.", values: "query: string, results: string" },
{ param: "explain_findings", description: "Explain query results in plain English. Uses LLM to generate a business-friendly explanation.", values: "question: string, query: string, results: string" }
]} />
### Tool Implementation
```python
# tools.py
import os
import re
import sqlite3
DATABASE_PATH = None
MAX_RESULT_ROWS = 50
def read_schema():
if not os.path.exists(DATABASE_PATH):
return f"ERROR: Database not found: {DATABASE_PATH}"
try:
conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
tables = [row[0] for row in cursor.fetchall()]
output = []
for table in tables:
# Column info
cursor.execute(f"PRAGMA table_info('{table}')")
cols = cursor.fetchall()
col_lines = [f" {c[1]} {c[2]}{' PRIMARY KEY' if c[5] else ''}{' NOT NULL' if c[3] else ''}" for c in cols]
# Row count
cursor.execute(f"SELECT COUNT(*) FROM [{table}]")
count = cursor.fetchone()[0]
# Foreign keys
cursor.execute(f"PRAGMA foreign_key_list('{table}')")
fks = cursor.fetchall()
fk_lines = [f" → {fk[3]}.{fk[4]} (ON DELETE {fk[5] or 'NO ACTION'})" for fk in fks] if fks else []
output.append(f"TABLE: {table} ({count} rows)")
output.extend(col_lines)
if fk_lines:
output.append(" Foreign keys:")
output.extend(fk_lines)
output.append("")
conn.close()
return "\n".join(output)
except Exception as e:
return f"ERROR reading schema: {e}"
def generate_query(client, model, question):
schema = read_schema()
prompt = f"""Write a SQLite SELECT query to answer this question. Use ONLY SELECT.
Database schema:
{schema}
Question: {question}
Rules:
- Use ONLY SELECT statements
- Use explicit column names, not SELECT *
- Add LIMIT {MAX_RESULT_ROWS} unless the question asks for a count or aggregation
- Use CTEs (WITH clauses) for multi-step queries
- Qualify column names with table names when joining
- Return ONLY the SQL query, no explanation"""
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
text = response.choices[0].message.content.strip()
# Strip markdown code fences if present
if text.startswith("```"):
text = re.sub(r'^```\w*\n?', '', text)
text = re.sub(r'\n?```$', '', text)
return text
def run_query(query):
query = query.strip()
# Reject non-SELECT queries
if not re.match(r'^\s*SELECT\b', query, re.IGNORECASE):
return f"ERROR: Only SELECT queries are allowed. Rejected: {query[:80]}..."
if not os.path.exists(DATABASE_PATH):
return f"ERROR: Database not found: {DATABASE_PATH}"
try:
conn = sqlite3.connect(DATABASE_PATH)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute(query)
rows = cursor.fetchall()
if not rows:
conn.close()
return "Query returned 0 rows."
cols = [desc[0] for desc in cursor.description]
result = " | ".join(cols) + "\n"
result += "-" * len(result) + "\n"
for row in rows[:MAX_RESULT_ROWS]:
result += " | ".join(str(v) for v in row) + "\n"
if len(rows) > MAX_RESULT_ROWS:
result += f"\n... ({len(rows) - MAX_RESULT_ROWS} more rows not shown)"
conn.close()
return result
except Exception as e:
return f"QUERY ERROR: {e}\n\nQuery was:\n{query}"
def validate_results(query, results):
issues = []
if "Query returned 0 rows" in results:
issues.append("Query returned no rows. Check if filter conditions are too strict.")
elif "ERROR" in results:
issues.append("Query failed with an error. Fix the SQL and retry.")
return "\n".join(issues)
lines = results.strip().split("\n")
if len(lines) > 1:
data_lines = [l for l in lines[2:] if l and not l.startswith("...") and not l.startswith("-")]
if len(data_lines) == 1:
issues.append(f"Only 1 result row — results may be a scalar. Confirm this is expected.")
if len(data_lines) >= MAX_RESULT_ROWS:
issues.append(f"Results truncated at {MAX_RESULT_ROWS} rows. Consider a more specific query.")
# Check for NULL-heavy columns
if not issues:
col_count = len(lines[0].split("|")) if lines else 0
data_lines = [l for l in lines[2:] if l and not l.startswith("...") and not l.startswith("-")]
nulls = {}
for line in data_lines:
for i, val in enumerate(line.split("|")):
if val.strip() in ("None", "", "NULL"):
nulls[i] = nulls.get(i, 0) + 1
for idx, count in nulls.items():
if count > len(data_lines) * 0.5:
col_name = lines[0].split("|")[idx].strip() if idx < col_count else f"col_{idx}"
issues.append(f"Column '{col_name}' has {count}/{len(data_lines)} NULLs. Check for data quality issues or OUTER JOIN problems.")
return "\n".join(issues) if issues else "Results look valid."
def explain_findings(client, model, question, query, results):
prompt = f"""Explain these SQL results in plain English for a business audience.
Original question: {question}
SQL query run:
{query}
Results:
{results}
Explain:
1. What the query found (in plain English, not SQL)
2. The key numbers and what they mean
3. Any notable patterns or surprises
Keep it under 200 words."""
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return response.choices[0].message.content
Agent Initialization
# agent.py
import json
import argparse
from openai import OpenAI
import tools as agent_tools
TOOL_SCHEMAS = [
{
"type": "function",
"function": {
"name": "read_schema",
"description": "Read the full database schema with tables, columns, types, keys, and row counts",
"parameters": {"type": "object", "properties": {}, "required": []}
}
},
{
"type": "function",
"function": {
"name": "generate_query",
"description": "Generate a SQL SELECT query from a natural language question using the database schema",
"parameters": {
"type": "object",
"properties": {"question": {"type": "string"}},
"required": ["question"]
}
}
},
{
"type": "function",
"function": {
"name": "run_query",
"description": "Execute a SQL SELECT query on the database. Only SELECT statements are allowed.",
"parameters": {
"type": "object",
"properties": {"query": {"type": "string"}},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "validate_results",
"description": "Check query results for sanity: row counts, nulls, value ranges",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"},
"results": {"type": "string"}
},
"required": ["query", "results"]
}
}
},
{
"type": "function",
"function": {
"name": "explain_findings",
"description": "Explain query results in plain English for a business audience",
"parameters": {
"type": "object",
"properties": {
"question": {"type": "string"},
"query": {"type": "string"},
"results": {"type": "string"}
},
"required": ["question", "query", "results"]
}
}
}
]
SYSTEM_PROMPT = """You are a senior data analyst who answers questions by writing
and running SQL queries against a SQLite database. Follow this protocol:
1. THOUGHT: What does the user want to know? What tables and columns are involved?
2. ACTION: Read the database schema
3. Write a SQL query to answer the question
4. Run the query — if it fails, analyze the error and fix the query
5. Validate that the results make sense
6. Explain the findings in plain English
7. FINAL_ANSWER: The user's question answered, with supporting data
Rules:
- Always read the schema before writing any query
- Use ONLY SELECT statements
- Limit results unless the user asks for more
- Use CTEs (WITH clauses) for complex queries
- Qualify column names with table names when joining
- If a query returns empty results, verify whether data exists
- Explain what the query does in plain English"""
def run_agent(question: str, config: dict):
client = OpenAI(api_key=config["openai_api_key"])
model = config.get("model", "gpt-4o")
agent_tools.DATABASE_PATH = config.get("database_path", "./data/sales.db")
agent_tools.MAX_RESULT_ROWS = config.get("max_result_rows", 50)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"Answer this question using the database: {question}"}
]
for i in range(config.get("max_iterations", 6)):
response = client.chat.completions.create(
model=model,
messages=messages,
tools=TOOL_SCHEMAS,
temperature=0.1
)
msg = response.choices[0].message
messages.append(msg)
if msg.content and "FINAL_ANSWER:" in msg.content:
return msg.content.split("FINAL_ANSWER:", 1)[1].strip()
if not msg.tool_calls:
messages.append({
"role": "user",
"content": "Continue. Read the schema, generate and run a query, validate results, explain findings. End with FINAL_ANSWER."
})
continue
for tool_call in msg.tool_calls:
func_name = tool_call.function.name
func_args = json.loads(tool_call.function.arguments)
if func_name == "read_schema":
result = agent_tools.read_schema()
elif func_name == "generate_query":
result = agent_tools.generate_query(
client, model, func_args.get("question", ""))
elif func_name == "run_query":
result = agent_tools.run_query(
func_args.get("query", ""))
elif func_name == "validate_results":
result = agent_tools.validate_results(
func_args.get("query", ""),
func_args.get("results", ""))
elif func_name == "explain_findings":
result = agent_tools.explain_findings(
client, model,
func_args.get("question", ""),
func_args.get("query", ""),
func_args.get("results", ""))
else:
result = f"Unknown tool: {func_name}"
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
return "Agent reached max iterations."
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--query", required=True, help="Natural language question to answer")
parser.add_argument("--config", default="config.json")
args = parser.parse_args()
with open(args.config) as f:
config = json.load(f)
result = run_agent(args.query, config)
print(result)
Walkthrough
Answering "Which product categories grew more than 20% quarter-over-quarter?" against a sales database.
Agent reads the schema
read_schema() returns:
TABLE: orders (12,430 rows)
order_id INTEGER PRIMARY KEY
product_id INTEGER NOT NULL
category TEXT NOT NULL
amount REAL NOT NULL
order_date TEXT NOT NULL
→ products.product_id (ON DELETE NO ACTION)
TABLE: products (340 rows)
product_id INTEGER PRIMARY KEY
name TEXT NOT NULL
category TEXT NOT NULL
The agent notes: category exists in both tables — needs to understand which one drives the question. Orders has per-transaction data, products has metadata.
Generates the query
generate_query("Which product categories grew >20% QoQ?") produces:
WITH quarterly AS (
SELECT
p.category,
strftime('%Y', o.order_date) AS year,
((cast(strftime('%m', o.order_date) AS integer) - 1) / 3) + 1 AS quarter,
SUM(o.amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category, year, quarter
),
qoq AS (
SELECT
category,
year,
quarter,
total_revenue,
LAG(total_revenue) OVER (
PARTITION BY category ORDER BY year, quarter
) AS prev_revenue
FROM quarterly
)
SELECT
category,
year,
quarter,
total_revenue,
prev_revenue,
ROUND((total_revenue - prev_revenue) / prev_revenue * 100, 1) AS growth_pct
FROM qoq
WHERE prev_revenue IS NOT NULL
AND (total_revenue - prev_revenue) / prev_revenue > 0.20
ORDER BY growth_pct DESC
Runs and validates
run_query returns 4 rows. validate_results notes: "Results look valid — 4 categories show >20% growth, no NULL columns."
But the agent double-checks: "Q1 2026 has no prior quarter — LAG returns NULL. The WHERE prev_revenue IS NOT NULL handles this correctly."
Delivers FINAL_ANSWER
FINAL_ANSWER:
4 product categories grew more than 20% quarter-over-quarter:
1. Electronics: +34.2% (Q1 2026 vs Q4 2025) — $142K → $191K
2. Home & Garden: +28.7% (Q1 2026 vs Q4 2025) — $89K → $115K
3. Sports: +24.1% (Q4 2025 vs Q3 2025) — $67K → $83K
4. Books: +21.5% (Q1 2026 vs Q4 2025) — $41K → $50K
Electronics is the standout, growing 34% QoQ. All growth was in the most
recent quarter (Q1 2026 vs Q4 2025), which may reflect seasonal holiday
momentum rather than structural growth — check Q2 2026 data when available.
Customization
Database Settings
Values: path to .db file
Values: gpt-4o, gpt-4o-mini
Values: 10-200 (default 50)
Values: 1-10 (default 6)
Note:
Read-only enforcement is prompt-based, not SQL-level. The agent's system prompt instructs it to write only SELECT statements, but this is not enforced at the database connection level. For production use, open the database in read-only mode or use a dedicated read replica.
Key Takeaway
A SQL agent's real value is in the iteration loop — schema → query → error → fix → validate. The first query is often wrong (wrong JOIN, missing GROUP BY, edge case in WHERE). The agent catches and fixes these faster than a human writing queries manually, especially for multi-CTE analytical queries. Pair it with the Insights Agent for the full data pipeline: ETL → SQL → analysis.
Related Articles
AI Agent Blueprints & Configurations
Ready-to-run AI agent blueprints, configurations, and local setup guides. Build research agents, code reviewers, and content writers with copy-paste implementations.
Content Writer Agent Blueprint
Multi-step content creation agent with outline, research, draft, edit, and finalization stages. Includes grammar checking, tone adjustment, and SEO optimization tools.
OpenAI Agents SDK Setup Guide
Complete setup and configuration guide for OpenAI's official Agents SDK. Handoffs, guardrails, sandbox agents, tracing, sessions, MCP integration, and realtime voice agents.