Testing Agent Blueprint

Complete testing agent that reads source code, identifies test gaps, generates unit tests, runs them, and fixes failures. Ready-to-run with pytest and Jest integration.

June 11, 2026
testing-agenttest-generationpytestjestagent-blueprintcoverage

Testing Agent

An AI agent that writes and runs tests like a QA engineer embedded in your team. It reads source code, checks coverage gaps, generates tests, runs them, and iterates on failures until tests pass.

Note:

Run this agent on PRs to auto-generate tests for new code, or point it at under-tested modules to fill coverage gaps. It uses your existing test framework — no config changes needed.

Agent File Structure

testing-agentadd
agent.pyadd
tools.pyadd
config.jsonadd

Setup

1

Install Dependencies

Install the OpenAI client. The agent uses your existing test runner (pytest or Jest).

pip install openai pytest pytest-cov
2

Create config.json

Configure the agent with your project root and test framework.

{
  "openai_api_key": "sk-...",
  "model": "gpt-4o",
  "max_iterations": 8,
  "project_root": "./my-project",
  "test_framework": "pytest",
  "test_directory": "tests",
  "source_directory": "src",
  "coverage_threshold": 80
}

Note:

The agent writes files to your test directory. Commit existing tests first or run on a branch.

3

Verify

Generate tests for a single module to verify the setup.

python agent.py --source "src/utils/validators.py"

The agent should create a test file, run it, and report pass/fail.

System Prompt

You are a senior test engineer. Your job is to write thorough, maintainable
tests for given source code. Follow this protocol:

1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
   boundary values, and the specific behavior described in docstrings
4. Run the tests — if any fail, analyze the failure and fix the test
   (not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after, any
   untestable code noted

Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies (APIs, databases) — tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code — fix test failures by correcting the test

Tool Definitions

Agent Tools

read_file
Read a source or test file. Returns numbered lines.

Values: path: string

list_directory
List files in source and test directories. Used to match source modules to existing tests.

Values: path: string, recursive?: bool

write_file
Create or overwrite a test file. Path is relative to project_root.

Values: path: string, content: string

run_tests
Run pytest (or Jest) on a specific test file or directory. Returns test output with pass/fail counts.

Values: path?: string, framework?: 'pytest' | 'jest'

check_coverage
Check code coverage for a source file. Returns uncovered lines and overall percentage.

Values: source_path: string

Tool Implementation

# tools.py
import os
import subprocess

PROJECT_ROOT = None
TEST_DIRECTORY = None
TEST_FRAMEWORK = None
COVERAGE_THRESHOLD = 80

def read_file(path):
    full = os.path.join(PROJECT_ROOT, path)
    if not os.path.exists(full):
        return f"ERROR: File not found: {path}"
    with open(full, "r") as f:
        lines = f.readlines()
    return "".join(f"{i+1:4d}| {l}" for i, l in enumerate(lines))

def list_directory(path, recursive=False):
    full = os.path.join(PROJECT_ROOT, path)
    if not os.path.exists(full):
        return f"ERROR: Directory not found: {path}"
    items = []
    for root, dirs, files in os.walk(full):
        dirs[:] = [d for d in dirs if not d.startswith(".") and d != "__pycache__"]
        for f in files:
            if f.endswith((".py", ".js", ".ts", ".tsx")) and not f.startswith("."):
                rel = os.path.relpath(os.path.join(root, f), PROJECT_ROOT)
                items.append(rel)
        if not recursive:
            break
    return "\n".join(sorted(items)) if items else f"Directory {path} is empty"

def write_file(path, content):
    full = os.path.join(PROJECT_ROOT, path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return f"Written {len(content)} bytes to {path}"

def run_tests(path=None, framework=None):
    framework = framework or TEST_FRAMEWORK
    if framework == "pytest":
        cmd = ["pytest", "-v", "--tb=short"]
        if path:
            cmd.append(os.path.join(PROJECT_ROOT, path))
        else:
            cmd.append(os.path.join(PROJECT_ROOT, TEST_DIRECTORY))
    elif framework == "jest":
        cmd = ["npx", "jest", "--verbose"]
        if path:
            cmd.append(path)
    else:
        return f"Unknown framework: {framework}"

    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout + "\n" + result.stderr
    except subprocess.TimeoutExpired:
        return "Test run timed out after 60s. Check for infinite loops."

def check_coverage(source_path):
    if TEST_FRAMEWORK != "pytest":
        return f"Coverage check not supported for {TEST_FRAMEWORK}. Only pytest --cov is available."
    source_path = source_path.replace("/", ".").replace(".py", "")
    cmd = [
        "pytest", f"--cov={source_path}", "--cov-report=term-missing",
        os.path.join(PROJECT_ROOT, TEST_DIRECTORY), "-q"
    ]
    try:
        result = subprocess.run(cmd, cwd=PROJECT_ROOT, capture_output=True,
                                text=True, timeout=60)
        return result.stdout
    except Exception as e:
        return f"Coverage check failed: {e}"

Agent Initialization

# agent.py
import json
import argparse
from openai import OpenAI
import tools as agent_tools

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a source or test file with line numbers",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List Python/JS/TS files in a directory, excluding hidden dirs",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "recursive": {"type": "boolean", "default": False}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Create or overwrite a test file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run tests on a specific file or the full test directory",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "framework": {"type": "string", "enum": ["pytest", "jest"]}
                },
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_coverage",
            "description": "Check code coverage for a source module",
            "parameters": {
                "type": "object",
                "properties": {"source_path": {"type": "string"}},
                "required": ["source_path"]
            }
        }
    }
]

SYSTEM_PROMPT = """You are a senior test engineer. Your job is to write thorough,
maintainable tests for given source code. Follow this protocol:

1. THOUGHT: What does this code do? What are the critical paths?
2. ACTION: Read source files, check existing tests and coverage
3. Generate tests covering: happy path, edge cases, error conditions,
   boundary values, and behavior described in docstrings
4. Run the tests — if any fail, analyze the failure and fix the test
   (not the source code) until all pass
5. FINAL_REPORT: Number of tests generated, coverage before/after,
   any untestable code noted

Rules:
- Match the project's existing test style (naming, fixtures, assertions)
- One test file per source module, named test_{module}.py
- Prefer pytest fixtures and parametrize over repeated setup
- Mock external dependencies — tests must be deterministic
- If coverage is already above threshold, add only edge-case tests
- Never modify source code — fix test failures by correcting the test"""


def run_agent(source_path, config):
    client = OpenAI(api_key=config["openai_api_key"])
    agent_tools.PROJECT_ROOT = config["project_root"]
    agent_tools.TEST_DIRECTORY = config.get("test_directory", "tests")
    agent_tools.TEST_FRAMEWORK = config.get("test_framework", "pytest")
    agent_tools.COVERAGE_THRESHOLD = config.get("coverage_threshold", 80)

    query = (
        f"Write comprehensive tests for {source_path}. "
        f"Read the source file, check existing test coverage, "
        f"generate tests, run them, and fix any failures. "
        f"Use {config.get('test_framework', 'pytest')} conventions."
    )

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for i in range(config.get("max_iterations", 8)):
        response = client.chat.completions.create(
            model=config.get("model", "gpt-4o"),
            messages=messages,
            tools=TOOL_SCHEMAS,
            temperature=0.2
        )

        msg = response.choices[0].message
        messages.append(msg)

        if msg.content and "FINAL_REPORT:" in msg.content:
            return msg.content.split("FINAL_REPORT:", 1)[1].strip()

        if not msg.tool_calls:
            messages.append({
                "role": "user",
                "content": "Continue. Use tools to read files, generate tests, and run them. End with FINAL_REPORT."
            })
            continue

        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            func = getattr(agent_tools, func_name)
            result = func(**func_args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    return "Agent reached max iterations."


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", required=True, help="Source file to generate tests for")
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args()

    with open(args.config) as f:
        config = json.load(f)

    report = run_agent(args.source, config)
    print(report)

Walkthrough

Generating tests for src/services/payment.py — a Stripe integration module.

1

Agent reads the source

read_file("src/services/payment.py") returns a module with three functions: create_payment_intent(), handle_webhook(), refund_charge(). Each has type hints and docstrings.

2

Checks existing coverage

check_coverage("src.services.payment") shows 0% coverage — no tests exist. The agent also calls list_directory("tests") to confirm no test_payment.py file.

3

Generates tests

The agent calls write_file("tests/test_payment.py", ...) with:

# tests/test_payment.py
from unittest.mock import patch, MagicMock
import pytest
from src.services.payment import create_payment_intent, handle_webhook, refund_charge

@pytest.fixture
def mock_stripe():
    with patch("src.services.payment.stripe") as mock:
        yield mock

def test_create_payment_intent_success(mock_stripe):
    mock_stripe.PaymentIntent.create.return_value = {"id": "pi_123", "status": "succeeded"}
    result = create_payment_intent(amount=5000, currency="usd")
    assert result["id"] == "pi_123"

def test_create_payment_intent_zero_amount(mock_stripe):
    with pytest.raises(ValueError, match="amount must be positive"):
        create_payment_intent(amount=0)

def test_handle_webhook_invalid_signature(mock_stripe):
    mock_stripe.Webhook.construct_event.side_effect = ValueError("Invalid signature")
    result = handle_webhook(b"payload", "bad_sig")
    assert result["status"] == "error"

def test_refund_charge_success(mock_stripe):
    mock_stripe.Refund.create.return_value = {"id": "re_456", "status": "succeeded"}
    result = refund_charge("ch_789")
    assert result["id"] == "re_456"
4

Runs tests and fixes failures

run_tests("tests/test_payment.py") returns 3 pass, 1 fail — the refund_charge test fails because the actual function has a different return key. The agent reads the source again, spots "refund_id" vs "id", and calls write_file with the corrected assertion.

5

Delivers FINAL_REPORT

FINAL_REPORT:
4 tests generated for src/services/payment.py
Coverage: 0% → 92% (4/4 functions covered)
Failures resolved: 1 (return key mismatch)
Untestable: None — all functions have deterministic inputs

Customization

Test Framework Settings

test_framework
Which test runner the agent uses. pytest for Python projects, jest for JS/TS.

Values: pytest, jest

coverage_threshold
If coverage is already above this %, the agent skips generating tests for that module.

Values: 0-100 (default 80)

test_directory
Where test files live relative to project_root. The agent creates test_{module}.py files here.

Values: tests, __tests__, spec

source_directory
Where source code lives. The agent only reads from this directory.

Values: src, lib, app

Note:

Mocking external services. The agent knows to mock APIs, databases, and file I/O. If your code uses a less common library, add it to the system prompt under "Mock external dependencies."

Key Takeaway

A testing agent's value is in coverage gaps, not test volume. Point it at modules with 0-40% coverage first — these are where humans avoid writing tests because the code is complex or untested from the start. The agent doesn't get bored writing edge-case tests for a function with 15 branches.