Agentic Prompting

Agentic prompting turns the model from a question-answerer into an autonomous agent. Instead of one prompt → one response, the agent operates in a loop: it reasons about what to do next, calls tools, observes results, and decides whether to continue or stop.

This is the pattern behind Claude Code, ChatGPT with tools, and any system where the model drives its own workflow. Single prompts are for known questions. Prompt chaining is for fixed multi-step pipelines. Agentic prompting is for tasks where you don't know the number of steps ahead of time.

Note:

Start with prompt chaining if your workflow has fixed steps. Graduate to agentic prompting when the model needs to decide what to do next at runtime.

The ReAct Loop

ReAct (Reasoning + Acting) is the core agentic pattern. The model operates in a three-phase loop until the task is complete:

┌─────────────────────────────────────────────┐
│                                             │
│  ▼                                          │
│  ┌──────────┐     ┌──────────┐              │
│  │  Reason  │────→│   Act    │              │
│  │  (think) │     │  (tool)  │              │
│  └──────────┘     └────┬─────┘              │
│       ▲                │                    │
│       │                ▼                    │
│       │           ┌──────────┐              │
│       └───────────│ Observe  │              │
│                   │ (result) │──────────────┼──→ Done
│                   └──────────┘              │
│                                             │
└─────────────────────────────────────────────┘

ReAct Prompt Template

You are an AI agent that solves tasks by reasoning step by step and
using tools when needed. Follow this format for every turn:

THOUGHT: What do I need to do next? What do I currently know?
ACTION: tool_name({
  "param1": "value1",
  "param2": "value2"
})

After you receive the tool result (OBSERVATION), continue with another
THOUGHT/ACTION pair, or end with FINAL_ANSWER when the task is complete.

Available tools:
- web_search(query: string) — Search the web for current information
- calculator(expression: string) — Evaluate a mathematical expression
- read_file(path: string) — Read the contents of a file

Task: Research the current population of Tokyo and calculate what
percentage it is of Japan's total population. Show your work.

The model responds:

THOUGHT: I need to find the population of Tokyo and Japan. Let me search
for both.

ACTION: web_search({
  "query": "Tokyo population 2024"
})

The system executes web_search, returns the result as OBSERVATION, and the loop continues. The model reasons again, calls calculator for the percentage, then returns FINAL_ANSWER.

Implementing ReAct in Python

import json
import re
from openai import OpenAI

client = OpenAI()
MAX_ITERATIONS = 10

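def execute_tool(name, args):
    # Parameter names must match the JSON keys the model sends, because args is unpacked with **.
    tools = {
        "web_search": lambda query: f"Search results for: {query}",  # stub, no real search
        "calculator": lambda expression: eval(expression),  # demo only, see the note below
    }
    if name not in tools:
        return f"Error: unknown tool '{name}'"
    try:
        return str(tools[name](**args))
    except Exception as exc:
        return f"Error: {exc}"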


def run_agent(task, tools_desc):
    messages = [{"role": "user", "content": f"""You are an AI agent.
{task}

Follow this format:
THOUGHT: <reasoning>
ACTION: tool_name(json_args)
or FINAL_ANSWER: <answer>

Available tools: {tools_desc}"""}]

    for i in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        text = response.choices[0].message.content
        messages.append({"role": "assistant", "content": text})

        if "FINAL_ANSWER:" in text:
            return text.split("FINAL_ANSWER:")[1].strip()

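        # Anchor on the braces so a ")" inside a JSON value does not cut the match short.
        match = re.search(r'ACTION:\s*(\w+)\((\{.*\})\)', text, re.DOTALL)
        if match:
            tool_name, args_str = match.group(1), match.group(2)
            try:
                args = json.loads(args_str)
                result = execute_tool(tool_name, args)
            except json.JSONDecodeError as exc:
                result = f"Error: arguments were not valid JSON ({exc.msg})"
        else:
            # Nudge the model back to the format instead of looping silently.
            result = "Error: no ACTION or FINAL_ANSWER found. Follow the format."
        messages.append({"role": "user", "content": f"OBSERVATION: {result}"})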

    return "Agent reached max iterations without a final answer."

Note:

eval() is only used in this example for brevity. Never use eval() in production agents — it executes arbitrary code and the LLM's input is susceptible to prompt injection. Use a safe math parser or sandboxed expression evaluator instead.
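One way to follow that advice is a small evaluator built on Python's standard ast module. The sketch below is an illustration, not part of the original example: the name safe_calculate and the choice of allowed operators are assumptions. It accepts only numeric literals and basic arithmetic, and it leaves out exponentiation on purpose, because very large powers can still exhaust CPU or memory.

```python
import ast
import operator

# Allowed operators; any other syntax (calls, names, attributes) is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        # Numeric literals only; bool and str constants are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Swapping this in for the eval() lambda keeps the calculator tool's interface unchanged. Malformed input raises SyntaxError or ValueError, which the agent loop can return to the model as an observation.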

This is a minimal ReAct implementation: roughly 50 lines, with no external libraries beyond openai. The key pieces: format the prompt with THOUGHT/ACTION/OBSERVATION, parse the model's output for the next action, execute it, feed the result back as an observation, and repeat until FINAL_ANSWER or the iteration cap.

For production, use the native function-calling API (next section) instead of parsing text. But this text-based approach works with any model — no function-calling support required.

Tool Calling

Modern models support native function calling: you define tool schemas as JSON, the model returns structured calls instead of text you have to parse.

Defining Tool Schemas

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Expression to evaluate, e.g. '2 + 3 * 4'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

Tool Execution Loop with Native Function Calling

def run_agent_v2(task, tools):
    messages = [{"role": "system", "content": "You are a helpful agent. Use tools when needed."},
                {"role": "user", "content": task}]

    for i in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)

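        # No tool calls means the model has produced its final response.
        if not msg.tool_calls:
            return msg.content or ""

        for tool_call in msg.tool_calls:
            name = tool_call.function.name
            try:
                args = json.loads(tool_call.function.arguments)
                result = execute_tool(name, args)
            except Exception as exc:
                # Feed failures back as the tool result so the model can retry or change approach.
                result = f"Error: {exc}"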
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })

    return "Agent reached max iterations."

Native function calling removes the parsing fragility. The model returns structured tool_calls with names and arguments. You execute the tool, return the result as a "tool" role message, and the model incorporates it into the next reasoning step.

Tool design rules:

Be specific in descriptions. "Search the web" produces vague queries. "Search for current factual information using a keyword query" produces better args.
Validate inputs before executing. The model can hallucinate parameter values. Check types, bounds, and required fields server-side.
Handle tool errors in the loop. If a tool fails, return the error as the observation. The model can retry with corrected arguments or try a different approach.
Use tool_choice="auto" so the model decides when to call vs respond directly. Use "required" only when every turn should involve a tool.
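The validation and error-handling rules above can be combined in one wrapper around tool execution. This is a minimal sketch under stated assumptions: TOOL_REGISTRY, call_tool_safely, and the "ERROR:" prefix are hypothetical names and conventions chosen for illustration, and the type check covers only flat parameters.

```python
import json

# Hypothetical registry: tool name -> (callable, {parameter name: expected type}).
TOOL_REGISTRY = {
    "web_search": (lambda query: f"Search results for: {query}", {"query": str}),
}


def call_tool_safely(name: str, raw_args: str) -> str:
    """Validate a model-issued tool call; return the result or an error string."""
    if name not in TOOL_REGISTRY:
        return f"ERROR: unknown tool '{name}'. Available: {sorted(TOOL_REGISTRY)}"
    func, schema = TOOL_REGISTRY[name]

    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return f"ERROR: arguments are not valid JSON ({exc.msg})"
    if not isinstance(args, dict):
        return "ERROR: arguments must be a JSON object"

    # Required-field and unexpected-field checks.
    missing = [p for p in schema if p not in args]
    extra = [p for p in args if p not in schema]
    if missing or extra:
        return f"ERROR: missing parameters {missing}, unexpected parameters {extra}"

    # Type checks.
    for param, expected in schema.items():
        if not isinstance(args[param], expected):
            return f"ERROR: '{param}' must be of type {expected.__name__}"

    try:
        return str(func(**args))
    except Exception as exc:  # the loop must survive any tool failure
        return f"ERROR: {type(exc).__name__}: {exc}"
```

Because every path returns a string, the caller can always append the result as the tool message. The model then sees the error text and can correct its arguments on the next turn.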

Agentic vs Prompt Chaining

Both handle multi-step tasks. The difference is who controls the flow:

Pattern	Steps	Decision Maker	Recovery	Best For
Prompt Chaining	Fixed, known ahead	Developer (hardcoded)	Retry the failing step	Content pipelines, data transforms, routing
Agentic	Dynamic, unknown count	Model (at runtime)	Re-plan from current state	Research, debugging, open-ended tasks

Use chaining when you can write the pipeline on a whiteboard before writing code. Use agentic when the model needs to explore, adapt, or decide what to do next based on intermediate results.

Chain example: outline → draft → polish. Three known steps, always in that order.

Agent example: "Find the root cause of this production error." Steps depend on what the agent discovers — check logs, query the database, examine recent deploys, try a fix, verify.
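To make the contrast concrete, the chain example (outline → draft → polish) can be written as three hardcoded calls. The sketch below assumes a generic llm callable that takes a prompt and returns text. run_chain and fake_llm are illustrative names, and fake_llm is a stand-in so the control flow runs without an API key.

```python
from typing import Callable


def run_chain(topic: str, llm: Callable[[str], str]) -> str:
    """Fixed pipeline: the developer, not the model, decides the order of steps."""
    outline = llm(f"Write an outline for an article about: {topic}")
    draft = llm(f"Write a draft from this outline:\n{outline}")
    return llm(f"Polish this draft for clarity:\n{draft}")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; echoes the first line of the prompt."""
    return f"[{prompt.splitlines()[0]}]"


final_text = run_chain("agentic prompting", fake_llm)
```

The call count here is always three. In the agent loops shown earlier, the number of model calls depends on what the model decides at runtime, bounded only by MAX_ITERATIONS.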

Self-Correction in Agents

Agents self-correct differently than a generate → critique → revise loop. In a static self-correction loop, the model reviews its own output and rewrites it. In an agentic context, self-correction means:

Backtracking. If a tool call returns an error or unexpected result, the agent tries a different approach instead of proceeding with bad data.
Re-planning. If the agent discovers new information that invalidates its original plan, it abandons the plan and formulates a new one.
Escalation. If the agent can't resolve an issue after N attempts, it stops and asks for human input rather than continuing blindly.

After each tool result, evaluate:
1. Did the tool return what I expected?
2. Does this change my understanding of the task?
3. Should I continue with my current plan or re-plan?

If the answer to #3 is "re-plan," state your new plan explicitly before
taking the next action.
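The prompt above covers backtracking and re-planning, but escalation is usually easier to enforce in code than in the prompt. A minimal sketch, assuming failed tool calls are returned as observations that start with "ERROR" (a convention of this example, not of any library); FailureTracker is a hypothetical helper:

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Counts consecutive failed tool calls and signals when to escalate."""

    max_attempts: int = 3
    consecutive_failures: int = 0
    history: list = field(default_factory=list)

    def record(self, observation: str) -> str:
        """Return "escalate" after max_attempts failures in a row, else "continue"."""
        self.history.append(observation)
        if observation.startswith("ERROR"):
            self.consecutive_failures += 1
        else:
            # Any successful observation resets the streak.
            self.consecutive_failures = 0
        if self.consecutive_failures >= self.max_attempts:
            return "escalate"
        return "continue"
```

In an agent loop, a return value of "escalate" would break out of the loop and hand the recorded history to a human instead of issuing another model call.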

Multi-Agent Delegation

For complex tasks, decompose into sub-tasks and delegate to specialized sub-agents.

Decompose → Delegate → Aggregate

You are a supervisor agent. Break the task into independent subtasks,
delegate each to a specialized sub-agent, then synthesize the results.

Task: {complex_task}

Step 1 — DECOMPOSE: List the subtasks that need to be completed.
Step 2 — DELEGATE: For each subtask, write the prompt you'd give
   a sub-agent specialized in that area.
Step 3 — Wait for sub-agent results.
Step 4 — AGGREGATE: Combine results into a unified response. Resolve
   any conflicts between sub-agent findings.

Supervisor-Worker Pattern

def supervisor_agent(task):
    # Decompose
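    plan = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"""Break this task into independent
subtasks. Return only a JSON array of objects with the keys
"id", "description", and "specialty", with no surrounding text or code fences.

Task: {task}"""}]
    )
    # json.loads raises if the model adds prose around the array; validate before use in production.
    subtasks = json.loads(plan.choices[0].message.content)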

    # Delegate to workers
    results = {}
    for st in subtasks:
        worker_prompt = f"You are a {st['specialty']} specialist.\nTask: {st['description']}"
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": worker_prompt}]
        )
        results[st["id"]] = result.choices[0].message.content

    # Aggregate
    synthesis = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"""Combine these sub-agent results:
{json.dumps(results, indent=2)}

Produce a unified response. Flag any contradictions."""}]
    )
    return synthesis.choices[0].message.content

Multi-agent is expensive — each sub-agent call adds cost and latency. Reserve it for tasks where parallel decomposition genuinely helps: competitive analysis with multiple perspectives, code review from security + performance + style angles, fact-checking across multiple sources. For simpler tasks, a single agent with tools does the same job at lower cost.
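The supervisor above calls its workers one after another, so total latency is the sum of all worker calls. When subtasks really are independent, they can run concurrently. The sketch below uses the standard library's ThreadPoolExecutor. run_workers and its worker argument are illustrative names, and worker stands for any function that takes one subtask dict and returns text, such as a wrapper around a model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_workers(
    subtasks: list[dict],
    worker: Callable[[dict], str],
    max_workers: int = 4,
) -> dict:
    """Run independent subtasks concurrently; return {subtask id: result or error}."""
    # Leaving the with-block waits for every submitted call to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {st["id"]: pool.submit(worker, st) for st in subtasks}

    results = {}
    for task_id, future in futures.items():
        try:
            results[task_id] = future.result()
        except Exception as exc:
            # One failed worker should not discard the other results.
            results[task_id] = f"ERROR: {type(exc).__name__}: {exc}"
    return results
```

Threads fit here because worker calls are network-bound. Concurrency reduces latency but not cost, since the number of model calls stays the same.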

Guardrails

Agents without boundaries are dangerous. Every agent loop needs:

Max iterations. Cap the loop at 10-20 iterations. Infinite loops happen when the model gets stuck and keeps calling the same failing tool. A cap is cheaper than a runaway bill.

MAX_ITERATIONS = 15

# Inside the agent loop:
if iteration >= MAX_ITERATIONS:
    return "I couldn't complete this task within the step limit. Here's what I know so far: ..."

Tool allowlist. Never expose all available tools. Define exactly which tools each agent can access. A research agent doesn't need delete_database.

Timeout per tool call. If a tool (API call, database query) hangs, the agent hangs. Set per-tool timeouts and treat timeouts as tool errors the agent can react to.
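One way to apply a per-tool timeout is to run each tool call in a worker thread and bound the wait. This is a sketch under assumptions: call_with_timeout and slow_tool are hypothetical names, and the "ERROR:" prefix is this example's convention. The main caveat is that Python cannot kill a running thread, so a hung call keeps its thread occupied. The agent loop moves on regardless.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool for tool calls (the size is an arbitrary choice for this sketch).
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, args: dict, timeout_s: float) -> str:
    """Run func(**args); return its result, or an error string on timeout or failure."""
    future = _POOL.submit(func, **args)
    try:
        return str(future.result(timeout=timeout_s))
    except FutureTimeout:
        # cancel() only stops a call that has not started; a running thread keeps going.
        future.cancel()
        return f"ERROR: tool timed out after {timeout_s}s"
    except Exception as exc:
        return f"ERROR: {type(exc).__name__}: {exc}"


def slow_tool(delay_s: float) -> str:
    """Hypothetical tool that simulates a slow API call."""
    time.sleep(delay_s)
    return "done"
```

For tools that wrap HTTP or database clients, the client library's own timeout setting is usually the better first line of defense, because it releases the underlying connection.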

Output sanitization. If the agent generates code, SQL, or shell commands, never execute them directly. Require human approval for write operations.

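SANDBOXED_TOOLS = {"web_search", "read_file", "calculator"}
WRITE_TOOLS = {"write_file", "execute_sql", "run_command"}

def guarded_execute(tool_name, args):
    if tool_name not in SANDBOXED_TOOLS | WRITE_TOOLS:
        return "This tool is not available."
    # human_approved() is an application-specific hook, e.g. a CLI prompt or a review queue.
    if tool_name in WRITE_TOOLS and not human_approved(tool_name, args):
        return "This operation requires approval."
    return execute_tool(tool_name, args)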

Cost tracking. Log tokens per iteration. If the agent is burning tokens without making progress, kill the loop. A jump from 500 to 5000 tokens per iteration is a common sign that the agent is stuck in a reasoning spiral.
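A token budget can be tracked with a few lines of state. In the sketch below, TokenBudget and its default thresholds are hypothetical and would need tuning per workload. With the OpenAI chat completions API used earlier, the per-iteration count can be read from response.usage.total_tokens.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Tracks token use per iteration and flags budget overruns and sudden spikes."""

    max_total_tokens: int = 50_000
    spike_factor: float = 5.0  # flag an iteration using 5x the previous one
    per_iteration: list = field(default_factory=list)

    def record(self, tokens_used: int) -> str:
        """Return "ok", "spike", or "over_budget" for the latest iteration."""
        previous = self.per_iteration[-1] if self.per_iteration else None
        self.per_iteration.append(tokens_used)
        if sum(self.per_iteration) > self.max_total_tokens:
            return "over_budget"
        if previous and tokens_used >= previous * self.spike_factor:
            return "spike"
        return "ok"
```

A loop would typically stop on "over_budget" and return a partial result. On "spike" it could log a warning or stop, depending on how much risk is acceptable.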

Common Pitfalls

Over-engineering simple tasks. "What's 2+2?" doesn't need a ReAct loop. If a single prompt handles it, use a single prompt. Agentic patterns start paying off at 3+ unknown steps.
Vague tool descriptions. The model needs to know when to call a tool. "Searches stuff" gives it no basis for choosing the tool, so the tool is rarely called or is called at the wrong time. "Searches the web for current factual information using keyword queries" tells the model when the tool applies.
Missing error handling in tool results. When a tool fails, return the error as the observation. Let the model decide whether to retry with different arguments or try another approach. Never crash the loop on a tool error.
No max iteration cap. The most common production bug. Set a cap and a token budget. If the agent exceeds either, return a partial result with an explanation.
Using agents for latency-sensitive tasks. Each iteration is a full model round trip plus tool time, so an agent can easily add 3-10x latency over a single prompt. Avoid agents in request-response paths where the user is waiting on a synchronous response.
Forgetting to close the loop. Every agent needs FINAL_ANSWER or equivalent. Without explicit termination, the model keeps reasoning indefinitely. Make the stop condition explicit in the prompt.
Mixing agent responsibilities. One agent should own one task. If you catch yourself writing "and also" in the agent prompt, split it into two agents with a supervisor.

Prompt Templates

Basic ReAct Agent:

You are an AI agent. Follow this format for every turn:

THOUGHT: <what to do next and why>
ACTION: tool_name({"param": "value"})

When you're done: FINAL_ANSWER: <your response>

Task: {task}
Available tools: {tools}

Planning Agent:

Before executing, create a plan:

STEP 1 — PLAN: List the subtasks. For each, note what tool or approach
   you'll use.
STEP 2 — EXECUTE: Work through each subtask. After each, note whether
   the result changes your plan.
STEP 3 — VERIFY: Check that all subtasks are complete and consistent.

Task: {task}

Supervisor + Workers:

You are a supervisor. Decompose the task, delegate to workers,
and synthesize their results.

TASK: {task}

1. DECOMPOSE into independent subtasks with assigned specialties.
2. For each subtask, write the exact prompt the worker should receive.
3. Wait for all results, then AGGREGATE into one response.
4. FLAG any contradictions between workers.

Self-Correcting Agent:

After each tool result, evaluate:

1. Did the tool return what I expected?
2. Does this change my understanding?
3. Should I continue or re-plan?

If re-planning, state the new plan BEFORE taking the next action.

Task: {task}

Best Practices

Start with prompt chaining, graduate to agentic. If you can write the steps on a whiteboard, you don't need an agent.
Define tools with precise descriptions and typed parameters. Vague tools produce vague calls.
Set max iterations and token budgets on every agent loop.
Handle tool failures gracefully — errors are observations the model can react to.
Test agents with an eval harness before deploying. Agents fail in complex ways that static prompts don't.
Log every iteration — thought, action, observation. When the agent does something wrong, you need the full trace.
Prefer native function calling over text-parsed actions when the model supports it. Less fragile, better tool argument quality.
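The logging practice above needs very little code. A minimal sketch, assuming one record per iteration written as JSON lines; Step and Trace are illustrative names, not part of any library:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    iteration: int
    thought: str
    action: str
    observation: str


@dataclass
class Trace:
    """Collects one Step per agent iteration for later inspection."""

    steps: list = field(default_factory=list)

    def log(self, iteration: int, thought: str, action: str, observation: str) -> None:
        self.steps.append(Step(iteration, thought, action, observation))

    def to_jsonl(self) -> str:
        """Serialize as one JSON object per line, convenient for grep and log pipelines."""
        return "\n".join(json.dumps(asdict(step)) for step in self.steps)
```

Calling trace.log(...) once per loop iteration and writing trace.to_jsonl() to storage when the agent exits gives the full trace of what the agent thought, did, and saw.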
