What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is an orchestration framework for designing, building, and deploying LLM-powered agents. Released as a lightweight Python package (pip install openai-agents), it's a production-grade evolution of OpenAI's earlier Swarm experiment — same minimal API surface, but built for real-world reliability.

The SDK occupies a specific spot in the stack. It's not a model API (that's the Responses API and Chat Completions API). It's not a platform tool (that's the now-deprecated Agent Builder). It's an agent runtime — a library that manages the agent loop, dispatches tools, executes handoffs, runs guardrails, and collects traces.

Your Application
    ↓ SDK (Agent + Runner)
    ↓ Models (Responses API / Chat Completions / LiteLLM)
    ↓ Tools (function_tool / MCP / Hosted Tools)

This matters because OpenAI is deprecating the Assistants API (sunset August 26, 2026). The SDK is the recommended path forward for anyone who was building on Assistants and needs a framework to manage multi-turn, multi-tool, multi-agent workflows.

What It Isn't

Not a replacement for direct API calls. If you need one LLM call with one tool, call the Responses API directly. The SDK adds 0 value and 1 dependency.
Not a graph engine. LangGraph this is not. The SDK doesn't model workflows as directed graphs with state nodes and edges. It models agents as autonomous tools that can call other agents.
Not a platform. You host it. OpenAI doesn't run your agent loop.

Architecture: Four Primitives

The SDK is built on a deliberately small set of abstractions. The entire framework fits in your head after reading one page of docs.

1. Agent

An Agent is an LLM configured with instructions, tools, and optional runtime behavior:

from agents import Agent

support_agent = Agent(
    name="Support Agent",
    instructions="You handle general support inquiries.",
    tools=[search_knowledge_base, escalate_to_human],
    model="gpt-4o-mini",  # per-agent model override
)

The agent owns:

Instructions — system prompt. Can use prompt injection for dynamic behavior.
Tools — function tools, hosted tools (WebSearch, FileSearch, Computer), MCP servers.
Handoffs — references to other Agent objects. The LLM can choose to delegate.
Guardrails — input and output validation functions that run alongside execution.
Output type — optional Pydantic model for structured output.
Model override — each agent can use a different model.

The key insight: an Agent is a lightweight descriptor, not a long-lived process. It defines LLM instructions and capabilities. The Runner brings it to life.

2. Runner

The Runner is the execution engine. It manages the agent loop:

Call the LLM with the agent's instructions and conversation history
If the LLM returns tool calls, execute them
Feed tool results back to the LLM
If the LLM returns a handoff, transfer control to the target agent
Repeat until done or max_turns reached

from agents import Agent, Runner

result = Runner.run_sync(
    support_agent,
    "I was charged twice for my subscription.",
)
print(result.final_output)

Runner.run() is the async version. Both return a RunResult that includes final_output, the full conversation history, and trace metadata.

The runner is where the SDK earns its keep. Without it, you'd write this loop yourself — calling the API, parsing tool calls, executing functions, checking for handoffs, re-entering the conversation. With it, you define the agents and the SDK manages state transitions.

3. Tools

Tools come in three flavors:

Function Tools — any Python function decorated with @function_tool:

from agents import function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"The weather in {city} is sunny."

The SDK uses Python's inspect module for signature extraction, griffe for docstring parsing, and pydantic for schema generation. The OpenAI function-calling schema is generated automatically.

The @function_tool(defer_loading=True) option hides a function tool until a ToolSearchTool() runtime helper loads it — useful for agent toolkits where you want to register dozens of tools but only make the relevant ones visible.

Hosted Tools — OpenAI-managed tools that require no code:

WebSearchTool() — browse the web
FileSearchTool() — search uploaded files
ComputerTool() — computer-use actions (beta)
CodeInterpreterTool() — execute code in a sandbox

MCP Tools — any Model Context Protocol server:

from agents.mcp import MCPServerStdio

async with MCPServerStdio(
    name="Filesystem Server",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]}
) as server:
    agent = Agent(
        name="File Agent",
        instructions="Manage files for the user.",
        mcp_servers=[server],
    )

4. Handoffs

Handoffs are the SDK's multi-agent mechanism. An agent delegates to another by including the target agent in its handoffs list. The LLM decides when to hand off — the SDK recognizes the handoff signal in the LLM's response and transfers control automatically.

billing_agent = Agent(
    name="Billing Agent",
    instructions="Handle billing inquiries: charges, invoices, payment methods, refunds.",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route users to the right agent. For billing, hand off to Billing Agent.",
    handoffs=[billing_agent],
)

result = Runner.run_sync(triage_agent, "I was charged twice.")

Under the hood, handoffs work by rewriting the conversation to simulate a clean handoff context. The incoming agent sees only the relevant conversation history — not every previous turn from every agent. The SDK provides built-in handoff prompt templates (agents.extensions.handoff_prompt.prompt_with_handoff_instructions) to make this natural.

Handoffs stay within a single Runner.run() call. Input guardrails apply only to the first agent in the chain; output guardrails only to the agent producing the final output.

Guardrails: Safety at Every Layer

Guardrails are validation functions that run in parallel with agent execution. The SDK supports three scopes:

Guardrail Type	When It Runs	What It Guards
`@input_guardrail`	Before the LLM processes user input	Prevents bad prompts from reaching the model
`@output_guardrail`	Before output reaches the user	Blocks harmful or off-topic responses
Tool guardrails	Before/after each custom function-tool call	Validates tool inputs and outputs

from agents import input_guardrail, output_guardrail, GuardrailFunctionOutput

@input_guardrail
async def no_pii_guardrail(context, agent, input):
    if any(p in str(input).lower() for p in ["credit card", "ssn", "social security"]):
        return GuardrailFunctionOutput(tripwire_triggered=True)
    return GuardrailFunctionOutput(allow=True)

A guardrail with tripwire_triggered=True halts execution immediately. This is the SDK's primary safety mechanism and it's refreshingly simple — a guardrail is just a function that returns a boolean.

Sessions: Persistent Memory

Sessions maintain working context across agent turns. The SDK supports pluggable backends:

SQLite — development and single-process deployments
Redis — multi-process, multi-instance persistence
Postgres (via SQLAlchemy) — production deployments with HA requirements

from agents import Agent, Runner, SQLAlchemySession

session = await SQLAlchemySession.create(
    connection_string="sqlite:///agent_memory.db"
)

agent = Agent(name="Memory Agent", instructions="Remember user preferences.")

result = await Runner.run(agent, "My name is Alice.", session=session)
result = await Runner.run(agent, "What's my name?", session=session)
# → "Your name is Alice."

Sessions store conversation history, tool call results, and any application context you inject via RunContextWrapper. The SDK handles serialization and deserialization — you provide the connection string.

Tracing: Observability Built In

Every agent run is automatically traced. The SDK collects:

LLM generations (prompt, completion, token counts)
Tool calls (function name, input, output, duration)
Handoffs (source, target, reason)
Guardrail checks (which guardrail, result, duration)
Custom events you add with agents.trace()

Traces are sent to OpenAI's Traces dashboard by default. You can also export to your observability stack — Datadog, Grafana, or a custom processor via TraceProcessor.

from agents import trace

with trace(workflow_name="Customer Support"):
    result = Runner.run_sync(triage_agent, "I need help with billing.")

This is one of the SDK's strongest features. Most agent frameworks ship tracing as an add-on or leave it to third-party tools. The SDK makes it free and automatic — you get it from the first Runner.run_sync().

Multi-Agent Orchestration Patterns

The SDK supports three distinct multi-agent patterns:

Pattern 1: Router/Supervisor

A supervisor agent routes requests to specialist agents. This is the most common pattern and the one the SDK handles best:

User → Triage Agent → [Billing Agent, Support Agent, Sales Agent]

Pattern 2: Agent as Tool

One agent exposes another agent as a tool it can call. The parent agent decides when to invoke the child, and the child runs with its own instructions and tools:

@function_tool
def delegate_to_research(query: str) -> str:
    result = Runner.run_sync(research_agent, query)
    return result.final_output

coordinator = Agent(
    name="Coordinator",
    instructions="You coordinate research tasks.",
    tools=[delegate_to_research],
)

Pattern 3: Parallel Agents

Multiple agents work on independent subtasks, and a final agent synthesizes results. The SDK doesn't have built-in parallel execution — you manage this yourself with Python's asyncio.gather() or a task queue:

import asyncio
from agents import Agent, Runner

research_agent = Agent(name="Researcher", instructions="Research the topic.")
writing_agent = Agent(name="Writer", instructions="Write the draft.")
review_agent = Agent(name="Reviewer", instructions="Review and improve.")

async def run_parallel():
    research, draft, review = await asyncio.gather(
        Runner.run(research_agent, "Topic: LLM agents"),
        Runner.run(writing_agent, "Write about LLM agents"),
        Runner.run(review_agent, "Review this article"),
    )
    return synthesize(research.final_output, draft.final_output, review.final_output)

This works but lacks built-in coordination. If you need parallel agents with shared state, dependency ordering, and error propagation, LangGraph's graph model may be a better fit.

How It Compares

LangGraph (LangChain)

LangGraph models agent workflows as directed graphs with explicit state nodes, edges, and conditional transitions. It's the most powerful and most complex option.

Dimension	LangGraph	OpenAI Agents SDK
Core abstraction	State graph with nodes and edges	Agent descriptor + Runner loop
Control flow	Explicit via graph edges	Implicit via LLM+handoffs
State management	Manual — you design the state schema	Automatic via Runner + Sessions
Parallel execution	Built-in (fan-out/fan-in via nodes)	Manual (asyncio.gather)
Human-in-the-loop	First-class (checkpoints, approval nodes)	Via guardrails (limited)
Observability	LangSmith (comprehensive, paid tier)	Built-in tracing (free with API key)
Learning curve	Steep — graph design, state, routing	Gentle — agents, tools, handoffs
LLM support	50+ providers via LangChain	OpenAI + 100+ via LiteLLM

Pick LangGraph when: you need explicit control over execution flow, durable state for long-running workflows, sophisticated human-in-the-loop approval chains, or parallel agent execution with dependency management.

Pick OpenAI SDK when: you want a fast start with handoff-based multi-agent, built-in tracing is sufficient, your workflows are tree-shaped (router → specialist) rather than graph-shaped, and you're already on OpenAI infrastructure.

AutoGen (Microsoft)

AutoGen models multi-agent workflows as conversations between agents. An agent generates, another critiques, a third summarizes. The emergent behavior from conversation patterns is AutoGen's superpower.

Dimension	AutoGen	OpenAI Agents SDK
Core abstraction	Conversational agent groups	Agent + Runner
Multi-agent model	Group chat, nested chat, two-agent chat	Handoffs (agent calls agent)
Human-in-the-loop	First-class — agent proposes, human approves	Via guardrails (basic)
Code execution	Built-in sandbox	Via SandboxAgent (beta)
Research backing	Published papers on conversation patterns	Production-oriented, less research
Stability	Active development with breaking changes	Production-focused, stable API
LLM support	OpenAI-focused via config	OpenAI + 100+ via LiteLLM

Pick AutoGen when: your workflow benefits from emergent behavior through agent conversation — code review cycles where critic and author iterate, or research loops where agents challenge each other's assumptions. AutoGen's conversation patterns handle reciprocal direction changes naturally (agent A → agent B → agent A), which handoffs don't.

Pick OpenAI SDK when: you want deterministic handoffs between clearly defined specialists, your workflow is tree-shaped (one agent routes to one specialist), or you need built-in tracing without setting up a separate observability stack.

CrewAI

CrewAI models multi-agent systems as teams with roles. Agents have roles, goals, and backstories. Tasks have clear owners. This maps beautifully to how humans think about team workflows.

Dimension	CrewAI	OpenAI Agents SDK
Core abstraction	Roles + Tasks + Crew	Agent + Runner
Prototyping speed	Fastest — 30 lines for multi-agent	Fast — 50 lines for multi-agent
Tool integration	Basic, thinner docs	Rich (function_tool, MCP, hosted tools)
Execution model	Sequential tasks per role	Handoff-driven, less structured
Observability	Limited	Built-in tracing
Production readiness	Approaching — limited checkpointing	High — tracing, guardrails, sessions
LLM support	OpenAI-focused	OpenAI + 100+ via LiteLLM

Pick CrewAI when: you need to prototype a multi-agent workflow in an afternoon, your agents have clear role-based responsibilities that map to sequential tasks (research → write → review), and you accept the gap between prototype and production.

Pick OpenAI SDK when: production reliability matters — guardrails, tracing, and session persistence are priorities; you need rich tool integration with MCP; or your multi-agent pattern is routing/supervisor rather than role-based pipelines.

Feature Comparison Matrix

Feature	OpenAI SDK	LangGraph	AutoGen	CrewAI
Agent definition	Python class	Graph node	Agent class	Role-based
Tool registration	@function_tool, MCP, hosted	Tool decorator	Tool class	Tool class
Multi-agent orchestration	Handoffs	Graph edges	Group chat	Sequential tasks
Guardrails	Built-in (3 scopes)	Via callbacks	Limited	None
Tracing	Built-in, free	LangSmith (paid tiers)	Third-party	Limited
Human-in-the-loop	Via guardrails	First-class (checkpoints)	First-class	Via tasks
MCP support	Native	Community adapters	Community adapters	Via plugins
Parallel execution	Manual (asyncio)	Built-in (fan-out)	Via group chat	Sequential only
Sessions/memory	Built-in (SQLite/Redis/PG)	Built-in (checkpointer)	Via agents	Limited
LLM provider support	OpenAI + 100+ via LiteLLM	50+ providers	OpenAI-focused	OpenAI-focused
Production readiness	High	Highest	Medium	Approaching
Learning curve	Low	High	Medium	Low
License	MIT	MIT	MIT (AG2)	MIT

Limitations

The SDK is not a silver bullet. Here's what it doesn't do well:

Vendor lock-in. The SDK is provider-agnostic on paper (it supports any LLM via LiteLLM), but in practice the best features — tracing dashboard, hosted tools, sandbox agents — are OpenAI-only. Non-OpenAI models lose the built-in hosted tools and may not get full tracing fidelity.

No parallel execution. The SDK runs one agent at a time. Handoffs are sequential. If you need parallel agents with result merging, you write the coordination yourself. LangGraph handles this natively with fan-out/fan-in graph nodes.

Limited human-in-the-loop. Guardrails can halt execution, but they can't pause and resume. AutoGen's model — where an agent proposes, a human approves, the agent continues — isn't achievable with the current SDK. You'd need to build a custom workflow around the guardrail tripwire.

OpenAI model pricing. Running the SDK with OpenAI models means paying per-token at standard API rates. For high-volume agent deployments with many turns per session, costs can add up quickly. The SDK doesn't include cost control or budgeting.

Sandbox agents are beta. The isolated workspace execution (Docker sandbox for code running) is in beta. The API and capabilities are expected to change before GA.

No TypeScript support yet. The SDK is Python-only. TypeScript is "planned for a future release."

Pitfalls

What I learned the hard way evaluating this SDK:

Max turns matters. The default max_turns=10 in Runner.run_sync() is easy to hit in multi-handoff workflows. Each handoff costs a turn. Three agents with two tools each can exhaust 10 turns before producing output. Either raise max_turns or set max_turns=None to disable the limit entirely (available in recent SDK versions).

Tracing costs nothing but requires an OpenAI API key. You can use tracing with non-OpenAI models — the SDK lets you set a separate tracing key via set_tracing_export_api_key(). But you need at least one valid OpenAI API key even if you're running on Anthropic or local models.

Handoff history management. By default, when agent A hands off to agent B, the conversation rewrites to give B a clean context. The SDK provides filters in agents.extensions.handoff_filters for common rewrite strategies. If you need the full conversation visible to all agents, you'll need a custom handoff filter.

Tool guardrails fire on every call. If you register a tool guardrail on a frequently-used function, it runs on every invocation. Keep guardrail logic cheap — no heavy model calls or external API requests in the guardrail path.

Session serialization can surprise you. Sessions persist everything including tool outputs. If your tools return large data (files, images, long search results), the session database grows fast. Set a retention policy or prune old sessions.

When to Use the OpenAI Agents SDK

Good Fit

You're building on OpenAI infrastructure. If your stack already uses OpenAI models and the Responses API, the SDK adds agent orchestration with minimal new surface area.
Your workflow is router-based. A supervisor agent that hands off to specialists is the SDK's best pattern. It handles this naturally and efficiently.
You need built-in guardrails and tracing. If these are compliance requirements, the SDK ships them ready to go. No assembly required.
You're migrating from the Assistants API. The August 2026 sunset makes this urgent. The SDK is the recommended migration path.

Bad Fit

You need parallel agents with coordinated results. LangGraph handles this better with graph-structured fan-out/fan-in.
Your workflow needs multi-step human approval. AutoGen's conversation model with embedded human review is a better fit.
You're on a tight token budget. The SDK abstracts away token tracking per step. You can inspect traces post-hoc, but there's no built-in cost governor.
You need TypeScript support. It's coming but not here yet.

The Bottom Line

The OpenAI Agents SDK is the most batteries-included agent framework on the market for OpenAI-native stacks. Guardrails, tracing, sessions, MCP integration, and handoff-based multi-agent orchestration ship in a single pip install. The SDK's minimal abstraction layer — agents, tools, handoffs, guardrails — lets you build production agent workflows without learning a complex framework API.

But the SDK is opinionated. Handoffs are the only multi-agent pattern. Parallel execution is your responsibility. Human-in-the-loop beyond simple tripwires requires custom work. And for all its provider-agnostic claims, the SDK's best features are OpenAI-only.

Reach for the OpenAI Agents SDK when: you're on OpenAI, your multi-agent pattern is router-to-specialist, and you want guardrails and tracing without configuring external tools.

Reach for LangGraph when: you need explicit state control, parallel execution, and durable checkpoints — and you can invest in learning the graph model.

Reach for AutoGen when: your value is in emergent conversation patterns — agents that critique, iterate, and converge through dialogue.

Reach for CrewAI when: you're prototyping and need a multi-agent system working in an afternoon. Then evaluate whether the pattern justifies a production framework investment.

For a practical setup guide covering installation, agent configuration, and common patterns, see the OpenAI Agents SDK Setup Guide. For the broader agent framework landscape including LangChain and smaller players, see AI Agent Frameworks Compared (2026).

OpenAI Agents SDK: Architecture Deep-Dive and Framework Comparison