OpenRouter Fusion: Multi-Model Deliberation in a Single API Call

A technical deep-dive into OpenRouter's Fusion API — how it routes prompts across a panel of models, synthesizes responses through a judge, what it costs, and when it beats calling a single model. Includes code samples, preset comparisons, and a worked cost analysis.

June 14, 2026
Tags: openrouter, fusion-api, multi-model-routing, model-ensembling, cost-optimization, llm-routing, api-design, judge-model, model-deliberation

Note:

Fusion is currently labeled as an experimental Labs feature. The API and behavior may change — OpenRouter explicitly warns against building long-term production dependencies on it as of June 2026.

What You'll Get Out of This

By the end, you'll know exactly what Fusion is, how to call it two different ways and customize its panel, what the structured response looks like, how much it actually costs, and how to decide when multi-model deliberation beats single-model routing.

What Is Fusion?

Fusion is a multi-model deliberation API from OpenRouter. You send one prompt; OpenRouter fans it out to a panel of LLMs running in parallel, then a configurable judge model reads every response and synthesizes them into a structured output with consensus points, contradictions, partial coverage, unique insights, and blind spots.

It launched as a public Labs experiment in early 2026, and the positioning is aggressive: "Fable-level intelligence at half the price."

The product is available in two forms:

| Mode | How It Works | When to Use |
| --- | --- | --- |
| model: "openrouter/fusion" | Routes through a default panel + judge. Simple drop-in replacement for any model string. | Quick prototyping. You want fusion with zero config. |
| openrouter:fusion server tool | Attaches fusion as a tool your primary model can invoke at its discretion. The model decides if a prompt needs deliberation. | Production systems. You want a single model for simple answers and fusion only for complex queries. |

Both modes are identical under the hood — the server-tool version just gives the calling model agency over when to deliberate.

How Fusion Works (Architecture)

Fusion runs a three-stage pipeline:

  ┌──────────┐
  │  Prompt   │
  └────┬─────┘
       │
  ┌────▼─────────────────────┐
  │  STAGE 1: Fan Out        │
  │  Panel models (3-5)      │
  │  run the same prompt in   │
  │  parallel. Each has web    │
  │  search + bash tools.      │
  └────┬─────────────────────┘
       │
  ┌────▼─────────────────────┐
  │  STAGE 2: Judge Synthesis │
  │  A judge model reads all  │
  │  panel responses and      │
  │  extracts:                │
  │   • Consensus points      │
  │   • Contradictions        │
  │   • Partial coverage      │
  │   • Unique insights       │
  │   • Blind spots           │
  └────┬─────────────────────┘
       │
  ┌────▼─────────────────────┐
  │  STAGE 3: Final Response  │
  │  Returns structured JSON  │
  │  with analysis + raw      │
  │  panel responses.         │
  └───────────────────────────┘

Key architectural details:

  • Parallelism. All panel calls are made concurrently. Total latency is bounded by the slowest panel member plus judge inference — not the sum of all models.
  • Bounded deliberation. The fusion tool refuses to inject itself recursively. Inner calls carry an x-openrouter-fusion-depth header, and the plugin checks it before injecting again. This keeps the cost and latency of deliberation at exactly one level.
  • Tool access for panel models. Each panel model gets web search and bash tools. This means the models can research facts, run calculations, or query external systems before formulating their responses. Panel models see the same tool environment as a normal OpenRouter request with those tools enabled.
  • No caching between calls. Every fusion request runs fresh inference on every panel member and the judge. There is no result caching as of the current experimental version.
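The first two bullets can be made concrete with a small sketch. The latency functions follow directly from the description above (slowest panel member plus judge). The depth guard is only an illustration: the header name comes from the text, but the assumption that its value is an integer counter, and the helper name `may_inject_fusion`, are hypothetical and not confirmed by OpenRouter.

```python
from typing import Mapping, Sequence

# Header name as given in the text; its value format is an assumption here.
DEPTH_HEADER = "x-openrouter-fusion-depth"


def parallel_latency(panel_seconds: Sequence[float], judge_seconds: float) -> float:
    """Wall-clock estimate when panel calls run concurrently: the slowest
    panel member plus the judge."""
    return max(panel_seconds) + judge_seconds


def sequential_latency(panel_seconds: Sequence[float], judge_seconds: float) -> float:
    """What the same work would take if the panel ran one model at a time."""
    return sum(panel_seconds) + judge_seconds


def may_inject_fusion(headers: Mapping[str, str]) -> bool:
    """Hypothetical guard: allow fusion only when the request is not already
    inside a fusion call (depth header missing or zero)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    raw_depth = normalised.get(DEPTH_HEADER, "0")
    try:
        depth = int(raw_depth)
    except ValueError:
        return False  # unparseable depth: refuse to recurse
    return depth < 1
```

With illustrative panel latencies of 4, 9, and 6 seconds and a 5-second judge, the parallel estimate is 14 seconds against 24 seconds sequentially.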

How to Use It

Method 1: Model Alias (Simplest)

Replace any model string with openrouter/fusion:

curl https://openrouter.ai/api/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/fusion",
    "messages": [
      {
        "role": "user",
        "content": "Survey the strongest arguments for and against a multiverse theory. Where do experts disagree?"
      }
    ]
  }'

This uses the Quality preset by default — a panel of three frontier-class models and a frontier judge. You don't need to configure anything else.
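For callers who prefer Python to curl, the same request might look roughly like the sketch below. It uses only the standard library and mirrors the endpoint, headers, and body from the curl example. The helper names and the 120-second timeout are assumptions chosen for illustration, since the article gives no recommended timeout.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_fusion_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the fusion alias."""
    payload = {
        "model": "openrouter/fusion",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 120.0) -> dict:
    """Send the request and decode the JSON body.

    The generous default timeout is an assumption: fusion waits for the
    slowest panel member and then for the judge.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


# Usage (performs a real, billable API call):
#   import os
#   req = build_fusion_request("Where do experts disagree on X?",
#                              os.environ["OPENROUTER_API_KEY"])
#   print(json.dumps(send(req), indent=2))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.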

Method 2: Server Tool (Model-Controlled)

Attach openrouter:fusion as a tool on your primary model. The model decides when to invoke fusion:

curl https://openrouter.ai/api/v1/chat/completions \
  -X POST \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "~anthropic/claude-opus-latest",
    "messages": [
      {
        "role": "user",
        "content": "We need a risk assessment for our Q3 infrastructure migration. Consider resilience, cost, and vendor lock-in from multiple angles."
      }
    ],
    "tools": [
      { "type": "openrouter:fusion" }
    ]
  }'

The model will call fusion only when the prompt merits deliberation. For simple queries ("What's the weather in Berlin?"), it answers directly without invoking fusion at all, and it remains free to invoke any other tools you've defined.

Method 3: Custom Configuration

Override the panel models and judge to control cost and quality:

{
  "model": "~anthropic/claude-opus-latest",
  "messages": [],
  "tools": [
    {
      "type": "openrouter:fusion",
      "parameters": {
        "analysis_models": [
          "~google/gemini-flash-latest",
          "deepseek/deepseek-v3.2",
          "~moonshotai/kimi-latest"
        ],
        "model": "~anthropic/claude-opus-latest"
      }
    }
  ]
}

  • analysis_models — the panel. Models that receive the prompt and produce independent responses.
  • model — the judge. The model that reads every panel response and produces the structured synthesis.

You can use the ~ prefix for auto-resolution to the latest version of a model family.
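If you build these payloads in code, a small helper keeps the panel and judge settings in one place. This is a minimal sketch: the field names (`analysis_models`, `model`, `tools`) come from the JSON above, while the helper names and the empty-panel check are assumptions. Whether the API accepts only one of the two parameters on its own is not stated in the article, so treat partial overrides as untested.

```python
from typing import Optional, Sequence


def fusion_tool(
    analysis_models: Optional[Sequence[str]] = None,
    judge: Optional[str] = None,
) -> dict:
    """Build the openrouter:fusion tool entry, with optional overrides."""
    tool: dict = {"type": "openrouter:fusion"}
    parameters: dict = {}
    if analysis_models is not None:
        if not analysis_models:
            raise ValueError("analysis_models must name at least one model")
        parameters["analysis_models"] = list(analysis_models)
    if judge is not None:
        parameters["model"] = judge  # "model" inside parameters is the judge
    if parameters:
        tool["parameters"] = parameters
    return tool


def chat_payload(primary_model: str, prompt: str, tool: dict) -> dict:
    """Assemble the request body for server-tool mode."""
    return {
        "model": primary_model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [tool],
    }
```

Calling `fusion_tool()` with no arguments reproduces the bare tool entry from Method 2, and passing both arguments reproduces the custom configuration shown above.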

The Structured Response

The judge returns a JSON response with a structured analysis section plus the raw panel responses:

{
  "status": "ok",
  "analysis": {
    "consensus": [
      "All panel models agree that a multiverse is not directly testable with current technology."
    ],
    "contradictions": [
      {
        "topic": "String theory landscape",
        "stances": [
          {
            "model": "anthropic/claude-opus-4.5",
            "stance": "The string landscape is evidence for multiverse scenarios."
          },
          {
            "model": "openai/gpt-4.1",
            "stance": "The string landscape is a mathematical artifact, not physical evidence."
          }
        ]
      }
    ],
    "partial_coverage": [
      {
        "models": ["google/gemini-2.5-pro", "deepseek/deepseek-v3.2"],
        "point": "Boltzmann brain paradox as a critique of eternal inflation"
      }
    ],
    "unique_insights": [
      {
        "model": "anthropic/claude-opus-4.5",
        "insight": "Quantum Darwinism may select for a single observed universe without requiring a multiverse."
      }
    ],
    "blind_spots": [
      "No panel model addressed the philosophical implications of parsimony (Occam's razor)."
    ]
  },
  "responses": [
    {
      "model": "anthropic/claude-opus-4.5",
      "content": "..."
    },
    {
      "model": "openai/gpt-4.1",
      "content": "..."
    },
    {
      "model": "google/gemini-2.5-pro",
      "content": "..."
    }
  ]
}

This structure is what makes Fusion useful beyond a simple "average of models." You get explicit disagreement tracking, coverage analysis, and blind-spot identification — not just a blended answer.
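To show how the structure can be consumed, here is a short sketch that turns the analysis into a review checklist. It assumes you already have the JSON object above as a Python dict; how that object is wrapped inside the HTTP response is not covered here. The function name `summarize_fusion` is hypothetical.

```python
from typing import List


def summarize_fusion(result: dict) -> List[str]:
    """Flatten a fusion result into short, human-readable review lines."""
    analysis = result.get("analysis", {})
    panel = [response["model"] for response in result.get("responses", [])]

    lines = [f"consensus points: {len(analysis.get('consensus', []))}"]

    # One line per contradiction, naming the models that took a stance.
    for item in analysis.get("contradictions", []):
        models = ", ".join(stance["model"] for stance in item["stances"])
        lines.append(f"disagreement on {item['topic']}: {models}")

    # For partially covered points, list the panel members that missed them.
    for item in analysis.get("partial_coverage", []):
        missing = [model for model in panel if model not in item["models"]]
        lines.append(f"'{item['point']}' missed by: {', '.join(missing) or 'nobody'}")

    for spot in analysis.get("blind_spots", []):
        lines.append(f"blind spot: {spot}")
    return lines
```

Inverting `partial_coverage` into "who missed it" is a design choice: it points a reviewer at the raw responses worth rereading rather than the ones that already agree.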

Presets and Configuration

| Preset | Panel | Judge | Best For |
| --- | --- | --- | --- |
| Quality (default) | 3 frontier models (Claude Opus, GPT-4.1, Gemini 2.5 Pro) | Frontier judge | Research, complex reasoning, high-stakes analysis |
| Budget | 3 cheaper/faster models | Budget judge | Cost-sensitive batch processing, experimentation |
| Custom | User-specified analysis_models | User-specified model | When you know your data and which models handle it best |

The Quality and Budget presets are OpenRouter-managed — the exact panel composition may change over time as new models are released. Custom presets freeze your configuration.

Cost Analysis

Cost is the primary trade-off. Fusion runs N panel calls + 1 judge call for every request.

Base-case math (Quality preset, default panel)

| Component | Model | Approx. Cost (per 1M input tokens) |
| --- | --- | --- |
| Panel model 1 | ~anthropic/claude-opus-latest | $15.00 |
| Panel model 2 | ~openai/gpt-4.1 | $10.00 |
| Panel model 3 | ~google/gemini-2.5-pro | $2.50 |
| Judge | ~anthropic/claude-opus-latest | $15.00 |
| Total (input) | | $42.50 per 1M input tokens |

Compare with a single call to Claude Opus at $15.00/1M input. Fusion costs ~2.8× the input token cost for the default panel. But the actual multiplier is higher in practice because:

  1. Output tokens multiply too — each panel model generates a full response, and the judge generates an analysis. With the default 3-model panel, expect 4–5× the total cost of a single completion on the same prompt.
  2. Tool usage — panel models have web search and bash tools. Actual tool call costs vary by invocation.
  3. No output caching — every fusion call runs fresh.
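The input-side arithmetic can be reproduced in a few lines. The rates are the approximate figures from the table above, not live prices. The `judge_extra_tokens` parameter is an addition for illustration: the judge reads every panel response, so its input is larger than the prompt alone, which the table does not account for.

```python
# Approximate USD per 1M input tokens, copied from the table above.
PANEL_RATES = {
    "~anthropic/claude-opus-latest": 15.00,
    "~openai/gpt-4.1": 10.00,
    "~google/gemini-2.5-pro": 2.50,
}
JUDGE_RATE = 15.00
SINGLE_OPUS_RATE = 15.00


def input_rate_total(panel_rates: dict, judge_rate: float) -> float:
    """Combined input rate when every model sees the same prompt."""
    return sum(panel_rates.values()) + judge_rate


def input_cost_usd(
    prompt_tokens: int,
    panel_rates: dict,
    judge_rate: float,
    judge_extra_tokens: int = 0,
) -> float:
    """Input cost of one fusion call. judge_extra_tokens models the panel
    responses the judge must also read (an assumption you should measure)."""
    panel_cost = sum(rate * prompt_tokens / 1_000_000 for rate in panel_rates.values())
    judge_cost = judge_rate * (prompt_tokens + judge_extra_tokens) / 1_000_000
    return panel_cost + judge_cost


total = input_rate_total(PANEL_RATES, JUDGE_RATE)  # 42.5
multiplier = total / SINGLE_OPUS_RATE              # about 2.83
```

For a 2,000-token prompt this gives roughly $0.085 of input cost before any output tokens, tool calls, or the extra judge input are counted.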

Cost optimization strategies

  • Use the Budget preset. Cost drops to roughly 2–3× a single mid-tier model call.
  • Bring your own panel. If you know cheaper models that handle your domain well, specify them in a custom config.
  • Server tool mode. The calling model only invokes fusion when it decides the prompt needs deliberation. For a typical workload where 70% of prompts are simple, this can dramatically reduce average cost.
  • Use free tier models for the panel during experimentation. Fusion can route through OpenRouter's free models.
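The server-tool saving is easy to estimate. The sketch below assumes a simplified model in which a prompt costs either one unit (answered directly) or the fusion multiple, and it ignores the primary model's own tokens on turns where it invokes fusion. The function name is hypothetical.

```python
def blended_multiplier(
    fusion_share: float,
    fusion_multiplier: float,
    direct_multiplier: float = 1.0,
) -> float:
    """Average cost per prompt, relative to a single direct completion,
    when only a fraction of prompts trigger fusion."""
    if not 0.0 <= fusion_share <= 1.0:
        raise ValueError("fusion_share must be between 0 and 1")
    direct_share = 1.0 - fusion_share
    return direct_share * direct_multiplier + fusion_share * fusion_multiplier


# 70% simple prompts, 30% sent to fusion at the 4-5x range quoted above.
low = blended_multiplier(0.30, 4.0)   # about 1.9
high = blended_multiplier(0.30, 5.0)  # about 2.2
```

Under these assumptions the average lands near 2× a single completion rather than 4–5×, which is why the share of prompts that trigger fusion is the number worth measuring.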

Cost/quality comparison

| Approach | Relative Cost | Output Quality |
| --- | --- | --- |
| Single cheap model (Gemini Flash) | 1× (baseline) | |
| Single frontier model (Claude Opus) | 6–8× | Good |
| Fusion (Budget preset) | ~15× | Better |
| Fusion (Quality preset) | ~30× | Best |
| Manual: run 3 models, synthesize yourself | ~20× + dev time | Comparable to fusion |

These are rough multiples for input-heavy workloads. Your actual ratio depends on prompt length and output token counts.
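The table uses a cheap model as its baseline, while other figures in this article are quoted against a single frontier call. Re-basing the multiples is simple division, sketched below with a hypothetical helper, and it shows the two sets of numbers are broadly consistent.

```python
from typing import Tuple


def rebase(multiple: float, baseline_low: float, baseline_high: float) -> Tuple[float, float]:
    """Convert a multiple of the cheap-model baseline into a (low, high)
    range relative to another row whose own multiple is a range."""
    return (multiple / baseline_high, multiple / baseline_low)


# Frontier single call is 6-8x the cheap baseline in the table above.
quality_vs_frontier = rebase(30, 6, 8)  # (3.75, 5.0)
budget_vs_frontier = rebase(15, 6, 8)   # (1.875, 2.5)
```

The Quality preset works out to roughly 3.75–5× a frontier call, in line with the 4–5× estimate given earlier.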

When to Use Fusion vs. Single-Model Routing

Use Fusion When

  • The question has no single correct answer — policy analysis, strategic planning, architectural decisions. Multiple perspectives are inherently valuable.
  • You need blind-spot detection — the structured output explicitly surfaces what wasn't covered.
  • Answer quality matters more than per-call cost — research, legal analysis, regulatory compliance.
  • You're doing speculative analysis — the contradictions section gives you a map of where models disagree, which is itself useful signal.
  • High-stakes production with low throughput — critical decisions where the extra latency (seconds to tens of seconds) is acceptable.

Skip Fusion When

  • You're building a chatbot — latency and cost kill the UX.
  • High-throughput batch processing — the cost multiplier doesn't amortize well at scale.
  • The question is factual and well-defined — "What's the capital of France?" doesn't benefit from deliberation.
  • You're already getting good results from a single model — Fusion's quality gain is workload-dependent. Test before committing.
  • You need predictable, low-latency responses — fusion's tail latency is the max of all panel members plus judge inference.

Pitfalls

1. Fusion is experimental. OpenRouter explicitly labels it as a Labs feature. The API, pricing, and behavior could change without notice. Do not hardcode long-term production dependencies on it.

2. Cost can surprise you. A single Quality-preset call can cost 4–5× a normal completion. Without monitoring, it's easy to burn through credits in a batch run. Log the number of fusion calls separately from normal completions.

3. Tail latency is the slowest panel member. If one model in the panel is slow or rate-limited, the entire request waits. You can't cancel individual panel members. In the worst case, a failing panel model can cause the entire fusion call to time out.

4. Quality gain is not universal. For simple prompts, Fusion provides negligible benefit over a strong single model. The value appears on complex, open-ended, or analytical prompts. Benchmark your specific use case before committing.

5. Panel models see your full prompt. Every model in the panel receives the complete prompt with all tools. If you have data isolation requirements (regulatory, compliance), understand that your prompt is effectively shared across every provider that serves the panel models.

6. No caching. Every fusion request runs fresh inference on all models. There's no result caching, no sharing across panel calls, and no prompt caching that spans the full pipeline. Caching would be the biggest lever for cost optimization, and it's not available yet.

7. Server-tool mode can mask costs. When using openrouter:fusion as a server tool, your primary model decides when to invoke it. This is good for cost control, but the decision boundary is opaque — you don't know how often it fires until you log it.
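Pitfalls 2 and 7 both come down to logging. A minimal tracker might look like the sketch below. Everything in it is hypothetical, including the class name and the idea of a fixed dollar budget. How you detect that a given response used fusion depends on the response format, which this article does not specify, so the caller passes that in as a boolean.

```python
from dataclasses import dataclass


@dataclass
class FusionUsageTracker:
    """Counts fusion calls separately from normal completions."""

    budget_usd: float
    total_calls: int = 0
    fusion_calls: int = 0
    fusion_spend_usd: float = 0.0

    def record(self, used_fusion: bool, cost_usd: float) -> None:
        """Record one completed request and its billed cost."""
        self.total_calls += 1
        if used_fusion:
            self.fusion_calls += 1
            self.fusion_spend_usd += cost_usd

    @property
    def fusion_rate(self) -> float:
        """Fraction of requests that triggered fusion (0.0 if none yet)."""
        if self.total_calls == 0:
            return 0.0
        return self.fusion_calls / self.total_calls

    @property
    def over_budget(self) -> bool:
        """True once fusion spend exceeds the configured budget."""
        return self.fusion_spend_usd > self.budget_usd
```

Checking `over_budget` between items in a batch run gives you a point at which to stop before credits run out, and `fusion_rate` makes the server-tool decision boundary visible.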

How It Compares to Other Routing Approaches

| Approach | What It Does | Cost | Latency | Use Case |
| --- | --- | --- | --- | --- |
| Fusion (Quality) | 3 frontier models + judge deliberation | 4–5× single call | High (parallel + judge) | High-stakes analysis |
| Fusion (Budget) | 3 cheap models + budget judge | 2–3× single call | Medium | Cost-sensitive deliberation |
| Auto Router | Route to cheapest model that can handle the prompt | 0.5–1× | Low | Cost optimization without quality loss |
| Provider Routing | Failover across providers for the same model | | Low | Reliability, best-price, BYOK |
| Model Fallbacks | Cascade: try A, fall back to B, fall back to C | 1–3× | Variable | Reliability with degraded quality |
| Manual ensembling | Call N APIs, synthesize yourself | N× + dev time | High | Full control, but high engineering cost |

Fusion is the only approach here that gives you structured disagreement analysis and blind-spot coverage out of the box. The managed alternatives either route to a single model (Auto Router, Provider Routing) or provide fallback semantics without deliberation (Model Fallbacks). Manual ensembling can reproduce the analysis, but you build and maintain the synthesis step yourself.

Putting It Together: When Fusion Makes Sense

Fusion is not a replacement for your default model endpoint. It's a specialized tool for the subset of prompts where multiple perspectives surface better answers than any single model can produce.

The server-tool mode (openrouter:fusion) is the most practical deployment pattern: let your primary model answer the easy questions directly, and invoke fusion only when the prompt merits deliberation. This gives you the cost profile of a single model for 70–90% of traffic while keeping Fusion available for the complex queries that actually benefit from it.

The budget-conscious pattern is: test with the Quality preset on your hardest prompts, validate that the structured output (contradictions, blind spots) actually adds value for your use case, then move to the Budget preset for production. If the output quality holds, you get deliberation economics at roughly 2–3× a single mid-tier model call.