Building Multi-Agent Chains with Hugging Face Spaces

Step-by-step tutorial on building multi-agent chains by connecting Hugging Face Spaces through agents.md endpoints. Learn how to chain an image generation Space into a 3D reconstruction Space — no client library, no hardcoded integration.

June 15, 2026
Tags: hugging-face, spaces, multi-agent, agents.md, chain, gradio, 3d, image-generation, tutorial, hf-token, tool-use

Note:

By the end of this tutorial, you'll have a working multi-agent chain that takes a text prompt, generates an image via one Hugging Face Space, converts it to a 3D model via another Space, and serves the result through a web viewer. The whole chain is orchestrated through agents.md, the auto-served API descriptions that let coding agents call any Gradio Space without setup.

Why Chain Spaces?

Every Gradio Space on Hugging Face now auto-serves a plain-text /agents.md endpoint: a machine-readable API description that coding agents (Claude Code, Codex, OpenCode, Pi, etc.) can read and call directly. The response gives everything needed in one shot: the schema URL, call and poll templates, file upload instructions, and an auth hint.

The real unlock is chaining: the output of one Space becomes the input to the next. Prompt → image → 3D. No client library, no hardcoded integration. The agent discovers each Space's API at runtime and wires them together.

This tutorial walks through the exact architecture behind the 3D Paris Gallery demo built by Mishig Davaadorj — a chain that turns text prompts into 3D Gaussian splats using two Spaces from different orgs, then assembles them into an interactive viewer.

Architecture Overview

Text Prompt ("The Eiffel Tower at dusk")
         │
         ▼
┌──────────────────────────────────────┐
│  Space A: Image Generation           │
│  black-forest-labs/flux-klein-9b-kv  │
│  Prompt → Image (PNG)                │
└──────────┬───────────────────────────┘
           │ generated image
           ▼
┌──────────────────────────────────────┐
│  Space B: 3D Reconstruction          │
│  microsoft/TRELLIS.2                 │
│  Image → 3D Gaussian Splat (.ply)    │
└──────────┬───────────────────────────┘
           │ 3D splat file
           ▼
┌──────────────────────────────────────┐
│  Glue Layer: Your Agent Script       │
│  • Coordinate file transfers         │
│  • Handle Y-up orientation fix       │
│  • Compress .ply → .ksplat           │
│  • Generate Three.js viewer          │
│  • Deploy as static Space            │
└──────────────────────────────────────┘

The agent does everything: discovers the APIs, calls both Spaces, processes the outputs, assembles the viewer, and deploys it. You just provide the prompt and taste-level feedback.

Prerequisites

Before starting, make sure you have:

  • A Hugging Face account
  • An HF_TOKEN — create one at huggingface.co/settings/tokens with at least read scope
  • A coding agent that can read URLs and call REST APIs (Claude Code, Codex, OpenCode, etc.), or plain curl if you'd rather drive the calls yourself
  • curl installed (or an HTTP client in your agent's language of choice)

That's it. No Gradio client library, no SDKs, no local GPU. (The Python examples use the requests package for plain HTTP calls.)

Note:

Your HF_TOKEN authenticates your calls: Spaces use bearer auth, so the token goes in the Authorization header of each request. Export it once (export HF_TOKEN=hf_...) and the scripts in this tutorial read it from the environment.

Step 1: Understanding agents.md

Every Gradio Space exposes a machine-readable description at https://huggingface.co/spaces/{namespace}/{repo}/agents.md. Let's inspect one.

curl https://huggingface.co/spaces/microsoft/TRELLIS.2/agents.md

Expected output:

To use this application (microsoft/TRELLIS.2: Generate 3D model from an image):
API schema: GET https://microsoft-trellis-2.hf.space/gradio_api/info
Config (find fn_index): GET https://microsoft-trellis-2.hf.space/config → dependencies[i].id where api_name matches API schema endpoint
Join the queue: POST https://microsoft-trellis-2.hf.space/gradio_api/queue/join (pass {"data": [...], "fn_index": <from-config>, "session_hash": "<random-uuid>"})
Stream results: GET https://microsoft-trellis-2.hf.space/gradio_api/queue/data?session_hash=<same-uuid>
File inputs: POST https://microsoft-trellis-2.hf.space/gradio_api/upload -F "files=@file.ext", use as: {"path": "<returned-path>", "meta": {"_type": "gradio.FileData"}, "orig_name": "file.ext"}
Auth: Bearer $HF_TOKEN (https://huggingface.co/settings/tokens)
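Because the format is just labeled lines, a script can turn it into a lookup table. A minimal sketch, assuming every field sits on its own line as "Label: value", as in the sample above; parse_agents_md and first_url are hypothetical helpers, not part of any Hugging Face library.

```python
import re


def parse_agents_md(text: str) -> dict:
    """Map each 'Label: value' line of an agents.md response to a dict entry."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so URLs (https://...) stay intact
        label, sep, value = line.partition(": ")
        # Skip the "To use this application (...)" preamble line
        if sep and label and not label.startswith("To use"):
            fields[label.strip()] = value.strip()
    return fields


def first_url(value: str) -> str:
    """Return the first http(s) URL in a field value, or '' if there is none."""
    match = re.search(r"https?://\S+", value)
    # Drop punctuation that wraps the URL, e.g. "(https://...)"
    return match.group(0).rstrip(").,") if match else ""
```

One practical use: take the Space host from first_url(fields["API schema"]) instead of guessing the subdomain from the Space ID.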

agents.md Response Fields

API schema
The Gradio API schema endpoint. A GET request returns all available endpoints, their parameters, and types.

Values: GET URL

Config / fn_index
Each endpoint in the schema has a matching dependency in /config. You need the fn_index (numeric id) to submit jobs to the queue.

Values: GET URL → dependency id

Join the queue
Submit a job to the queue with data payload, fn_index, and session_hash. Returns an event_id.

Values: POST with JSON body

Stream results
Read the results stream for the same session_hash. The endpoint answers with server-sent events (data: {...} lines); the message whose msg is process_completed carries the output.

Values: GET with session_hash

File inputs
Upload files (images, audio, etc.) and reference them by path in your data payload.

Values: POST multipart upload

Auth
Bearer token authentication. The agent passes $HF_TOKEN in the Authorization header.

Values: Bearer token

The response is intentionally minimal: just enough for an agent to figure out how to call the Space. No Swagger, no OpenAPI, no SDK download. Six lines of actionable instructions.
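The results stream is the least obvious part of the protocol, so here is how a client can pull the final output out of it. A minimal sketch, assuming the message shapes Gradio uses on its queue stream (msg values such as estimation, process_starts, and process_completed, with the result under output.data and a success flag); extract_completed_output is a hypothetical helper, and the sample messages in the test are illustrative rather than captured from a real Space.

```python
import json


def extract_completed_output(sse_lines, event_id=None):
    """
    Scan server-sent-event lines and return output["data"] from the first
    process_completed message (optionally only for one event_id).
    Returns None if the job has not completed within these lines.
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # blank separators between events
        msg = json.loads(line[len("data:"):])
        if event_id is not None and msg.get("event_id") != event_id:
            continue  # message belongs to another job in this session
        if msg.get("msg") == "process_completed":
            if not msg.get("success", True):
                raise RuntimeError(f"job failed: {msg.get('output')}")
            return msg.get("output", {}).get("data")
    return None
```

In practice you would feed it lines from a streaming GET on queue/data?session_hash=... as they arrive.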

The API Schema

Let's peek at the actual schema to see what endpoints are available:

curl https://microsoft-trellis-2.hf.space/gradio_api/info | python3 -m json.tool | head -40

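The response is a JSON object with named_endpoints (and unnamed_endpoints). Each named endpoint lists its parameters and returns, and each parameter entry carries fields such as:

  • label: human-readable name
  • parameter_name: the argument name, with parameter_has_default and parameter_default when the input is optional
  • type / python_type: the expected value type
  • component: Gradio component type (Image, File, Textbox, etc.)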

For the 3D Space, you'll typically find a main endpoint (its name varies from Space to Space) that takes an image input and returns a 3D model file.
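To see at a glance what each endpoint expects, you can flatten the schema into "endpoint → inputs". A minimal sketch, assuming the layout recent Gradio versions return (named_endpoints mapping each name to a dict with a parameters list whose entries carry label, component, and parameter_has_default); summarize_endpoints is hypothetical and the schema in the test is hand-written, including the made-up endpoint name.

```python
def summarize_endpoints(schema: dict) -> dict:
    """Return {endpoint_name: [(label, component, required), ...]}."""
    summary = {}
    for name, spec in schema.get("named_endpoints", {}).items():
        summary[name] = [
            (
                # Fall back to parameter_name if a parameter has no label
                p.get("label", p.get("parameter_name", "?")),
                p.get("component", "?"),
                # An input is required when the schema gives it no default
                not p.get("parameter_has_default", False),
            )
            for p in spec.get("parameters", [])
        ]
    return summary
```

Feed it the JSON from gradio_api/info and print the result before writing any payloads.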

Step 2: Finding and Testing Your Spaces

Before writing any orchestration code, find the Spaces you want to chain. The Hugging Face Spaces directory at huggingface.co/spaces supports semantic search — try queries like "image generation," "3D reconstruction," or "text to speech."

Testing in the UI

Always test a Space in the browser before wiring it into your chain:

  1. Visit the Space's page (e.g., https://huggingface.co/spaces/black-forest-labs/flux-klein-9b-kv)
  2. Try it with sample inputs to understand what it expects and returns
  3. Note the exact parameter names and output format
  4. Check the agents.md for the technical calling convention

Image Generation Space

For this tutorial, we'll use black-forest-labs/flux-klein-9b-kv — a fast, open-weights image generator:

curl https://huggingface.co/spaces/black-forest-labs/flux-klein-9b-kv/agents.md

Expected output:

To use this application (black-forest-labs/flux-klein-9b-kv: Generate or edit images from text and optional photos):
API schema: GET https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/info
Config (find fn_index): GET https://black-forest-labs-flux-klein-9b-kv.hf.space/config → dependencies[i].id where api_name matches API schema endpoint
Join the queue: POST https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/queue/join (pass {"data": [...], "fn_index": <from-config>, "session_hash": "<random-uuid>"})
Stream results: GET https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/queue/data?session_hash=<same-uuid>
File inputs: POST https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/upload -F "files=@file.ext", use as: {"path": "<returned-path>", "meta": {"_type": "gradio.FileData"}, "orig_name": "file.ext"}
Auth: Bearer $HF_TOKEN

3D Reconstruction Space

We'll use microsoft/TRELLIS.2, a model that reconstructs a 3D asset from a single image:

curl https://huggingface.co/spaces/microsoft/TRELLIS.2/agents.md

The pattern is identical. Both Spaces use the same Gradio API protocol. Your orchestration agent will follow the same steps for both: read schema → build payload → upload files if needed → join queue → poll for result.
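The "build payload" step is where calls most often fail: data must hold one value per input, in the order the schema lists them. A minimal sketch that fills anything you don't pass from the schema's defaults, assuming each parameter carries parameter_name, parameter_has_default, and parameter_default as in recent Gradio versions; build_data is a hypothetical helper and "/infer" in the test is a made-up endpoint name.

```python
def build_data(schema: dict, endpoint: str, **overrides) -> list:
    """Build the positional `data` list for an endpoint, using schema defaults."""
    params = schema["named_endpoints"][endpoint]["parameters"]
    data = []
    for p in params:
        name = p.get("parameter_name")
        if name in overrides:
            data.append(overrides.pop(name))  # caller-supplied value
        elif p.get("parameter_has_default"):
            data.append(p.get("parameter_default"))  # schema default
        else:
            raise ValueError(f"{endpoint}: missing required input '{name}'")
    if overrides:
        # Catch typos instead of silently dropping values
        raise ValueError(f"{endpoint}: unknown inputs {sorted(overrides)}")
    return data
```

Usage: pass data=build_data(schema, endpoint, prompt=prompt) to the call, so optional inputs such as seed or step count keep the Space's own defaults.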

Note:

Space availability varies. Spaces can go down, be paused, or hit GPU limits. Always check that a Space has a running status badge before building a chain that depends on it. Prefer Spaces with a "Running" indicator rather than "Sleeping" or "Paused."
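You can also check status from a script before starting a chain. A minimal sketch, assuming the Hub endpoint GET https://huggingface.co/api/spaces/{space_id}/runtime (the one huggingface_hub's get_space_runtime calls), which reports a stage such as RUNNING, SLEEPING, or PAUSED; space_stage and not_running are hypothetical helpers, and the stdlib urllib is used so the sketch needs no extra packages.

```python
import json
import urllib.request

# Stages in which the Space is serving requests (RUNNING_BUILDING: still
# serving the old version while a new build runs)
LIVE_STAGES = {"RUNNING", "RUNNING_BUILDING"}


def space_stage(space_id: str, token: str = "") -> str:
    """Return the Space's runtime stage string from the Hub API."""
    req = urllib.request.Request(f"https://huggingface.co/api/spaces/{space_id}/runtime")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("stage", "UNKNOWN")


def not_running(stages: dict) -> list:
    """Given {space_id: stage}, return the Space IDs that are not live, sorted."""
    return sorted(sid for sid, stage in stages.items() if stage not in LIVE_STAGES)
```

Usage: build stages = {sid: space_stage(sid, token) for sid in your_chain} and stop early if not_running(stages) is non-empty.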

Step 3: The Chain Architecture

The chain follows a simple pattern:

  1. Agent reads agents.md from Space A to learn its API
  2. Agent calls Space A with the text prompt → receives an image
  3. Agent reads agents.md from Space B to learn its API
  4. Agent uploads the image from Space A to Space B's file endpoint
  5. Agent calls Space B with the uploaded image path → receives a 3D model
  6. Agent processes the output (orientation fix, compression, viewer assembly)

Each step is independent. The agent can handle errors, retries, and format conversion between steps.

The Gradio API Protocol

Both Spaces follow the same Gradio API protocol. Here's a Python implementation you can use directly:

Note:

The code below is not specific to these two Spaces: it follows the generic calling convention that agents.md describes, so it should work with any Space that serves the standard Gradio API.

import os
import json
import uuid
import time
import requests

class GradioSpaceAgent:
    """Call any Gradio Space via the protocol its agents.md describes."""

    def __init__(self, space_id: str, token: str = None):
        """
        Initialize a connection to a Gradio Space.

        Args:
            space_id: Hugging Face Space ID, e.g. "microsoft/TRELLIS.2"
            token: HF_TOKEN for authentication (defaults to the HF_TOKEN env var)
        """
        self.space_id = space_id
        self.namespace, self.repo = space_id.split("/")
        self.token = token or os.environ.get("HF_TOKEN", "")

        # Derive the Space's direct URL from its ID: "_" and "." both become "-",
        # e.g. microsoft/TRELLIS.2 -> https://microsoft-trellis-2.hf.space
        subdomain = f"{self.namespace}-{self.repo}".replace("_", "-").replace(".", "-").lower()
        self.space_url = f"https://{subdomain}.hf.space"

        # Read agents.md once to confirm the Space is reachable
        self.agents_md = self._read_agents_md()

    def _headers(self) -> dict:
        """Bearer auth header, sent only when a token is available."""
        return {"Authorization": f"Bearer {self.token}"} if self.token else {}

    def _read_agents_md(self) -> str:
        """Fetch the plain-text agents.md description."""
        url = f"https://huggingface.co/spaces/{self.space_id}/agents.md"
        resp = requests.get(url, headers=self._headers(), timeout=30)
        resp.raise_for_status()
        return resp.text

    def get_api_schema(self) -> dict:
        """Fetch the Gradio API schema to discover available endpoints."""
        url = f"{self.space_url}/gradio_api/info"
        resp = requests.get(url, headers=self._headers(), timeout=30)
        resp.raise_for_status()
        return resp.json()

    def get_config(self) -> list:
        """Fetch /config and return its list of dependencies."""
        url = f"{self.space_url}/config"
        resp = requests.get(url, headers=self._headers(), timeout=30)
        resp.raise_for_status()
        return resp.json().get("dependencies", [])

    def find_fn_index(self, api_name: str) -> int:
        """
        Find the fn_index for an endpoint such as "/predict".

        As agents.md describes: dependencies[i].id where api_name matches the
        schema endpoint. /config stores api_name without the leading slash.
        """
        name = api_name.lstrip("/")
        for i, dep in enumerate(self.get_config()):
            if dep.get("api_name") == name:
                # Use the dependency's id when present, else its list position
                return dep.get("id", i)
        raise ValueError(f"Could not find fn_index for api_name: {api_name}")

    def upload_file(self, file_path: str) -> dict:
        """Upload a local file and return a Gradio FileData reference to it."""
        url = f"{self.space_url}/gradio_api/upload"
        with open(file_path, "rb") as f:
            resp = requests.post(url, files={"files": f}, headers=self._headers(), timeout=120)
        resp.raise_for_status()
        # The endpoint returns a list of server-side paths, one per uploaded file
        return {
            "path": resp.json()[0],
            "meta": {"_type": "gradio.FileData"},
            "orig_name": os.path.basename(file_path),
        }

    def call_endpoint(self, endpoint: str, data: list, fn_index: int = None,
                      timeout: int = 120) -> dict:
        """
        Call an endpoint and wait for the result.

        Joins the queue (POST queue/join), then reads the server-sent event
        stream at queue/data for the same session_hash. Returns the "output"
        dict of the process_completed message; its "data" list holds one
        value per output component.
        """
        session_hash = str(uuid.uuid4())

        # Resolve fn_index from /config if not provided
        if fn_index is None:
            fn_index = self.find_fn_index(endpoint)

        payload = {"data": data, "fn_index": fn_index, "session_hash": session_hash}
        resp = requests.post(f"{self.space_url}/gradio_api/queue/join",
                             json=payload, headers=self._headers(), timeout=30)
        resp.raise_for_status()
        # queue/join returns an event_id that tags this job's stream messages
        event_id = resp.json().get("event_id")
        return self._stream_result(session_hash, event_id, timeout)

    def _stream_result(self, session_hash: str, event_id: str, timeout: int) -> dict:
        """Read SSE messages until this job completes or the deadline passes."""
        url = f"{self.space_url}/gradio_api/queue/data"
        deadline = time.time() + timeout
        with requests.get(url, params={"session_hash": session_hash},
                          headers=self._headers(), stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            for raw in resp.iter_lines():
                if time.time() > deadline:
                    break
                line = raw.decode("utf-8")
                if not line.startswith("data:"):
                    continue  # blank separators between events
                msg = json.loads(line[len("data:"):])
                if event_id and msg.get("event_id") not in (None, event_id):
                    continue  # message for another job in this session
                if msg.get("msg") == "process_completed":
                    if not msg.get("success", True):
                        raise RuntimeError(f"Space returned an error: {msg.get('output')}")
                    return msg.get("output", {})
        raise TimeoutError(f"No result from the Space within {timeout}s")

Note:

Rate limits. Free HF Spaces have concurrent request limits. If you hit 429 responses, add a 2-5 second delay between calls. The queue API handles backpressure for you — just poll patiently.
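If you'd rather not tune delays by hand, wrap each HTTP call in a retry loop with exponential backoff. A minimal sketch, assuming 429, 502, and 503 are the transient statuses worth retrying and that a Retry-After header, when present, is honored only if it is given in whole seconds; with_backoff, raise_if_transient, and TransientHTTPError are hypothetical names, and sleep is injectable so the loop can be tested without waiting.

```python
import random
import time

RETRYABLE = {429, 502, 503}


class TransientHTTPError(Exception):
    """Raised for HTTP statuses that are worth retrying."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def raise_if_transient(status_code: int, headers) -> None:
    """Turn a retryable status into TransientHTTPError; do nothing otherwise."""
    if status_code in RETRYABLE:
        ra = headers.get("Retry-After")
        # Only the delay-seconds form is parsed; HTTP-date values are ignored
        raise TransientHTTPError(status_code, float(ra) if ra and ra.isdigit() else None)


def with_backoff(request_fn, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call request_fn(); on TransientHTTPError, wait and retry (2s, 4s, 8s, ...)."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientHTTPError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = err.retry_after if err.retry_after is not None else base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.5))  # jitter avoids lockstep retries
```

Inside request_fn, call raise_if_transient(resp.status_code, resp.headers) before resp.raise_for_status(), so only transient failures are retried.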

Step 4: Building the Chain

Now let's wire the two Spaces together. The script below generates an image of a monument, feeds it to the 3D reconstruction model, and saves the result.

#!/usr/bin/env python3
"""
Multi-agent chain: Image Generation → 3D Reconstruction

Chains two Hugging Face Spaces:
  1. black-forest-labs/flux-klein-9b-kv  → generates an image from text
  2. microsoft/TRELLIS.2                 → generates a 3D model from the image
"""

import os
import sys
import json

import requests

# Save the GradioSpaceAgent class from Step 3 as gradio_agent.py next to this script
from gradio_agent import GradioSpaceAgent


def download_output(space: GradioSpaceAgent, file_data: dict, dest: str) -> str:
    """Download a file returned by a Space (a Gradio FileData dict) to dest."""
    # "path" is a path on the Space's server, not on this machine, so fetch it
    # over HTTP: prefer the "url" field, else build the Space's file route.
    url = file_data.get("url") or f"{space.space_url}/gradio_api/file={file_data['path']}"
    headers = {"Authorization": f"Bearer {space.token}"} if space.token else {}
    resp = requests.get(url, headers=headers, timeout=120)
    resp.raise_for_status()
    with open(dest, "wb") as f:
        f.write(resp.content)
    return dest


def generate_and_convert_3d(prompt: str, output_dir: str = "output"):
    """
    Chain two Spaces: text → image → 3D model.

    Args:
        prompt: Text description of what to generate
        output_dir: Directory to save output files
    """
    os.makedirs(output_dir, exist_ok=True)
    hf_token = os.environ.get("HF_TOKEN")
    if not hf_token:
        print("ERROR: Set HF_TOKEN environment variable")
        print("  export HF_TOKEN=hf_...")
        sys.exit(1)

    print("[1/4] Initializing image generation Space...")
    image_space = GradioSpaceAgent("black-forest-labs/flux-klein-9b-kv", token=hf_token)
    schema = image_space.get_api_schema()
    endpoints = schema.get("named_endpoints", {})
    print(f"  Discovered endpoints: {list(endpoints.keys())}")

    print(f"[2/4] Generating image from prompt: '{prompt}'")
    # Assumption: the first named endpoint is text-to-image and the prompt is
    # its only input. Check the schema; if it lists more inputs (seed, size,
    # steps...), pass a value for each, in schema order.
    t2i_endpoint = list(endpoints.keys())[0]
    image_output = image_space.call_endpoint(t2i_endpoint, data=[prompt])
    print("  Image generation complete")

    # output["data"] has one entry per output component; assume the image
    # (a FileData dict) comes first. A gallery output would be a list instead.
    image_path = download_output(
        image_space, image_output["data"][0],
        os.path.join(output_dir, "generated_image.png"),
    )

    print("[3/4] Initializing 3D reconstruction Space...")
    splat_space = GradioSpaceAgent("microsoft/TRELLIS.2", token=hf_token)
    splat_schema = splat_space.get_api_schema()
    splat_endpoints = list(splat_schema.get("named_endpoints", {}).keys())
    print(f"  Discovered endpoints: {splat_endpoints}")

    print("[4/4] Converting image to 3D model...")
    # Space A's files live on Space A's server, so upload the image to Space B
    uploaded = splat_space.upload_file(image_path)
    print(f"  Uploaded image: {uploaded['path']}")

    # No hardcoded fn_index: call_endpoint resolves it from /config.
    # Same assumption as above: first endpoint, image as its only input.
    splat_output = splat_space.call_endpoint(splat_endpoints[0], data=[uploaded])
    print("  3D reconstruction complete")

    # Assume the first output is the 3D file; adjust the index if the Space
    # also returns a preview video or several files.
    output_path = download_output(
        splat_space, splat_output["data"][0], os.path.join(output_dir, "model.ply")
    )
    print(f"  Output saved to: {output_path}")
    print(f"\n✅ Chain completed: '{prompt}' → image → 3D model")

    return {
        "prompt": prompt,
        "image": image_path,
        "model": output_path,
    }


if __name__ == "__main__":
    prompt = sys.argv[1] if len(sys.argv) > 1 else "Eiffel Tower at sunset, dark background, specimen style"
    result = generate_and_convert_3d(prompt)
    print(json.dumps(result, indent=2))

Expected output:

[1/4] Initializing image generation Space...
  Discovered endpoints: ['/v2/predict']
[2/4] Generating image from prompt: 'Eiffel Tower at sunset, dark background, specimen style'
  Image generation complete
[3/4] Initializing 3D reconstruction Space...
  Discovered endpoints: ['/v3/predict']
[4/4] Converting image to 3D model...
  Uploaded image: /tmp/gradio/abc123/uploaded_file.png
  3D reconstruction complete
  Output saved to: output/model.ply

✅ Chain completed: 'Eiffel Tower at sunset, dark background, specimen style' → image → 3D model

Note:

Endpoint names vary between Spaces. The /v2/predict and /v3/predict names are examples — always check the actual schema from gradio_api/info. The agent should discover endpoints dynamically, not hardcode them.
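Rather than taking the first endpoint in the schema, an agent can pick the one whose inputs match what it holds. A minimal sketch, assuming parameters carry a component name such as Image or Textbox (as in recent Gradio versions); pick_endpoint is a hypothetical helper, "fewest required inputs wins" is a heuristic rather than a rule, and the endpoint names in the test are made up.

```python
def pick_endpoint(schema: dict, needs: str) -> str:
    """
    Return the named endpoint whose required inputs include a `needs`
    component (e.g. "Image"), preferring endpoints with fewer required inputs.
    """
    candidates = []
    for name, spec in schema.get("named_endpoints", {}).items():
        required = [p for p in spec.get("parameters", [])
                    if not p.get("parameter_has_default", False)]
        if any(p.get("component") == needs for p in required):
            candidates.append((len(required), name))
    if not candidates:
        raise LookupError(f"No endpoint takes a required {needs} input")
    # Fewest required inputs first; ties broken alphabetically for determinism
    return min(candidates)[1]
```

Confirm the choice against the Space's UI once; after that, the agent no longer depends on the order in which endpoints happen to be listed.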

Expected Failure Points and How to Fix Them

| Symptom | Likely cause | Fix |
|---|---|---|
| 401 Unauthorized | Missing or invalid HF_TOKEN | Check echo $HF_TOKEN. Generate a new token at huggingface.co/settings/tokens |
| 404 on gradio_api/info | The Space runs an older Gradio version, or is not a Gradio Space | Try /info (older Gradio versions serve API routes without the gradio_api prefix). If that also fails, the Space may not use the standard Gradio API |
| 502 Bad Gateway | Space is sleeping or starting up | Wait 30-60 seconds for a cold start. Spaces on the free tier go to sleep after a period of inactivity |
| 429 Too Many Requests | Rate-limited by HF | Add time.sleep(3) between calls. A PRO account gets higher limits |
| No process_completed message arrives | GPU queue congestion | Raise the poll timeout to 180s. The Space may be queued behind other users |
| File upload returns empty | File format not supported | Check the Space's UI for accepted formats and convert (usually PNG/JPEG for images) |

Step 5: Handling Outputs Across the Chain

Output format conversion is where most chain implementations break. Here's what to watch for:

Orientation Fix (3D Splats)

The most common post-processing step for 3D Spaces is orientation correction. TRELLIS.2 outputs Y-down (common in AI pipelines), but web viewers expect Y-up:

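import numpy as np

# PLY scalar types -> little-endian numpy dtypes
PLY_TYPES = {
    "char": "i1", "int8": "i1", "uchar": "u1", "uint8": "u1",
    "short": "i2", "int16": "i2", "ushort": "u2", "uint16": "u2",
    "int": "i4", "int32": "i4", "uint": "u4", "uint32": "u4",
    "float": "f4", "float32": "f4", "double": "f8", "float64": "f8",
}

def flip_ply_y_up(input_path: str, output_path: str):
    """
    Re-orient a Gaussian-splat .ply from Y-down to Y-up.

    Applies a 180° rotation about the X axis: (x, y, z) -> (x, -y, -z).
    Negating Y alone would mirror the scene (a reflection), so Z is negated too.
    Handles binary_little_endian files with a single vertex element (the usual
    splat layout). Spherical-harmonic color terms (f_rest_*) are not rotated.
    """
    with open(input_path, "rb") as f:
        header = []
        while True:
            line = f.readline()
            if not line:
                raise ValueError("No end_header found; not a PLY file?")
            header.append(line)
            if line.strip() == b"end_header":
                break
        body = f.read()

    fmt, props, vertex_count = None, [], 0
    for raw in header:
        tokens = raw.decode("ascii").split()
        if tokens[:1] == ["format"]:
            fmt = tokens[1]
        elif tokens[:2] == ["element", "vertex"]:
            vertex_count = int(tokens[2])
        elif tokens[:1] == ["element"]:
            raise ValueError(f"Unsupported extra element: {tokens[1]}")
        elif tokens[:1] == ["property"]:
            if tokens[1] == "list":
                raise ValueError("List properties are not supported")
            props.append((tokens[2], "<" + PLY_TYPES[tokens[1]]))
    if fmt != "binary_little_endian":
        raise ValueError(f"Expected binary_little_endian, got {fmt}")

    dtype = np.dtype(props)
    verts = np.frombuffer(body, dtype=dtype, count=vertex_count).copy()
    names = verts.dtype.names

    # Positions and normals: (x, y, z) -> (x, -y, -z)
    for name in ("y", "z", "ny", "nz"):
        if name in names:
            verts[name] = -verts[name]

    # Per-splat orientation quaternion (w, x, y, z) stored as rot_0..rot_3:
    # left-multiply by the 180° X-axis quaternion (0, 1, 0, 0)
    if all(f"rot_{i}" in names for i in range(4)):
        w, x, y, z = (verts[f"rot_{i}"].copy() for i in range(4))
        verts["rot_0"], verts["rot_1"] = -x, w
        verts["rot_2"], verts["rot_3"] = -z, y

    # Write the unchanged header, the rotated vertices, and any trailing bytes
    with open(output_path, "wb") as f:
        f.writelines(header)
        f.write(verts.tobytes())
        f.write(body[vertex_count * dtype.itemsize:])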

Compression for Performance

Raw .ply files from 3D reconstruction can be 10-50MB. For web delivery, compress to .ksplat format (roughly 3x smaller):

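# .ksplat is the compressed format of the GaussianSplats3D viewer
# (github.com/mkkellogg/GaussianSplats3D); its repo ships a converter script.
# Check the repo's README for the current options.
git clone https://github.com/mkkellogg/GaussianSplats3D.git
cd GaussianSplats3D && npm install && npm run build
node util/create-ksplat.js input.ply output.ksplat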

Step 6: Deploying the Result

Once you have your chained output, deploy it as a static Hugging Face Space to share the result:

  1. Create a new Space at huggingface.co/new-space with the "Static" SDK
  2. Upload your Three.js viewer HTML, the compressed splat files, and any assets
  3. Configure the Space to serve the viewer
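Steps 1 to 3 can also be scripted. A static Space is configured by YAML front matter in its README.md (sdk: static, with app_file naming the page it serves). A minimal sketch, assuming your viewer is an index.html in a local folder alongside the splat files; the folder and repo ID are placeholders, and the optional deploy function uses the huggingface_hub library (create_repo and upload_folder), which is an extra dependency beyond the curl-only setup.

```python
from pathlib import Path


def space_readme(title: str, app_file: str = "index.html") -> str:
    """YAML front matter that marks a Space as static and names its entry page."""
    return (
        "---\n"
        f"title: {title}\n"
        "sdk: static\n"
        f"app_file: {app_file}\n"
        "---\n"
    )


def prepare_space_folder(folder: str, title: str) -> Path:
    """Write README.md into the folder that holds the viewer and splat files."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    (path / "README.md").write_text(space_readme(title))
    return path


def deploy(folder: str, repo_id: str, token: str) -> None:
    """Create (or reuse) a static Space and upload the folder to it."""
    from huggingface_hub import HfApi  # optional: pip install huggingface_hub

    api = HfApi(token=token)
    api.create_repo(repo_id, repo_type="space", space_sdk="static", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")
```

Keep titles free of colons, or quote them, so the front matter stays valid YAML.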

Note:

The mishig/monuments-de-paris Space is a good reference for how to structure a static viewer Space. The entire pipeline script — from agents.md calls to final deployment — lives in the Space repo.

When to Use Spaces Chaining vs Custom Agent Frameworks

Spaces chaining is not a replacement for frameworks like LangGraph, CrewAI, or smolagents. It's a different tool for a different job.

| Scenario | Spaces chaining | Custom framework |
|---|---|---|
| Prototyping a multimedia pipeline | ✅ Best: no setup, iterate fast | ❌ Overkill |
| Complex state management across many steps | ❌ No built-in state | ✅ LangGraph, CrewAI |
| Single atomic task (one model call) | ✅ Perfect: agents.md is instant | ❌ Setup overhead |
| Error recovery and retry logic | ❌ You write it yourself | ✅ Built-in |
| Production deployment at scale | ❌ Rate-limited by HF Spaces | ✅ Self-hosted models |
| Chaining models from different orgs | ✅ Just paste the agents.md URL | ❌ Need separate API integrations |

The rule of thumb: If your chain has fewer than 5 steps and the individual Spaces are already deployed, use agents.md. If you need branching, loops, conditional logic, or high throughput, use a framework.

Putting It All Together

Here's the complete flow end to end:

  1. Pick two (or more) Spaces that expose agents.md
  2. Test each independently in the browser
  3. Write an orchestration script that calls Space A, processes the output, feeds it to Space B
  4. Handle format conversions (orientation, compression, file types)
  5. Deploy the result as a static Space or web page

The 3D Paris Gallery demo at mishig/monuments-de-paris proves the pattern works. The same two Spaces, given different prompts, produced galleries for Paris, Egypt, and Japan. Each new gallery was one sentence of human input — the agent did the rest.

"Create a similar Space with splats for Japan"
    → agent generates 6 monument images via Space A
    → agent reconstructs 6 splats via Space B
    → agent orients, compresses, and assembles the viewer
    → agent deploys a new Space

That's the building-block economy in action. The marginal cost of a new multimedia app falls toward the cost of describing it.
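The Japan example above is essentially a loop over prompts. A minimal sketch of running such a batch with a small, bounded amount of parallelism so free-tier Spaces are not flooded; run_batch is hypothetical, and chain_fn stands in for any one-prompt function, such as a wrapper around generate_and_convert_3d from Step 4. Failures are collected rather than aborting the batch, so one bad prompt doesn't sink a gallery.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(prompts, chain_fn, max_workers=2):
    """Run chain_fn over prompts concurrently; return (results, errors) keyed by prompt."""
    results, errors = {}, {}
    # A small pool keeps concurrent Space calls low (rate limits, GPU queues)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(chain_fn, p): p for p in prompts}
        for fut in as_completed(futures):
            prompt = futures[fut]
            try:
                results[prompt] = fut.result()
            except Exception as exc:  # record and keep going; report at the end
                errors[prompt] = exc
    return results, errors
```

If you wrap generate_and_convert_3d, give each prompt its own output_dir; otherwise every run writes to the same generated_image.png and model.ply.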

What's Next

  • Try different model chains: Replace the image gen Space with a different one (e.g., ideogram-ai/ideogram4) or use a different 3D model (e.g., VAST-AI/TripoSplat)
  • Add a third Space: Chain image gen → 3D → texturing or animation
  • Build a multi-output gallery: Generate multiple images in parallel, feed them all through 3D reconstruction, assemble a gallery
  • Explore the audio domain: Try chaining TTS (text-to-speech) with audio effects Spaces
  • Read the original blog post: How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces by Mishig Davaadorj for the full story behind the demo
  • Check the docs: Spaces as Agent Tools for the official documentation on agents.md