ElevenLabs MCP Server

Official ElevenLabs MCP server for AI text-to-speech generation, voice cloning, audio transcription, sound design, and speech-to-speech conversion, with support for 29+ languages.

June 10, 2026

Overview

The ElevenLabs MCP Server is the official Model Context Protocol integration for ElevenLabs' text-to-speech and audio AI platform. It enables AI assistants to generate natural-sounding speech, clone voices, transcribe audio, and design sound effects through a standardised MCP interface.

Official Server: Developed and maintained by ElevenLabs, with frequent releases. The free tier includes 10,000 credits per month.

Key Features

  • 🎙️ Text-to-Speech: Generate natural-sounding speech from text in 29+ languages with thousands of voices
  • 🗣️ Voice Cloning: Clone voices from audio samples with instant or professional-level fidelity
  • 📝 Speech-to-Text: Transcribe audio files with speaker diarisation and timestamp support
  • 🎵 Sound Design: Generate sound effects and ambient audio from text descriptions
  • 🔄 Speech-to-Speech: Convert speech to a different voice while preserving emotion and prosody
  • 📁 Flexible Output: Save to disk, return as MCP resources, or both, depending on the configured output mode

Available Tools

Quick Reference

Tool                    Purpose                                       Category
generate_tts_audio      Convert text to speech with voice selection   TTS
generate_sound_effect   Create sound effects from text prompts        Sound Design
design_voice            Design a new custom voice from parameters     Voice Design
get_user_voices         List all voices in your ElevenLabs library    Voice Management
audio_isolation         Isolate speech from background noise          Audio Processing
speech_to_text          Transcribe audio to text with diarisation     Transcription
convert_speech          Convert speech from one voice to another      Speech-to-Speech

Detailed Usage

generate_tts_audio

Generate natural speech from text with full control over voice, model, and voice settings.

{
  "text": "Welcome to the future of audio AI — realistic voices, instant voice cloning, and professional sound design.",
  "voice_id": "your-voice-id",
  "model_id": "eleven_multilingual_v2",
  "output_format": "mp3_44100_128"
}
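
The output_format value in the example above looks like it packs three fields into one string: codec, sample rate, and bitrate. A minimal sketch of splitting it apart follows, assuming that underscore-separated layout and assuming that some formats omit the bitrate part. The names AudioFormat and parse_output_format are hypothetical helpers for illustration and are not part of the server.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the string carries no bitrate part


def parse_output_format(value: str) -> AudioFormat:
    """Split a string such as "mp3_44100_128" into its parts (assumed layout)."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate_hz, bitrate_kbps)


# The value used in the generate_tts_audio example.
print(parse_output_format("mp3_44100_128"))
```

A check like this can catch a mistyped format string before the request is sent, although the set of formats the service actually accepts should be taken from the ElevenLabs documentation.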

generate_sound_effect

Create sound effects from text descriptions. Ideal for game dev, podcasts, and video.

{
  "text": "Thunderstorm in a dense jungle with animals reacting to the weather",
  "duration_seconds": 15
}

speech_to_text

Transcribe audio files with high accuracy, including speaker identification.

{
  "audio_file": "/path/to/meeting-recording.mp3",
  "language_code": "en",
  "diarize": true
}

convert_speech

Convert recorded speech to sound like a different voice while preserving delivery.

{
  "audio_file": "/path/to/my-voice.mp3",
  "voice_id": "your-voice-id",
  "model_id": "eleven_multilingual_sts_v2"
}
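
The JSON payloads above are tool arguments. In MCP, a client normally wraps them in a JSON-RPC 2.0 request with the method tools/call; an MCP client library or host application usually does this for you. The sketch below shows roughly what that wrapped message looks like, using only the standard library. build_tool_call is a hypothetical helper, and the tool name and arguments are copied from the convert_speech example in this document rather than verified against the server.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Arguments taken from the convert_speech example in this document.
request = build_tool_call(
    "convert_speech",
    {
        "audio_file": "/path/to/my-voice.mp3",
        "voice_id": "your-voice-id",
        "model_id": "eleven_multilingual_sts_v2",
    },
)
print(request)
```

The same helper works for any of the tools listed in the quick reference by swapping the name and arguments.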

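Installation

Add the following entry to your MCP client's configuration file. The uvx command ships with the uv Python package manager, so uv must be installed first.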

{
  "mcpServers": {
    "elevenlabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key"
      }
    }
  }
}
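
If the client configuration already lists other servers, the elevenlabs entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge follows. The function name add_elevenlabs_server and the "other-server" entry are hypothetical, and the sketch works on an in-memory dictionary, so reading and writing the actual configuration file is left to the reader.

```python
import copy
import json
from typing import Any, Dict


def add_elevenlabs_server(config: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Return a copy of config with the elevenlabs entry added or replaced."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    servers = updated.setdefault("mcpServers", {})
    servers["elevenlabs"] = {
        "command": "uvx",
        "args": ["elevenlabs-mcp"],
        "env": {"ELEVENLABS_API_KEY": api_key},
    }
    return updated


# "other-server" is a made-up existing entry used only for illustration.
existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
merged = add_elevenlabs_server(existing, "your-api-key")
print(json.dumps(merged, indent=2))
```

Working on a copy keeps the original configuration available in case the merged result needs to be discarded.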

Output Configuration

Control how generated files are returned with the ELEVENLABS_MCP_OUTPUT_MODE variable:

Mode              Behaviour
files (default)   Save audio to disk, return file paths
resources         Return audio as base64-encoded MCP resources
both              Save to disk AND return as resources

{
  "mcpServers": {
    "elevenlabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key",
        "ELEVENLABS_MCP_OUTPUT_MODE": "both",
        "ELEVENLABS_MCP_BASE_PATH": "~/Desktop"
      }
    }
  }
}
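
To make the three modes concrete, the sketch below models the behaviour described in the table: files writes to disk, resources returns base64 text, and both does each. It is an illustration of the described behaviour under those assumptions, not the server's actual implementation. deliver_audio and the result keys file_path and resource_base64 are hypothetical names.

```python
import base64
import tempfile
from pathlib import Path
from typing import Any, Dict

VALID_MODES = ("files", "resources", "both")


def deliver_audio(audio: bytes, filename: str, mode: str, base_path: Path) -> Dict[str, Any]:
    """Save and/or encode audio according to the output mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown output mode: {mode!r}")
    result: Dict[str, Any] = {}
    if mode in ("files", "both"):
        target = base_path / filename
        target.write_bytes(audio)  # written under the configured base path
        result["file_path"] = str(target)
    if mode in ("resources", "both"):
        result["resource_base64"] = base64.b64encode(audio).decode("ascii")
    return result


# Placeholder bytes stand in for real audio; a temporary directory stands in
# for the directory named by ELEVENLABS_MCP_BASE_PATH.
with tempfile.TemporaryDirectory() as tmp:
    outcome = deliver_audio(b"fake-audio-bytes", "demo.mp3", "both", Path(tmp))
    print(sorted(outcome))
```

Returning base64 text increases payload size compared with the raw bytes, which is one reason a file path can be preferable for long recordings.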

API Key Required: Get a free ElevenLabs API key from elevenlabs.io/app/settings/api-keys. The free tier includes 10,000 credits per month.

Example Use Cases

  • Voice Narration: Generate voice-over narration for videos, podcasts, and presentations
  • Accessibility: Create audio versions of written content for visually impaired users
  • Game Development: Produce character voices and environmental sound effects
  • Content Localisation: Generate speech in 29+ languages for global audiences
  • Meeting Transcription: Transcribe and identify speakers in meeting recordings
  • Character Design: Create unique voice profiles for AI agents and virtual characters

Security

  • API key authentication for all requests
  • Data residency options for enterprise (EU/US selection)
  • Free tier credits reset monthly — no unexpected billing
  • Audio output is saved locally or returned inline, according to the configured output mode
  • HTTPS encryption for all API communication
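
Because every request is authenticated with the API key, scripts that generate or launch the configuration should avoid hard-coding the key or printing it in full. A minimal sketch follows, assuming the key is supplied through the ELEVENLABS_API_KEY environment variable used in the configuration examples. load_api_key and mask_key are hypothetical helpers, and the key value in the demo is a made-up placeholder.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the key from the environment and fail early if it is missing."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# A made-up placeholder value, passed explicitly so the demo runs anywhere.
demo_env = {"ELEVENLABS_API_KEY": "sk_example_1234"}
print(mask_key(load_api_key(demo_env)))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.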
