Text to Speech MCP Server

Text to Speech MCP servers enable AI models to convert text into natural-sounding speech, providing capabilities for real-time audio generation, voice synthesis, and multilingual support.


Overview

The RealtimeTTS MCP Server enables AI models to convert text into speech in real time. It is built on the RealtimeTTS Python library, which is designed for low-latency text-to-speech applications, and it supports several TTS engines, so the same tools can add voice output to AI agents regardless of the speech backend.

Created by: KoljaB

Key Features

Low-Latency Conversion

Near-instant text-to-speech conversion, suited to real-time interactions

High-Quality Audio

Generates clear and natural-sounding speech

Multiple TTS Engines

Supports OpenAI TTS, ElevenLabs, Azure, Coqui TTS, and more

Multilingual Support

Provides speech synthesis in multiple languages

Available Tools

Quick Reference

Tool         Purpose                   Category
synthesize   Convert text to speech    Core
stream       Stream synthesized audio  Core
set_engine   Select the TTS engine     Configuration
get_engines  List available engines    Discovery
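
All four tools are called with the same three fields (server name, tool name, arguments), so a small helper can build that object and reject tool names that are not in the table. This is only a sketch: `TOOLS` and `buildToolCall` are hypothetical names, and the required argument keys are assumed from the usage examples on this page rather than taken from a published schema.

```javascript
// Tool names from the table, with the argument keys seen in this page's examples.
// The key lists are an assumption, not a documented schema.
const TOOLS = {
  synthesize: ["text"],
  stream: ["text"],
  set_engine: ["engine"],
  get_engines: [],
};

// Hypothetical helper: builds the object that would be passed to use_mcp_tool.
function buildToolCall(toolName, args = {}) {
  if (!Object.prototype.hasOwnProperty.call(TOOLS, toolName)) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  // Report every expected key that the caller left out.
  const missing = TOOLS[toolName].filter((key) => !(key in args));
  if (missing.length > 0) {
    throw new Error(`Missing argument(s) for ${toolName}: ${missing.join(", ")}`);
  }
  return {
    server_name: "text_to_speech",
    tool_name: toolName,
    arguments: args,
  };
}
```

For example, `buildToolCall("set_engine", { engine: "elevenlabs" })` returns an object with the same fields as the calls shown under Detailed Usage.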

Detailed Usage

synthesize

Convert a string of text into speech and play it.

use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "synthesize",
  arguments: {
    text: "Hello, world! This is a test."
  }
});

stream

Stream synthesized audio in real time as it is generated.

use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "stream",
  arguments: {
    text: "This is a streaming test to demonstrate real-time audio synthesis."
  }
});

set_engine

Select the TTS engine to use for speech synthesis.

use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "set_engine",
  arguments: {
    engine: "elevenlabs"
  }
});

get_engines

Get a list of available TTS engines.

use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "get_engines",
  arguments: {}
});
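
The discovery and configuration tools can be combined so that an engine is selected only if the server reports it. The sketch below assumes two things this page does not document: that the host's call function returns a promise, and that get_engines resolves to an array of engine-name strings. `selectEngine` and the injected `callTool` parameter are hypothetical names; `callTool` stands in for use_mcp_tool.

```javascript
// Hypothetical helper. `callTool` stands in for the host's use_mcp_tool function.
// Assumptions: it returns a promise, and get_engines resolves to an array of
// engine-name strings (the real response shape is not documented on this page).
async function selectEngine(callTool, preferred) {
  const engines = await callTool({
    server_name: "text_to_speech",
    tool_name: "get_engines",
    arguments: {},
  });
  if (!Array.isArray(engines) || !engines.includes(preferred)) {
    return false; // engine not reported; leave the current engine unchanged
  }
  await callTool({
    server_name: "text_to_speech",
    tool_name: "set_engine",
    arguments: { engine: preferred },
  });
  return true;
}
```

Passing the call function in as a parameter keeps the helper independent of any particular MCP host and makes it easy to exercise with a fake.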

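Installation

The entry below, as listed, runs pip to install the realtimetts package with its optional engine dependencies (the [all] extra).
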
{
  "mcpServers": {
    "text_to_speech": {
      "command": "pip",
      "args": [
        "install",
        "realtimetts[all]"
      ]
    }
  }
}

Common Use Cases

1. Voice-Enabled AI Assistants

Provide voice output for AI assistants and chatbots.

// Let the assistant speak its response
use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "synthesize",
  arguments: {
    text: "I'm sorry, I didn't understand that. Could you please rephrase?"
  }
});

2. Accessibility

Make applications more accessible by providing audio versions of text content.

// Read the content of an article aloud
use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "synthesize",
  arguments: {
    text: articleContent
  }
});
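
A full article can be long, and it may be more practical to send it in sentence-sized pieces than as one string. The following is a minimal sketch of such a splitter; `splitIntoChunks` and its `maxLen` default are hypothetical, and this page does not state any length limit for the synthesize tool.

```javascript
// Hypothetical helper: group sentences into chunks of roughly maxLen characters.
// A single sentence longer than maxLen is kept whole rather than cut mid-sentence.
function splitIntoChunks(text, maxLen = 500) {
  // Each match is a run of text ending in ., !, or ? plus trailing whitespace,
  // or a final run with no terminator.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current && (current + sentence).length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) {
    chunks.push(current.trim());
  }
  return chunks;
}
```

Each returned chunk can then be passed as the text argument of its own synthesize call, in order.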

3. Real-Time Notifications

Create audible notifications for events in your applications.

// Announce a new message
use_mcp_tool({
  server_name: "text_to_speech",
  tool_name: "synthesize",
  arguments: {
    text: "You have a new message from Jane."
  }
});
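
When several events arrive close together, announcements may need to be spoken one after another rather than at the same time. The sketch below queues them on a promise chain. It assumes the host's call function returns a promise that settles when the synthesize call finishes, which this page does not confirm; `createAnnouncer` and the injected `callTool` parameter are hypothetical names, with `callTool` standing in for use_mcp_tool.

```javascript
// Hypothetical helper. `callTool` stands in for the host's use_mcp_tool function
// and is assumed to return a promise that settles when the call has finished.
function createAnnouncer(callTool) {
  // Tail of the queue; it never rejects, so one failure does not block later items.
  let tail = Promise.resolve();
  return function announce(text) {
    const result = tail.then(() =>
      callTool({
        server_name: "text_to_speech",
        tool_name: "synthesize",
        arguments: { text },
      })
    );
    // Swallow errors on the queue only; the caller still sees them via `result`.
    tail = result.catch(() => {});
    return result;
  };
}
```

Calling `announce("You have a new message from Jane.")` returns the promise for that one announcement, so the caller can still handle its errors individually.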