Mag1cFall
diff --git a/README.md b/README.md
Lines changed: 56 additions & 1 deletion
diff --git a/README_en.md b/README_en.md
Lines changed: 56 additions & 1 deletion
diff --git a/data/excluded_models.txt b/data/excluded_models.txt
Lines changed: 0 additions & 2 deletions
diff --git a/docs/api-usage.md b/docs/api-usage.md
Lines changed: 35 additions & 0 deletions
diff --git a/docs/development-guide.md b/docs/development-guide.md
Lines changed: 5 additions & 0 deletions
@@ -23,6 +23,7 @@
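 ## 🚀 Features
 
 - **OpenAI-compatible API**: Fully compatible with the OpenAI-format `/v1/chat/completions` endpoint
+- **TTS speech generation**: Supports single- and multi-speaker audio generation with Gemini 2.5 TTS models
 - **Smart model switching**: Dynamically switch models in AI Studio via the `model` field
 - **Anti-fingerprint detection**: Uses the Camoufox browser to reduce the risk of detection
 - **GUI launcher**: A feature-rich **web** launcher that simplifies configuration and management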
@@ -191,6 +192,59 @@ curl -X POST http://localhost:2048/v1/chat/completions \
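    - **Model name**: `gemini-2.5-pro` (or another model supported by AI Studio)
    - **API key**: Leave empty or enter any string, such as `123`
 
+### TTS Speech Generation
+
+Supports single-speaker or multi-speaker audio generation with the Gemini 2.5 Flash/Pro TTS models:
+
+#### Single-Speaker Example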
+
+```bash
+curl -X POST http://localhost:2048/generate-speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-tts",
+    "contents": "Hello, this is a test.",
+    "generationConfig": {
+      "responseModalities": ["AUDIO"],
+      "speechConfig": {
+        "voiceConfig": {
+          "prebuiltVoiceConfig": {"voiceName": "Kore"}
+        }
+      }
+    }
+  }'
+```
+
+#### Multi-Speaker Example
+
+```bash
+curl -X POST http://localhost:2048/generate-speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-tts",
+    "contents": "Joe: How are you?\nJane: I am fine, thanks!",
+    "generationConfig": {
+      "responseModalities": ["AUDIO"],
+      "speechConfig": {
+        "multiSpeakerVoiceConfig": {
+          "speakerVoiceConfigs": [
+            {"speaker": "Joe", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}},
+            {"speaker": "Jane", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Puck"}}}
+          ]
+        }
+      }
+    }
+  }'
+```
+
+**Available voices**: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, and others (30 in total).
+
+**Endpoints**:
+- `POST /generate-speech`
+- `POST /v1beta/models/{model}:generateContent` (compatible with the official API)
+
+**Response format**: Audio data is returned as Base64-encoded WAV in `candidates[0].content.parts[0].inlineData.data`.
+
 ### Ollama Compatibility Layer
 
 The project also provides an Ollama-format compatible API:
@@ -219,6 +273,7 @@ AIStudio2API/
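 │   ├── browser/                 # Browser automation module
 │   ├── config/                  # Configuration management
 │   ├── models/                  # Data models
+│   ├── tts/                     # TTS speech generation module
 │   ├── proxy/                   # Streaming proxy
 │   └── static/                  # Static resources
 ├── data/                        # Runtime data directory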
@@ -296,7 +351,7 @@ cp .env.example .env
 
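 ## 📅 Roadmap
 
-- **TTS support**: Adapt the `gemini-2.5-flash/pro-preview-tts` speech generation models
+- ✅ **TTS support**: The `gemini-2.5-flash/pro-preview-tts` speech generation models are now supported
 - **Documentation**: Update and improve the detailed usage docs and API specification under `docs/`
 - **One-click deployment**: Provide fully automated install and launch scripts for Windows/Linux/macOS
 - **Docker support**: Provide a standard Dockerfile and Docker Compose file to simplify deployment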
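
Reviewer sketch (not part of the PR): the single- and multi-speaker request bodies in the curl examples of this README diff share one shape and differ only in `speechConfig`. The Python below is a minimal illustration of assembling either body. The helper names `voice_config` and `build_tts_request` are hypothetical and do not come from the repository; only the JSON field names are taken from the examples above.

```python
import json


def voice_config(voice_name: str) -> dict:
    """Wrap a prebuilt voice name in the nested structure used by the examples."""
    return {"prebuiltVoiceConfig": {"voiceName": voice_name}}


def build_tts_request(model: str, text: str, voices) -> dict:
    """Build a TTS request body (hypothetical helper, for illustration only).

    `voices` is either a single voice name (str) for single-speaker output,
    or a dict mapping speaker name -> voice name for multi-speaker output.
    """
    if isinstance(voices, str):
        speech_config = {"voiceConfig": voice_config(voices)}
    else:
        speech_config = {
            "multiSpeakerVoiceConfig": {
                "speakerVoiceConfigs": [
                    {"speaker": speaker, "voiceConfig": voice_config(name)}
                    for speaker, name in voices.items()
                ]
            }
        }
    return {
        "model": model,
        "contents": text,
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": speech_config,
        },
    }


if __name__ == "__main__":
    body = build_tts_request(
        "gemini-2.5-flash-preview-tts",
        "Joe: How are you?\nJane: I am fine, thanks!",
        {"Joe": "Kore", "Jane": "Puck"},
    )
    # The printed JSON can be sent as the POST body to /generate-speech.
    print(json.dumps(body, indent=2))
```

The speaker names passed in `voices` should match the speaker labels used in the text, as they do in the multi-speaker curl example.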
 
@@ -23,6 +23,7 @@
 ## 🚀 Features
 
 - **OpenAI Compatible API**: Fully compatible with OpenAI format `/v1/chat/completions` endpoint
+- **TTS Speech Generation**: Supports Gemini 2.5 TTS models for single/multi-speaker audio generation
 - **Smart Model Switching**: Dynamically switch models in AI Studio via the `model` field
 - **Anti-Fingerprint Detection**: Uses Camoufox browser to reduce detection risk
 - **GUI Launcher**: Feature-rich **web** launcher for simplified configuration and management
@@ -185,6 +186,59 @@ Using Cherry Studio as an example:
    - **Model Name**: `gemini-2.5-pro` (or other AI Studio supported models)
    - **API Key**: Leave empty or enter any character like `123`
 
+### TTS Speech Generation
+
+Supports Gemini 2.5 Flash/Pro TTS models for single-speaker or multi-speaker audio generation:
+
+#### Single-Speaker Example
+
+```bash
+curl -X POST http://localhost:2048/generate-speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-tts",
+    "contents": "Hello, this is a test.",
+    "generationConfig": {
+      "responseModalities": ["AUDIO"],
+      "speechConfig": {
+        "voiceConfig": {
+          "prebuiltVoiceConfig": {"voiceName": "Kore"}
+        }
+      }
+    }
+  }'
+```
+
+#### Multi-Speaker Example
+
+```bash
+curl -X POST http://localhost:2048/generate-speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-tts",
+    "contents": "Joe: How are you?\nJane: I am fine, thanks!",
+    "generationConfig": {
+      "responseModalities": ["AUDIO"],
+      "speechConfig": {
+        "multiSpeakerVoiceConfig": {
+          "speakerVoiceConfigs": [
+            {"speaker": "Joe", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}},
+            {"speaker": "Jane", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Puck"}}}
+          ]
+        }
+      }
+    }
+  }'
+```
+
+**Available Voices**: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, and 18 others (30 in total).
+
+**Endpoints**:
+- `POST /generate-speech`
+- `POST /v1beta/models/{model}:generateContent` (compatible with official API)
+
+**Response Format**: Audio data is returned as Base64-encoded WAV in `candidates[0].content.parts[0].inlineData.data`.
+
 ### Ollama Compatibility Layer
 
 The project also provides Ollama format API compatibility:
@@ -213,6 +267,7 @@ AIStudio2API/
 │   ├── browser/                 # Browser automation modules
 │   ├── config/                  # Configuration management
 │   ├── models/                  # Data models
+│   ├── tts/                     # TTS Speech Generation modules
 │   ├── proxy/                   # Streaming proxy
 │   └── static/                  # Static resources
 ├── data/                        # Runtime data directory
@@ -290,7 +345,7 @@ Issues and Pull Requests are welcome!
 
 ## 📅 Development Roadmap
 
-- **TTS Support**: Adapt `gemini-2.5-flash/pro-preview-tts` speech generation models
+- ✅ **TTS Support**: Adapted `gemini-2.5-flash/pro-preview-tts` speech generation models
 - **Documentation**: Update and optimize documentation in `docs/` directory
 - **One-Click Deployment**: Provide fully automated install and launch scripts for Windows/Linux/macOS
 - **Docker Support**: Provide standard Dockerfile and Docker Compose orchestration files
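
Reviewer sketch (not part of the PR): the README states that audio comes back as Base64-encoded WAV at `candidates[0].content.parts[0].inlineData.data`. Assuming the response has been parsed from JSON into a Python dict with that shape, extracting and saving the audio might look like the following. The function names `extract_wav_bytes` and `save_wav` are hypothetical.

```python
import base64
from pathlib import Path


def extract_wav_bytes(response: dict) -> bytes:
    """Decode the Base64 audio payload at the response path given in the README."""
    encoded = response["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]
    return base64.b64decode(encoded)


def save_wav(response: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the number of bytes written."""
    audio = extract_wav_bytes(response)
    Path(path).write_bytes(audio)
    return len(audio)
```

A caller would pass the parsed JSON of a `/generate-speech` response to `save_wav(response, "output.wav")`. The sketch does no error handling, so a response without audio would raise `KeyError` or `IndexError`.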
 
@@ -2,8 +2,6 @@ veo-2.0-generate-001
 imagen-4.0-fast-generate-001
 imagen-4.0-ultra-generate-001
 imagen-4.0-generate-001
-gemini-2.5-flash-preview-tts
-gemini-2.5-pro-preview-tts
 gemini-2.5-flash-native-audio-preview-09-2025
 gemini-2.5-flash-image
 gemini-3-pro-image-preview
@@ -225,6 +225,40 @@ else:
         print(f"Error: {response.status_code}\n{response.text}")
 ```
 
+### TTS Speech Generation
+
+**Endpoints**:
+- `POST /generate-speech`
+- `POST /v1beta/models/{model}:generateContent`
+
+Supports single-speaker or multi-speaker audio generation with Gemini 2.5 TTS models.
+
+**Supported models**:
+- `gemini-2.5-flash-preview-tts`
+- `gemini-2.5-pro-preview-tts`
+
+**Request example**:
+```bash
+curl -X POST http://localhost:2048/generate-speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-2.5-flash-preview-tts",
+    "contents": "Hello, this is a test.",
+    "generationConfig": {
+      "responseModalities": ["AUDIO"],
+      "speechConfig": {
+        "voiceConfig": {
+          "prebuiltVoiceConfig": {"voiceName": "Kore"}
+        }
+      }
+    }
+  }'
+```
+
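+**Response format**: Audio data is returned as Base64-encoded WAV in `candidates[0].content.parts[0].inlineData.data`.
+
+**Full documentation**: See the [TTS usage guide](tts-guide.md)
+
 ### Ollama Compatibility Layer
 
 The project also provides an Ollama-format compatible API: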
@@ -393,5 +427,6 @@ curl -X POST http://localhost:11434/api/chat \
 ## Next Steps
 
 Once API usage is configured, see:
+- [TTS speech generation guide](tts-guide.md)
 - [Troubleshooting guide](troubleshooting.md)
 - [Logging control guide](logging-control.md)
@@ -56,6 +56,11 @@ AIStudio2API/
 │   ├── models/                 # Data models
 │   │   ├── types.py            # Chat/exception models
 │   │   └── websocket.py        # WebSocket log models
+│   ├── tts/                    # TTS speech generation module
+│   │   ├── __init__.py         # Module initialization
+│   │   ├── models.py           # TTS data models
+│   │   ├── tts_controller.py   # TTS page controller
+│   │   └── tts_processor.py    # TTS request handler
 │   ├── proxy/                  # Streaming proxy service
 │   │   ├── runner.py           # Proxy service entry point
 │   │   ├── server.py           # Proxy server