## 🚀 Features

- **OpenAI Compatible API**: Fully compatible with the OpenAI-format `/v1/chat/completions` endpoint
- **TTS Speech Generation**: Supports the Gemini 2.5 TTS models for single- and multi-speaker audio generation
- **Smart Model Switching**: Dynamically switch models in AI Studio via the `model` field
- **Anti-Fingerprint Detection**: Uses the Camoufox browser to reduce the risk of detection
- **GUI Launcher**: Feature-rich **web** launcher for simplified configuration and management

Using Cherry Studio as an example:
- **Model Name**: `gemini-2.5-pro` (or another model supported by AI Studio)
- **API Key**: Leave empty, or enter any placeholder value such as `123`
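
You can also exercise the OpenAI-compatible endpoint directly. A minimal `curl` sketch, assuming the API is served on the same port (`2048`) used in the TTS examples below; the `model` field switches the model in AI Studio, and the API key can be any placeholder:

```bash
# Any API key value works here; the key is not validated.
curl -X POST http://localhost:2048/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Hello, who are you?"}
    ],
    "stream": false
  }'
```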
### TTS Speech Generation

Supports Gemini 2.5 Flash/Pro TTS models for single-speaker or multi-speaker audio generation:

#### Single-Speaker Example

```bash
curl -X POST http://localhost:2048/generate-speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "contents": "Hello, this is a test.",
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "voiceConfig": {
          "prebuiltVoiceConfig": {"voiceName": "Kore"}
        }
      }
    }
  }'
```

#### Multi-Speaker Example

```bash
curl -X POST http://localhost:2048/generate-speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "contents": "Joe: How are you?\nJane: I am fine, thanks!",
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "multiSpeakerVoiceConfig": {
          "speakerVoiceConfigs": [
            {"speaker": "Joe", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}},
            {"speaker": "Jane", "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Puck"}}}
          ]
        }
      }
    }
  }'
```
**Available Voices**: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, Callirrhoe, Autonoe, Enceladus, Iapetus, and 18 more voices.

**Endpoints**:
- `POST /generate-speech`
- `POST /v1beta/models/{model}:generateContent` (compatible with the official API)

**Response Format**: Audio data is returned as a Base64-encoded WAV file in `candidates[0].content.parts[0].inlineData.data`.
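
To turn a response into a playable file, extract the Base64 payload from that path and decode it. A minimal sketch reusing the single-speaker request above, assuming `jq` and a `base64` command that supports `--decode` are available:

```bash
# Request speech, pull the Base64 audio out of the response, and write it to a WAV file.
curl -s -X POST http://localhost:2048/generate-speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "contents": "Hello, this is a test.",
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}
      }
    }
  }' \
  | jq -r '.candidates[0].content.parts[0].inlineData.data' \
  | base64 --decode > output.wav
```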
### Ollama Compatibility Layer

The project also provides an Ollama-compatible API layer:
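
As a rough illustration, a chat request in the standard Ollama format might look like the sketch below; the `/api/chat` path and the port are assumptions here rather than confirmed details of this project's compatibility layer:

```bash
# Hypothetical example: assumes the Ollama-compatible layer exposes the standard
# /api/chat route on the same port as the other endpoints.
curl -X POST http://localhost:2048/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'
```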
AIStudio2API/
│   ├── browser/          # Browser automation modules
│   ├── config/           # Configuration management
│   ├── models/           # Data models
│   ├── tts/              # TTS speech generation modules
│   ├── proxy/            # Streaming proxy
│   └── static/           # Static resources
├── data/                 # Runtime data directory

Issues and Pull Requests are welcome!
## 📅 Development Roadmap

- ✅ **TTS Support**: Adapted the `gemini-2.5-flash/pro-preview-tts` speech generation models
- **Documentation**: Update and optimize the documentation in the `docs/` directory
- **One-Click Deployment**: Provide fully automated install and launch scripts for Windows/Linux/macOS
- **Docker Support**: Provide a standard Dockerfile and Docker Compose orchestration files