@@ -245,6 +246,7 @@ async for response in solana_agent.process("user123", "What is the latest news o
 ### Audio/Text Streaming
 
 ```python
+## Realtime Usage
 from solana_agent import SolanaAgent
 
 config = {
@@ -275,28 +277,32 @@ async for response in solana_agent.process("user123", audio_content, audio_input
 
 ### Realtime Audio Streaming
 
-If input and/or output is encoded (compressed) like mp4/aac then you must have `ffmpeg` installed.
+If input and/or output is encoded (compressed) like mp4/mp3 then you must have `ffmpeg` installed.
 
 Due to the overhead of the router (API call) - realtime only supports a single agent setup.
 
 Realtime uses MongoDB for memory so Zep is not needed.
 
+By default, when `realtime=True` and you supply raw/encoded audio bytes as input, the system **always skips the HTTP transcription (STT) path** and relies solely on the realtime websocket session for input transcription. If you don't specify `rt_transcription_model`, a sensible default (`gpt-4o-mini-transcribe`) is auto-selected so you still receive input transcript events with minimal latency.
+
+Implications:
+- `llm_provider.transcribe_audio` is never invoked for realtime turns.
+- Lower end-to-end latency (no duplicate network round trip for STT).
+- Unified transcript sourcing from realtime events.
+- If you explicitly want to disable transcription altogether, send text (not audio bytes) or ignore transcript events client-side.
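
The routing rule described in this added paragraph can be pictured with a small, self-contained sketch. It is not Solana Agent's internal code: `plan_input_transcription` and the returned labels are hypothetical names used only for illustration, and the non-realtime branch (audio going through HTTP STT) is an assumption inferred from the text. Only the `realtime` flag, `rt_transcription_model`, and the `gpt-4o-mini-transcribe` default come from the documentation.

```python
from typing import Optional, Union

# Default model named in the documentation for realtime input transcription.
DEFAULT_RT_TRANSCRIPTION_MODEL = "gpt-4o-mini-transcribe"


def plan_input_transcription(
    message: Union[str, bytes],
    realtime: bool,
    rt_transcription_model: Optional[str] = None,
) -> dict:
    """Hypothetical helper: describe how one turn's input would be transcribed."""
    is_audio = isinstance(message, (bytes, bytearray))
    if not is_audio:
        # Text input needs no transcription in either mode.
        return {"path": "none", "model": None}
    if realtime:
        # Realtime audio: HTTP STT is skipped and the websocket session transcribes.
        model = rt_transcription_model or DEFAULT_RT_TRANSCRIPTION_MODEL
        return {"path": "realtime_session", "model": model}
    # Assumption: outside realtime mode, audio goes through the HTTP STT path.
    return {"path": "http_stt", "model": None}
```

Calling `plan_input_transcription(audio_bytes, realtime=True)` without a model falls back to the documented default, which mirrors the behavior the paragraph describes.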
+
 This example will work using expo-audio on Android and iOS.
 
 ```python
 from solana_agent import SolanaAgent
 
 solana_agent = SolanaAgent(config=config)
-
-audio_content = await audio_file.read()
-
-async def generate():
-    async for chunk in solana_agent.process(
-        user_id=user_id,
+        user_id="user123",
         message=audio_content,
         realtime=True,
         rt_encode_input=True,
         rt_encode_output=True,
+        rt_output_modalities=["audio"],
         rt_voice="marin",
         output_format="audio",
         audio_output_format="mp3",
@@ -314,6 +320,106 @@ return StreamingResponse(
         "X-Accel-Buffering": "no",
     },
 )
+```
+
+### Realtime Text Streaming
+
+Due to the overhead of the router (API call) - realtime only supports a single agent setup.
+
+Realtime uses MongoDB for memory so Zep is not needed.
+
+When using realtime with text input, no audio transcription is needed. The same bypass rules apply: HTTP STT is never called in realtime mode.
+
+```python
+from solana_agent import SolanaAgent
+
+solana_agent = SolanaAgent(config=config)
+
+async def generate():
+    async for chunk in solana_agent.process(
+        user_id="user123",
+        message="What is the latest news on Solana?",
+        realtime=True,
+        rt_output_modalities=["text"],
+    ):
+        yield chunk
+```
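
A caller usually needs the streamed pieces assembled into one reply. The sketch below shows one way to do that with a generic async iterator. `fake_text_stream` is a hypothetical stand-in for `solana_agent.process(..., realtime=True, rt_output_modalities=["text"])`, and it assumes text mode yields plain `str` chunks, which the documentation in this diff does not state explicitly.

```python
import asyncio
from typing import AsyncIterator


async def fake_text_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the realtime text stream."""
    for piece in ["Solana ", "news ", "summary."]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield piece


async def collect_text(stream: AsyncIterator[str]) -> str:
    """Join streamed text chunks into a single reply string."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


reply = asyncio.run(collect_text(fake_text_stream()))
```

In a web handler the chunks would more likely be forwarded as they arrive; collecting them is mainly useful for logging or tests.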
+
+### Dual Modality Realtime Streaming
+
+Solana Agent supports **dual modality realtime streaming**, allowing you to stream both audio and text simultaneously from a single realtime session. This enables rich conversational experiences where users can receive both voice responses and text transcripts in real-time.
+
+#### Features
+- **Simultaneous Audio & Text**: Stream both modalities from the same conversation
+- **Flexible Output**: Choose audio-only, text-only, or both modalities
+- **Real-time Demuxing**: Automatically separate audio and text streams
+- **Mobile Optimized**: Works seamlessly with compressed audio formats (MP4/AAC)
+- **Memory Efficient**: Smart buffering and streaming for optimal performance
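
To make the "Real-time Demuxing" item concrete, here is a minimal sketch of separating a mixed stream into audio bytes and text. `DemoChunk` is a hypothetical stand-in, not the library's `RealtimeChunk` (whose fields are not shown in this diff), and the `modality` and `data` attribute names are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Tuple, Union


@dataclass
class DemoChunk:
    """Hypothetical chunk type; the real RealtimeChunk may look different."""
    modality: str  # "audio" or "text" (assumed labels)
    data: Union[bytes, str]


async def fake_dual_stream() -> AsyncIterator[DemoChunk]:
    """Hypothetical stand-in for a dual-modality realtime stream."""
    chunks = [
        DemoChunk("text", "Hel"),
        DemoChunk("audio", b"\x01\x02"),
        DemoChunk("text", "lo"),
        DemoChunk("audio", b"\x03"),
    ]
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield chunk


async def demux(stream: AsyncIterator[DemoChunk]) -> Tuple[bytes, str]:
    """Split interleaved chunks into concatenated audio bytes and a transcript."""
    audio_parts: List[bytes] = []
    text_parts: List[str] = []
    async for chunk in stream:
        if chunk.modality == "audio":
            audio_parts.append(chunk.data)
        elif chunk.modality == "text":
            text_parts.append(chunk.data)
    return b"".join(audio_parts), "".join(text_parts)


audio, transcript = asyncio.run(demux(fake_dual_stream()))
```

Buffering both outputs keeps the example short; a server would typically forward audio as it arrives and only accumulate the transcript.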
+
+#### Mobile App Integration Example
+
+```python
+from fastapi import UploadFile
+from fastapi.responses import StreamingResponse
+from solana_agent import SolanaAgent
+from solana_agent.interfaces.providers.realtime import RealtimeChunk
docs/index.rst
104 additions & 1 deletion
@@ -223,9 +223,10 @@ This example will work using expo-audio on Android and iOS.
             rt_encode_input=True,
             rt_encode_output=True,
             rt_voice="marin",
+            rt_output_modalities=["audio"],
             output_format="audio",
-            audio_output_format="mp3",
             audio_input_format="m4a",
+            audio_output_format="mp3",
         ):
             yield chunk
 
@@ -240,6 +241,108 @@ This example will work using expo-audio on Android and iOS.
         },
     )
 
+Realtime Text Streaming
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to the overhead of the router (API call) - realtime only supports a single agent setup.
+
+Realtime uses MongoDB for memory so Zep is not needed.
+
+.. code-block:: python
+
+    from solana_agent import SolanaAgent
+
+    solana_agent = SolanaAgent(config=config)
+
+    async def generate():
+        async for chunk in solana_agent.process(
+            user_id="user123",
+            message="What is the latest news on Solana?",
+            realtime=True,
+            rt_output_modalities=["text"],
+        ):
+            yield chunk
+
+Dual Modality Realtime Streaming
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Solana Agent now supports **dual modality realtime streaming**, allowing you to stream both audio and text simultaneously from a single realtime session. This enables rich conversational experiences where users can receive both voice responses and text transcripts in real-time.
+
+Features
+^^^^^^^^
+
+- **Simultaneous Audio & Text**: Stream both modalities from the same conversation
+- **Flexible Output**: Choose audio-only, text-only, or both modalities
+- **Real-time Demuxing**: Automatically separate audio and text streams
+- **Mobile Optimized**: Works seamlessly with compressed audio formats (MP4/MP3)
+- **Memory Efficient**: Smart buffering and streaming for optimal performance
+
+Mobile App Integration Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+    from fastapi import UploadFile
+    from fastapi.responses import StreamingResponse
+    from solana_agent import SolanaAgent
+    from solana_agent.interfaces.providers.realtime import RealtimeChunk