Commit 9066e93

deploy: 38ac96f
1 parent e895f94 commit 9066e93

8 files changed

+93
-8
lines changed

.doctrees/api/index.doctree

7.33 KB
Binary file not shown.

.doctrees/environment.pickle

1.31 KB
Binary file not shown.

.doctrees/index.doctree

2.62 KB
Binary file not shown.

_modules/solana_agent/client/solana_agent.html

Lines changed: 29 additions & 3 deletions
@@ -127,6 +127,23 @@ <h1>Source code for solana_agent.client.solana_agent</h1><div class="highlight">
127127
<span class="n">capture_schema</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
128128
<span class="n">capture_name</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
129129
<span class="n">output_format</span><span class="p">:</span> <span class="n">Literal</span><span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="s2">&quot;audio&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;text&quot;</span><span class="p">,</span>
130+
<span class="c1"># Realtime (WebSocket) options — used when realtime=True</span>
131+
<span class="n">realtime</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
132+
<span class="n">vad</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">bool</span><span class="p">]</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
133+
<span class="n">rt_encode_input</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
134+
<span class="n">rt_encode_output</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
135+
<span class="n">rt_voice</span><span class="p">:</span> <span class="n">Literal</span><span class="p">[</span>
136+
<span class="s2">&quot;alloy&quot;</span><span class="p">,</span>
137+
<span class="s2">&quot;ash&quot;</span><span class="p">,</span>
138+
<span class="s2">&quot;ballad&quot;</span><span class="p">,</span>
139+
<span class="s2">&quot;cedar&quot;</span><span class="p">,</span>
140+
<span class="s2">&quot;coral&quot;</span><span class="p">,</span>
141+
<span class="s2">&quot;echo&quot;</span><span class="p">,</span>
142+
<span class="s2">&quot;marin&quot;</span><span class="p">,</span>
143+
<span class="s2">&quot;sage&quot;</span><span class="p">,</span>
144+
<span class="s2">&quot;shimmer&quot;</span><span class="p">,</span>
145+
<span class="s2">&quot;verse&quot;</span><span class="p">,</span>
146+
<span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;marin&quot;</span><span class="p">,</span>
130147
<span class="n">audio_voice</span><span class="p">:</span> <span class="n">Literal</span><span class="p">[</span>
131148
<span class="s2">&quot;alloy&quot;</span><span class="p">,</span>
132149
<span class="s2">&quot;ash&quot;</span><span class="p">,</span>
@@ -139,7 +156,6 @@ <h1>Source code for solana_agent.client.solana_agent</h1><div class="highlight">
139156
<span class="s2">&quot;sage&quot;</span><span class="p">,</span>
140157
<span class="s2">&quot;shimmer&quot;</span><span class="p">,</span>
141158
<span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;nova&quot;</span><span class="p">,</span>
142-
<span class="n">audio_instructions</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&quot;You speak in a friendly and helpful manner.&quot;</span><span class="p">,</span>
143159
<span class="n">audio_output_format</span><span class="p">:</span> <span class="n">Literal</span><span class="p">[</span>
144160
<span class="s2">&quot;mp3&quot;</span><span class="p">,</span> <span class="s2">&quot;opus&quot;</span><span class="p">,</span> <span class="s2">&quot;aac&quot;</span><span class="p">,</span> <span class="s2">&quot;flac&quot;</span><span class="p">,</span> <span class="s2">&quot;wav&quot;</span><span class="p">,</span> <span class="s2">&quot;pcm&quot;</span>
145161
<span class="p">]</span> <span class="o">=</span> <span class="s2">&quot;aac&quot;</span><span class="p">,</span>
@@ -157,8 +173,14 @@ <h1>Source code for solana_agent.client.solana_agent</h1><div class="highlight">
157173
<span class="sd"> message: Text message or audio bytes</span>
158174
<span class="sd"> prompt: Optional prompt for the agent</span>
159175
<span class="sd"> output_format: Response format (&quot;text&quot; or &quot;audio&quot;)</span>
176+
<span class="sd"> capture_schema: Optional Pydantic schema for structured output</span>
177+
<span class="sd"> capture_name: Optional name for structured output capture</span>
178+
<span class="sd"> realtime: Whether to use realtime (WebSocket) processing</span>
179+
<span class="sd"> vad: Whether to use voice activity detection (for audio input)</span>
180+
<span class="sd"> rt_encode_input: Whether to re-encode input audio for compatibility</span>
181+
<span class="sd"> rt_encode_output: Whether to re-encode output audio for compatibility</span>
182+
<span class="sd"> rt_voice: Voice to use for realtime audio output</span>
160183
<span class="sd"> audio_voice: Voice to use for audio output</span>
161-
<span class="sd"> audio_instructions: Not used in this version</span>
162184
<span class="sd"> audio_output_format: Audio output format</span>
163185
<span class="sd"> audio_input_format: Audio input format</span>
164186
<span class="sd"> router: Optional routing service for processing</span>
@@ -173,8 +195,12 @@ <h1>Source code for solana_agent.client.solana_agent</h1><div class="highlight">
173195
<span class="n">query</span><span class="o">=</span><span class="n">message</span><span class="p">,</span>
174196
<span class="n">images</span><span class="o">=</span><span class="n">images</span><span class="p">,</span>
175197
<span class="n">output_format</span><span class="o">=</span><span class="n">output_format</span><span class="p">,</span>
198+
<span class="n">realtime</span><span class="o">=</span><span class="n">realtime</span><span class="p">,</span>
199+
<span class="n">vad</span><span class="o">=</span><span class="n">vad</span><span class="p">,</span>
200+
<span class="n">rt_encode_input</span><span class="o">=</span><span class="n">rt_encode_input</span><span class="p">,</span>
201+
<span class="n">rt_encode_output</span><span class="o">=</span><span class="n">rt_encode_output</span><span class="p">,</span>
202+
<span class="n">rt_voice</span><span class="o">=</span><span class="n">rt_voice</span><span class="p">,</span>
176203
<span class="n">audio_voice</span><span class="o">=</span><span class="n">audio_voice</span><span class="p">,</span>
177-
<span class="n">audio_instructions</span><span class="o">=</span><span class="n">audio_instructions</span><span class="p">,</span>
178204
<span class="n">audio_output_format</span><span class="o">=</span><span class="n">audio_output_format</span><span class="p">,</span>
179205
<span class="n">audio_input_format</span><span class="o">=</span><span class="n">audio_input_format</span><span class="p">,</span>
180206
<span class="n">prompt</span><span class="o">=</span><span class="n">prompt</span><span class="p">,</span>
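
The hunks above add five realtime keyword arguments to `SolanaAgent.process` (`realtime`, `vad`, `rt_encode_input`, `rt_encode_output`, `rt_voice`) and forward them unchanged. As a rough sketch of how a caller might collect and sanity-check those options before the call, the block below uses a hypothetical helper, `build_realtime_kwargs`, which is not part of solana_agent. The parameter names, defaults, and voice list are copied from the diff. Setting `output_format="audio"` follows the `index.rst` example in this commit, which marks it as required. Everything else is an assumption made for illustration.

```python
from typing import Any, Dict

# Voices accepted by rt_voice, copied from the Literal[...] in the diff above.
RT_VOICES = (
    "alloy", "ash", "ballad", "cedar", "coral",
    "echo", "marin", "sage", "shimmer", "verse",
)


def build_realtime_kwargs(
    vad: bool = False,
    rt_encode_input: bool = False,
    rt_encode_output: bool = False,
    rt_voice: str = "marin",
) -> Dict[str, Any]:
    """Hypothetical helper: collect realtime options for SolanaAgent.process()."""
    # Reject voices outside the realtime list early ("nova", for example,
    # appears only in the audio_voice list, not in rt_voice).
    if rt_voice not in RT_VOICES:
        raise ValueError(f"unsupported rt_voice: {rt_voice!r}")
    return {
        "realtime": True,
        "output_format": "audio",  # marked required in the index.rst example
        "vad": vad,
        "rt_encode_input": rt_encode_input,
        "rt_encode_output": rt_encode_output,
        "rt_voice": rt_voice,
    }
```

A caller could then write `solana_agent.process(user_id, audio_bytes, **build_realtime_kwargs(vad=True))`. Whether the library itself validates `rt_voice` at runtime is not shown in this diff.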

_sources/index.rst.txt

Lines changed: 29 additions & 0 deletions
@@ -196,6 +196,35 @@ Audio/Text Streaming
196196
print(response, end="")
197197
198198
199+
Realtime Audio
200+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
201+
202+
If the input and/or output audio is encoded (compressed), for example MP4/AAC, you must have ffmpeg installed.
203+
204+
Because the router adds an extra API call, realtime supports only a single-agent setup.
205+
206+
.. code-block:: python
207+
208+
from solana_agent import SolanaAgent
209+
210+
solana_agent = SolanaAgent(config=config)
211+
212+
# Example: mobile sends MP4/AAC; server encodes output to AAC
213+
audio_content = await audio_file.read() # bytes
214+
async for audio_chunk in solana_agent.process(
215+
"user123", # required
216+
audio_content, # required
217+
realtime=True, # optional (default False)
218+
output_format="audio", # required
219+
vad=True, # enable VAD (optional)
220+
rt_encode_input=True, # accept compressed input (optional)
221+
rt_encode_output=True, # encode output for client (optional)
222+
rt_voice="marin", # the voice to use for interactions (optional)
223+
audio_input_format="mp4", # client transport (optional)
224+
audio_output_format="aac" # client transport (optional)
225+
):
226+
handle_audio(audio_chunk)
227+
199228
Image/Text Streaming
200229
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
201230
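
The Realtime Audio section added above says ffmpeg must be installed when input or output audio is encoded. A minimal sketch of a pre-flight check an application could run before calling `process` is given below. The helper names `needs_ffmpeg` and `check_ffmpeg` are hypothetical and not part of solana_agent. It is also an assumption that the `rt_encode_input` and `rt_encode_output` flags are what trigger ffmpeg use.

```python
import shutil


def needs_ffmpeg(rt_encode_input: bool, rt_encode_output: bool) -> bool:
    """Assumption: ffmpeg is only needed when either re-encode flag is set."""
    return rt_encode_input or rt_encode_output


def check_ffmpeg(rt_encode_input: bool, rt_encode_output: bool) -> None:
    """Hypothetical pre-flight check: fail early if ffmpeg is required but missing."""
    # shutil.which returns None when the executable is not found on PATH.
    if needs_ffmpeg(rt_encode_input, rt_encode_output) and shutil.which("ffmpeg") is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH but encoded realtime audio was requested"
        )
```

Running `check_ffmpeg(True, True)` once at startup surfaces a missing ffmpeg binary before the first audio request rather than in the middle of a stream.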

api/index.html

Lines changed: 8 additions & 4 deletions
@@ -169,7 +169,7 @@ <h1>API Reference<a class="headerlink" href="#api-reference" title="Link to this
169169
</dl>
170170
<dl class="py method">
171171
<dt class="sig sig-object py" id="solana_agent.client.solana_agent.SolanaAgent.process">
172-
<em class="property"><span class="k"><span class="pre">async</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">process</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">user_id</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">message</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">prompt</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">capture_schema</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">capture_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'text'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_voice</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'nova'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_instructions</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'You</span> <span class="pre">speak</span> <span class="pre">in</span> <span class="pre">a</span> <span class="pre">friendly</span> <span class="pre">and</span> <span class="pre">helpful</span> <span class="pre">manner.'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_output_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span 
class="pre">'aac'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_input_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'mp4'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">router</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">images</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_model</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/solana_agent/client/solana_agent.html#SolanaAgent.process"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#solana_agent.client.solana_agent.SolanaAgent.process" title="Link to this definition"></a></dt>
172+
<em class="property"><span class="k"><span class="pre">async</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">process</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">user_id</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">message</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">prompt</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">capture_schema</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">capture_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'text'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">realtime</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">vad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">rt_encode_input</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">rt_encode_output</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">rt_voice</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'marin'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_voice</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'nova'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_output_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'aac'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">audio_input_format</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'mp4'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">router</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">images</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_model</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/solana_agent/client/solana_agent.html#SolanaAgent.process"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#solana_agent.client.solana_agent.SolanaAgent.process" title="Link to this definition"></a></dt>
173173
<dd><p>Process a user message (text or audio) and optional images, returning the response stream.</p>
174174
<dl class="field-list simple">
175175
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
@@ -178,15 +178,19 @@ <h1>API Reference<a class="headerlink" href="#api-reference" title="Link to this
178178
<li><p><strong>message</strong> (<em>str</em><em> | </em><em>bytes</em>) – Text message or audio bytes</p></li>
179179
<li><p><strong>prompt</strong> (<em>str</em><em> | </em><em>None</em>) – Optional prompt for the agent</p></li>
180180
<li><p><strong>output_format</strong> (<em>Literal</em><em>[</em><em>'text'</em><em>, </em><em>'audio'</em><em>]</em>) – Response format (“text” or “audio”)</p></li>
181+
<li><p><strong>capture_schema</strong> (<em>Dict</em><em>[</em><em>str</em><em>, </em><em>Any</em><em>] </em><em>| </em><em>None</em>) – Optional Pydantic schema for structured output</p></li>
182+
<li><p><strong>capture_name</strong> (<em>str</em><em> | </em><em>None</em>) – Optional name for structured output capture</p></li>
183+
<li><p><strong>realtime</strong> (<em>bool</em>) – Whether to use realtime (WebSocket) processing</p></li>
184+
<li><p><strong>vad</strong> (<em>bool</em><em> | </em><em>None</em>) – Whether to use voice activity detection (for audio input)</p></li>
185+
<li><p><strong>rt_encode_input</strong> (<em>bool</em>) – Whether to re-encode input audio for compatibility</p></li>
186+
<li><p><strong>rt_encode_output</strong> (<em>bool</em>) – Whether to re-encode output audio for compatibility</p></li>
187+
<li><p><strong>rt_voice</strong> (<em>Literal</em><em>[</em><em>'alloy'</em><em>, </em><em>'ash'</em><em>, </em><em>'ballad'</em><em>, </em><em>'cedar'</em><em>, </em><em>'coral'</em><em>, </em><em>'echo'</em><em>, </em><em>'marin'</em><em>, </em><em>'sage'</em><em>, </em><em>'shimmer'</em><em>, </em><em>'verse'</em><em>]</em>) – Voice to use for realtime audio output</p></li>
181188
<li><p><strong>audio_voice</strong> (<em>Literal</em><em>[</em><em>'alloy'</em><em>, </em><em>'ash'</em><em>, </em><em>'ballad'</em><em>, </em><em>'coral'</em><em>, </em><em>'echo'</em><em>, </em><em>'fable'</em><em>, </em><em>'onyx'</em><em>, </em><em>'nova'</em><em>, </em><em>'sage'</em><em>, </em><em>'shimmer'</em><em>]</em>) – Voice to use for audio output</p></li>
182-
<li><p><strong>audio_instructions</strong> (<em>str</em>) – Not used in this version</p></li>
183189
<li><p><strong>audio_output_format</strong> (<em>Literal</em><em>[</em><em>'mp3'</em><em>, </em><em>'opus'</em><em>, </em><em>'aac'</em><em>, </em><em>'flac'</em><em>, </em><em>'wav'</em><em>, </em><em>'pcm'</em><em>]</em>) – Audio output format</p></li>
184190
<li><p><strong>audio_input_format</strong> (<em>Literal</em><em>[</em><em>'flac'</em><em>, </em><em>'mp3'</em><em>, </em><em>'mp4'</em><em>, </em><em>'mpeg'</em><em>, </em><em>'mpga'</em><em>, </em><em>'m4a'</em><em>, </em><em>'ogg'</em><em>, </em><em>'wav'</em><em>, </em><em>'webm'</em><em>]</em>) – Audio input format</p></li>
185191
<li><p><strong>router</strong> (<em>RoutingService</em><em> | </em><em>None</em>) – Optional routing service for processing</p></li>
186192
<li><p><strong>images</strong> (<em>List</em><em>[</em><em>str</em><em> | </em><em>bytes</em><em>] </em><em>| </em><em>None</em>) – Optional list of image URLs (str) or image bytes.</p></li>
187193
<li><p><strong>output_model</strong> (<em>Type</em><em>[</em><em>BaseModel</em><em>] </em><em>| </em><em>None</em>) – Optional Pydantic model for structured output</p></li>
188-
<li><p><strong>capture_schema</strong> (<em>Dict</em><em>[</em><em>str</em><em>, </em><em>Any</em><em>] </em><em>| </em><em>None</em>)</p></li>
189-
<li><p><strong>capture_name</strong> (<em>str</em><em> | </em><em>None</em>)</p></li>
190194
</ul>
191195
</dd>
192196
<dt class="field-even">Returns<span class="colon">:</span></dt>
