Commit cbf04c4

voice ai agent ver1
1 parent b78e768 commit cbf04c4

5 files changed: +479 -0 lines changed

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Luigi Saetta

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Voice AI Agent (OCI Realtime Speech + Generative AI Agent)

**Author:** msliwins
**Last review date:** 2025-12-05

A small voice assistant that:

1. Listens to your microphone with VAD (voice activity detection),
2. Streams audio to **OCI Realtime Speech** for speech-to-text (STT),
3. Sends the recognized text to an **OCI Generative AI Agent Endpoint**,
4. Uses **OCI Text-to-Speech** (TTS) to speak the answer back.

Everything runs in a loop until you stop it with `Ctrl+C`.
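
The four steps above can be sketched as a plain control loop. This is only an illustration, not the code in `main.py`: `run_voice_loop` and its `transcribe`, `ask_agent` and `speak` parameters are hypothetical stand-ins for the OCI calls, passed in as ordinary callables so the flow can be tried without any cloud access.

```python
from typing import Callable, Iterable


def run_voice_loop(
    utterances: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    ask_agent: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> STT -> agent -> TTS until input ends or Ctrl+C.

    All four arguments are hypothetical placeholders for the real services.
    Returns the number of completed turns.
    """
    turns = 0
    try:
        for audio in utterances:       # step 1: one VAD-delimited utterance
            text = transcribe(audio)   # step 2: speech-to-text
            if not text.strip():
                continue               # nothing recognized, keep listening
            answer = ask_agent(text)   # step 3: agent reply
            speak(answer)              # step 4: speak the answer back
            turns += 1
    except KeyboardInterrupt:
        pass                           # Ctrl+C ends the loop cleanly
    return turns
```

For a dry run, something like `run_voice_loop([b"hello"], lambda a: a.decode(), lambda t: "You said: " + t, print)` exercises the loop with stub functions in place of the real services.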

---

## Features

- 🎙️ Voice Activity Detection (VAD)
  Automatically starts recording when you speak and stops after a short silence.

- 🧠 Generative AI Agent integration
  Uses an OCI Generative AI Agent Endpoint to handle conversation and tools.

- 🗣️ Text-to-Speech
  Uses OCI AI Speech to synthesize responses and plays them locally.

- 🔁 Persistent agent session
  Single agent session reused across turns for conversational context.

- 🧪 Debug traces
  Optionally saves agent traces to `traces.json` for debugging.
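
The VAD behaviour in the first feature (start recording on speech, stop after a short silence) could be approximated with a simple energy threshold. The sketch below is an assumption about one possible approach, not a description of what `main.py` does: the names `rms` and `capture_utterance`, the 16-bit little-endian PCM frame format, and the default `threshold` and `max_silent_frames` values are all illustrative.

```python
import math
import struct
from typing import Iterable, List, Optional


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def capture_utterance(
    frames: Iterable[bytes],
    threshold: float = 500.0,     # assumed loudness cut-off, tune per microphone
    max_silent_frames: int = 10,  # assumed length of the "short silence", in frames
) -> Optional[bytes]:
    """Return the first utterance found in `frames`, or None if nobody spoke."""
    recorded: List[bytes] = []
    silent_run = 0
    for frame in frames:
        loud = rms(frame) >= threshold
        if not recorded:
            if loud:
                recorded.append(frame)  # speech onset: start recording
            continue
        recorded.append(frame)          # trailing silence is kept in the result
        silent_run = 0 if loud else silent_run + 1
        if silent_run >= max_silent_frames:
            break                       # enough silence in a row: stop recording
    return b"".join(recorded) if recorded else None
```

An energy threshold is easy to reason about but sensitive to background noise, so a real deployment may prefer a dedicated VAD library.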

---

## Project Structure (key files)

- `main.py` – the main script; runs the whole loop.
- `requirements.txt` – Python dependencies.
- `.env` – **local** file with your real values; not committed.
- `example.env` – safe template with placeholder values for others.

---

## Requirements

- Python 3.11+ (recommended)
- Valid OCI tenancy and user with:
  - Permission to use **AI Speech** (STT + TTS),
  - Permission to use **Generative AI Agent Runtime**.
- Configured `~/.oci/config` with a profile whose name matches `OCI_PROFILE` in your `.env`.
- A working microphone on a Windows machine (audio playback uses `winsound`).
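
Some of these requirements can be checked before starting the agent. The following is a minimal sketch using only the standard library; `profile_names` and `preflight` are hypothetical helper names, and the sketch assumes the OCI config file is INI-style text with one `[PROFILE]` section per profile. It does not verify IAM permissions or microphone access.

```python
import configparser
import sys
from pathlib import Path
from typing import List


def profile_names(config_text: str) -> List[str]:
    """List profile names found in OCI config text (INI-style sections)."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(config_text)
    names = parser.sections()
    # configparser treats [DEFAULT] specially and omits it from sections(),
    # so it is only detected here when it contains at least one key.
    if parser.defaults():
        names = ["DEFAULT"] + names
    return names


def preflight(profile: str, config_path: Path = Path.home() / ".oci" / "config") -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is recommended")
    if sys.platform != "win32":
        problems.append("winsound playback requires Windows")
    if not config_path.is_file():
        problems.append(f"OCI config file not found: {config_path}")
    elif profile not in profile_names(config_path.read_text()):
        problems.append(f"profile {profile!r} is not defined in {config_path}")
    return problems
```

Calling `preflight("PHOENIX")` returns human-readable messages that a launcher script could print before exiting.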
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
OCI_PROFILE=PHOENIX
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..
OCI_AGENT_ENDPOINT_ID=ocid1.genaiagentendpoint.oc1.phx.
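
The OCID values in this template are intentionally incomplete, so a script reading it should notice when they have not been filled in. Below is a hedged sketch of such a check; `parse_env` and `check_settings` are hypothetical helpers, and the "ends with a dot" test is merely a heuristic based on how the placeholders in this template look. How `main.py` actually loads these values is not shown in this diff.

```python
from typing import Dict, List

# The three keys present in the template above.
REQUIRED_KEYS = ("OCI_PROFILE", "OCI_COMPARTMENT_OCID", "OCI_AGENT_ENDPOINT_ID")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_settings(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values look filled in."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif value.startswith("ocid1.") and value.endswith("."):
            # Assumption: an OCID that stops at a dot was never completed.
            problems.append(f"{key} still looks like a placeholder")
    return problems
```

Running `check_settings(parse_env(open(".env").read()))` at startup would flag a `.env` that was copied from the template without edits.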
