| 1 | +# Voice AI Agent (OCI Realtime Speech + Generative AI Agent) |
| 2 | + |
| 3 | +**Author:** msliwins |
| 4 | +**Last review date:** 2025-12-05 |
| 5 | + |
| 6 | +A small voice assistant that: |
| 7 | + |
| 8 | +1. Listens to your microphone with VAD (voice activity detection), |
| 9 | +2. Streams audio to **OCI Realtime Speech** for STT, |
| 10 | +3. Sends the recognized text to an **OCI Generative AI Agent Endpoint**, |
| 11 | +4. Uses **OCI Text-to-Speech** to speak the answer back. |
| 12 | + |
| 13 | +Everything runs in a loop until you stop it with `Ctrl+C`. |
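The four steps above can be sketched as a single loop. This is a minimal illustration, not the contents of `main.py`: `run_voice_loop` and its `transcribe`, `ask_agent`, and `speak` parameters are hypothetical names, and the OCI calls are replaced by plain callables so the control flow can be read (and run) without any cloud credentials.

```python
from typing import Callable, Iterable, List


def run_voice_loop(
    utterances: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    ask_agent: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> STT -> agent -> TTS until Ctrl+C or the input ends.

    All four arguments are hypothetical stand-ins for the real microphone,
    OCI Realtime Speech, agent endpoint, and text-to-speech integrations.
    Returns the number of completed turns.
    """
    turns = 0
    try:
        for audio in utterances:        # step 1: one VAD-delimited utterance
            text = transcribe(audio)    # step 2: speech-to-text
            if not text.strip():
                continue                # nothing recognized; keep listening
            answer = ask_agent(text)    # step 3: send text to the agent
            speak(answer)               # step 4: speak the answer back
            turns += 1
    except KeyboardInterrupt:
        pass                            # Ctrl+C ends the loop cleanly
    return turns


if __name__ == "__main__":
    spoken: List[str] = []
    count = run_voice_loop(
        [b"hello", b"", b"weather"],
        transcribe=lambda audio: audio.decode(),
        ask_agent=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    print(count, spoken)
```

Passing the integrations in as callables keeps the loop testable offline; in a real run each callable would wrap the corresponding OCI client.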
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Features |
| 18 | + |
| 19 | +- 🎙️ Voice Activity Detection (VAD) |
| 20 | + Automatically starts recording when you speak and stops after a short silence. |
| 21 | + |
| 22 | +- 🧠 Generative AI Agent integration |
| 23 | + Uses an OCI Generative AI Agent Endpoint to handle conversation and tools. |
| 24 | + |
| 25 | +- 🗣️ Text-to-Speech |
| 26 | + Uses OCI AI Speech to synthesize responses and plays them locally. |
| 27 | + |
| 28 | +- 🔁 Persistent agent session |
| 29 | + Reuses a single agent session across turns to preserve conversational context. |
| 30 | + |
| 31 | +- 🧪 Debug traces |
| 32 | + Optionally saves agent traces to `traces.json` for debugging. |
| 33 | + |
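The VAD behaviour listed under Features (start recording on speech, stop after a short silence) can be illustrated with a simple energy threshold. This README does not say which VAD technique the script uses, so the sketch below is an assumed stand-in: `frame_rms`, `capture_utterance`, and the `threshold` and `max_silent_frames` values are all hypothetical, and the input is assumed to be 16-bit little-endian mono PCM frames.

```python
import math
import struct
from typing import Iterable, List, Optional


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def capture_utterance(
    frames: Iterable[bytes],
    threshold: float = 500.0,     # hypothetical loudness cutoff
    max_silent_frames: int = 10,  # hypothetical length of the "short silence"
) -> Optional[bytes]:
    """Return one utterance, or None if no frame ever crossed the threshold."""
    recorded: List[bytes] = []
    silent_run = 0
    for frame in frames:
        loud = frame_rms(frame) >= threshold
        if not recorded:
            if loud:
                recorded.append(frame)  # speech started: begin recording
            continue                    # leading silence is discarded
        recorded.append(frame)
        silent_run = 0 if loud else silent_run + 1
        if silent_run >= max_silent_frames:
            break                       # enough trailing silence: stop
    return b"".join(recorded) if recorded else None
```

Note that the trailing silent frames are kept in the returned audio; trimming them is a design choice left out here for brevity.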
| 34 | +--- |
| 35 | + |
| 36 | +## Project Structure (key files) |
| 37 | + |
| 38 | +- `main.py` – entry-point script; runs the whole loop. |
| 39 | +- `requirements.txt` – Python dependencies. |
| 40 | +- `.env` – **local** file holding your real values; do not commit it. |
| 41 | +- `example.env` – safe-to-commit template with placeholder values. |
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +## Requirements |
| 46 | + |
| 47 | +- Python 3.11+ (recommended) |
| 48 | +- A valid OCI tenancy and a user with: |
| 49 | + - Permission to use **AI Speech** (STT + TTS), |
| 50 | + - Permission to use **Generative AI Agent Runtime**. |
| 51 | +- A configured `~/.oci/config` containing the profile named by the `OCI_PROFILE` environment variable. |
| 52 | +- A working microphone, and Windows for audio playback, since the script uses `winsound`. |
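
The requirements above can be checked before the first run with a small preflight script. This is a sketch under stated assumptions, not part of the project: `profile_exists` and `check_requirements` are hypothetical helpers, and falling back to the `DEFAULT` profile when `OCI_PROFILE` is unset is an assumption about how the script behaves. It does not verify IAM permissions or the microphone.

```python
import configparser
import os
import sys
from pathlib import Path
from typing import List, Optional


def profile_exists(config_text: str, profile: str) -> bool:
    """Return True if the INI-style OCI config text has a [profile] section."""
    # Rename configparser's special default section so that [DEFAULT]
    # is parsed as an ordinary, look-up-able section.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(config_text)
    return parser.has_section(profile)


def check_requirements(config_path: Optional[Path] = None) -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    if config_path is None:
        config_path = Path.home() / ".oci" / "config"
    problems: List[str] = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is recommended")
    if sys.platform != "win32":
        problems.append("winsound is only available on Windows")
    # Assumption: an unset OCI_PROFILE means the DEFAULT profile is used.
    profile = os.environ.get("OCI_PROFILE", "DEFAULT")
    if not config_path.is_file():
        problems.append(f"{config_path} not found")
    elif not profile_exists(config_path.read_text(), profile):
        problems.append(f"profile [{profile}] missing from {config_path}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```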