Deepgram Voice AI VSCode Extension

A Visual Studio Code extension that integrates Deepgram's Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities directly into your development environment.

Features

Speech-to-Text (STT)

- Record audio snippets directly in VSCode
- Store and manage multiple audio recordings
- Transcribe audio using Deepgram's nova-3 model
- Configurable transcription options:
  - Multi-channel transcription
  - Automatic punctuation
  - Dictation mode
  - Paragraph formatting
  - Smart formatting
  - Utterance detection
  - Speaker diarization
- Adjustable sample rate (8000Hz - 48000Hz)
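
As a rough illustration of how the options above might map onto a transcription request, the sketch below builds a URL for Deepgram's pre-recorded endpoint. The `TranscriptionOptions` interface and `buildListenUrl` function are hypothetical names, not taken from the extension's source. The parameter names follow Deepgram's documented query parameters, but check them against the current API reference. Sample rate is left out on the assumption that a WAV header already carries it.

```typescript
// Hypothetical option shape; the extension's real settings object may differ.
interface TranscriptionOptions {
  multichannel?: boolean;
  punctuate?: boolean;
  dictation?: boolean;
  paragraphs?: boolean;
  smartFormat?: boolean;
  utterances?: boolean;
  diarize?: boolean;
}

// Maps each option to the query parameter name used by /v1/listen.
const PARAM_NAMES: Record<keyof TranscriptionOptions, string> = {
  multichannel: "multichannel",
  punctuate: "punctuate",
  dictation: "dictation",
  paragraphs: "paragraphs",
  smartFormat: "smart_format",
  utterances: "utterances",
  diarize: "diarize",
};

// Builds the request URL, emitting only the options that are switched on.
function buildListenUrl(options: TranscriptionOptions, model = "nova-3"): string {
  const parts = [`model=${encodeURIComponent(model)}`];
  for (const key of Object.keys(PARAM_NAMES) as (keyof TranscriptionOptions)[]) {
    if (options[key]) {
      parts.push(`${PARAM_NAMES[key]}=true`);
    }
  }
  return `https://api.deepgram.com/v1/listen?${parts.join("&")}`;
}
```

Disabled options are omitted rather than sent as `false`, which keeps the URL short and leaves the API defaults in effect.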

Text-to-Speech (TTS)

- Convert text to speech using Deepgram's Aura-2 voices
- 12 high-quality voice options
- Real-time audio playback in the extension
- Support for various English voice models

Authentication

- Direct API key input
- Optional short-lived token generation for enhanced security

Prerequisites

Audio Recording Requirements

For audio recording functionality, you need to install SoX (Sound eXchange):

macOS:

brew install sox

Linux (Ubuntu/Debian):

sudo apt-get install sox libsox-fmt-all

Linux (Fedora/RHEL):

sudo yum install sox

Windows: Download and install SoX from sourceforge.net/projects/sox

Installation

From Source

Clone this repository:
```
git clone <repository-url>
cd vscode
```
Install dependencies:
```
npm install
```
Compile the extension:
```
npm run compile
```
Open the project in VSCode and press F5 to run the extension in a new Extension Development Host window.

Building VSIX Package

To create an installable package:

npm install -g @vscode/vsce
vsce package

Then install the generated .vsix file in VSCode:

Open VSCode
Go to Extensions view
Click the ... menu
Select "Install from VSIX..."
Choose the generated .vsix file

Usage

Getting Started

Click the Deepgram icon in the Activity Bar (left sidebar)
Enter your Deepgram API key in the text field
Optionally enable "Use short-lived tokens" for enhanced security

Speech-to-Text

Expand the "Speech-to-Text (STT)" section
Click "Start Recording" to begin recording audio
Click "Stop Recording" when finished
Your recorded clips appear in the list
Click a clip to select it
Configure transcription options:
- Select sample rate
- Enable/disable features (punctuation, diarization, etc.)
Click "Transcribe Selected Audio"
View the transcription result below
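
For readers curious about what sits behind the displayed result, here is a minimal sketch of pulling transcript text out of a pre-recorded API response. The `ListenResponse` type is a deliberately reduced, hypothetical subset of the response, and `extractTranscripts` is an illustrative helper rather than the extension's own code.

```typescript
// Reduced subset of the pre-recorded response; real responses carry more fields.
interface ListenResponse {
  results?: {
    channels?: { alternatives?: { transcript?: string }[] }[];
  };
}

// Returns the top alternative's transcript for each channel, or "" when absent.
function extractTranscripts(response: ListenResponse): string[] {
  const channels = response.results?.channels ?? [];
  return channels.map((channel) => channel.alternatives?.[0]?.transcript ?? "");
}
```

With multi-channel transcription enabled, the returned array would hold one entry per channel. Otherwise it holds a single entry.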

Text-to-Speech

Expand the "Text-to-Speech (TTS)" section
Select a voice from the dropdown
Enter the text you want to convert to speech
Click "Speak"
Listen to the generated audio using the built-in player
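
The steps above correspond roughly to one HTTP request against Deepgram's TTS endpoint. The sketch below only assembles that request as a plain object. `SpeakRequest` and `buildSpeakRequest` are hypothetical names, and the endpoint and header format should be checked against Deepgram's API reference.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface SpeakRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assembles a TTS request for the given text and voice model ID.
function buildSpeakRequest(text: string, model: string, apiKey: string): SpeakRequest {
  if (text.trim().length === 0) {
    throw new Error("Text must not be empty");
  }
  return {
    method: "POST",
    url: `https://api.deepgram.com/v1/speak?model=${encodeURIComponent(model)}`,
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}
```

The resulting object could be handed to `fetch(request.url, request)`. The response body would then be the audio bytes for playback.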

Available Voices

The extension includes both Aura-1 and Aura-2 voice models:

Aura-1 English Voices

Asteria, Luna, Stella, Athena, Hera, Orion, Arcas, Perseus, Angus, Orpheus, Helios, Zeus

Aura-2 English Voices (Featured)

Thalia - Clear, Confident, Energetic, Enthusiastic
Andromeda - Casual, Expressive, Comfortable
Helena - Caring, Natural, Positive, Friendly
Apollo - Confident, Comfortable, Casual
Arcas - Natural, Smooth, Clear
Aries - Warm, Energetic, Caring

Aura-2 English Voices (Additional)

Amalthea, Asteria, Athena, Atlas, Aurora, Callisto, Cora, Cordelia, Delia, Draco, Electra, Harmonia, Hera, Hermes, Hyperion, Iris, Janus, Juno, Jupiter, Luna, Mars, Minerva, Neptune, Odysseus, Ophelia, Orion, Orpheus, Pandora, Phoebe, Pluto, Saturn, Selene, Theia, Vesta, Zeus

Aura-2 Spanish Voices (Featured)

Celeste - Colombian accent, Clear, Energetic, Positive
Estrella - Mexican accent, Natural, Calm, Comfortable
Nestor - Peninsular accent, Professional, Calm, Confident

Aura-2 Spanish Voices (Additional)

Sirio, Carina, Alvaro, Diana, Aquila, Selena, Javier (representing Mexican, Peninsular, Colombian, and Latin American accents)
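
Some names, such as Arcas and Asteria, appear in both the Aura-1 and Aura-2 lists, so a voice name alone does not identify a model. The sketch below shows one way to derive a model ID, assuming Deepgram's published naming pattern (`aura-<name>-en` for Aura-1, `aura-2-<name>-<language>` for Aura-2). `voiceModelId` is a hypothetical helper; verify the resulting IDs against Deepgram's model list before relying on them.

```typescript
type VoiceFamily = "aura-1" | "aura-2";
type VoiceLanguage = "en" | "es";

// Derives a model ID from family, voice name, and language (assumed naming pattern).
function voiceModelId(family: VoiceFamily, name: string, language: VoiceLanguage = "en"): string {
  const slug = name.trim().toLowerCase();
  if (family === "aura-1") {
    // Only English Aura-1 voices are listed in this README.
    if (language !== "en") {
      throw new Error("Only English Aura-1 voices are listed");
    }
    return `aura-${slug}-en`;
  }
  return `aura-2-${slug}-${language}`;
}
```

For example, `voiceModelId("aura-1", "Arcas")` and `voiceModelId("aura-2", "Arcas")` produce different IDs for the two same-named voices.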

Configuration

API Key

Get your Deepgram API key from the Deepgram Console (console.deepgram.com).

Short-Lived Tokens

When enabled, the extension will automatically generate short-lived tokens using your API key. This provides an additional layer of security by limiting the lifespan of authentication credentials.
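
A minimal sketch of the bookkeeping this implies, under two assumptions: that the token grant response contains `access_token` and `expires_in` (in seconds), and that short-lived tokens are sent as `Bearer` credentials while API keys use the `Token` scheme. All type and function names here are hypothetical, and the 5-second safety margin is an arbitrary illustrative value.

```typescript
// Hypothetical shape for a cached short-lived token.
interface ShortLivedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time in milliseconds
}

// Converts a grant response (assumed fields) into a cached token.
function tokenFromGrant(
  grant: { access_token: string; expires_in: number },
  nowMs: number
): ShortLivedToken {
  return { accessToken: grant.access_token, expiresAtMs: nowMs + grant.expires_in * 1000 };
}

// True when there is no token or it expires within the safety margin.
function needsRefresh(token: ShortLivedToken | undefined, nowMs: number, safetyMarginMs = 5000): boolean {
  return token === undefined || nowMs >= token.expiresAtMs - safetyMarginMs;
}

// Chooses the Authorization header value: Bearer for tokens, Token for API keys.
function authorizationHeader(apiKey: string, token?: ShortLivedToken): string {
  return token ? `Bearer ${token.accessToken}` : `Token ${apiKey}`;
}
```

Refreshing slightly before expiry avoids sending a request with a token that lapses in transit.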

Technical Notes

Audio Recording

The extension uses node-record-lpcm16 for audio recording, which requires SoX to be installed on your system. Audio is captured in WAV format with configurable sample rates (8000Hz - 48000Hz).

The recording process:

Captures audio from your default microphone
Stores audio data in memory as WAV format
Sends the audio buffer to Deepgram's API for transcription
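
To make "in memory as WAV format" concrete, the sketch below wraps raw 16-bit PCM samples in a standard 44-byte WAV header. This is an illustration of the container only: the extension most likely receives WAV data directly from SoX via node-record-lpcm16, and `wrapPcm16AsWav` is a hypothetical helper, not part of its source.

```typescript
// Wraps little-endian 16-bit PCM bytes in a canonical 44-byte WAV (RIFF) header.
function wrapPcm16AsWav(pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true);
  wav.set(pcm, 44);
  return wav;
}
```

Because the header records the sample rate and channel count, a receiver can decode the buffer without being told those values separately.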

If you encounter recording issues, verify that:

SoX is installed and accessible in your PATH
Your system has microphone permissions enabled for terminal/VSCode
Your default audio input device is properly configured

API Integration

The extension uses:

Deepgram batch (pre-recorded) API for transcription
Deepgram TTS API for speech synthesis
Token-based authentication endpoint for short-lived tokens

Development

Project Structure

vscode/
├── src/
│   ├── extension.ts              # Extension entry point
│   ├── deepgramViewProvider.ts   # Webview UI provider
│   └── deepgramService.ts        # Deepgram API integration
├── resources/
│   ├── deepgram-icon.svg         # Activity bar icon (generated)
│   └── deepgram-logo.svg         # Activity bar icon (actual)
├── package.json                  # Extension manifest
└── tsconfig.json                 # TypeScript configuration

Building

npm run compile

Watching for Changes

npm run watch

