A Visual Studio Code extension that integrates Deepgram's Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities directly into your development environment.
- Record audio snippets directly in VSCode
- Store and manage multiple audio recordings
- Transcribe audio using Deepgram's nova-3 model
- Configurable transcription options:
  - Multi-channel transcription
  - Automatic punctuation
  - Dictation mode
  - Paragraph formatting
  - Smart formatting
  - Utterance detection
  - Speaker diarization
- Adjustable sample rate (8000Hz - 48000Hz)
- Convert text to speech using Deepgram's Aura 2 voices
- 12 high-quality voice options
- Real-time audio playback in the extension
- Support for various English voice models
- Direct API key input
- Optional short-lived token generation for enhanced security
Audio recording requires SoX (Sound eXchange). Install it for your platform:
- macOS: `brew install sox`
- Linux (Ubuntu/Debian): `sudo apt-get install sox libsox-fmt-all`
- Linux (Fedora/RHEL): `sudo yum install sox`
- Windows: download and install SoX from sourceforge.net/projects/sox
- Clone this repository and change into the project directory: `git clone <repository-url>`, then `cd vscode`
- Install dependencies: `npm install`
- Compile the extension: `npm run compile`
- Open the project in VSCode and press `F5` to run the extension in a new Extension Development Host window.
To create an installable package:
- Install the packaging tool: `npm install -g @vscode/vsce`
- Build the package: `vsce package`

Then install the generated `.vsix` file in VSCode:
- Open VSCode
- Go to Extensions view
- Click the `...` menu
- Select "Install from VSIX..."
- Choose the generated `.vsix` file
- Click the Deepgram icon in the Activity Bar (left sidebar)
- Enter your Deepgram API key in the text field
- Optionally enable "Use short-lived tokens" for enhanced security
- Expand the "Speech-to-Text (STT)" section
- Click "Start Recording" to begin recording audio
- Click "Stop Recording" when finished
- Your recorded clips appear in the list
- Click a clip to select it
- Configure transcription options:
  - Select sample rate
  - Enable/disable features (punctuation, diarization, etc.)
- Click "Transcribe Selected Audio"
- View the transcription result below
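
The options selected in the steps above end up as query parameters on the transcription request. The following is a minimal sketch of that mapping, not the extension's actual code: the `TranscriptionOptions` interface and `buildListenUrl` helper are hypothetical names, and it assumes the request goes to Deepgram's pre-recorded endpoint at `https://api.deepgram.com/v1/listen` with boolean flags passed as `name=true`.

```typescript
// Hypothetical shape for the toggles shown in the STT panel.
interface TranscriptionOptions {
  multichannel?: boolean;
  punctuate?: boolean;
  dictation?: boolean;
  paragraphs?: boolean;
  smartFormat?: boolean;
  utterances?: boolean;
  diarize?: boolean;
}

// Maps each option to the query parameter name assumed for the
// pre-recorded endpoint.
const PARAM_NAMES: Record<keyof TranscriptionOptions, string> = {
  multichannel: "multichannel",
  punctuate: "punctuate",
  dictation: "dictation",
  paragraphs: "paragraphs",
  smartFormat: "smart_format",
  utterances: "utterances",
  diarize: "diarize",
};

// Builds the request URL; only options that are switched on are included.
function buildListenUrl(options: TranscriptionOptions, model: string = "nova-3"): string {
  const params: string[] = [`model=${encodeURIComponent(model)}`];
  for (const key of Object.keys(PARAM_NAMES) as (keyof TranscriptionOptions)[]) {
    if (options[key]) {
      params.push(`${PARAM_NAMES[key]}=true`);
    }
  }
  return `https://api.deepgram.com/v1/listen?${params.join("&")}`;
}
```

For example, `buildListenUrl({ punctuate: true, diarize: true })` yields a URL ending in `?model=nova-3&punctuate=true&diarize=true`. The recorded WAV buffer would then be sent as the request body with a `Content-Type: audio/wav` header.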
- Expand the "Text-to-Speech (TTS)" section
- Select a voice from the dropdown
- Enter the text you want to convert to speech
- Click "Speak"
- Listen to the generated audio using the built-in player
The extension includes both Aura-1 and Aura-2 voice models.
Aura-1 (English): Asteria, Luna, Stella, Athena, Hera, Orion, Arcas, Perseus, Angus, Orpheus, Helios, Zeus
Aura-2 (English):
- Thalia - Clear, Confident, Energetic, Enthusiastic
- Andromeda - Casual, Expressive, Comfortable
- Helena - Caring, Natural, Positive, Friendly
- Apollo - Confident, Comfortable, Casual
- Arcas - Natural, Smooth, Clear
- Aries - Warm, Energetic, Caring

Additional Aura-2 English voices: Amalthea, Asteria, Athena, Atlas, Aurora, Callisto, Cora, Cordelia, Delia, Draco, Electra, Harmonia, Hera, Hermes, Hyperion, Iris, Janus, Juno, Jupiter, Luna, Mars, Minerva, Neptune, Odysseus, Ophelia, Orion, Orpheus, Pandora, Phoebe, Pluto, Saturn, Selene, Theia, Vesta, Zeus
Aura-2 (Spanish):
- Celeste - Colombian accent, Clear, Energetic, Positive
- Estrella - Mexican accent, Natural, Calm, Comfortable
- Nestor - Peninsular accent, Professional, Calm, Confident

Additional Aura-2 Spanish voices: Sirio, Carina, Alvaro, Diana, Aquila, Selena, Javier (covering Mexican, Peninsular, Colombian, and Latin American accents)
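
A voice name from the lists above has to be turned into a model identifier before a speech request can be made. The sketch below shows one way this could look. It assumes Deepgram's naming pattern of `aura-<name>-en` for Aura-1 and `aura-2-<name>-<language>` for Aura-2, and a `POST` to `https://api.deepgram.com/v1/speak`; `voiceModelId`, `buildSpeakRequest`, and `SpeakRequest` are hypothetical names, not taken from the extension's source.

```typescript
type AuraGeneration = "aura" | "aura-2";
type VoiceLanguage = "en" | "es";

// Turns a display name such as "Thalia" into a model id such as
// "aura-2-thalia-en" (assumed naming pattern).
function voiceModelId(
  name: string,
  generation: AuraGeneration = "aura-2",
  language: VoiceLanguage = "en"
): string {
  return `${generation}-${name.trim().toLowerCase()}-${language}`;
}

// Plain description of the HTTP request, independent of any HTTP client.
interface SpeakRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Describes a text-to-speech request for the given text and model.
function buildSpeakRequest(text: string, model: string, apiKey: string): SpeakRequest {
  return {
    url: `https://api.deepgram.com/v1/speak?model=${encodeURIComponent(model)}`,
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}
```

Returning a plain request description keeps the sketch free of any particular HTTP client; the response body would be the audio data handed to the built-in player.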
Get your Deepgram API key from the Deepgram Console.
When enabled, the extension will automatically generate short-lived tokens using your API key. This provides an additional layer of security by limiting the lifespan of authentication credentials.
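
As a rough illustration of how short-lived tokens could be handled, the sketch below covers the bookkeeping only: describing the grant request, remembering when a token expires, and deciding when to fetch a new one. It assumes Deepgram's token grant endpoint (`POST https://api.deepgram.com/v1/auth/grant`, returning `access_token` and `expires_in` in seconds) and that the resulting token is sent as a `Bearer` credential. All function and type names here are hypothetical.

```typescript
// Assumed response shape of the grant endpoint.
interface GrantResponse {
  access_token: string;
  expires_in: number; // lifetime in seconds
}

interface CachedToken {
  value: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// Describes the request that exchanges the long-lived API key for a token.
function buildGrantRequest(apiKey: string): { url: string; method: "POST"; headers: Record<string, string> } {
  return {
    url: "https://api.deepgram.com/v1/auth/grant",
    method: "POST",
    headers: { Authorization: `Token ${apiKey}` },
  };
}

// Converts the grant response into a cache entry with an absolute expiry.
function toCachedToken(grant: GrantResponse, nowMs: number): CachedToken {
  return { value: grant.access_token, expiresAtMs: nowMs + grant.expires_in * 1000 };
}

// A token is refreshed when missing or within `marginMs` of expiring, so a
// request is never sent with a token about to lapse in flight.
function needsRefresh(token: CachedToken | undefined, nowMs: number, marginMs: number = 5000): boolean {
  return token === undefined || nowMs >= token.expiresAtMs - marginMs;
}

// Authorization header value for API calls made with the temporary token.
function bearerHeader(token: CachedToken): string {
  return `Bearer ${token.value}`;
}
```

The safety margin is a design choice rather than an API requirement: refreshing slightly early avoids failures when a token expires between the check and the request.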
The extension uses node-record-lpcm16 for audio recording, which requires SoX to be installed on your system. Audio is captured in WAV format with configurable sample rates (8000Hz - 48000Hz).
The recording process:
- Captures audio from your default microphone
- Stores audio data in memory as WAV format
- Sends the audio buffer to Deepgram's API for transcription
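
To make "WAV format in memory" concrete, the sketch below builds the standard 44-byte header that precedes raw PCM samples in a WAV file. It is illustrative only: the extension may receive complete WAV data from SoX rather than assembling a header itself. `buildWavHeader` is a hypothetical helper, and it assumes 16-bit linear PCM (as the `lpcm16` in the recorder's name suggests) and the 8000Hz - 48000Hz range stated above.

```typescript
// Builds a canonical 44-byte RIFF/WAVE header for 16-bit PCM audio.
function buildWavHeader(sampleRate: number, channels: number, dataBytes: number): Uint8Array {
  if (sampleRate < 8000 || sampleRate > 48000) {
    throw new RangeError(`Sample rate ${sampleRate} is outside 8000-48000 Hz`);
  }
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = new ArrayBuffer(44);
  const view = new DataView(header);

  // Writes an ASCII tag such as "RIFF" one byte at a time.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true); // size of the sample data that follows
  return new Uint8Array(header);
}
```

All multi-byte fields are little-endian, which is why every `setUint16` and `setUint32` call passes `true` as the final argument.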
If you encounter recording issues, verify that:
- SoX is installed and accessible in your PATH
- Your system has microphone permissions enabled for terminal/VSCode
- Your default audio input device is properly configured
The extension uses:
- Deepgram batch (pre-recorded) API for transcription
- Deepgram TTS API for speech synthesis
- Token-based authentication endpoint for short-lived tokens
vscode/
├── src/
│ ├── extension.ts # Extension entry point
│ ├── deepgramViewProvider.ts # Webview UI provider
│ └── deepgramService.ts # Deepgram API integration
├── resources/
│ ├── deepgram-icon.svg # Activity bar icon (generated)
│ └── deepgram-logo.svg # Activity bar icon (actual)
├── package.json # Extension manifest
└── tsconfig.json # TypeScript configuration
- Build: `npm run compile`
- Watch for changes: `npm run watch`

License: MIT
For issues and feature requests, please open an issue on the repository.