✨ feat: add support for audio input with input_audio format #241
Add native support for audio file inputs using OpenRouter's input_audio format, enabling models to process audio content directly.
* Add ChatCompletionContentPartInputAudio type for input_audio format
* Support mp3 and wav audio formats with automatic format normalization
* Handle base64-encoded audio data from data URLs
* Validate audio formats and provide clear error messages
* Maintain cache_control support for audio parts
* Throw helpful error for unsupported external audio URLs
The implementation checks for audio/* media types and converts them to OpenRouter's expected input_audio format with base64 data and format specification.
Fix TypeScript type errors by adding const assertions to Set constructors
throughout the codebase. This ensures the Set type matches the expected
template literal type `Set<\`${string}:\`>` in the isUrl function signature.
* Add const assertion to audio input URL check
* Add const assertion to PDF file URL check
* Add const assertion to file-url-utils URL check
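The effect of the const assertion can be illustrated with a minimal sketch. The `isUrl` below is a hypothetical stand-in written for this example, not the repository's implementation; only the `Set` of template literal strings is taken from the commit message.

```typescript
// Hypothetical stand-in for the repository's isUrl helper.
type Protocol = `${string}:`;

function isUrl({
  url,
  protocols,
}: {
  url: string | URL;
  protocols: Set<Protocol>;
}): boolean {
  try {
    const parsed = new URL(url);
    return protocols.has(parsed.protocol as Protocol);
  } catch {
    return false; // not parseable as a URL, e.g. a raw base64 string
  }
}

// Declared on its own, `new Set(['http:', 'https:'])` is a Set<string>,
// which is not assignable to Set<Protocol>.
// With `as const` the elements keep their literal types 'http:' | 'https:'.
const httpProtocols = new Set(['http:', 'https:'] as const);

const remote = isUrl({ url: 'https://example.com/a.mp3', protocols: httpProtocols });
const inline = isUrl({ url: 'data:audio/mpeg;base64,AAAA', protocols: httpProtocols });
```

Here `remote` is `true` and `inline` is `false`, because a data URL parses with the `data:` protocol, which is not in the set.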
Add test coverage for audio input handling including format normalization, base64 conversion, error handling, and cache control support.
* Test mp3 and wav format conversion
* Test audio/x-wav normalization to wav
* Test base64 data URL extraction
* Test raw base64 string handling
* Test error for external audio URLs
* Test error for unsupported audio formats
* Test cache control with audio inputs
All tests pass successfully.
Improve user experience with clearer, more actionable error messages for audio handling that guide developers to the correct solution.
* Add helpful multi-line error messages with specific solutions
* Include supported MIME types (audio/mpeg, audio/mp3, audio/wav, audio/x-wav)
* Clarify that audio/mp3 and audio/mpeg both map to mp3 format
* Add support for audio/mp3 and audio/wave MIME types
* Link to OpenRouter documentation for audio features
* Provide step-by-step instructions for URL error case
* Update tests to use regex matching for error messages
* Add test for audio/mp3 MIME type normalization
All tests pass (26 total in convert-to-openrouter-chat-messages.test.ts)
    if (part.mediaType?.startsWith('audio/')) {
      const fileData = getFileUrl({
        part,
        defaultMediaType: 'audio/mpeg',
      });

      // Check if fileData is a URL - if so, we need to download it
      let base64Data: string;
      if (
        isUrl({
          url: fileData,
          protocols: new Set(['http:', 'https:'] as const),
        })
      ) {
        // For URLs, OpenRouter's input_audio doesn't support URLs directly
        // We need to download and convert to base64
        // For now, we'll throw an error to indicate this limitation
        throw new Error(
          `Audio files cannot be provided as URLs.\n\n` +
            `OpenRouter requires audio to be base64-encoded. Please:\n` +
            `1. Download the audio file locally\n` +
            `2. Read it as a Buffer or Uint8Array\n` +
            `3. Pass it as the data parameter\n\n` +
            `The AI SDK will automatically handle base64 encoding.\n\n` +
            `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
        );
      } else {
        // Extract base64 data (handles both data URLs and raw base64)
        base64Data = getBase64FromDataUrl(fileData);
      }

      // Map media type to format
      const mediaType = part.mediaType || 'audio/mpeg';
      let format = mediaType.replace('audio/', '');

      // Normalize format names for OpenRouter
      // Common MIME types: audio/mpeg, audio/mp3 -> mp3
      // audio/wav, audio/x-wav, audio/wave -> wav
      if (format === 'mpeg' || format === 'mp3') {
        format = 'mp3';
      } else if (format === 'x-wav' || format === 'wave' || format === 'wav') {
        format = 'wav';
      }

      // Validate format - OpenRouter only supports mp3 and wav
      if (format !== 'mp3' && format !== 'wav') {
        throw new Error(
          `Unsupported audio format: "${mediaType}"\n\n` +
            `OpenRouter only supports MP3 and WAV audio formats.\n` +
            `• For MP3: use "audio/mpeg" or "audio/mp3"\n` +
            `• For WAV: use "audio/wav" or "audio/x-wav"\n\n` +
            `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
        );
      }

      return {
        type: 'input_audio' as const,
        input_audio: {
          data: base64Data,
          format: format as 'mp3' | 'wav',
        },
        cache_control: cacheControl,
      } satisfies ChatCompletionContentPart;
nit: this section should be moved into its own function, so we can reuse it. The logic looks very self-contained
louisgv
left a comment
Thanks for the PR! LGTM beside the nits!
Colocate audio conversion logic with other file/URL utilities for better discoverability and reuse.
@louisgv Pulled out into a utility while keeping it composable for other audio utils in the future.
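As a rough illustration of what one composable piece of such a utility might look like, the sketch below isolates the data-handling step from the diff: reject remote URLs, then strip a data URL prefix if present. The name `extractBase64Audio` and the shortened error message are assumptions made for this example; the PR's actual utility is not shown in this thread.

```typescript
// Hypothetical helper name; not the utility added by the PR.
function extractBase64Audio(fileData: string): string {
  // Remote URLs are rejected because input_audio carries inline base64 data.
  if (/^https?:\/\//i.test(fileData)) {
    throw new Error('Audio files cannot be provided as URLs.');
  }
  // data:audio/mpeg;base64,<payload> -> <payload>
  const match = /^data:[^,]*;base64,(.*)$/.exec(fileData);
  // Anything that is not a data URL is assumed to already be raw base64.
  return match?.[1] ?? fileData;
}

const fromDataUrl = extractBase64Audio('data:audio/wav;base64,UklGRg==');
const fromRaw = extractBase64Audio('UklGRg==');
```

Keeping this step separate from format normalization lets each piece be tested and reused independently, which is one way to read the "composable" goal mentioned above.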
🎵 Native Audio Input Support for OpenRouter
This PR introduces first-class support for audio file inputs through OpenRouter's `input_audio` format, enabling AI models to process audio content seamlessly alongside text, images, and documents.
✨ Key Features
🎯 Smart Format Detection & Normalization
The implementation intelligently handles various audio MIME types that developers commonly use:
* `audio/mpeg`, `audio/mp3` → `mp3`
* `audio/wav`, `audio/x-wav`, `audio/wave` → `wav`

This means developers can use the MIME types they're familiar with from the Web Audio API, file uploads, or various audio libraries without worrying about OpenRouter's specific format requirements.
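The mapping above can also be expressed as a lookup table. This is a sketch of an alternative to the if/else chain in the diff, not the code the PR ships; `normalizeAudioFormat` and `AUDIO_FORMATS` are names invented for the example, and the error text is abbreviated.

```typescript
type AudioFormat = 'mp3' | 'wav';

// MIME types listed in the PR description, mapped to OpenRouter formats.
const AUDIO_FORMATS = new Map<string, AudioFormat>([
  ['audio/mpeg', 'mp3'],
  ['audio/mp3', 'mp3'],
  ['audio/wav', 'wav'],
  ['audio/x-wav', 'wav'],
  ['audio/wave', 'wav'],
]);

// Hypothetical helper: returns the normalized format or throws.
function normalizeAudioFormat(mediaType: string): AudioFormat {
  const format = AUDIO_FORMATS.get(mediaType);
  if (format === undefined) {
    throw new Error(`Unsupported audio format: "${mediaType}"`);
  }
  return format;
}

const mp3 = normalizeAudioFormat('audio/mpeg');
const wav = normalizeAudioFormat('audio/x-wav');
```

A `Map` is used rather than a plain object so that keys such as `constructor` cannot accidentally resolve to inherited properties.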
📚 Developer-Friendly Error Messages
When things go wrong, developers get clear, actionable guidance instead of cryptic errors:
For unsupported formats:
For URL-based audio (not yet supported):
🏗️ Implementation Details
The audio handler integrates seamlessly into the existing multimodal content pipeline:
* Checks for `audio/*` media types on file parts
* Converts them to OpenRouter's `input_audio` structure

Generated structure:
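For reference, the shape returned by the diff earlier in this thread can be sketched as a type plus a sample value. The field names `type`, `input_audio`, `data`, `format`, and `cache_control` come from that diff; the type given to `cache_control` and the sample base64 payload are assumptions made for illustration.

```typescript
interface ChatCompletionContentPartInputAudio {
  type: 'input_audio';
  input_audio: {
    data: string; // base64-encoded audio bytes
    format: 'mp3' | 'wav';
  };
  // Assumed shape; the diff only shows that the field is passed through.
  cache_control?: { type: string };
}

const part: ChatCompletionContentPartInputAudio = {
  type: 'input_audio',
  input_audio: {
    data: 'UklGRg==', // placeholder payload, not a real audio file
    format: 'wav',
  },
};
```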
🧪 Comprehensive Test Coverage
Added 9 new test cases covering:
All 26 tests in `convert-to-openrouter-chat-messages.test.ts` pass successfully.
🔧 Technical Improvements
* `ChatCompletionContentPartInputAudio` interface with proper TypeScript types
* Const assertions on `Set` literals
📖 Usage Example
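A minimal sketch of how a caller might build an audio message, assuming the AI SDK's file part shape (`type: 'file'`, `data`, `mediaType`) that the diff reads from. The local types and the `buildAudioMessage` helper are stand-ins written for this example, and the three bytes are a placeholder rather than a real recording.

```typescript
// Local stand-ins for the AI SDK message part types (assumed shapes).
type TextPart = { type: 'text'; text: string };
type FilePart = { type: 'file'; data: Uint8Array | string; mediaType: string };
type UserMessage = { role: 'user'; content: Array<TextPart | FilePart> };

// Hypothetical helper that pairs a prompt with an audio file part.
function buildAudioMessage(
  audio: Uint8Array,
  mediaType: string,
  prompt: string,
): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'file', data: audio, mediaType },
    ],
  };
}

// In practice the bytes would come from reading a local .mp3 or .wav file.
const message = buildAudioMessage(
  new Uint8Array([0x49, 0x44, 0x33]),
  'audio/mpeg',
  'Transcribe this clip.',
);
```

A message like this would then be passed in the `messages` array of an AI SDK call that uses an OpenRouter model; the provider converts the file part to `input_audio` as described above.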
🚀 Impact
This enhancement makes OpenRouter's audio capabilities immediately accessible to all AI SDK users, enabling new use cases like:
📝 Notes
This PR maintains backward compatibility and requires no changes to existing code. Audio support is purely additive and opt-in through the file content type.