@Karavil Karavil commented Nov 20, 2025

🎵 Native Audio Input Support for OpenRouter

This PR introduces first-class support for audio file inputs through OpenRouter's input_audio format, enabling AI models to process audio content seamlessly alongside text, images, and documents.

✨ Key Features

🎯 Smart Format Detection & Normalization

The implementation intelligently handles various audio MIME types that developers commonly use:

  • MP3 Support: audio/mpeg, audio/mp3 → mp3
  • WAV Support: audio/wav, audio/x-wav, audio/wave → wav

This means developers can use the MIME types they're familiar with from the Web Audio API, file uploads, or various audio libraries without worrying about OpenRouter's specific format requirements.
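
A minimal sketch of the normalization described above, assuming a simple lookup table. The names `AUDIO_FORMATS` and `normalizeAudioFormat` are hypothetical and not taken from this PR, whose actual code strips the `audio/` prefix and compares the remainder instead.

```typescript
type AudioFormat = 'mp3' | 'wav';

// Lookup table mirroring the MIME types listed above (hypothetical name).
const AUDIO_FORMATS = new Map<string, AudioFormat>([
  ['audio/mpeg', 'mp3'],
  ['audio/mp3', 'mp3'],
  ['audio/wav', 'wav'],
  ['audio/x-wav', 'wav'],
  ['audio/wave', 'wav'],
]);

// Returns the OpenRouter format name, or undefined for unsupported MIME types.
function normalizeAudioFormat(mediaType: string): AudioFormat | undefined {
  return AUDIO_FORMATS.get(mediaType);
}
```

A caller would treat an `undefined` result as the unsupported-format case and raise the error shown in the next section.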

📚 Developer-Friendly Error Messages

When things go wrong, developers get clear, actionable guidance instead of cryptic errors:

For unsupported formats:

Unsupported audio format: "audio/ogg"

OpenRouter only supports MP3 and WAV audio formats.
• For MP3: use "audio/mpeg" or "audio/mp3"
• For WAV: use "audio/wav" or "audio/x-wav"

Learn more: https://openrouter.ai/docs/features/multimodal/audio

For URL-based audio (not yet supported):

Audio files cannot be provided as URLs.

OpenRouter requires audio to be base64-encoded. Please:
1. Download the audio file locally
2. Read it as a Buffer or Uint8Array
3. Pass it as the data parameter

The AI SDK will automatically handle base64 encoding.

Learn more: https://openrouter.ai/docs/features/multimodal/audio
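
The three steps in the URL error message could look roughly like this in application code. This is only a sketch: `downloadAudio` and `FetchLike` are hypothetical names, and the fetch implementation is passed in explicitly so the helper does not assume a particular runtime.

```typescript
// Minimal structural type for the parts of fetch() this sketch relies on.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Hypothetical helper: downloads an audio file and returns its raw bytes.
async function downloadAudio(url: string, fetchImpl: FetchLike): Promise<Uint8Array> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to download audio (HTTP ${response.status}): ${url}`);
  }
  // The bytes can then be passed as `data` on a file part.
  return new Uint8Array(await response.arrayBuffer());
}

// Usage, assuming a runtime with a global fetch:
//   const bytes = await downloadAudio('https://example.com/speech.mp3', fetch);
//   ...then pass { type: 'file', data: bytes, mediaType: 'audio/mpeg' }
```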

🏗️ Implementation Details

The audio handler integrates seamlessly into the existing multimodal content pipeline:

  1. Detection: Checks for audio/* media types on file parts
  2. Conversion: Transforms audio data into OpenRouter's input_audio structure
  3. Encoding: Handles base64 encoding from Uint8Array or extracts from data URLs
  4. Validation: Ensures only supported formats reach the API
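
The encoding step (3) can be sketched as follows. The helper names here are hypothetical and this is not the provider's own code; the sketch assumes a runtime where `btoa` is available globally (browsers and recent Node.js versions).

```typescript
// Encode raw bytes as a base64 string.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Strip a "data:<mime>;base64," prefix if present; raw base64 passes through unchanged.
function stripDataUrlPrefix(input: string): string {
  if (!input.startsWith('data:')) return input;
  const comma = input.indexOf(',');
  return comma === -1 ? input : input.slice(comma + 1);
}

// Accepts the input shapes covered by the tests: bytes, data URL, or raw base64.
function toBase64Audio(data: Uint8Array | string): string {
  return typeof data === 'string' ? stripDataUrlPrefix(data) : bytesToBase64(data);
}
```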

Generated structure:

{
  type: 'input_audio',
  input_audio: {
    data: '<base64-encoded-audio>',
    format: 'mp3' | 'wav'
  },
  cache_control?: { type: 'ephemeral' }  // Optional caching support
}
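
The structure above could be typed roughly as follows. The interface name `ChatCompletionContentPartInputAudio` comes from this PR, but the field-by-field definition here is inferred from the generated structure and may differ from the actual source; `makeInputAudioPart` is a hypothetical helper used only for illustration.

```typescript
// Inferred shape; the real interface in the provider may differ.
interface ChatCompletionContentPartInputAudio {
  type: 'input_audio';
  input_audio: {
    data: string; // base64-encoded audio
    format: 'mp3' | 'wav';
  };
  cache_control?: { type: 'ephemeral' };
}

// Hypothetical builder: attaches cache_control only when one is provided.
function makeInputAudioPart(
  data: string,
  format: 'mp3' | 'wav',
  cacheControl?: { type: 'ephemeral' },
): ChatCompletionContentPartInputAudio {
  const part: ChatCompletionContentPartInputAudio = {
    type: 'input_audio',
    input_audio: { data, format },
  };
  if (cacheControl) {
    part.cache_control = cacheControl;
  }
  return part;
}
```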

🧪 Comprehensive Test Coverage

Added 9 new test cases covering:

  • ✅ Uint8Array to base64 conversion for MP3 and WAV
  • ✅ Format normalization (audio/mpeg → mp3, audio/x-wav → wav, audio/mp3 → mp3)
  • ✅ Base64 data URL extraction
  • ✅ Raw base64 string handling
  • ✅ Cache control support for audio parts
  • ✅ Proper error handling for unsupported formats
  • ✅ Helpful error for external URLs

All 26 tests in convert-to-openrouter-chat-messages.test.ts pass successfully.

🔧 Technical Improvements

  • Type Safety: Added ChatCompletionContentPartInputAudio interface with proper TypeScript types
  • Code Quality: Fixed pre-existing TypeScript errors by adding const assertions to Set literals
  • Maintainability: Used regex matching in tests for more flexible error message validation
  • Documentation: All error messages link directly to OpenRouter's audio documentation
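
To illustrate the const-assertion point under Code Quality: when a Set is built from a plain array literal without a contextual type, TypeScript infers `Set<string>`, which does not satisfy a parameter typed with a template literal type such as `` Set<`${string}:`> ``. The `isUrl` below is a hypothetical stand-in written for this sketch, not the provider's helper, and the exact cause of the original errors in the codebase may differ.

```typescript
// Template literal type for protocol strings such as 'http:'.
type Protocol = `${string}:`;

// Hypothetical stand-in for the provider's isUrl helper.
function isUrl(args: { url: string; protocols: Set<Protocol> }): boolean {
  const colon = args.url.indexOf(':');
  if (colon === -1) return false;
  return args.protocols.has(args.url.slice(0, colon + 1) as Protocol);
}

// With `as const`, the element type is 'http:' | 'https:', which fits Set<Protocol>.
const protocols = new Set(['http:', 'https:'] as const);

// Without it, the same expression assigned to a variable is inferred as Set<string>:
//   const widened = new Set(['http:', 'https:']);
//   isUrl({ url: '...', protocols: widened }); // type error

const remote = isUrl({ url: 'https://example.com/speech.mp3', protocols });
const inline = isUrl({ url: 'data:audio/mpeg;base64,AQID', protocols });
```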

📖 Usage Example

import { openrouter } from '@openrouter/ai-sdk-provider';
import { generateText } from 'ai';
import fs from 'fs';

const audioFile = fs.readFileSync('speech.mp3');

const result = await generateText({
  model: openrouter('openai/gpt-4o-audio-preview'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is being said in this audio?' },
      { 
        type: 'file',
        data: audioFile,  // Uint8Array
        mediaType: 'audio/mpeg'  // or 'audio/mp3'
      }
    ]
  }]
});

🚀 Impact

This enhancement makes OpenRouter's audio capabilities immediately accessible to all AI SDK users, enabling new use cases like:

  • Voice message analysis
  • Audio transcription workflows
  • Multimodal applications combining audio with text and images
  • Accessibility features for audio content

📝 Notes

  • External audio URLs are not currently supported by OpenRouter's API; the error message explains how to supply the audio as base64-encoded data instead
  • Only MP3 and WAV formats are supported per OpenRouter's specification
  • The implementation follows the same patterns as existing image and document handling for consistency

This PR maintains backward compatibility and requires no changes to existing code. Audio support is purely additive and opt-in through the file content type.

Add native support for audio file inputs using OpenRouter's input_audio
format, enabling models to process audio content directly.

* Add ChatCompletionContentPartInputAudio type for input_audio format
* Support mp3 and wav audio formats with automatic format normalization
* Handle base64-encoded audio data from data URLs
* Validate audio formats and provide clear error messages
* Maintain cache_control support for audio parts
* Throw helpful error for unsupported external audio URLs

The implementation checks for audio/* media types and converts them to
OpenRouter's expected input_audio format with base64 data and format
specification.

Fix TypeScript type errors by adding const assertions to Set constructors
throughout the codebase. This ensures the Set type matches the expected
template literal type `Set<\`${string}:\`>` in the isUrl function signature.

* Add const assertion to audio input URL check
* Add const assertion to PDF file URL check
* Add const assertion to file-url-utils URL check

Add test coverage for audio input handling including format normalization,
base64 conversion, error handling, and cache control support.

* Test mp3 and wav format conversion
* Test audio/x-wav normalization to wav
* Test base64 data URL extraction
* Test raw base64 string handling
* Test error for external audio URLs
* Test error for unsupported audio formats
* Test cache control with audio inputs

All tests pass successfully.

Improve user experience with clearer, more actionable error messages
for audio handling that guide developers to the correct solution.

* Add helpful multi-line error messages with specific solutions
* Include supported MIME types (audio/mpeg, audio/mp3, audio/wav, audio/x-wav)
* Clarify that audio/mp3 and audio/mpeg both map to mp3 format
* Add support for audio/mp3 and audio/wave MIME types
* Link to OpenRouter documentation for audio features
* Provide step-by-step instructions for URL error case
* Update tests to use regex matching for error messages
* Add test for audio/mp3 MIME type normalization

All tests pass (26 total in convert-to-openrouter-chat-messages.test.ts)

Comment on lines 102 to 164
if (part.mediaType?.startsWith('audio/')) {
  const fileData = getFileUrl({
    part,
    defaultMediaType: 'audio/mpeg',
  });

  // Check if fileData is a URL - if so, we need to download it
  let base64Data: string;
  if (
    isUrl({
      url: fileData,
      protocols: new Set(['http:', 'https:'] as const),
    })
  ) {
    // For URLs, OpenRouter's input_audio doesn't support URLs directly
    // We need to download and convert to base64
    // For now, we'll throw an error to indicate this limitation
    throw new Error(
      `Audio files cannot be provided as URLs.\n\n` +
        `OpenRouter requires audio to be base64-encoded. Please:\n` +
        `1. Download the audio file locally\n` +
        `2. Read it as a Buffer or Uint8Array\n` +
        `3. Pass it as the data parameter\n\n` +
        `The AI SDK will automatically handle base64 encoding.\n\n` +
        `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
    );
  } else {
    // Extract base64 data (handles both data URLs and raw base64)
    base64Data = getBase64FromDataUrl(fileData);
  }

  // Map media type to format
  const mediaType = part.mediaType || 'audio/mpeg';
  let format = mediaType.replace('audio/', '');

  // Normalize format names for OpenRouter
  // Common MIME types: audio/mpeg, audio/mp3 -> mp3
  // audio/wav, audio/x-wav, audio/wave -> wav
  if (format === 'mpeg' || format === 'mp3') {
    format = 'mp3';
  } else if (format === 'x-wav' || format === 'wave' || format === 'wav') {
    format = 'wav';
  }

  // Validate format - OpenRouter only supports mp3 and wav
  if (format !== 'mp3' && format !== 'wav') {
    throw new Error(
      `Unsupported audio format: "${mediaType}"\n\n` +
        `OpenRouter only supports MP3 and WAV audio formats.\n` +
        `• For MP3: use "audio/mpeg" or "audio/mp3"\n` +
        `• For WAV: use "audio/wav" or "audio/x-wav"\n\n` +
        `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
    );
  }

  return {
    type: 'input_audio' as const,
    input_audio: {
      data: base64Data,
      format: format as 'mp3' | 'wav',
    },
    cache_control: cacheControl,
  } satisfies ChatCompletionContentPart;

nit: this section should be moved into its own function, so we can reuse it. The logic looks very self-contained

@louisgv louisgv left a comment

Thanks for the PR! LGTM besides the nits!

Karavil and others added 3 commits November 26, 2025 13:45
… function

Address PR review feedback to make audio handling more modular and reusable.

* Extract inline audio logic into getInputAudioData() function
* Return generic { data, format } object for flexibility
* Add comprehensive TSDoc with @throws and @example

Colocate audio conversion logic with other file/URL utilities for better
discoverability and reuse.

Karavil commented Nov 26, 2025

@louisgv Pulled out into a utility while keeping it composable for other audio utils in the future.

@Karavil Karavil requested a review from louisgv November 26, 2025 18:49
@subtleGradient subtleGradient merged commit fc002e0 into OpenRouterTeam:main Nov 26, 2025
2 checks passed
@github-actions github-actions bot mentioned this pull request Nov 26, 2025