✨ feat: add support for audio input with input_audio format #241

Karavil · 2025-11-20T01:22:46Z

🎵 Native Audio Input Support for OpenRouter

This PR adds support for audio file inputs through OpenRouter's input_audio format, so models can process audio content alongside text, images, and documents.

✨ Key Features

🎯 Smart Format Detection & Normalization

The implementation normalizes the audio MIME types that developers commonly use:

MP3 Support: audio/mpeg, audio/mp3 → mp3
WAV Support: audio/wav, audio/x-wav, audio/wave → wav

This means developers can use the MIME types they're familiar with from the Web Audio API, file uploads, or various audio libraries without worrying about OpenRouter's specific format requirements.
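The mapping above can be sketched as a small lookup table. This is a minimal illustration, not the PR's code: the names `MIME_TO_FORMAT` and `normalizeAudioFormat` are hypothetical, and the shortened error text stands in for the full message shown in the next section.

```typescript
// Hypothetical helper for illustration; the PR's own implementation uses if/else checks.
type InputAudioFormat = 'mp3' | 'wav';

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const MIME_TO_FORMAT = new Map<string, InputAudioFormat>([
  ['audio/mpeg', 'mp3'],
  ['audio/mp3', 'mp3'],
  ['audio/wav', 'wav'],
  ['audio/x-wav', 'wav'],
  ['audio/wave', 'wav'],
]);

function normalizeAudioFormat(mediaType: string): InputAudioFormat {
  const format = MIME_TO_FORMAT.get(mediaType);
  if (format === undefined) {
    // Anything outside the table is rejected instead of being passed to the API.
    throw new Error(`Unsupported audio format: "${mediaType}"`);
  }
  return format;
}
```

A table like this keeps the accepted aliases in one place, so adding another alias is a one-line change.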

📚 Developer-Friendly Error Messages

When things go wrong, developers get clear, actionable guidance instead of cryptic errors:

For unsupported formats:

Unsupported audio format: "audio/ogg"

OpenRouter only supports MP3 and WAV audio formats.
• For MP3: use "audio/mpeg" or "audio/mp3"
• For WAV: use "audio/wav" or "audio/x-wav"

Learn more: https://openrouter.ai/docs/features/multimodal/audio

For URL-based audio (not yet supported):

Audio files cannot be provided as URLs.

OpenRouter requires audio to be base64-encoded. Please:
1. Download the audio file locally
2. Read it as a Buffer or Uint8Array
3. Pass it as the data parameter

The AI SDK will automatically handle base64 encoding.

Learn more: https://openrouter.ai/docs/features/multimodal/audio
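Steps 1 to 3 of the URL error above could be sketched as follows. This is an assumption-laden illustration: `downloadAudio` and `FetchLike` are hypothetical names, not part of the provider, and the fetch function is passed in explicitly so the sketch does not depend on a particular runtime.

```typescript
// Minimal shape of a fetch-like function (hypothetical type, for illustration only).
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Hypothetical helper: downloads the audio and returns its raw bytes.
async function downloadAudio(
  url: string,
  fetchImpl: FetchLike,
): Promise<Uint8Array> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to download audio (HTTP ${response.status}): ${url}`);
  }
  // The bytes can be passed as the `data` field of a file part.
  return new Uint8Array(await response.arrayBuffer());
}
```

In runtimes that expose a global `fetch`, it can be supplied as `fetchImpl`; the returned bytes are then passed as `data` together with a supported `mediaType` such as 'audio/mpeg'.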

🏗️ Implementation Details

The audio handler plugs into the existing multimodal content pipeline in four steps:

Detection: Checks for audio/* media types on file parts
Conversion: Transforms audio data into OpenRouter's input_audio structure
Encoding: Handles base64 encoding from Uint8Array or extracts from data URLs
Validation: Ensures only supported formats reach the API

Generated structure:

{
  type: 'input_audio',
  input_audio: {
    data: '<base64-encoded-audio>',
    format: 'mp3' | 'wav'
  },
  cache_control?: { type: 'ephemeral' }  // Optional caching support
}
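The encoding step and the generated structure can be sketched together. The helper names `toBase64` and `toInputAudioPart` and the `InputAudioPart` interface are hypothetical; the PR itself relies on existing utilities such as getBase64FromDataUrl, whose internals are not shown here. `btoa` is assumed to be available as a global.

```typescript
// Illustrative type; the PR adds its own ChatCompletionContentPartInputAudio interface.
interface InputAudioPart {
  type: 'input_audio';
  input_audio: { data: string; format: 'mp3' | 'wav' };
}

// Hypothetical helper: accepts raw bytes, a base64 data URL, or a raw base64 string.
function toBase64(data: Uint8Array | string): string {
  if (typeof data === 'string') {
    const marker = ';base64,';
    const markerIndex = data.startsWith('data:') ? data.indexOf(marker) : -1;
    // Data URL: keep only the payload after ";base64,". Otherwise assume raw base64.
    return markerIndex === -1 ? data : data.slice(markerIndex + marker.length);
  }
  // Build a binary string byte by byte, then encode it.
  let binary = '';
  data.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

function toInputAudioPart(
  data: Uint8Array | string,
  format: 'mp3' | 'wav',
): InputAudioPart {
  return { type: 'input_audio', input_audio: { data: toBase64(data), format } };
}
```

Building the binary string in a loop avoids spreading a large array into `String.fromCharCode`, which can exceed the call-stack limit for long recordings.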

🧪 Comprehensive Test Coverage

Added 9 new test cases covering:

✅ Uint8Array to base64 conversion for MP3 and WAV
✅ Format normalization (audio/mpeg → mp3, audio/x-wav → wav, audio/mp3 → mp3)
✅ Base64 data URL extraction
✅ Raw base64 string handling
✅ Cache control support for audio parts
✅ Proper error handling for unsupported formats
✅ Helpful error for external URLs

All 26 tests in convert-to-openrouter-chat-messages.test.ts pass successfully.

🔧 Technical Improvements

Type Safety: Added ChatCompletionContentPartInputAudio interface with proper TypeScript types
Code Quality: Fixed pre-existing TypeScript errors by adding const assertions to Set literals
Maintainability: Used regex matching in tests for more flexible error message validation
Documentation: All error messages link directly to OpenRouter's audio documentation
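The regex-matching approach mentioned above can be illustrated without a test framework. Everything in this sketch is hypothetical: `rejectOgg` is a stand-in for the code under test, and `throwsMatching` is a stand-in for whatever assertion helper the test suite actually uses.

```typescript
// Stand-in for the code under test; the message mirrors the error text quoted earlier.
function rejectOgg(): never {
  throw new Error(
    'Unsupported audio format: "audio/ogg"\n\n' +
      'OpenRouter only supports MP3 and WAV audio formats.',
  );
}

// Hypothetical assertion helper: true when fn throws an Error whose message matches pattern.
function throwsMatching(fn: () => unknown, pattern: RegExp): boolean {
  try {
    fn();
  } catch (error) {
    return error instanceof Error && pattern.test(error.message);
  }
  return false;
}

// Matching only the stable prefix keeps the check valid if the guidance text is reworded.
const matchesPrefix = throwsMatching(rejectOgg, /Unsupported audio format/);
```

Compared with an exact string comparison, a pattern like this does not break when the multi-line guidance or the documentation link changes.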

📖 Usage Example

import { openrouter } from '@openrouter/ai-sdk-provider';
import { generateText } from 'ai';
import fs from 'fs';

const audioFile = fs.readFileSync('speech.mp3');

const result = await generateText({
  model: openrouter('openai/gpt-4o-audio-preview'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is being said in this audio?' },
      { 
        type: 'file',
        data: audioFile,  // Buffer (a Uint8Array subclass)
        mediaType: 'audio/mpeg'  // or 'audio/mp3'
      }
    ]
  }]
});

🚀 Impact

This enhancement makes OpenRouter's audio capabilities immediately accessible to all AI SDK users, enabling new use cases like:

Voice message analysis
Audio transcription workflows
Multimodal applications combining audio with text and images
Accessibility features for audio content

📝 Notes

External audio URLs are not currently supported by OpenRouter's API; the error message explains how to pass the audio as bytes instead
Only MP3 and WAV formats are supported per OpenRouter's specification
The implementation follows the same patterns as existing image and document handling for consistency

This PR maintains backward compatibility and requires no changes to existing code. Audio support is purely additive and opt-in through the file content type.

Add native support for audio file inputs using OpenRouter's input_audio format, enabling models to process audio content directly.
* Add ChatCompletionContentPartInputAudio type for input_audio format
* Support mp3 and wav audio formats with automatic format normalization
* Handle base64-encoded audio data from data URLs
* Validate audio formats and provide clear error messages
* Maintain cache_control support for audio parts
* Throw helpful error for unsupported external audio URLs
The implementation checks for audio/* media types and converts them to OpenRouter's expected input_audio format with base64 data and format specification.

Fix TypeScript type errors by adding const assertions to Set constructors throughout the codebase. This ensures the Set type matches the expected template literal type `Set<\`${string}:\`>` in the isUrl function signature.
* Add const assertion to audio input URL check
* Add const assertion to PDF file URL check
* Add const assertion to file-url-utils URL check

Add test coverage for audio input handling including format normalization, base64 conversion, error handling, and cache control support.
* Test mp3 and wav format conversion
* Test audio/x-wav normalization to wav
* Test base64 data URL extraction
* Test raw base64 string handling
* Test error for external audio URLs
* Test error for unsupported audio formats
* Test cache control with audio inputs
All tests pass successfully.

Improve user experience with clearer, more actionable error messages for audio handling that guide developers to the correct solution.
* Add helpful multi-line error messages with specific solutions
* Include supported MIME types (audio/mpeg, audio/mp3, audio/wav, audio/x-wav)
* Clarify that audio/mp3 and audio/mpeg both map to mp3 format
* Add support for audio/mp3 and audio/wave MIME types
* Link to OpenRouter documentation for audio features
* Provide step-by-step instructions for URL error case
* Update tests to use regex matching for error messages
* Add test for audio/mp3 MIME type normalization
All tests pass (26 total in convert-to-openrouter-chat-messages.test.ts)

louisgv · 2025-11-26T16:19:27Z

src/chat/convert-to-openrouter-chat-messages.ts

+                if (part.mediaType?.startsWith('audio/')) {
+                  const fileData = getFileUrl({
+                    part,
+                    defaultMediaType: 'audio/mpeg',
+                  });
+
+                  // Check if fileData is a URL - if so, we need to download it
+                  let base64Data: string;
+                  if (
+                    isUrl({
+                      url: fileData,
+                      protocols: new Set(['http:', 'https:'] as const),
+                    })
+                  ) {
+                    // For URLs, OpenRouter's input_audio doesn't support URLs directly
+                    // We need to download and convert to base64
+                    // For now, we'll throw an error to indicate this limitation
+                    throw new Error(
+                      `Audio files cannot be provided as URLs.\n\n` +
+                        `OpenRouter requires audio to be base64-encoded. Please:\n` +
+                        `1. Download the audio file locally\n` +
+                        `2. Read it as a Buffer or Uint8Array\n` +
+                        `3. Pass it as the data parameter\n\n` +
+                        `The AI SDK will automatically handle base64 encoding.\n\n` +
+                        `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
+                    );
+                  } else {
+                    // Extract base64 data (handles both data URLs and raw base64)
+                    base64Data = getBase64FromDataUrl(fileData);
+                  }
+
+                  // Map media type to format
+                  const mediaType = part.mediaType || 'audio/mpeg';
+                  let format = mediaType.replace('audio/', '');
+
+                  // Normalize format names for OpenRouter
+                  // Common MIME types: audio/mpeg, audio/mp3 -> mp3
+                  // audio/wav, audio/x-wav, audio/wave -> wav
+                  if (format === 'mpeg' || format === 'mp3') {
+                    format = 'mp3';
+                  } else if (format === 'x-wav' || format === 'wave' || format === 'wav') {
+                    format = 'wav';
+                  }
+
+                  // Validate format - OpenRouter only supports mp3 and wav
+                  if (format !== 'mp3' && format !== 'wav') {
+                    throw new Error(
+                      `Unsupported audio format: "${mediaType}"\n\n` +
+                        `OpenRouter only supports MP3 and WAV audio formats.\n` +
+                        `• For MP3: use "audio/mpeg" or "audio/mp3"\n` +
+                        `• For WAV: use "audio/wav" or "audio/x-wav"\n\n` +
+                        `Learn more: https://openrouter.ai/docs/features/multimodal/audio`,
+                    );
+                  }
+
+                  return {
+                    type: 'input_audio' as const,
+                    input_audio: {
+                      data: base64Data,
+                      format: format as 'mp3' | 'wav',
+                    },
+                    cache_control: cacheControl,
+                  } satisfies ChatCompletionContentPart;


nit: this section should be moved into its own function, so we can reuse it. The logic looks very self-contained

louisgv

Thanks for the PR! LGTM besides the nits!


… function

Address PR review feedback to make audio handling more modular and reusable.
* Extract inline audio logic into getInputAudioData() function
* Return generic { data, format } object for flexibility
* Add comprehensive TSDoc with @throws and @example

Colocate audio conversion logic with other file/URL utilities for better discoverability and reuse.
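A rough sketch of what such an extracted utility might look like, based only on the inline diff above and the `{ data, format }` return shape named in the commit message. The real getInputAudioData signature is not shown in this thread, so the function name, parameters, and shortened error messages below are assumptions.

```typescript
// Sketch only: the name, parameters, and messages are assumptions, not the merged code.
interface InputAudioData {
  data: string;
  format: 'mp3' | 'wav';
}

function getInputAudioDataSketch(
  fileData: string, // data URL, raw base64 string, or (rejected) http(s) URL
  mediaType: string,
): InputAudioData {
  // External URLs are rejected because input_audio expects base64 data.
  if (/^https?:\/\//i.test(fileData)) {
    throw new Error('Audio files cannot be provided as URLs.');
  }

  // Same normalization as the inline version in the diff.
  const subtype = mediaType.replace('audio/', '');
  let format: 'mp3' | 'wav';
  if (subtype === 'mpeg' || subtype === 'mp3') {
    format = 'mp3';
  } else if (subtype === 'x-wav' || subtype === 'wave' || subtype === 'wav') {
    format = 'wav';
  } else {
    throw new Error(`Unsupported audio format: "${mediaType}"`);
  }

  // Strip a data URL prefix if present; otherwise treat the input as raw base64.
  const marker = ';base64,';
  const markerIndex = fileData.startsWith('data:') ? fileData.indexOf(marker) : -1;
  const data =
    markerIndex === -1 ? fileData : fileData.slice(markerIndex + marker.length);

  return { data, format };
}
```

Returning a plain `{ data, format }` object, rather than a full content part, leaves the caller free to attach cache_control or reuse the result elsewhere.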

Karavil · 2025-11-26T18:49:33Z

@louisgv Pulled out into a utility while keeping it composable for other audio utils in the future.

Karavil added 4 commits November 19, 2025 20:22

louisgv reviewed Nov 26, 2025


louisgv approved these changes Nov 26, 2025


Karavil and others added 3 commits November 26, 2025 13:45

♻️ refactor: move getInputAudioData to file-url-utils

98cbbe1

Colocate audio conversion logic with other file/URL utilities for better discoverability and reuse.

Merge branch 'main' into feat/add-audio-input-support

bd21c1d

Karavil requested a review from louisgv November 26, 2025 18:49

louisgv approved these changes Nov 26, 2025


subtleGradient merged commit fc002e0 into OpenRouterTeam:main Nov 26, 2025
2 checks passed

github-actions bot mentioned this pull request Nov 26, 2025

Version Packages #256

Merged
