Add configurable audio encoding for OpenAI models (Data URI support) #3596
base: main
Conversation
This commit introduces `openai_audio_input_encoding` to `OpenAIModelProfile`, allowing users to choose between `'base64'` (default) and `'uri'` encoding for audio inputs. This addresses compatibility issues with providers like Qwen Omni that require Data URI format for audio data.

Key changes:
- Added `openai_audio_input_encoding` to `OpenAIModelProfile`.
- Updated `OpenAIChatModel._map_user_prompt` to respect the configured encoding for `BinaryContent` and `AudioUrl`.
- Added new tests in `tests/models/test_openai_audio.py` covering both encoding modes.
```python
profile = OpenAIModelProfile.from_profile(self.profile)
if profile.openai_audio_input_encoding == 'uri':
    format_to_mime = {'wav': 'audio/wav', 'mp3': 'audio/mpeg'}
    mime_type = format_to_mime.get(
```
We can use item.media_type right?
Yes, good point: I can use `item.media_type` here instead of maintaining my own `format_to_mime` mapping. I'll update the `AudioUrl` handling in `_map_user_prompt` to construct the data URI using `item.media_type`, with a simple fallback if it's missing.
```python
openai_chat_supports_web_search: bool = False
"""Whether the model supports web search in Chat Completions API."""

openai_audio_input_encoding: Literal['base64', 'uri'] = 'base64'
```
This is specific to `OpenAIChatModel` and doesn't affect `OpenAIResponsesModel`, so let's prefix with `openai_chat_`.
Makes sense: this is only used by `OpenAIChatModel`. I'll rename the profile field to `openai_chat_audio_input_encoding` and update the chat mapping to use it, so it's clearly scoped to Chat Completions and doesn't imply anything about `OpenAIResponsesModel`.
| """The encoding to use for audio input. | ||
| - `'base64'`: Raw base64 encoded string. (Default, used by OpenAI) | ||
| - `'uri'`: Data URI (e.g. `data:audio/wav;base64,...`). (Used by Qwen Omni) |
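The two options above amount to a simple branch when the audio payload is serialized. A minimal sketch, assuming a stand-in `Profile` dataclass in place of the real `OpenAIModelProfile` (the field name is taken from the diff; everything else is illustrative):

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Profile:
    # Hypothetical stand-in for OpenAIModelProfile; only the field
    # from the diff above is modeled here.
    openai_audio_input_encoding: Literal['base64', 'uri'] = 'base64'


def encode_audio_input(data_b64: str, mime: str, profile: Profile) -> str:
    """Return the audio payload in the configured encoding.

    'uri' wraps the base64 string in a data URI; 'base64' (the default)
    passes the raw base64 string through unchanged.
    """
    if profile.openai_audio_input_encoding == 'uri':
        return f'data:{mime};base64,{data_b64}'
    return data_b64
```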
We should still make it so that this is used automatically for Qwen Omni. If that's only a requirement of Qwen's own ChatCompletions-compatible API, we may want a new provider class that can define its own `model_profile` method and be used with `OpenAIChatModel`. We shouldn't set this in the existing `qwen_model_profile` method, as Qwen can also be used with providers that probably do not have this quirk.
@DouweM , For the Qwen Omni integration specifically, I’d like to follow your suggestion and handle the Data URI requirement via a dedicated provider rather than changing the shared qwen_model_profile.
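The provider-level override could look roughly like this. A hedged sketch only: the function name `qwen_model_profile_for_dashscope` and the `'omni'` substring check are assumptions, not the merged implementation.

```python
def qwen_model_profile_for_dashscope(model_name: str) -> dict:
    """Hypothetical provider-scoped profile hook for DashScope.

    Returns a profile dict that enables data URI audio encoding only
    for Qwen Omni models, leaving other Qwen models on the default.
    """
    profile = {'openai_chat_audio_input_encoding': 'base64'}
    if 'omni' in model_name.lower():
        # Qwen Omni's OpenAI-compatible API expects data URIs for audio.
        profile['openai_chat_audio_input_encoding'] = 'uri'
    return profile
```

Keeping this in a dedicated provider (rather than the shared `qwen_model_profile`) means the quirk only applies when the model is actually served through Qwen's own API.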
@Pavanmanikanta98 Thanks, makes sense. It should be just
…i models

- Add `QwenProvider` for DashScope OpenAI-compatible API
- Rename `openai_audio_input_encoding` to `openai_chat_audio_input_encoding`
- Use `item.media_type` for Data URI MIME types instead of hardcoded mapping
- Automatically set Data URI audio encoding for Qwen Omni models
- Add comprehensive tests for `QwenProvider` and audio encoding
- Add Qwen documentation section to OpenAI-compatible models docs

Fixes pydantic#3530
- Include 'qwen' in the model inference options for compatibility with Qwen models.
- Set up environment variable for Qwen API key in test_examples.py to facilitate testing.

This enhances the integration of Qwen models within the existing framework.
- Add tests for initializing QwenProvider with `openai_client` and `http_client` to ensure full branch coverage.
Hi @DouweM, I've addressed your feedback: renamed to `openai_chat_audio_input_encoding`, used `item.media_type` instead of the hardcoded mapping, and added `QwenProvider` with automatic Omni audio encoding. All tests pass.
Fixes #3530
Key Changes:
encoding is 'uri'.