Conversation

@Pavanmanikanta98 Pavanmanikanta98 commented Nov 29, 2025

Fixes #3530

Key Changes:

  • Updated Profile: Added openai_audio_input_encoding: Literal['base64', 'uri'] to OpenAIModelProfile.
    • 'base64' (default): Maintains strict OpenAI compliance.
    • 'uri': Enables Data URI formatting for providers like Qwen Omni.
  • Updated Model Logic: Modified OpenAIChatModel._map_user_prompt to respect this setting.
    • For BinaryContent: Uses item.data_uri when encoding is 'uri'.
    • For AudioUrl: Manually constructs the Data URI with the correct MIME type (e.g., audio/mpeg for mp3) when
      encoding is 'uri'.
  • New Tests: Added tests/models/test_openai_audio.py covering the default base64 and URI encoding scenarios for both BinaryContent and AudioUrl inputs.

This commit introduces `openai_audio_input_encoding` to `OpenAIModelProfile`, allowing users to choose between `'base64'` (default) and `'uri'` encoding for audio inputs. This addresses compatibility issues with providers like Qwen Omni that require Data URI format for audio data.
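The difference between the two encodings can be sketched with a small helper (`encode_audio_input` is an illustrative name for this sketch, not code from the PR):

```python
import base64


def encode_audio_input(data: bytes, media_type: str, encoding: str) -> str:
    """Sketch of the two audio input encodings described above.

    - 'base64': raw base64 string, as OpenAI's Chat Completions API expects
    - 'uri': an RFC 2397 Data URI, as Qwen Omni's endpoint expects
    """
    b64 = base64.b64encode(data).decode('ascii')
    if encoding == 'uri':
        return f'data:{media_type};base64,{b64}'
    return b64
```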

```python
profile = OpenAIModelProfile.from_profile(self.profile)
if profile.openai_audio_input_encoding == 'uri':
    format_to_mime = {'wav': 'audio/wav', 'mp3': 'audio/mpeg'}
    mime_type = format_to_mime.get(
```
Collaborator

We can use item.media_type right?

@Pavanmanikanta98 Pavanmanikanta98 commented Dec 2, 2025

Yes, good point: I can use item.media_type here instead of maintaining my own format_to_mime mapping. I'll update the AudioUrl handling in _map_user_prompt to construct the data URI using item.media_type, with a simple fallback if it's missing.

```python
openai_chat_supports_web_search: bool = False
"""Whether the model supports web search in Chat Completions API."""

openai_audio_input_encoding: Literal['base64', 'uri'] = 'base64'
```
Collaborator

This is specific to OpenAIChatModel and doesn't affect OpenAIResponsesModel, so let's prefix the field name with openai_chat_.

Contributor Author

Makes sense — this is only used by OpenAIChatModel. I’ll rename the profile field to openai_chat_audio_input_encoding and update the chat mapping to use that, so it’s clearly scoped to Chat Completions and doesn’t imply anything about OpenAIResponsesModel.

"""The encoding to use for audio input.
- `'base64'`: Raw base64 encoded string. (Default, used by OpenAI)
- `'uri'`: Data URI (e.g. `data:audio/wav;base64,...`). (Used by Qwen Omni)
Collaborator

We should still make it so that this is used automatically for Qwen Omni. If that's only a requirement of Qwen's own Chat Completions-compatible API, we may want a new provider class that defines its own model_profile method and can be used with OpenAIChatModel. We shouldn't set this in the existing qwen_model_profile method, as Qwen can also be used with providers that probably don't have this quirk.

@Pavanmanikanta98

@DouweM, for the Qwen Omni integration specifically, I'd like to follow your suggestion and handle the Data URI requirement via a dedicated provider rather than changing the shared qwen_model_profile.

Concretely, my plan is:

  • Add a new provider class for Qwen's OpenAI‑compatible Chat Completions endpoint (e.g. QwenOpenAIProvider), which implements its own model_profile(self, model_name: str).
  • That model_profile will start from the standard OpenAI profile (e.g. openai_model_profile(model_name)) and then update it to set openai_chat_audio_input_encoding='uri'.
  • Users who want to talk to Qwen Omni via an OpenAI‑style API would instantiate OpenAIChatModel with this provider and the Qwen Omni base URL, and they’d automatically get Data URI audio, while other Qwen providers keep the default base64 behavior.
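The plan above can be sketched with simplified stand-ins (the dataclass below is a minimal stand-in for the real OpenAIModelProfile, and qwen_chat_model_profile and the 'omni' substring check are hypothetical, not the PR's final code):

```python
from dataclasses import dataclass, replace
from typing import Literal


@dataclass(frozen=True)
class OpenAIModelProfile:
    """Minimal stand-in for the real profile class."""
    openai_chat_audio_input_encoding: Literal['base64', 'uri'] = 'base64'


def openai_model_profile(model_name: str) -> OpenAIModelProfile:
    """Stand-in for the standard OpenAI profile factory."""
    return OpenAIModelProfile()


def qwen_chat_model_profile(model_name: str) -> OpenAIModelProfile:
    """Start from the standard OpenAI profile, then switch audio input
    to Data URI for Omni models (hypothetical 'omni' substring check)."""
    profile = openai_model_profile(model_name)
    if 'omni' in model_name:
        profile = replace(profile, openai_chat_audio_input_encoding='uri')
    return profile
```

With this shape, only the Qwen provider opts in to Data URI audio; every other provider keeps the default base64 behavior.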

DouweM commented Dec 2, 2025

@Pavanmanikanta98 Thanks, makes sense. It should be just QwenProvider, and we should also support the qwen: model name prefix, update the docs, etc. See https://ai.pydantic.dev/models/openai/#openai-compatible-models for examples; anywhere those are referenced in the code, we should add a branch for qwen as well.
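Support for the qwen: prefix might look like the following sketch (infer_provider and the set of known prefixes are hypothetical; the real code constructs provider objects rather than returning strings):

```python
def infer_provider(model_string: str) -> tuple[str, str]:
    """Split a 'provider:model' string and validate the provider prefix
    (hypothetical dispatcher for illustration only)."""
    provider, sep, model_name = model_string.partition(':')
    if not sep:
        raise ValueError(f'Expected "provider:model", got {model_string!r}')
    known = {'openai', 'deepseek', 'grok', 'qwen'}  # 'qwen' is the new branch
    if provider not in known:
        raise ValueError(f'Unknown provider prefix: {provider!r}')
    return provider, model_name
```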

pavan added 3 commits December 3, 2025 22:15
…i models

- Add QwenProvider for DashScope OpenAI-compatible API
- Rename openai_audio_input_encoding to openai_chat_audio_input_encoding
- Use item.media_type for Data URI MIME types instead of hardcoded mapping
- Automatically set Data URI audio encoding for Qwen Omni models
- Add comprehensive tests for QwenProvider and audio encoding
- Add Qwen documentation section to OpenAI-compatible models docs

Fixes pydantic#3530

- Include 'qwen' in the model inference options for compatibility with Qwen models.
- Set up environment variable for Qwen API key in test_examples.py to facilitate testing.

This enhances the integration of Qwen models within the existing framework.

- Add tests for initializing QwenProvider with `openai_client` and `http_client` to ensure full branch coverage.
@Pavanmanikanta98

Hi @DouweM, I've addressed your feedback: renamed the field to openai_chat_audio_input_encoding, used item.media_type instead of the hardcoded mapping, and added QwenProvider with automatic Omni audio encoding. All tests pass.
Ready for review.


Successfully merging this pull request may close these issues.

The way OpenAIChatModel sends input audio is incompatible with Qwen Omni API
