The way OpenAIChatModel sends input audio is incompatible with Qwen Omni API #3530

@mkroen

Description

Question

According to the Qwen Omni documentation, audio input must be formatted like this:

messages = [
  {
    "role": "user",
    "content": [
      {
        "input_audio": {
          "data": "data:;base64,{base64_audio}",
          "format": "wav"
        },
        "type": "input_audio"
      },
      {
        "text": "prompt",
        "type": "text"
      }
    ]
  }
]
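For reference, a part in that shape can be built from raw audio bytes with just the standard library. This is a minimal sketch (the helper name `qwen_audio_part` is mine, not part of any library); the key point is the `data:;base64,` prefix that Qwen expects in front of the base64 payload:

```python
import base64

def qwen_audio_part(audio_bytes: bytes, fmt: str = "wav") -> dict:
    """Build a Qwen Omni input_audio content part from raw audio bytes.

    Qwen expects the base64 payload to carry a "data:;base64," prefix,
    unlike the plain base64 string in OpenAI's chat-completions schema.
    """
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "type": "input_audio",
        "input_audio": {"data": f"data:;base64,{b64}", "format": fmt},
    }
```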

I am using BinaryContent in Pydantic AI like this:

response = await agent.run(
    user_prompt=[
        BinaryContent(
            data=base64.b64decode(audio_base64),
            media_type="audio/wav",
        ),
        self.prompt,
    ],
)

But the serialized content becomes:

{"data": "{base64_audio}", "format": "wav"}

The generated output is missing the required prefix:

data:;base64,

This means the final payload is not compatible with Qwen Omni’s expected input format.

Question

Is there a correct way to:

  1. Make BinaryContent output the audio in the required data:;base64,{...} format?
  2. Or hook into the serialization layer so I can manually prepend the prefix?
  3. Failing that, is there a recommended workaround or an official way to customize the content-part format?
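In case it helps frame the question: as a stopgap (not an official Pydantic AI hook — `patch_input_audio_parts` is a hypothetical helper of mine), one could post-process the serialized OpenAI-style messages before they reach the Qwen endpoint, prepending the prefix to every `input_audio` part:

```python
def patch_input_audio_parts(messages: list[dict]) -> list[dict]:
    """Prepend Qwen's required "data:;base64," prefix to input_audio parts.

    Operates in place on an OpenAI-style messages list; parts whose data
    already starts with "data:" are left untouched.
    """
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content has no audio parts
        for part in content:
            if part.get("type") != "input_audio":
                continue
            data = part["input_audio"]["data"]
            if not data.startswith("data:"):
                part["input_audio"]["data"] = f"data:;base64,{data}"
    return messages
```

This assumes access to the request payload at some point before it is sent (e.g. an HTTP-level hook or a custom client wrapper), which is exactly the part I am unsure how to do cleanly with Pydantic AI.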

Additional Context

python==3.10.8
pydantic-ai-slim[openai]==1.21.0

LLM model: qwen3-omni-flash

Labels: question (further information is requested)
