[BUG CLIENT]: Incorrect azure_endpoint when using the MistralAzure client for OCR models #292

@alouinisofiene

Description

Python -VV

Python 3.13.7 (main, Aug 18 2025, 19:02:43) [Clang 20.1.4 ]

Pip Freeze

annotated-types==0.7.0
anyio==4.12.0
certifi==2025.11.12
eval-type-backport==0.3.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mistralai==1.9.11
pydantic==2.12.5
pydantic-core==2.41.5
python-dateutil==2.9.0.post0
pyyaml==6.0.3
six==1.17.0
typing-extensions==4.15.0
typing-inspection==0.4.2

Reproduction Steps

  1. Create an Azure AI Foundry resource, with a project inside
  2. In the project, create a deployment of mistral-document-ai-2505 (Data Zone Standard deployment)
  3. From the page of the deployment, copy the API Key and the Target URI

Warning

The format of the target URI is currently the following:

https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr

It differs from the endpoint format given in the official Mistral documentation.

  4. Run this minimal code to call the model using a base64-encoded PDF file:
import base64
from mistralai_azure import MistralAzure

# Get the AZURE_ENDPOINT and the AZURE_API_KEY from the model deployment page (in the Azure AI Foundry portal)
AZURE_ENDPOINT = "https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr"
AZURE_API_KEY = "<YOUR_KEY_HERE>"

# Initialize the client with information from the Azure AI Foundry UI
client = MistralAzure(azure_endpoint=AZURE_ENDPOINT, azure_api_key=AZURE_API_KEY)

# Encode a PDF file in base64
with open("test_file.pdf", "rb") as pdf_file:
    encoded_document = base64.b64encode(pdf_file.read()).decode('utf-8')

# Run OCR
response = client.ocr.process(
    model="mistral-document-ai-2505",
    document={"type": "document_url", "document_url": f"data:application/pdf;base64,{encoded_document}"},
    include_image_base64=False,
)
  5. The API returns an error with the following stack trace:
Traceback (most recent call last):
  File "/Users/abcdefg/sandbox/mistral-bug/main.py", line 18, in <module>
    response = client.ocr.process(
        model="mistral-document-ai-2505",
        document={"type": "document_url", "document_url": f"data:application/pdf;base64,{encoded_document}"},
        include_image_base64=False,
    )
  File "/Users/abcdefg/sandbox/mistral-bug/.venv/lib/python3.13/site-packages/mistralai_azure/ocr.py", line 124, in process
    raise models.SDKError(
        "API error occurred", http_res.status_code, http_res_text, http_res
    )
mistralai_azure.models.sdkerror.SDKError: API error occurred: Status 404
NOT FOUND

Expected Behavior

I expect to get a successful response from the model, since it works using this curl command:

curl -X POST "https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_API_KEY" \
  -d '{
      "model": "mistral-document-ai-2505",
      "document": {
       "type": "document_url",
       "document_url": "data:application/pdf;base64,<content_of_base64_string>"
      },
      "include_image_base64": false
    }'

Additional Context

No response

Suggested Solutions

It seems to me that the issue is related to how the azure_endpoint variable is modified by the mistralai library.

Looking at this block in the sdk.py file:

        # if azure_endpoint doesn't end with `/v1` add it
        if not azure_endpoint.endswith("/"):
            azure_endpoint += "/"
        if not azure_endpoint.endswith("v1/"):
            azure_endpoint += "v1/"
        server_url = azure_endpoint

It appends v1/ to the provided endpoint unless the endpoint already ends with v1/. The resulting base_url is then passed to the Ocr class in ocr.py, which builds a POST request from that base_url plus the argument path="/ocr".

Therefore, the final (wrong) URL looks like this:

https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr/v1/ocr
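To make the failure easier to see, here is a minimal standalone sketch that mimics the endpoint handling quoted above and then appends the request path. The function name build_ocr_url and the resource name "example" are hypothetical, and the way the base URL and the path are joined is an assumption about the SDK's behaviour, not a copy of its code.

```python
def build_ocr_url(azure_endpoint: str, path: str = "/ocr") -> str:
    """Mimic the quoted sdk.py normalization, then join the request path."""
    # Same two checks as the block quoted from sdk.py
    if not azure_endpoint.endswith("/"):
        azure_endpoint += "/"
    if not azure_endpoint.endswith("v1/"):
        azure_endpoint += "v1/"
    # Assumption: base URL and path end up joined by a single slash
    return azure_endpoint.rstrip("/") + path


# Target URI in the format shown on the deployment page ("example" is a placeholder)
target_uri = "https://example.services.ai.azure.com/providers/mistral/azure/ocr"
print(build_ocr_url(target_uri))
# prints https://example.services.ai.azure.com/providers/mistral/azure/ocr/v1/ocr
```

Under these assumptions, the path segment ocr appears twice with v1 in between, which would explain the 404.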

In order to work with the actual endpoint format mentioned on the deployment page in AI Foundry, the code in sdk.py should do something like this instead:

        if azure_endpoint.endswith("/ocr"):
            azure_endpoint = azure_endpoint.rsplit("/", maxsplit=1)[0]
        server_url = azure_endpoint

This change solves the issue for the OCR models, but should be double-checked for the other non-OCR Mistral models on Azure (I don't know their endpoint formats).
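As a rough illustration of the suggested change, the sketch below isolates the suffix handling in a standalone function and checks the final URL for a few inputs. The names strip_ocr_suffix and final_ocr_url are hypothetical, the trailing-slash handling is an addition not present in the snippet above, and the path join is an assumption about how the SDK combines base URL and path. It only covers the OCR endpoint format; other models are not considered.

```python
def strip_ocr_suffix(azure_endpoint: str) -> str:
    """Drop a trailing '/ocr' so the request path is not added twice."""
    # Extra step (not in the suggested snippet): tolerate a trailing slash
    trimmed = azure_endpoint.rstrip("/")
    if trimmed.endswith("/ocr"):
        trimmed = trimmed.rsplit("/", maxsplit=1)[0]
    return trimmed


def final_ocr_url(azure_endpoint: str, path: str = "/ocr") -> str:
    # Assumption: the SDK joins the base URL and the request path like this
    return strip_ocr_suffix(azure_endpoint) + path


expected = "https://example.services.ai.azure.com/providers/mistral/azure/ocr"

# Target URI copied from the deployment page, with or without a trailing slash
assert final_ocr_url(expected) == expected
assert final_ocr_url(expected + "/") == expected

# An endpoint given without the '/ocr' suffix resolves to the same URL
assert final_ocr_url("https://example.services.ai.azure.com/providers/mistral/azure") == expected
```

With this handling, the URL that reaches the server matches the one used in the working curl command.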
