Description
Python -VV
Python 3.13.7 (main, Aug 18 2025, 19:02:43) [Clang 20.1.4 ]
Pip Freeze
annotated-types==0.7.0
anyio==4.12.0
certifi==2025.11.12
eval-type-backport==0.3.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
invoke==2.2.1
mistralai==1.9.11
pydantic==2.12.5
pydantic-core==2.41.5
python-dateutil==2.9.0.post0
pyyaml==6.0.3
six==1.17.0
typing-extensions==4.15.0
typing-inspection==0.4.2
Reproduction Steps
- Create an Azure AI Foundry resource, with a project inside
- In the project, create a deployment of mistral-document-ai-2505 (Data Zone Standard deployment)
- From the page of the deployment, copy the API Key and the Target URI
Warning
The format of the target URI is currently the following:
https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr
This differs from the format given in the official Mistral documentation.
- Run this minimal code to call the model using a base64-encoded PDF file
import base64
from mistralai_azure import MistralAzure
# Get the AZURE_ENDPOINT and the AZURE_API_KEY from the model deployment page (in the Azure AI Foundry portal)
AZURE_ENDPOINT = "https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr"
AZURE_API_KEY = "<YOUR_KEY_HERE>"
# Initialize the client with information from the Azure AI Foundry UI
client = MistralAzure(azure_endpoint=AZURE_ENDPOINT, azure_api_key=AZURE_API_KEY)
# Encode a PDF file in base64
with open("test_file.pdf", "rb") as pdf_file:
    encoded_document = base64.b64encode(pdf_file.read()).decode('utf-8')
# Run OCR
response = client.ocr.process(
    model="mistral-document-ai-2505",
    document={"type": "document_url", "document_url": f"data:application/pdf;base64,{encoded_document}"},
    include_image_base64=False,
)
- The API returns an error with the following stack trace:
Traceback (most recent call last):
  File "/Users/abcdefg/sandbox/mistral-bug/main.py", line 18, in <module>
    response = client.ocr.process(
        model="mistral-document-ai-2505",
        document={"type": "document_url", "document_url": f"data:application/pdf;base64,{encoded_document}"},
        include_image_base64=False,
    )
  File "/Users/abcdefg/sandbox/mistral-bug/.venv/lib/python3.13/site-packages/mistralai_azure/ocr.py", line 124, in process
    raise models.SDKError(
        "API error occurred", http_res.status_code, http_res_text, http_res
    )
mistralai_azure.models.sdkerror.SDKError: API error occurred: Status 404
NOT FOUND
Expected Behavior
I expect to get a successful response from the model, since it works using this curl command:
curl -X POST "https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_API_KEY" \
  -d '{
    "model": "mistral-document-ai-2505",
    "document": {
      "type": "document_url",
      "document_url": "data:application/pdf;base64,<content_of_base64_string>"
    },
    "include_image_base64": false
  }'
Additional Context
No response
Suggested Solutions
It seems to me that the issue is related to how the azure_endpoint variable is modified by the mistralai library.
Looking at this block in the sdk.py file:
# if azure_endpoint doesn't end with `/v1` add it
if not azure_endpoint.endswith("/"):
    azure_endpoint += "/"
if not azure_endpoint.endswith("v1/"):
    azure_endpoint += "v1/"
server_url = azure_endpoint
It simply appends v1/ to the provided endpoint. This new base_url is then passed to the Ocr class in ocr.py, which builds a POST request from that base_url plus another argument, path="/ocr".
Therefore, the final (wrong) URL looks like this:
https://<YOUR_AI_FOUNDRY_RESOURCE_NAME>.services.ai.azure.com/providers/mistral/azure/ocr/v1/ocr
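The URL construction described above can be reproduced with plain string operations. This is only a minimal sketch, not the SDK's actual code path: `sdk_base_url` mirrors the block quoted from sdk.py, while `final_url` and the `example-resource` host name are hypothetical, and joining the base URL and path="/ocr" with a single slash is an assumption.

```python
def sdk_base_url(azure_endpoint: str) -> str:
    """Mirror the normalization quoted from sdk.py: force the URL to end with 'v1/'."""
    if not azure_endpoint.endswith("/"):
        azure_endpoint += "/"
    if not azure_endpoint.endswith("v1/"):
        azure_endpoint += "v1/"
    return azure_endpoint


def final_url(base_url: str, path: str = "/ocr") -> str:
    """Hypothetical helper: join base URL and request path with exactly one slash (assumed)."""
    return base_url.rstrip("/") + path


# Placeholder resource name, same shape as the Target URI shown by Azure AI Foundry
endpoint = "https://example-resource.services.ai.azure.com/providers/mistral/azure/ocr"

print(final_url(sdk_base_url(endpoint)))
# https://example-resource.services.ai.azure.com/providers/mistral/azure/ocr/v1/ocr
```

The printed URL has the same shape as the wrong one above, which is consistent with the 404 response.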
In order to work with the actual endpoint format mentioned on the deployment page in AI Foundry, the code in sdk.py should do something like this instead:
if azure_endpoint.endswith("/ocr"):
    azure_endpoint = azure_endpoint.rsplit("/", maxsplit=1)[0]
server_url = azure_endpoint
This change solves the issue for the OCR models, but should be double-checked for the other non-OCR Mistral models on Azure (I don't know their endpoint formats).
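To sanity-check the suggested change in isolation, here is a small sketch that wraps it in a function and rebuilds the request URL. The names `proposed_base_url`, `example-resource`, and the `example.invalid` URL are hypothetical, and appending "/ocr" directly to the base URL is an assumption about how the Ocr class joins them.

```python
def proposed_base_url(azure_endpoint: str) -> str:
    """Apply the suggested change: drop a trailing '/ocr' instead of appending 'v1/'."""
    if azure_endpoint.endswith("/ocr"):
        azure_endpoint = azure_endpoint.rsplit("/", maxsplit=1)[0]
    return azure_endpoint


# Placeholder resource name, same shape as the Target URI from the deployment page
endpoint = "https://example-resource.services.ai.azure.com/providers/mistral/azure/ocr"

# Assumption: the OCR call appends path="/ocr" to the base URL
request_url = proposed_base_url(endpoint) + "/ocr"
print(request_url == endpoint)  # True

# A made-up non-OCR endpoint passes through unchanged, so 'v1/' is no longer added
other = "https://example.invalid/some/other/path"
print(proposed_base_url(other) == other)  # True
```

The second check shows the side effect to verify for other models: with this change, endpoints that relied on the automatic `v1/` suffix would no longer receive it.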