Description
Your current environment
The bug is reproducible with docker image vllm/vllm-openai:v0.12.0
services:
  vllm-gptoss-large:
    image: vllm/vllm-openai:v0.12.0
    restart: always
    shm_size: '64gb'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
    volumes:
      - ./data/hf:/data
    environment:
      - HF_TOKEN=${HF_TOKEN}
    ports:
      - 8000:8000
    command: ["openai/gpt-oss-120b",
              "--tool-call-parser", "openai",
              "--enable-auto-tool-choice",
              "--reasoning-parser", "openai_gptoss",
              "--tensor-parallel-size", "2",
              "--port", "8000",
              "--api-key", "${VLLM_API_KEY}",
              "--download_dir", "/data"]

🐛 Describe the bug
This bash script can only be executed once: a second run hangs unless the function name in the tool definition is changed to a value that has not been sent before. Without a tool definition, the POST can be sent as often as you like.
#!/bin/bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer ${VLLM_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "stream": false,
    "messages": [
      {"role": "system", "content": "Be a helpful assistant."},
      {"role": "user", "content": "Hi"},
      {"role": "assistant", "content": "How can I help you?"},
      {"role": "user", "content": "Do you like Monty Python?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "CHANGE-NAME-BEFORE-SENDING",
          "description": "Use this tool if you need to extract information from a website.",
          "parameters": {
            "type": "object",
            "properties": {
              "url": {
                "type": "string",
                "description": "The URL to search or extract information from."
              }
            },
            "required": ["url"]
          }
        }
      }
    ]
  }'
The script never finishes waiting for a response, and nvidia-smi shows the cards drawing maximum power. The vLLM logs show that tokens are being generated, so from an external point of view the LLM appears to generate tokens without stopping.
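Since a function name that the server has not yet seen makes the request succeed (as described above), one possible stopgap on the client side is to append a per-request unique suffix to every tool name. This is a minimal sketch of that workaround, not something from the original report; the helper name and suffixing scheme are my own:

```python
import copy
import uuid

def with_unique_tool_names(tools):
    """Return a deep copy of an OpenAI-style tools list whose function
    names carry a per-request unique suffix, so the server never sees a
    tool name it has already processed."""
    fresh = copy.deepcopy(tools)
    suffix = uuid.uuid4().hex[:8]
    for tool in fresh:
        tool["function"]["name"] = f'{tool["function"]["name"]}_{suffix}'
    return fresh

# Example: the original list is left untouched, the copy gets a fresh name.
tools = [{"type": "function",
          "function": {"name": "extract_website",
                       "parameters": {"type": "object"}}}]
print(with_unique_tool_names(tools)[0]["function"]["name"])  # e.g. extract_website_3f2a9c1d
```

Note that the model will then emit tool calls under the suffixed name, so the tool-dispatch code has to strip the suffix again before executing the real tool.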
This is quite weird, because when it is called with the Python SDK, it works fine, e.g.
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="http://localhost:8000/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                # Note: in this (repeatable) variant, "description" sits as a
                # sibling of "location" inside "properties", not nested under it.
                "location": {"type": "string"},
                "description": "Location and state, e.g., 'San Francisco, CA'"
            },
            "required": ["location"]
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "How is the weather in Berlin? use the tool get_weather."}],
    tools=tools,
    tool_choice="auto",
    stream=False,
)
print(response.choices[0].message)

In fact this can also be reproduced using n8n's AI Agent nodes, which are based on the TypeScript LangGraph implementation: https://github.com/n8n-io/n8n/blob/master/packages/%40n8n/nodes-langchain/nodes/agents/Agent/agents/ToolsAgent/V1/execute.ts#L34
There, too, the chat window freezes as soon as a tool is attached and the user asks a second question.
The bug really seems to be specific to this model: I tested Mistral and Qwen models and could not reproduce it with them. While debugging the issue, I noticed a sensitivity to the description field inside the tool's parameters schema. To make this clear: the following request can also be sent only once using the OpenAI Python SDK, but works again once the function name is changed:
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url=f"https://{os.getenv('API_DOMAIN')}/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "Location and state, e.g., 'San Francisco, CA'"
                },
            },
            "required": ["location"]
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "How is the weather in Berlin? use the tool get_weather."}],
    tools=tools,
    tool_choice="auto",
    stream=False,
)
print(response.choices[0].message)

Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
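For clarity, the only structural difference between the repeatable and the hanging get_weather examples above is where "description" sits inside "parameters"; this sketch (my own summary, not part of the original report) makes the diff explicit:

```python
# Variant from the first SDK example, which can be sent repeatedly:
# "description" is a sibling of "location" inside "properties".
params_ok = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "description": "Location and state, e.g., 'San Francisco, CA'",
    },
    "required": ["location"],
}

# Variant from the second SDK example, which hangs on the second send:
# "description" is nested inside the "location" property, which is the
# standard JSON Schema placement.
params_hang = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "Location and state, e.g., 'San Francisco, CA'",
        },
    },
    "required": ["location"],
}

# Only the hanging variant annotates the property itself:
print("description" in params_hang["properties"]["location"])  # True
print("description" in params_ok["properties"]["location"])    # False
```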