Your current environment
`python collect_env.py`
`python collect_env.py` failed because this is an Nvidia L20 machine with no Ascend modules installed.
🐛 Describe the bug
Online API Inference Generates Repetitive "batis)batis)" Pattern Instead of Coherent Text
I followed the guide in "quick start".
Other information:
Offline inference (offline_inference.py) works correctly; only the online API produces the repetitive output.
source code version: tag v0.1.0rc4
Key logs:
```
curl http://localhost:7800/v1/completions -H "Content-Type: application/json" -d '{
"model": "vllm_cpu_offload",
"prompt": "Shanghai is a",
"max_tokens": 7,
"temperature": 0
}'
```
{"id":"cmpl-42e9ba776e284460a48bcbdafb848d0e","object":"text_completion","created":1764055654,"model":"vllm_cpu_offload","choices":[{"index":0,"text":"batis)batis)batis)batis","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":11,"completion_tokens":7,"prompt_tokens_details":null},"kv_transfer_params":null}(modelscope-py310)
More details are in the attached log:
log20251125.txt
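
For anyone trying to reproduce this, here is a minimal Python sketch of the same request, plus a check for the degenerate repeated output. The URL and payload are copied from the curl command above; the helper names `repeated_unit` and `fetch_completion_text` are hypothetical and only for illustration, and the response is assumed to have the same JSON shape as the one shown above.

```python
import json
import urllib.request

# Endpoint and payload copied from the curl command in this report.
URL = "http://localhost:7800/v1/completions"
PAYLOAD = {
    "model": "vllm_cpu_offload",
    "prompt": "Shanghai is a",
    "max_tokens": 7,
    "temperature": 0,
}


def repeated_unit(text, min_repeats=2):
    """Return the shortest unit that `text` repeats (last copy may be cut off), else None."""
    for n in range(1, len(text) // min_repeats + 1):
        # The text is periodic with period n if every character equals the one n positions back.
        if all(text[i] == text[i - n] for i in range(n, len(text))):
            return text[:n]
    return None


def fetch_completion_text(url=URL, payload=PAYLOAD, timeout=30):
    """POST the payload and return choices[0].text (hypothetical helper, needs a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `repeated_unit(fetch_completion_text())` should return `None` for a healthy completion; on the output reported above it returns the unit `batis)`.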