[Bug]: Online API Inference Generates Repetitive "batis)batis)" Pattern Instead of Coherent Text #409

### Your current environment

<details>
<summary> `python collect_env.py` </summary>

```text
python collect_env.py failed because this is an NVIDIA L20 machine with no Ascend modules
```

</details>


### 🐛 Describe the bug

Online API inference generates a repetitive "batis)batis)" pattern instead of coherent text.
I followed the "quick start" guide.

Other information:
Offline inference (offline_inference.py) works correctly.
Source code version: tag v0.1.0rc4


Key logs:
```text
curl http://localhost:7800/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "vllm_cpu_offload",
        "prompt": "Shanghai is a",
        "max_tokens": 7,
        "temperature": 0
    }'
```
{"id":"cmpl-42e9ba776e284460a48bcbdafb848d0e","object":"text_completion","created":1764055654,"model":"vllm_cpu_offload","choices":[{"index":0,"text":"batis)batis)batis)batis","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":11,"completion_tokens":7,"prompt_tokens_details":null},"kv_transfer_params":null}

More details are in the attached log:
[log20251125.txt](https://github.com/user-attachments/files/23742300/log20251125.txt)
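
For anyone trying to reproduce this, the request and the symptom above can be sketched in Python. This is only a rough triage aid, not part of the project: `build_request`, `repeated_unit`, and `check_server` are hypothetical helper names. The URL, model name, prompt, and parameters are copied from the curl command in this report. The check assumes the degenerate output is a short unit repeated from the start of the text.

```python
import json
import urllib.request


def build_request(base_url="http://localhost:7800"):
    """Build the same POST request as the curl command in this report."""
    payload = {
        "model": "vllm_cpu_offload",
        "prompt": "Shanghai is a",
        "max_tokens": 7,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def repeated_unit(text, min_repeats=3):
    """Return the shortest prefix that, repeated, reproduces the whole text.

    The prefix must occur at least min_repeats full times; a trailing
    partial repeat (such as the final "batis") is allowed.
    Returns None when no such prefix exists.
    """
    for size in range(1, len(text) // min_repeats + 1):
        unit = text[:size]
        repeats = len(text) // size + 1
        if (unit * repeats)[: len(text)] == text:
            return unit
    return None


def check_server(base_url="http://localhost:7800"):
    """Send the request to a running server and report any repeated unit.

    Assumes the response has the shape shown in this report
    (choices[0].text). Not called automatically.
    """
    with urllib.request.urlopen(build_request(base_url)) as resp:
        text = json.load(resp)["choices"][0]["text"]
    return text, repeated_unit(text)
```

Calling `check_server()` against the online server should return the completion text together with the repeating unit, if any. Running `repeated_unit` on the text returned by `offline_inference.py` gives a quick comparison between the two paths.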


