Name and Version
llama-server built from commit c31fc8b.
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
It was mentioned in the discussion of the Codestral model that the changes in #10023 made the /infill endpoint add a <bos> token incorrectly. I'm not sure this is actually a regression from that PR, since before the change prompt was a required field, but either way the current behavior doesn't look correct.
To reproduce, you can use this model: https://huggingface.co/bartowski/codegemma-2b-GGUF with the following request:

```sh
curl -XPOST "localhost:8080/infill" -d '{"input_prefix": "1, ", "input_suffix": ", 5"}' -H "Content-Type: application/json"
```
The response contains two <bos> tokens: `"prompt": "<bos><|fim_prefix|> 1, <bos><|fim_suffix|> , 5<|fim_middle|>"`.
According to the CodeGemma README, there shouldn't be any <bos> tokens in the prompt at all (see the first code snippet there).
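To make the difference concrete, here is the prompt the server returns for the request above next to what I would expect based on that README snippet (the "expected" line is my own reconstruction, not server output):

```
returned: <bos><|fim_prefix|> 1, <bos><|fim_suffix|> , 5<|fim_middle|>
expected: <|fim_prefix|> 1, <|fim_suffix|> , 5<|fim_middle|>
```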
I don't see any discussion of special tokens in the mentioned PR, so I assume this wasn't intentional. Feel free to close this issue if I'm wrong.
The fix is simply to change the flag on this line: https://github.com/ggerganov/llama.cpp/blob/b56f079e28fda692f11a8b59200ceb815b05d419/examples/server/server.cpp#L3800
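I haven't traced the exact call on that line, so the snippet below is only a sketch of the kind of one-flag change I mean, assuming the line tokenizes the infill text with an add_special-style boolean (shown here via the common_tokenize helper from common.h; the real call in server.cpp may go through a different wrapper, and ctx / input_prefix are placeholder names, not the actual variables there):

```cpp
#include "common.h" // common_tokenize; ctx and input_prefix are placeholders, not server.cpp names

// Before (hypothetical): add_special == true makes the tokenizer prepend <bos>,
// even though the FIM template already supplies every special token.
// auto prefix_tokens = common_tokenize(ctx, input_prefix, /* add_special */ true,  /* parse_special */ true);

// Proposed: no implicit <bos>; <|fim_prefix|> etc. are still parsed as special tokens.
auto prefix_tokens = common_tokenize(ctx, input_prefix, /* add_special */ false, /* parse_special */ true);
```

With a change along those lines, the prompt returned for the repro above should match the "expected" line shown earlier.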
First Bad Commit
Relevant log output
No response