Description
Name and Version
./build/bin/llama-cli --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 9654 96-Core Processor)
load_backend: failed to find ggml_backend_init in /data/ylwang/Projects/llama.cpp/build/bin/libggml-cpu.so
version: 7139 (923ae3c)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m 0.gguf
Problem description & steps to reproduce
PoC
import requests

a = '\n' * 2147483648
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": a}
        ],
        "max_tokens": 20
    }
)
print(resp.json())
Running this Python file will reproduce the issue.
Displayed Result
You can see that llama-server returns:
python3 1.py
{'error': {'code': 500, 'message': 'this custom template is not supported, try using --jinja', 'type': 'server_error'}}
This message claims the chat template is invalid, but the real cause is that the formatted prompt is so long that its length overflows a 32-bit integer.
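As a minimal standalone illustration (this is not llama.cpp code; the constant is the buf.size() value seen in the gdb session below), narrowing such a length to int32_t produces a negative number:

#include <cstdint>
#include <cstdio>

int main() {
    size_t len = 2684354565ULL;          // buf.size() from the gdb session below; > INT32_MAX
    int32_t narrowed = (int32_t) len;    // wraps around on typical two's-complement targets
    printf("size_t  : %zu\n", len);
    printf("int32_t : %d\n", narrowed);  // negative, so the caller reports a template error
    return 0;
}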
gdb Debugging
Thread 6 "llama-server" hit Breakpoint 1, common_chat_templates_apply_legacy (tmpls=0x5030002a8c20, inputs=...) at /data/ylwang/Projects/llama.cpp/common/chat.cpp:3392
3392 int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
(gdb) p buf.size()
$2 = 2684354565
(gdb) p res
$3 = -2147483648
(gdb) n
3395 if (res < 0) {
(gdb) n
3398 throw std::runtime_error("this custom template is not supported, try using --jinja");
(gdb) n
Root Cause Analysis
In:
int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
an int32_t is used to store the return value of llama_chat_apply_template. That function, in turn, does the same internally:
int32_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);
...
return res;
Inside llm_chat_apply_template:
return dest.size();
The value actually being returned is dest.size(), whose type is size_t. Once it exceeds INT32_MAX, storing it in an int32_t wraps it to a negative number, and every caller treats a negative return value as an error. It should be held in an int64_t instead.
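A self-contained model of that type flow (simplified stand-in functions, not the real llama.cpp sources; the 2^31-character string mirrors the PoC prompt):

#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for llm_chat_apply_template: fills `dest` and returns its size.
static int32_t model_llm_chat_apply_template(std::string & dest) {
    dest.assign((size_t) 2147483648ULL, '\n');  // 2^31 characters, as in the PoC
    return dest.size();                         // size_t silently narrowed to int32_t
}

// Stand-in for llama_chat_apply_template: forwards the (already negative) result.
static int32_t model_llama_chat_apply_template(std::string & formatted_chat) {
    int32_t res = model_llm_chat_apply_template(formatted_chat);
    return res;  // negative, indistinguishable from a genuine template error
}

int main() {
    std::string formatted_chat;
    int32_t res = model_llama_chat_apply_template(formatted_chat);
    if (res < 0) {
        // The branch common/chat.cpp:3395 takes, producing the misleading message.
        printf("res = %d -> \"this custom template is not supported, try using --jinja\"\n", res);
    }
    return 0;
}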
Fix Suggestion
For the function defined at ./src/llama.cpp:336:
int32_t llama_chat_apply_template(
const char * tmpl,
const struct llama_chat_message * chat,
size_t n_msg,
bool add_ass,
char * buf,
int32_t length) {
Two places need to be changed (a sketch of the resulting signature follows this list):
- Change the return type from int32_t to int64_t, so that genuine template errors can be told apart from the false "unsupported template" reports caused by integer overflow.
- Change int32_t length to int64_t length. The argument passed in is buf.size(), an unsigned value; real-world sizes will not exceed int64_t, so widening the parameter prevents a large buffer size from being reinterpreted as a negative length.
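A sketch of the suggested signature and the corresponding internal change. The body is abbreviated and reconstructed from the snippets quoted in this report, so treat its details as assumptions rather than the actual patch:

int64_t llama_chat_apply_template(
                 const char * tmpl,
    const struct llama_chat_message * chat,
                       size_t n_msg,
                         bool add_ass,
                         char * buf,
                      int64_t length) {
    // ... template detection and chat_vec construction as before ...
    std::string formatted_chat;
    int64_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);
    if (res < 0) {
        return res;  // genuine "unknown/unsupported template" error
    }
    if (buf && length > 0) {
        // copy at most `length` bytes into the caller's buffer
        // (the copy details of the existing code are assumed here)
        strncpy(buf, formatted_chat.c_str(), (size_t) length);
    }
    return res;  // full formatted length, now representable beyond INT32_MAX
}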
In ./common/chat.cpp:3392:
int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
res should be stored in an int64_t, which makes overflow effectively impossible for realistic prompt sizes (see the sketch below).
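A sketch of the adjusted call site in common/chat.cpp. The surrounding logic is abbreviated, and alloc_size is a placeholder name for whatever initial size the existing code uses; the grow-and-retry step is the one the existing code at chat.cpp:3404 already performs:

std::vector<char> buf(alloc_size);  // alloc_size: placeholder for the existing initial size
int64_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(),
                                        inputs.add_generation_prompt, buf.data(), buf.size());
if (res < 0) {
    throw std::runtime_error("this custom template is not supported, try using --jinja");
}
if ((size_t) res > buf.size()) {
    // Buffer too small: grow it and apply the template again, as chat.cpp:3404 does,
    // now without the risk of the length wrapping negative.
    buf.resize(res);
    res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(),
                                    inputs.add_generation_prompt, buf.data(), buf.size());
}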
For the function defined at ./src/llama-chat.cpp:225:
int32_t llm_chat_apply_template(
llm_chat_template tmpl,
const std::vector<const llama_chat_message *> & chat,
std::string & dest, bool add_ass) {
Change its return type to int64_t.
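A sketch of the widened definition (the per-template formatting body is unchanged and omitted here):

int64_t llm_chat_apply_template(
    llm_chat_template tmpl,
    const std::vector<const llama_chat_message *> & chat,
    std::string & dest, bool add_ass) {
    // ... existing per-template formatting that builds the prompt into `dest` ...
    return dest.size();  // size_t now fits the wider int64_t return type for any realistic prompt
}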
Use grep to ensure the fix does not affect other code
From the repository root, excluding the tests and examples directories, run:
grep -Rn "llama_chat_apply_template" .
The occurrences found are only:
./include/llama.h:1071: /// NOTE: This function does not use a jinja parser. It only support a pre-defined list of template. See more: https://github.com/ggml-org/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
./include/llama.h:1079: LLAMA_API int32_t llama_chat_apply_template(
./common/chat.cpp:456: const int res = llama_chat_apply_template(tmpl.c_str(), chat, 1, true, nullptr, 0);
./common/chat.cpp:3357:// Legacy template route (adhoc C++ implementation of known templates), forward to llama_chat_apply_template.
./common/chat.cpp:3392: int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
./common/chat.cpp:3404: res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
./src/llama.cpp:336:int32_t llama_chat_apply_template(
Here:
- ./include/llama.h is the declaration.
- ./common/chat.cpp is where the res variable issue occurs.
- ./src/llama.cpp:336 is where the function is defined.
So, beyond the call site and definition already covered above, the only additional change is updating the types in the llama.h declaration.
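A sketch of the matching update to the public declaration in include/llama.h (same parameter list as shown above; LLAMA_API as in the existing header):

LLAMA_API int64_t llama_chat_apply_template(
                          const char * tmpl,
     const struct llama_chat_message * chat,
                                size_t n_msg,
                                  bool add_ass,
                                  char * buf,
                               int64_t length);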
Using:
grep -Rn "llm_chat_apply_template" .
The results are only:
./src/llama-chat.h:66:int32_t llm_chat_apply_template(
./src/llama.cpp:357: int32_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);
./src/llama-chat.cpp:225:int32_t llm_chat_apply_template(
So we only need to change:
- ./src/llama.cpp:357 — change res to int64_t.
- ./src/llama-chat.cpp and ./src/llama-chat.h — change the return type of llm_chat_apply_template to int64_t (sketched below).
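A sketch of these remaining changes (abbreviated; the declaration mirrors the definition quoted earlier):

// src/llama-chat.h — widen the declaration to match the definition:
int64_t llm_chat_apply_template(
    llm_chat_template tmpl,
    const std::vector<const llama_chat_message *> & chat,
    std::string & dest, bool add_ass);

// src/llama.cpp:357 — store the result in a 64-bit integer as well:
int64_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);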
First Bad Commit
No response