
Misc. bug: Large tokens produce incorrect error messages due to integer overflow. #17463

@ylwango613

Description


Name and Version

./build/bin/llama-cli --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 9654 96-Core Processor)
load_backend: failed to find ggml_backend_init in /data/ylwang/Projects/llama.cpp/build/bin/libggml-cpu.so
version: 7139 (923ae3c)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m 0.gguf

Problem description & steps to reproduce

PoC

import requests

a = "\n" * 2147483648  # a single message of 2^31 newline characters (just past INT32_MAX)
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": a}
        ],
        "max_tokens": 20
    }
)
print(resp.json())

Running this Python script reproduces the issue (note that it builds a request body of roughly 2 GiB).

Displayed Result

You can see that llama-server returns:

python3 1.py
{'error': {'code': 500, 'message': 'this custom template is not supported, try using --jinja', 'type': 'server_error'}}

The message claims the custom template is not supported, but the real cause is that the content is so long that the size of the formatted chat overflows a 32-bit integer.

gdb Debugging

Thread 6 "llama-server" hit Breakpoint 1, common_chat_templates_apply_legacy (tmpls=0x5030002a8c20, inputs=...) at /data/ylwang/Projects/llama.cpp/common/chat.cpp:3392
3392        int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
(gdb) p buf.size()
$2 = 2684354565
(gdb) p res
$3 = -2147483648
(gdb) n
3395        if (res < 0) {
(gdb) n
3398            throw std::runtime_error("this custom template is not supported, try using --jinja");
(gdb) n

Root Cause Analysis

In:

int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());

an int32_t is used to store the return value of llama_chat_apply_template:

    int32_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);
    ...
    return res;

Inside llm_chat_apply_template:

return dest.size();

The actual value returned is dest.size(), whose type is size_t. Once the formatted chat exceeds INT32_MAX bytes, storing that size in an int32_t wraps it to a negative number, and a negative return value is treated by the callers as an error. The value should therefore be carried in an int64_t.
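A minimal standalone sketch (not llama.cpp code) of the narrowing that gdb shows above: a size_t just past INT32_MAX becomes negative when stored in an int32_t.

#include <cstdint>
#include <cstdio>

int main() {
    // Size taken from the gdb session above: 2^31 bytes of formatted chat.
    size_t formatted_size = 2147483648ULL;
    // Narrowing to int32_t wraps modulo 2^32 on typical platforms
    // (implementation-defined before C++20), giving INT32_MIN here.
    int32_t res = (int32_t) formatted_size;
    printf("size_t value: %zu\n", formatted_size);
    printf("as int32_t  : %d\n", res);  // prints -2147483648, which callers treat as an error
    return 0;
}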

Fix Suggestion

For llama_chat_apply_template, defined at ./src/llama.cpp:336:

int32_t llama_chat_apply_template(
                              const char * tmpl,
         const struct llama_chat_message * chat,
                                  size_t   n_msg,
                                    bool   add_ass,
                                    char * buf,
                                 int32_t   length) {

Two places need to be changed:

  • Change the return type from int32_t to int64_t, so that genuine template errors can be distinguished from the false template errors caused by integer overflow.
  • Change int32_t length to int64_t length, because the argument passed in is buf.size(), which is an unsigned size_t. Realistic buffers will not exceed the int64_t range, so widening keeps the length representable while still allowing negative values to signal errors. (A sketch of the widened signature follows this list.)
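
A sketch of the widened signature (illustrative only, not a final patch; the matching declaration at ./include/llama.h:1079 would change the same way):

int64_t llama_chat_apply_template(
                              const char * tmpl,
         const struct llama_chat_message * chat,
                                  size_t   n_msg,
                                    bool   add_ass,
                                    char * buf,
                                 int64_t   length);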

In ./common/chat.cpp:3392:

int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());

res should be stored in an int64_t, so the returned size cannot wrap for any realistic input.
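
A hedged sketch of the adjusted call site (any surrounding buffer-resize logic is omitted):

int64_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(),
                                        inputs.add_generation_prompt, buf.data(), buf.size());
if (res < 0) {
    // A negative result now really means an unsupported template,
    // not a formatted size that wrapped past INT32_MAX.
    throw std::runtime_error("this custom template is not supported, try using --jinja");
}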

For llm_chat_apply_template, defined at ./src/llama-chat.cpp:225:

int32_t llm_chat_apply_template(
    llm_chat_template tmpl,
    const std::vector<const llama_chat_message *> & chat,
    std::string & dest, bool add_ass) {

Change its return type to int64_t as well, so that the final return dest.size(); is no longer narrowed.
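
An illustrative sketch of the widened helper, assuming the per-template formatting into dest stays unchanged:

int64_t llm_chat_apply_template(
    llm_chat_template tmpl,
    const std::vector<const llama_chat_message *> & chat,
    std::string & dest, bool add_ass) {
    // ... existing per-template formatting into dest is unchanged ...
    return dest.size(); // size_t widens into int64_t without wrapping for realistic sizes
}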

Use grep to ensure the fix does not affect other code

In the source tree (excluding tests and examples), run:

grep -Rn "llama_chat_apply_template" .

The only occurrences found are:

./include/llama.h:1071:    /// NOTE: This function does not use a jinja parser. It only support a pre-defined list of template. See more: https://github.com/ggml-org/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
./include/llama.h:1079:    LLAMA_API int32_t llama_chat_apply_template(
./common/chat.cpp:456:    const int res = llama_chat_apply_template(tmpl.c_str(), chat, 1, true, nullptr, 0);
./common/chat.cpp:3357:// Legacy template route (adhoc C++ implementation of known templates), forward to llama_chat_apply_template.
./common/chat.cpp:3392:    int32_t res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
./common/chat.cpp:3404:        res = llama_chat_apply_template(src.c_str(), chat.data(), chat.size(), inputs.add_generation_prompt, buf.data(), buf.size());
./src/llama.cpp:336:int32_t llama_chat_apply_template(

Here:

  • ./include/llama.h contains the public declaration
  • ./common/chat.cpp contains the call sites where res is stored
  • ./src/llama.cpp:336 is the function definition

So, on top of the definition and callers already covered above, the only additional change is to update the types in the declaration in llama.h.

Using:

grep -Rn "llm_chat_apply_template" .

The only results are:

./src/llama-chat.h:66:int32_t llm_chat_apply_template(
./src/llama.cpp:357:    int32_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);
./src/llama-chat.cpp:225:int32_t llm_chat_apply_template(

So we only need to change:

  • In ./src/llama.cpp:357, change the type of res to int64_t.
  • Change the return type of llm_chat_apply_template in ./src/llama-chat.cpp and ./src/llama-chat.h to int64_t (sketched below).
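
For completeness, the matching changes at those two remaining locations could look like this (illustrative sketch):

// src/llama-chat.h:66
int64_t llm_chat_apply_template(
    llm_chat_template tmpl,
    const std::vector<const llama_chat_message *> & chat,
    std::string & dest, bool add_ass);

// src/llama.cpp:357
int64_t res = llm_chat_apply_template(detected_tmpl, chat_vec, formatted_chat, add_ass);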

First Bad Commit

No response

Relevant log output


    Labels

    bug (Something isn't working), good first issue (Good for newcomers), low severity (Used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches), server
