Is there a way to turn off tool-calling JSON unicode escaping for llama-server? #17517
Unanswered · TkskKurumi asked this question in Q&A
I'm using llama-server to serve the model, the OpenAI Python client on the client side, and Chinese prompts. (The server launch command line and the client Python source are attached at the end.)
The screenshots show a generation calling get_weather(city="北京"), followed by multi-turn requests for other cities' weather. In the first round the model generates the normal "北京", but the response sent to the client is converted to "\uxxxx" escapes. That first-round tool message containing "\uxxxx" is then fed back as context for the second round, so the model in-context-learns to generate "\uxxxx" for tool arguments from then on. This can lead to a catastrophic accuracy drop, since generating unicode hex escapes is much more difficult for the model than generating the actual Chinese tokens.
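For context, this escaping is just standard ASCII-safe JSON serialization; Python's own json module reproduces both forms (a minimal illustration of the escaping itself, independent of llama-server):

```python
import json

args = {"city": "北京"}

# Default serialization escapes non-ASCII characters, matching what the
# client receives in the first round:
print(json.dumps(args))                      # {"city": "\u5317\u4eac"}

# With ensure_ascii=False the original characters are kept:
print(json.dumps(args, ensure_ascii=False))  # {"city": "北京"}
```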
Thank you very much for your time and assistance. Any insights or suggestions you might have would be greatly appreciated.
Attachments:
- client.py
- client-wa.py (workaround client; see the sketch after this list)
- serve0.log (server log file for the run)
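One plausible client-side workaround (the actual client-wa.py is not reproduced here) is to round-trip each tool call's arguments through json with ensure_ascii=False before appending the assistant message back to the conversation. A minimal sketch, assuming messages are plain dicts in the OpenAI chat-completions shape; the helper names are illustrative, not taken from the attachment:

```python
import json

def unescape_tool_arguments(arguments: str) -> str:
    # Round-trip the arguments string so "\uXXXX" escapes become
    # literal characters before the message is fed back as context.
    try:
        return json.dumps(json.loads(arguments), ensure_ascii=False)
    except (json.JSONDecodeError, TypeError):
        return arguments  # leave malformed arguments untouched

def normalize_assistant_message(message: dict) -> dict:
    # Rewrite every tool call on the assistant message in place, so the
    # history appended for the next turn contains "北京" rather than escapes.
    for call in message.get("tool_calls") or []:
        fn = call.get("function") or {}
        if "arguments" in fn:
            fn["arguments"] = unescape_tool_arguments(fn["arguments"])
    return message
```

Applied to each assistant message before it is appended to the running conversation, this keeps the escapes out of the second-round context.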
Replies: 1 comment

There is no server-side option to turn this off. This is most likely a regression introduced in #16526 from setting …