Description
Name and Version
latest master (96ac5a2)
Operating systems
No response
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Run any model with llama-server. The documentation says the --kv-unified argument enables the unified KV cache and that it should be off by default. However, it is enabled by default, and I could not find a way to disable it. A minimal reproduction sketch is below.
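For reference, a hypothetical invocation (the model path is a placeholder; the -c and -np values mirror the n_ctx and n_seq_max shown in the log output below):

```sh
# Placeholder model path; any GGUF model should reproduce this.
llama-server -m ./model.gguf -c 131072 -np 4 -b 2048 -ub 2048
# Then check the startup log for "kv_unified = true".
```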
First Bad Commit
Probably this one: #16736
Relevant log output
llama_context: constructing llama_context
llama_context: n_seq_max = 4
llama_context: n_ctx = 131072
llama_context: n_ctx_seq = 131072
llama_context: n_batch = 2048
llama_context: n_ubatch = 2048
llama_context: causal_attn = 1
llama_context: flash_attn = enabled
llama_context: kv_unified = true
llama_context: freq_base = 150000.0
llama_context: freq_scale = 0.03125