What happened?
I was running the Aider Polyglot benchmark against Kimi K2 Thinking (ubergarm smol_iq3_ks). Excellent results, by the way! Anyway, the KV cache got full every now and then, and each time I had to manually restart ik_llama.cpp because it would not accept any new requests. Probably the Aider test sent a prompt full of compile errors that was long enough to fill the context window, and thus the KV cache, and ik_llama.cpp then seems to have frozen up. I had to exit the inference process and reload the model from scratch to get it working again.
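For reference, a minimal client-side sketch of the failure sequence, assuming the server listens on the default llama-server port 8080 and that the model field is a placeholder (the server serves whatever model is loaded); the oversized prompt is just an arbitrary string long enough to exceed n_ctx=50176:

import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default port

def chat(prompt: str) -> int:
    """POST an OpenAI-style chat request and report the status code."""
    resp = requests.post(URL, json={
        "model": "kimi-k2-thinking",  # placeholder; ignored by the server
        "messages": [{"role": "user", "content": prompt}],
    })
    print(resp.status_code, resp.text[:200])
    return resp.status_code

chat("hello")                      # 1) normal request: succeeds (200)
chat("compile error\n" * 200_000)  # 2) prompt far beyond n_ctx: 500, "KV cache is full"
chat("hello again")                # 3) expected 200, but observed: no response until the model is reloaded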
Name and Version
./build/bin/llama-server --version
version: 4006 (da5de88)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
ERR [ update_slots] failed to decode the batch: KV cache is full - try increasing it via the context size | tid="134899983839232" timestamp=1763505182 i=0 n_batch=1 ret=1
ERR [ send_error] task error | tid="134899983839232" timestamp=1763505182 id_multi=-1 id_task=166865 error="Input prompt is too big compared to KV size. Please try increasing KV size."
INFO [ update_slots] slot released | tid="134899983839232" timestamp=1763505182 id_slot=0 id_task=166865 n_ctx=50176 n_past=1047 n_system_tokens=0 n_cache_tokens=1047 truncated=false
INFO [ update_slots] all slots are idle | tid="134899983839232" timestamp=1763505182
INFO [ log_server_request] request | tid="134866460991488" timestamp=1763505182 remote_addr="127.0.0.1" remote_port=36114 status=500 method="POST" path="/v1/chat/completions" params={}