
Bug: Inference hangs when kv cache gets full #982

@fernandaspets

Description


What happened?

I was running the Aider Polyglot benchmark against Kimi K2 Thinking (ubergarm smol_iq3_ks). Excellent results, by the way! Anyway, the KV cache got full every now and then, and when that happened I had to manually restart ik_llama.cpp, otherwise it wouldn't accept any new requests. Probably the Aider test sent an overly long prompt full of compile errors, which filled the context window and thus the KV cache, and then ik_llama.cpp seems to have frozen up. I had to exit the inference server and reload the model from scratch to get it working again.
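
Rough reproduction sketch from the command line (host, port, and prompt length are assumptions; 8080 is llama-server's default port): send one request that overflows the context, then a normal-sized one to check whether the server still responds.

```bash
# Assumed host/port and prompt length; adjust to the running server.
LONG=$(yes "word" | head -n 60000 | tr '\n' ' ')
# First request: overflows the configured context (n_ctx); expect an HTTP 500
# with "Input prompt is too big compared to KV size."
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$LONG\"}]}"
# Second request: a short prompt that should succeed, but on this build the
# server appears to stop responding until it is restarted.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```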

Name and Version

./build/bin/llama-server --version
version: 4006 (da5de88)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

ERR [            update_slots] failed to decode the batch: KV cache is full - try increasing it via the context size | tid="134899983839232" timestamp=1763505182 i=0 n_batch=1 ret=1
 ERR [              send_error] task error | tid="134899983839232" timestamp=1763505182 id_multi=-1 id_task=166865 error="Input prompt is too big compared to KV size. Please try increasing KV size."
INFO [            update_slots] slot released | tid="134899983839232" timestamp=1763505182 id_slot=0 id_task=166865 n_ctx=50176 n_past=1047 n_system_tokens=0 n_cache_tokens=1047 truncated=false
INFO [            update_slots] all slots are idle | tid="134899983839232" timestamp=1763505182
INFO [      log_server_request] request | tid="134866460991488" timestamp=1763505182 remote_addr="127.0.0.1" remote_port=36114 status=500 method="POST" path="/v1/chat/completions" params={}
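
As a stopgap (not a fix for the hang itself), the error message's suggestion to increase the context size can be followed at relaunch; a minimal sketch, with a placeholder model path and a value sized above the n_ctx=50176 visible in the log:

```bash
# Workaround sketch only: relaunch llama-server with a larger context window
# so Aider's longest prompts still fit; the model path is a placeholder.
./build/bin/llama-server -m /path/to/model.gguf -c 65536
```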
