
Conversation

@kitaekatt

Summary

When using --parallel N, the --ctx-size value is the total context divided among all slots, not the per-slot context. This is a common source of confusion (see #11681, #5732).

Changes

Added clarification to two flags in tools/server/README.md:

--ctx-size: Added note explaining that when using --parallel N, this is the total context divided among all slots. Each slot gets ctx-size / parallel tokens. To allocate X tokens per slot with N parallel slots, set --ctx-size to X * N.

--parallel: Added note that the total context is divided equally among these slots.

Example

  • --ctx-size 4096 --parallel 4 → each slot gets 1024 tokens
  • To get 4096 tokens per slot with 4 parallel slots, use --ctx-size 16384 --parallel 4 (the sketch below spells out the arithmetic)
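
For readers who want the arithmetic spelled out, here is a minimal C++ sketch of the per-slot division described above. It is illustrative only: the variable names are hypothetical and are not claimed to match the ones used in server.cpp.

```cpp
#include <cstdio>

int main() {
    // Hypothetical names mirroring the two flags, not actual server code.
    const int n_ctx      = 4096; // --ctx-size (total context)
    const int n_parallel = 4;    // --parallel (number of slots)

    // The total context is split equally, so each slot gets:
    const int n_ctx_per_slot = n_ctx / n_parallel; // 1024 tokens

    printf("each of %d slots gets %d tokens\n", n_parallel, n_ctx_per_slot);

    // To guarantee X tokens per slot, size the total accordingly:
    const int x            = 4096;
    const int n_ctx_needed = x * n_parallel; // 16384
    printf("for %d tokens per slot, pass --ctx-size %d\n", x, n_ctx_needed);
    return 0;
}
```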

Related Issues

#11681, #5732

…parallel slots

When using `--parallel N`, the `--ctx-size` value is the total context
divided among all slots, not the per-slot context. This is a common source
of confusion.

For example:
- `--ctx-size 4096 --parallel 4` → each slot gets 1024 tokens
- To get 4096 tokens per slot with 4 parallel slots, use `--ctx-size 16384`

Fixes ggml-org#11681
@ngxson
Collaborator

ngxson commented Dec 4, 2025

This documentation is auto-generated; modify its source in arg.cpp instead.
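
For context, the server README is regenerated from the option definitions in arg.cpp. A minimal sketch of what the change might look like there, assuming the add_opt/common_arg pattern that file uses; the help-string wording and surrounding code here are illustrative, not the actual patch:

```cpp
// Hypothetical fragment for arg.cpp, not the merged change.
add_opt(common_arg(
    {"-c", "--ctx-size"}, "N",
    string_format(
        "size of the prompt context (default: %d, 0 = loaded from model); "
        "with --parallel N this is the total context, divided equally among all slots",
        params.n_ctx),
    [](common_params & params, int value) {
        params.n_ctx = value;
    }
).set_env("LLAMA_ARG_CTX_SIZE"));
```

Regenerating the docs from this definition would then carry the note into tools/server/README.md automatically.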

@taronaeo
Collaborator

taronaeo commented Dec 5, 2025

> This is a common source of confusion (see #11681, #5732).

#17671 as well


Linked issue: Misc. bug: llama-server --ctx-size is divided by --parallel and cannot be increased?