docs(server): clarify that --ctx-size is total context divided among parallel slots #17767
+2
−2
Summary
When using `--parallel N`, the `--ctx-size` value is the total context divided among all slots, not the per-slot context. This is a common source of confusion (see #11681, #5732).

Changes
Added clarification to two flags in `tools/server/README.md`:
- `--ctx-size`: Added a note explaining that when using `--parallel N`, this is the total context divided among all slots. Each slot gets `ctx-size / parallel` tokens. To allocate X tokens per slot with N parallel slots, set `--ctx-size` to `X * N`.
- `--parallel`: Added a note that the total context is divided equally among these slots.

Example
- `--ctx-size 4096 --parallel 4` → each slot gets 1024 tokens
- `--ctx-size 16384 --parallel 4` → each slot gets 4096 tokens

Related Issues
- "`--ctx-size` is divided by `--parallel` and cannot be increased?" #11681 (ctx-size divided by parallel confusion)
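
The division rule described in this PR can be sketched as a small calculation. This is only an illustration of the arithmetic, not code from the server: the function names `per_slot_context` and `required_ctx_size` are hypothetical, and the use of integer division for totals that do not divide evenly is an assumption, since the PR only states `ctx-size / parallel`.

```python
def per_slot_context(ctx_size: int, parallel: int) -> int:
    """Tokens available to each slot when the total context is split equally.

    Assumption: uneven totals are rounded down (integer division).
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return ctx_size // parallel


def required_ctx_size(tokens_per_slot: int, parallel: int) -> int:
    """Total --ctx-size needed so that each of `parallel` slots gets `tokens_per_slot`."""
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return tokens_per_slot * parallel


if __name__ == "__main__":
    # --ctx-size 4096 --parallel 4
    print(per_slot_context(4096, 4))   # 1024
    # To give each of 4 slots 4096 tokens, pass this value as --ctx-size
    print(required_ctx_size(4096, 4))  # 16384
```

The second helper mirrors the `X * N` guidance above: decide the per-slot budget first, then multiply by the number of slots to get the value to pass as `--ctx-size`.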