Misc. bug: Can't use longer context than model via RoPE due to server-imposed restrictions #17459

@woof-dog

Description

Name and Version

version: 716 (10e9780)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m ./model/with/4096context -c 16384 --rope-scaling yarn --rope-scale 4

Problem description & steps to reproduce

llama-server does not allow using the extended context: with --rope-scale 4 the 4096-token model should support 4096 × 4 = 16384 tokens, but the server output announces that the context length is capped:

the slot context (%d) exceeds the training context of the model (%d) - capping\n

This check is not aware of RoPE settings or other user configuration that would allow a longer context. Disabling the check in llama-server (introduced in cd5e3b5) allowed me to use the longer context via the RoPE settings.

This "capping" forces the model to load with 4,096 tokens of context and causes my long-context queries to fail.

Please allow this cap to be overridden. For users who don't know what they are doing the check is probably helpful, but advanced users should have a way to disable it.
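
For illustration only, the behaviour described above and the requested opt-out could look roughly like the sketch below. This is an assumption-laden example, not the actual code from cd5e3b5: the names (cap_slot_context, allow_ctx_override) and the override flag are hypothetical and do not exist in llama-server today.

```cpp
#include <cstdio>

// Sketch only: cap_slot_context and allow_ctx_override are hypothetical names
// used for illustration, not identifiers from the llama-server source.
static int cap_slot_context(int slot_n_ctx, int n_ctx_train, bool allow_ctx_override) {
    if (slot_n_ctx > n_ctx_train && !allow_ctx_override) {
        // current behaviour reported in this issue: clamp to the training context
        std::fprintf(stderr,
            "the slot context (%d) exceeds the training context of the model (%d) - capping\n",
            slot_n_ctx, n_ctx_train);
        return n_ctx_train;
    }
    // requested behaviour: respect the user-specified context when the user opts out of the cap
    return slot_n_ctx;
}

int main() {
    // -c 16384 with --rope-scale 4 on a 4096-token model: 4096 * 4 = 16384
    std::printf("capped:   %d\n", cap_slot_context(16384, 4096, /*allow_ctx_override=*/false));
    std::printf("override: %d\n", cap_slot_context(16384, 4096, /*allow_ctx_override=*/true));
    return 0;
}
```

With -c 16384, --rope-scaling yarn and --rope-scale 4 on a 4096-token model, an override path like this would keep the full 16384-token context instead of clamping it back to 4096.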

First Bad Commit

cd5e3b5

Relevant log output

the slot context (%d) exceeds the training context of the model (%d) - capping\n
