🤖 feat: add 5% buffer between auto-compaction threshold and force-compaction #840
+100
−67
Background
Auto-compaction runs after a response completes when context usage exceeds a user-configured threshold (default 70%). It summarizes the conversation to free up context space while preserving important information.
Force-compaction is a safety mechanism that triggers during streaming when context usage gets dangerously high. Unlike auto-compaction, which waits for a natural break point, force-compaction interrupts the current response to avoid hitting the context window limit.
The Problem
Previously, force-compaction triggered based on a fixed token buffer from the context window limit. As a result, its timing was disconnected from the user's configured auto-compaction threshold: changing the threshold had no effect on when force-compaction would kick in.
The Change
Force-compaction now triggers at threshold + 5%, that is, five percentage points of context usage above the configured auto-compaction threshold. With a 70% threshold, force-compaction happens at 75%.
This gives users a predictable buffer zone between the point where auto-compaction would run (after the response) and the point where force-compaction runs (during streaming). If a response slightly overshoots the threshold, the chat has some leeway: it won't immediately force-compact just because usage landed a bit over. This avoids unnecessary force-compactions when the stream ends and usage settles back within acceptable bounds.
The buffer is intentionally small (5%) to balance user control with safety margins as context approaches capacity.
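As a rough illustration of the rule described above, the decision could be sketched as follows. This is not the code from the PR's diff: `FORCE_COMPACTION_BUFFER_PERCENT`, `forceCompactionThreshold`, and `decideCompaction` are hypothetical names, and the exact comparison operators (`>` for auto, `>=` for force) are assumptions based on the wording "exceeds" and "triggers at".

```typescript
// Hypothetical sketch; names and comparison operators are assumptions,
// not taken from the actual implementation.
const FORCE_COMPACTION_BUFFER_PERCENT = 5;

type CompactionAction = "none" | "auto-after-response" | "force-now";

// Force-compaction point: the user's auto-compaction threshold plus the buffer.
function forceCompactionThreshold(autoThresholdPercent: number): number {
  return autoThresholdPercent + FORCE_COMPACTION_BUFFER_PERCENT;
}

function decideCompaction(
  usagePercent: number,
  autoThresholdPercent: number,
  isStreaming: boolean,
): CompactionAction {
  if (isStreaming) {
    // While streaming, only the force threshold can interrupt the response.
    return usagePercent >= forceCompactionThreshold(autoThresholdPercent)
      ? "force-now"
      : "none";
  }
  // After the response completes, the regular auto-compaction threshold applies.
  return usagePercent > autoThresholdPercent ? "auto-after-response" : "none";
}
```

Under these assumptions, with a 70% threshold a stream at 72% usage keeps going (`"none"`), a stream at 75% is interrupted (`"force-now"`), and a completed response sitting at 72% is compacted at the natural break (`"auto-after-response"`).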
UI Change
Also adds a "Force-compacting in N%" countdown during streaming when usage is in the buffer zone, so users know how much room remains before force-compaction triggers.
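The countdown text could be derived along these lines. This is only a sketch: `forceCompactionCountdown` is a hypothetical helper, the buffer zone is assumed to start at the threshold itself, and rounding the remaining percentage up is an assumption about how N is displayed.

```typescript
// Hypothetical sketch of the countdown label; the zone boundaries and the
// rounding are assumptions, not the PR's actual behavior.
function forceCompactionCountdown(
  usagePercent: number,
  autoThresholdPercent: number,
  bufferPercent: number = 5,
): string | null {
  const forceAt = autoThresholdPercent + bufferPercent;
  // Buffer zone: at or above the auto threshold, but below the force point.
  const inBufferZone =
    usagePercent >= autoThresholdPercent && usagePercent < forceAt;
  if (!inBufferZone) {
    return null; // Nothing to show outside the buffer zone.
  }
  // Round up so the label never reads 0% while room still remains.
  const remaining = Math.ceil(forceAt - usagePercent);
  return `Force-compacting in ${remaining}%`;
}
```

With a 70% threshold, usage of 72% would yield "Force-compacting in 3%", while usage of 69% or 75% would yield no label.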
Generated with mux