Xsn/server improve msg diff #49
base: master
Conversation
Walkthrough

This change refactors internal state management by removing public chat message tracking from the result types, moving it into a per-task `task_result_state`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Comment

@coderabbitai find the bug here
Tip: For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed (length of output: 15968)
🏁 Script executed (length of output: 7522)

@ngxson I found two bugs in this PR:

1. Critical: Use-after-move bug (tools/server/server-context.cpp, lines 2607-2608)

```cpp
tasks.push_back(std::move(task));                       // line 2607
states.emplace_back(task.params.oaicompat_chat_syntax); // line 2608
```

After the move on line 2607, `task` is a moved-from object, so line 2608 reads from it. Fix: access the parameter before moving the task:

```cpp
states.emplace_back(task.params.oaicompat_chat_syntax);
tasks.push_back(std::move(task));
```

2. Minor: Redundant ternary operator (tools/server/server-task.cpp, line 717)

```cpp
if (!new_msg.empty()) {
    new_msg.set_tool_call_ids(generated_tool_call_ids, gen_tool_call_id);
    chat_msg = new_msg;
    diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg.empty() ? msg_prv_copy : new_msg);
}
```

Inside the `if` block the ternary always selects `new_msg`, so it can be simplified to:

```cpp
diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg);
```

The first bug is critical and will cause unpredictable behavior at runtime! 🐰

🧠 Learnings used
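To make the ordering issue concrete, here is a minimal, self-contained sketch of the fixed pattern. `server_task`, `chat_syntax`, and `task_result_state` below are reduced stand-ins for the real types, not the project's actual definitions.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reduced stand-ins for the real server_task / task_result_state types.
struct chat_syntax { std::string format; };

struct server_task {
    struct { chat_syntax oaicompat_chat_syntax; } params;
};

struct task_result_state {
    chat_syntax syntax;
    explicit task_result_state(const chat_syntax & s) : syntax(s) {}
};

// Correct ordering: read from `task` while it is still valid, then move it.
// Moving first and reading afterwards (the buggy order) reads from a
// moved-from object, which gives unspecified contents at best.
void enqueue(server_task task,
             std::vector<server_task> & tasks,
             std::vector<task_result_state> & states) {
    states.emplace_back(task.params.oaicompat_chat_syntax); // read first
    tasks.push_back(std::move(task));                       // then move
}
```

After the swap, the state captured for each task is guaranteed to reflect the task's parameters rather than whatever the moved-from object happens to hold.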
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/server/server-context.cpp (1)
607-608: Critical: Use-after-move undefined behavior.

At line 607, `task` is moved into the `tasks` vector. Then at line 608, `task.params.oaicompat_chat_syntax` is accessed from the moved-from object. Accessing members of a moved-from object is undefined behavior and can lead to unpredictable results.

Apply this diff to fix the issue by capturing the value before the move:

```diff
+ auto chat_syntax = task.params.oaicompat_chat_syntax;
  tasks.push_back(std::move(task));
- states.emplace_back(task.params.oaicompat_chat_syntax);
+ states.emplace_back(chat_syntax);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- common/chat.cpp (2 hunks)
- common/chat.h (1 hunks)
- tools/server/server-context.cpp (6 hunks)
- tools/server/server-task.cpp (1 hunks)
- tools/server/server-task.h (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following the .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
common/chat.h, common/chat.cpp, tools/server/server-task.h, tools/server/server-task.cpp, tools/server/server-context.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
common/chat.h, common/chat.cpp, tools/server/server-task.h, tools/server/server-task.cpp, tools/server/server-context.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (2)
tools/server/server-task.h (2)
tools/server/server-task.cpp (4)
update_chat_msg (703-720), update_chat_msg (703-706), update (1453-1493), update (1453-1453)
common/chat-peg-parser.h (1)
content (14-14)
tools/server/server-task.cpp (3)
common/chat-parser.cpp (4)
common_chat_parse (1402-1485), common_chat_parse (1402-1402), common_chat_parse (1487-1509), common_chat_parse (1487-1487)
tools/server/server-common.cpp (2)
gen_tool_call_id (83-85), gen_tool_call_id (83-83)
common/chat.cpp (2)
compute_diffs (88-135), compute_diffs (88-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: ubuntu-24-cmake-vulkan
- GitHub Check: ubuntu-latest-cmake-cuda
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: ubuntu-24-cmake-webgpu
- GitHub Check: windows-latest-cmake-sycl
- GitHub Check: macOS-latest-cmake-visionos
- GitHub Check: ubuntu-latest-cmake-rpc
- GitHub Check: ubuntu-22-cmake-musa
- GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
- GitHub Check: android-build
- GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
- GitHub Check: ubuntu-22-cmake-hip
- GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
- GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
- GitHub Check: macOS-latest-cmake-arm64
- GitHub Check: macOS-latest-cmake-arm64-webgpu
- GitHub Check: ubuntu-latest-llguidance
- GitHub Check: macOS-latest-cmake-x64
- GitHub Check: server-windows
🔇 Additional comments (6)
common/chat.cpp (1)
88-134: LGTM - Parameter renaming is consistent throughout.

The rename from `previous_msg`/`new_msg` to `msg_prv`/`msg_new` is applied consistently. The diff computation logic remains correct: it compares reasoning content, content, and tool calls between the previous and new message states.
80-80: LGTM - Header matches implementation.

The function signature declaration correctly mirrors the implementation in `chat.cpp`.
164-181: LGTM - Well-structured state tracking for streaming.

The `task_result_state` struct cleanly encapsulates per-task streaming state (diffs, chat message, generated text, tool call IDs). The constructor properly initializes the syntax member.
197-199: LGTM - Virtual update hook enables polymorphic state updates.

The default no-op implementation is appropriate for result types that don't need state updates.
272-274: LGTM - Final result update correctly passes `is_partial=false`.

The final result correctly updates the state with the complete content and populates both `oaicompat_msg` and `oaicompat_msg_diffs`.
321-323: LGTM - Partial result update correctly passes `is_partial=true`.

The partial result correctly updates only the diffs, allowing incremental streaming.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tools/server/server-context.cpp (1)
2659-2659: State update before JSON conversion is correct.

The pattern of updating state before JSON conversion is consistent with the non-streaming path. The same bounds safety consideration from the previous comment applies here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/server/server-context.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following the .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
tools/server/server-context.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
tools/server/server-context.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (1)
tools/server/server-context.cpp (1)
tools/server/server-common.cpp (6)
format_anthropic_sse (1485-1506), format_anthropic_sse (1485-1485), format_oai_sse (1466-1483), format_oai_sse (1466-1466), format_error_response (16-58), format_error_response (16-16)
🔇 Additional comments (2)
tools/server/server-context.cpp (2)
1293-1300: LGTM! Clear handling of streaming vs non-streaming modes.

The conditional logic correctly differentiates between streaming and non-streaming modes. In streaming mode, content and tokens have already been sent in partial chunks, so they're appropriately cleared here. The comment clarifies the intent well.

2672-2750: Excellent refactoring with improved error handling and state management.

The streaming lambda has been significantly improved:

- State ownership: Moving `states` into the lambda (line 2672) correctly transfers ownership for the lifetime of the streaming response
- Error handling: The try-catch block (lines 2684-2749) gracefully handles exceptions and sends properly formatted SSE error responses
- Consistent state updates: State is updated before JSON conversion (line 2731), maintaining consistency with the non-streaming path
- Clear control flow: Early returns with debug logging make the logic easy to follow
- Format-aware errors: The `format_error` lambda respects `res_type` for Anthropic vs OAI compatibility
Actionable comments posted: 0
♻️ Duplicate comments (1)
tools/server/server-task.cpp (1)
704-721: Redundant ternary in `compute_diffs` call

Inside `if (!new_msg.empty())`, the expression

```cpp
diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg.empty() ? msg_prv_copy : new_msg);
```

always selects `new_msg`, so the ternary adds noise without changing behavior. You can simplify to:

```diff
- diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg.empty() ? msg_prv_copy : new_msg);
+ diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg);
```
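The dead-branch argument above can be demonstrated with a reduced sketch. `msg` and `compute_diff` below are illustrative stand-ins (a string suffix diff), not the real `common_chat_msg` / `compute_diffs` implementations.

```cpp
#include <string>

// String-based stand-in for common_chat_msg; compute_diff stands in for
// common_chat_msg_diff::compute_diffs and just returns the appended suffix.
struct msg {
    std::string content;
    bool empty() const { return content.empty(); }
};

std::string compute_diff(const msg & prv, const msg & cur) {
    return cur.content.substr(prv.content.size());
}

// Inside the `!new_msg.empty()` branch the false arm of the original ternary
// (`new_msg.empty() ? msg_prv_copy : new_msg`) is unreachable, so the call
// reduces to passing new_msg directly.
std::string update(const msg & msg_prv_copy, const msg & new_msg) {
    if (!new_msg.empty()) {
        return compute_diff(msg_prv_copy, new_msg);
    }
    return "";
}
```

Both forms compute the same diff; the simplified one just makes the already-guaranteed invariant visible.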
🧹 Nitpick comments (1)
tools/server/server-task.h (1)
164-181: `task_result_state` design is sound; `diffs` member appears unused

The state container (syntax, accumulated text, chat_msg, tool-call IDs) matches how `update_chat_msg` is used later. The `diffs` member in the struct itself is not referenced anywhere; since all callers pass an external `std::vector<common_chat_msg_diff> &`, you can drop the member to avoid confusion unless you plan to use it shortly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tools/server/server-context.cpp (5 hunks)
- tools/server/server-queue.cpp (2 hunks)
- tools/server/server-queue.h (3 hunks)
- tools/server/server-task.cpp (5 hunks)
- tools/server/server-task.h (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following the .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
tools/server/server-queue.cpp, tools/server/server-task.cpp, tools/server/server-context.cpp, tools/server/server-task.h, tools/server/server-queue.h
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
tools/server/server-queue.cpp, tools/server/server-task.cpp, tools/server/server-context.cpp, tools/server/server-task.h, tools/server/server-queue.h
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (4)
tools/server/server-task.cpp (3)
common/chat-parser.cpp (4)
common_chat_parse (1402-1485), common_chat_parse (1402-1402), common_chat_parse (1487-1509), common_chat_parse (1487-1487)
tools/server/server-common.cpp (2)
gen_tool_call_id (83-85), gen_tool_call_id (83-83)
common/chat.cpp (2)
compute_diffs (88-135), compute_diffs (88-88)
tools/server/server-context.cpp (1)
tools/server/server-common.cpp (6)
format_anthropic_sse(1485-1506)format_anthropic_sse(1485-1485)format_oai_sse(1466-1483)format_oai_sse(1466-1466)format_error_response(16-58)format_error_response(16-16)
tools/server/server-task.h (1)
tools/server/server-task.cpp (4)
update_chat_msg (704-721), update_chat_msg (704-707), update (1455-1495), update (1455-1455)
tools/server/server-queue.h (1)
tools/server/server-queue.cpp (2)
set_states (274-276), set_states (274-274)
🔇 Additional comments (11)
tools/server/server-task.h (3)
197-199: Base `update(task_result_state &)` no-op is appropriate

Leaving the base implementation empty and documenting that only completion result types use it keeps the hierarchy flexible without impacting other result kinds (embeddings, metrics, etc.). No changes needed here.
258-277: Final completion result `update()` correctly wires per-task state

Using `task_result_state::update_chat_msg(content, /*is_partial=*/false, oaicompat_msg_diffs)` and caching into `oaicompat_msg` plus an `is_updated` flag matches the later `GGML_ASSERT` contract in `to_json()`. This ensures all JSON paths have an up-to-date parsed message and diffs before serialization.
311-327: Partial completion result `update()` correctly accumulates streaming diffs

Marking `is_updated` and routing each chunk's `content` through `state.update_chat_msg(content, /*is_partial=*/true, oaicompat_msg_diffs)` matches the intended "diffs per chunk" design and keeps state ownership outside the result object.

tools/server/server-queue.cpp (1)
274-276: State application via `set_states` and `next()` is consistent and guarded

Storing states by move and, in `next()`, mapping `result->get_index()` into `states[idx]` with `GGML_ASSERT(idx >= 0 && idx < states.size())` gives a clear 1:1 task-to-state contract and fail-fast behavior on mismatch, while remaining a no-op when `states` is empty (embeddings, rerank, metrics, etc.). This is a clean integration point for the new per-task state mechanism.

Also applies to: 305-310
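The index-to-state contract described above can be sketched as follows. `response_reader`, `task_result`, and `task_result_state` here are illustrative stand-ins for the `server_response_reader` internals, reduced to the dispatch-and-guard pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-ins for the reader-side types.
struct task_result_state { int applied = 0; };

struct task_result {
    size_t index = 0;
    void update(task_result_state & s) { s.applied++; }
};

struct response_reader {
    std::vector<task_result_state> states;

    void set_states(std::vector<task_result_state> && s) { states = std::move(s); }

    // Mirrors next(): a no-op when no states were registered (embeddings,
    // rerank, metrics, ...), otherwise assert the 1:1 index contract and
    // apply the matching per-task state to the result.
    void apply(task_result & result) {
        if (states.empty()) {
            return;
        }
        assert(result.index < states.size()); // fail fast on mismatch
        result.update(states[result.index]);
    }
};
```

The empty-`states` early return is what keeps non-completion task types working without any per-task state being registered.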
tools/server/server-queue.h (3)
10-11: Queue-level comments improve discoverability

The brief description that `server_queue` is normally used via `server_response_reader` makes the intended usage clear and matches how routes are structured.
72-73: `server_response` documentation matches actual usage

Clarifying that `server_response_reader` is the typical consumer for `server_response` aligns with the implementation and keeps the lower-level API clearly marked as internal-ish.
127-130: Per-task `states` storage on reader is well-scoped

Tracking `std::vector<task_result_state> states` only on `server_response_reader`, with a dedicated `set_states` entry point and limited to "streaming completions" via comments, keeps the new stateful behavior localized and avoids polluting `server_queue`/`server_response` with extra coupling.

Also applies to: 138-138
tools/server/server-task.cpp (2)
567-581: `to_json()` now requires prior `update()`; contract is satisfied by current callers

Adding `GGML_ASSERT(is_updated && "update() must be called before to_json()")` to both final and partial completion results enforces that per-task state has been applied. Given that all HTTP completion paths now go through `server_response_reader::next()`, which calls `result->update(state)` before any `to_json()` call, this is consistent and should be safe for current usages.

Also applies to: 978-992
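The update-before-serialize contract can be sketched in isolation. `GGML_ASSERT` is stood in for by plain `assert`, and the hand-built JSON string keeps the example dependency-free; the real types are richer than this.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-ins for task_result_state and a completion result.
struct task_result_state { std::string chat_msg; };

struct cmpl_result {
    std::string content;
    bool is_updated = false;

    void update(task_result_state & state) {
        state.chat_msg = content;  // stand-in for parsing into a chat message
        is_updated = true;
    }

    std::string to_json() const {
        // Enforce the contract: serialization requires applied state.
        assert(is_updated && "update() must be called before to_json()");
        return "{\"content\":\"" + content + "\"}";
    }
};
```

Calling `to_json()` on a result that never went through `update()` trips the assertion immediately, turning a subtle stale-state bug into a loud failure during development.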
583-605: Always including `content`/`tokens`/`text` in JSON matches PR goal

Non-OAI and OAI completion JSON now unconditionally include `content`, `tokens`, and `text` fields irrespective of streaming mode. This simplifies clients that previously had to special-case stream vs non-stream payloads and aligns with the PR's "JSON always includes content/tokens" objective.

Also applies to: 608-650
tools/server/server-context.cpp (2)
577-596: Per-task state construction fixes prior use-after-move and aligns indices

Building `std::vector<task_result_state> states;` alongside `tasks`, reserving to `inputs.size()`, and pushing `task.params.oaicompat_chat_syntax` into `states` before `tasks.push_back(std::move(task));` both removes the earlier use-after-move bug and guarantees that `states[i]` corresponds to `task.index == i`. Passing the whole vector into `rd.set_states(std::move(states));` cleanly wires this into `server_response_reader`.

Also applies to: 613-619
661-682: Streaming SSE path correctly composes with stateful results and error handling

For streaming completions, you now:

- Obtain `first_result` via `rd.next(should_stop)` (which applies state), then serialize once to `first_result_json`.
- Seed the HTTP response body with formatted SSE from that JSON.
- Use `res->next` to flush the initial chunk, then repeatedly call `rd.next(should_stop)` to get subsequent results, handling `should_stop()`, errors (formatted via `format_error`), and protocol-specific termination (`[DONE]` for OpenAI-style, final events for Anthropic).

This keeps the transport layer oblivious to result internals while respecting the new `update()`/`to_json()` contract and providing clear termination semantics.

Also applies to: 671-748
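For reference, the SSE framing used on this path can be sketched as below. This is a minimal stand-in for the OpenAI-style framing only; the real `format_oai_sse` / `format_anthropic_sse` helpers live in tools/server/server-common.cpp and their exact output may differ.

```cpp
#include <string>

// Minimal sketch of OpenAI-style Server-Sent Events framing: each event is a
// "data:" line followed by a blank line.
std::string format_sse(const std::string & json_payload) {
    return "data: " + json_payload + "\n\n";
}

// OpenAI-compatible streams terminate with a literal [DONE] sentinel event.
std::string format_done() {
    return "data: [DONE]\n\n";
}
```

Each call to `rd.next()` in the loop described above would produce one such `data:` frame, with `format_done()` emitted once after the final result.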
aad20a6 to 1c30c28
Actionable comments posted: 0
♻️ Duplicate comments (1)
tools/server/server-task.cpp (1)
704-721: Redundant ternary in `update_chat_msg()` can be simplified.

Inside `if (!new_msg.empty())`, the expression

```cpp
diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg.empty() ? msg_prv_copy : new_msg);
```

always passes `new_msg` as the second argument, since `new_msg.empty()` is already known to be false. This can be simplified to:

```diff
- diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg.empty() ? msg_prv_copy : new_msg);
+ diffs = common_chat_msg_diff::compute_diffs(msg_prv_copy, new_msg);
```

to remove dead conditional logic.
🧹 Nitpick comments (5)
tools/server/server-queue.cpp (1)
274-276: State application wiring looks correct; document/update the coupling to `get_index()`/`update()`.

The new `set_states()` + `next()` logic cleanly applies per-index `task_result_state` to results and defensively asserts the index. The only thing worth noting is the implicit contract: once `states` is non-empty for a given `server_response_reader`, every non-error result it sees must (a) implement `get_index()` consistently with the task indices used to build `states`, and (b) provide a sensible `update(task_result_state &)` implementation. It may be helpful to mention this in a comment near `set_states()` or in `task_result_state`'s docs so future result types don't accidentally violate the assumption.

Also applies to: 305-310
tools/server/server-task.cpp (1)
568-581: Assert beforeto_json()is fine; ensure all call sites go throughupdate().The
GGML_ASSERT(is_updated && "update() must be called before to_json()");guards are a good invariant forcmpl_finalandcmpl_partial, givenupdate()is now the only place that wires intask_result_state(chat msg + diffs). Please double‑check (or grep) that no code path constructs these result types and callsto_json()without going throughserver_response_reader::next()(or otherwise callingupdate()), especially any older/direct uses ofqueue_results.recv(...)that might predateserver_response_reader.Also applies to: 979-992
tools/server/server-context.cpp (1)
1287-1300: Stream-mode `cmpl_final` now omits repeated content/tokens; verify client expectations.

For streaming requests you now zero out `res->content` and `res->tokens` in `send_final_response`, relying on the last partial chunk to have already carried the final text/tokens. This avoids duplication but changes the shape of the last native-format response: `content`/`tokens` will now be present but empty instead of repeating data. Please confirm this matches how existing clients consume the stream (especially any that were only looking at the final chunk).

tools/server/server-task.h (2)
164-181: `task_result_state` encapsulation aligns well with streaming needs.

Bundling `diffs`, `oaicompat_chat_syntax`, `chat_msg`, `generated_text`, and `generated_tool_call_ids` into `task_result_state` gives a clear per-task home for streaming state. The non-explicit ctor from `common_chat_syntax` is what enables the convenient `states.push_back(task.params.oaicompat_chat_syntax);` call in `server-context.cpp`, which is reasonable here; just be aware that it's an implicit conversion point if more constructors are ever added.
183-202: Result `update(task_result_state &)` hook is consistent; rely on it for all chat/diff wiring.

Adding a virtual `update(task_result_state &)` to `server_task_result` with a no-op default, and overriding it only in `server_task_result_cmpl_final` and `_partial`, keeps non-chat results untouched while giving a clean extension point for chat state/diffs. The new members:

- `oaicompat_msg`, `oaicompat_msg_diffs`, `is_updated` in `cmpl_final`.
- `oaicompat_msg_diffs`, `is_updated` in `cmpl_partial`.

combined with the corresponding overrides:

```cpp
cmpl_final::update:   oaicompat_msg = state.update_chat_msg(content, false, oaicompat_msg_diffs);
cmpl_partial::update: state.update_chat_msg(content, true, oaicompat_msg_diffs);
```

form a coherent contract for `server_response_reader::next()` to invoke once per result before any `to_json()` calls. This design looks solid; just ensure any future result types that participate in streaming diffs also override `update()` instead of duplicating parsing logic.

Also applies to: 228-262, 291-328
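The no-op-default virtual hook pattern described above can be sketched as follows. The names mirror the review, but the bodies are reduced stand-ins showing only the dispatch shape, not the real parsing logic.

```cpp
#include <string>

// Illustrative stand-ins for the result hierarchy.
struct task_result_state { std::string chat_msg; };

struct server_task_result {
    virtual ~server_task_result() = default;
    // Default: result types without chat state (embeddings, metrics, ...)
    // simply ignore the per-task state.
    virtual void update(task_result_state & /*state*/) {}
};

struct cmpl_partial_result : server_task_result {
    std::string content;
    bool is_updated = false;
    void update(task_result_state & state) override {
        state.chat_msg += content;  // stand-in for per-chunk diff parsing
        is_updated = true;
    }
};

struct metrics_result : server_task_result {};  // inherits the no-op
```

The reader can then call `result->update(state)` uniformly on every result it hands out; only the completion types actually touch the state.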
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tools/server/server-context.cpp (5 hunks)
- tools/server/server-queue.cpp (2 hunks)
- tools/server/server-queue.h (3 hunks)
- tools/server/server-task.cpp (5 hunks)
- tools/server/server-task.h (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{cpp,h,hpp}: Always format C++ code using `git clang-format` before committing, following the .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary
Files:
tools/server/server-task.cpp, tools/server/server-task.h, tools/server/server-queue.cpp, tools/server/server-queue.h, tools/server/server-context.cpp
**/*.{cpp,h,hpp,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible
Files:
tools/server/server-task.cpp, tools/server/server-task.h, tools/server/server-queue.cpp, tools/server/server-queue.h, tools/server/server-context.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (3)
tools/server/server-task.cpp (3)
common/chat-parser.cpp (4)
common_chat_parse (1402-1485), common_chat_parse (1402-1402), common_chat_parse (1487-1509), common_chat_parse (1487-1487)
tools/server/server-common.cpp (2)
gen_tool_call_id (83-85), gen_tool_call_id (83-83)
common/chat.cpp (2)
compute_diffs (88-135), compute_diffs (88-88)
tools/server/server-task.h (2)
tools/server/server-task.cpp (4)
update_chat_msg (704-721), update_chat_msg (704-707), update (1455-1495), update (1455-1455)
tools/server/server-context.cpp (1)
state (241-243)
tools/server/server-context.cpp (1)
tools/server/server-common.cpp (6)
format_anthropic_sse (1485-1506), format_anthropic_sse (1485-1485), format_oai_sse (1466-1483), format_oai_sse (1466-1466), format_error_response (16-58), format_error_response (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: server-windows
- GitHub Check: check-requirements
- GitHub Check: pyright type-check
🔇 Additional comments (3)
tools/server/server-context.cpp (2)
2577-2619: Per-task state setup and use-after-move fix look correct.

The new `states` vector of `task_result_state` is built in lockstep with `tasks`:

- `tasks.reserve(inputs.size());` / `states.reserve(inputs.size());`
- Within the loop: `task.index = i; ... states.push_back(task.params.oaicompat_chat_syntax); tasks.push_back(std::move(task));`
- Afterward: `rd.set_states(std::move(states));`

This both fixes the prior use-after-move (state captured before `std::move(task)`) and ensures `result->get_index()` maps cleanly into `states[idx]` in `server_response_reader::next()`. Just keep in mind that if you ever reuse `handle_completions_impl` for non-chat responses but still call `set_states()`, those result types must implement `get_index()`/`update()` consistently with this indexing.
2662-2747: Streaming lambda refactor improves robustness; capture lifetime is acceptable.

The updated `next` lambda:

- Flushes the precomputed first chunk once from `res_this->data`.
- Uses `rd.has_next()` + `rd.next(should_stop)` to drive streaming, honoring `should_stop()` both before and during result retrieval.
- Emits SSE-formatted error events via the local `format_error` helper for both OpenAI-like and Anthropic streams.
- Catches exceptions, maps them to `format_error_response(...)`, and terminates the stream cleanly.

The capture of `&should_stop` is okay as long as the underlying `server_http_req::should_stop` outlives the generator (which matches the existing pattern). Overall this is a solid incremental hardening of the streaming path.

tools/server/server-queue.h (1)
10-12: Header-level design and documentation for queues/responses look good.

The header changes clearly separate responsibilities:

- `server_queue` for task management.
- `server_response` for result fan-in.
- `server_response_reader` as the recommended high-level API, now with an explicit `states` vector and `set_states(...)` for streaming chat state.

The comments around "only used by streaming completions" help constrain expectations; the implementation in the `.cpp` file aligns with these declarations.

Also applies to: 72-114, 127-139
Make sure to read the contributing guidelines before submitting a PR
Summary by CodeRabbit
Bug Fixes
Refactor