Releases: ggml-org/llama.cpp
Releases · ggml-org/llama.cpp
b7205
ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (#17581)
b7204
common: update env var name (#17588)
b7203
CUDA: add stream-based concurrency (#16991) * CUDA: add stream-based concurrency * HIP: fix hipStreamWaitEvent define and nodiscard warnings * ggml-cuda: fix fusion inside stream * ggml-cuda: fix bug w.r.t first stream launch * ggml-cuda: format * ggml-cuda: improve assert message * ggml-cuda: use lambda instead of duplicating code * ggml-cuda: add some more comments * ggml-cuda: add more detailed comments about concurrency * ggml-cuda: rename + remove unused var * ggml-cuda: fix condition for stream launch * ggml-cuda: address review comments, add destructor * common.cuh: add is_valid for concurrent events * common.cuh: make comment better * update comment Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * update comment Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * common.cuh: fix lower_bound condition + remove join_node data from write_ranges * ggml-cuda: fix overlap condition + shadowing parameter --------- Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b7202
cuda : add error checking for cudaMemcpyAsync in argsort (#17599) * cuda : add error checking for cudaMemcpyAsync in argsort (#12836) * fix indentation
b7201
vulkan : fix FA mask load with bounds check (coopmat2) (#17606)
b7200
server: move server-context to its own cpp|h (#17595) * git mv * add server-context.h * add server-context.h * clean up headers * cont : cleanup * also expose server_response_reader (to be used by CLI) * fix windows build * decouple server_routes and server_http --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b7199
server: explicitly set the function name in lambda (#17538) As [1] explained, the real debug message will be like: "res operator(): operator() : queue result stop" Set the name explicitly, the message is easy for debugging: "res operator(): recv : queue result stop" The left "operator()" is generated by 'RES_DBG() ... __func__' [1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html Signed-off-by: Haiyue Wang <haiyuewa@163.com>
b7198
common : fix json schema with '\' in literals (#17307) * Fix json schema with '\' in literals * Add "literal string with escapes" test