Releases: ggml-org/llama.cpp
b7200
server: move server-context to its own cpp|h (#17595)

* git mv
* add server-context.h
* add server-context.h
* clean up headers
* cont : cleanup
* also expose server_response_reader (to be used by CLI)
* fix windows build
* decouple server_routes and server_http

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b7199
server: explicitly set the function name in lambda (#17538)

As explained in [1], the actual debug message looks like:
  "res operator(): operator() : queue result stop"

With the name set explicitly, the message is easier to debug:
  "res operator(): recv : queue result stop"

The "operator()" that remains on the left is generated by 'RES_DBG() ... __func__'.

[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html

Signed-off-by: Haiyue Wang <haiyuewa@163.com>
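
The behaviour described in this entry can be reproduced in isolation. The sketch below is not the server code: `log_from_lambda`, `log_with_explicit_name`, and the local `func_name` variable are hypothetical names chosen for illustration, and the message text is borrowed from the commit description. It assumes GCC or Clang, where `__func__` inside a lambda body names the closure's call operator.

```cpp
#include <string>

// Builds a log line using __func__ from inside a lambda.
// Inside the lambda body, __func__ refers to the closure's call
// operator, so the variable name "recv" never appears in the output.
std::string log_from_lambda() {
    auto recv = []() {
        return std::string(__func__) + " : queue result stop";
    };
    return recv();
}

// Same log line, but with the name supplied explicitly.
// func_name is a hypothetical local used only for this illustration.
std::string log_with_explicit_name() {
    auto recv = []() {
        static const char * func_name = "recv";
        return std::string(func_name) + " : queue result stop";
    };
    return recv();
}
```

Calling `log_from_lambda()` yields a message that starts with `operator()`, while `log_with_explicit_name()` yields one that starts with `recv`, which is the difference the commit message points out.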
b7198
common : fix json schema with '\' in literals (#17307)

* Fix json schema with '\' in literals
* Add "literal string with escapes" test
b7197
sycl : support to malloc memory on device more than 4GB, update the d…
b7196
ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
b7195
Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)

* vulkan: split mul_mmq_funcs for mul_mat_vecq use
* add mxfp4 mmvq
* add q2_k mmvq
* add q3_k mmvq
* add q4_k and q5_k mmvq
* add q6_k mmvq
* handle 4x4 quants per mmvq thread
* enable MUL_MAT_ID mmvq support
* enable subgroup optimizations for mul_mat_vec_id shaders
* device tuning
* request prealloc_y sync after quantization
* fix indentation
* fix llvmpipe test failures
* fix mul_mat_id mmvq condition
* fix unused variable warning
b7194
vulkan: improve topk perf for large k, fix overflow in unit tests (#1…
b7192
common : move all common_chat_parse_* to chat-parser.cpp. (#17481)
b7191
server: fix: /metrics endpoint returning JSON-escaped Prometheus form…