Releases: ggml-org/llama.cpp
Releases Β· ggml-org/llama.cpp
b7197
sycl : support to malloc memory on device more than 4GB, update the dβ¦
b7196
ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
b7195
Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900) * vulkan: split mul_mmq_funcs for mul_mat_vecq use * add mxfp4 mmvq * add q2_k mmvq * add q3_k mmvq * add q4_k and q5_k mmvq * add q6_k mmvq * handle 4x4 quants per mmvq thread * enable MUL_MAT_ID mmvq support * enable subgroup optimizations for mul_mat_vec_id shaders * device tuning * request prealloc_y sync after quantization * fix indentation * fix llvmpipe test failures * fix mul_mat_id mmvq condition * fix unused variable warning
b7194
vulkan: improve topk perf for large k, fix overflow in unit tests (#1β¦
b7192
common : move all common_chat_parse_* to chat-parser.cpp. (#17481)
b7191
server: fix: /metrics endpoint returning JSON-escaped Prometheus formβ¦
b7190
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in gβ¦
b7189
[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) * [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes GΓ€Γler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes GΓ€Γler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-tile.cuh Co-authored-by: Johannes GΓ€Γler <johannesg@5d6.de> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes GΓ€Γler <johannesg@5d6.de>
b7188
ggml-cuda: add stricter checking for fusion (#17568) * ggml-cuda: make conditions for fusion more explicit * ggml-cuda: remove size check as std::equal already does it
b7187
server : add Anthropic Messages API support (#17570) * server : add Anthropic Messages API support * remove -@pytest.mark.slow from tool calling/jinja tests * server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py * server : removed redundant n field logic in anthropic_params_from_json * server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream() * server : refactor Anthropic API to use OAI conversion * make sure basic test always go first * clean up * clean up api key check, add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>