Releases: ggml-org/llama.cpp

b7197

29 Nov 14:27
7d2add5

sycl : support to malloc memory on device more than 4GB, update the d…

b7196

29 Nov 14:23
f698a79

ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567)

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

b7195

29 Nov 09:01
47a268e

Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)

* vulkan: split mul_mmq_funcs for mul_mat_vecq use

* add mxfp4 mmvq

* add q2_k mmvq

* add q3_k mmvq

* add q4_k and q5_k mmvq

* add q6_k mmvq

* handle 4x4 quants per mmvq thread

* enable MUL_MAT_ID mmvq support

* enable subgroup optimizations for mul_mat_vec_id shaders

* device tuning

* request prealloc_y sync after quantization

* fix indentation

* fix llvmpipe test failures

* fix mul_mat_id mmvq condition

* fix unused variable warning

b7194

29 Nov 08:30
59d8d4e

vulkan: improve topk perf for large k, fix overflow in unit tests (#1…

b7192

28 Nov 20:21
03914c7

common : move all common_chat_parse_* to chat-parser.cpp. (#17481)

b7191

28 Nov 19:57
3ce7a65

server: fix: /metrics endpoint returning JSON-escaped Prometheus form…

b7190

28 Nov 18:39
e072b20

ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…

b7189

28 Nov 17:42
c6f7a42

[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551)

* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-cuda/fattn-tile.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

b7188

28 Nov 15:19
2e7ef98

ggml-cuda: add stricter checking for fusion (#17568)

* ggml-cuda: make conditions for fusion more explicit

* ggml-cuda: remove size check as std::equal already does it
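
The second commit message presumably relies on the four-iterator overload of `std::equal` (available since C++14), which already returns false when the two ranges differ in length. A minimal sketch of that behaviour follows; the helper name `same_sequence` and the use of `int` elements are illustrative assumptions and are not taken from the ggml-cuda sources.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not part of ggml-cuda: reports whether two
// sequences match element for element.
bool same_sequence(const std::vector<int> & a, const std::vector<int> & b) {
    // The four-iterator overload compares lengths first, so a separate
    // a.size() == b.size() check before the call would be redundant.
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

Note that the older three-iterator overload does not know where the second range ends, so with that form an explicit size check would still be required.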

b7187

28 Nov 14:09
ddf9f94

server : add Anthropic Messages API support (#17570)

* server : add Anthropic Messages API support

* remove -@pytest.mark.slow from tool calling/jinja tests

* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py

* server : removed redundant n field logic in anthropic_params_from_json

* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()

* server : refactor Anthropic API to use OAI conversion

* make sure basic test always go first

* clean up

* clean up api key check, add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>