Releases · ggml-org/llama.cpp

29 Nov 21:41

ab49f09

b7200

server: move server-context to its own cpp|h (#17595)

* git mv

* add server-context.h

* add server-context.h

* clean up headers

* cont : cleanup

* also expose server_response_reader (to be used by CLI)

* fix windows build

* decouple server_routes and server_http

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

29 Nov 18:37

github-actions

b7199

8c32d9d

server: explicitly set the function name in lambda (#17538)

As explained in [1], __func__ inside a lambda expands to "operator()", so the actual debug message looks like:
	"res    operator(): operator() : queue result stop"

With the name set explicitly, the message is easier to use for debugging:
	"res    operator(): recv : queue result stop"

The leftmost "operator()" is generated by 'RES_DBG() ... __func__'.

[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html

Signed-off-by: Haiyue Wang <haiyuewa@163.com>
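
The behaviour described in this commit message can be reproduced in isolation. The sketch below is not the server code: `format_dbg`, `implicit_message`, and `explicit_message` are hypothetical names chosen for illustration, and the "operator()" result assumes GCC or Clang (other compilers may spell the lambda's function name differently).

```cpp
#include <string>

// Hypothetical stand-in for a debug macro that prefixes a message
// with a function name. Not the RES_DBG macro from the server.
inline std::string format_dbg(const char * func, const char * msg) {
    return std::string(func) + " : " + msg;
}

// Inside a lambda body, __func__ names the closure's call operator,
// so the log line shows "operator()" instead of a useful name.
inline std::string implicit_message() {
    auto recv = []() { return format_dbg(__func__, "queue result stop"); };
    return recv();
}

// Passing the intended name explicitly keeps the log line readable.
inline std::string explicit_message() {
    auto recv = []() { return format_dbg("recv", "queue result stop"); };
    return recv();
}
```

With GCC or Clang, `implicit_message()` yields a string starting with "operator()", while `explicit_message()` starts with "recv", which mirrors the before and after messages quoted above.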

29 Nov 16:57

github-actions

b7198

0874693

common : fix json schema with '\' in literals (#17307)

* Fix json schema with '\' in literals

* Add "literal string with escapes" test
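
The entry does not show the fix itself. As a rough illustration of the general problem, the sketch below quotes a literal for a grammar rule and escapes the backslash along with the other special characters. `quote_grammar_literal` is a hypothetical helper written for this note, not the function used in llama.cpp, and the set of escaped characters is an assumption.

```cpp
#include <string>

// Hypothetical helper: wrap a literal in double quotes and escape the
// characters that would otherwise change its meaning inside the quotes.
// If the backslash case is omitted, an input such as a\b is emitted
// unescaped and may be read back as an escape sequence.
inline std::string quote_grammar_literal(const std::string & literal) {
    std::string out = "\"";
    for (char c : literal) {
        switch (c) {
            case '\\': out += "\\\\"; break; // backslash -> two backslashes
            case '"':  out += "\\\""; break; // quote -> backslash + quote
            case '\n': out += "\\n";  break; // newline -> backslash + n
            case '\r': out += "\\r";  break; // carriage return -> backslash + r
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}
```

For example, the three-character input a\b comes back as the six-character quoted form "a\\b", so a parser that unescapes the literal recovers the original backslash.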

29 Nov 14:27

github-actions

b7197

7d2add5

sycl : support to malloc memory on device more than 4GB, update the d…

29 Nov 14:23

github-actions

b7196

f698a79

ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567)

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
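
For readers unfamiliar with the interface named in the title, here is a minimal sketch of querying the vector extension through the Linux riscv_hwprobe syscall. It is not the ggml implementation: `demo_cpu_has_rvv` and `DEMO_HAVE_HWPROBE` are hypothetical names, the code assumes kernel headers that provide `<asm/hwprobe.h>`, and on any other platform it simply reports false.

```cpp
#if defined(__riscv) && defined(__linux__)
#  if __has_include(<asm/hwprobe.h>)
#    include <asm/hwprobe.h>   // struct riscv_hwprobe, RISCV_HWPROBE_* constants
#    include <sys/syscall.h>   // __NR_riscv_hwprobe
#    include <unistd.h>        // syscall()
#    define DEMO_HAVE_HWPROBE 1
#  endif
#endif

// Hypothetical helper: returns true only when the kernel reports the
// V (vector) extension for all online CPUs via riscv_hwprobe.
inline bool demo_cpu_has_rvv() {
#if defined(DEMO_HAVE_HWPROBE) && defined(__NR_riscv_hwprobe)
    struct riscv_hwprobe probe = {};
    probe.key = RISCV_HWPROBE_KEY_IMA_EXT_0;
    // Arguments: pairs, pair_count, cpusetsize, cpus, flags.
    // A zero cpusetsize with a null cpu set asks about all online CPUs.
    const long ret = syscall(__NR_riscv_hwprobe, &probe, 1L, 0L,
                             static_cast<void *>(nullptr), 0L);
    return ret == 0 && (probe.value & RISCV_HWPROBE_IMA_V) != 0;
#else
    // Not RISC-V Linux, or headers too old: no probe available.
    return false;
#endif
}
```

A caller would typically evaluate this once at startup and cache the result before selecting vectorized code paths.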

29 Nov 09:01

github-actions

b7195

47a268e

Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)

* vulkan: split mul_mmq_funcs for mul_mat_vecq use

* add mxfp4 mmvq

* add q2_k mmvq

* add q3_k mmvq

* add q4_k and q5_k mmvq

* add q6_k mmvq

* handle 4x4 quants per mmvq thread

* enable MUL_MAT_ID mmvq support

* enable subgroup optimizations for mul_mat_vec_id shaders

* device tuning

* request prealloc_y sync after quantization

* fix indentation

* fix llvmpipe test failures

* fix mul_mat_id mmvq condition

* fix unused variable warning

29 Nov 08:30

github-actions

b7194

59d8d4e

vulkan: improve topk perf for large k, fix overflow in unit tests (#1…

28 Nov 20:21

github-actions

b7192

03914c7

common : move all common_chat_parse_* to chat-parser.cpp. (#17481)

28 Nov 19:57

github-actions

b7191

3ce7a65

server: fix: /metrics endpoint returning JSON-escaped Prometheus form…
