Releases · ggml-org/llama.cpp

27 Nov 00:59

e509411

b7170 Latest

server: enable jinja by default, update docs (#17524)

* server: enable jinja by default, update docs

* fix tests

Assets 20

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB, 2025-11-27T00:59:24Z

llama-b7170-bin-310p-openEuler-aarch64.zip
sha256:ed09f3a0d1172a9bc715c06d6d3956304f7478c09054f4d4b86d3f783da09c4b
32.7 MB, 2025-11-27T00:59:45Z

llama-b7170-bin-310p-openEuler-x86.zip
sha256:8236a168f2cf08449cc5c354f02c628e740b6b447d0381b08c99605517c3c1c8
34.4 MB, 2025-11-27T00:59:47Z

llama-b7170-bin-910b-openEuler-aarch64.zip
sha256:a537a6cf118f27c6e8f7e225ca03061b1e9a2b891276d895c030d79918fdc8fc
32.7 MB, 2025-11-27T00:59:49Z

llama-b7170-bin-910b-openEuler-x86.zip
sha256:3e88e5dd2647ac59f3153b592b47748190a00ea3bbe2671069769c6b10ba4347
34.4 MB, 2025-11-27T00:59:51Z

llama-b7170-bin-macos-arm64.zip
sha256:908f9e77cdac3adc8bacd2c85c6ac66139742db4b356394ec98676eb645a4257
15.1 MB, 2025-11-27T00:59:53Z

llama-b7170-bin-macos-x64.zip
sha256:fcae5cdb3944fffef5150e4db0882a3cc5ae0f0481099ea84b16c6a3c01c2ac9
33.9 MB, 2025-11-27T00:59:55Z

llama-b7170-bin-ubuntu-s390x.zip
sha256:428163d6a40f9a6c37d043e6340ada4246ae25a854bf5a27ce334dbab852e442
17.7 MB, 2025-11-27T00:59:57Z

llama-b7170-bin-ubuntu-vulkan-x64.zip
sha256:8425bb1f505c6fabf19759a310f5d48779f1385572d04aa7cddc3d186581ce1d
30.8 MB, 2025-11-27T00:59:58Z

llama-b7170-bin-ubuntu-x64.zip
sha256:25342b1c43ee2b2c9949badf55fc4032386d60e5b72cdc61a727a344011096b1
16.9 MB, 2025-11-27T01:00:00Z

Source code (zip)
2025-11-27T00:02:50Z

Source code (tar.gz)
2025-11-27T00:02:50Z
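
Each binary asset above is listed with a `sha256:` digest. As a minimal sketch of how a downloaded archive could be checked against its listed digest, the following uses only Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_listed_digest` are hypothetical and chosen for illustration; they are not part of llama.cpp or its release tooling.

```python
import hashlib


def sha256_of_file(path, block_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>' (hypothetical helper)."""
    # Drop an optional "sha256:" prefix and normalize case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return sha256_of_file(path) == expected
```

For example, assuming `llama-b7170-bin-ubuntu-x64.zip` has been downloaded into the current directory, `matches_listed_digest("llama-b7170-bin-ubuntu-x64.zip", "sha256:25342b1c43ee2b2c9949badf55fc4032386d60e5b72cdc61a727a344011096b1")` should return `True` for an intact download.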

26 Nov 22:56

github-actions

b7169

7cba58b

opencl: add sqr, sqrt, mean and ssm_conv (#17476)

* opencl: add sqr

* opencl: add sqrt

* opencl: add mean

* opencl: add ssm_conv

* opencl: add missing cl_khr_fp16

* opencl: do sqrt in f32 then convert to f16 for better precision

Assets 20

26 Nov 22:32

github-actions

b7168

5449367

Fix chunks being too small with small matrix sizes (#17526)

Assets 20

26 Nov 21:22

github-actions

b7167

1d594c2

clip: (minicpmv) fix resampler kq_scale (#17516)

* debug:"solve minicpmv precision problem"

* “debug minicpmv”

* Apply suggestion from @ngxson

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Assets 20

26 Nov 18:23

github-actions

b7166

eec1e33

vulkan: allow graph_optimize for prompt processing workloads (#17475)

Assets 20

26 Nov 17:12

github-actions

b7165

879d673

vulkan: Implement top-k (#17418)

* vulkan: Implement top-k

Each pass launches workgroups that each sort 2^N elements (where N is usually 7-10)
and discard all but the top K. This repeats until only K are left. There is also a fast
path for K==1 that just finds the max value rather than sorting.

* fix pipeline selection

* vulkan: Add N-ary search algorithm for topk

* microoptimizations
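
The sort-and-discard passes described in this commit message can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the Vulkan shader itself: the function name `top_k_multipass` and the `log2_chunk` parameter are hypothetical, each "workgroup" is modeled as a plain Python slice, and the N-ary search variant mentioned in the later commit is not covered. The sketch also assumes the chunk size 2^N is larger than K so that every pass shrinks the data.

```python
def top_k_multipass(values, k, log2_chunk=7):
    """Return the k largest values in descending order using repeated chunked passes."""
    if k <= 0 or not values:
        return []
    if k == 1:
        # Fast path: a single maximum needs no sorting.
        return [max(values)]

    chunk = 1 << log2_chunk  # each simulated workgroup handles 2^N elements
    if chunk <= k:
        raise ValueError("chunk size 2^N must exceed k so each pass shrinks the data")

    current = list(values)
    while len(current) > k:
        survivors = []
        for start in range(0, len(current), chunk):
            # Sort one chunk and keep only its top k elements.
            group = sorted(current[start:start + chunk], reverse=True)
            survivors.extend(group[:k])
        current = survivors

    # Covers the case where the input already had at most k elements.
    return sorted(current, reverse=True)
```

Any element of the global top K is also in the top K of its own chunk, so no pass can discard it, which is why repeating the pass until K elements remain gives the correct result.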

Assets 20

26 Nov 14:35

github-actions

b7164

6ab4e50

ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448)

* ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16

* ggml-cpu : dedup scalar impl

* Update ggml/src/ggml-cpu/vec.h

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Assets 20

26 Nov 14:30

github-actions

b7163

2336cc4

cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520)

We have to separate the code path starting with CMake 3.28 because
`FetchContent_Populate` is now deprecated and will be completely removed
in a future version.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Assets 20

26 Nov 14:11

github-actions

b7162

e6923ca

ggml : fix ARM feature verification (#17519)

On arm64 with `cmake` version 3.31.6, the final feature verification fails:

    -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sve
    -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sme
    -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
    -- Checking for ARM features using flags:
    --   -U__ARM_FEATURE_SME
    --   -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
    -- Performing Test HAVE_DOTPROD
    -- Performing Test HAVE_DOTPROD - Failed
    -- Performing Test HAVE_SVE
    -- Performing Test HAVE_SVE - Failed
    -- Performing Test HAVE_MATMUL_INT8
    -- Performing Test HAVE_MATMUL_INT8 - Failed
    -- Performing Test HAVE_FMA
    -- Performing Test HAVE_FMA - Success
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
    -- Performing Test HAVE_SME
    -- Performing Test HAVE_SME - Failed
    -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme

The flags are held in a CMake list, so we need to explicitly replace the
`;` separators with spaces to make `CMAKE_REQUIRED_FLAGS` work correctly.
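
To illustrate why the separator matters: a CMake list is a single string joined by `;`, while `CMAKE_REQUIRED_FLAGS` expects a space-delimited string of flags. The Python sketch below only models that string handling; it is not the CMake change itself, and the helper name `cmake_list_to_flag_string` is hypothetical. The flag values are copied from the log above.

```python
def cmake_list_to_flag_string(cmake_list):
    """Turn a ';'-separated CMake list into a space-separated flag string."""
    return " ".join(part for part in cmake_list.split(";") if part)


# The flags as they appear in the "Adding CPU backend variant" log line.
flags_as_list = (
    "-U__ARM_FEATURE_SME;"
    "-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme"
)

# Split on whitespace, the ';'-joined form looks like one malformed argument.
tokens_before = flags_as_list.split()

# After conversion, each flag becomes its own argument.
flags_as_string = cmake_list_to_flag_string(flags_as_list)
tokens_after = flags_as_string.split()

print(len(tokens_before), len(tokens_after))
```

With the `;` left in place, the two flags reach the compiler check as one malformed argument, which is consistent with the `HAVE_*` tests failing in the log even though the earlier `GGML_MACHINE_SUPPORTS_*` tests succeeded.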

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Assets 20

26 Nov 11:46

github-actions

b7161

3e18dba

HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502)

* patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4

* Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162

Assets 20
