Releases · ggml-org/llama.cpp

26 Nov 14:30

2336cc4

b7163

cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520)

We have to use a separate code path starting with CMake 3.28 because
`FetchContent_Populate` is now deprecated and will be completely removed
in a future version.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>


26 Nov 14:11

github-actions

b7162

e6923ca


ggml : fix ARM feature verification (#17519)

On arm64 with `cmake` version 3.31.6, the final feature verification fails:

    -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sve
    -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sme
    -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
    -- Checking for ARM features using flags:
    --   -U__ARM_FEATURE_SME
    --   -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
    -- Performing Test HAVE_DOTPROD
    -- Performing Test HAVE_DOTPROD - Failed
    -- Performing Test HAVE_SVE
    -- Performing Test HAVE_SVE - Failed
    -- Performing Test HAVE_MATMUL_INT8
    -- Performing Test HAVE_MATMUL_INT8 - Failed
    -- Performing Test HAVE_FMA
    -- Performing Test HAVE_FMA - Success
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
    -- Performing Test HAVE_SME
    -- Performing Test HAVE_SME - Failed
    -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme

We need to explicitly replace the `;` separators in the list with spaces
so that `CMAKE_REQUIRED_FLAGS` works correctly.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
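Some background on why the separator matters: a CMake list is a single string whose items are joined by `;`, while `CMAKE_REQUIRED_FLAGS` is documented as a space-separated string of compiler flags, so a `;`-list passed unchanged does not work. In CMake the conversion is commonly written with `string(REPLACE ";" " " ...)`; the exact change made in this commit is not shown here. The C++ sketch below only illustrates the string transformation, and `cmake_list_to_flags` is a hypothetical helper name, not part of ggml or CMake.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: turn a CMake-style ";"-separated list into the
// space-separated flag string that a compiler command line expects.
// Assumes no individual flag contains a ';' of its own.
std::string cmake_list_to_flags(std::string list) {
    std::replace(list.begin(), list.end(), ';', ' ');
    return list;
}
```

Applied to the variant flags from the log above, `-U__ARM_FEATURE_SME;-mcpu=...` becomes two separate arguments rather than one malformed one.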


26 Nov 11:46

github-actions

b7161

3e18dba


HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502)

* Patch the failing test case `MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)` to enable WMMA on RDNA4

* Quick cleanup of `mma.cuh` to add `ggml_cuda_memcpy_1` back in for `half2` and `bfloat162`


26 Nov 11:11

github-actions

b7160

eeb5605


CANN: Add MROPE and IMROPE support (#17401)

* CANN: ROPE supports both MROPE and IMROPE.

1. Optimize the caching logic of rope_cache_init.
2. Add support for mRoPE and i-mRoPE.

Note that on Ascend 910B devices, it is necessary to disable FA
in CLIP and disable NZ-format conversion. These two issues are
still under investigation.

* Resolve review comments


26 Nov 08:25

github-actions

b7159

f3a848a


chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513)


26 Nov 07:11

github-actions

b7158

b3b03a7


vulkan: Implement GGML_OP_CUMSUM (#17479)
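For readers unfamiliar with the operation, a cumulative sum replaces each element with the running total up to and including that position. The sketch below is a plain CPU reference in C++, assuming an inclusive sum taken independently along each row of a row-major buffer. `cumsum_rows` is a hypothetical helper for illustration; it is neither the Vulkan shader from this commit nor a ggml API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: inclusive prefix sum over each row of a
// row-major matrix with n_cols columns. Trailing elements that do not
// fill a complete row are left at zero.
std::vector<float> cumsum_rows(const std::vector<float>& src, std::size_t n_cols) {
    std::vector<float> dst(src.size(), 0.0f);
    if (n_cols == 0) {
        return dst;
    }
    for (std::size_t row = 0; row + n_cols <= src.size(); row += n_cols) {
        float acc = 0.0f;  // running total, reset at the start of every row
        for (std::size_t i = 0; i < n_cols; ++i) {
            acc += src[row + i];
            dst[row + i] = acc;
        }
    }
    return dst;
}
```

A GPU implementation would typically parallelize the scan within a row; the sequential loop here is only meant to pin down the expected result.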


25 Nov 15:21

github-actions

b7157

583cb83


ggml : add ggml_top_k (#17365)

* ggml : add ggml_top_k

* cont : add ggml_argsort_top_k

* metal : add top_k support

* ggml : cleanup

* tests : add virtual err() function for test_case

* ggml : add comments
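As a rough illustration of what a top-k selection computes, the C++ sketch below returns the indices of the `k` largest values in a single row. It is a minimal stand-in under stated assumptions: `top_k_indices` is a hypothetical helper, it returns indices ordered by descending value, and it says nothing about the signature or ordering guarantees of `ggml_top_k` or `ggml_argsort_top_k`, which the release note does not spell out.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest elements of `row`,
// ordered from largest to smallest value. k is clamped to row.size().
std::vector<std::size_t> top_k_indices(const std::vector<float>& row, std::size_t k) {
    k = std::min(k, row.size());
    std::vector<std::size_t> idx(row.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    // Only the first k positions need to be sorted, so a partial sort
    // avoids ordering the whole row.
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&row](std::size_t a, std::size_t b) { return row[a] > row[b]; });
    idx.resize(k);
    return idx;
}
```

When the order of the selected indices does not matter, `std::nth_element` would be a cheaper alternative to the partial sort.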


25 Nov 10:27

github-actions

b7154

064c90d


CANN: supports out_prod operator for F32 and F16 (#17406)

Co-authored-by: tianhao <tianhao42@huawei.com>
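For context, an outer product combines two vectors into a matrix whose entry at row `i`, column `j` is `a[i] * b[j]`. The C++ sketch below shows only that basic vector case in F32; `outer_product` is a hypothetical helper, and the tensor shape conventions and F16 handling of the actual CANN `out_prod` operator are not described in this release note and are not modeled here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: outer product of vectors a (length m) and
// b (length n), returned as a row-major m x n matrix.
std::vector<float> outer_product(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a.size() * b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            out[i * b.size() + j] = a[i] * b[j];
        }
    }
    return out;
}
```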


25 Nov 07:34

github-actions

b7152

d414db0


vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 …


25 Nov 02:46

github-actions

b7151

877566d


llama: introduce support for model-embedded sampling parameters (#17120)

