Releases · jeffbolznv/llama.cpp
b6795
b6791
llama-model: fix inconsistent ctxs <-> bufs order (#16581)
b6782
mtmd : support home-cooked Mistral Small Omni (#14928)
b6745
metal : add opt_step_adamw and op_sum (#16529)
* scaffold to support opt step adamw on metal (not written so far)
* add opt-step-adamw kernel for metal
* pass op->src[4] as a separate buffer to the pipeline
* add bounds check to opt-step-adamw kernel
* complete scaffold for GGML_OP_SUM
* naive GGML_OP_SUM kernel
* remove unwanted comment
* change OP_SUM capability gate
* add has_simdgroup_reduction to both ops to pass CI
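For context on the new kernel, here is a minimal C++ sketch of the decoupled AdamW update it computes per tensor element; the function name and parameter layout are illustrative assumptions, not the actual Metal kernel interface:

```cpp
#include <cmath>

// Illustrative AdamW step over n parameters w with gradients g and
// moment buffers m, v; alpha = learning rate, wd = weight decay, t >= 1.
void adamw_step(float* w, const float* g, float* m, float* v, int n,
                float alpha, float beta1, float beta2, float eps,
                float wd, int t) {
    const float bc1 = 1.0f - std::pow(beta1, (float) t); // bias correction
    const float bc2 = 1.0f - std::pow(beta2, (float) t);
    for (int i = 0; i < n; ++i) {
        m[i] = beta1 * m[i] + (1.0f - beta1) * g[i];        // 1st moment
        v[i] = beta2 * v[i] + (1.0f - beta2) * g[i] * g[i]; // 2nd moment
        const float mhat = m[i] / bc1;
        const float vhat = v[i] / bc2;
        // decoupled weight decay, then the Adam update
        w[i] = w[i] * (1.0f - alpha * wd)
             - alpha * mhat / (std::sqrt(vhat) + eps);
    }
}
```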
b6744
webui: remove client-side context pre-check and rely on backend for l…
b6644
ggml : bump version to 0.9.4 (ggml/1363)
b6618
ci : fix musa docker build (#16306)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
b6604
vulkan: support GET_ROWS for k-quants (#16235)
The dequantize functions are copy/pasted from mul_mm_funcs.comp with very few changes: add a_offset and divide iqs by 2. It's probably possible to call these functions from mul_mm_funcs and avoid the duplication, but I didn't go that far in this change.
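As a rough sketch of what GET_ROWS means for a quantized tensor, here is a hedged C++ analogue that gathers and dequantizes whole rows; the Q8_0-style block below is a simplification, and the real k-quant layouts (and the Vulkan shader) are considerably more involved:

```cpp
#include <cstdint>

// Simplified Q8_0-style block: one scale plus 32 int8 quants.
struct block_q8 {
    float  d;
    int8_t qs[32];
};

// For each requested row index, dequantize that row into f32 output.
void get_rows_q8(const block_q8* src, const int32_t* rows, int n_rows,
                 int blocks_per_row, float* dst) {
    for (int r = 0; r < n_rows; ++r) {
        const block_q8* row = src + (int64_t) rows[r] * blocks_per_row;
        float* out = dst + (int64_t) r * blocks_per_row * 32;
        for (int b = 0; b < blocks_per_row; ++b) {
            for (int i = 0; i < 32; ++i) {
                out[b * 32 + i] = row[b].d * row[b].qs[i]; // dequantize
            }
        }
    }
}
```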
b6548
common : enable `--offline` mode without curl support (#16137)
* common : use the json parser
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* common : enable --offline mode without CURL support
This change refactors the download logic to properly support offline mode
even when the project is built without CURL.
Without this commit, using `--offline` would give the following error:
    error: built without CURL, cannot download model from the internet
even if all the files are already cached.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
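A minimal sketch of the control flow this refactor implies, assuming a hypothetical resolve_model helper rather than the actual llama.cpp download code: consult the cache first, and treat a missing CURL build as an error only when a real download is unavoidable.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper, not llama.cpp API: return a usable local model path.
std::string resolve_model(const std::string& cache_path,
                          bool offline, bool built_with_curl) {
    if (std::filesystem::exists(cache_path)) {
        return cache_path; // cached file works offline, with or without CURL
    }
    if (offline) {
        throw std::runtime_error("--offline: model not found in cache");
    }
    if (!built_with_curl) {
        // only an actual download requires CURL
        throw std::runtime_error(
            "built without CURL, cannot download model from the internet");
    }
    // ... download to cache_path via CURL ...
    return cache_path;
}
```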
b6530
ci : migrate ggml ci to self-hosted runners (#16116)
* ci : migrate ggml ci to self-hosted runners
* ci : add T4 runner
* ci : add instructions for adding self-hosted runners
* ci : disable test-backend-ops from debug builds due to slowness
* ci : add AMD V710 runner (vulkan)
* cont : add ROCM workflow
* ci : switch to qwen3 0.6b model
* cont : fix the context size