Sync master with upstream release b6663 #270
Merged
Conversation
* ci : add AMD runners and workflows
* ci : move AMD jobs to separate workflow
* cont : fix paths
…locks (ggml-org#16326)
* fix: prevent reasoning blocks with quotes from being truncated
* chore: update webui build output
* feat: Improve thinking content parsing
* test: Adds ChatMessage component stories for different thinking blocks
* chore: update webui build output
* fix: ChatMessage story fix
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
…ounding differences (ggml-org#16295)
* tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences
* apply similar error bounds to test_cpy
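For context, test-backend-ops scores a backend's output against a reference with a normalized mean squared error and compares it to a per-test threshold; below is a minimal sketch of that metric and of the kind of override the commit relaxes. The threshold value shown is illustrative, not the one in the PR.

```cpp
// Sketch of the NMSE comparison used to validate backend output against a
// reference; the relaxed bound absorbs occasional rounding differences
// between backends (e.g. CPU vs GPU).
#include <cstddef>
#include <vector>

double nmse(const std::vector<float> & out, const std::vector<float> & ref) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < out.size(); i++) {
        const double d = (double) out[i] - (double) ref[i];
        err  += d * d;
        norm += (double) ref[i] * ref[i];
    }
    return norm > 0.0 ? err / norm : err; // guard against an all-zero reference
}

// Per-test override in the style of test-backend-ops (value illustrative):
// struct test_set_rows : public test_case {
//     double max_nmse_err() override { return 1e-6; }
// };
```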
The JSON parser is temporarily kept only for backward compatibility. It reads the etag from old .json files to prevent unnecessary re-downloads for existing users. This legacy code can be removed in a future version.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
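A minimal sketch of what that compatibility path amounts to, assuming the legacy sidecar layout (a `<model>.json` file next to the download holding the server's etag; the key and file naming here are assumptions, not the exact format):

```cpp
// Read the etag stored by older llama.cpp versions so the new downloader can
// send If-None-Match and skip re-fetching files users already have.
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

static std::string read_legacy_etag(const std::string & model_path) {
    std::ifstream f(model_path + ".json"); // legacy metadata sidecar (assumed name)
    if (!f) {
        return "";
    }
    try {
        const nlohmann::json meta = nlohmann::json::parse(f);
        return meta.value("etag", ""); // assumed key in the old format
    } catch (const std::exception &) {
        return ""; // malformed legacy file: fall back to a full download
    }
}
```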
* metal : dynamic simdgroups for MV kernels
* cont : minor
…-org#16328)
* Fix Nemotron Nano v2 9B not executing as CUDA Graph on NVIDIA GPUs
* fix to ensure test-backend-ops check passes
`test-arg-parser.cpp` has been updated to work consistently, regardless of whether CURL or SSL support is available, and now always points to `ggml.ai`. The previous timeout test has been removed, but it can be added back by providing a dedicated URL under `ggml.ai`.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
…org#16187)
* Work on rope
* Simplify inplace operation generation and combine mul/add generation
* Work on rope variants
* implement neox rope
* rope complete
* Add sub, div, glu operators
* implement scale op
* Update cpy shader to handle cont/more types
* formatting
* Update test vars printing for rope, rms_norm
* Avoid ROPE hardcoded constants
* Add TODO to change ROPE constants to enum
* fix TODO comment
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
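For reference, the rope math these shader bullets implement is the standard rotary embedding; here is a minimal CPU-side sketch of the "normal" and neox pairings, independent of the PR's actual shader code:

```cpp
// Rotary position embedding (RoPE) for one head of dimension d at position
// pos, applied in place. Each pair of elements is rotated by an angle that
// decays with the dimension index.
#include <cmath>

void rope(float * x, int d, int pos, float base, bool neox) {
    for (int i = 0; i < d / 2; i++) {
        const float theta = pos * std::pow(base, -2.0f * i / d);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        // "normal" rope rotates adjacent pairs (2i, 2i+1);
        // the neox variant rotates split halves (i, i + d/2)
        const int i0 = neox ? i         : 2 * i;
        const int i1 = neox ? i + d / 2 : 2 * i + 1;
        const float x0 = x[i0];
        const float x1 = x[i1];
        x[i0] = x0 * c - x1 * s;
        x[i1] = x0 * s + x1 * c;
    }
}
```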
* fix: skip empty sampling fields instead of coercing to 0 in chat API options
* chore: update webui build output
* common : disable progress bar without a tty
* Add missing headers
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
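A minimal sketch of the tty check such a change hinges on, assuming POSIX `isatty`; the function name is illustrative:

```cpp
// Only draw the download progress bar when stderr is an interactive
// terminal, so redirected logs and pipes stay clean.
#include <cstdio>
#include <unistd.h> // isatty, fileno

static bool progress_bar_enabled() {
    return isatty(fileno(stderr)) == 1;
}
```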
* fix ccache key for ubuntu-cpu-cmake
* set it for release as well [no ci]
…gml-org#16359)
* Make a few GLM tensors not required: layer.nextn.shared_head_head and layer.nextn.embed_tokens are both excluded from GLM 4.6, resulting in the model not loading after conversion/quantization; marking those tensors as not required makes it work.
* Update llama-model.cpp: layer.nextn.shared_head_norm is also not required, in case of future models.
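A minimal sketch of the optional-tensor pattern this fix relies on; the flag name is modeled on llama.cpp's `TENSOR_NOT_REQUIRED`, but the loader below is an illustration, not the real API:

```cpp
// Tensors flagged as not required may be absent from the GGUF without
// aborting the load; required tensors still fail loudly.
#include <map>
#include <stdexcept>
#include <string>

enum tensor_flags { TENSOR_REQUIRED = 0, TENSOR_NOT_REQUIRED = 1 };

struct model_loader {
    std::map<std::string, float *> tensors; // name -> data

    float * create_tensor(const std::string & name, int flags = TENSOR_REQUIRED) {
        auto it = tensors.find(name);
        if (it == tensors.end()) {
            if (flags & TENSOR_NOT_REQUIRED) {
                return nullptr; // absent but optional: GLM 4.6 omits these
            }
            throw std::runtime_error("missing tensor: " + name);
        }
        return it->second;
    }
};

// The fix marks the nextn tensors optional so GLM 4.6 loads:
//   loader.create_tensor("layer.nextn.shared_head_head", TENSOR_NOT_REQUIRED);
//   loader.create_tensor("layer.nextn.embed_tokens",     TENSOR_NOT_REQUIRED);
//   loader.create_tensor("layer.nextn.shared_head_norm", TENSOR_NOT_REQUIRED);
```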
…gml-org#16345)
* make ggml_vk_default_dispatcher support older vulkan headers
* simplify with using
…16337)
* feat: Add a setting to include model name used to generate the message
* feat: UI improvements
* feat: Save model info along with the database message entry creation
* chore: Build webui static output
* feat: Improve code block theming
* chore: update webui build output
* chore: Update webui static build
…onditional rendering for Actions Dropdown for Chat Conversation Items (ggml-org#16369)
* fix: Render Conversation action dialogs as singletons from Chat Sidebar level
* chore: update webui build output
* fix: Render Actions Dropdown conditionally only when user hovers conversation item + remove unused markup
* chore: Update webui static build
* fix: Always truncate conversation names
* chore: Update webui static build
* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`. It is an iteration towards removing libcurl, while intentionally minimizing changes to existing code to guarantee the same behavior when `LLAMA_CURL` is used.

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
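A minimal sketch of the kind of httplib-based setup such a header can centralize; `split_url` is a hypothetical helper for illustration, not the actual `common/http.h` API:

```cpp
// cpp-httplib client setup: split a URL into base and path, then GET it.
#define CPPHTTPLIB_OPENSSL_SUPPORT // needed for https:// URLs (links OpenSSL)
#include <httplib.h>
#include <cstdio>
#include <string>
#include <utility>

// Split "https://host/path" into ("https://host", "/path").
// Assumes a well-formed scheme://host[/path] URL.
static std::pair<std::string, std::string> split_url(const std::string & url) {
    const size_t pos = url.find('/', url.find("://") + 3);
    if (pos == std::string::npos) {
        return { url, "/" };
    }
    return { url.substr(0, pos), url.substr(pos) };
}

int main() {
    const auto [base, path] = split_url("https://ggml.ai/index.html");
    httplib::Client cli(base); // scheme+host form selects http vs https
    cli.set_follow_location(true);
    if (auto res = cli.Get(path.c_str()); res && res->status == 200) {
        std::printf("%zu bytes\n", res->body.size());
    }
    return 0;
}
```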
* CI: Properly install rocwmma for hip builds on windows; we now install rocwmma from ubuntu packages
* CI: update linux rocm docker build to use rocm 7.0
…gml-org#16075)
* Fix to use hidden_size_per_head
* Fix num heads
* Fix array
* Fix loading weights
* Support old GGUF converted by the previous version of llama.cpp
* Update src/llama-model.cpp
* Move shared parameter definitions to the outside of loop
* Not calculating n_embd_head_k,v by n_embd / n_head
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
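An illustrative sketch of the head-size fix: prefer an explicit per-head size from the hyperparameters over deriving it as `n_embd / n_head`, which breaks for models where the two disagree. The struct and field names here are assumptions, not llama.cpp's actual ones:

```cpp
// When the GGUF supplies hidden_size_per_head explicitly, use it; only fall
// back to the derived value for old files that lack the key.
struct hparams_t {
    int n_embd      = 4096;
    int n_head      = 32;
    int n_embd_head = 0; // set from the model's per-head size when present (assumed field)
};

static int head_dim(const hparams_t & hp) {
    if (hp.n_embd_head > 0) {
        return hp.n_embd_head;    // explicit per-head size wins
    }
    return hp.n_embd / hp.n_head; // legacy fallback for old GGUFs
}
```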
ggml-org#16221)
* HIP: Disable ROCWMMA fatt on CDNA when compiled against ROCWMMA 2.0.0; rocwmma 2.0.0 includes a bug in the code faking fp16 accumulation on CDNA
* CUDA: Fix volta condition in ggml_cuda_should_use_wmma_fattn
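A sketch of a compile-time guard in the spirit of the HIP fix; the exact macro names in ggml's source may differ, so treat all identifiers here as assumptions:

```cpp
// If built against rocwmma >= 2.0.0, avoid the WMMA FlashAttention path on
// CDNA, since that release mis-handles the faked fp16 accumulation there.
#if defined(GGML_HIP_ROCWMMA_FATTN) && defined(ROCWMMA_VERSION_MAJOR)
#  if ROCWMMA_VERSION_MAJOR >= 2
#    define GGML_DISABLE_ROCWMMA_FATTN_CDNA 1 // fall back to the non-WMMA kernel
#  endif
#endif
```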
Updates the dev branch with the latest release (b6663) from ggml-org/llama.cpp.