Sync master with upstream release b6962 #315

jan-service-account · 2025-11-06T00:35:26Z

Updates dev branch with latest release (b6962) from ggml-org/llama.cpp

* opencl: update docs
* opencl: update docs
* opencl: fix link
* opencl: update doc

* Fix test-conv2d-dw failure on ARM SVE by using runtime vector length

  The ggml_compute_forward_conv_2d_dw_cwhn function was using a hardcoded GGML_F32_EPR (8) for SIMD vectorization, but on ARM SVE the actual vector length varies by hardware. This caused incorrect computation when processing CWHN layout tensors on ARM machines.

  Fix by using svcntw() to get the runtime SVE vector length instead of the compile-time constant.

  Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
* ci : reduce sam score threshold
* ci : update bbox checks for sam test

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
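The SVE fix described above comes down to choosing the block width at run time instead of at compile time. The following is a minimal sketch of that idea, not the ggml implementation: `f32_lanes` and `mul_acc_channels` are hypothetical names, the non-SVE fallback of 8 simply mirrors the constant named in the commit message, and the inner loop is a scalar stand-in for what would be a single vector instruction.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Number of 32-bit lanes processed per vector step.
// With SVE the width is a property of the CPU, so it is queried at run time
// with svcntw(). Elsewhere a fixed fallback is used (assumption: 8, matching
// the constant mentioned in the commit message).
static std::size_t f32_lanes() {
#if defined(__ARM_FEATURE_SVE)
    return static_cast<std::size_t>(svcntw());
#else
    return 8;
#endif
}

// Hypothetical helper: dst[i] += src[i] * knl[i] over c channels.
// The channel axis is walked in blocks of `lanes`; the remainder is handled
// by a scalar tail so that any lane count gives the same result.
static void mul_acc_channels(const float * src, const float * knl, float * dst,
                             std::size_t c, std::size_t lanes) {
    assert(lanes > 0);
    const std::size_t c_vec = (c / lanes) * lanes; // largest multiple of lanes <= c

    for (std::size_t i = 0; i < c_vec; i += lanes) {
        // Stand-in for one vector multiply-accumulate over `lanes` floats.
        for (std::size_t l = 0; l < lanes; ++l) {
            dst[i + l] += src[i + l] * knl[i + l];
        }
    }
    for (std::size_t i = c_vec; i < c; ++i) {
        dst[i] += src[i] * knl[i];
    }
}
```

A caller would pass `f32_lanes()` as the last argument. When the loop step and the real register width disagree, as with a hardcoded 8 on an SVE machine whose vectors are not 256 bits wide, the vectorized body and the scalar tail no longer cover the channel range exactly once, which is the class of error the commit message reports.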

* Add buffer label and enable dawn-specific toggles to turn off some checks
* Minor set_rows optimization (#4)
* updated optimization, fixed errors
* non vectorized version now dispatches one thread per element
* Simplify
* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Comment on dawn toggles
* Remove some comments
* Implement overlap binary operators
* Revert "Implement overlap binary operators"

  This reverts commit ed710b3.
* Disable support for non-contiguous binary_op tensors and leave note for future support

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>
Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>

* Model: add openPangu-Embedded
* fixed according to reviewer's comments
* fixed the chat template check condition
* Apply suggestions from code review

  change the chat-template check condition and some formatting issues

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* whitespace cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

…gml-org#17017)

* server : do not default to multiple slots with speculative decoding
* cont : fix

* feat(llama-gguf): Print out the tensor type in llama-gguf r

  Branch: Mamba2Perf

  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat(off-topic): print the number of elements in tensors with llama-gguf

  Branch: Mamba2SSD

  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* style: valign

  Branch: GGUFToolOutputs

  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Update examples/gguf/gguf.cpp

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

…rg#16919)

…l-org#16841)

* WIP
* added a cpy kernel specific to transposed tensors which uses smem to avoid uncoalesced access; test cases also added showing improved memory bandwidth
* added BF16 support
* stricter check to make sure src0 is a transpose
* reformulated to handle more complicated transpose cases
* bring back 2D transpose for higher performance
* allow build on windows
* transpose copy more shapes
* minor tweak
* final clean up
* restore some test cases
* keep only the kernel for true transposed case; updated with review suggestions
* make CI happy
* remove headers not needed
* reduced bank conflicts for fp16 and bf16
* add missing const*
* now bank conflicts free
* use padding instead of swizzling

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
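The last few items in this commit concern shared-memory bank conflicts in the transpose tile. As a rough illustration of why padding a tile row by one element removes them, the host-side sketch below counts how many threads of a warp land on the same bank when reading one tile column. It is a model, not the kernel from the PR: the 32-bank, 4-byte-word layout and the 32x32 tile are assumptions not stated in the commit message, and `column_read_conflict_degree` is a hypothetical helper. The 2-byte fp16 and bf16 cases the commit mentions would need a different word mapping.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumed model: shared memory is split into 32 banks, each one 4-byte word
// wide, and consecutive words map to consecutive banks.
constexpr std::size_t kBanks = 32;
constexpr std::size_t kTile  = 32; // assumed tile edge, in 4-byte elements

// Worst-case number of threads that hit the same bank when kTile threads
// each read one element of the same tile column, for a tile stored with
// `stride` words per row. A result of 1 means the access is conflict-free.
static std::size_t column_read_conflict_degree(std::size_t stride) {
    assert(stride >= kTile); // a row must hold at least kTile words

    std::size_t worst = 0;
    for (std::size_t col = 0; col < kTile; ++col) {
        std::array<std::size_t, kBanks> hits{}; // per-bank access count
        for (std::size_t row = 0; row < kTile; ++row) {
            const std::size_t word = row * stride + col;
            ++hits[word % kBanks];
        }
        worst = std::max(worst, *std::max_element(hits.begin(), hits.end()));
    }
    return worst;
}
```

Under this model `column_read_conflict_degree(32)` returns 32, because every row of a column maps to the same bank, while `column_read_conflict_degree(33)` returns 1, because the extra word per row shifts each row to the next bank. Swizzling reaches the same goal by permuting column indices instead of spending the extra shared memory.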

lhez and others added 13 commits November 4, 2025 16:02

opencl: update doc (ggml-org#17011)

5e90233

* opencl: update docs
* opencl: update docs
* opencl: fix link
* opencl: update doc

CUDA: update ops.md (ggml-org#17005)

9aa6337

sync : ggml

cdabeb2

docs: Clarify the endpoint that webui uses (ggml-org#17001)

fd2f84f

mtmd: improve struct initialization (ggml-org#16981)

2f0c2db

server : do not default to multiple slots with speculative decoding (g…

13b339b

…gml-org#17017)

* server : do not default to multiple slots with speculative decoding
* cont : fix

mtmd: allow QwenVL to process larger image by default (ggml-org#17020)

92bb84f

vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-o…

a44d771

…rg#16919)

jan-service-account merged commit 8dea587 into dev Nov 6, 2025
3 checks passed

jan-service-account deleted the update-dev-from-master-2025-11-06-00-35 branch November 6, 2025 00:36
