forked from conda-forge/llama.cpp-feedstock
Upgrade to b7229 #25
Open

xkong-anaconda wants to merge 20 commits into main from upgrade-to-b7229
+193 −240
Changes from all commits (20 commits):
55cb0de  Upgrade to b7229
eee7041  Fix macOS linker error with version scheme
dd1d67a  Add patch for macOS dylib version
962e11d  Skip test-backend-ops on Metal (SEGFAULT)
e37bf0f  Increase tolerance for aarch64 OpenBLAS precision
d3ab646  Use --version instead of --help for tests
da06880  Skip tools help tests avoid torch import
4c6b52c  Fix Windows c_stdlib_version: use standard 2019.11 instead of non-exi…
79f7ca2  Remove Windows c_stdlib_version - Windows doesn't use c_win-64 packages
07292b5  fix
2610526  Use vs2019 compiler for Windows consistency
b31b942  remove unused patches
6695c37  clean up comments
38dcdef  remove unnessary patch
c2a5f28  Updated the patch to target src/CMakeLists.txt
2911222  update patch header
bae8946  use 0.0.7229 as the conda package version
4b4e31a  add patches
7f73ebf  Re-enabled the --help tests and Fixed libmtmd dylib version error
6b3cab9  Updated the --help tests to skip on macOS only
@@ -1,69 +1,56 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Conda Build <noreply@anaconda.com>
Date: Mon, 28 Oct 2024 00:00:00 +0000
Subject: [PATCH] Disable Metal BF16 support for macOS SDK < 15 compatibility
Date: Mon, 2 Dec 2025 10:00:00 +0000
Subject: [PATCH] Disable Metal BF16 support for macOS SDK < 15

Disable BF16 (bfloat16) support in Metal shaders to prevent Metal shader
compilation crashes on macOS SDK versions prior to 15.0.
AI assistant generated patch.

The Metal compiler in SDK < 15 has a bug that causes crashes when compiling
BF16 kernel code (e.g., kernel_get_rows_bf16). We disable BF16 in two places:
Metal shader compiler in macOS SDK < 15 crashes when compiling BF16
(bfloat16) shader code, causing test-backend-ops and test-thread-safety
to fail with SEGFAULT/abort on macOS 12-14.

1. Compile-time: Prevent GGML_METAL_HAS_BF16 preprocessor macro from being
set in Metal compiler options, so BF16 kernels are not compiled into the
Metal library.
This patch disables BF16 at both compile-time and runtime:
1. Comments out the preprocessor macro setting (line ~261)
2. Sets has_bfloat = false unconditionally (line ~549)

2. Runtime: Set has_bfloat = false to prevent the runtime from attempting
to use BF16 operations or kernels.
This matches old llama.cpp behavior where BF16 was disabled by default.
Can be removed when building with macOS 15+ SDK.

This ensures stability across all macOS versions (12-14) at the cost of BF16
performance optimizations. Long-term plan: Re-enable when building with
macOS 15+ SDK.

Fixes: test-backend-ops (SEGFAULT), test-thread-safety (abort) on macOS < 15

Technical note: Simply omitting BF16 kernels at compile time is insufficient
because the runtime still detects hardware BF16 support via MTLDevice APIs
and attempts to use BF16 operations, causing "failed to compile pipeline"
errors when the missing kernels are requested from the Metal library.
---
ggml/src/ggml-metal/ggml-metal-device.m | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
ggml/src/ggml-metal/ggml-metal-device.m | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 1111111..2222222 100644
index 1234567..abcdefg 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -257,9 +257,12 @@
@@ -258,9 +258,10 @@ static void ggml_metal_device_load_library(ggml_metal_device_t dev) {
     // dictionary of preprocessor macros
     NSMutableDictionary * prep = [NSMutableDictionary dictionary];

-    if (ggml_metal_device_get_props(dev)->has_bfloat) {
-        [prep setObject:@"1" forKey:@"GGML_METAL_HAS_BF16"];
-    }
+    // Disable BF16 for macOS SDK < 15 compatibility
+    // Metal compiler in SDK < 15 crashes when compiling BF16 kernels
+    // TODO: Re-enable when building with macOS 15+ SDK
+    //if (ggml_metal_device_get_props(dev)->has_bfloat) {
+    //    [prep setObject:@"1" forKey:@"GGML_METAL_HAS_BF16"];
+    //}

 #if GGML_METAL_EMBED_LIBRARY
     [prep setObject:@"1" forKey:@"GGML_METAL_EMBED_LIBRARY"];
@@ -486,8 +489,12 @@
+    // Disabled for conda-forge: BF16 causes Metal shader compiler crashes on macOS SDK < 15
+    // if (ggml_metal_device_get_props(dev)->has_bfloat) {
+    //     [prep setObject:@"1" forKey:@"GGML_METAL_HAS_BF16"];
+    // }

     if (ggml_metal_device_get_props(dev)->has_tensor) {
         [prep setObject:@"1" forKey:@"GGML_METAL_HAS_TENSOR"];
@@ -546,9 +547,9 @@ static ggml_metal_device ggml_metal_device_init(id<MTLDevice> mtl_device, int in
     dev->props.has_simdgroup_mm = [dev->mtl_device supportsFamily:MTLGPUFamilyApple7];
     dev->props.has_unified_memory = dev->mtl_device.hasUnifiedMemory;

-    dev->props.has_bfloat = [dev->mtl_device supportsFamily:MTLGPUFamilyMetal3_GGML];
-    dev->props.has_bfloat |= [dev->mtl_device supportsFamily:MTLGPUFamilyApple6];
+    // Disable BF16 for macOS SDK < 15 compatibility
+    // Prevents runtime from attempting to use BF16 operations/kernels
-    if (getenv("GGML_METAL_BF16_DISABLE") != NULL) {
+    // Disabled for conda-forge: BF16 causes Metal shader compiler crashes on macOS SDK < 15
+    dev->props.has_bfloat = false;
+    //dev->props.has_bfloat = [dev->mtl_device supportsFamily:MTLGPUFamilyMetal3_GGML];
+    //dev->props.has_bfloat |= [dev->mtl_device supportsFamily:MTLGPUFamilyApple6];
+
+    if (false && getenv("GGML_METAL_BF16_DISABLE") != NULL) {
         dev->props.has_bfloat = false;
     }

     dev->props.use_residency_sets = true;
 #if defined(GGML_METAL_HAS_RESIDENCY_SETS)
--
2.39.2
2.45.2
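
Side note on the approach above: the patch gates BF16 at two levels, a preprocessor macro (GGML_METAL_HAS_BF16) that controls whether the BF16 kernels are built into the Metal library at all, and a runtime flag (has_bfloat) that controls whether they are ever selected. Below is a minimal, self-contained C sketch of that two-level gating pattern, for illustration only; the EXAMPLE_* and pick_kernel names are made up, and this is not the actual ggml-metal code.

#include <stdbool.h>
#include <stdio.h>

/* Compile-time gate: leave this macro undefined and the BF16 path is never
 * built, analogous to not passing GGML_METAL_HAS_BF16 to the Metal shader
 * compiler. */
/* #define EXAMPLE_HAS_BF16 1 */

typedef struct {
    bool has_bfloat;   /* runtime gate, analogous to dev->props.has_bfloat */
} example_device_props;

static const char *pick_kernel(const example_device_props *props) {
#ifdef EXAMPLE_HAS_BF16
    if (props->has_bfloat) {
        return "kernel_get_rows_bf16";  /* reachable only when both gates are open */
    }
#endif
    (void)props;                        /* silences unused warning when BF16 is compiled out */
    return "kernel_get_rows_f16";       /* fallback path */
}

int main(void) {
    example_device_props props = { .has_bfloat = false };  /* forced off, as the patch does */
    printf("selected kernel: %s\n", pick_kernel(&props));
    return 0;
}

Forcing the runtime flag to false, as the patch does, means the fallback path is taken even if the macro happened to be defined, which is why the description stresses that compile-time omission alone is not enough.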
@@ -1,94 +1,40 @@
From f549b0007dbdd683215820f7229ce180a12b191d Mon Sep 17 00:00:00 2001
From: Xianglong Kong <xkong@anaconda.com>
Date: Thu, 30 Oct 2025 11:15:00 -0500
Subject: [PATCH] Disable Metal Flash Attention due to numerical precision
 issues
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Conda Build <noreply@anaconda.com>
Date: Mon, 2 Dec 2025 10:00:00 +0000
Subject: [PATCH] Disable Metal Flash Attention due to numerical precision issues

Metal Flash Attention implementation in llama.cpp b6872 produces incorrect
results with NMSE errors ranging from 0.068 to 0.160, significantly exceeding
the test tolerance of 0.005. This affects test-backend-ops with various
configurations using f32/f16/q8_0/q4_0 K/V types.
AI assistant generated patch.

Investigation shows Flash Attention was present in both b6653 and b6872, with
significant improvements between versions including:
- Metal backend refactoring and optimizations (#16446)
- Support for non-padded Flash Attention KV (#16148)
- Flash Attention support for F32 K/V and head size 32 (#16531)
- Avoiding Metal's gpuAddress property (#16576)
Metal Flash Attention produces incorrect numerical results on macOS SDK < 15,
with NMSE errors 14-32x higher than acceptable tolerance (0.068-0.160 vs 0.005).

However, these changes introduced or exposed numerical precision issues on
macOS SDK < 15. Disabling Flash Attention on Metal until precision is fixed
upstream.
This patch makes ggml_metal_device_supports_op return false for GGML_OP_FLASH_ATTN_EXT,
causing Flash Attention operations to fall back to CPU (correct precision).

This patch makes ggml_metal_supports_op return false for GGML_OP_FLASH_ATTN_EXT,
causing Flash Attention operations to fall back to CPU implementation which has
correct precision.
Can be removed when Metal Flash Attention precision is fixed upstream or
when building with macOS 15+ SDK.

Related issues:
- test-backend-ops: 190/~5489 Flash Attention tests failing
- Errors like: NMSE = 0.124010895 > 0.005000000

TODO: Re-enable when Metal Flash Attention precision is fixed in upstream llama.cpp
---
ggml/src/ggml-metal/ggml-metal-device.m | 36 +++++++++++++++++-------
1 file changed, 26 insertions(+), 10 deletions(-)
ggml/src/ggml-metal/ggml-metal-device.m | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 1234567..abcdefg 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -703,27 +703,35 @@
@@ -909,6 +909,10 @@ bool ggml_metal_device_supports_op(ggml_metal_device_t dev, const struct ggml_te
         case GGML_OP_TOP_K:
         case GGML_OP_ARANGE:
             return true;
         case GGML_OP_FLASH_ATTN_EXT:
-            // for new head sizes, add checks here
-            if (op->src[0]->ne[0] != 32 &&
-                op->src[0]->ne[0] != 40 &&
-                op->src[0]->ne[0] != 64 &&
-                op->src[0]->ne[0] != 80 &&
-                op->src[0]->ne[0] != 96 &&
-                op->src[0]->ne[0] != 112 &&
-                op->src[0]->ne[0] != 128 &&
-                op->src[0]->ne[0] != 192 &&
-                op->src[0]->ne[0] != 256) {
-                return false;
-            }
-            if (op->src[0]->ne[0] == 576) {
-                // DeepSeek sizes
-                // TODO: disabled for now, until optmized
-                return false;
-            }
-            if (op->src[1]->type != op->src[2]->type) {
-                return false;
-            }
-            return has_simdgroup_mm; // TODO: over-restricted for vec-kernels
+            // Disable Flash Attention on Metal due to numerical precision issues
+            // Metal Flash Attention implementation produces incorrect results with
+            // NMSE errors 0.068-0.160 (vs tolerance 0.005) in test-backend-ops.
+            // This affects various configurations with f32/f16/q8_0/q4_0 K/V types.
+            // TODO: Re-enable when Metal Flash Attention precision is fixed upstream
+            // Disabled for conda-forge: Flash Attention has numerical precision issues on macOS SDK < 15
+            // NMSE errors 0.068-0.160 vs tolerance 0.005 (14-32x too high)
+            // Fall back to CPU implementation for correct results
+            return false;
+
+            // Original code (disabled):
+            // // for new head sizes, add checks here
+            // if (op->src[0]->ne[0] != 32 &&
+            //     op->src[0]->ne[0] != 40 &&
+            //     op->src[0]->ne[0] != 64 &&
+            //     op->src[0]->ne[0] != 80 &&
+            //     op->src[0]->ne[0] != 96 &&
+            //     op->src[0]->ne[0] != 112 &&
+            //     op->src[0]->ne[0] != 128 &&
+            //     op->src[0]->ne[0] != 192 &&
+            //     op->src[0]->ne[0] != 256) {
+            //     return false;
+            // }
+            // if (op->src[0]->ne[0] == 576) {
+            //     // DeepSeek sizes
+            //     // TODO: disabled for now, until optmized
+            //     return false;
+            // }
+            // if (op->src[1]->type != op->src[2]->type) {
+            //     return false;
+            // }
+            // return has_simdgroup_mm; // TODO: over-restricted for vec-kernels
         case GGML_OP_SSM_CONV:
         case GGML_OP_SSM_SCAN:
             return has_simdgroup_reduction;
             // for new head sizes, add checks here
             if (op->src[0]->ne[0] != 32 &&
                 op->src[0]->ne[0] != 40 &&
--
2.45.2
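
For readers of the numbers above: NMSE is normalized mean squared error, the squared difference between backend output and reference output divided by the squared magnitude of the reference, which the patch description compares against the 0.005 tolerance used by test-backend-ops. A rough C sketch of that metric follows, assuming the standard definition rather than copying the exact test-backend-ops code.

#include <stddef.h>
#include <stdio.h>

/* NMSE: sum of squared differences divided by the sum of squared reference
 * values; a failing Flash Attention case reports something like
 * NMSE = 0.124010895 against the 0.005 tolerance quoted above. */
static double nmse(const float *out, const float *ref, size_t n) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)out[i] - (double)ref[i];
        err  += d * d;
        norm += (double)ref[i] * (double)ref[i];
    }
    return norm > 0.0 ? err / norm : 0.0;
}

int main(void) {
    const float ref[] = {1.0f, -2.0f, 0.5f, 3.0f};   /* stand-in CPU reference */
    const float out[] = {1.1f, -1.9f, 0.6f, 2.7f};   /* stand-in Metal output  */
    printf("NMSE = %.9f (fails if > 0.005)\n", nmse(out, ref, 4));
    return 0;
}

Because the check in ggml_metal_device_supports_op now returns false for GGML_OP_FLASH_ATTN_EXT, those operations run on the CPU backend, which keeps the error below the tolerance at the cost of Metal acceleration for attention.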
@@ -0,0 +1,50 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Conda Build <noreply@anaconda.com>
Date: Mon, 2 Dec 2024 10:00:00 +0000
Subject: [PATCH] Fix macOS dylib version for large build numbers

AI assistant generated patch.

macOS linker has a limit of 255 for version components in the a.b.c format.
Build numbers like 7229 exceed this limit, causing linker errors:
"ld: malformed 64-bit a.b.c.d.e version number: 0.0.7229"

This patch sets a fixed VERSION for shared libraries (libllama, libmtmd)
while preserving LLAMA_INSTALL_VERSION in config files (llama.pc, llama-config.cmake).

See: https://github.com/ggml-org/llama.cpp/issues/17258

---
src/CMakeLists.txt | 2 +-
tools/mtmd/CMakeLists.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 1234567..abcdefg 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -137,7 +137,7 @@ target_link_libraries(llama PRIVATE
 )

 set_target_properties(llama PROPERTIES
-    VERSION   ${LLAMA_INSTALL_VERSION}
+    VERSION   0
     SOVERSION 0
 )

diff --git a/tools/mtmd/CMakeLists.txt b/tools/mtmd/CMakeLists.txt
index 1234567..abcdefg 100644
--- a/tools/mtmd/CMakeLists.txt
+++ b/tools/mtmd/CMakeLists.txt
@@ -14,7 +14,7 @@ add_library(mtmd
 )

 set_target_properties(mtmd PROPERTIES
-    VERSION   ${LLAMA_INSTALL_VERSION}
+    VERSION   0
     SOVERSION 0
 )

--
2.45.2
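
Background on the linker limit cited in this patch: the macOS linker packs -current_version X.Y.Z into a fixed-width Mach-O field (commonly 16 bits for X and 8 bits each for Y and Z), so trailing components above 255 such as 7229 are rejected, while pinning VERSION to 0 links cleanly. The C sketch below illustrates that packing under those assumptions; pack_dylib_version is a hypothetical helper, not part of the feedstock, CMake, or ld.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring an X.Y.Z -> 32-bit packing for Mach-O dylib
 * versions: 16 bits for the major component, 8 bits each for the minor and
 * patch components. */
static int pack_dylib_version(unsigned x, unsigned y, unsigned z, uint32_t *out) {
    if (x > 0xFFFFu || y > 0xFFu || z > 0xFFu) {
        return -1;                              /* component does not fit its field */
    }
    *out = ((uint32_t)x << 16) | ((uint32_t)y << 8) | (uint32_t)z;
    return 0;
}

int main(void) {
    uint32_t packed;
    if (pack_dylib_version(0, 0, 7229, &packed) != 0) {
        puts("0.0.7229 rejected: 7229 > 255, matching the linker error in the patch");
    }
    if (pack_dylib_version(0, 0, 0, &packed) == 0) {        /* the fixed VERSION 0 the patch sets */
        printf("0.0.0 packs to 0x%08x\n", packed);
    }
    return 0;
}

Both before and after the patch SOVERSION stays at 0, so only the per-build current version is pinned; the conda package itself still carries the full 0.0.7229 version, as the "use 0.0.7229 as the conda package version" commit indicates.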