@stevescot

Fix DeepSeek V3.2 Inference on Apple Silicon (Metal)

Description:
This PR addresses multiple runtime crashes and assertion failures encountered when running the experimental DeepSeek V3.2 model on Apple Silicon (M3 Ultra).

Changes:

llama-model-loader.cpp: Fixed a crash during tensor loading where tensor->src[0] was dereferenced before it was initialized. Rewrote the loop so that tensors are named and allocated before their sources are accessed.
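
The actual loader diff is not reproduced here; the following is a standalone illustration of the failure and fix pattern only, where `tensor_t` and all the names are placeholders rather than llama.cpp symbols:

```cpp
#include <cstdio>
#include <vector>

// Placeholder struct mimicking the shape of the problem: a node carries
// source pointers that remain unset until the loader assigns them.
struct tensor_t {
    const char * name;
    tensor_t *   src[2] = { nullptr, nullptr };
};

int main() {
    std::vector<tensor_t> tensors = { { "blk.0.attn_q" }, { "blk.0.attn_out" } };

    for (auto & t : tensors) {
        // Buggy pattern: reading t.src[0]->name here dereferences a null
        // pointer, because src[0] has not been initialized yet.
        // Fixed pattern: guard every access to a source that may be unset.
        if (t.src[0] != nullptr) {
            std::printf("%s <- %s\n", t.name, t.src[0]->name);
        } else {
            std::printf("%s: source not yet allocated, skipping\n", t.name);
        }
    }
    return 0;
}
```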
llama-sparse-mla-fwd.cpp: Disabled LLAMA_SPARSE_MLA_FUSED_DECODE by default. The fused decode kernel is not yet implemented for Metal, so enabling it triggers assertion failures.
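
The PR does not show how this flag is wired up; assuming it is a compile-time macro, a minimal sketch of defaulting such an opt-in switch to off could look like this (only the macro name comes from the PR, the rest is an assumption):

```cpp
// Sketch only: defaulting the switch to off so the fused decode path
// (which has no Metal kernel yet) must be explicitly opted into.
#ifndef LLAMA_SPARSE_MLA_FUSED_DECODE
#define LLAMA_SPARSE_MLA_FUSED_DECODE 0
#endif

static bool sparse_mla_fused_decode_enabled() {
#if LLAMA_SPARSE_MLA_FUSED_DECODE
    return true;   // explicit opt-in, e.g. -DLLAMA_SPARSE_MLA_FUSED_DECODE=1
#else
    return false;  // take the unfused decode path, which Metal supports
#endif
}
```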
llama-sparse-topk.cpp: Added a CPU fallback for SPARSE_TOPK_RADIX.
Issue: The Metal backend does not implement the SPARSE_TOPK_RADIX operator, so the graph falls back to the CPU backend, which explicitly asserts false for this operator and crashes.
Fix: Wrapped the call in an #ifdef __APPLE__ block to force the use of the CPU-based sparse_attn_topk::topk_radix_indices function instead of the graph operator, bypassing the missing backend support while preserving CUDA behavior on other platforms.
Result:
The model loads and runs inference successfully on an M3 Ultra, with ~12 t/s prompt processing and ~8 t/s generation.