Skip to content

Conversation

@CISC
Copy link
Collaborator

@CISC CISC commented Dec 8, 2025

Since all major backends (CUDA, Metal and Vulkan) support the fill op now use that instead of scale_bias.

@CISC CISC requested a review from ggerganov December 8, 2025 18:46
@ggerganov
Copy link
Member

Instead of using the explicit ggml_fill_inplace, leave this to the allocator by adding the op to ggml-alloc.c::ggml_op_can_inplace. Would also need to remove the restrict classifier in the CUDA implementation.

@CISC
Copy link
Collaborator Author

CISC commented Dec 8, 2025

Instead of using the explicit ggml_fill_inplace, leave this to the allocator by adding the op to ggml-alloc.c::ggml_op_can_inplace. Would also need to remove the restrict classifier in the CUDA implementation.

Ah, didn't know about that, I'll change it to just ggml_fill.

I'll look into the rest for another PR where/how is the decision to do it in place done?

@ggerganov
Copy link
Member

Generally, avoid using explicit _inplace ops unless you need to modify some buffer permanently (e.g. a KV cache).

I'll look into the rest for another PR where/how is the decision to do it in place done?

The logic for that is in ggml_gallocr_allocate_node. If one of the parents of the op is suitable, the result will reuse it's memory making the op effectively inplace.

@CISC
Copy link
Collaborator Author

CISC commented Dec 8, 2025

@ggerganov Hmmm, it won't reuse it for FILL, it's skipping it here:

not reusing parent ffn_moe_probs_biased-19 (reshaped) for node_1556 as (nil) is external

@ggerganov
Copy link
Member

I think this means that the parent buffer is on a different backend buffer - this is expected to not be able to inplace.

@CISC
Copy link
Collaborator Author

CISC commented Dec 8, 2025

I think this means that the parent buffer is on a different backend buffer - this is expected to not be able to inplace.

Ah, this might be why:

not reusing parent blk.19.exp_probs_b.bias for ffn_moe_probs_biased-19 as 0x7e10eba4b800 is external

@CISC
Copy link
Collaborator Author

CISC commented Dec 8, 2025

It's a little odd that ggml_fill_inplace works fine then, but I guess it's an ordering issue, I tried forcing ggml_gallocr_allocate_node to do it for FILL and then it just outputs ?.

@CISC CISC merged commit c8554b6 into master Dec 8, 2025
76 of 78 checks passed
@CISC CISC deleted the cisc/use-fill-instead-of-scale-bias branch December 8, 2025 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants