ggml : add ggml_top_k #17365
Conversation
Force-pushed 56ab2ca to 4d75c05
Force-pushed 4d75c05 to 5d8ce1c
Does this operator expect the top-K elements to be sorted?
I feel it should not be sorted, as algorithmically we are performing a selection, and depending on the algorithm the outcome of this selection is unordered: https://leimao.github.io/blog/CPU-TopK-Algorithm/ Should one wish to sort, one could easily do so afterwards.
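For illustration only (not code from this PR), here is a minimal C++ sketch of such an unordered selection, assuming a plain vector of scores: `std::nth_element` partitions the k largest values to the front without ordering them, and an explicit sort can be applied afterwards if desired.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Unordered top-k selection: after nth_element the first k elements are the
// k largest values, but in no particular order.
std::vector<float> top_k_unordered(std::vector<float> v, size_t k) {
    k = std::min(k, v.size());
    std::nth_element(v.begin(), v.begin() + k, v.end(), std::greater<float>());
    v.resize(k);
    // optional: std::sort(v.begin(), v.end(), std::greater<float>()); // only if order matters
    return v;
}
```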
In principle it does not have to expect the elements to be sorted. However, the current implementation sorts them in descending order in order to be able to verify correctness with test-backend-ops.
By treating them as sets rather than lists? We could use std::unordered_set for this.
Currently, test-backend-ops relies on NMSE of outputs rather than cardinality checks, but I guess that can be changed.
It would be ok to add an overridable error function to test-backend-ops.
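For illustration, a hypothetical C++ sketch (not the actual test-backend-ops API; the struct and method names here are invented) of how an overridable error function could let a top-k test compare outputs as a set instead of via NMSE:

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical base class: by default, compare outputs element-wise (NMSE-style).
struct test_case_sketch {
    virtual ~test_case_sketch() = default;
    virtual double error(const std::vector<float> & out, const std::vector<float> & ref) const {
        double err = 0.0, norm = 0.0;
        for (size_t i = 0; i < out.size(); ++i) {
            err  += (out[i] - ref[i])*(out[i] - ref[i]);
            norm += ref[i]*ref[i];
        }
        return norm > 0.0 ? err/norm : err;
    }
};

// Hypothetical top-k test: the result is a selection, so compare it as a
// multiset of values and ignore the order in which they are returned.
struct test_top_k_sketch : test_case_sketch {
    double error(const std::vector<float> & out, const std::vector<float> & ref) const override {
        std::unordered_multiset<float> a(out.begin(), out.end());
        std::unordered_multiset<float> b(ref.begin(), ref.end());
        return a == b ? 0.0 : 1.0;
    }
};
```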
What are common tensor shapes and values of k? Does this operation support non-contiguous rows?
@jeffbolznv This will be used in #17004 to do top-k sampling efficiently on the GPU. The typical shapes are:
Support for non-contiguous rows is not necessary for now - will add asserts for that.
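As a usage sketch under assumptions (the helper name and the [n_vocab, n_tokens] shape are illustrative, not quoted from this thread), top-k sampling support might build the op like this:

```cpp
#include "ggml.h"

// logits is assumed to be [n_vocab, n_tokens] with contiguous rows;
// non-contiguous rows are not supported for now.
static struct ggml_tensor * build_top_k(struct ggml_context * ctx, struct ggml_tensor * logits, int k) {
    // one row of k selected entries per input row; unlike the old argsort-based
    // implementation, the dedicated op does not guarantee any ordering
    return ggml_top_k(ctx, logits, k);
}
```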
OK, understood. When you get a chance, please rebase; I'll implement something based on #17313.
Force-pushed 5d8ce1c to 4dea5dd
  ggml_tensor * selection_groups = ggml_reshape_3d(ctx0, selection_probs, n_exp_per_group, hparams.n_expert_groups, n_tokens); // [n_exp_per_group, n_expert_groups, n_tokens]
- ggml_tensor * group_scores = ggml_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]
+ ggml_tensor * group_scores = ggml_argsort_top_k(ctx0, selection_groups, 2); // [2, n_expert_groups, n_tokens]
I guess these are temporary until support in all backends is in place? Add a TODO?
Not 100% sure yet - keeping the expert order deterministic might be necessary. And using ggml_top_k here would likely not make a big difference performance-wise since the arrays are very small.
Completely unnecessary for the expert group selection at least.
Please don't change it anywhere else, as it will also break fusion in all backends for topk-moe.
The values are also shuffled: llama.cpp/tests/test-backend-ops.cpp, lines 5034 to 5042 in c63ecde.
So the top-1 could be any number.
Vulkan support is ready in #17418.
Force-pushed db4570a to 1e3d461
Added the overridable error function in 961dd4f. I think this is OK to merge. Will do so later today if there are no additional concerns.
- int64_t ne00;
- int64_t ne01;
- int64_t ne02;
- int64_t ne03;
+ int32_t ne00;
+ int32_t ne01;
+ int32_t ne02;
+ int32_t ne03;
I don't know my way around the Metal backend, but this could lead to overflow. Do we need to protect against this?
The convention is to use 32-bit ints for the number of elements:
llama.cpp/ggml/src/ggml-metal/ggml-metal-impl.h, lines 87 to 94 in b1846f1:
// kernel argument structs
//
// - element counters (e.g. ne00) typically use int32_t to reduce register usage
//   however, be careful from int overflows when using those in the kernel implementation
//
// - strides (e.g. nb00) use uint64_t
Overflows are handled by explicitly casting to 64-bit when we multiply 32-bit ints:
llama.cpp/ggml/src/ggml-metal/ggml-metal.metal, lines 3200 to 3202 in 1e3d461:
device float * dst_f32 = (device float *) dst + (uint64_t)im*args.ne0*args.ne1 + (uint64_t)r1*args.ne0;
It's possible that some casts are missing here and there, but I usually update these when I spot them.
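To make the convention concrete, here is a minimal sketch in plain C++ (not Metal; the function and parameter names just mirror the quoted line) of widening before the multiplication:

```cpp
#include <cstdint>

// With 32-bit element counters, cast one operand to 64 bits *before* multiplying;
// (uint64_t)(im * ne0 * ne1) would already overflow in 32-bit arithmetic.
uint64_t row_offset(int32_t im, int32_t r1, int32_t ne0, int32_t ne1) {
    return (uint64_t) im * ne0 * ne1 + (uint64_t) r1 * ne0;
}
```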
Add a dedicated top-k op so that it can be more efficiently optimized by backend implementations. The old implementation is renamed to ggml_argsort_top_k.

TODO:
- op_params

Next PRs:
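To illustrate the distinction described above (a hedged sketch; the function and tensor names are just for the example), the renamed op keeps the old sorted behavior while the new dedicated op performs an unordered selection:

```cpp
#include "ggml.h"

void top_k_variants(struct ggml_context * ctx, struct ggml_tensor * logits, int k) {
    // old implementation, renamed: results sorted in descending order
    struct ggml_tensor * sorted   = ggml_argsort_top_k(ctx, logits, k);
    // new dedicated op: same selection, but the order is left to the backend
    struct ggml_tensor * unsorted = ggml_top_k(ctx, logits, k);
    (void) sorted;
    (void) unsorted;
}
```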