
Conversation

@griwes (Contributor) commented Dec 18, 2025

Description

This PR adds {lower,upper}_bound device algorithms to both CUB and c.parallel.

In the case of CUB, the implementation is very straightforward and directly follows the current implementation in Thrust (which I have cleaned up as a drive-by change).
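
For reference, a minimal usage sketch of the CUB side is below. It is an assumption-heavy illustration: the entry point name follows the one requested in the linked issue (`cub::DeviceFind::LowerBound`), the two calls follow CUB's usual two-phase temporary-storage convention, and the argument order mirrors the existing `thrust::lower_bound` overloads; the exact signature added by this PR may differ.

```cuda
// Hypothetical usage sketch only: the entry point name, parameter order, and
// header are assumptions based on the linked issue and the existing
// thrust::lower_bound overloads, not necessarily the exact API in this PR.
#include <cub/cub.cuh>
#include <cuda_runtime.h>

void lower_bound_example(const int* d_haystack, int num_items,   // sorted input
                         const int* d_needles, int num_needles,  // search values
                         int* d_indices)                         // one index per needle
{
  void* d_temp_storage      = nullptr;
  size_t temp_storage_bytes = 0;

  // First call with a null temp-storage pointer: query the required size.
  cub::DeviceFind::LowerBound(d_temp_storage, temp_storage_bytes,
                              d_haystack, num_items,
                              d_needles, num_needles,
                              d_indices);

  cudaMalloc(&d_temp_storage, temp_storage_bytes);

  // Second call: perform the vectorized search.
  cub::DeviceFind::LowerBound(d_temp_storage, temp_storage_bytes,
                              d_haystack, num_items,
                              d_needles, num_needles,
                              d_indices);

  cudaFree(d_temp_storage);
}
```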

In the case of c.parallel, because of how CUB's for_each passes in kernel arguments, repeating the slight madness of the current construction of the for operator for the for_each algorithm itself felt beyond annoying, but I still needed a kernel pointer. So, instead of reusing the kernels available in CUB, I adapted the static-block-size for_each kernel to accept all the necessary arguments as separate kernel arguments, construct the for operator expected by CUB inside the kernel, and then invoke the CUB for_each agent with that operator. It is a manually constructed kernel, but it reuses both the agent code and the binary search helper types from CUB.
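
To make the shape of that concrete, here is a heavily simplified sketch. It is not the code in the PR: every identifier is a placeholder, the dispatch to CUB's for_each agent is replaced by a plain grid-stride loop, and the binary search helper is inlined by hand.

```cuda
// Placeholder sketch of "pass raw arguments, rebuild the for operator inside
// the kernel, then run a for_each-style loop over the needles". The real
// kernel hands the operator to CUB's for_each agent and uses CUB's binary
// search helper types instead of the inline loop and lower bound shown here.
#include <cuda/std/cstddef>

template <unsigned BlockThreads,
          class HaystackIt, class NeedleIt, class OutputIt, class CompareOp>
__global__ void lower_bound_kernel_sketch(HaystackIt haystack,
                                          cuda::std::size_t haystack_size,
                                          NeedleIt needles,
                                          cuda::std::size_t num_needles,
                                          OutputIt out,
                                          CompareOp compare)
{
  // 1. Reconstruct, from the individual kernel arguments, the "for operator"
  //    that CUB's for_each machinery expects: a callable over a flat index.
  auto op = [=](cuda::std::size_t i) {
    // Stand-in for CUB's binary search helpers: a classic lower bound over
    // the sorted haystack.
    cuda::std::size_t lo = 0;
    cuda::std::size_t hi = haystack_size;
    while (lo < hi)
    {
      const cuda::std::size_t mid = lo + (hi - lo) / 2;
      if (compare(haystack[mid], needles[i]))
      {
        lo = mid + 1;
      }
      else
      {
        hi = mid;
      }
    }
    out[i] = lo;
  };

  // 2. In the PR, `op` is passed to the existing CUB for_each agent so its
  //    tiling and bounds handling are reused; a grid-stride loop stands in
  //    for that agent here.
  for (cuda::std::size_t i = cuda::std::size_t(blockIdx.x) * BlockThreads + threadIdx.x;
       i < num_needles;
       i += cuda::std::size_t(gridDim.x) * BlockThreads)
  {
    op(i);
  }
}
```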

Resolves #6695

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@griwes griwes requested review from a team as code owners December 18, 2025 06:13
@griwes griwes requested a review from elstehle December 18, 2025 06:13

Comment on lines +59 to +60
const unsigned int thread_count = 256;
const size_t items_per_block = 512;
Contributor:

Are these essentially hardcoded somewhere in CUB as well?

@griwes (Author) replied:

Yes. There are no tunings for `for_each`, and this just follows that.

I'm planning to start working on a warp-level binary search algorithm in the new year, and then build a device-wide one on top of that as a replacement for the current approach; we'll do actual tunings then.
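
For concreteness, constants like these typically turn into a launch configuration along the following lines (illustrative only, not the exact c.parallel code; `num_needles`, `search_kernel`, and `stream` are placeholders):

```cuda
// Illustrative only: how a fixed block size and tile size become a grid size.
const unsigned int thread_count   = 256;  // threads per block
const size_t items_per_block      = 512;  // items handled per block (2 per thread)
const size_t num_blocks = (num_needles + items_per_block - 1) / items_per_block;
// search_kernel<<<num_blocks, thread_count, 0, stream>>>(/* ... */);
```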

@shwina (Contributor) left a comment:

The C side looks good to me. Thanks!


@griwes griwes enabled auto-merge (squash) December 19, 2025 22:51

@github-actions

😬 CI Workflow Results

🟥 Finished in 9h 20m: Pass: 98%/143 | Total: 7d 15h | Max: 5h 51m | Hits: 71%/186811

See results here.



Development

Successfully merging this pull request may close these issues.

[FEA]: Add vectorized cub::DeviceFind::UpperBound and cub::DeviceFind::LowerBound algorithms
