cub, c.parallel: {lower,upper}_bound #7007
const unsigned int thread_count = 256;
const size_t items_per_block = 512;
Are these essentially hardcoded somewhere in CUB as well?
Yes. There are no tunings for `for`, and this just follows that.
I'm planning to start working on a warp-level binary search algorithm in the new year, and then build a device-wide one on top of that as a replacement for the current approach. We'll do actual tunings then.
shwina
left a comment
The C side looks good to me. Thanks!
CI Workflow Results: Finished in 9h 20m | Pass: 98%/143 | Total: 7d 15h | Max: 5h 51m | Hits: 71%/186811
Description
This PR adds {lower,upper}_bound device algorithms to both CUB and c.parallel.
In the case of CUB, the implementation is very straightforward and directly follows the current implementation in Thrust (which I have cleaned up as a drive-by change).
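For readers unfamiliar with the semantics, here is a minimal host-side sketch of what a bulk `{lower,upper}_bound` computes, assuming the vectorized form Thrust exposes (one search per value against a sorted range). The function names `lower_bound_all` and `upper_bound_all` are hypothetical and are not the CUB or c.parallel entry points added by this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; illustrates semantics only.
// For each needle, store the index of the first haystack element
// that is not less than it (the std::lower_bound contract).
std::vector<std::size_t> lower_bound_all(const std::vector<int>& haystack,
                                         const std::vector<int>& needles)
{
  assert(std::is_sorted(haystack.begin(), haystack.end()));
  std::vector<std::size_t> out;
  out.reserve(needles.size());
  for (int v : needles)
  {
    auto it = std::lower_bound(haystack.begin(), haystack.end(), v);
    out.push_back(static_cast<std::size_t>(it - haystack.begin()));
  }
  return out;
}

// Same shape, but stores the index of the first haystack element
// that is strictly greater than the needle (the std::upper_bound contract).
std::vector<std::size_t> upper_bound_all(const std::vector<int>& haystack,
                                         const std::vector<int>& needles)
{
  assert(std::is_sorted(haystack.begin(), haystack.end()));
  std::vector<std::size_t> out;
  out.reserve(needles.size());
  for (int v : needles)
  {
    auto it = std::upper_bound(haystack.begin(), haystack.end(), v);
    out.push_back(static_cast<std::size_t>(it - haystack.begin()));
  }
  return out;
}
```

With a haystack of `{0, 2, 5, 7, 8}` and needles `{0, 1, 2, 3, 8, 9}`, the lower bounds are `{0, 1, 1, 2, 4, 5}` and the upper bounds are `{1, 1, 2, 2, 5, 5}`. Each search is independent of the others, which is why the algorithm maps naturally onto a `for_each` over the needles.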
In the case of c.parallel, because of how CUB's `for_each` passes in kernel arguments, repeating the slight madness of the current `for` operator construction for the `for_each` algorithm itself felt beyond annoying, but I needed a kernel pointer. So instead of reusing the kernels available in CUB, I adapted the static-block-size `for_each` kernel to accept all the necessary arguments as separate kernel arguments, then construct the `for` operator expected by CUB inside the kernel, and finally invoke the CUB `for_each` agent with that operator. It's a manually constructed kernel, but it reuses both the agent code and the binary search helper types from CUB.

Resolves #6695
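The kernel structure described for c.parallel can be sketched on the host roughly as follows. This is a plain C++ emulation under stated assumptions, not the PR's code: `lower_bound_op`, `block_agent`, `search_kernel`, and `launch_search` are hypothetical stand-ins for the CUB operator, the CUB `for_each` agent, the hand-written kernel, and the launch, and the block loop replaces a real grid launch. Only the `items_per_block = 512` value is taken from the diff under review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the operator the agent expects:
// one binary search per needle index.
struct lower_bound_op
{
  const int* haystack;
  std::size_t haystack_size;
  const int* needles;
  std::size_t* out;

  void operator()(std::size_t i) const
  {
    const int* pos = std::lower_bound(haystack, haystack + haystack_size, needles[i]);
    out[i] = static_cast<std::size_t>(pos - haystack);
  }
};

// Hypothetical stand-in for the for_each agent: applies op to every
// index in the tile owned by one block.
template <class Op>
void block_agent(std::size_t block_id, std::size_t items_per_block,
                 std::size_t num_items, Op op)
{
  const std::size_t begin = block_id * items_per_block;
  const std::size_t end   = std::min(begin + items_per_block, num_items);
  for (std::size_t i = begin; i < end; ++i)
  {
    op(i);
  }
}

// Hypothetical stand-in for the kernel: every input arrives as a separate
// argument, and the operator is only assembled inside the kernel body.
void search_kernel(std::size_t block_id, const int* haystack, std::size_t haystack_size,
                   const int* needles, std::size_t num_needles, std::size_t* out)
{
  constexpr std::size_t items_per_block = 512; // value taken from the diff
  lower_bound_op op{haystack, haystack_size, needles, out};
  block_agent(block_id, items_per_block, num_needles, op);
}

// Emulated launch: one loop iteration per block.
std::vector<std::size_t> launch_search(const std::vector<int>& haystack,
                                       const std::vector<int>& needles)
{
  assert(std::is_sorted(haystack.begin(), haystack.end()));
  constexpr std::size_t items_per_block = 512;
  std::vector<std::size_t> out(needles.size());
  const std::size_t num_blocks = (needles.size() + items_per_block - 1) / items_per_block;
  for (std::size_t b = 0; b < num_blocks; ++b)
  {
    search_kernel(b, haystack.data(), haystack.size(), needles.data(), needles.size(), out.data());
  }
  return out;
}
```

The point of the shape is that the caller only needs a kernel pointer plus a flat argument list, while the per-item logic and the tiling still come from shared components.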
Checklist