@oleksandr-pavlyk
Contributor

This PR contributes to gh-1890 by streamlining the code that performs a copy to a contiguous array.

  1. Save common subexpressions to variables

  2. Sub-group size type changed to uint16 (from uint32)

  3. sg.get_local_range() replaced with sg.get_max_local_range()

    This is safe to do since the work-group size is chosen to be
    a multiple of the sub-group size for all possible choices
    of sub-group size (1, 8, 16, 32, 64).

  4. Simplified computation of base value in generic branch for
    complex types, or when sg_load is disabled, to avoid a
    division (and left a comment)
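
A host-only C++ sketch of the reasoning behind items 3 and 4 follows. It is not the kernel code from this PR: the helper names (`sub_group_size`, `base_with_remainder`, `base_single_division`), the parameter `elems_per_wi`, and the exact form of the base expression are assumptions made for illustration. The first helper models how a work-group is usually split into sub-groups (only the last one can be partial), which suggests why the actual and maximal sub-group ranges agree when the work-group size is a multiple of the sub-group size. The other two show one way a remainder can be folded away so that a single division is left.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: number of work-items in sub-group `sg_id` when a
// work-group of `wg_size` items is split into sub-groups of at most
// `max_sg_size` items. Assumes only the last sub-group may be partial.
std::size_t sub_group_size(std::size_t wg_size,
                           std::size_t max_sg_size,
                           std::size_t sg_id)
{
    const std::size_t first = sg_id * max_sg_size;
    const std::size_t remaining = wg_size - first;
    return remaining < max_sg_size ? remaining : max_sg_size;
}

// Straightforward form (assumed for illustration): the sub-group's starting
// offset plus the work-item's position within the sub-group.
// Uses one division and one remainder.
std::size_t base_with_remainder(std::size_t gid,
                                std::uint16_t sg_size,
                                std::uint16_t elems_per_wi)
{
    return (gid / sg_size) * sg_size * elems_per_wi + (gid % sg_size);
}

// Equivalent form: substitute gid % sg_size == gid - (gid / sg_size) * sg_size
// and collect terms, leaving a single division.
std::size_t base_single_division(std::size_t gid,
                                 std::uint16_t sg_size,
                                 std::uint16_t elems_per_wi)
{
    const std::size_t elems_per_sg =
        static_cast<std::size_t>(sg_size) * elems_per_wi;
    return (gid / sg_size) * (elems_per_sg - sg_size) + gid;
}
```

Under this model, a work-group of 256 items with sub-group size 32 has only full sub-groups, while a work-group of 100 items would end with a sub-group of 4 items. The two base functions return the same value for any `gid` and any nonzero `sg_size`.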


  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@oleksandr-pavlyk oleksandr-pavlyk force-pushed the contribution-to-fix-gh-1887 branch from b8cf169 to 04fd35c Compare November 13, 2024 15:47
github-actions bot commented Nov 13, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_201 ran successfully.
Passed: 895
Failed: 0
Skipped: 119

Also reordered template parameters vec_sz, n_vecs for consistency
with the wider code base.
Collaborator

@ndgrigorian ndgrigorian left a comment

LGTM!

@oleksandr-pavlyk oleksandr-pavlyk merged commit 2ea5dfd into fix-gh-1887 Nov 13, 2024
16 of 24 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the contribution-to-fix-gh-1887 branch November 13, 2024 17:54
@github-actions

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_202 ran successfully.
Passed: 894
Failed: 1
Skipped: 119
