-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations #28693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
erifan
wants to merge
2
commits into
openjdk:master
Choose a base branch
from
erifan:JDK-8372980-umin-umax-intrinsic
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…tions The original implementation of UMIN/UMAX reductions in JDK-8346174 used incorrect identity values in the Java implementation and test code. Problem: -------- UMIN was using MAX_OR_INF (signed maximum value) as the identity: - Byte.MAX_VALUE (127) instead of max unsigned byte (255) - Short.MAX_VALUE (32767) instead of max unsigned short (65535) - Integer.MAX_VALUE instead of max unsigned int (-1) - Long.MAX_VALUE instead of max unsigned long (-1) UMAX was using MIN_OR_INF (signed minimum value) as the identity: - Byte.MIN_VALUE (-128) instead of 0 - Short.MIN_VALUE (-32768) instead of 0 - Integer.MIN_VALUE instead of 0 - Long.MIN_VALUE instead of 0 This caused incorrect result. For example: UMAX([42,42,...,42]) returned 128 instead of 42 Solution: --------- Use correct unsigned identity values: - UMIN: ($type$)-1 (maximum unsigned value) - UMAX: ($type$)0 (minimum unsigned value) Changes: -------- - X-Vector.java.template: Fixed identity values in reductionOperations - gen-template.sh: Fixed identity values for test code generation - templates/Unit-header.template: Updated copyright year to 2025 - Regenerated all Vector classes and test files Testing: -------- All types (byte/short/int/long) now return correct results in both interpreter mode (-Xint) and compiled mode.
…max reduction operations This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. Changes: -------- 1. C2 mid-end: - Added UMinReductionVNode and UMaxReductionVNode 2. AArch64 Backend: - Added uminp/umaxp/sve_uminv/sve_umaxv instructions - Updated match rules for all vector sizes and element types - Both NEON and SVE implementation are supported 3. Test: - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java - Added assembly tests in aarch64-asmtest.py for new instructions - Added a JTReg test file VectorUMinMaxReductionTest.java Different configurations were tested on aarch64 and x86 machines, and all tests passed. Test results of JMH benchmarks from the panama-vector project: -------- On a Nvidia Grace machine with 128-bit SVE: ``` Benchmark Unit Before Error After Error Uplift Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18 19.76 53.22 Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 Short128Vector.UMAXLanes ops/ms 316.65 410.30 14814.82 23.65 46.79 Short128Vector.UMAXMaskedLanes ops/ms 308.90 351.78 15155.26 31.03 49.06 Short64Vector.UMAXLanes ops/ms 190.38 245.09 8022.46 14.30 42.14 Short64Vector.UMAXMaskedLanes ops/ms 195.54 36.15 7930.28 11.88 40.56 ``` On a Nvidia Grace machine with 128-bit NEON: ``` Benchmark Unit Before Error After Error Uplift Byte128Vector.UMAXLanes ops/ms 414.69 42.52 25257.61 25.91 60.91 Byte128Vector.UMAXMaskedLanes ops/ms 552.00 56.61 23063.14 304.45 41.78 Byte128Vector.UMINLanes ops/ms 634.98 849.04 28444.37 180.80 44.80 Byte128Vector.UMINMaskedLanes ops/ms 612.88 735.18 26127.07 27.99 42.63 Byte64Vector.UMAXLanes ops/ms 291.53 32.19 13893.62 28.09 47.66 Byte64Vector.UMAXMaskedLanes ops/ms 363.34 48.17 13290.59 12.53 36.58 Byte64Vector.UMINLanes ops/ms 368.70 433.60 15416.90 15.80 41.81 Byte64Vector.UMINMaskedLanes ops/ms 350.46 371.05 14524.29 121.63 41.44 Int128Vector.UMAXLanes ops/ms 177.67 201.38 10182.82 20.21 57.31 Int128Vector.UMAXMaskedLanes ops/ms 155.25 187.88 9194.13 393.35 59.22 Int64Vector.UMAXLanes ops/ms 93.93 115.02 5106.79 4.54 54.37 Int64Vector.UMAXMaskedLanes ops/ms 87.01 88.50 4405.87 8.06 50.63 Long128Vector.UMAXLanes ops/ms 80.32 98.50 3229.80 40.53 40.21 Long128Vector.UMAXMaskedLanes ops/ms 77.65 103.25 3161.50 4.45 40.72 Long64Vector.UMAXLanes ops/ms 47.72 65.38 46.41 50.38 0.97 Long64Vector.UMAXMaskedLanes ops/ms 45.26 47.46 45.13 47.23 1.00 Short128Vector.UMAXLanes ops/ms 316.09 429.34 14748.07 14.78 46.66 Short128Vector.UMAXMaskedLanes ops/ms 307.70 342.54 14359.11 44.99 46.67 Short64Vector.UMAXLanes ops/ms 187.67 253.01 8180.63 178.65 43.59 Short64Vector.UMAXMaskedLanes ops/ms 191.10 33.51 7949.19 108.65 41.60 ```
|
👋 Welcome back erfang! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
core-libs
core-libs-dev@openjdk.org
hotspot-compiler
hotspot-compiler-dev@openjdk.org
rfr
Pull request is ready for review
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
Changes:
C2 mid-end:
AArch64 Backend:
Test:
Different configurations were tested on aarch64 and x86 machines, and all tests passed.
Test results of JMH benchmarks from the panama-vector project:
On a Nvidia Grace machine with 128-bit SVE:
On a Nvidia Grace machine with 128-bit NEON:
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693$ git checkout pull/28693Update a local copy of the PR:
$ git checkout pull/28693$ git pull https://git.openjdk.org/jdk.git pull/28693/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28693View PR using the GUI difftool:
$ git pr show -t 28693Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28693.diff
Using Webrev
Link to Webrev Comment