8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations #28693

erifan · 2025-12-08T03:29:03Z

This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.

Changes:

C2 mid-end:
- Added UMinReductionVNode and UMaxReductionVNode
AArch64 Backend:
- Added uminp/umaxp/sve_uminv/sve_umaxv instructions
- Updated match rules for all vector sizes and element types
- Both NEON and SVE implementations are supported
Test:
- Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
- Added assembly tests in aarch64-asmtest.py for new instructions
- Added a JTReg test file VectorUMinMaxReductionTest.java

Different configurations were tested on aarch64 and x86 machines, and all tests passed.
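
For reference, the semantics that the new reduction nodes are expected to implement can be sketched in scalar Java. This is only an illustration of an unsigned min/max reduction over int lanes, not code from the patch; the class and method names (`UnsignedReductionSketch`, `umaxReduce`, `uminReduce`) are hypothetical.

```java
// Hypothetical scalar reference for unsigned min/max lane reductions.
// Not taken from the patch; names are made up for illustration.
final class UnsignedReductionSketch {

    // Unsigned max over all lanes; 0 is the identity (smallest unsigned value).
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned min over all lanes; -1 (all bits set) is the identity
    // because it is the largest unsigned value.
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {3, -1, 7, Integer.MIN_VALUE};
        // -1 is 0xFFFFFFFF, the largest value when interpreted as unsigned.
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes)));
        System.out.println(Integer.toUnsignedString(uminReduce(lanes)));
    }
}
```

With the intrinsic in place, C2 would be expected to emit a single reduction instruction for the vector form of these loops instead of falling back to the Java implementation.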

Test results of JMH benchmarks from the panama-vector project:

On an NVIDIA Grace machine with 128-bit SVE:

Benchmark                       Unit    Before  Error   After           Error   Uplift
Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51        33.92   61.29
Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90        28.74   45.09
Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29        103.11  43.99
Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62        42.68   42.06
Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74        15.95   48.45
Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24        21.41   37.90
Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36        66.20   41.31
Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37        13.79   40.19
Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07         286.93  56.67
Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77        11.44   65.17
Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30         98.57   49.52
Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18         19.76   53.22
Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01         35.52   62.82
Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24         8.74    64.34
Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76           52.28   0.98
Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79           42.91   1.01
Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82        23.65   46.79
Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26        31.03   49.06
Short64Vector.UMAXLanes         ops/ms  190.38  245.09  8022.46         14.30   42.14
Short64Vector.UMAXMaskedLanes   ops/ms  195.54  36.15   7930.28         11.88   40.56

On an NVIDIA Grace machine with 128-bit NEON:

Benchmark                       Unit    Before  Error   After           Error   Uplift
Byte128Vector.UMAXLanes         ops/ms  414.69  42.52   25257.61        25.91   60.91
Byte128Vector.UMAXMaskedLanes   ops/ms  552.00  56.61   23063.14        304.45  41.78
Byte128Vector.UMINLanes         ops/ms  634.98  849.04  28444.37        180.80  44.80
Byte128Vector.UMINMaskedLanes   ops/ms  612.88  735.18  26127.07        27.99   42.63
Byte64Vector.UMAXLanes          ops/ms  291.53  32.19   13893.62        28.09   47.66
Byte64Vector.UMAXMaskedLanes    ops/ms  363.34  48.17   13290.59        12.53   36.58
Byte64Vector.UMINLanes          ops/ms  368.70  433.60  15416.90        15.80   41.81
Byte64Vector.UMINMaskedLanes    ops/ms  350.46  371.05  14524.29        121.63  41.44
Int128Vector.UMAXLanes          ops/ms  177.67  201.38  10182.82        20.21   57.31
Int128Vector.UMAXMaskedLanes    ops/ms  155.25  187.88  9194.13         393.35  59.22
Int64Vector.UMAXLanes           ops/ms  93.93   115.02  5106.79         4.54    54.37
Int64Vector.UMAXMaskedLanes     ops/ms  87.01   88.50   4405.87         8.06    50.63
Long128Vector.UMAXLanes         ops/ms  80.32   98.50   3229.80         40.53   40.21
Long128Vector.UMAXMaskedLanes   ops/ms  77.65   103.25  3161.50         4.45    40.72
Long64Vector.UMAXLanes          ops/ms  47.72   65.38   46.41           50.38   0.97
Long64Vector.UMAXMaskedLanes    ops/ms  45.26   47.46   45.13           47.23   1.00
Short128Vector.UMAXLanes        ops/ms  316.09  429.34  14748.07        14.78   46.66
Short128Vector.UMAXMaskedLanes  ops/ms  307.70  342.54  14359.11        44.99   46.67
Short64Vector.UMAXLanes         ops/ms  187.67  253.01  8180.63         178.65  43.59
Short64Vector.UMAXMaskedLanes   ops/ms  191.10  33.51   7949.19         108.65  41.60
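
The Uplift column in both tables appears to be the ratio After / Before (throughput in ops/ms, so higher is better). The following sketch recomputes two rows under that assumption; the class name `UpliftSketch` is hypothetical and the inputs are copied from the SVE table.

```java
// Hypothetical helper: recomputes the Uplift column as After / Before.
final class UpliftSketch {

    static double uplift(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Byte128Vector.UMAXLanes on SVE: 411.60 -> 25226.51 ops/ms
        System.out.printf("%.2f%n", uplift(411.60, 25226.51));
        // Long64Vector.UMAXLanes on SVE: 47.56 -> 46.76 ops/ms (no speedup)
        System.out.printf("%.2f%n", uplift(47.56, 46.76));
    }
}
```

A ratio near 1.00, as in the Long64Vector rows, indicates that throughput is unchanged for that shape.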

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693
$ git checkout pull/28693

Update a local copy of the PR:
$ git checkout pull/28693
$ git pull https://git.openjdk.org/jdk.git pull/28693/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28693

View PR using the GUI difftool:
$ git pr show -t 28693

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28693.diff

Using Webrev

Link to Webrev Comment

…tions

The original implementation of UMIN/UMAX reductions in JDK-8346174 used incorrect identity values in the Java implementation and test code.

Problem:
--------
UMIN was using MAX_OR_INF (signed maximum value) as the identity:
- Byte.MAX_VALUE (127) instead of max unsigned byte (255)
- Short.MAX_VALUE (32767) instead of max unsigned short (65535)
- Integer.MAX_VALUE instead of max unsigned int (-1)
- Long.MAX_VALUE instead of max unsigned long (-1)

UMAX was using MIN_OR_INF (signed minimum value) as the identity:
- Byte.MIN_VALUE (-128) instead of 0
- Short.MIN_VALUE (-32768) instead of 0
- Integer.MIN_VALUE instead of 0
- Long.MIN_VALUE instead of 0

This caused incorrect results. For example:
UMAX([42,42,...,42]) returned 128 instead of 42

Solution:
---------
Use correct unsigned identity values:
- UMIN: ($type$)-1 (maximum unsigned value)
- UMAX: ($type$)0 (minimum unsigned value)

Changes:
--------
- X-Vector.java.template: Fixed identity values in reductionOperations
- gen-template.sh: Fixed identity values for test code generation
- templates/Unit-header.template: Updated copyright year to 2025
- Regenerated all Vector classes and test files

Testing:
--------
All types (byte/short/int/long) now return correct results in both interpreter mode (-Xint) and compiled mode.
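
The effect of the wrong identity can be reproduced with a small scalar sketch. This is not the template code from the commit; `UnsignedIdentitySketch` and its methods are hypothetical, and the reduction is written as a plain loop seeded with an explicit identity so that the signed and unsigned choices can be compared side by side.

```java
// Hypothetical scalar model of a byte UMAX/UMIN reduction seeded with an identity.
final class UnsignedIdentitySketch {

    static byte umax(byte a, byte b) {
        return Byte.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static byte umin(byte a, byte b) {
        return Byte.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static byte umaxReduce(byte[] lanes, byte identity) {
        byte acc = identity;
        for (byte v : lanes) {
            acc = umax(acc, v);
        }
        return acc;
    }

    static byte uminReduce(byte[] lanes, byte identity) {
        byte acc = identity;
        for (byte v : lanes) {
            acc = umin(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] lanes = {42, 42, 42, 42};
        // Signed minimum as identity: -128 is 128 when read as unsigned, so it wins.
        System.out.println(Byte.toUnsignedInt(umaxReduce(lanes, Byte.MIN_VALUE)));
        // Unsigned identity 0 gives the expected lane value.
        System.out.println(Byte.toUnsignedInt(umaxReduce(lanes, (byte) 0)));

        byte[] big = {(byte) 200, (byte) 200};
        // Signed maximum 127 as identity is smaller than every lane here.
        System.out.println(Byte.toUnsignedInt(uminReduce(big, Byte.MAX_VALUE)));
        // Unsigned identity (byte) -1, i.e. 255, leaves the lanes in charge.
        System.out.println(Byte.toUnsignedInt(uminReduce(big, (byte) -1)));
    }
}
```

The first call reproduces the 128-instead-of-42 result described above; the UMIN case shows the mirror-image failure when every lane exceeds 127.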


bridgekeeper · 2025-12-08T03:30:48Z

👋 Welcome back erfang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-12-08T03:31:05Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-12-08T03:32:11Z

@erifan The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-12-08T03:35:49Z

Webrevs

00: Full (04216bb3)

erifan added 2 commits December 8, 2025 03:20

openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Dec 8, 2025

openjdk bot added the rfr Pull request is ready for review label Dec 8, 2025
