@Bhavana-Kilambi Bhavana-Kilambi commented Dec 5, 2025

This patch adds an SVE implementation of primitive array sorting (Arrays.sort()) on AArch64 systems that support SVE. On non-SVE machines, we fall back to the existing Java implementation.

For smaller arrays (length <= 64), we use insertion sort; for larger arrays we use an SVE-vectorized quicksort partitioner followed by an odd-even transposition cleanup pass.
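
The size-based dispatch described above can be sketched in scalar Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the partition step is a plain Lomuto partition rather than the SVE-vectorized partitioner, and odd-even transposition is shown as a standalone routine because the description does not spell out how the cleanup pass is wired in.

```java
import java.util.Arrays;
import java.util.Random;

/** Scalar sketch of a size-dispatched sort; NOT the SVE code from the patch. */
final class HybridSortSketch {
    // Threshold from the description: arrays of length <= 64 use insertion sort.
    static final int INSERTION_SORT_THRESHOLD = 64;

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    // Sorts a[lo..hi] (inclusive bounds).
    private static void quicksort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > INSERTION_SORT_THRESHOLD) {
            int p = partition(a, lo, hi);
            // Recurse into the smaller side, loop on the larger one.
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi);
                hi = p - 1;
            }
        }
        insertionSort(a, lo, hi);
    }

    // Lomuto partition around a median-of-three pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, mid, lo);
        if (a[hi] < a[lo]) swap(a, hi, lo);
        if (a[hi] < a[mid]) swap(a, hi, mid);
        swap(a, mid, hi); // move the median to the end as the pivot
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Odd-even transposition sort on a[lo..hi]: n alternating phases of
    // compare-exchange on neighbouring pairs sort n elements.
    static void oddEvenTransposition(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int phase = 0; phase < n; phase++) {
            for (int i = lo + (phase & 1); i < hi; i += 2) {
                if (a[i] > a[i + 1]) swap(a, i, i + 1);
            }
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(10_000, -1000, 1000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println("matches Arrays.sort: " + Arrays.equals(data, expected));
    }
}
```

Each compare-exchange phase in odd-even transposition touches disjoint pairs, which is presumably what makes it attractive for a vectorized cleanup step; the scalar loop here only shows the access pattern.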

The SVE path is enabled by default for the int type. For the float type, it is available through an experimental flag:

-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP
Without this flag, the default Java implementation is used for floats (the flag is disabled by default).
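
A small program along these lines could be used to exercise the float path with and without the flag. It is a sketch, not part of the patch: the class name FloatSortFlagCheck is hypothetical, and the check only reports whether the flag was passed on the command line, not whether the JVM actually selected the SVE path. On a JDK build without this patch, passing the flag is expected to be rejected at startup.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Run on a patched build, for example:
//   java -XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP FloatSortFlagCheck.java
final class FloatSortFlagCheck {
    // Flag name as given in the description.
    static final String FLAG = "-XX:+UseSVELibSimdSortForFP";

    // True if the flag appears among the JVM input arguments.
    static boolean flagRequested(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    // Checks ascending order using the same total order as Arrays.sort(float[]).
    static boolean isSorted(float[] a) {
        for (int i = 1; i < a.length; i++) {
            if (Float.compare(a[i - 1], a[i]) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("flag requested: " + flagRequested(jvmArgs));

        float[] data = new float[100_000];
        Random rnd = new Random(1);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextFloat();

        // The Java API is the same either way; only the implementation behind it differs.
        Arrays.sort(data);
        System.out.println("sorted: " + isSorted(data));
    }
}
```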

Float is gated due to observed regressions on some small/medium sizes. On larger arrays, the SVE float path shows up to 1.47x speedup on Neoverse V2 and 2.12x on Neoverse V1.

The following are the performance numbers for the ArraysSort JMH benchmark:

Case A: Ratio of the master branch score to the score with this patch and the UseSVELibSimdSortForFP flag disabled (which is the default).
Case B: Ratio of the master branch score to the score with this patch and the UseSVELibSimdSortForFP flag enabled (the int numbers will be the same, but this now enables SVE vectorized sorting for floats).
We want the ratios to be >= 1, meaning the patch is on par with or better than the default Java implementation (master branch).
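
As a worked illustration of how such a ratio reads: the benchmarks run in JMH avgt mode, where the score is average time per operation and lower is better, so dividing the master score by the patched score gives a value above 1 when the patch is faster. The sketch below assumes that definition (it matches the ">= 1 is at par or better" reading); the class name and the sample scores are hypothetical.

```java
final class SpeedupRatio {
    // Assumed definition: ratio = master score / patched score (avgt, lower is better).
    static double ratio(double masterAvgTime, double patchedAvgTime) {
        if (patchedAvgTime <= 0.0) {
            throw new IllegalArgumentException("patched score must be positive");
        }
        return masterAvgTime / patchedAvgTime;
    }

    // Classifies a ratio the same way the description does.
    static String verdict(double ratio) {
        return ratio >= 1.0 ? "at par or better" : "regression";
    }

    public static void main(String[] args) {
        // Hypothetical scores in microseconds per operation, for illustration only.
        double master = 200.0;
        double patched = 100.0;
        double r = ratio(master, patched);
        System.out.printf("ratio = %.2f (%s)%n", r, verdict(r));
    }
}
```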

On Neoverse V1:

Benchmark                       (size)   Mode    Cnt    A       B
ArraysSort.floatParallelSort    10       avgt    3      0.98    0.98
ArraysSort.floatParallelSort    25       avgt    3      1.01    0.83
ArraysSort.floatParallelSort    50       avgt    3      0.99    0.55
ArraysSort.floatParallelSort    75       avgt    3      0.99    0.66
ArraysSort.floatParallelSort    100      avgt    3      0.98    0.66
ArraysSort.floatParallelSort    1000     avgt    3      1.00    0.84
ArraysSort.floatParallelSort    10000    avgt    3      1.03    1.52
ArraysSort.floatParallelSort    100000   avgt    3      1.03    1.46
ArraysSort.floatParallelSort    1000000  avgt    3      0.98    1.81
ArraysSort.floatSort            10       avgt    3      1.00    0.98
ArraysSort.floatSort            25       avgt    3      1.00    0.81
ArraysSort.floatSort            50       avgt    3      0.99    0.56
ArraysSort.floatSort            75       avgt    3      0.99    0.65
ArraysSort.floatSort            100      avgt    3      0.98    0.70
ArraysSort.floatSort            1000     avgt    3      0.99    0.84
ArraysSort.floatSort            10000    avgt    3      0.99    1.72
ArraysSort.floatSort            100000   avgt    3      1.00    1.94
ArraysSort.floatSort            1000000  avgt    3      1.00    2.13
ArraysSort.intParallelSort      10       avgt    3      1.08    1.08
ArraysSort.intParallelSort      25       avgt    3      1.04    1.05
ArraysSort.intParallelSort      50       avgt    3      1.29    1.30
ArraysSort.intParallelSort      75       avgt    3      1.16    1.16
ArraysSort.intParallelSort      100      avgt    3      1.07    1.07
ArraysSort.intParallelSort      1000     avgt    3      1.13    1.13
ArraysSort.intParallelSort      10000    avgt    3      1.49    1.38
ArraysSort.intParallelSort      100000   avgt    3      1.64    1.62
ArraysSort.intParallelSort      1000000  avgt    3      2.26    2.27
ArraysSort.intSort              10       avgt    3      1.08    1.08
ArraysSort.intSort              25       avgt    3      1.02    1.02
ArraysSort.intSort              50       avgt    3      1.25    1.25
ArraysSort.intSort              75       avgt    3      1.16    1.20
ArraysSort.intSort              100      avgt    3      1.07    1.07
ArraysSort.intSort              1000     avgt    3      1.12    1.13
ArraysSort.intSort              10000    avgt    3      1.94    1.95
ArraysSort.intSort              100000   avgt    3      1.86    1.86
ArraysSort.intSort              1000000  avgt    3      2.09    2.09

On Neoverse V2:

Benchmark                       (size)   Mode    Cnt    A       B
ArraysSort.floatParallelSort    10       avgt    3      1.02    1.02
ArraysSort.floatParallelSort    25       avgt    3      0.97    0.71
ArraysSort.floatParallelSort    50       avgt    3      0.94    0.65
ArraysSort.floatParallelSort    75       avgt    3      0.96    0.82
ArraysSort.floatParallelSort    100      avgt    3      0.95    0.84
ArraysSort.floatParallelSort    1000     avgt    3      1.01    0.94
ArraysSort.floatParallelSort    10000    avgt    3      1.01    1.25
ArraysSort.floatParallelSort    100000   avgt    3      1.01    1.09
ArraysSort.floatParallelSort    1000000  avgt    3      1.00    1.10
ArraysSort.floatSort            10       avgt    3      1.02    1.00
ArraysSort.floatSort            25       avgt    3      0.99    0.76
ArraysSort.floatSort            50       avgt    3      0.97    0.66
ArraysSort.floatSort            75       avgt    3      1.01    0.83
ArraysSort.floatSort            100      avgt    3      1.00    0.85
ArraysSort.floatSort            1000     avgt    3      0.99    0.93
ArraysSort.floatSort            10000    avgt    3      1.00    1.28
ArraysSort.floatSort            100000   avgt    3      1.00    1.37
ArraysSort.floatSort            1000000  avgt    3      1.00    1.48
ArraysSort.intParallelSort      10       avgt    3      1.05    1.05
ArraysSort.intParallelSort      25       avgt    3      0.99    0.84
ArraysSort.intParallelSort      50       avgt    3      1.03    1.14
ArraysSort.intParallelSort      75       avgt    3      0.91    0.99
ArraysSort.intParallelSort      100      avgt    3      0.98    0.96
ArraysSort.intParallelSort      1000     avgt    3      1.32    1.30
ArraysSort.intParallelSort      10000    avgt    3      1.40    1.40
ArraysSort.intParallelSort      100000   avgt    3      1.00    1.04
ArraysSort.intParallelSort      1000000  avgt    3      1.15    1.14
ArraysSort.intSort              10       avgt    3      1.05    1.05
ArraysSort.intSort              25       avgt    3      1.03    1.03
ArraysSort.intSort              50       avgt    3      1.08    1.14
ArraysSort.intSort              75       avgt    3      0.88    0.98
ArraysSort.intSort              100      avgt    3      1.01    0.99
ArraysSort.intSort              1000     avgt    3      1.30    1.32
ArraysSort.intSort              10000    avgt    3      1.43    1.43
ArraysSort.intSort              100000   avgt    3      1.30    1.30
ArraysSort.intSort              1000000  avgt    3      1.37    1.37
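
To make the float regressions at small and medium sizes easier to see, the column B values for ArraysSort.floatSort can be scanned for the size from which the ratio stays at or above 1. The sketch below copies those values from the two tables above; the class and method names are hypothetical, and the result only reflects the sizes that were measured.

```java
final class CrossoverSketch {
    static final int[] SIZES = {10, 25, 50, 75, 100, 1000, 10000, 100000, 1000000};
    // Column B for ArraysSort.floatSort, copied from the tables above.
    static final double[] V1_FLOAT_SORT_B = {0.98, 0.81, 0.56, 0.65, 0.70, 0.84, 1.72, 1.94, 2.13};
    static final double[] V2_FLOAT_SORT_B = {1.00, 0.76, 0.66, 0.83, 0.85, 0.93, 1.28, 1.37, 1.48};

    // Smallest measured size such that it and every larger measured size
    // has ratio >= 1.0, or -1 if even the largest size regresses.
    static int crossoverSize(int[] sizes, double[] ratios) {
        int idx = -1;
        for (int i = ratios.length - 1; i >= 0 && ratios[i] >= 1.0; i--) {
            idx = i;
        }
        return idx < 0 ? -1 : sizes[idx];
    }

    public static void main(String[] args) {
        System.out.println("Neoverse V1: " + crossoverSize(SIZES, V1_FLOAT_SORT_B)); // prints 10000
        System.out.println("Neoverse V2: " + crossoverSize(SIZES, V2_FLOAT_SORT_B)); // prints 10000
    }
}
```

Because no sizes between 1000 and 10000 were measured, the actual break-even point could lie anywhere in that interval.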

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8371711: AArch64: SVE intrinsics for Arrays.sort methods (int, float) (Enhancement - P4)

Contributors

  • Yanqin Wei <yanqin.wei@arm.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28675/head:pull/28675
$ git checkout pull/28675

Update a local copy of the PR:
$ git checkout pull/28675
$ git pull https://git.openjdk.org/jdk.git pull/28675/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28675

View PR using the GUI difftool:
$ git pr show -t 28675

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28675.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Dec 5, 2025

👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 5, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Dec 5, 2025

openjdk bot commented Dec 5, 2025

@Bhavana-Kilambi The following labels will be automatically applied to this pull request:

  • build
  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@Bhavana-Kilambi

/contributor add Yanqin Wei yanqin.wei@arm.com

openjdk bot commented Dec 5, 2025

@Bhavana-Kilambi
Contributor Yanqin Wei <yanqin.wei@arm.com> successfully added.

Separated the libsimdsort implementations for aarch64 and x86 into two
different folders under src/java.base/linux/native/libsimdsort, which
may make future maintenance of the AArch64 and x86 implementations
easier.

New layout -

src/java.base/linux/native/libsimdsort/aarch64/…
src/java.base/linux/native/libsimdsort/x86/…

Moved the following files into the libsimdsort/x86 folder -

src/java.base/linux/native/libsimdsort/x86/avx2-32bit-qsort.hpp
src/java.base/linux/native/libsimdsort/x86/avx2-emu-funcs.hpp
src/java.base/linux/native/libsimdsort/x86/avx2-linux-qsort.cpp
src/java.base/linux/native/libsimdsort/x86/avx512-32bit-qsort.hpp
src/java.base/linux/native/libsimdsort/x86/avx512-64bit-qsort.hpp
src/java.base/linux/native/libsimdsort/x86/avx512-linux-qsort.cpp
src/java.base/linux/native/libsimdsort/x86/simdsort-support.hpp
src/java.base/linux/native/libsimdsort/x86/xss-common-includes.h
src/java.base/linux/native/libsimdsort/x86/xss-common-qsort.h
src/java.base/linux/native/libsimdsort/x86/xss-network-qsort.hpp
src/java.base/linux/native/libsimdsort/x86/xss-optimal-networks.hpp
src/java.base/linux/native/libsimdsort/x86/xss-pivot-selection.hpp

Copied the following files from libsimdsort/x86 to libsimdsort/aarch64
folder -

x86/xss-pivot-selection.hpp -> aarch64/pivot-selection.hpp
x86/simdsort-support.hpp -> aarch64/simdsort-support.hpp
x86/xss-common-qsort.h -> aarch64/sve-common-qsort.hpp
x86/avx2-linux-qsort.cpp -> aarch64/sve-linux-qsort.cpp
x86/avx2-32bit-qsort.hpp -> aarch64/sve-qsort.hpp
@Bhavana-Kilambi Bhavana-Kilambi marked this pull request as ready for review December 5, 2025 14:40
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 5, 2025

mlbridge bot commented Dec 5, 2025

Webrevs

Comment on lines +209 to +225
ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, aarch64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), true+true+true+gcc)
  $(eval $(call SetupJdkLibrary, BUILD_LIBSIMD_SORT, \
      NAME := simdsort, \
      TOOLCHAIN := TOOLCHAIN_LINK_CXX, \
      OPTIMIZATION := HIGH, \
      SRC := $(SIMDSORT_BASE_DIR)/aarch64, \
      CFLAGS := $(CFLAGS_JDKLIB) -march=armv8.2-a+sve, \
      CXXFLAGS := $(CXXFLAGS_JDKLIB) -march=armv8.2-a+sve -std=c++17, \
      LDFLAGS := $(LDFLAGS_JDKLIB) \
          $(call SET_SHARED_LIBRARY_ORIGIN), \
      LIBS := $(LIBCXX), \
      DISABLED_WARNINGS_gcc := unused-variable, \
      LIBS_linux := -lc -lm -ldl, \
  ))

  TARGETS += $(BUILD_LIBSIMD_SORT)
endif

This whole block should be combined with the existing block above, something like this:

ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64 aarch64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), true+true+true+gcc)
  ##############################################################################
  ## Build libsimdsort
  ##############################################################################

  $(eval $(call SetupJdkLibrary, BUILD_LIBSIMD_SORT, \
      NAME := simdsort, \
      LINK_TYPE := C++, \
      OPTIMIZATION := HIGH, \
      INCLUDES := $(OPENJDK_TARGET_CPU_ARCH), \
      CXXFLAGS := -std=c++17, \
      CXXFLAGS_linux_aarch64 := -march=armv8.2-a+sve, \
      DISABLED_WARNINGS_gcc := unused-variable, \
      LIBS_linux := $(LIBM), \
  ))

  TARGETS += $(BUILD_LIBSIMD_SORT)
endif

Unfortunately, we don't currently support OS-plus-CPU-specific CXXFLAGS_ variables (such as CXXFLAGS_linux_aarch64), only the CFLAGS_ equivalents. This can be fixed, and I think it should be, since we now have a need for it.

diff --git a/make/common/native/Flags.gmk b/make/common/native/Flags.gmk
index efb4c08e74c..2f3680af7c7 100644
--- a/make/common/native/Flags.gmk
+++ b/make/common/native/Flags.gmk
@@ -106,10 +106,12 @@ define SetupCompilerFlags
     $1_EXTRA_CFLAGS += -DSTATIC_BUILD=1
   endif
 
-  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS and/or TOOLCHAIN_TYPE
-  # dependent variables for CXXFLAGS.
+  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS, TOOLCHAIN_TYPE and
+  # OPENJDK_TARGET_OS plus OPENJDK_TARGET_CPU pair dependent variables for
+  # CXXFLAGS.
   $1_EXTRA_CXXFLAGS := $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)) \
-      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE))
+      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE)) \
+      $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))
 
   ifneq ($(DEBUG_LEVEL), release)
     # Pickup extra debug dependent variables for CXXFLAGS

The above at least compiles for me.
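
The effect of the extra lookup in the diff can be modeled roughly as follows. This is a loose Java model of the variable-suffix lookup, not build-system code: the class and method names are hypothetical, and the value "unix" for OPENJDK_TARGET_OS_TYPE on Linux is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExtraCxxFlagsSketch {
    // Mirrors the suffix order in the patched lookup:
    // OS type, OS, toolchain type, then the new OS_CPU pair.
    static List<String> extraCxxFlags(Map<String, String> vars,
                                      String osType, String os,
                                      String toolchain, String cpu) {
        List<String> flags = new ArrayList<>();
        for (String suffix : List.of(osType, os, toolchain, os + "_" + cpu)) {
            String value = vars.get("CXXFLAGS_" + suffix);
            if (value != null && !value.isBlank()) {
                flags.add(value);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Only the OS_CPU-specific variable from the suggested block is defined.
        Map<String, String> vars = Map.of("CXXFLAGS_linux_aarch64", "-march=armv8.2-a+sve");
        System.out.println(extraCxxFlags(vars, "unix", "linux", "gcc", "aarch64"));
        System.out.println(extraCxxFlags(vars, "unix", "linux", "gcc", "x86_64"));
    }
}
```

Under this model the SVE -march flag is picked up only for the linux/aarch64 pair, which is what allows a single shared SetupJdkLibrary block for both CPUs.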
