Commit a14a3a0
8371711: AArch64: SVE intrinsics for Arrays.sort methods (int, float)
This patch adds an SVE implementation of primitive array sorting
(Arrays.sort()) on AArch64 systems that support SVE. On non-SVE
machines, we fall back to the existing Java implementation.
For smaller arrays (length <= 64), we use insertion sort; for larger
arrays we use an SVE-vectorized quicksort partitioner followed by an
odd-even transposition cleanup pass.
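The size-based dispatch described above can be sketched in plain Java.
This is a scalar stand-in only, not the SVE code from the patch: the
class and method names are hypothetical, the Lomuto partition with a
middle pivot is an assumed substitute for the vectorized partitioner,
and running the odd-even transposition pass once at the end is an
assumption about where the cleanup happens. Only the cutoff of 64 comes
from the description.

```java
final class HybridSortSketch {
    // Cutoff from the commit message: length <= 64 uses insertion sort.
    static final int SMALL_THRESHOLD = 64;

    static void sort(int[] a) {
        quickSort(a, 0, a.length);
        oddEvenCleanup(a); // assumed placement of the cleanup pass
    }

    // Sorts a[lo, hi). Recurses into the smaller side and loops on the
    // larger one so the recursion depth stays logarithmic.
    static void quickSort(int[] a, int lo, int hi) {
        while (hi - lo > SMALL_THRESHOLD) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p - 1) {
                quickSort(a, lo, p);
                lo = p + 1;
            } else {
                quickSort(a, p + 1, hi);
                hi = p;
            }
        }
        insertionSort(a, lo, hi);
    }

    // Plain insertion sort on a[lo, hi).
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Scalar Lomuto partition of a[lo, hi) around the middle element.
    // Returns the final index of the pivot.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi - 1);
        int pivot = a[hi - 1];
        int store = lo;
        for (int i = lo; i < hi - 1; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, hi - 1);
        return store;
    }

    // Odd-even transposition: alternate compare-and-swap over even and
    // odd neighbour pairs until a full round makes no swap.
    static void oddEvenCleanup(int[] a) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int start = 0; start <= 1; start++) {
                for (int i = start; i + 1 < a.length; i += 2) {
                    if (a[i] > a[i + 1]) {
                        swap(a, i, i + 1);
                        swapped = true;
                    }
                }
            }
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

On input that the quicksort phase has already ordered, the cleanup pass
in this sketch makes one round of comparisons and no swaps.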
The SVE path is enabled by default for int. For float, it is available
through an experimental flag:
-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP
The flag is disabled by default, so floats use the default Java
implementation unless it is enabled.
Float is gated due to observed regressions on some small/medium sizes.
On larger arrays, the SVE float path shows up to 1.47x speedup on
Neoverse V2 and 2.12x on Neoverse V1.
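When running benchmarks it can help to confirm that the float flag was
actually passed to the JVM. A minimal sketch follows; the class and
method names are hypothetical, and it assumes the usual HotSpot
behaviour that a later -XX:+Name / -XX:-Name occurrence overrides an
earlier one. It only inspects the command-line arguments; it does not
tell whether the hardware supports SVE or whether the SVE path was
taken.

```java
final class SveFloatSortFlagCheck {
    static final String UNLOCK = "UnlockExperimentalVMOptions";
    static final String FLAG = "UseSVELibSimdSortForFP";

    // Last explicit -XX:+name / -XX:-name setting in the argument list,
    // or false if the option never appears.
    static boolean lastSetting(java.util.List<String> jvmArgs, String name) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + name)) {
                enabled = true;
            } else if (arg.equals("-XX:-" + name)) {
                enabled = false;
            }
        }
        return enabled;
    }

    // The float path needs both the unlock option and the flag itself.
    static boolean floatSvePathRequested(java.util.List<String> jvmArgs) {
        return lastSetting(jvmArgs, UNLOCK) && lastSetting(jvmArgs, FLAG);
    }

    // Same check against the arguments of the running JVM.
    static boolean requestedForThisJvm() {
        return floatSvePathRequested(
                java.lang.management.ManagementFactory
                        .getRuntimeMXBean().getInputArguments());
    }
}
```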
The performance numbers for the ArraysSort JMH benchmark are below.
Case A: ratio between the scores of the master branch and this patch
with the UseSVELibSimdSortForFP flag disabled (the default).
Case B: ratio between the scores of the master branch and this patch
with the UseSVELibSimdSortForFP flag enabled (the int numbers should be
the same, but this now enables SVE-vectorized sorting for floats).
Ratios >= 1 mean the patch is on par with or better than the default
Java implementation (master branch).
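How such a ratio could be computed is sketched below. The benchmark
mode is avgt (average time per operation, lower is better), so dividing
the master score by the patched score is an assumption that matches
"ratios >= 1 are at par or better". The class and method names are
hypothetical.

```java
final class SortRatioSketch {
    // Ratio of average times: master / patched. Values above 1.0 mean
    // the patched build took less time per operation.
    static double ratio(double masterAvgTime, double patchedAvgTime) {
        if (!(masterAvgTime > 0.0) || !(patchedAvgTime > 0.0)) {
            throw new IllegalArgumentException("scores must be positive");
        }
        return masterAvgTime / patchedAvgTime;
    }

    // True when the patched build is at par with or faster than master.
    static boolean atParOrBetter(double ratio) {
        return ratio >= 1.0;
    }
}
```

For example, with made-up timings of 200 ns/op on master and 100 ns/op
with the patch, ratio(200.0, 100.0) is 2.0.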
On Neoverse V1:
Benchmark                      (size)  Mode  Cnt     A     B
ArraysSort.floatParallelSort       10  avgt    3  0.98  0.98
ArraysSort.floatParallelSort       25  avgt    3  1.01  0.83
ArraysSort.floatParallelSort       50  avgt    3  0.99  0.55
ArraysSort.floatParallelSort       75  avgt    3  0.99  0.66
ArraysSort.floatParallelSort      100  avgt    3  0.98  0.66
ArraysSort.floatParallelSort     1000  avgt    3  1.00  0.84
ArraysSort.floatParallelSort    10000  avgt    3  1.03  1.52
ArraysSort.floatParallelSort   100000  avgt    3  1.03  1.46
ArraysSort.floatParallelSort  1000000  avgt    3  0.98  1.81
ArraysSort.floatSort               10  avgt    3  1.00  0.98
ArraysSort.floatSort               25  avgt    3  1.00  0.81
ArraysSort.floatSort               50  avgt    3  0.99  0.56
ArraysSort.floatSort               75  avgt    3  0.99  0.65
ArraysSort.floatSort              100  avgt    3  0.98  0.70
ArraysSort.floatSort             1000  avgt    3  0.99  0.84
ArraysSort.floatSort            10000  avgt    3  0.99  1.72
ArraysSort.floatSort           100000  avgt    3  1.00  1.94
ArraysSort.floatSort          1000000  avgt    3  1.00  2.13
ArraysSort.intParallelSort         10  avgt    3  1.08  1.08
ArraysSort.intParallelSort         25  avgt    3  1.04  1.05
ArraysSort.intParallelSort         50  avgt    3  1.29  1.30
ArraysSort.intParallelSort         75  avgt    3  1.16  1.16
ArraysSort.intParallelSort        100  avgt    3  1.07  1.07
ArraysSort.intParallelSort       1000  avgt    3  1.13  1.13
ArraysSort.intParallelSort      10000  avgt    3  1.49  1.38
ArraysSort.intParallelSort     100000  avgt    3  1.64  1.62
ArraysSort.intParallelSort    1000000  avgt    3  2.26  2.27
ArraysSort.intSort                 10  avgt    3  1.08  1.08
ArraysSort.intSort                 25  avgt    3  1.02  1.02
ArraysSort.intSort                 50  avgt    3  1.25  1.25
ArraysSort.intSort                 75  avgt    3  1.16  1.20
ArraysSort.intSort                100  avgt    3  1.07  1.07
ArraysSort.intSort               1000  avgt    3  1.12  1.13
ArraysSort.intSort              10000  avgt    3  1.94  1.95
ArraysSort.intSort             100000  avgt    3  1.86  1.86
ArraysSort.intSort            1000000  avgt    3  2.09  2.09
On Neoverse V2:
Benchmark                      (size)  Mode  Cnt     A     B
ArraysSort.floatParallelSort       10  avgt    3  1.02  1.02
ArraysSort.floatParallelSort       25  avgt    3  0.97  0.71
ArraysSort.floatParallelSort       50  avgt    3  0.94  0.65
ArraysSort.floatParallelSort       75  avgt    3  0.96  0.82
ArraysSort.floatParallelSort      100  avgt    3  0.95  0.84
ArraysSort.floatParallelSort     1000  avgt    3  1.01  0.94
ArraysSort.floatParallelSort    10000  avgt    3  1.01  1.25
ArraysSort.floatParallelSort   100000  avgt    3  1.01  1.09
ArraysSort.floatParallelSort  1000000  avgt    3  1.00  1.10
ArraysSort.floatSort               10  avgt    3  1.02  1.00
ArraysSort.floatSort               25  avgt    3  0.99  0.76
ArraysSort.floatSort               50  avgt    3  0.97  0.66
ArraysSort.floatSort               75  avgt    3  1.01  0.83
ArraysSort.floatSort              100  avgt    3  1.00  0.85
ArraysSort.floatSort             1000  avgt    3  0.99  0.93
ArraysSort.floatSort            10000  avgt    3  1.00  1.28
ArraysSort.floatSort           100000  avgt    3  1.00  1.37
ArraysSort.floatSort          1000000  avgt    3  1.00  1.48
ArraysSort.intParallelSort         10  avgt    3  1.05  1.05
ArraysSort.intParallelSort         25  avgt    3  0.99  0.84
ArraysSort.intParallelSort         50  avgt    3  1.03  1.14
ArraysSort.intParallelSort         75  avgt    3  0.91  0.99
ArraysSort.intParallelSort        100  avgt    3  0.98  0.96
ArraysSort.intParallelSort       1000  avgt    3  1.32  1.30
ArraysSort.intParallelSort      10000  avgt    3  1.40  1.40
ArraysSort.intParallelSort     100000  avgt    3  1.00  1.04
ArraysSort.intParallelSort    1000000  avgt    3  1.15  1.14
ArraysSort.intSort                 10  avgt    3  1.05  1.05
ArraysSort.intSort                 25  avgt    3  1.03  1.03
ArraysSort.intSort                 50  avgt    3  1.08  1.14
ArraysSort.intSort                 75  avgt    3  0.88  0.98
ArraysSort.intSort                100  avgt    3  1.01  0.99
ArraysSort.intSort               1000  avgt    3  1.30  1.32
ArraysSort.intSort              10000  avgt    3  1.43  1.43
ArraysSort.intSort             100000  avgt    3  1.30  1.30
ArraysSort.intSort            1000000  avgt    3  1.37  1.37
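The float gating can be read off the B column: the ratio stays below 1
for small and medium sizes and only recovers for large arrays. A small
sketch that finds this crossover is shown here, using the Neoverse V2
floatSort B values from the table above. The class and method names are
hypothetical, and "crossover" is defined here as the smallest measured
size from which every larger measured size has a ratio >= 1.

```java
final class CrossoverSketch {
    // Sizes measured by the ArraysSort benchmark in the table above.
    static final int[] SIZES =
            {10, 25, 50, 75, 100, 1000, 10000, 100000, 1000000};

    // Case B ratios for ArraysSort.floatSort on Neoverse V2 (from the table).
    static final double[] V2_FLOAT_SORT_B =
            {1.00, 0.76, 0.66, 0.83, 0.85, 0.93, 1.28, 1.37, 1.48};

    // Smallest size from which all remaining ratios are >= 1.0,
    // or -1 if even the largest size regresses.
    static int crossoverSize(int[] sizes, double[] ratios) {
        if (sizes.length != ratios.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int crossover = -1;
        for (int i = sizes.length - 1; i >= 0; i--) {
            if (ratios[i] < 1.0) {
                break; // first regression seen when walking down from the top
            }
            crossover = sizes[i];
        }
        return crossover;
    }
}
```

For the V2 floatSort data, crossoverSize(SIZES, V2_FLOAT_SORT_B) returns
10000. Since no sizes between 1000 and 10000 were measured, the real
crossover may lie anywhere in that range.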
This patch is part of a series of patches to add support for vectorized
array sorting for AArch64 (including fixing the regressions for
small/medium float arrays, support for double/long, etc.).
1 parent 2735140 commit a14a3a0
File tree
17 files changed, +1088 -0 lines changed
- src/java.base/linux/native/libsimdsort
  - aarch64
  - x86