@mkroening mkroening commented Dec 3, 2025

This PR replaces the kernel-internal mandatory `memory_barrier` with weaker memory barriers from the new mem-barrier crate.

This probably won't make a big difference, but it should still be an improvement.

I also noticed that we currently don't have any memory barriers at all for packed virtqueues, which should be fixed in the future. For split virtqueues, I think we also have too few at the moment.
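To illustrate what "weaker" means here, the following is a minimal sketch using only the standard library's `fence`. It is not the mem-barrier crate's API and not the kernel's virtqueue code: `Ring`, `publish`, `consume`, and `demo` are hypothetical names, and the "device" is modeled as a second thread rather than real hardware. The idea is that publishing a buffer only needs a write barrier between the payload write and the index write, and consuming it only needs a read barrier between the index read and the payload read, rather than a full barrier on both sides.

```rust
use std::sync::atomic::{fence, AtomicU16, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a one-slot split virtqueue: a payload slot
/// plus an index that tells the other side the slot is valid.
pub struct Ring {
    slot: AtomicU32,
    avail_idx: AtomicU16,
}

impl Ring {
    pub fn new() -> Self {
        Ring {
            slot: AtomicU32::new(0),
            avail_idx: AtomicU16::new(0),
        }
    }

    /// "Driver" side: write the payload first, then publish the index.
    pub fn publish(&self, value: u32) {
        self.slot.store(value, Ordering::Relaxed);
        // Write barrier: the payload store above must become visible
        // before the index store below. A full (SeqCst) fence would also
        // be correct here, but is stronger than this pattern requires.
        fence(Ordering::Release);
        self.avail_idx.store(1, Ordering::Relaxed);
    }

    /// "Device" side (modeled as a thread): wait for the index, then read
    /// the payload.
    pub fn consume(&self) -> u32 {
        while self.avail_idx.load(Ordering::Relaxed) == 0 {
            std::hint::spin_loop();
        }
        // Read barrier: the payload load below must not be reordered
        // before the index load above.
        fence(Ordering::Acquire);
        self.slot.load(Ordering::Relaxed)
    }
}

/// Runs one publish/consume round trip across two threads.
pub fn demo() -> u32 {
    let ring = Arc::new(Ring::new());
    let consumer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || ring.consume())
    };
    ring.publish(42);
    consumer.join().unwrap()
}
```

Calling `demo()` returns the published value. On x86-64, a `Release` or `Acquire` fence typically compiles to no instruction at all, while a `SeqCst` fence emits a full barrier, which is why a difference here is expected to be small but not zero. Barriers towards real devices can have additional requirements that thread-to-thread fences do not capture.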

@mkroening mkroening changed the title from "perf: use weaker memory barriers from the mem-barrier crate" to "perf(virtio): use weaker memory barriers from the mem-barrier crate" Dec 3, 2025
@mkroening mkroening marked this pull request as ready for review December 3, 2025 12:46
@mkroening mkroening requested review from Gelbpunkt and cagatay-y and removed request for cagatay-y December 3, 2025 12:46
@mkroening mkroening self-assigned this Dec 3, 2025
@mkroening mkroening requested a review from cagatay-y December 3, 2025 12:50

@github-actions github-actions bot left a comment


Benchmark Results

| Benchmark | Current: 4299cb6 | Previous: befcf26 | Performance Ratio |
|---|---|---|---|
| **startup_benchmark** Build Time | 116.74 s | 111.11 s | 1.05 |
| **startup_benchmark** File Size | 0.87 MB | 0.87 MB | 1.00 |
| Startup Time - 1 core | 1.02 s (±0.02 s) | 0.98 s (±0.02 s) | 1.04 |
| Startup Time - 2 cores | 1.01 s (±0.03 s) | 0.99 s (±0.03 s) | 1.01 |
| Startup Time - 4 cores | 1.00 s (±0.03 s) | 0.99 s (±0.02 s) | 1.01 |
| **multithreaded_benchmark** Build Time | 116.63 s | 111.66 s | 1.04 |
| **multithreaded_benchmark** File Size | 0.97 MB | 0.97 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 93.17 % (±8.19 %) | 87.45 % (±8.10 %) | 1.07 |
| Multithreaded Pi Efficiency - 4 Threads | 44.74 % (±2.94 %) | 42.77 % (±3.00 %) | 1.05 |
| Multithreaded Pi Efficiency - 8 Threads | 25.66 % (±1.52 %) | 25.18 % (±1.60 %) | 1.02 |
| **micro_benchmarks** Build Time | 300.82 s | 296.75 s | 1.01 |
| **micro_benchmarks** File Size | 0.98 MB | 0.98 MB | 1.00 |
| Scheduling time - 1 thread | 180.88 ticks (±32.99 ticks) | 176.35 ticks (±34.79 ticks) | 1.03 |
| Scheduling time - 2 threads | 100.91 ticks (±19.37 ticks) | 104.10 ticks (±21.43 ticks) | 0.97 |
| Micro - Time for syscall (getpid) | 11.38 ticks (±4.59 ticks) | 11.69 ticks (±5.33 ticks) | 0.97 |
| Memcpy speed - (built_in) block size 4096 | 56883.92 MByte/s (±41662.81 MByte/s) | 61256.19 MByte/s (±45369.75 MByte/s) | 0.93 |
| Memcpy speed - (built_in) block size 1048576 | 14792.56 MByte/s (±12874.46 MByte/s) | 14154.64 MByte/s (±11768.17 MByte/s) | 1.05 |
| Memcpy speed - (built_in) block size 16777216 | 9290.99 MByte/s (±7457.02 MByte/s) | 9650.18 MByte/s (±7829.51 MByte/s) | 0.96 |
| Memset speed - (built_in) block size 4096 | 57150.54 MByte/s (±41824.50 MByte/s) | 62658.40 MByte/s (±45879.30 MByte/s) | 0.91 |
| Memset speed - (built_in) block size 1048576 | 15064.46 MByte/s (±12983.79 MByte/s) | 14567.64 MByte/s (±12031.17 MByte/s) | 1.03 |
| Memset speed - (built_in) block size 16777216 | 9511.80 MByte/s (±7586.08 MByte/s) | 9902.13 MByte/s (±7993.63 MByte/s) | 0.96 |
| Memcpy speed - (rust) block size 4096 | 52416.85 MByte/s (±38777.31 MByte/s) | 55637.12 MByte/s (±41593.40 MByte/s) | 0.94 |
| Memcpy speed - (rust) block size 1048576 | 13697.10 MByte/s (±11164.80 MByte/s) | 13921.96 MByte/s (±11517.33 MByte/s) | 0.98 |
| Memcpy speed - (rust) block size 16777216 | 9552.87 MByte/s (±7716.97 MByte/s) | 9776.63 MByte/s (±7947.05 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 53016.95 MByte/s (±39192.55 MByte/s) | 56255.41 MByte/s (±41950.49 MByte/s) | 0.94 |
| Memset speed - (rust) block size 1048576 | 14026.00 MByte/s (±11349.89 MByte/s) | 14238.58 MByte/s (±11680.58 MByte/s) | 0.99 |
| Memset speed - (rust) block size 16777216 | 9821.81 MByte/s (±7892.11 MByte/s) | 10072.00 MByte/s (±8153.23 MByte/s) | 0.98 |
| **alloc_benchmarks** Build Time | 296.15 s | 293.98 s | 1.01 |
| **alloc_benchmarks** File Size | 0.95 MB | 0.95 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 25753.01 Ticks (±1054.80 Ticks) | 25979.67 Ticks (±1072.71 Ticks) | 0.99 |
| Allocations - Average Allocation time (no fail) | 25753.01 Ticks (±1054.80 Ticks) | 25979.67 Ticks (±1072.71 Ticks) | 0.99 |
| Allocations - Average Deallocation time | 3076.12 Ticks (±1076.97 Ticks) | 3078.39 Ticks (±1348.13 Ticks) | 1.00 |
| **mutex_benchmark** Build Time | 298.48 s | 295.48 s | 1.01 |
| **mutex_benchmark** File Size | 0.98 MB | 0.98 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 36.56 ns (±4.36 ns) | 36.76 ns (±3.72 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 30.46 ns (±3.56 ns) | 29.88 ns (±3.19 ns) | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.
