Commit 4cb5ac2

Author: Peter Zijlstra <peterz@infradead.org>

futex: Optimize per-cpu reference counting
Shrikanth noted that the per-cpu reference counter was still some 10% slower
than the old immutable option (which removes the reference counting entirely).

Further optimize the per-cpu reference counter by:

 - switching from RCU to preempt;
 - using __this_cpu_*() since we now have preempt disabled;
 - switching from smp_load_acquire() to READ_ONCE().

This is all safe because disabling preemption inhibits the RCU grace period
exactly like rcu_read_lock().

Having preemption disabled allows using __this_cpu_*() provided the only
access to the variable is in task context -- which is the case here.

Furthermore, since we know changing fph->state to FR_ATOMIC demands a full
RCU grace period, we can rely on the implied smp_mb() from that to replace
the acquire barrier.

This is very similar to the percpu_down_read_internal() fast path.

The reason this is significant for PowerPC is that it uses the generic
this_cpu_*() implementation, which relies on local_irq_disable() (the x86
implementation relies on the operation being a single memop instruction to
be IRQ-safe). Switching to preempt_disable() and __this_cpu_*() avoids this
IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE barrier;
not having to use explicit barriers saves a bunch.

Combined, this reduces the performance gap by half, down to some 5%.

Fixes: 760e6f7 ("futex: Remove support for IMMUTABLE")
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251106092929.GR4067720@noisy.programming.kicks-ass.net
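For context on the PowerPC point above, the sketch below illustrates why the
generic this_cpu_*() operations are comparatively expensive: the asm-generic
fallback brackets each operation with an IRQ disable/enable pair to stay safe
against interrupt context, whereas __this_cpu_*() assumes the caller already
keeps the task on one CPU (here via the preempt guard). The macro names are
hypothetical simplifications, not the exact kernel definitions:

	/*
	 * Rough model of the generic (non-x86) per-cpu increment:
	 * every this_cpu_inc() pays for an IRQ off/on pair.
	 */
	#define generic_this_cpu_inc_sketch(pcp)		\
	do {							\
		unsigned long __flags;				\
		raw_local_irq_save(__flags);			\
		*raw_cpu_ptr(&(pcp)) += 1;			\
		raw_local_irq_restore(__flags);			\
	} while (0)

	/*
	 * __this_cpu_inc() skips the IRQ state swizzling; it is only valid
	 * when the caller already prevents migration and reentrancy, e.g.
	 * preemption disabled and the counter only touched from task
	 * context -- the situation in futex_ref_get()/futex_ref_put().
	 */
	#define generic___this_cpu_inc_sketch(pcp)		\
	do {							\
		*raw_cpu_ptr(&(pcp)) += 1;			\
	} while (0)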
1 parent 6146a0f

1 file changed, 6 insertions(+), 6 deletions(-)

kernel/futex/core.c

@@ -1680,10 +1680,10 @@ static bool futex_ref_get(struct futex_private_hash *fph)
 {
 	struct mm_struct *mm = fph->mm;
 
-	guard(rcu)();
+	guard(preempt)();
 
-	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
-		this_cpu_inc(*mm->futex_ref);
+	if (READ_ONCE(fph->state) == FR_PERCPU) {
+		__this_cpu_inc(*mm->futex_ref);
 		return true;
 	}
 
@@ -1694,10 +1694,10 @@ static bool futex_ref_put(struct futex_private_hash *fph)
 {
 	struct mm_struct *mm = fph->mm;
 
-	guard(rcu)();
+	guard(preempt)();
 
-	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
-		this_cpu_dec(*mm->futex_ref);
+	if (READ_ONCE(fph->state) == FR_PERCPU) {
+		__this_cpu_dec(*mm->futex_ref);
 		return false;
 	}
 
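Putting the pieces together, here is a minimal sketch of the resulting
get-side fast path. The function name is hypothetical and the FR_ATOMIC slow
path is elided; it only illustrates how the preempt guard stands in for
rcu_read_lock() and why a plain READ_ONCE() suffices:

	/*
	 * Illustrative sketch only; assumes the function is called from task
	 * context, so disabling preemption both pins the CPU for
	 * __this_cpu_inc() and blocks the RCU grace period that precedes the
	 * FR_PERCPU -> FR_ATOMIC transition (which is what lets READ_ONCE()
	 * replace smp_load_acquire()).
	 */
	static bool futex_ref_get_fastpath_sketch(struct futex_private_hash *fph)
	{
		struct mm_struct *mm = fph->mm;

		guard(preempt)();	/* acts like an RCU read-side critical section */

		if (READ_ONCE(fph->state) == FR_PERCPU) {
			__this_cpu_inc(*mm->futex_ref);	/* no IRQ disable, no barrier */
			return true;
		}

		/* A real implementation falls back to an atomic refcount here. */
		return false;
	}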