Skip to content

Conversation

@theol0403
Copy link
Contributor

Hey, it's me again :).

Following our conversation on matrix regarding realtime performance on linux, I was able to tune the performance of my system by using blocking timers that inherit the PREEMPT_RT priority of my PDI loop thread instead of smol's timers that delegate timing to a non-RT helper thread.

Before:
CleanShot 2025-08-04 at 12 26 09@2x

After:
CleanShot 2025-08-04 at 12 26 54@2x

That looks way better... but wait, why is there still a spike!?

Via debugging, I was able to determine that occasionally, let mut pdi_lock = self.pdi.write(); in subdevice_group/mod.rs would take 2-20ms acquire the lock, despite being a high priority thread. We've run into the problem of priority inversion, where my application-layer non-RT threads that want to write and read to the PDI (theoretically an instantaneous operation) get pre-empted while still holding the lock, causing the PDI loop thread to hang.

There are various approaches to solving priority inversion, including os-provided mutexes that upgrade/inherit the priority of any lower-priority thread currently holding the lock (eg futex), or RT-specific algorithms that do dependency analysis or simply yield on lock instead of spinning.

Some available mutexes that address this issue to various extents (with mixed results):

For example, patching ethercrab to use rtsc::pi::Mutex, we get the following!

CleanShot 2025-08-04 at 12 37 44@2x

However, I would like to reduce the number of patches I need to maintain in a fork.

This PR takes advantage of lock_api to expose a user-configurable PDI lock in ethercrab that defaults to the current spin lock. This way, various locking implementations (rt, non-rt, critical section, etc) can be easily swapped out.

@theol0403 theol0403 force-pushed the lock-api-pdi branch 2 times, most recently from 6bb9e61 to 313966b Compare August 4, 2025 19:56
@theol0403
Copy link
Contributor Author

theol0403 commented Aug 4, 2025

Might be able to avoid any breaking changes by having R be the last parameter in SubdeviceGroup, defaulted to crate::DefaultLock, but it feels off as S and DC are marker types whereas currently the first three are proper parameters.

On the other hand, it is more common to specify Op and HasDc if you want to store the groups somewhere, but less common to override the lock type, so maybe I should put R as the last parameter.

@jamwaffles jamwaffles changed the title feat: use lock_api trait for configurable PDI lock Use lock_api trait for configurable PDI lock Nov 16, 2025
@jamwaffles
Copy link
Collaborator

The analysis and investigation in your opening comment is fantastic! Nice work finding the root cause and... oof... that default locking behaviour is terrible.

Concerning the order of generics, I'd like to at least explore removing the S typestate as it's making the API a bit too restrictive IMO, so I don't think we need to worry about the generics order for now.

I've got a few changes I'd like to make, but other wise this is great stuff, thanks!

@jamwaffles jamwaffles enabled auto-merge (squash) November 23, 2025 15:56
@jamwaffles jamwaffles merged commit 75959b1 into ethercrab-rs:main Nov 23, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants