@cardmagic (Owner)

Summary

  • Include Mutex_m in both the Bayes and LSI classifiers
  • Wrap all public methods accessing shared state in synchronize blocks
  • Add marshal_dump/marshal_load for serialization (Mutex can't be marshaled)

Problem

Concurrent train() and classify() calls on a shared classifier can corrupt its internal state:

require 'classifier'

classifier = Classifier::Bayes.new('Spam', 'Ham')

threads = 10.times.map do
  Thread.new do
    100.times do
      classifier.train_spam("buy now cheap")
      classifier.classify("special offer")
    end
  end
end

threads.each(&:join)  # Internal counts may now be corrupted

Solution

Use the Mutex_m mixin (already a gem dependency). Its lock is not reentrant, so to avoid deadlock each public method takes the lock once and delegates to a private unlocked method:

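# Public method: acquires the lock once, then delegates.
def public_method(...)
  synchronize { private_unlocked_method(...) }
end

private

# Assumes the caller already holds the lock; other locked methods call
# this directly instead of re-entering synchronize.
def private_unlocked_method(...)
  # actual implementation
end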
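A minimal, self-contained sketch of this pattern, assuming a hypothetical WordCounter class (not part of the gem) and a core Mutex standing in for the Mutex_m mixin, whose synchronize behaves the same way for this purpose. It replays the workload from the Problem section and checks that no updates are lost:

```ruby
# Hypothetical toy class illustrating the pattern; a core Mutex stands in
# for the Mutex_m mixin used by the real classifiers.
class WordCounter
  def initialize
    @lock = Mutex.new
    @counts = Hash.new(0)
    @total = 0
  end

  # Public API: each method takes the lock exactly once.
  def train(text)
    synchronize { train_unlocked(text) }
  end

  # Trains on several texts under one lock acquisition. Calling the public
  # #train here would try to re-lock and raise ThreadError, so it uses the
  # unlocked variant instead.
  def train_all(texts)
    synchronize { texts.each { |text| train_unlocked(text) } }
  end

  def count(word)
    synchronize { @counts[word] }
  end

  def total
    synchronize { @total }
  end

  private

  def synchronize(&block)
    @lock.synchronize(&block)
  end

  # Caller must already hold @lock.
  def train_unlocked(text)
    text.split.each do |word|
      @counts[word] += 1
      @total += 1
    end
  end
end

counter = WordCounter.new
threads = 10.times.map do
  Thread.new { 100.times { counter.train("buy now cheap") } }
end
threads.each(&:join)

puts counter.count("buy") # prints 1000
puts counter.total        # prints 3000
```

Keeping the lock acquisition in one thin public layer means internal composition (like train_all) never has to reason about whether the lock is already held.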

Test plan

  • All existing tests pass with bundle exec rake test
  • All tests pass with NATIVE_VECTOR=true bundle exec rake test
  • Marshal serialization still works (test_serialize_safe)

Closes #66

Both classifiers now use Mutex_m for thread-safe operations. Concurrent
train() and classify() calls no longer corrupt internal state.

Key changes:
- Include Mutex_m in both Bayes and LSI classes
- Wrap all public methods accessing shared state in synchronize blocks
- Add unlocked private variants for internal calls to avoid deadlock
- Implement marshal_dump/marshal_load to handle serialization (Mutex
  cannot be marshaled)
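Mutex_m keeps a Mutex in an instance variable of the object it is mixed into, and Marshal refuses to dump a Mutex (it raises TypeError), so a plain Marshal.dump of a locked object fails. A minimal sketch of the workaround, assuming a hypothetical TinyModel class that keeps a core Mutex in @lock: marshal_dump returns only the data, and marshal_load rebuilds a fresh, unlocked mutex.

```ruby
# Hypothetical class; a core Mutex in @lock stands in for Mutex_m's lock.
class TinyModel
  attr_reader :counts

  def initialize
    @lock = Mutex.new
    @counts = { "spam" => 0, "ham" => 0 }
  end

  def train(category)
    @lock.synchronize { @counts[category] += 1 }
  end

  # Dump only the plain data; the mutex is deliberately left out.
  def marshal_dump
    [@counts]
  end

  # Marshal.load allocates the object without running initialize, so the
  # lock has to be recreated here.
  def marshal_load(data)
    @counts = data.first
    @lock = Mutex.new
  end
end

dump_error =
  begin
    Marshal.dump(Mutex.new)
    nil
  rescue TypeError => e
    e
  end
puts dump_error.class # prints TypeError

model = TinyModel.new
model.train("spam")
copy = Marshal.load(Marshal.dump(model))
copy.train("spam") # the restored copy has its own fresh lock
puts model.counts["spam"] # prints 1
puts copy.counts["spam"]  # prints 2
```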

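The pattern avoids deadlock because each public method acquires the lock
exactly once and then delegates to an unlocked private method; code that
already holds the lock calls those unlocked variants directly instead of
re-entering synchronize.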
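Why the unlocked variants are needed: Ruby's Mutex (which Mutex_m wraps) is not reentrant, so a thread that tries to lock it again while already holding it gets a ThreadError instead of proceeding. A minimal demonstration with a core Mutex:

```ruby
lock = Mutex.new

# Stands in for a locked public method calling another locked public method.
error =
  begin
    lock.synchronize do
      lock.synchronize { :unreachable }
    end
    nil
  rescue ThreadError => e
    e
  end

puts error.class  # prints ThreadError
puts lock.locked? # prints false (the outer synchronize released the lock)
```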

Closes #66

The rbs-inline tool generates 'include Mutex_m' in the type
signatures for the Bayes and LSI classes, but RBS validation fails
because it cannot find a type definition for Mutex_m.

Add vendored type stubs for Mutex_m declaring its mu_* methods and
their standard aliases (synchronize, lock, unlock, etc.) to satisfy
both RBS validation and Steep type checking.

Extract GSL and native matrix vector assignment into helper methods
to reduce build_index method length below the 25-line threshold.

Use anonymous block forwarding (&), available in Ruby 3.1+, for the
unlocked proxy methods, per style preferences.
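For reference, anonymous block forwarding (Ruby 3.1+) lets a method pass its block along without naming it. A small sketch on a hypothetical Catalog class, not the classifiers' actual proxy methods, comparing the named and anonymous forms:

```ruby
# Hypothetical class used only to compare the two forwarding styles.
class Catalog
  def initialize(*items)
    @items = items
  end

  # Pre-3.1 style: the block must be named to be forwarded.
  def each_named(&block)
    @items.each(&block)
  end

  # Ruby 3.1+ style: an anonymous & parameter forwards the block as-is.
  def each_anonymous(&)
    @items.each(&)
  end
end

seen = []
Catalog.new("spam", "ham").each_anonymous { |item| seen << item.upcase }
p seen # prints ["SPAM", "HAM"]
```

Both methods behave identically; the anonymous form simply drops a name that the method body never uses on its own.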

Development

Successfully merging this pull request may close these issues.

Add thread safety to classifiers