@cardmagic
Owner

Summary

Replace the rb-gsl dependency with a self-contained C extension that implements Vector, Matrix, and Jacobi SVD operations. Users no longer need to install external libraries such as libgsl, and in the benchmarks below the native extension is 2.6x to 48.7x faster overall than the pure Ruby fallback.

  • Add native C extension (~850 lines) with Vector, Matrix, and SVD implementations
  • Port existing Ruby Jacobi SVD algorithm to C for consistent results
  • Auto-detect backend: native extension → pure Ruby fallback
  • Remove all GSL-related code and dependencies
  • Update benchmarks and documentation
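
The backend auto-detection described above can be sketched roughly as follows. This is a minimal illustration, not the gem's actual code: the module name `BackendSketch`, the require path `classifier/classifier_ext`, and the exact meaning of the `NATIVE_VECTOR` override are assumptions made for the example.

```ruby
# Hypothetical sketch of backend auto-detection.
# The require path and module name are assumptions, not the gem's real identifiers.
module BackendSketch
  # Returns :native when the compiled extension loads, :ruby otherwise.
  def self.detect(native_path: 'classifier/classifier_ext', env: ENV)
    # Assumed override: the test plan runs the pure Ruby fallback
    # with NATIVE_VECTOR=true.
    return :ruby if env['NATIVE_VECTOR'] == 'true'

    require native_path
    :native
  rescue LoadError
    # Extension not compiled or not installed: fall back to pure Ruby.
    :ruby
  end
end
```

Rescuing `LoadError` keeps the gem usable on systems where the extension could not be compiled, at the cost of the slower pure Ruby path.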

Performance

Documents    build_index speedup    Overall speedup
---------------------------------------------------
5                             7x               2.6x
10                           25x               4.6x
15                          112x              14.5x
20                          385x              48.7x

Detailed benchmark (20 documents)
Operation            Pure Ruby     Native C      Speedup
----------------------------------------------------------
build_index            0.5540       0.0014       384.5x
classify               0.0190       0.0060         3.2x
search                 0.0145       0.0037         3.9x
find_related           0.0098       0.0011         8.6x
----------------------------------------------------------
TOTAL                  0.5973       0.0123        48.7x
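
The PR does not show how the timings were collected, so the following is only a sketch of one way to produce a comparable speedup figure with Ruby's standard `Benchmark` module. The helper names `best_time` and `speedup` are hypothetical.

```ruby
require 'benchmark'

# Run the block several times and return the best wall-clock time in seconds.
# (Hypothetical helper; the PR's real benchmark script may differ.)
def best_time(runs: 5)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Speedup of a fast implementation relative to a slow one.
def speedup(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

# Example with stand-in workloads instead of the real LSI operations.
slow = best_time { (1..20_000).map { |i| Math.sqrt(i) }.sum }
fast = best_time { (1..2_000).map { |i| Math.sqrt(i) }.sum }
puts format('%.1fx', speedup(slow, fast))
```

Recomputing a ratio from the rounded table values can differ slightly from the printed speedup, since the table's timings are shown to four decimal places only.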

Test Plan

  • All 113 tests pass with native C extension
  • All 91 tests pass with pure Ruby fallback (NATIVE_VECTOR=true)
  • Benchmark comparison shows expected speedups
  • No compiler warnings in C code
  • No Ruby warnings during test runs

Files Changed

New:

  • ext/classifier/ - C extension source files
  • test/linalg/native_ext_test.rb - Unit tests for native extension

Modified:

  • classifier.gemspec - Add extension configuration
  • lib/classifier/lsi.rb - Backend detection logic
  • README.md / CLAUDE.md - Documentation updates
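
For readers unfamiliar with native gems, "extension configuration" in a gemspec usually amounts to the `extensions` attribute. The sketch below is illustrative only: the metadata values are placeholders and the path `ext/classifier/extconf.rb` is an assumption based on the `ext/classifier/` directory listed above.

```ruby
require 'rubygems'

# Sketch only: metadata values are placeholders, and the extconf.rb
# path is assumed from the ext/classifier/ directory in this PR.
spec = Gem::Specification.new do |s|
  s.name       = 'classifier'
  s.version    = '0.0.0'          # placeholder
  s.summary    = 'placeholder'
  s.authors    = ['placeholder']
  s.files      = Dir['lib/**/*.rb', 'ext/**/*.{c,h,rb}']
  # Tells RubyGems to build the C extension at install time.
  s.extensions = ['ext/classifier/extconf.rb']
end
```

A matching `extconf.rb` would typically `require 'mkmf'` and call `create_makefile` to generate the Makefile that RubyGems runs during installation.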

Removed:

  • lib/classifier/extensions/vector_serialize.rb - GSL serialization (no longer needed)

Closes #87

Replace the rb-gsl dependency with a self-contained C extension that
implements Vector, Matrix, and Jacobi SVD operations. This eliminates
the need for users to install external libraries while providing
significant performance improvements.

The native extension is 2.6x to 48.7x faster overall than pure Ruby
in the benchmarks, with the SVD-heavy build_index operation up to
384.5x faster on the 20-document set. The implementation ports the
existing Ruby Jacobi SVD algorithm to C, so both backends use the
same algorithm and should produce consistent results.
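
For context on what a Jacobi SVD does, here is a minimal textbook sketch of the one-sided (Hestenes) Jacobi iteration in Ruby. It is not the gem's implementation, which may use a different variant. It returns singular values only, and the function name `jacobi_singular_values` is hypothetical.

```ruby
# One-sided (Hestenes) Jacobi iteration, singular values only.
# Generic illustration; not the algorithm as written in this gem.
# matrix: array of row arrays.
def jacobi_singular_values(matrix, tol: 1e-12, max_sweeps: 50)
  cols = matrix.transpose.map { |col| col.map(&:to_f) } # work on columns
  n = cols.size

  max_sweeps.times do
    rotated = false
    (0...n - 1).each do |p|
      (p + 1...n).each do |q|
        alpha = cols[p].sum { |x| x * x }
        beta  = cols[q].sum { |x| x * x }
        gamma = cols[p].zip(cols[q]).sum { |x, y| x * y }
        # Skip pairs of columns that are already (nearly) orthogonal.
        next if gamma.abs <= tol * Math.sqrt(alpha * beta)

        rotated = true
        zeta = (beta - alpha) / (2.0 * gamma)
        sign = zeta >= 0 ? 1.0 : -1.0
        t = sign / (zeta.abs + Math.sqrt(1.0 + zeta * zeta))
        c = 1.0 / Math.sqrt(1.0 + t * t)
        s = c * t
        # Rotate the two columns so that they become orthogonal.
        cols[p], cols[q] = cols[p].zip(cols[q])
                                  .map { |x, y| [c * x - s * y, s * x + c * y] }
                                  .transpose
      end
    end
    break unless rotated
  end

  # Singular values are the norms of the orthogonalized columns.
  cols.map { |col| Math.sqrt(col.sum { |x| x * x }) }.sort.reverse
end
```

Each sweep is a set of tight floating-point loops over column pairs, which is the kind of work where a C port would be expected to outperform interpreted Ruby.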

Key changes:
- Add ext/classifier/ with ~850 lines of C code
- Implement Classifier::Linalg::Vector and Matrix classes
- Port Jacobi SVD from Ruby to C
- Auto-detect backend: native extension > pure Ruby fallback
- Remove GSL-related code and dependencies
- Update benchmarks to compare native C vs pure Ruby

Closes #87

Linked issue: Explore minimal GSL C extension to replace rb-gsl dependency