Sparse emd implementation #781

nathanneike · 2025-12-01T16:12:53Z

Types of changes

Bug fix (non-breaking change which fixes an issue)
Improvement (non-breaking change which improves the codebase)

Motivation and context / Related issue

Modernizes the POT backend by migrating from the legacy scipy.sparse.coo_matrix class to the newer scipy.sparse.coo_array API. SciPy recommends the array-based sparse classes for new code, so this keeps the backend compatible as SciPy moves away from the matrix-based sparse classes.

How has this been tested (if it applies)

Updated NumPy backend coo_matrix() method to return coo_array instead of coo_matrix
Updated sparse_coo_data() to handle both coo_array and coo_matrix for backward compatibility
Added test case verifying coo_array element-wise multiplication functionality
All 2972 tests pass successfully
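
The behavioural difference that the element-wise multiplication test targets can be seen in a small sketch. It assumes a SciPy version that provides `coo_array` (1.8 or later); `coo_parts` is a hypothetical helper written for illustration and is not POT's `sparse_coo_data()`.

```python
import numpy as np
from scipy.sparse import coo_array, coo_matrix

# The same 2x2 matrix [[1, 2], [0, 3]] in both sparse classes.
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
vals = np.array([1.0, 2.0, 3.0])

A_arr = coo_array((vals, (rows, cols)), shape=(2, 2))
A_mat = coo_matrix((vals, (rows, cols)), shape=(2, 2))

# With coo_array, `*` is element-wise, as for NumPy arrays.
elementwise = (A_arr * A_arr).toarray()

# With coo_matrix, `*` is the matrix product.
matprod = (A_mat * A_mat).toarray()


def coo_parts(x):
    """Hypothetical helper: return (rows, cols, data, shape) for either class."""
    x = x.tocoo()
    return x.row, x.col, x.data, x.shape
```

Code that relied on `*` meaning a matrix product needs `@` after the migration, which is why a dedicated test for the element-wise case is useful.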

PR checklist

I have read the CONTRIBUTING document.
The documentation is up-to-date with the changes I made (check build artifacts).
All tests passed, and additional code has been covered with new tests.
I have added the PR and Issue fix to the RELEASES.md file.

- Implement sparse bipartite graph EMD solver in C++
- Add Python bindings for sparse solver (emd_wrap.pyx, _network_simplex.py)
- Add unit tests to verify sparse and dense solvers produce identical results
- Tests use augmented k-NN approach to ensure fair comparison
- Update setup.py to include sparse solver compilation

Both test_emd_sparse_vs_dense() and test_emd2_sparse_vs_dense() verify:

* Identical costs between sparse and dense solvers
* Marginal constraint satisfaction for both solvers

This PR implements a sparse bipartite graph EMD solver for memory-efficient optimal transport when the cost matrix has many infinite or forbidden edges.

Changes:

- Implement sparse bipartite graph EMD solver in C++
- Add Python bindings for sparse solver (emd_wrap.pyx, _network_simplex.py)
- Add unit tests to verify sparse and dense solvers produce identical results
- Tests use augmented k-NN approach to ensure fair comparison

Tests verify correctness:

* test_emd_sparse_vs_dense() - verifies identical costs and marginal constraints
* test_emd2_sparse_vs_dense() - verifies cost-only version

Status: WIP - seeking feedback on implementation approach

TODO: Add example script and documentation
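
The marginal-constraint and cost checks described above can be sketched on a plan stored as (row, column, mass) triplets. This is an illustrative pure-Python version with hypothetical names (`check_marginals`, `transport_cost`), not the test code from the PR; edges missing from the cost dictionary stand for forbidden pairs.

```python
import math


def check_marginals(triplets, a, b, tol=1e-9):
    """Return True if the plan's row sums match a and its column sums match b."""
    row_sums = [0.0] * len(a)
    col_sums = [0.0] * len(b)
    for i, j, mass in triplets:
        row_sums[i] += mass
        col_sums[j] += mass
    rows_ok = all(math.isclose(r, ai, abs_tol=tol) for r, ai in zip(row_sums, a))
    cols_ok = all(math.isclose(c, bj, abs_tol=tol) for c, bj in zip(col_sums, b))
    return rows_ok and cols_ok


def transport_cost(triplets, edge_cost):
    """Sum mass * cost over the plan; raises KeyError on a forbidden edge."""
    return sum(mass * edge_cost[(i, j)] for i, j, mass in triplets)


# Toy problem: edge (1, 0) is absent from the graph, i.e. forbidden.
a = [0.5, 0.5]
b = [0.25, 0.75]
edge_cost = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 1.0}
plan = [(0, 0, 0.25), (0, 1, 0.25), (1, 1, 0.5)]

feasible = check_marginals(plan, a, b)
cost = transport_cost(plan, edge_cost)
```

Only the stored edges are visited, so the memory footprint grows with the number of allowed edges rather than with the full size of the cost matrix.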


- Remove tuple format support for sparse matrices (use scipy.sparse only)
- Change index types from int64_t to uint64_t throughout (indices are never negative)
- Refactor emd() and emd2() with clear sparse/dense code path separation
- Add sparse_bipartitegraph.h to MANIFEST.in to fix build
- Add test_emd_sparse_backends() to verify backend compatibility


Refactor sparse optimal transport implementation to work across different backends (NumPy/scipy.sparse, PyTorch/torch.sparse).

Key changes:

- Add `sparse_coo_data()` method to backend layer for uniform sparse matrix handling across NumPy, PyTorch, JAX, and TensorFlow backends
- Update `emd()` and `emd2()` to return transport plans in backend-native sparse format (scipy.sparse for NumPy, torch.sparse for PyTorch)
- Refactor `plot2D_samples_mat()` to efficiently visualize both dense and sparse transport plans by detecting format and iterating only over non-zero entries for sparse matrices
- Update `plot_sparse_emd.py` example to use new plotting function
- Add comprehensive tests for sparse EMD across backends
- Update documentation to reflect backend-agnostic sparse support
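
The idea of iterating only over non-zero entries when drawing a sparse plan can be sketched without any plotting library. The function below is a hypothetical illustration, not the refactored `plot2D_samples_mat()`: it assumes the plan is given as (row, column, mass) triplets and returns the line segments a plotting routine would draw, with an opacity proportional to the transported mass.

```python
def plan_segments(xs, xt, triplets, thr=1e-8):
    """Return (source_point, target_point, alpha) for each significant entry.

    xs, xt   : sequences of 2D points (source and target samples)
    triplets : iterable of (i, j, mass) for the stored entries of the plan
    thr      : entries with mass / max_mass <= thr are skipped
    """
    entries = [(i, j, m) for i, j, m in triplets if m > 0]
    if not entries:
        return []
    max_mass = max(m for _, _, m in entries)
    return [
        (tuple(xs[i]), tuple(xt[j]), m / max_mass)
        for i, j, m in entries
        if m / max_mass > thr
    ]


xs = [(0.0, 0.0), (1.0, 0.0)]
xt = [(0.0, 1.0), (1.0, 1.0)]
triplets = [(0, 0, 0.5), (1, 1, 0.25), (0, 1, 0.0)]
segments = plan_segments(xs, xt, triplets)
```

The loop cost is proportional to the number of stored entries, whereas a dense double loop would touch every source-target pair.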


nathanneike and others added 17 commits October 28, 2025 16:01

Fix int64_t type compatibility for Linux, remove sparse and return ma…

022720b

…trix parameter from emd and fix linting issues

fix : Quick test file fix

fae9f02

Merge branch 'master' into sparse-emd-implementation

1e28771

Added Example for documentation and modified back setup file to origi…

b184cd4

…nal file

throw error when unsupported backend used for sparse

1a3dc41

Fix sparse tensor gradients and add backend checks

54479d5

- Preserve PyTorch sparse tensors through NumPy conversion for autograd
- Verify gradient w.r.t. M equals transport plan
- Add sparse backend compatibility checks and tests
- Throw error when unsupported backend used for sparse
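
Why the gradient with respect to M should equal the transport plan can be checked numerically on a toy problem: the optimal cost is the inner product of the plan with M, so when the optimal plan is unique a small change in one cost entry changes the total cost in proportion to the mass on that entry. The sketch below is a self-contained illustration with a brute-force 2x2 solver (`emd_2x2` is hypothetical and unrelated to the C++ solver in this PR).

```python
def emd_2x2(M):
    """Brute-force EMD for marginals a = b = (0.5, 0.5).

    The feasible plans form a segment whose two endpoints are listed below,
    so the minimum of the linear cost is attained at one of them.
    """
    candidates = (
        [[0.5, 0.0], [0.0, 0.5]],
        [[0.0, 0.5], [0.5, 0.0]],
    )

    def cost(G):
        return sum(G[i][j] * M[i][j] for i in range(2) for j in range(2))

    best = min(candidates, key=cost)
    return best, cost(best)


def finite_difference_gradient(M, eps=1e-6):
    """Approximate d(optimal cost) / dM entry by entry."""
    _, base = emd_2x2(M)
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            bumped = [row[:] for row in M]
            bumped[i][j] += eps
            grad[i][j] = (emd_2x2(bumped)[1] - base) / eps
    return grad


M = [[1.0, 3.0], [4.0, 1.0]]
plan, _ = emd_2x2(M)
grad = finite_difference_gradient(M)
```

For this cost matrix the diagonal plan is the unique optimum, and the finite-difference gradient matches it entry by entry up to floating-point error.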

Optimize sparse EMD with sklearn and code cleanup

4885368

- Use sklearn.NearestNeighbors in dist_knn() (1.4x faster)
- Remove redundant test code (~50 lines)
- Migrate coo_matrix → coo_array
- Fix parameter ordering consistency
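
A k-NN sparse cost graph of the kind dist_knn() produces can be sketched as follows. This is a dependency-free illustration that swaps the scikit-learn neighbour search for a brute-force scan; the function name `knn_edges`, the Euclidean metric, and the (row, column, distance) output format are assumptions, not the PR's implementation.

```python
import heapq
import math


def knn_edges(xs, xt, k):
    """For each source point, keep edges to its k nearest target points.

    Returns a list of (source_index, target_index, distance) triplets,
    i.e. the COO entries of a sparse cost matrix.
    """
    edges = []
    for i, p in enumerate(xs):
        nearest = heapq.nsmallest(
            k, ((math.dist(p, q), j) for j, q in enumerate(xt))
        )
        edges.extend((i, j, d) for d, j in nearest)
    return edges


xs = [(0.0, 0.0), (10.0, 0.0)]
xt = [(0.0, 1.0), (0.0, 3.0), (10.0, 2.0)]
edges = knn_edges(xs, xt, k=2)
```

The brute-force scan costs one distance evaluation per source-target pair; a tree-based neighbour search avoids that for large point sets, which is the kind of speed-up the commit message refers to.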

Simplified test comparing sparse to dense to be more approachable with…

a298a86

… proper backend adaptation

moved sparse matrix generation code to test file out of utils

f8ca89e

Added modifications to release file

1287e18

Added to contributors and citation and got rid of email in example

4e0477a

Apply suggestion from @rflamary

8389017

Replaced coo_matrix with coo_array for better compatibility and added tes…

2e133c4

…t to test coo_array functionality

github-actions bot added Tests ot.utils ot.lp ot.backend ot.plot Examples labels Dec 1, 2025

nathanneike closed this Dec 1, 2025
