-
Notifications
You must be signed in to change notification settings - Fork 349
perf(geospatial): optimize grid.intersect() with KD-tree + ConvexHull #2654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
perf(geospatial): optimize grid.intersect() with KD-tree + ConvexHull #2654
Conversation
914df20 to
0fffdeb
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #2654 +/- ##
===========================================
+ Coverage 55.5% 72.6% +17.1%
===========================================
Files 644 669 +25
Lines 124135 129738 +5603
===========================================
+ Hits 68947 94263 +25316
+ Misses 55188 35475 -19713
🚀 New features to boost your workflow:
|
5fe0dab to
78073f8
Compare
|
@wpbonelli I’m looking for some guidance on how to handle these tests. The issue arises when a point lies exactly on the vertices of grid cells, making it ambiguous which of the 3-n cells it should belong to:
In these non-deterministic cases, different methods return different cell indices. I can force the kdtree-based methods to return the expected value, but that comes with a performance cost. It then causes another test failure (assert DIS == DISV), since those methods also behave differently. I could also update the structured grid path to enforce consistent behavior so all tests pass. Before making those changes, I’d like your input on the preferred approach to resolve these cascading test failures. |
|
No performance hit with changing tiebreaking |
009e70a to
8fdface
Compare
@aacovski in the gridintersect module, in case of ties, we decided to always return the lowest grid cell number. But I think that any approach as long as it is clearly documented (and maybe consistent across runs) should be fine? |
|
Agree with @dbrakenhoff I think lowest number is the existing convention so might as well stick to it if there is no performance cost? |
|
Great, that's what I did. I'll consider adding more tests for coverage, but otherwise I think the PR is ready. Thanks |
does the latter mean the new grid index and related performance boost don't support DISU? |
As I understand it, “fully 3D unstructured” grids are more characteristic of FEM models, and I’m not aware of MODFLOW supporting that kind of discretization. That said, my code does support DISU in the sense that |
|
I am going to clean up docstrings later today as I noticed some old info. Sorry about that. I want to add some more tests too. |
17abb25 to
328fa37
Compare
…ructuredGrid Add KD-tree based spatial indexing with ConvexHull containment for fast point-to-cell queries on unstructured grids. Key features: - KD-tree spatial indexing using cell centroids and vertices - ConvexHull containment test for accurate point-in-cell detection - Full 3D support with layer-aware queries using z-coordinate - Consistent tie-breaking: lowest cell ID wins for boundary points - 1:1 point-to-cell mapping (same number of outputs as inputs) - Consistent return types: nan for outside, int/int64 for inside Changes: - Remove StructuredGrid support (uses its own optimized searchsorted) - StructuredGrid.intersect now returns (nan, nan, nan) for outside with z - query_point returns np.nan for outside (not None) - query_points returns int64 when all inside, float64 when nan present - Comprehensive test coverage for return type consistency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
328fa37 to
90209e1
Compare
|
@aacovski I guess I just didn't properly understand what you meant by "always layered" |
wpbonelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's maybe worth noting that the index assumes that x/y cell centers are in fact centroid coords? presumably the index requires this, rather than simple geometric mean, for polygons for which they aren't equivalent? maybe we should clarify in the docstrings on the grid classes as "cell centers" is ambiguous?
I hadn’t considered that the provided cell centers might not correspond to geometric centroids even for convex polygons. The index shouldn’t assume that, so I’ll update to clarify how cell centers are interpreted. Thanks for pointing it out! |
Update docstrings in GeospatialIndex, VertexGrid, and UnstructuredGrid to clarify that xcellcenters/ycellcenters are user-provided cell center coordinates, not necessarily true geometric centroids. For convex cells the difference is negligible, but this clarifies the semantic meaning. Also removes structured grid tests from test_geospatial_index.py since GeospatialIndex only supports vertex/unstructured grids (tests already exist in test_grid.py). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add reusable grid factory methods to test_grid_cases.py (vertex_minimal, unstructured_minimal, vertex_rectangular, etc.) - Add shared fixtures to conftest.py wrapping GridCases methods - Move unique GeospatialIndex tests to test_grid.py (test_index_build, test_index_repr, test_index_k_parameter, test_index_thin_sliver_cell) - Remove redundant tests already covered by existing intersect tests - Delete test_geospatial_index.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… xc/yc - Make is_3d a property that uses grid.grid_varies_by_layer directly - Fix _build_3d_index to map node IDs to layer-local cell IDs when xcellcenters/ycellcenters are per-layer (fewer values than nnodes) - Vectorize centroid building in _build_3d_index - Fix unstructured_layered() test grid to use grid_varies_by_layer=False (2D indexing with layer search) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Ah, ok- and the index doesn't even assume the center is inside the cell? It just performs better if so?
I think the intention was/is for it to be cell centroid, even though we don't check this when provided by the user or read from a binary grid file. Evidence:
Maybe something to clarify in the MF6IO guide. For DIS it doesn't matter but it becomes relevant for DISV/U. |
|
@aacovski it wasn't really clear from previous comment but I think it's best to revert the docstrings to say centroid rather than leaving it ambiguous. my first comment was really to point out the (unwanted?) ambiguity in the existing grid classes, not to say your new ones should change |
|
Another thought: why not move the structured grid searchsorted query implementation to Also just want to say, thanks for working on this. This seems like it will be a great asset both in the near term and longer term- in 4.x it could sit behind xarray-based user APIs, whether builtin |
|
I'll work on implementing that: moving intersect to base class and remove the subclass intersect methods. Regarding the docstrings: since there is a difference between centroid and centers, based on everything I've read, it makes more sense to use the more general case (centers) rather than centroids which could be inaccurate. |
|
I'm pretty sure "cell centers" is meant as centroids, that's why I'm suggesting to change the doctrings up in the grids themselves. But I could be wrong. Hopefully someone with more tenure can chime in. |
Summary
Optimize
grid.intersect()performance across all grid types using KD-tree spatial indexing with pre-computed ConvexHull equations and fully vectorized structured grid withsearchsorted. This is all on the assumption that grids will never be non-convex and always layered.Performance Improvements
geospatial_performance_results.csv
~50-300x speedup
Key Optimizations
_copy_cacheduring index build to avoid bottleneckChanges
flopy/utils/geospatial_index.pywith GeospatialIndex classvaries_by_layers=False- 2D kdtree, 2D convexhull and layer searchvaries_by_layers=True- 3D kdtree, 3D convexhullTest coverage
Regressions