Skip to content

Conversation

@aacovski
Copy link
Contributor

@aacovski aacovski commented Nov 23, 2025

Summary

Optimize grid.intersect() performance across all grid types using KD-tree spatial indexing with pre-computed ConvexHull equations and fully vectorized structured grid with searchsorted. This is all on the assumption that grids will never be non-convex and always layered.

Performance Improvements

geospatial_performance_results.csv
~50-300x speedup

Key Optimizations

  1. KD-tree + vertices indexing: O(log n) candidate cell search using centroids + vertices
  2. Pre-computed ConvexHull equations: Vectorized point-in-polygon testing via hyperplane equations
  3. Bounding box rejection: Fast elimination before expensive polygon tests
  4. Deepcopy fix: Disabled _copy_cache during index build to avoid bottleneck
  5. Vectorized result processing: Eliminated Python loops in grid.intersect() methods

Changes

  • Add flopy/utils/geospatial_index.py with GeospatialIndex class
  • Optimize VertexGrid.intersect() - 2D kdtree, 2D convexhull and layer search
  • Optimize UnstructuredGrid.intersect() varies_by_layers=False - 2D kdtree, 2D convexhull and layer search
  • Optimize UnstructuredGrid.intersect() varies_by_layers=True - 3D kdtree, 3D convexhull
  • Optimize StructuredGrid.intersect() with vectorized row/col finding - searchsorted
  • Add edge case tests for thin sliver cells, boundaries, corners
  • GeospatialIndex now raises ValueError if passed a StructuredGrid
  • All methods now return np.nan for outside points (previously StructuredGrid returned (None, nan, nan) with z, and GeospatialIndex.query_point returned None)
  • Array queries return int64 when all points inside, float64 when nan values present
  • Matches behavior of native VertexGrid.intersect() and UnstructuredGrid.intersect() on develop
  • 1:1 point-to-cell mapping
  • All methods guarantee same number of outputs as inputs
  • Outside points get nan instead of being dropped or raising errors (with forgive=True)

Test coverage

  • ~38 tests covering return type consistency, tie-breaking, 3D layered grids, and boundary handling

Regressions

  • grids with less than 10 cells or points perform worse due to overhead associated with new methods, but execute in less than 0.5 ms so it doesn't matter.

@aacovski aacovski marked this pull request as draft November 23, 2025 05:43
@aacovski aacovski force-pushed the feat/unified-geospatial-index branch from 914df20 to 0fffdeb Compare November 23, 2025 20:57
@codecov
Copy link

codecov bot commented Nov 23, 2025

Codecov Report

❌ Patch coverage is 68.70748% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.6%. Comparing base (556c088) to head (f1996f6).
⚠️ Report is 87 commits behind head on develop.

Files with missing lines Patch % Lines
flopy/utils/geospatial_index.py 65.8% 80 Missing ⚠️
flopy/discretization/structuredgrid.py 87.8% 4 Missing ⚠️
flopy/discretization/unstructuredgrid.py 66.6% 4 Missing ⚠️
flopy/discretization/vertexgrid.py 73.3% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2654      +/-   ##
===========================================
+ Coverage     55.5%    72.6%   +17.1%     
===========================================
  Files          644      669      +25     
  Lines       124135   129738    +5603     
===========================================
+ Hits         68947    94263   +25316     
+ Misses       55188    35475   -19713     
Files with missing lines Coverage Δ
flopy/discretization/structuredgrid.py 53.2% <87.8%> (+5.8%) ⬆️
flopy/discretization/unstructuredgrid.py 75.7% <66.6%> (-5.7%) ⬇️
flopy/discretization/vertexgrid.py 79.2% <73.3%> (-4.4%) ⬇️
flopy/utils/geospatial_index.py 65.8% <65.8%> (ø)

... and 557 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aacovski aacovski force-pushed the feat/unified-geospatial-index branch from 5fe0dab to 78073f8 Compare November 25, 2025 03:53
@aacovski
Copy link
Contributor Author

aacovski commented Nov 25, 2025

@wpbonelli I’m looking for some guidance on how to handle these tests. The issue arises when a point lies exactly on the vertices of grid cells, making it ambiguous which of the 3-n cells it should belong to:

FAILED test_grid.py::test_intersection - assert 268 == 267

In these non-deterministic cases, different methods return different cell indices. I can force the kdtree-based methods to return the expected value, but that comes with a performance cost. It then causes another test failure (assert DIS == DISV), since those methods also behave differently. I could also update the structured grid path to enforce consistent behavior so all tests pass.

Before making those changes, I’d like your input on the preferred approach to resolve these cascading test failures.

@aacovski
Copy link
Contributor Author

No performance hit with changing tiebreaking

@aacovski aacovski force-pushed the feat/unified-geospatial-index branch from 009e70a to 8fdface Compare November 26, 2025 08:11
@aacovski aacovski marked this pull request as ready for review November 26, 2025 08:29
@dbrakenhoff
Copy link
Contributor

@wpbonelli I’m looking for some guidance on how to handle these tests. The issue arises when a point lies exactly on the vertices of grid cells, making it ambiguous which of the 3-n cells it should belong to:

@aacovski in the gridintersect module, in case of ties, we decided to always return the lowest grid cell number. But I think that any approach as long as it is clearly documented (and maybe consistent across runs) should be fine?

@wpbonelli
Copy link
Member

wpbonelli commented Nov 26, 2025

Agree with @dbrakenhoff I think lowest number is the existing convention so might as well stick to it if there is no performance cost?

@aacovski
Copy link
Contributor Author

Great, that's what I did. I'll consider adding more tests for coverage, but otherwise I think the PR is ready. Thanks

@wpbonelli
Copy link
Member

across all grid types

grids will never be non-convex and always layered

does the latter mean the new grid index and related performance boost don't support DISU?

@aacovski
Copy link
Contributor Author

aacovski commented Nov 26, 2025

does the latter mean the new grid index and related performance boost don't support DISU?

As I understand it, “fully 3D unstructured” grids are more characteristic of FEM models, and I’m not aware of MODFLOW supporting that kind of discretization. That said, my code does support DISU in the sense that varies_by_layers can be set to either True or False... I'm not aware of any FDM or FEM software supporting non-convex elemnts/grids. Please let me know if I'm missing anything.

@aacovski aacovski marked this pull request as draft November 26, 2025 16:06
@aacovski
Copy link
Contributor Author

aacovski commented Nov 26, 2025

I am going to clean up docstrings later today as I noticed some old info. Sorry about that. I want to add some more tests too.

@aacovski aacovski marked this pull request as ready for review November 28, 2025 03:14
@aacovski aacovski force-pushed the feat/unified-geospatial-index branch from 17abb25 to 328fa37 Compare November 28, 2025 03:42
…ructuredGrid

Add KD-tree based spatial indexing with ConvexHull containment for fast
point-to-cell queries on unstructured grids.

Key features:
- KD-tree spatial indexing using cell centroids and vertices
- ConvexHull containment test for accurate point-in-cell detection
- Full 3D support with layer-aware queries using z-coordinate
- Consistent tie-breaking: lowest cell ID wins for boundary points
- 1:1 point-to-cell mapping (same number of outputs as inputs)
- Consistent return types: nan for outside, int/int64 for inside

Changes:
- Remove StructuredGrid support (uses its own optimized searchsorted)
- StructuredGrid.intersect now returns (nan, nan, nan) for outside with z
- query_point returns np.nan for outside (not None)
- query_points returns int64 when all inside, float64 when nan present
- Comprehensive test coverage for return type consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@aacovski aacovski force-pushed the feat/unified-geospatial-index branch from 328fa37 to 90209e1 Compare November 28, 2025 03:44
@wpbonelli
Copy link
Member

@aacovski I guess I just didn't properly understand what you meant by "always layered"

Copy link
Member

@wpbonelli wpbonelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's maybe worth noting that the index assumes that x/y cell centers are in fact centroid coords? presumably the index requires this, rather than simple geometric mean, for polygons for which they aren't equivalent? maybe we should clarify in the docstrings on the grid classes as "cell centers" is ambiguous?

@aacovski
Copy link
Contributor Author

aacovski commented Nov 29, 2025

it's maybe worth noting that the index assumes that x/y cell centers are in fact centroid coords?

I hadn’t considered that the provided cell centers might not correspond to geometric centroids even for convex polygons. The index shouldn’t assume that, so I’ll update to clarify how cell centers are interpreted. Thanks for pointing it out!

adacovsk and others added 2 commits November 29, 2025 09:42
Update docstrings in GeospatialIndex, VertexGrid, and UnstructuredGrid
to clarify that xcellcenters/ycellcenters are user-provided cell center
coordinates, not necessarily true geometric centroids. For convex cells
the difference is negligible, but this clarifies the semantic meaning.

Also removes structured grid tests from test_geospatial_index.py since
GeospatialIndex only supports vertex/unstructured grids (tests already
exist in test_grid.py).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add reusable grid factory methods to test_grid_cases.py (vertex_minimal,
  unstructured_minimal, vertex_rectangular, etc.)
- Add shared fixtures to conftest.py wrapping GridCases methods
- Move unique GeospatialIndex tests to test_grid.py (test_index_build,
  test_index_repr, test_index_k_parameter, test_index_thin_sliver_cell)
- Remove redundant tests already covered by existing intersect tests
- Delete test_geospatial_index.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@aacovski aacovski marked this pull request as draft November 30, 2025 22:17
… xc/yc

- Make is_3d a property that uses grid.grid_varies_by_layer directly
- Fix _build_3d_index to map node IDs to layer-local cell IDs when
  xcellcenters/ycellcenters are per-layer (fewer values than nnodes)
- Vectorize centroid building in _build_3d_index
- Fix unstructured_layered() test grid to use grid_varies_by_layer=False
  (2D indexing with layer search)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@aacovski aacovski marked this pull request as ready for review December 1, 2025 00:08
@wpbonelli
Copy link
Member

The index shouldn’t assume that [the provided cell centers correspond to geometric centroids], so I’ll update to clarify how cell centers are interpreted.

Ah, ok- and the index doesn't even assume the center is inside the cell? It just performs better if so?

I hadn’t considered that the provided cell centers might not correspond to geometric centroids even for convex polygons.

I think the intention was/is for it to be cell centroid, even though we don't check this when provided by the user or read from a binary grid file. Evidence:

Maybe something to clarify in the MF6IO guide. For DIS it doesn't matter but it becomes relevant for DISV/U.

@wpbonelli
Copy link
Member

@aacovski it wasn't really clear from previous comment but I think it's best to revert the docstrings to say centroid rather than leaving it ambiguous. my first comment was really to point out the (unwanted?) ambiguity in the existing grid classes, not to say your new ones should change

@wpbonelli
Copy link
Member

Another thought: why not move the structured grid searchsorted query implementation to GeospatialIndex? That way all the Grid.intersect methods could use the index. Since the index receives the grid on init, it knows what kind it is. This seems a bit neater, and also opens the way for cases where a structured grid query might still find the KD tree useful, e.g. ball queries.

Also just want to say, thanks for working on this. This seems like it will be a great asset both in the near term and longer term- in 4.x it could sit behind xarray-based user APIs, whether builtin .sel syntax (point queries) or, for the fancy stuff, maybe an accessor.

@aacovski
Copy link
Contributor Author

aacovski commented Dec 2, 2025

I'll work on implementing that: moving intersect to base class and remove the subclass intersect methods. Regarding the docstrings: since there is a difference between centroid and centers, based on everything I've read, it makes more sense to use the more general case (centers) rather than centroids which could be inaccurate.

@wpbonelli
Copy link
Member

I'm pretty sure "cell centers" is meant as centroids, that's why I'm suggesting to change the doctrings up in the grids themselves. But I could be wrong. Hopefully someone with more tenure can chime in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants