Performance of extract_points vs rasterio sample #81

@martinfleis

One of the questions in the recent Earthmover's webinar on xvec was about the performance of extract_points compared to rasterio's sample method. I had never tested this before, so I gave it a go: for a large lazy-loaded raster (a digital terrain model), our extract_points is way slower. See https://notebooksharing.space/view/4459f651d27b2f214f8590c30aba0782f7515d4c16b00dcd3283492f19f8e694#displayOptions=

The DTM is from https://geoportalpraha.cz/en/data-and-services/97d2c9c11aa9478cb21b469b8a4f820e in case you'd like to test the same, but I assume any raster should do the trick.

Under the hood, extract_points is simply passing the coordinates to .sel with method='nearest', which should be doing exactly the same as rasterio's sample.

xvec/xvec/accessor.py

Lines 1261 to 1263 in 66b541b

subset = self._obj.sel(
    {x_coords: x_, y_coords: y_}, method="nearest", tolerance=tolerance
)

This is clearly not optimal for a large lazy-loaded raster.
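
For reference, the pointwise selection quoted above can be reproduced on a toy raster. This is only a sketch of the technique, not the xvec implementation: the array `da`, the point coordinates, and the `"geometry"` dimension name are made up for illustration. Passing both indexers as DataArrays that share a dimension makes `.sel` pick one value per point instead of the outer product of the two coordinate lists.

```python
import numpy as np
import xarray as xr

# Toy 3x4 raster standing in for the DTM (hypothetical data).
# Coordinates are pixel centres; y is descending, as is typical for rasters.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    coords={"y": [2.5, 1.5, 0.5], "x": [0.5, 1.5, 2.5, 3.5]},
    dims=("y", "x"),
)

# Hypothetical point coordinates to sample.
px = np.array([0.4, 3.6, 1.9])
py = np.array([2.6, 0.4, 1.4])

# Both indexers share the "geometry" dimension (an assumed name),
# so xarray performs pointwise (vectorized) indexing.
x_ = xr.DataArray(px, dims="geometry")
y_ = xr.DataArray(py, dims="geometry")

subset = da.sel({"x": x_, "y": y_}, method="nearest")
print(subset.values)
```

The result has a single `geometry` dimension with one value per input point.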

We could possibly use sample via rioxarray if the raster is loaded via xarray, as it is available through dtm_da.rio._manager.acquire().sample(list(zip(x, y))), but that relies on a private API of rioxarray. I'll open an issue there if there's an appetite to expose sample on the rio accessor.

Outside of relying on rasterio, is there a way of speeding it up using some xarray magic?
