-
-
Notifications
You must be signed in to change notification settings - Fork 366
Add properties to get chunk and shard slices #3573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e3bee0b to
f5ea95a
Compare
| return await self.store_path.store.getsize_prefix(self.store_path.path) | ||
|
|
||
| @property | ||
| def chunk_slices(self) -> Generator[tuple[slice, ...]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking the name: i find "slice" to be kind of ambiguous between a verb and a noun, and it also locks us in to returning slice objects, so what if we use the word "region" instead?
and also I think it's helpful if the name of this routine makes it clear that it's an iterator. So what if we call it iter_chunk_regions, i.e. the name of the routine it wraps 😜
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re. name, I think "slices" is unambiguously a noun because it's plural? I looked at NumPy (https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding), and they use the terms "index", "selection tuple", or "slicing tuple". If we try and stay consistent with NumPy, how about "chunk_indices"?
I like "regions", but to me it's ambiguous whether that means the index, or the array data at that index. If we settle on it that could be fixed by documentation and consistent use though.
re. iterator, do you mean generator? A list/string etc. are also iterators, but using a yield makes this into more specifically a generator.
So perhaps generate_chunk_indices? I find that a bit clunky though,
for chunk_index in arr.chunk_indices:is much nicer than
for chunk_index in arr.generate_chunk_indices:(or iter_chunk_indices)
which is why I prefer simply chunk_indices. I don't knkow if there's prior art in other libraries for naming generators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generators are iterators, and my thinking was that putting "iter" in the name conveys that users should expect to iterate over the value returned by calling this method.
when I think of iterating over indices, I think of iterating over tuples of coordinates, e.g., (0, 0, 0), (0, 0, 1), .... which is what _iter_chunk_coords / _iter_shard_coords do right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "regions", but to me it's ambiguous whether that means the index, or the array data at that index. If we settle on it that could be fixed by documentation and consistent use though.
I worry that this ambiguity will hold for any name we pick :)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3573 +/- ##
==========================================
- Coverage 61.87% 61.83% -0.04%
==========================================
Files 85 85
Lines 10134 10146 +12
==========================================
+ Hits 6270 6274 +4
- Misses 3864 3872 +8
🚀 New features to boost your workflow:
|
Fixes #2454.