Why do we aim for 1MiB (uncompressed) chunks when auto-sharding? #3602

@ilan-gold

Zarr version

v3.1.4

Numcodecs version

0.16.5

Python Version

3.12

Operating System

Mac

Installation

uv run

Description

I think this setting:

if shard_shape is None:
    _shards_out: None | tuple[int, ...] = None
    if chunk_shape == "auto":
        _chunks_out = _guess_chunks(array_shape, item_size)
    else:
        _chunks_out = chunk_shape
else:
    if chunk_shape == "auto":
        # aim for a 1MiB chunk
        _chunks_out = _guess_chunks(array_shape, item_size, max_bytes=1024)
    else:
        _chunks_out = chunk_shape

of a 1MiB target is way too small. I don't really get why it should be different from the non-auto-sharding case.
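As a quick sanity check on the units, here is a rough sketch of how many items fit under each candidate limit. It assumes `max_bytes` is counted in bytes (as the parameter name suggests) and an 8-byte item size such as int64. The constants and the `max_items` helper are mine for illustration and are not part of zarr.

```python
# Hypothetical helper, not zarr API: upper bound on items per chunk
# for a given byte limit, assuming max_bytes is measured in bytes.
KIB = 1024
MIB = 1024 * 1024

ITEM_SIZE = 8  # assumption: 8-byte items, e.g. int64


def max_items(max_bytes: int, item_size: int = ITEM_SIZE) -> int:
    """Largest number of items whose total size stays within max_bytes."""
    return max_bytes // item_size


print(max_items(KIB))  # 128 items if the limit is 1024 bytes
print(max_items(MIB))  # 131072 items if the limit is 1 MiB
```

If that assumption holds, `max_bytes=1024` caps a chunk at 128 eight-byte items, which is three orders of magnitude smaller than what the "1MiB" comment describes.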

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
import numpy as np

with zarr.config.set({"array.target_shard_size_bytes": 10000000}):
    z = zarr.open("foo.zarr", mode="w")
    arr = z.create_array("arr", shards="auto", data=np.arange(100_000))
    print(arr.chunks) # tiny! 98! That's below a page size on most systems, even uncompressed!
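
To put the printed chunk shape in bytes, here is a small stdlib-only sketch. It assumes `np.arange(100_000)` yields 8-byte integers (int64), which should be the case on this platform, and it uses `mmap.PAGESIZE` as a stand-in for the system page size. The variable names are only for illustration.

```python
import math
import mmap

array_items = 100_000        # length of np.arange(100_000)
observed_chunk_items = 98    # chunk length printed by the script above
item_size = 8                # assumption: int64 items

# Uncompressed size of a single chunk, in bytes.
chunk_nbytes = observed_chunk_items * item_size
# Number of chunks needed to cover the whole array.
n_chunks = math.ceil(array_items / observed_chunk_items)

print(chunk_nbytes)                  # 784
print(n_chunks)                      # 1021
print(chunk_nbytes < mmap.PAGESIZE)  # whether a chunk is smaller than one page
```

So under these assumptions each chunk holds 784 bytes uncompressed, and the 100,000-element array is split into 1021 chunks.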

Additional output

platform: macOS-15.1-arm64-arm-64bit
python: 3.12.3
zarr: 3.1.6.dev2+g94d543ccf

**Required dependencies:**
packaging: 25.0
numpy: 2.3.5
numcodecs: 0.16.5
typing_extensions: 4.15.0
donfig: 0.8.1.post1

**Optional dependencies:**
numcodecs: 0.16.5

Labels

bug