Status: Open
Labels: Geti Tune Backend (Issues related to Geti Tune Studio backend)
Description
After #4995, finalize the implementation of the endpoints that list the items of a dataset revision. Dataset revisions are persisted as parquet files (as opposed to SQLite), so the records corresponding to the items must be found by querying a polars DataFrame.
This issue requires exploring the possible technical approaches to answer the following questions:
- Is it better to store dataset revisions as zip archives or uncompressed? (ref) On average, how much disk space does compression save, and how much latency does decompression add relative to the dataset size?
- Is it better to implement the query logic with polars, reading the parquet file directly, or to load the dataset through Datumaro and delegate the filtering to it?
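To put numbers on the storage question, one could measure the compression ratio and decompression time of real dataset revisions. The sketch below is a rough starting point under explicit assumptions: it uses the stdlib `zipfile` module with DEFLATE, compresses into memory, and times decompression into memory only, so it ignores the disk I/O of extracting files. `measure_zip_tradeoff` is a hypothetical helper name.

```python
import io
import os
import time
import zipfile


def measure_zip_tradeoff(dataset_dir: str) -> dict:
    """Zip a dataset directory in memory; report size savings and decompression time.

    Assumption: DEFLATE via zipfile stands in for whatever compression the
    project would actually adopt for dataset revisions.
    """
    raw_bytes = 0
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(dataset_dir):
            for name in files:
                path = os.path.join(root, name)
                raw_bytes += os.path.getsize(path)
                archive.write(path, arcname=os.path.relpath(path, dataset_dir))
    compressed_bytes = len(buffer.getvalue())

    # Decompress every member into memory (no files written) and time it.
    buffer.seek(0)
    start = time.perf_counter()
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            archive.read(info)
    decompress_seconds = time.perf_counter() - start

    return {
        "raw_bytes": raw_bytes,
        "compressed_bytes": compressed_bytes,
        # Fraction of disk space saved by storing the zip instead of raw files.
        "saving_ratio": 1 - compressed_bytes / raw_bytes if raw_bytes else 0.0,
        "decompress_seconds": decompress_seconds,
    }
```

Running it over revisions of different sizes gives savings and latency as a function of dataset size. Parquet files are usually already compressed internally (polars, for example, writes zstd-compressed parquet by default), as are image formats such as JPEG and PNG, so the extra saving from zipping them may be modest.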
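For the polars-versus-Datumaro question, both candidate implementations can be wrapped as zero-argument callables and timed with the same harness. This is a minimal, stdlib-only sketch; `benchmark_backends` is a hypothetical name, and the wrappers around the polars query and the Datumaro-based lookup are left to the caller because their exact form is not specified in this issue. The first call is reported separately, since loading the dataset (for example, importing it through Datumaro) may dominate the first request.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_backends(
    backends: Dict[str, Callable[[], object]],
    repeats: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Time each zero-argument query callable.

    Returns, per backend, the latency of the first call (which may include
    one-off loading costs) and the median latency of `repeats` later calls.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    results: Dict[str, Dict[str, float]] = {}
    for name, run_query in backends.items():
        # First call measured on its own: it may include loading or caching costs.
        start = time.perf_counter()
        run_query()
        first_call_s = time.perf_counter() - start

        # Subsequent calls approximate steady-state latency.
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query()
            timings.append(time.perf_counter() - start)
        results[name] = {
            "first_call_s": first_call_s,
            "median_s": statistics.median(timings),
        }
    return results
```

A call would look like `benchmark_backends({"polars": query_with_polars, "datumaro": query_with_datumaro})`, where both callables are hypothetical wrappers issuing the same listing query against the same revision.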