
Implement endpoints to list items from dataset revisions #5052

@leoll2

After #4995, finalize the implementation of the endpoints that list the items of a dataset revision. Note that dataset revisions are persisted as Parquet files (as opposed to SQLite), so the records corresponding to the items have to be retrieved by querying a Polars dataframe.

This issue requires some exploration of the technical paths, to answer the following questions:

  • Is it better to store the dataset revisions zipped or uncompressed? (ref) How much disk space do we save on average by storing the datasets compressed? And how much latency does decompression add, relative to the dataset size?
  • Is it better to implement our query logic with Polars, loading the Parquet file directly, or should we load the dataset through Datumaro and delegate the filtering to it?
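The first question could be explored with a small measurement script along these lines. This is only a sketch: `measure_zip_tradeoff` is a hypothetical helper, it handles a single file, and the demo input is synthetic, highly repetitive text rather than a real revision. Parquet files are often already compressed internally, so the savings on real data may be much smaller than on this demo.

```python
import os
import tempfile
import time
import zipfile


def measure_zip_tradeoff(src_path):
    """Return (original_bytes, zipped_bytes, unzip_seconds) for one file."""
    work_dir = tempfile.mkdtemp()
    zip_path = os.path.join(work_dir, "revision.zip")

    # Compress the file with DEFLATE, the common zip default.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src_path, arcname=os.path.basename(src_path))

    original_bytes = os.path.getsize(src_path)
    zipped_bytes = os.path.getsize(zip_path)

    # Time a full extraction, i.e. the overhead paid before querying.
    start = time.perf_counter()
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(os.path.join(work_dir, "out"))
    unzip_seconds = time.perf_counter() - start

    return original_bytes, zipped_bytes, unzip_seconds


# Synthetic stand-in file; replace with real revision files of varying size.
demo_file = os.path.join(tempfile.mkdtemp(), "revision.bin")
with open(demo_file, "wb") as fh:
    fh.write(b"item,label\n" * 10000)

original_bytes, zipped_bytes, unzip_seconds = measure_zip_tradeoff(demo_file)
print(f"saved {1 - zipped_bytes / original_bytes:.1%}, unzip took {unzip_seconds:.4f}s")
```

Running this over revisions of different sizes would give the disk-space and latency figures the question asks for.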

Labels: Geti Tune Backend (Issues related to Geti Tune Studio backend)
