Objective
Set up an automated workflow that evaluates the performance of OTX models on a curated set of test datasets, to facilitate comparison between different models or different implementations of the training pipeline.
Motivation
OTX is a framework for training vision models. To keep pace with recent developments in deep learning, engineers are constantly required to implement new features and therefore introduce changes to the training pipeline.
At the same time, it is crucial that such changes do not introduce bugs or regressions in already supported features and architectures: all supported models must keep working and achieve the expected accuracy, training time, and inference speed. To this end, models must be tested and benchmarked periodically. Today, however, model evaluation is carried out manually by the engineers themselves, which is not only time-consuming but also error-prone, opaque, and hard to reproduce. We would like to automate this step, so that engineers can focus fully on developing new features while ensuring that model performance remains optimal.
Requirements
- The workflow must automatically pull the required input data (dataset, initial weights) from a publicly available archive.
- The workflow must be compatible with all models and allow developers to select which models to test (a single model, a list of models, or all of them).
- The workflow must run one or more full experiments with a given model architecture, covering different scenarios.
  - Examples: training with default parameters, training with tiling, etc.
- The workflow must produce a report of the experiments, including the final model performance (e.g. accuracy) and the total training time.
- Recommended: the workflow should compare the results with the reports generated for the latest release and for the previous run on the develop branch.
- The workflow should be parameterized to run on different runners with different hardware setups, ideally covering all classes of supported accelerators (GPU, XPU, ...).
- The workflow may run on a daily/weekly basis or on demand; a sketch covering these points follows this list.
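As a rough illustration of the requirements above, here is a minimal sketch of what such a pipeline could look like as a GitHub Actions workflow. Everything in it is an assumption for discussion, not an existing implementation: the runner labels (`gpu`, `xpu`), the helper scripts (`pull_assets.py`, `run_benchmark.py`, `compare_reports.py`), and the assets URL are all hypothetical placeholders.

```yaml
name: Model performance benchmark

on:
  # On-demand trigger with a model filter (single model, comma-separated list, or "all")
  workflow_dispatch:
    inputs:
      models:
        description: "Model(s) to benchmark: 'all' or a comma-separated list"
        required: false
        default: "all"
  # Periodic run (weekly here; adjust the cron expression as needed)
  schedule:
    - cron: "0 0 * * 0"

jobs:
  benchmark:
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical self-hosted runner labels, one per accelerator class
        runner: [gpu, xpu]
    runs-on: [self-hosted, "${{ matrix.runner }}"]
    steps:
      - uses: actions/checkout@v4

      - name: Pull datasets and initial weights
        # Placeholder script and URL; inputs must come from a public archive
        run: python tests/perf/pull_assets.py --archive-url "$ASSETS_URL"
        env:
          ASSETS_URL: https://example.com/otx-benchmark-assets

      - name: Run experiments
        # Hypothetical entry point: runs each scenario (defaults, tiling, ...)
        # for the selected models and writes per-experiment metrics.
        # On scheduled runs the input is empty, so it falls back to 'all'.
        run: >
          python tests/perf/run_benchmark.py
          --models "${{ github.event.inputs.models || 'all' }}"
          --output report/

      - name: Compare with previous reports
        # Optional step: diff against the latest release and the last develop run
        run: python tests/perf/compare_reports.py --current report/ --baseline develop

      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-report-${{ matrix.runner }}
          path: report/
```

The `matrix.runner` axis addresses the multi-hardware requirement (each accelerator class runs the same job independently), while the combined `workflow_dispatch`/`schedule` triggers cover both on-demand and periodic execution. How models and scenarios are actually selected and compared is left to the hypothetical scripts and is open for discussion.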