Automated workflow for model performance evaluation #5046

@leoll2

Description

Objective

Set up an automated workflow to evaluate the performance of OTX models on a curated set of test datasets, to facilitate the comparison of different models or different implementations of the training pipeline.

Motivation

OTX is a framework for training vision models. To keep pace with recent developments in deep learning, engineers are constantly required to implement new features and therefore introduce changes to the training pipeline.
At the same time, it is crucial that such changes do not introduce bugs or regressions in already supported features and architectures: every supported model must keep working and achieve the expected results in terms of accuracy, training time, and inference speed. To this end, models must be periodically tested and benchmarked. However, model evaluation is currently carried out manually by the engineers themselves, which is not only time-consuming but also error-prone, opaque, and hard to reproduce. We would like to automate this step so that engineers can focus fully on developing new features while ensuring that model performance remains optimal.

Requirements

  • The workflow must automatically pull the required input data (datasets, initial weights) from a publicly available archive.
  • The workflow must be compatible with all models and allow developers to select a subset of models to test (a single model, all models, or an explicit list).
  • The workflow must run one or more full experiments with a specific model architecture, covering different scenarios
    • Examples: training with default parameters, training with tiling, etc.
  • The workflow must create a report of the experiments, including information about the final model performance (e.g. accuracy) and total training time.
    • Recommended: the workflow should compare the results with the reports generated from the latest release and from the previous run on develop (see the sketch after this list).
  • The workflow should be parametrized to run on different runners with different hardware setups, ideally covering all classes of supported accelerators (GPU, XPU, ...).
  • The workflow may run on a daily/weekly basis or on demand.
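To make the recommended comparison step more concrete, below is a minimal sketch of how a report produced by one run could be checked against the report from the latest release or from the previous run on develop. The JSON schema, file names, metric names, and tolerance values are all assumptions made for illustration; the actual report format would be defined by the workflow implementation.

```python
"""Minimal sketch of the report-comparison step (assumed report format)."""
import json
from pathlib import Path

# Hypothetical per-model metrics collected by a benchmark run, e.g.
# {"atss_mobilenetv2": {"accuracy": 0.842, "train_time_sec": 1260.0}}
Report = dict[str, dict[str, float]]

# Metrics where a *decrease* counts as a regression; all others regress on increase.
HIGHER_IS_BETTER = {"accuracy"}
TOLERANCE = 0.02  # assumed 2% relative tolerance before flagging a regression


def load_report(path: Path) -> Report:
    """Load a JSON report produced by a benchmark run."""
    return json.loads(path.read_text())


def find_regressions(current: Report, reference: Report) -> list[str]:
    """Return a human-readable message for every metric that regressed."""
    messages = []
    for model, metrics in current.items():
        baseline = reference.get(model)
        if baseline is None:
            continue  # newly added model, nothing to compare against
        for name, value in metrics.items():
            ref = baseline.get(name)
            if ref is None or ref == 0:
                continue
            delta = (value - ref) / abs(ref)
            regressed = (
                delta < -TOLERANCE if name in HIGHER_IS_BETTER else delta > TOLERANCE
            )
            if regressed:
                messages.append(
                    f"{model}: {name} changed from {ref:.4g} to {value:.4g} ({delta:+.1%})"
                )
    return messages


if __name__ == "__main__":
    # Hypothetical paths: report from this run vs. the last run on develop.
    current = load_report(Path("reports/current.json"))
    reference = load_report(Path("reports/develop-latest.json"))
    for msg in find_regressions(current, reference):
        print("REGRESSION:", msg)
```

Such a check could run as the final step of the workflow, annotating the generated report (or failing the run) whenever a metric falls outside the tolerance, so regressions are visible without manual inspection.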

Metadata

Assignees: no one assigned
Labels: VALIDATION (any changes in validation codes)
Projects: no projects
Milestone: no milestone
