Objective
Set up an automated workflow that evaluates the performance of OTX models on a curated set of test datasets, to facilitate comparison between different models or different implementations of the training pipeline.
Motivation
OTX is a framework for training vision models. To keep pace with recent developments in deep learning, engineers are constantly required to implement new features and therefore introduce changes to the training pipeline.
At the same time, it is crucial that such changes do not introduce bugs or regressions in already supported features and architectures: all supported models must keep working and achieve the expected accuracy, training time, and inference speed. To this end, models must be tested and benchmarked periodically. Today, however, model evaluation is carried out manually by the engineers themselves, which is not only time-consuming but also error-prone, opaque, and hard to reproduce. We would like to automate this step, so that engineers can focus fully on developing new features while ensuring that model performance remains optimal.
Requirements
- The workflow must automatically pull the required input data (dataset, initial weights) from a publicly available archive.
- The workflow must be compatible with all models and allow developers to select which models to test (a single model, a list of models, or all of them).
- The workflow must run one or more full experiments with a given model architecture, covering different scenarios.
  - Examples: training with default parameters, training with tiling, etc.
- The workflow must produce a report of the experiments, including the final model performance (e.g. accuracy) and the total training time.
- Recommended: the workflow should compare the results with the reports generated for the latest release and for the previous run on the develop branch.
- The workflow should be parameterized to run on different runners with different hardware setups, ideally covering all classes of supported accelerators (GPU, XPU, ...).
- The workflow may run on a daily/weekly basis or on demand; a sketch covering these points follows this list.
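As a rough illustration of the requirements above, here is a minimal sketch of what such a pipeline could look like as a GitHub Actions workflow. Everything in it is an assumption for discussion, not an existing implementation: the runner labels (`gpu`, `xpu`), the helper scripts (`pull_assets.py`, `run_benchmark.py`, `compare_reports.py`), and the assets URL are all hypothetical placeholders.

```yaml
name: Model performance benchmark

on:
  # On-demand trigger with a model filter (single model, comma-separated list, or "all")
  workflow_dispatch:
    inputs:
      models:
        description: "Model(s) to benchmark: 'all' or a comma-separated list"
        required: false
        default: "all"
  # Periodic run (weekly here; adjust the cron expression as needed)
  schedule:
    - cron: "0 0 * * 0"

jobs:
  benchmark:
    strategy:
      fail-fast: false
      matrix:
        # Hypothetical self-hosted runner labels, one per accelerator class
        runner: [gpu, xpu]
    runs-on: [self-hosted, "${{ matrix.runner }}"]
    steps:
      - uses: actions/checkout@v4

      - name: Pull datasets and initial weights
        # Placeholder script and URL; inputs must come from a public archive
        run: python tests/perf/pull_assets.py --archive-url "$ASSETS_URL"
        env:
          ASSETS_URL: https://example.com/otx-benchmark-assets

      - name: Run experiments
        # Hypothetical entry point: runs each scenario (defaults, tiling, ...)
        # for the selected models and writes per-experiment metrics.
        # On scheduled runs the input is empty, so it falls back to 'all'.
        run: >
          python tests/perf/run_benchmark.py
          --models "${{ github.event.inputs.models || 'all' }}"
          --output report/

      - name: Compare with previous reports
        # Optional step: diff against the latest release and the last develop run
        run: python tests/perf/compare_reports.py --current report/ --baseline develop

      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-report-${{ matrix.runner }}
          path: report/
```

The `matrix.runner` axis addresses the multi-hardware requirement (each accelerator class runs the same job independently), while the combined `workflow_dispatch`/`schedule` triggers cover both on-demand and periodic execution. How models and scenarios are actually selected and compared is left to the hypothetical scripts and is open for discussion.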