While fine-tuning the distilled model, performance on the VisDA-C dataset drops. The results are as follows:
seed=2019: 74.98 --> 74.80
seed=2020: 76.17 --> 75.17
seed=2021: 79.8 --> 78.6
The learning-rate setting for the VisDA-C dataset may matter.
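If the learning rate is the culprit, one thing worth checking is the decay schedule commonly used in domain-adaptation work, lr = base_lr * (1 + gamma * p)^(-power) with p the training progress. The sketch below is a minimal, hypothetical helper for sweeping a few base learning rates; the function name, defaults, and the candidate values are illustrative assumptions, not the repo's actual settings.

```python
def da_lr(base_lr, step, max_steps, gamma=10.0, power=0.75):
    """Common domain-adaptation decay: lr = base_lr * (1 + gamma * p)^(-power),
    where p = step / max_steps is training progress in [0, 1]."""
    p = step / max_steps
    return base_lr * (1.0 + gamma * p) ** (-power)

# Illustrative sweep: compare a smaller base LR against the default when
# fine-tuning on VisDA-C (values are assumptions for demonstration).
for base_lr in (1e-3, 1e-4):
    mid_lr = da_lr(base_lr, step=500, max_steps=1000)
    print(f"base_lr={base_lr:g} -> lr at 50% progress: {mid_lr:.2e}")
```

Since the drop is consistent across all three seeds, lowering the base learning rate for the fine-tuning phase (rather than reusing the distillation-phase value) seems like the first setting to try.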