Model Training ‐ Comparison ‐ [Scheduler]
Models | Logs | Graphs | Configs
The Scheduler defines how the learning rate (LR) changes during training.
Compared values:
- cosine,
- constant,
- polynomial.
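As a rough illustration, the three schedules can be written as multipliers applied to the base LR. This is a sketch using the textbook formulas, not the trainer's actual implementation; the exact shapes in a given trainer (e.g. warmup handling) may differ in details:

```python
import math

def lr_multiplier(step, total_steps, kind, power=1.0):
    """Return the LR multiplier at a given step for each scheduler.

    Illustrative formulas only: cosine and polynomial decay from 1 to 0
    over the run, while constant stays at 1 the whole time.
    """
    progress = step / max(1, total_steps)
    if kind == "constant":
        return 1.0                                          # never decays
    if kind == "cosine":
        return 0.5 * (1.0 + math.cos(math.pi * progress))   # 1 -> 0, smooth
    if kind == "polynomial":
        return (1.0 - progress) ** power                    # linear when power == 1
    raise ValueError(f"unknown scheduler: {kind}")
```

For example, at the midpoint of training cosine has already halved the LR, while constant still applies it in full.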

In TensorBoard, in addition to the graphs we have previously discussed, there are also graphs showing the changes in LR for the U-Net and Text Encoder. It's easy to understand which function is which.
DLR(step)


The DLR initially increases gradually and then follows the function defined by the Scheduler.
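This curve can be modeled as the adaptive step size d estimated by the optimizer, multiplied by a short warmup ramp and the scheduler multiplier. A hypothetical sketch of the graphs, assuming a DAdaptation-style optimizer; it does not reproduce the optimizer's internals:

```python
import math

def effective_lr(step, total_steps, d, warmup_steps=50, kind="cosine"):
    """Model the DLR graph: warmup ramp x scheduler multiplier x the
    adaptive step size d. All names here are illustrative assumptions."""
    warmup = min(1.0, step / max(1, warmup_steps))  # initial gradual increase
    progress = step / max(1, total_steps)
    if kind == "constant":
        schedule = 1.0
    elif kind == "cosine":
        schedule = 0.5 * (1.0 + math.cos(math.pi * progress))
    else:  # polynomial with power 1
        schedule = 1.0 - progress
    return d * warmup * schedule
```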
Loss(step)

In the case of GR = 1.02, all the graphs converge into one, which suggests that the Scheduler may not have a significant impact on the results.

On the other hand, with GR = ∞ the graphs differ, but it's strange that constant yields the highest loss, even though intuitively it might seem it should be the opposite.








As for these grids, it's clear that the assumption was correct: the Scheduler doesn't have a significant impact on the result with GR = 1.02. With GR = ∞, however, its influence is hard to ignore: while on the final epochs cosine and polynomial leave the model barely learning, since the DLR approaches zero, with constant the training continues at full strength.
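"Full strength" can be quantified with a back-of-envelope sum of the LR multipliers over a run: constant delivers roughly twice the total LR of cosine or polynomial. This assumes the textbook schedule shapes; real trainers may differ slightly:

```python
import math

# Approximate total "amount of learning" as the sum of each
# scheduler's LR multiplier over all steps of a run.
steps = 1000
cosine = sum(0.5 * (1 + math.cos(math.pi * s / steps)) for s in range(steps))
constant = float(steps)
polynomial = sum(1 - s / steps for s in range(steps))

# constant never decays, so it accumulates about twice the total LR:
print(constant / cosine)      # ~2.0
print(constant / polynomial)  # ~2.0
```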




The results with cosine and polynomial are similar across all cases, and subjectively they appear better than constant.
Given the similarity between the cosine and polynomial schedulers, it's easier to stick with the standard cosine. In theory, the constant scheduler should significantly speed up training, since it never slows down over time, but the results may get worse. Also, by using constant we effectively turn our smart adaptive optimizer into a plain non-adaptive one.