This is not an issue per se, more of a suggested modification/extension.
I feel there should be an argument to set the learning rate at epoch zero, plus a way to gradually increase it to the target learning rate over some number of warmup epochs (5 in the paper).
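Roughly what I have in mind, as a minimal sketch. The function name `warmup_lr` and the parameters `initial_lr` and `warmup_epochs` are hypothetical, not existing arguments, and I am assuming a linear ramp since "gradually" could also mean some other schedule:

```python
def warmup_lr(epoch, target_lr, initial_lr=0.0, warmup_epochs=5):
    """Return the learning rate to use for a given epoch.

    Assumption: the ramp is linear, starting at initial_lr on epoch 0
    and reaching target_lr once warmup_epochs epochs have passed.
    """
    # No warmup requested, or warmup already finished: use the target rate.
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return target_lr
    # Fraction of the warmup completed so far (0.0 at epoch 0).
    fraction = epoch / warmup_epochs
    return initial_lr + fraction * (target_lr - initial_lr)


if __name__ == "__main__":
    # Print the schedule for the first few epochs.
    for epoch in range(8):
        print(epoch, warmup_lr(epoch, target_lr=0.1, initial_lr=0.01))
```

The training loop would call this once per epoch and set the optimizer's learning rate to the returned value before running that epoch.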
Let me know what you guys think.