Thanks for sharing the code. I learned a lot from it.
I see that in eval.py, evaluation is performed on the target after it has been scaled with StandardScaler.
I think evaluating this way will report a lower MSE than the actual MSE on the original scale.
I'm new to time series forecasting and don't know if it's reasonable to evaluate it on standardized data.
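
To make the concern concrete, here is a minimal sketch of what I mean. It is not the code from eval.py: the numbers are made up, `compare_mse` is a hypothetical helper, and the standardization is done by hand with NumPy (mean and population standard deviation, which is what StandardScaler computes). I also assume the scaling statistics come from the target itself, which may not match how the repo fits its scaler.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally sized arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def compare_mse(y_true, y_pred):
    """Return (mse_scaled, mse_original, std) for illustrative inputs.

    Hypothetical helper: standardizes both arrays with the mean and
    population std (ddof=0) of y_true, mimicking StandardScaler.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean, std = y_true.mean(), y_true.std()
    scaled_true = (y_true - mean) / std
    scaled_pred = (y_pred - mean) / std
    return mse(scaled_true, scaled_pred), mse(y_true, y_pred), float(std)


# Made-up values, only to show the relationship between the two metrics.
y_true = [100.0, 120.0, 140.0, 160.0]
y_pred = [110.0, 115.0, 150.0, 155.0]

mse_scaled, mse_original, std = compare_mse(y_true, y_pred)
print("MSE on standardized data:", mse_scaled)
print("MSE on original data:    ", mse_original)
print("std**2 * scaled MSE:     ", std ** 2 * mse_scaled)
```

If I understand correctly, the two values differ by a factor of std squared, so the standardized MSE is only smaller when the target's standard deviation is greater than 1.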