Open
Description
The final output of this code is a reward curve, but the curve is still trending upward at the end of training and does not settle around a stable value. The convergence looks poor to me. Is there a way to fix this? I am using learning_max_episode = 100 and max_ep_steps = 500.
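
To make "does not converge" concrete, here is a minimal sketch of one way to check whether a reward curve has levelled off. It is not part of this repository: `moving_average`, `has_plateaued`, and the `window`, `tail`, and `tol` defaults are hypothetical names and values chosen for illustration, and it assumes the per-episode rewards are available as a plain list or array.

```python
import numpy as np

def moving_average(rewards, window=10):
    """Smooth per-episode rewards with a simple sliding-window mean."""
    rewards = np.asarray(rewards, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

def has_plateaued(rewards, window=10, tail=20, tol=0.01):
    """Heuristic check: is the recent slope of the smoothed curve small
    compared with its recent level? All thresholds are assumptions."""
    # Need enough episodes to smooth and still have `tail` points left.
    if len(rewards) < window + tail - 1:
        return False
    recent = moving_average(rewards, window)[-tail:]
    # Slope of a least-squares line through the last `tail` smoothed points.
    slope = np.polyfit(np.arange(tail), recent, 1)[0]
    scale = max(abs(recent.mean()), 1e-8)
    return bool(abs(slope) / scale < tol)
```

If a check like this still reports no plateau after 100 episodes, the curve may simply need more episodes before it flattens, which is one of the things I would like to confirm.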