-
Notifications
You must be signed in to change notification settings - Fork 222
Hybrid PPO #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Hybrid PPO #300
Conversation
Created Hybrid distr, populated HybridDirstribution and forward method (tbc) Co-authored-by: simrey <simonereynoso@gmail.com>
collect_rollouts overrides the one of PPO (in turn inherited by OnPolicyAlgorithm). It requires to implement new environment and rollout buffer, as they need to work with multiple actions (discrete and continuous). Co-authored-by: simrey <simonereynoso@gmail.com>
Calls super-method as done in PPO Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
- Added evaluate_actions in HybridActorCriticPolicy - Completed log_prob and entropy of HybridDistribution Co-authored-by: simrey <simonereynoso@gmail.com>
|
Related https://github.com/adysonmaia/sb3-plus (I rediscovered it recently) |
RolloutBuffer for hybrid actions. HybridPPO.train() not adapted yet Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
Plus fix some imports Co-authored-by: simrey <simonereynoso@gmail.com>
…te with the library PPO and the base algorithm do some validation on the action space. If we want our HybridPPO to be subclass of PPO, we need this wrapper for the integration with the library. Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
This reverts commit 392e60f.
… integrate with the library" This reverts commit 365bdb9.
|
Hi @araffin, I have a little problem here: Our class
AlternativesNow, we have a couple of alternatives:
My opinionAlternative 1 is better. It adds flexibility without overhead (the parameter is optional). Regardless of how the code in this PR evolves, this change to PPO's parameters would be barely noticeable to sb3's users. Action itemsIf you want to proceed with alternative 1, I'd be happy to open a PR in sb3 for it (including creating an issue to discuss the change beforehand, if necessary). Let me know what you think so we can be unblocked 😄 |
This is fine, I don't think there should be too many duplicated code because of that, no? |
Ok, done in the latest commit. |
Implementation of Hybrid PPO
Description
Closes #202
(Description will follow)
Context
Types of changes
Checklist:
make format(required)make check-codestyleandmake lint(required)make pytestandmake typeboth pass. (required)Note: we are using a maximum length of 127 characters per line