
Conversation

AlexPasqua (Contributor) commented Jul 6, 2025

Implementation of Hybrid PPO

Description

Closes #202
(Description will follow)

Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have included an example of using the feature (required for new features).
  • I have included baseline results (required for new training algorithms or training-related features).
  • I have updated the documentation accordingly.
  • I have updated the changelog accordingly (required).
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

AlexPasqua and others added 10 commits July 6, 2025 16:00
Created hybrid distribution, populated HybridDistribution and forward method (tbc)

Co-authored-by: simrey <simonereynoso@gmail.com>
collect_rollouts overrides the one from PPO (in turn inherited from OnPolicyAlgorithm).
It requires implementing a new environment and rollout buffer, as they need to work with multiple actions (discrete and continuous).

Co-authored-by: simrey <simonereynoso@gmail.com>
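For context, a minimal sketch of how hybrid actions could be laid out in such a buffer (array names and shapes are hypothetical, not the PR's actual implementation):

```python
import numpy as np

# Hypothetical layout: the discrete and continuous parts of each hybrid
# action live in separate arrays, one row per step and per environment.
n_steps, n_envs = 2048, 8        # illustrative buffer dimensions
n_discrete, n_continuous = 2, 2  # illustrative action sizes

discrete_actions = np.zeros((n_steps, n_envs, n_discrete), dtype=np.int64)
continuous_actions = np.zeros((n_steps, n_envs, n_continuous), dtype=np.float32)

# At each step the tuple action is split into its two parts, e.g.:
# discrete_actions[step] = action[0]; continuous_actions[step] = action[1]
```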
Calls super-method as done in PPO

Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
- Added evaluate_actions in HybridActorCriticPolicy
- Completed log_prob and entropy of HybridDistribution

Co-authored-by: simrey <simonereynoso@gmail.com>
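For context, the factorization these methods rely on (a hedged sketch, assuming the discrete and continuous components are modeled as independent): the joint log-probability and entropy are sums of the per-component quantities.

```python
import torch
from torch.distributions import Categorical, Normal

# Independent discrete and continuous components (illustrative sizes).
discrete = Categorical(logits=torch.zeros(3))
continuous = Normal(loc=torch.zeros(2), scale=torch.ones(2))

d_action = discrete.sample()    # scalar index in {0, 1, 2}
c_action = continuous.sample()  # 2-dimensional vector

# Under independence, log p(a_d, a_c) = log p(a_d) + log p(a_c);
# the entropies add the same way.
log_prob = discrete.log_prob(d_action) + continuous.log_prob(c_action).sum()
entropy = discrete.entropy() + continuous.entropy().sum()
```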
araffin (Member) commented Sep 26, 2025

Related: https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)

AlexPasqua and others added 9 commits October 4, 2025 16:55
RolloutBuffer for hybrid actions.
HybridPPO.train() not adapted yet

Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
Plus fix some imports

Co-authored-by: simrey <simonereynoso@gmail.com>
…te with the library

PPO and the base algorithm do some validation on the action space. If we want our HybridPPO to be a subclass of PPO, we need this wrapper for the integration with the library.

Co-authored-by: simrey <simonereynoso@gmail.com>
Co-authored-by: simrey <simonereynoso@gmail.com>
… integrate with the library"

This reverts commit 365bdb9.
AlexPasqua (Contributor, Author) commented:
Hi @araffin, I have a little problem here:

Our class HybridPPO requires an action space made of a gym.spaces.Tuple containing a gym.spaces.MultiDiscrete and a gym.spaces.Box. In short pseudocode: spaces.Tuple[spaces.MultiDiscrete, spaces.Box].
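For concreteness, such a space can be constructed like this (component sizes and bounds are illustrative; using gymnasium, whose spaces API mirrors gym's):

```python
import numpy as np
from gymnasium import spaces

# Hybrid action space: a MultiDiscrete part paired with a Box part.
# The sizes and bounds below are purely illustrative.
hybrid_action_space = spaces.Tuple(
    (
        spaces.MultiDiscrete([3, 2]),
        spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
    )
)

sample = hybrid_action_space.sample()  # -> (array of 2 ints, array of 2 floats)
```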

HybridPPO inherits from sb3's PPO, which passes supported_action_spaces to its super-class (OnPolicyAlgorithm) in a hard-coded way. Unfortunately spaces.Tuple is not included among the supported spaces, and this causes an error when creating a HybridPPO object.

Alternatives

Now, we have a couple of alternatives:

  1. Add supported_action_spaces as an optional parameter of PPO's constructor (see the toy sketch after this list);
  2. Make HybridPPO inherit from OnPolicyAlgorithm instead of PPO, so we can pass supported_action_spaces as an argument.
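To make alternative 1 concrete, here is a toy sketch of the mechanism (BaseAlgo and MyPPO are stand-ins for OnPolicyAlgorithm and PPO, not real SB3 classes):

```python
from gymnasium import spaces

# Stand-in for OnPolicyAlgorithm: validates the action space on construction.
class BaseAlgo:
    def __init__(self, action_space, supported_action_spaces):
        assert isinstance(action_space, supported_action_spaces), "unsupported action space"
        self.action_space = action_space

# Stand-in for PPO: the supported spaces become an *optional* parameter
# instead of being hard-coded in the call to the super-class.
class MyPPO(BaseAlgo):
    def __init__(self, action_space, supported_action_spaces=None):
        super().__init__(
            action_space,
            supported_action_spaces
            or (spaces.Box, spaces.Discrete, spaces.MultiDiscrete, spaces.MultiBinary),
        )

# A hybrid subclass can now declare Tuple support without touching the
# validation logic of its parent.
class HybridPPO(MyPPO):
    def __init__(self, action_space):
        super().__init__(action_space, supported_action_spaces=(spaces.Tuple,))
```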

My opinion

Alternative 1 is better.

It adds flexibility without overhead (the parameter is optional). Regardless of how the code in this PR evolves, this change to PPO's parameters would be barely noticeable to sb3's users.
Furthermore, making HybridPPO inherit from OnPolicyAlgorithm (as per alternative 2) would require re-implementing much of the code already present in PPO with only minor changes, effectively duplicating it.

Action items

If you want to proceed with alternative 1, I'd be happy to open a PR in sb3 for it (including creating an issue to discuss the change beforehand, if necessary).

Let me know what you think so we can be unblocked 😄

araffin (Member) commented Nov 6, 2025

Make HybridPPO inherit from OnPolicyAlgorithm

This is fine, I don't think there should be too much duplicated code because of that, no?
(that's what we do already for the other variants of PPO)

AlexPasqua (Contributor, Author) commented Nov 9, 2025

Make HybridPPO inherit from OnPolicyAlgorithm

This is fine, I don't think there should be too much duplicated code because of that, no? (that's what we do already for the other variants of PPO)

Ok, done in the latest commit.
You were right, it wasn't that much extra code in the end. Thanks for the quick reply 😄
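For readers following the thread, a minimal sketch of the chosen design (alternative 2); the class skeleton below is illustrative, not the PR's final code:

```python
from gymnasium import spaces
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm

# Sketch: HybridPPO subclasses OnPolicyAlgorithm directly and declares
# spaces.Tuple as supported. The remaining hyperparameters (learning_rate,
# n_steps, ...) are assumed to be forwarded via **kwargs.
class HybridPPO(OnPolicyAlgorithm):
    def __init__(self, policy, env, **kwargs):
        super().__init__(
            policy,
            env,
            supported_action_spaces=(spaces.Tuple,),
            **kwargs,
        )
```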
