[Bug]: MaskablePPO inaccurate update counting when target_kl exits early #292

### 🐛 Bug

When MaskablePPO exits training early because of `target_kl`, `n_updates` is still incremented by `self.n_epochs` instead of being incremented only for the epochs that actually ran. For example, if it exits early at epoch 5 of 10, `n_updates` increases by 10 when it should increase by 5.

To fix:
On line 413 of ppo_mask.py, `self._n_updates += self.n_epochs` should be changed to `self._n_updates += 1` and moved to line 409, inside the loop, to match normal PPO.
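
To illustrate the difference, here is a minimal standalone sketch of the two counting strategies. It is not the actual `ppo_mask.py` code: `simulate_train` and `kl_per_epoch` are hypothetical names, the KL values are made up for illustration, and the sketch counts only epochs that finish without tripping the KL check (whether the stopping epoch itself is counted depends on where the increment sits relative to the `break` in the real code).

```python
def simulate_train(n_epochs, kl_per_epoch, target_kl):
    """Simulate one train() call and return (bulk_count, per_epoch_count).

    Hypothetical stand-in for the epoch loop, not the sb3_contrib implementation.
    kl_per_epoch[i] is the assumed approximate KL divergence seen in epoch i.
    """
    per_epoch_count = 0
    for epoch in range(n_epochs):
        # Early-stop check: abort once the approximate KL exceeds 1.5 * target_kl
        if target_kl is not None and kl_per_epoch[epoch] > 1.5 * target_kl:
            break
        # Proposed behavior: count each epoch as it completes
        per_epoch_count += 1

    # Reported behavior: add n_epochs after the loop, regardless of early exit
    bulk_count = n_epochs
    return bulk_count, per_epoch_count


# Illustrative values: the first 5 epochs stay under the threshold, the 6th exceeds it
kls = [0.001] * 5 + [0.05] * 5
bulk, per_epoch = simulate_train(n_epochs=10, kl_per_epoch=kls, target_kl=0.01)
print(bulk, per_epoch)  # 10 5
```

With these illustrative inputs the bulk increment reports 10 updates while only 5 epochs ran, which is the mismatch described above.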

### To Reproduce

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.policies import MaskableActorCriticPolicy
from sb3_contrib.common.envs import InvalidActionEnvDiscrete

env = InvalidActionEnvDiscrete(dim=10, n_invalid_actions=3)

model = MaskablePPO(
    policy=MaskableActorCriticPolicy,
    env=env,
    verbose=1,
    target_kl=0.0003,  # set low to ensure early stopping
)

# Train; early stopping should trigger during the policy updates
model.learn(total_timesteps=100_000)
```


### Relevant log output / Error message

```shell

```

### System Info

_No response_

### Checklist

- [x] I have checked that there is no similar [issue](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues) in the repo
- [x] I have read the [documentation](https://sb3-contrib.readthedocs.io/en/master/)
- [x] I have provided a [minimal and working](https://github.com/DLR-RM/stable-baselines3/issues/982#issuecomment-1197044014) example to reproduce the bug
- [x] I've used the [markdown code blocks](https://help.github.com/en/articles/creating-and-highlighting-code-blocks) for both code and stack traces.
