Fix n_updates counting with early stopping in MaskablePPO and RecurrentPPO #313

alektebel · 2025-11-16T22:40:38Z

Description

This PR fixes incorrect n_updates counting in both MaskablePPO and RecurrentPPO when early stopping is triggered due to the target_kl threshold.

The Problem:

Previously, n_updates was always incremented by the full n_epochs value (default: 10), regardless of how many epochs actually ran
When target_kl triggered early stopping (e.g., at epoch 5), n_updates would still increment by 10 instead of 5
This led to inaccurate learning rate scheduling and progress tracking, and was inconsistent with regular PPO
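A minimal pure-Python sketch of the pre-fix control flow, assuming it matches the description above: the counter is bumped once, after the epoch loop, by the full n_epochs. The function count_updates_old and its stop_step parameter are hypothetical stand-ins for the real train() method and its target_kl check; they are not part of sb3_contrib.

```python
from typing import Optional, Tuple


def count_updates_old(n_epochs: int, stop_step: Optional[int]) -> Tuple[int, int]:
    """Hypothetical simulation of the old counting.

    Returns (n_updates, epochs_run). stop_step is the 0-based epoch in which
    the target_kl check fires, or None if it never fires.
    """
    n_updates = 0
    epochs_run = 0
    continue_training = True
    for epoch in range(n_epochs):
        epochs_run += 1
        # Stand-in for the minibatch pass and the target_kl check.
        if stop_step is not None and epoch == stop_step:
            continue_training = False
        if not continue_training:
            break
    # The bug being illustrated: the full n_epochs is added even after an early break.
    n_updates += n_epochs
    return n_updates, epochs_run


print(count_updates_old(10, 4))  # (10, 5): five epochs ran, but 10 updates are recorded
```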

The Solution:

Move the n_updates increment inside the epoch loop to count only completed epochs
Each epoch now increments n_updates by 1, matching the behavior of base PPO
Early stopping now correctly reflects actual training progress
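A sketch of the fixed placement, assuming the change follows the pattern of Stable-Baselines3's PPO.train() as we understand it: the counter is incremented once per epoch, after the minibatch pass and before the early-stop break. With that placement the epoch in which early stopping fires is itself counted, so stopping at 0-based step k adds k + 1 (for example, stopping during the fifth epoch, step 4, adds 5). The function and its stop_step parameter are hypothetical; in the real classes the counter is, to our understanding, the _n_updates attribute.

```python
from typing import Optional


def count_updates_fixed(n_epochs: int, stop_step: Optional[int]) -> int:
    """Hypothetical simulation of the fixed counting; returns n_updates.

    stop_step is the 0-based epoch in which the target_kl check fires,
    or None if it never fires.
    """
    n_updates = 0
    continue_training = True
    for epoch in range(n_epochs):
        # Stand-in for the minibatch pass; in SB3's PPO the check is
        # approx_kl_div > 1.5 * target_kl, which breaks out of the minibatch loop.
        if stop_step is not None and epoch == stop_step:
            continue_training = False
        n_updates += 1  # one increment per epoch that actually ran
        if not continue_training:
            break
    return n_updates


print(count_updates_fixed(10, None))  # 10: no early stopping, same as before the fix
print(count_updates_fixed(10, 4))     # 5: stopped during the fifth epoch
```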

Files Changed:

sb3_contrib/ppo_mask/ppo_mask.py
sb3_contrib/ppo_recurrent/ppo_recurrent.py

Testing

The fix has been verified with:

Reproduction cases showing the bug in both algorithms
Logs demonstrating correct counting after the fix
Early stopping at various steps (0-7) to validate accurate incrementing
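The sweep over early-stopping steps 0-7 can be expressed as a small self-contained check. This is a hypothetical harness, not the PR's actual test: it replaces the minibatch loop with one made-up approx_kl_div value per epoch, applies the 1.5 * target_kl threshold used by SB3's PPO, and compares the old and fixed counts.

```python
from typing import List, Optional, Tuple


def run_epochs(kl_per_epoch: List[float], target_kl: Optional[float], n_epochs: int) -> Tuple[int, int]:
    """Return (old_count, new_count) for one simulated call to train().

    kl_per_epoch[e] is a made-up approx_kl_div for epoch e (a simplification of
    the per-minibatch check in the real loop).
    """
    new_count = 0
    for epoch in range(n_epochs):
        continue_training = not (
            target_kl is not None and kl_per_epoch[epoch] > 1.5 * target_kl
        )
        new_count += 1  # fixed behavior: count each epoch that ran
        if not continue_training:
            break
    old_count = n_epochs  # old behavior: always the full n_epochs
    return old_count, new_count


def kl_schedule(stop_step: int, n_epochs: int, target_kl: float) -> List[float]:
    """Hypothetical KL values: below the threshold until stop_step, above it from then on."""
    return [0.5 * target_kl if e < stop_step else 2.0 * target_kl for e in range(n_epochs)]


if __name__ == "__main__":
    n_epochs, target_kl = 10, 0.01
    for stop_step in range(8):  # early stopping at steps 0-7
        old, new = run_epochs(kl_schedule(stop_step, n_epochs, target_kl), target_kl, n_epochs)
        assert old == n_epochs
        assert new == stop_step + 1
        print(f"stop at step {stop_step}: old={old}, new={new}")
```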

Impact

Accurate learning rate scheduling
Proper training progress tracking
Consistent behavior with Stable-Baselines3 PPO
No breaking changes: only the counting logic is affected

Commit ea8a8db: Fix n_updates counting in MaskablePPO and RecurrentPPO with early stopping

- Count actual epochs completed instead of full n_epochs
- Fix affects both algorithms when target_kl triggers early stopping
- Ensures accurate learning rate scheduling and progress tracking

alektebel mentioned this pull request Nov 16, 2025

[Bug]: MaskablePPO Inaccurate update counting when target_kl early exists #292 (open)
