Add tests for PR #302 numerical stability fix (issue #81) #311
Description
Adds comprehensive test coverage for the numerical stability improvements in PR #302, which fixes intermittent `ValueError` failures in MaskablePPO training caused by precision issues when using `-1e8` for masked logits.

Tests added:
- `test_numerical_stability_with_masking`: Validates no `ValueError` with `validate_args=True` across various logit ranges and mask patterns
- `test_entropy_with_all_but_one_masked`: Edge case verification (single valid action → entropy ≈ 0)
- `test_repeated_masking_stability`: Ensures no numerical drift from repeated mask applications
- `test_masked_actions_have_zero_probability`: Verifies masked actions get zero probability
- `test_entropy_numerical_stability_with_masking`: Tests entropy computation with diverse masking scenarios

Note: The original bug was intermittent and difficult to reproduce deterministically. These tests verify correct behavior and serve as regression tests rather than attempting to trigger the original bug.
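The properties these tests check can be illustrated with a small standalone sketch. This is not the code from PR #302 or from the test suite: the helper names `masked_softmax` and `entropy` are hypothetical, and masking with `-inf` is an assumption made purely for illustration. It uses only the Python standard library.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked actions get probability 0.

    Hypothetical helper for illustration, not part of any library.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Assumption: masked logits are replaced with -inf rather than -1e8.
    masked = [x if m else -math.inf for x, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy, skipping zero-probability terms (0 * log 0 := 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Single valid action: all probability mass lands on it, entropy is ~0.
single = masked_softmax([2.0, -1.0, 0.5], [False, True, False])

# Wide logit range: masked entries stay at zero and the result still sums to 1.
wide = masked_softmax([1e4, -1e4, 3.0, 0.0], [True, False, True, True])
```

Skipping zero-probability terms in `entropy` avoids evaluating `0 * log(0)`, which is the kind of edge case the single-valid-action test targets.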
Context
Related to issue #81 and PR #302.
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)