Fix FP8 linear layer dimension check to prevent runtime error #6393
Fixes #6390
Problem
When `use_fp8=True` is enabled in `HybridParallelPlugin` and the model has output layers whose dimensions are not divisible by 16 (e.g., binary classification with 2 outputs), training fails with a runtime error from `torch._scaled_mm`.
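For context, a minimal sketch of a setup that triggers the failure, assuming the standard ColossalAI booster workflow; the distributed launch boilerplate (`colossalai.launch`) and the training loop are omitted:

```python
import torch
from transformers import GPT2Config, GPT2ForSequenceClassification
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# A classification head with num_labels=2: the score layer's output
# dimension (2) is not divisible by 16.
model = GPT2ForSequenceClassification(GPT2Config(num_labels=2))

plugin = HybridParallelPlugin(tp_size=1, pp_size=1, use_fp8=True)
booster = Booster(plugin=plugin)
# After boosting, a forward/backward pass routes the score layer's
# matmul through the FP8 path and hits torch._scaled_mm, which fails
# before this fix.
```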
Root Cause

`torch._scaled_mm` requires both dimensions of the weight matrix to be divisible by 16. The existing check in `linear_fp8()` only validated:

- `input.shape[-1]`
- `np.prod(input.shape[:-1])`

It did not check the output dimension (`weight.shape[0]`).

When using `GPT2ForSequenceClassification` with `num_labels=2`, the score layer has weight shape `[2, 768]`, and `weight.shape[0] = 2` is not divisible by 16.
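For reference, a paraphrase of the pre-fix guard implied by the description above; the function name matches the PR, but the exact body in ColossalAI may differ:

```python
import numpy as np
import torch
import torch.nn.functional as F

def linear_fp8(input: torch.Tensor, weight: torch.Tensor, bias=None):
    # Pre-fix: only the input-side dimensions were validated.
    if input.shape[-1] % 16 != 0 or np.prod(input.shape[:-1]) % 16 != 0:
        return F.linear(input, weight, bias)
    # Otherwise the FP8 path calls torch._scaled_mm, which also
    # requires weight.shape[0] % 16 == 0 -- the unchecked case.
    ...
```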
Solution

Added a check for `weight.shape[0] % 16 != 0` so that `linear_fp8()` falls back to the regular `F.linear` path when the output dimension is not compatible with FP8.
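A sketch of the patched guard under the same assumptions; only the `weight.shape[0]` condition is new:

```python
import numpy as np
import torch
import torch.nn.functional as F

def linear_fp8(input: torch.Tensor, weight: torch.Tensor, bias=None):
    # Fall back to a regular matmul when any dimension seen by
    # torch._scaled_mm is not divisible by 16.
    if (
        input.shape[-1] % 16 != 0
        or np.prod(input.shape[:-1]) % 16 != 0
        or weight.shape[0] % 16 != 0  # new check: output dimension
    ):
        return F.linear(input, weight, bias)
    ...  # FP8 path via torch._scaled_mm (unchanged)
```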
Testing

This fix allows models with output dimensions that are not divisible by 16 (such as `GPT2ForSequenceClassification` with `num_labels=2`) to train under `HybridParallelPlugin` with `use_fp8=True`.

The change is backward compatible and does not affect existing working configurations.
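A quick sanity check of the guard condition on the shapes from this issue (`needs_fallback` is a hypothetical helper written for illustration, not part of the PR):

```python
import numpy as np
import torch

def needs_fallback(input: torch.Tensor, weight: torch.Tensor) -> bool:
    # Mirrors the patched dimension check described above.
    return (
        input.shape[-1] % 16 != 0
        or np.prod(input.shape[:-1]) % 16 != 0
        or weight.shape[0] % 16 != 0
    )

x = torch.randn(16, 768)       # batch of hidden states
score_w = torch.randn(2, 768)  # GPT-2 score layer, num_labels=2

assert needs_fallback(x, score_w)                     # 2 % 16 != 0 -> F.linear
assert not needs_fallback(x, torch.randn(768, 768))   # FP8 path still taken
```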