
Conversation

@jcaip jcaip commented Dec 5, 2025

This PR adds a new static quantization workflow based on Int8Tensor (#3407).

It introduces a new config, Int8StaticActivationInt8WeightConfig, which requires a scale tensor and a granularity:

static_config = Int8StaticActivationInt8WeightConfig(
    scale=int8_input.scale.detach(), granularity=PerRow
)
quantize_(model_static_quant, static_config)

Currently, only PerRow and PerTensor symmetric quantization are supported.
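
For reference, a minimal sketch of what the two granularities imply for the scale's shape (illustrative only, not the exact torchao internals):

import torch

x = torch.randn(16, 256)  # high-precision activation

# PerTensor: a single scale for the whole tensor (numel() == 1)
per_tensor_scale = x.abs().amax() / 127.0

# PerRow: one scale per row; keepdim=True keeps scale.ndim == x.ndim
per_row_scale = x.abs().amax(dim=-1, keepdim=True) / 127.0  # shape (16, 1)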

This scale tensor is stored on the weight Int8Tensor under activation_scale and is used to create a new activation Int8Tensor for static quantization.
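
A minimal sketch of that step, assuming symmetric int8 quantization (the function name is illustrative, not the actual torchao API):

import torch

def quantize_activation_static(x: torch.Tensor, activation_scale: torch.Tensor) -> torch.Tensor:
    # Use the precomputed (static) scale rather than deriving one from x at runtime
    return torch.clamp(torch.round(x / activation_scale), -128, 127).to(torch.int8)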

It would be nice to store this scale tensor in QuantizeTensorToInt8Kwargs, but unfortunately that breaks dynamo tracing: we store the quant kwargs as an object on the weight tensor, and we are unable to fakeify them properly.

As a result, we need to keep track of the scale and pass it outside of this kwargs object.


pytorch-bot bot commented Dec 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3442

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b5309eb with merge base c4273fe:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Dec 5, 2025
@jcaip jcaip changed the base branch from main to jcaip/int8-tensor December 5, 2025 00:27
@jcaip jcaip added the topic: new feature label Dec 7, 2025
@jcaip jcaip marked this pull request as ready for review December 7, 2025 00:05
@jcaip jcaip changed the title int8 static quant Add int8 static quantization workflow Dec 7, 2025
@jcaip jcaip changed the base branch from jcaip/int8-tensor to main December 7, 2025 00:07
@jcaip jcaip closed this Dec 7, 2025
@jcaip jcaip reopened this Dec 7, 2025
else:
    # Scale can be provided in the case of static quant
    assert scale.ndim == hp_tensor.ndim
    if isinstance(granularity, PerTensor):
@jcaip jcaip (Contributor, Author) commented:

note: I changed these checks in #3468

@jerryzh168 jerryzh168 (Contributor) commented Dec 8, 2025:

I think we typically also check the shape of the scale tensor as well, like these:

def _is_rowwise_scaled(x: torch.Tensor) -> bool:
    """Checks if a quantized tensor is rowwise scaled
    Args:
        x: quantized tensor (should have `block_size` attribute)
    """
    assert hasattr(x, "block_size"), "Expecting input to have `block_size` attribute"
    return tuple(x.block_size) == (1,) * (x.dim() - 1) + (x.shape[-1],)

def _is_tensorwise_scaled(x: torch.Tensor) -> bool:
    """Checks if a quantized tensor is tensorwise scaled
    Args:
        x: quantized tensor (should have `block_size` attribute)
    """
    assert hasattr(x, "block_size"), "Expecting input to have `block_size` attribute"
    return all(
        x.block_size[i] == -1 or x.block_size[i] == x.shape[i] for i in range(x.ndim)
    )
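
For illustration, a toy example of what these helpers check (a plain int8 tensor with an ad-hoc block_size attribute stands in for the quantized tensor subclass):

import torch

w = torch.randint(-128, 127, (64, 256), dtype=torch.int8)

w.block_size = [1, 256]          # one scale per row of a (64, 256) weight
assert _is_rowwise_scaled(w)     # block spans only the full last dim

w.block_size = [64, 256]         # one scale for the entire tensor
assert _is_tensorwise_scaled(w)  # block matches the full shape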

@jcaip jcaip requested a review from jerryzh168 December 8, 2025 22:28
Comment on lines 1671 to 1672
if isinstance(self.granularity, PerTensor):
    assert self.scale.numel() == 1
@jerryzh168 jerryzh168 (Contributor) commented:

nit: also check the shapes, and check PerRow as well
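
A hedged sketch of what that extended check could look like (illustrative only; the actual checks were later revised in #3468):

from torchao.quantization import PerRow, PerTensor

def _check_static_scale(scale, granularity):
    # Illustrative shape checks per granularity, not the actual torchao code
    if isinstance(granularity, PerTensor):
        assert scale.numel() == 1
    elif isinstance(granularity, PerRow):
        # one scale per row: trailing dim of 1 so it broadcasts over columns
        assert scale.shape[-1] == 1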

@jcaip jcaip (Contributor, Author) replied:

I think we might want to allow scale to be None, so that Int8StaticActivationInt8WeightConfig() can be passed as a base config; we can discuss on #3468.

@jerryzh168 jerryzh168 (Contributor) replied:

OK, makes sense, sounds good. This is easier for the user, I think; otherwise they would have to run a separate flow to get the scale here.
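
If scale=None is allowed, usage could look roughly like this (hedged sketch; see the discussion on #3468):

# Base config with no scale; the actual scale comes from a calibration pass
base_config = Int8StaticActivationInt8WeightConfig()

# Calibrated config, mirroring the example in the PR description
static_config = Int8StaticActivationInt8WeightConfig(
    scale=int8_input.scale.detach(), granularity=PerRow
)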

@jerryzh168 jerryzh168 (Contributor) left a review comment:

lg, see some comments inline

@jcaip jcaip merged commit f99105a into main Dec 9, 2025
20 checks passed