Revamp int4 MPS support in torchao #3446

@jerryzh168

Description

Request from @SunMarc:

Hugging Face Transformers would like to get better support for macOS users through torchao. We have some prototype/experimental APIs and kernels, but there are a few issues we need to resolve to make them useful for Hugging Face Transformers users:

* installation -> requires users to check out v0.13 and build the library from source for macOS
* API -> doesn't work out of the box with our integration, but we can write our own implementation that uses UIntxWeightOnlyQuantizedLinear directly. This will enable us to quantize on the fly.
* no bias support -> right now, this implementation only works with models that don't have a bias. It would be nice to remove this restriction to enable more models (e.g. Qwen); one possible workaround is sketched after this list.
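
One possible workaround for the bias restriction, until it is lifted upstream, is to keep the original floating-point bias outside the bias-free quantized linear and add it back after the matmul. A minimal sketch (the BiasedQuantizedLinear wrapper below is a hypothetical illustration, not a torchao API):

import torch
import torch.nn as nn

class BiasedQuantizedLinear(nn.Module):
    # Hypothetical wrapper: run a bias-free quantized linear
    # (e.g. one built on UIntxWeightOnlyQuantizedLinear) and add
    # the original floating-point bias back afterwards.
    def __init__(self, quantized_linear: nn.Module, bias: torch.Tensor):
        super().__init__()
        self.quantized_linear = quantized_linear
        self.register_buffer("bias", bias.detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.quantized_linear(x) + self.bias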

For installation:

USE_CPP=1 BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 TORCHAO_BUILD_MPS_OPS=1 pip install -e . --no-build-isolation --no-cache-dir

gives:

  File "/Users/marcsun/Desktop/HF/transformers/src/transformers/cli/serve.py", line 1839, in _load_model_and_data_processor
    model = quantizer.quantize(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 183, in quantize
    _replace_linear_with_quantized_linear_mps(
  File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
    _replace_linear_with_quantized_linear_mps(child, kwargs)
  File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
    _replace_linear_with_quantized_linear_mps(child, kwargs)
  File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
    _replace_linear_with_quantized_linear_mps(child, kwargs)
  [Previous line repeated 1 more time]
  File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 129, in _replace_linear_with_quantized_linear_mps
    pack_weight_op=getattr(torch.ops.torchao, f"_pack_weight_{nbit}bit"),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marcsun/miniconda3/envs/test/lib/python3.11/site-packages/torch/_ops.py", line 1365, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'torchao' object has no attribute '_pack_weight_6bit'
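
The AttributeError means no op named _pack_weight_6bit is registered under torch.ops.torchao, i.e. either the experimental MPS ops were not built for that bit width or the compiled library was not loaded. A quick diagnostic (assuming only the _pack_weight_{nbit}bit naming visible in the traceback above) is to check which bit widths are actually registered before running the quantizer:

import torch
import torchao  # importing torchao should load the compiled experimental ops, if built

# hasattr returns False instead of raising when an op is missing from
# the torch.ops.torchao namespace, so this lists the available bit widths.
for nbit in range(1, 8):
    name = f"_pack_weight_{nbit}bit"
    print(name, hasattr(torch.ops.torchao, name))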

Targeted to be fixed in 0.16 (~end of January 2026).

Metadata

Labels

0.16, integration (issues related to integrations with other libraries, like huggingface, vllm, sglang, gemlite, etc.), multibackend, quantize_ (the quantize_ API), triaged
