Labels: 0.16, integration (issues related to integrations with other libraries, like huggingface, vllm, sglang, gemlite, etc.), multibackend, quantize_ API, triaged
Description
Request from @SunMarc:
Hugging Face Transformers would like to get better support for macOS users through torchao. We have some prototype/experimental APIs and kernels, but there are a few issues we need to resolve to make them useful for Hugging Face Transformers users:
* installation -> requires users to check out v0.13 and build the library themselves for macOS
* API -> doesn't work out of the box with our integration; we have to write our own implementation that uses UIntxWeightOnlyQuantizedLinear directly. This would let us quantize on the fly.
* bias support -> right now this implementation only works with models that don't have a bias. It would be nice to remove this restriction to enable more models (e.g. Qwen).
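For context, "quantize on the fly" here means recursively walking the module tree and swapping each linear layer for its quantized counterpart at load time. The sketch below shows that replacement pattern with plain-Python stand-ins (real code would traverse torch.nn.Module children and construct UIntxWeightOnlyQuantizedLinear; the Linear/QuantizedLinear/Block classes here are hypothetical mocks), including the current no-bias restriction:

```python
# Sketch of the on-the-fly linear-replacement pattern, with stand-in
# classes instead of torch.nn modules (assumption: real code would use
# torch.nn.Module.named_children and UIntxWeightOnlyQuantizedLinear).

class Linear:
    """Mock of torch.nn.Linear."""
    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias

class QuantizedLinear:
    """Hypothetical stand-in for UIntxWeightOnlyQuantizedLinear."""
    def __init__(self, linear, nbit=4):
        self.in_features = linear.in_features
        self.out_features = linear.out_features
        self.nbit = nbit

class Block:
    """Mock container module with named children."""
    def __init__(self, **children):
        self._children = dict(children)
    def named_children(self):
        return self._children.items()
    def set_child(self, name, child):
        self._children[name] = child

def replace_linear(module, nbit=4):
    """Recursively swap bias-free Linear children for QuantizedLinear."""
    for name, child in module.named_children():
        if isinstance(child, Linear):
            if child.bias:
                # mirrors the current torchao restriction: bias unsupported
                continue
            module.set_child(name, QuantizedLinear(child, nbit))
        elif isinstance(child, Block):
            replace_linear(child, nbit)

model = Block(
    attn=Block(q_proj=Linear(64, 64, bias=False),
               k_proj=Linear(64, 64, bias=True)),
    mlp=Block(up=Linear(64, 256, bias=False)),
)
replace_linear(model)
# q_proj and up are replaced; k_proj is skipped because it has a bias
```

Lifting the bias restriction would remove the `continue` branch above, so layers like Qwen's biased projections could also be swapped.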
For installation:
USE_CPP=1 BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 TORCHAO_BUILD_MPS_OPS=1 pip install -e . --no-build-isolation --no-cache-dir
gives:
File "/Users/marcsun/Desktop/HF/transformers/src/transformers/cli/serve.py", line 1839, in _load_model_and_data_processor
model = quantizer.quantize(model)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 183, in quantize
_replace_linear_with_quantized_linear_mps(
File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
_replace_linear_with_quantized_linear_mps(child, kwargs)
File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
_replace_linear_with_quantized_linear_mps(child, kwargs)
File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 125, in _replace_linear_with_quantized_linear_mps
_replace_linear_with_quantized_linear_mps(child, kwargs)
[Previous line repeated 1 more time]
File "/Users/marcsun/Desktop/HF/ao/torchao/experimental/quant_api.py", line 129, in _replace_linear_with_quantized_linear_mps
pack_weight_op=getattr(torch.ops.torchao, f"_pack_weight_{nbit}bit"),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marcsun/miniconda3/envs/test/lib/python3.11/site-packages/torch/_ops.py", line 1365, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' 'torchao' object has no attribute '_pack_weight_6bit'
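The AttributeError above comes from looking up an op that was never registered (here because the experimental MPS kernels were not compiled into the wheel). The self-contained sketch below mimics that namespace lookup with a plain class (OpNamespace is a hypothetical stand-in for torch.ops.torchao) and shows how a `hasattr` guard could surface a clearer "rebuild with MPS ops" message instead of the raw AttributeError:

```python
# Minimal reproduction of the failure mode: an op namespace raises
# AttributeError for ops that were never registered. OpNamespace is a
# stand-in for torch.ops.torchao (assumption: mimicked for illustration).

class OpNamespace:
    def __init__(self, registered_ops):
        self._ops = dict(registered_ops)
    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        try:
            return self._ops[name]
        except KeyError:
            raise AttributeError(
                f"'_OpNamespace' 'torchao' object has no attribute {name!r}"
            ) from None

# Simulate a build where the MPS packing ops are missing entirely.
torchao_ops = OpNamespace({})

nbit = 6
op_name = f"_pack_weight_{nbit}bit"
if hasattr(torchao_ops, op_name):
    pack_weight_op = getattr(torchao_ops, op_name)
else:
    # a guard like this could raise a friendlier error pointing at the
    # TORCHAO_BUILD_MPS_OPS=1 build flags instead of the bare AttributeError
    pack_weight_op = None
```

In the real quant_api.py, such a check before the `getattr(torch.ops.torchao, ...)` call would turn the opaque traceback into an actionable installation hint.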
Target: to be fixed in 0.16 (~end of January 2026).