[FP16] Improved performance by fusing dequantize with compute in kernels: 20-30% Inference Speedup #159

build-and-run (ptx)

succeeded Dec 8, 2025 in 8m 49s