❓ Question
Hi, I saw that one of TRTorch's lowering passes decomposes linear into mm + add, and I'm wondering what the reason behind this is. Does TensorRT provide better performance with a matmul layer followed by an elementwise sum layer than with a fully connected layer? Or does breaking it down help the fusion process in TensorRT?
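
To make clear which rewrite I mean, here is a minimal plain-Python sketch of the equivalence as I understand it: a linear layer computes `x @ weight.T + bias`, so it can be split into a matrix multiply followed by an add. This is not TRTorch or TensorRT code; the helper names (`transpose`, `mm`, `add_bias`, `linear`, `linear_decomposed`) and the sample values are made up for illustration, and the weight layout `(out_features, in_features)` is an assumption.

```python
# Illustrative only: plain Python lists, no PyTorch/TensorRT calls.
# All helper names here are hypothetical.

def transpose(m):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*m)]

def mm(a, b):
    """Matrix multiply a (n x k) by b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row (broadcast over the batch)."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def linear(x, weight, bias):
    """Direct definition: y[i][j] = sum_k x[i][k] * weight[j][k] + bias[j]."""
    return [[sum(xk * wk for xk, wk in zip(xi, wj)) + bj
             for wj, bj in zip(weight, bias)]
            for xi in x]

def linear_decomposed(x, weight, bias):
    """The same result expressed as mm followed by add."""
    return add_bias(mm(x, transpose(weight)), bias)

x = [[1.0, 2.0], [3.0, 4.0]]                    # batch of 2, in_features = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # assumed (out_features, in_features)
bias = [0.5, -0.5, 0.0]

print(linear(x, weight, bias))
print(linear_decomposed(x, weight, bias))
```

Both calls give the same values, so my question is only about why the two-layer form is preferred on the TensorRT side.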