- Inference of my model in PyTorch takes 8 ms.
- Inference of the TensorRT model inside Triton Server (according to the server metrics) takes ~8 ms.
- The overall inference time including gRPC overhead is ~60 ms (measured roughly as in the sketch below).
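For context, this is roughly how the end-to-end latency is measured on the client side; it is a minimal sketch, and the model name, input/output names, shape, and dtype are placeholders, not the actual config:

```python
import time
import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder names -- substitute the real model/tensor names from config.pbtxt.
MODEL_NAME = "my_trt_model"
INPUT_NAME = "input__0"
OUTPUT_NAME = "output__0"

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy input matching the model's expected shape/dtype (placeholder shape).
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = grpcclient.InferInput(INPUT_NAME, list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)
requested_output = grpcclient.InferRequestedOutput(OUTPUT_NAME)

# Warm up once, then time the full round trip
# (client serialization + network + server compute + deserialization).
client.infer(MODEL_NAME, [infer_input], outputs=[requested_output])
start = time.perf_counter()
result = client.infer(MODEL_NAME, [infer_input], outputs=[requested_output])
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"end-to-end gRPC latency: {elapsed_ms:.1f} ms")  # ~60 ms observed
```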
How can serialization and deserialization take 10x more time than the actual inference?
Is this a bug, or am I doing something wrong?