- Inference of my model in PyTorch takes 8 ms.
- Inference of the TensorRT model inside Triton Server (according to the server metrics) takes ~8 ms.
- The overall inference time including gRPC overhead is ~60 ms (measured roughly as in the sketch below).
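For context, this is roughly how the end-to-end latency is measured on the client side; it is a minimal sketch, and the model name, input/output names, shape, and dtype are placeholders, not the actual config:

```python
import time
import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder names -- substitute the real model/tensor names from config.pbtxt.
MODEL_NAME = "my_trt_model"
INPUT_NAME = "input__0"
OUTPUT_NAME = "output__0"

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy input matching the model's expected shape/dtype (placeholder shape).
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = grpcclient.InferInput(INPUT_NAME, list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)
requested_output = grpcclient.InferRequestedOutput(OUTPUT_NAME)

# Warm up once, then time the full round trip
# (client serialization + network + server compute + deserialization).
client.infer(MODEL_NAME, [infer_input], outputs=[requested_output])
start = time.perf_counter()
result = client.infer(MODEL_NAME, [infer_input], outputs=[requested_output])
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"end-to-end gRPC latency: {elapsed_ms:.1f} ms")  # ~60 ms observed
```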
How can serialization and deserialization take 10x more time than the actual inference?
Is this a bug, or am I doing something wrong?