Description
The current script suggests using `--enforce-eager` when launching the vLLM server.
However, this flag disables CUDA graph acceleration, and as a result token generation on the A6000 (Holoscan IGX, arm64) is very slow (11.2 tokens/s).
Possible solution
We can add documentation about the flag.
Also, removing the flag increases generation speed roughly 4x (45.9 tokens/s).
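As a minimal sketch (assuming the script launches the OpenAI-compatible server via `vllm serve`; `<model-name>` is a placeholder for whatever model the script uses), the change amounts to dropping the flag:

```sh
# Current invocation suggested by the script: eager mode, no CUDA graphs (~11.2 tokens/s on A6000)
vllm serve <model-name> --enforce-eager

# Proposed invocation: omit --enforce-eager so CUDA graphs are captured (~45.9 tokens/s on A6000)
vllm serve <model-name>
```

The trade-off worth documenting is that CUDA graph capture adds some GPU memory overhead and startup time, which may be why the flag was included in the first place; on this hardware the throughput gain seems worth it.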