Description
When building a TensorRT-LLM engine for Qwen3-235B (MoE) in a rootless Podman container, the build completes engine serialization but then hangs indefinitely at the log message:
[TensorRT-LLM][INFO] Engine version 1.0.0 found in the config file, assuming engine(s) built by new builder API.
I am not sure why this happens.
Triton Information
I am using the image nvcr.io/nvidia/tritonserver:25.10-trtllm-python-py3, running it with Podman in rootless mode.
To Reproduce
podman run --rm -it \
--gpus all \
--security-opt label=disable \
--network host \
--shm-size=64g \
--pids-limit=-1 \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
nvcr.io/nvidia/tritonserver:25.10-trtllm-python-py3 bash

Then, inside the container:
python3 examples/qwen/convert_checkpoint.py \
--model_dir /models/source \
--output_dir /models/engines/ckpt \
--dtype auto \
--tp_size 2 \
--pp_size 2 \
--moe_tp_size 2 \
--workers 4
trtllm-build --checkpoint_dir /models/engines/ckpt \
--output_dir /models/engines/final \
--gemm_plugin auto \
--max_batch_size 256 \
--max_input_len 2048 \
--max_seq_len 8192 \
--max_num_tokens 16384 \
--workers 4
python3 /app/scripts/launch_triton_server.py --world_size=4 --model_repo=/models
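For reference, the --world_size passed to launch_triton_server.py must match the parallel mapping used at conversion time (tp_size × pp_size). A minimal sanity-check sketch using the values from the commands above:

```python
# Sanity check: the world size handed to launch_triton_server.py must equal
# tp_size * pp_size from convert_checkpoint.py, otherwise the MPI ranks
# cannot map onto the built engine shards.
tp_size = 2     # --tp_size used during checkpoint conversion
pp_size = 2     # --pp_size used during checkpoint conversion
world_size = 4  # --world_size passed to launch_triton_server.py

assert world_size == tp_size * pp_size, (
    f"world_size={world_size} != tp_size*pp_size={tp_size * pp_size}"
)
print("parallel mapping is consistent")
```

In this reproduction the mapping is consistent (2 × 2 = 4), so a mismatch is not the cause of the hang.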
Expected behavior
After running launch_triton_server.py, the models (preprocessing, tensorrtllm, postprocessing) should load and the server should listen on ports 8000, 8001, and 8002. Instead, it is stuck at:
[TensorRT-LLM][INFO] Engine version 1.0.0 found in the config file, assuming engine(s) built by new builder API.
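For context, the log line the server hangs on is emitted after reading the "version" field from the built engine's config.json (in this reproduction, under /models/engines/final). A self-contained sketch of that lookup, using a stand-in config instead of the real file:

```python
import json

# Stand-in for the engine's config.json so the snippet runs without the
# actual build artifacts; the real file lives in the trtllm-build output dir.
sample_config = json.dumps({
    "version": "1.0.0",        # the value mentioned in the hang log message
    "pretrained_config": {},   # placeholder; the real file has full metadata
})

config = json.loads(sample_config)
print("engine version:", config.get("version"))
```

Inspecting the real config.json this way can confirm the engine was serialized completely before the runtime stalled.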