
Stuck at the stage "Engine version 1.0.0 found in the config file, assuming engine(s) built by new builder API." #8540

@ZCJ1111

Description

When building a TensorRT-LLM engine for Qwen3-235B (MoE) in a rootless Podman container, the engine builds and serializes successfully, but the subsequent server launch hangs indefinitely at the log message:

[TensorRT-LLM][INFO] Engine version 1.0.0 found in the config file, assuming engine(s) built by new builder API.

I am not sure why this happens.

Triton Information
I am using the image nvcr.io/nvidia/tritonserver:25.10-trtllm-python-py3 and running it with Podman in rootless mode.

To Reproduce

podman run --rm -it \
  --gpus all \
  --security-opt label=disable \
  --network host \
  --shm-size=64g \
  --pids-limit=-1 \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  nvcr.io/nvidia/tritonserver:25.10-trtllm-python-py3 bash
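
Before converting the checkpoint, it may help to confirm that all GPUs are actually visible inside the rootless container, since GPU passthrough is a common failure point for rootless Podman (a minimal sanity check):

# inside the container: should list every GPU passed through by --gpus all
nvidia-smi -L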

python3 examples/qwen/convert_checkpoint.py \
    --model_dir /models/source \
    --output_dir /models/engines/ckpt \
    --dtype auto \
    --tp_size 2 \
    --pp_size 2 \
    --moe_tp_size 2 \
    --workers 4
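
With tp_size=2 and pp_size=2 (world size 4), the converter should write one checkpoint shard per rank; a quick way to confirm before building (an assumption about the output layout, which can vary by TensorRT-LLM version):

# expect config.json plus rank0..rank3 safetensors shards
ls /models/engines/ckpt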

trtllm-build --checkpoint_dir /models/engines/ckpt \
    --output_dir /models/engines/final \
    --gemm_plugin auto \
    --max_batch_size 256 \
    --max_input_len 2048 \
    --max_seq_len 8192 \
    --max_num_tokens 16384 \
    --workers 4
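
Likewise, it can help to confirm that trtllm-build wrote all four engines and to inspect the version field that the hanging log line refers to (a sketch; the exact config.json layout depends on the TensorRT-LLM version):

# expect config.json plus one engine file per rank
ls /models/engines/final
python3 -c "import json; print(json.load(open('/models/engines/final/config.json'))['version'])"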

python3 /app/scripts/launch_triton_server.py --world_size=4 --model_repo=/models

Expected behavior

After running launch_triton_server.py, the models (preprocessing, tensorrtllm, postprocessing) should load and the server should start listening on ports 8000, 8001, and 8002. Instead, it gets stuck at:

[TensorRT-LLM][INFO] Engine version 1.0.0 found in the config file, assuming engine(s) built by new builder API.
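
For reference, once the server is healthy it answers Triton's standard HTTP readiness endpoint (assuming the default HTTP port 8000); in the stuck state this request never returns success:

# returns HTTP 200 once all models are loaded
curl -sf localhost:8000/v2/health/ready && echo "server ready"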
