
'Qwen2Model' object has no attribute 'model' #3335

@Sunhill666

System Info

Environment:

  • Docker image: ghcr.io/huggingface/text-generation-inference:latest
  • GPU: Single Nvidia RTX 4090
  • Model: Qwen2.5-VL-3B-Instruct

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

When I load only the base model, everything runs successfully. However, when I add a LoRA adapter, model initialization fails with the following error:

2025-10-10T04:56:41.424770Z  INFO text_generation_launcher: Args {
    model_id: "/data/models/Qwen/Qwen2.5-VL-3B-Instruct/",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "45deee43948c",
    port: 80,
    prometheus_port: 9000,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: Some(
        "/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103",
    ),
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
    graceful_termination_timeout: 90,
}
2025-10-10T04:56:42.094905Z  INFO text_generation_launcher: Disabling prefix caching because of VLM model
2025-10-10T04:56:42.094914Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching 0
2025-10-10T04:56:42.124502Z  WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-4090
2025-10-10T04:56:42.139062Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 10000
2025-10-10T04:56:42.139075Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2025-10-10T04:56:42.139154Z  INFO download: text_generation_launcher: Starting check and download process for /data/models/Qwen/Qwen2.5-VL-3B-Instruct/
2025-10-10T04:56:44.131476Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-10-10T04:56:44.548817Z  INFO download: text_generation_launcher: Successfully downloaded weights for /data/models/Qwen/Qwen2.5-VL-3B-Instruct/
2025-10-10T04:56:44.548877Z  INFO download: text_generation_launcher: Starting check and download process for /data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103
2025-10-10T04:56:46.526146Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-10-10T04:56:46.956859Z  INFO download: text_generation_launcher: Successfully downloaded weights for /data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103
2025-10-10T04:56:46.957039Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2025-10-10T04:56:49.016998Z  INFO text_generation_launcher: Using prefix caching = False
2025-10-10T04:56:49.017018Z  INFO text_generation_launcher: Using Attention = flashinfer
2025-10-10T04:56:52.520365Z  WARN text_generation_launcher: LoRA adapters enabled (experimental feature).
2025-10-10T04:56:52.520383Z  WARN text_generation_launcher: LoRA adapters incompatible with CUDA Graphs. Disabling CUDA Graphs.
2025-10-10T04:56:55.241607Z  INFO text_generation_launcher: Using prefill chunking = False
2025-10-10T04:56:55.311710Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
    sys.exit(app())
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 740, in main
    return _main(
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 195, in _main
    rv = self.invoke(ctx)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
    server.serve(
  File "/usr/src/server/text_generation_server/server.py", line 313, in serve
    asyncio.run(
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/src/server/text_generation_server/server.py", line 266, in serve_inner
    model = get_model_with_lora_adapters(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1830, in get_model_with_lora_adapters
    target_to_layer = build_layer_weight_lookup(model.model)
  File "/usr/src/server/text_generation_server/utils/adapter.py", line 307, in build_layer_weight_lookup
    m = model.text_model.model
  File "/usr/src/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'Qwen2Model' object has no attribute 'model'
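Reading the traceback: `build_layer_weight_lookup` in `text_generation_server/utils/adapter.py` takes its `elif hasattr(model, "text_model")` branch and evaluates `model.text_model.model` (line 307). For Qwen2.5-VL, `text_model` is a `Qwen2Model` that holds `layers` and `norm` directly (see the module dump in the shard output further down), so there is no inner `.model` and the attribute access raises. Below is a minimal sketch of a more defensive lookup. It assumes the rest of `build_layer_weight_lookup` only needs the submodule that owns the decoder `layers`. `resolve_decoder` is a hypothetical helper, not part of TGI, and the `SimpleNamespace` objects are stand-ins for the real `torch.nn.Module` tree.

```python
from types import SimpleNamespace


def resolve_decoder(model):
    """Return the submodule that directly owns the decoder `layers`.

    Hypothetical, more defensive variant of the branch shown in the
    traceback: it only descends into `.model` when the text backbone
    actually has one, instead of assuming it does.
    """
    if hasattr(model, "language_model"):
        m = model.language_model
    elif hasattr(model, "text_model"):
        m = model.text_model
    else:
        m = model
    # The original code assumes the layers always sit under `.model`;
    # Qwen2.5-VL's `text_model` (a Qwen2Model) holds `layers` itself.
    if not hasattr(m, "layers") and hasattr(m, "model"):
        m = m.model
    if not hasattr(m, "layers"):
        raise AttributeError(f"{type(m).__name__} has no decoder `layers`")
    return m


# Stand-ins shaped like the module dump in this issue (not real TGI classes).
qwen25_vl = SimpleNamespace(text_model=SimpleNamespace(layers=["layer0", "layer1"]))
wrapped = SimpleNamespace(
    text_model=SimpleNamespace(model=SimpleNamespace(layers=["layer0"]))
)

print(len(resolve_decoder(qwen25_vl).layers))  # 2: flat Qwen2Model-style backbone
print(len(resolve_decoder(wrapped).layers))    # 1: resolved through `.model`
```

With this fallback, backbones that wrap their layers in `.model` resolve exactly as before, while a flat `Qwen2Model` is used as-is. The sketch does not address whether the adapter's target module names then match TGI's layer names for this model.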
2025-10-10T04:56:56.472363Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2025-10-10 04:56:47.736 | INFO     | text_generation_server.utils.import_utils:<module>:76 - Detected system cuda
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using a model of type qwen2_5_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/src/server/text_generation_server/cli.py:119 in serve                   │
│                                                                              │
│   116 │   │   raise RuntimeError(                                            │
│   117 │   │   │   "Only 1 can be set between `dtype` and `quantize`, as they │
│   118 │   │   )                                                              │
│ ❱ 119 │   server.serve(                                                      │
│   120 │   │   model_id,                                                      │
│   121 │   │   lora_adapters,                                                 │
│   122 │   │   revision,                                                      │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │             dtype = None                                                 │ │
│ │       json_output = True                                                 │ │
│ │    kv_cache_dtype = None                                                 │ │
│ │      logger_level = 'INFO'                                               │ │
│ │     lora_adapters = [                                                    │ │
│ │                     │   AdapterInfo(                                     │ │
│ │                     │   │                                                │ │
│ │                     id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │                     │   │   path=None,                                   │ │
│ │                     │   │   revision=None                                │ │
│ │                     │   )                                                │ │
│ │                     ]                                                    │ │
│ │  max_input_tokens = None                                                 │ │
│ │          model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/'          │ │
│ │     otlp_endpoint = None                                                 │ │
│ │ otlp_service_name = 'text-generation-inference.router'                   │ │
│ │          quantize = None                                                 │ │
│ │          revision = None                                                 │ │
│ │            server = <module 'text_generation_server.server' from         │ │
│ │                     '/usr/src/server/text_generation_server/server.py'>  │ │
│ │           sharded = False                                                │ │
│ │         speculate = None                                                 │ │
│ │ trust_remote_code = False                                                │ │
│ │          uds_path = PosixPath('/tmp/text-generation-server')             │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/src/server/text_generation_server/server.py:313 in serve                │
│                                                                              │
│   310 │   │   while signal_handler.KEEP_PROCESSING:                          │
│   311 │   │   │   await asyncio.sleep(0.5)                                   │
│   312 │                                                                      │
│ ❱ 313 │   asyncio.run(                                                       │
│   314 │   │   serve_inner(                                                   │
│   315 │   │   │   model_id,                                                  │
│   316 │   │   │   lora_adapters,                                             │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │             dtype = None                                                 │ │
│ │    kv_cache_dtype = None                                                 │ │
│ │     lora_adapters = [                                                    │ │
│ │                     │   AdapterInfo(                                     │ │
│ │                     │   │                                                │ │
│ │                     id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │                     │   │   path=None,                                   │ │
│ │                     │   │   revision=None                                │ │
│ │                     │   )                                                │ │
│ │                     ]                                                    │ │
│ │  max_input_tokens = None                                                 │ │
│ │          model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/'          │ │
│ │          quantize = None                                                 │ │
│ │          revision = None                                                 │ │
│ │           sharded = False                                                │ │
│ │         speculate = None                                                 │ │
│ │ trust_remote_code = False                                                │ │
│ │          uds_path = PosixPath('/tmp/text-generation-server')             │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/runners.py:190 in run                                               │
│                                                                              │
│   187 │   │   │   "asyncio.run() cannot be called from a running event loop" │
│   188 │                                                                      │
│   189 │   with Runner(debug=debug) as runner:                                │
│ ❱ 190 │   │   return runner.run(main)                                        │
│   191                                                                        │
│   192                                                                        │
│   193 def _cancel_all_tasks(loop):                                           │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │  debug = None                                                            │ │
│ │   main = <coroutine object serve.<locals>.serve_inner at 0x74dbb4a7cf70> │ │
│ │ runner = <asyncio.runners.Runner object at 0x74dbb5f23290>               │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/runners.py:118 in run                                               │
│                                                                              │
│   115 │   │                                                                  │
│   116 │   │   self._interrupt_count = 0                                      │
│   117 │   │   try:                                                           │
│ ❱ 118 │   │   │   return self._loop.run_until_complete(task)                 │
│   119 │   │   except exceptions.CancelledError:                              │
│   120 │   │   │   if self._interrupt_count > 0:                              │
│   121 │   │   │   │   uncancel = getattr(task, "uncancel", None)             │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │        context = <_contextvars.Context object at 0x74dbb5e07140>         │ │
│ │           coro = <coroutine object serve.<locals>.serve_inner at         │ │
│ │                  0x74dbb4a7cf70>                                         │ │
│ │           self = <asyncio.runners.Runner object at 0x74dbb5f23290>       │ │
│ │ sigint_handler = functools.partial(<bound method Runner._on_sigint of    │ │
│ │                  <asyncio.runners.Runner object at 0x74dbb5f23290>>,     │ │
│ │                  main_task=<Task finished name='Task-1'                  │ │
│ │                  coro=<serve.<locals>.serve_inner() done, defined at     │ │
│ │                  /usr/src/server/text_generation_server/server.py:242>   │ │
│ │                  exception=AttributeError("'Qwen2Model' object has no    │ │
│ │                  attribute 'model'")>)                                   │ │
│ │           task = <Task finished name='Task-1'                            │ │
│ │                  coro=<serve.<locals>.serve_inner() done, defined at     │ │
│ │                  /usr/src/server/text_generation_server/server.py:242>   │ │
│ │                  exception=AttributeError("'Qwen2Model' object has no    │ │
│ │                  attribute 'model'")>                                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/base_events.py:654 in run_until_complete                            │
│                                                                              │
│    651 │   │   if not future.done():                                         │
│    652 │   │   │   raise RuntimeError('Event loop stopped before Future comp │
│    653 │   │                                                                 │
│ ❱  654 │   │   return future.result()                                        │
│    655 │                                                                     │
│    656 │   def stop(self):                                                   │
│    657 │   │   """Stop running the event loop.                               │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │   future = <Task finished name='Task-1'                                  │ │
│ │            coro=<serve.<locals>.serve_inner() done, defined at           │ │
│ │            /usr/src/server/text_generation_server/server.py:242>         │ │
│ │            exception=AttributeError("'Qwen2Model' object has no          │ │
│ │            attribute 'model'")>                                          │ │
│ │ new_task = False                                                         │ │
│ │     self = <_UnixSelectorEventLoop running=False closed=True             │ │
│ │            debug=False>                                                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/src/server/text_generation_server/server.py:266 in serve_inner          │
│                                                                              │
│   263 │   │   │   server_urls = [local_url]                                  │
│   264 │   │                                                                  │
│   265 │   │   try:                                                           │
│ ❱ 266 │   │   │   model = get_model_with_lora_adapters(                      │
│   267 │   │   │   │   model_id,                                              │
│   268 │   │   │   │   lora_adapters,                                         │
│   269 │   │   │   │   revision,                                              │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │     adapter_to_index = {}                                                │ │
│ │                dtype = None                                              │ │
│ │       kv_cache_dtype = None                                              │ │
│ │            local_url = 'unix:///tmp/text-generation-server-0'            │ │
│ │        lora_adapters = [                                                 │ │
│ │                        │   AdapterInfo(                                  │ │
│ │                        │   │                                             │ │
│ │                        id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oce… │ │
│ │                        │   │   path=None,                                │ │
│ │                        │   │   revision=None                             │ │
│ │                        │   )                                             │ │
│ │                        ]                                                 │ │
│ │     max_input_tokens = None                                              │ │
│ │             model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/'       │ │
│ │             quantize = None                                              │ │
│ │             revision = None                                              │ │
│ │          server_urls = ['unix:///tmp/text-generation-server-0']          │ │
│ │              sharded = False                                             │ │
│ │            speculate = None                                              │ │
│ │    trust_remote_code = False                                             │ │
│ │             uds_path = PosixPath('/tmp/text-generation-server')          │ │
│ │ unix_socket_template = 'unix://{}-{}'                                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/src/server/text_generation_server/models/__init__.py:1830 in            │
│ get_model_with_lora_adapters                                                 │
│                                                                              │
│   1827 │   )                                                                 │
│   1828 │                                                                     │
│   1829 │   if len(lora_adapters) > 0:                                        │
│ ❱ 1830 │   │   target_to_layer = build_layer_weight_lookup(model.model)      │
│   1831 │   │                                                                 │
│   1832 │   │   for index, adapter in enumerate(lora_adapters):               │
│   1833 │   │   │   # The AdapterParameters object allows for merging multipl │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │  adapter_to_index = {}                                                   │ │
│ │             dtype = None                                                 │ │
│ │    kv_cache_dtype = None                                                 │ │
│ │  lora_adapter_ids = [                                                    │ │
│ │                     │                                                    │ │
│ │                     '/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanariu… │ │
│ │                     ]                                                    │ │
│ │     lora_adapters = [                                                    │ │
│ │                     │   AdapterInfo(                                     │ │
│ │                     │   │                                                │ │
│ │                     id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │                     │   │   path=None,                                   │ │
│ │                     │   │   revision=None                                │ │
│ │                     │   )                                                │ │
│ │                     ]                                                    │ │
│ │  max_input_tokens = None                                                 │ │
│ │             model = <text_generation_server.models.vlm_causal_lm.VlmCau… │ │
│ │                     object at 0x74dbb5f15b90>                            │ │
│ │          model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/'          │ │
│ │          quantize = None                                                 │ │
│ │          revision = None                                                 │ │
│ │           sharded = False                                                │ │
│ │         speculate = None                                                 │ │
│ │ trust_remote_code = False                                                │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/src/server/text_generation_server/utils/adapter.py:307 in               │
│ build_layer_weight_lookup                                                    │
│                                                                              │
│   304 │   if hasattr(model, "language_model"):                               │
│   305 │   │   m = model.language_model.model                                 │
│   306 │   elif hasattr(model, "text_model"):                                 │
│ ❱ 307 │   │   m = model.text_model.model                                     │
│   308 │   else:                                                              │
│   309 │   │   m = model.model                                                │
│   310                                                                        │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ model = Qwen2_5VLForConditionalGeneration(                               │ │
│ │           (embed_tokens): TensorParallelEmbedding()                      │ │
│ │           (visual): Qwen2_5VisionModel(                                  │ │
│ │         │   (patch_embedding): Conv3d(3, 1280, kernel_size=(2, 14, 14),  │ │
│ │         stride=(2, 14, 14), bias=False)                                  │ │
│ │         │   (blocks): ModuleList(                                        │ │
│ │         │     (0-31): 32 x Qwen2_5VLVisionBlock(                         │ │
│ │         │   │   (attn): Qwen2_5VLAttention(                              │ │
│ │         │   │     (qkv): TensorParallelColumnLinear(                     │ │
│ │         │   │   │   (linear): FastLinear()                               │ │
│ │         │   │     )                                                      │ │
│ │         │   │     (proj): TensorParallelRowLinear(                       │ │
│ │         │   │   │   (linear): FastLinear()                               │ │
│ │         │   │     )                                                      │ │
│ │         │   │   )                                                        │ │
│ │         │   │   (norm1): FastRMSNorm()                                   │ │
│ │         │   │   (norm2): FastRMSNorm()                                   │ │
│ │         │   │   (mlp): Qwen2_5VLVisionMLP(                               │ │
│ │         │   │     (activation_fn): SiLU()                                │ │
│ │         │   │     (up): TensorParallelColumnLinear(                      │ │
│ │         │   │   │   (linear): FastLinear()                               │ │
│ │         │   │     )                                                      │ │
│ │         │   │     (gate): TensorParallelColumnLinear(                    │ │
│ │         │   │   │   (linear): FastLinear()                               │ │
│ │         │   │     )                                                      │ │
│ │         │   │     (down): TensorParallelRowLinear(                       │ │
│ │         │   │   │   (linear): FastLinear()                               │ │
│ │         │   │     )                                                      │ │
│ │         │   │   )                                                        │ │
│ │         │     )                                                          │ │
│ │         │   )                                                            │ │
│ │         │   (merger): Qwen2_5VLPatchMerger(                              │ │
│ │         │     (patch_merger_ln_q): FastRMSNorm()                         │ │
│ │         │     (fc1): TensorParallelColumnLinear(                         │ │
│ │         │   │   (linear): FastLinear()                                   │ │
│ │         │     )                                                          │ │
│ │         │     (fc2): TensorParallelRowLinear(                            │ │
│ │         │   │   (linear): FastLinear()                                   │ │
│ │         │     )                                                          │ │
│ │         │   )                                                            │ │
│ │           )                                                              │ │
│ │           (text_model): Qwen2Model(                                      │ │
│ │         │   (layers): ModuleList(                                        │ │
│ │         │     (0-35): 36 x Qwen2Layer(                                   │ │
│ │         │   │   (self_attn): Qwen2Attention(                             │ │
│ │         │   │     (rotary_emb):                                          │ │
│ │         RotaryPositionEmbeddingMultimodalSections()                      │ │
│ │         │   │     (query_key_value): TensorParallelMultiAdapterLinear(   │ │
│ │         │   │   │   (base_layer): TensorParallelColumnLinear(            │ │
│ │         │   │   │     (linear): FastLinear()                             │ │
│ │         │   │   │   )                                                    │ │
│ │         │   │     )                                                      │ │
│ │         │   │     (o_proj): TensorParallelAdapterRowLinear(              │ │
│ │         │   │   │   (base_layer): TensorParallelRowLinear(               │ │
│ │         │   │   │     (linear): FastLinear()                             │ │
│ │         │   │   │   )                                                    │ │
│ │         │   │     )                                                      │ │
│ │         │   │   )                                                        │ │
│ │         │   │   (mlp): Qwen2MLP(                                         │ │
│ │         │   │     (act): SiLU()                                          │ │
│ │         │   │     (gate_up_proj): TensorParallelMultiAdapterLinear(      │ │
│ │         │   │   │   (base_layer): TensorParallelColumnLinear(            │ │
│ │         │   │   │     (linear): FastLinear()                             │ │
│ │         │   │   │   )                                                    │ │
│ │         │   │     )                                                      │ │
│ │         │   │     (down_proj): TensorParallelAdapterRowLinear(           │ │
│ │         │   │   │   (base_layer): TensorParallelRowLinear(               │ │
│ │         │   │   │     (linear): FastLinear()                             │ │
│ │         │   │   │   )                                                    │ │
│ │         │   │     )                                                      │ │
│ │         │   │   )                                                        │ │
│ │         │   │   (input_layernorm): FastRMSNorm()                         │ │
│ │         │   │   (post_attention_layernorm): FastRMSNorm()                │ │
│ │         │     )                                                          │ │
│ │         │   )                                                            │ │
│ │         │   (norm): FastRMSNorm()                                        │ │
│ │           )                                                              │ │
│ │           (lm_head): SpeculativeHead(                                    │ │
│ │         │   (head): TensorParallelHead(                                  │ │
│ │         │     (linear): FastLinear()                                     │ │
│ │         │   )                                                            │ │
│ │           )                                                              │ │
│ │         )                                                                │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /usr/src/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1940  │
│ in __getattr__                                                               │
│                                                                              │
│   1937 │   │   │   modules = self.__dict__["_modules"]                       │
│   1938 │   │   │   if name in modules:                                       │
│   1939 │   │   │   │   return modules[name]                                  │
│ ❱ 1940 │   │   raise AttributeError(                                         │
│   1941 │   │   │   f"'{type(self).__name__}' object has no attribute '{name} │
│   1942 │   │   )                                                             │
│   1943                                                                       │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │    _buffers = {}                                                         │ │
│ │ _parameters = {}                                                         │ │
│ │     modules = {                                                          │ │
│ │               │   'layers': ModuleList(                                  │ │
│ │                 (0-35): 36 x Qwen2Layer(                                 │ │
│ │               │   (self_attn): Qwen2Attention(                           │ │
│ │               │     (rotary_emb):                                        │ │
│ │               RotaryPositionEmbeddingMultimodalSections()                │ │
│ │               │     (query_key_value): TensorParallelMultiAdapterLinear( │ │
│ │               │   │   (base_layer): TensorParallelColumnLinear(          │ │
│ │               │   │     (linear): FastLinear()                           │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │     (o_proj): TensorParallelAdapterRowLinear(            │ │
│ │               │   │   (base_layer): TensorParallelRowLinear(             │ │
│ │               │   │     (linear): FastLinear()                           │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │   )                                                      │ │
│ │               │   (mlp): Qwen2MLP(                                       │ │
│ │               │     (act): SiLU()                                        │ │
│ │               │     (gate_up_proj): TensorParallelMultiAdapterLinear(    │ │
│ │               │   │   (base_layer): TensorParallelColumnLinear(          │ │
│ │               │   │     (linear): FastLinear()                           │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │     (down_proj): TensorParallelAdapterRowLinear(         │ │
│ │               │   │   (base_layer): TensorParallelRowLinear(             │ │
│ │               │   │     (linear): FastLinear()                           │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │   )                                                      │ │
│ │               │   (input_layernorm): FastRMSNorm()                       │ │
│ │               │   (post_attention_layernorm): FastRMSNorm()              │ │
│ │                 )                                                        │ │
│ │               ),                                                         │ │
│ │               │   'norm': FastRMSNorm()                                  │ │
│ │               }                                                          │ │
│ │        name = 'model'                                                    │ │
│ │        self = Qwen2Model(                                                │ │
│ │                 (layers): ModuleList(                                    │ │
│ │               │   (0-35): 36 x Qwen2Layer(                               │ │
│ │               │     (self_attn): Qwen2Attention(                         │ │
│ │               │   │   (rotary_emb):                                      │ │
│ │               RotaryPositionEmbeddingMultimodalSections()                │ │
│ │               │   │   (query_key_value):                                 │ │
│ │               TensorParallelMultiAdapterLinear(                          │ │
│ │               │   │     (base_layer): TensorParallelColumnLinear(        │ │
│ │               │   │   │   (linear): FastLinear()                         │ │
│ │               │   │     )                                                │ │
│ │               │   │   )                                                  │ │
│ │               │   │   (o_proj): TensorParallelAdapterRowLinear(          │ │
│ │               │   │     (base_layer): TensorParallelRowLinear(           │ │
│ │               │   │   │   (linear): FastLinear()                         │ │
│ │               │   │     )                                                │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │     (mlp): Qwen2MLP(                                     │ │
│ │               │   │   (act): SiLU()                                      │ │
│ │               │   │   (gate_up_proj): TensorParallelMultiAdapterLinear(  │ │
│ │               │   │     (base_layer): TensorParallelColumnLinear(        │ │
│ │               │   │   │   (linear): FastLinear()                         │ │
│ │               │   │     )                                                │ │
│ │               │   │   )                                                  │ │
│ │               │   │   (down_proj): TensorParallelAdapterRowLinear(       │ │
│ │               │   │     (base_layer): TensorParallelRowLinear(           │ │
│ │               │   │   │   (linear): FastLinear()                         │ │
│ │               │   │     )                                                │ │
│ │               │   │   )                                                  │ │
│ │               │     )                                                    │ │
│ │               │     (input_layernorm): FastRMSNorm()                     │ │
│ │               │     (post_attention_layernorm): FastRMSNorm()            │ │
│ │               │   )                                                      │ │
│ │                 )                                                        │ │
│ │                 (norm): FastRMSNorm()                                    │ │
│ │               )                                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Qwen2Model' object has no attribute 'model' rank=0
Error: ShardCannotStart
2025-10-10T04:56:56.565882Z ERROR text_generation_launcher: Shard 0 failed to start
2025-10-10T04:56:56.565887Z  INFO text_generation_launcher: Shutting down shards
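The last frame of the traceback shows how the error is raised. `Module.__getattr__` looks the requested name up in `self.__dict__["_modules"]`. The locals show that this `Qwen2Model` registers only `layers` and `norm`, so any code that reaches for `text_model.model` fails. Below is a simplified, torch-free mimic of that lookup; it is not PyTorch's actual class. It also includes a hypothetical `get_decoder_layers` helper that shows one tolerant access pattern. That helper is only an assumption about where the mismatch lies. It is not TGI's code.

```python
# Simplified mimic of the lookup shown in the traceback
# (torch/nn/modules/module.py, Module.__getattr__). NOT the real nn.Module.
class FakeModule:
    def __init__(self, **children):
        # Register children the way nn.Module does: in a "_modules" dict.
        self.__dict__["_modules"] = dict(children)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        modules = self.__dict__["_modules"]
        if name in modules:
            return modules[name]
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )


class Qwen2Model(FakeModule):
    """Stand-in for the text model in the log: it has only `layers` and `norm`."""


def get_decoder_layers(text_model):
    """Hypothetical helper: return the decoder layers whether or not the
    text model is wrapped in an extra `.model` level."""
    # getattr with a default swallows the AttributeError raised above.
    inner = getattr(text_model, "model", text_model)
    return inner.layers


text_model = Qwen2Model(layers=["layer0", "layer1"], norm="norm")

try:
    text_model.model.layers  # the access pattern that fails in the log
except AttributeError as err:
    print(err)  # 'Qwen2Model' object has no attribute 'model'

print(get_decoder_layers(text_model))  # ['layer0', 'layer1']
```

The strict `text_model.model` access reproduces the same message as the log. The `getattr(..., default)` fallback resolves the layers for both the wrapped and the unwrapped layouts.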

Here is the command I used:

sudo docker run --gpus all --shm-size 16g -p 8080:80 \
  -v $PWD/output:/data/output \
  -v /mnt/nvme_ssd/models:/data/models \
  ghcr.io/huggingface/text-generation-inference \
  --model-id "/data/models/Qwen/Qwen2.5-VL-3B-Instruct/" \
  --lora-adapters "/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103"

Expected behavior

The server should start with the LoRA adapter loaded, just as it does when only the base model is served.
