System Info
Environment:
- Docker image: ghcr.io/huggingface/text-generation-inference:latest
- GPU: Single Nvidia RTX 4090
- Model: Qwen2.5-VL-3B-Instruct
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
When I load only the base model, everything runs successfully. However, when I try to load a LoRA adapter on top of it, the shard fails to start with the following error:
2025-10-10T04:56:41.424770Z INFO text_generation_launcher: Args {
model_id: "/data/models/Qwen/Qwen2.5-VL-3B-Instruct/",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: None,
quantize: None,
speculate: None,
dtype: None,
kv_cache_dtype: None,
trust_remote_code: false,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: None,
max_total_tokens: None,
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "45deee43948c",
port: 80,
prometheus_port: 9000,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: None,
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
api_key: None,
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: Some(
"/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103",
),
usage_stats: On,
payload_limit: 2000000,
enable_prefill_logprobs: false,
graceful_termination_timeout: 90,
}
2025-10-10T04:56:42.094905Z INFO text_generation_launcher: Disabling prefix caching because of VLM model
2025-10-10T04:56:42.094914Z INFO text_generation_launcher: Using attention flashinfer - Prefix caching 0
2025-10-10T04:56:42.124502Z WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-4090
2025-10-10T04:56:42.139062Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 10000
2025-10-10T04:56:42.139075Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2025-10-10T04:56:42.139154Z INFO download: text_generation_launcher: Starting check and download process for /data/models/Qwen/Qwen2.5-VL-3B-Instruct/
2025-10-10T04:56:44.131476Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-10-10T04:56:44.548817Z INFO download: text_generation_launcher: Successfully downloaded weights for /data/models/Qwen/Qwen2.5-VL-3B-Instruct/
2025-10-10T04:56:44.548877Z INFO download: text_generation_launcher: Starting check and download process for /data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103
2025-10-10T04:56:46.526146Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-10-10T04:56:46.956859Z INFO download: text_generation_launcher: Successfully downloaded weights for /data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103
2025-10-10T04:56:46.957039Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2025-10-10T04:56:49.016998Z INFO text_generation_launcher: Using prefix caching = False
2025-10-10T04:56:49.017018Z INFO text_generation_launcher: Using Attention = flashinfer
2025-10-10T04:56:52.520365Z WARN text_generation_launcher: LoRA adapters enabled (experimental feature).
2025-10-10T04:56:52.520383Z WARN text_generation_launcher: LoRA adapters incompatible with CUDA Graphs. Disabling CUDA Graphs.
2025-10-10T04:56:55.241607Z INFO text_generation_launcher: Using prefill chunking = False
2025-10-10T04:56:55.311710Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
sys.exit(app())
File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 740, in main
return _main(
File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 195, in _main
rv = self.invoke(ctx)
File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
return callback(**use_params)
File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
server.serve(
File "/usr/src/server/text_generation_server/server.py", line 313, in serve
asyncio.run(
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
> File "/usr/src/server/text_generation_server/server.py", line 266, in serve_inner
model = get_model_with_lora_adapters(
File "/usr/src/server/text_generation_server/models/__init__.py", line 1830, in get_model_with_lora_adapters
target_to_layer = build_layer_weight_lookup(model.model)
File "/usr/src/server/text_generation_server/utils/adapter.py", line 307, in build_layer_weight_lookup
m = model.text_model.model
File "/usr/src/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'Qwen2Model' object has no attribute 'model'
2025-10-10T04:56:56.472363Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
2025-10-10 04:56:47.736 | INFO | text_generation_server.utils.import_utils:<module>:76 - Detected system cuda
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@custom_bwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@custom_bwd
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using a model of type qwen2_5_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/src/server/text_generation_server/cli.py:119 in serve │
│ │
│ 116 │ │ raise RuntimeError( │
│ 117 │ │ │ "Only 1 can be set between `dtype` and `quantize`, as they │
│ 118 │ │ ) │
│ ❱ 119 │ server.serve( │
│ 120 │ │ model_id, │
│ 121 │ │ lora_adapters, │
│ 122 │ │ revision, │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ dtype = None │ │
│ │ json_output = True │ │
│ │ kv_cache_dtype = None │ │
│ │ logger_level = 'INFO' │ │
│ │ lora_adapters = [ │ │
│ │ │ AdapterInfo( │ │
│ │ │ │ │ │
│ │ id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │ │ │ path=None, │ │
│ │ │ │ revision=None │ │
│ │ │ ) │ │
│ │ ] │ │
│ │ max_input_tokens = None │ │
│ │ model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/' │ │
│ │ otlp_endpoint = None │ │
│ │ otlp_service_name = 'text-generation-inference.router' │ │
│ │ quantize = None │ │
│ │ revision = None │ │
│ │ server = <module 'text_generation_server.server' from │ │
│ │ '/usr/src/server/text_generation_server/server.py'> │ │
│ │ sharded = False │ │
│ │ speculate = None │ │
│ │ trust_remote_code = False │ │
│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /usr/src/server/text_generation_server/server.py:313 in serve │
│ │
│ 310 │ │ while signal_handler.KEEP_PROCESSING: │
│ 311 │ │ │ await asyncio.sleep(0.5) │
│ 312 │ │
│ ❱ 313 │ asyncio.run( │
│ 314 │ │ serve_inner( │
│ 315 │ │ │ model_id, │
│ 316 │ │ │ lora_adapters, │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ dtype = None │ │
│ │ kv_cache_dtype = None │ │
│ │ lora_adapters = [ │ │
│ │ │ AdapterInfo( │ │
│ │ │ │ │ │
│ │ id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │ │ │ path=None, │ │
│ │ │ │ revision=None │ │
│ │ │ ) │ │
│ │ ] │ │
│ │ max_input_tokens = None │ │
│ │ model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/' │ │
│ │ quantize = None │ │
│ │ revision = None │ │
│ │ sharded = False │ │
│ │ speculate = None │ │
│ │ trust_remote_code = False │ │
│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/runners.py:190 in run │
│ │
│ 187 │ │ │ "asyncio.run() cannot be called from a running event loop" │
│ 188 │ │
│ 189 │ with Runner(debug=debug) as runner: │
│ ❱ 190 │ │ return runner.run(main) │
│ 191 │
│ 192 │
│ 193 def _cancel_all_tasks(loop): │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ debug = None │ │
│ │ main = <coroutine object serve.<locals>.serve_inner at 0x74dbb4a7cf70> │ │
│ │ runner = <asyncio.runners.Runner object at 0x74dbb5f23290> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/runners.py:118 in run │
│ │
│ 115 │ │ │
│ 116 │ │ self._interrupt_count = 0 │
│ 117 │ │ try: │
│ ❱ 118 │ │ │ return self._loop.run_until_complete(task) │
│ 119 │ │ except exceptions.CancelledError: │
│ 120 │ │ │ if self._interrupt_count > 0: │
│ 121 │ │ │ │ uncancel = getattr(task, "uncancel", None) │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ context = <_contextvars.Context object at 0x74dbb5e07140> │ │
│ │ coro = <coroutine object serve.<locals>.serve_inner at │ │
│ │ 0x74dbb4a7cf70> │ │
│ │ self = <asyncio.runners.Runner object at 0x74dbb5f23290> │ │
│ │ sigint_handler = functools.partial(<bound method Runner._on_sigint of │ │
│ │ <asyncio.runners.Runner object at 0x74dbb5f23290>>, │ │
│ │ main_task=<Task finished name='Task-1' │ │
│ │ coro=<serve.<locals>.serve_inner() done, defined at │ │
│ │ /usr/src/server/text_generation_server/server.py:242> │ │
│ │ exception=AttributeError("'Qwen2Model' object has no │ │
│ │ attribute 'model'")>) │ │
│ │ task = <Task finished name='Task-1' │ │
│ │ coro=<serve.<locals>.serve_inner() done, defined at │ │
│ │ /usr/src/server/text_generation_server/server.py:242> │ │
│ │ exception=AttributeError("'Qwen2Model' object has no │ │
│ │ attribute 'model'")> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11 │
│ /asyncio/base_events.py:654 in run_until_complete │
│ │
│ 651 │ │ if not future.done(): │
│ 652 │ │ │ raise RuntimeError('Event loop stopped before Future comp │
│ 653 │ │ │
│ ❱ 654 │ │ return future.result() │
│ 655 │ │
│ 656 │ def stop(self): │
│ 657 │ │ """Stop running the event loop. │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ future = <Task finished name='Task-1' │ │
│ │ coro=<serve.<locals>.serve_inner() done, defined at │ │
│ │ /usr/src/server/text_generation_server/server.py:242> │ │
│ │ exception=AttributeError("'Qwen2Model' object has no │ │
│ │ attribute 'model'")> │ │
│ │ new_task = False │ │
│ │ self = <_UnixSelectorEventLoop running=False closed=True │ │
│ │ debug=False> │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /usr/src/server/text_generation_server/server.py:266 in serve_inner │
│ │
│ 263 │ │ │ server_urls = [local_url] │
│ 264 │ │ │
│ 265 │ │ try: │
│ ❱ 266 │ │ │ model = get_model_with_lora_adapters( │
│ 267 │ │ │ │ model_id, │
│ 268 │ │ │ │ lora_adapters, │
│ 269 │ │ │ │ revision, │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ adapter_to_index = {} │ │
│ │ dtype = None │ │
│ │ kv_cache_dtype = None │ │
│ │ local_url = 'unix:///tmp/text-generation-server-0' │ │
│ │ lora_adapters = [ │ │
│ │ │ AdapterInfo( │ │
│ │ │ │ │ │
│ │ id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oce… │ │
│ │ │ │ path=None, │ │
│ │ │ │ revision=None │ │
│ │ │ ) │ │
│ │ ] │ │
│ │ max_input_tokens = None │ │
│ │ model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/' │ │
│ │ quantize = None │ │
│ │ revision = None │ │
│ │ server_urls = ['unix:///tmp/text-generation-server-0'] │ │
│ │ sharded = False │ │
│ │ speculate = None │ │
│ │ trust_remote_code = False │ │
│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │
│ │ unix_socket_template = 'unix://{}-{}' │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /usr/src/server/text_generation_server/models/__init__.py:1830 in │
│ get_model_with_lora_adapters │
│ │
│ 1827 │ ) │
│ 1828 │ │
│ 1829 │ if len(lora_adapters) > 0: │
│ ❱ 1830 │ │ target_to_layer = build_layer_weight_lookup(model.model) │
│ 1831 │ │ │
│ 1832 │ │ for index, adapter in enumerate(lora_adapters): │
│ 1833 │ │ │ # The AdapterParameters object allows for merging multipl │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ adapter_to_index = {} │ │
│ │ dtype = None │ │
│ │ kv_cache_dtype = None │ │
│ │ lora_adapter_ids = [ │ │
│ │ │ │ │
│ │ '/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanariu… │ │
│ │ ] │ │
│ │ lora_adapters = [ │ │
│ │ │ AdapterInfo( │ │
│ │ │ │ │ │
│ │ id='/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceana… │ │
│ │ │ │ path=None, │ │
│ │ │ │ revision=None │ │
│ │ │ ) │ │
│ │ ] │ │
│ │ max_input_tokens = None │ │
│ │ model = <text_generation_server.models.vlm_causal_lm.VlmCau… │ │
│ │ object at 0x74dbb5f15b90> │ │
│ │ model_id = '/data/models/Qwen/Qwen2.5-VL-3B-Instruct/' │ │
│ │ quantize = None │ │
│ │ revision = None │ │
│ │ sharded = False │ │
│ │ speculate = None │ │
│ │ trust_remote_code = False │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /usr/src/server/text_generation_server/utils/adapter.py:307 in │
│ build_layer_weight_lookup │
│ │
│ 304 │ if hasattr(model, "language_model"): │
│ 305 │ │ m = model.language_model.model │
│ 306 │ elif hasattr(model, "text_model"): │
│ ❱ 307 │ │ m = model.text_model.model │
│ 308 │ else: │
│ 309 │ │ m = model.model │
│ 310 │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ model = Qwen2_5VLForConditionalGeneration( │ │
│ │ (embed_tokens): TensorParallelEmbedding() │ │
│ │ (visual): Qwen2_5VisionModel( │ │
│ │ │ (patch_embedding): Conv3d(3, 1280, kernel_size=(2, 14, 14), │ │
│ │ stride=(2, 14, 14), bias=False) │ │
│ │ │ (blocks): ModuleList( │ │
│ │ │ (0-31): 32 x Qwen2_5VLVisionBlock( │ │
│ │ │ │ (attn): Qwen2_5VLAttention( │ │
│ │ │ │ (qkv): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ (proj): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (norm1): FastRMSNorm() │ │
│ │ │ │ (norm2): FastRMSNorm() │ │
│ │ │ │ (mlp): Qwen2_5VLVisionMLP( │ │
│ │ │ │ (activation_fn): SiLU() │ │
│ │ │ │ (up): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ (gate): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ (down): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (merger): Qwen2_5VLPatchMerger( │ │
│ │ │ (patch_merger_ln_q): FastRMSNorm() │ │
│ │ │ (fc1): TensorParallelColumnLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ ) │ │
│ │ │ (fc2): TensorParallelRowLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ ) │ │
│ │ │ ) │ │
│ │ ) │ │
│ │ (text_model): Qwen2Model( │ │
│ │ │ (layers): ModuleList( │ │
│ │ │ (0-35): 36 x Qwen2Layer( │ │
│ │ │ │ (self_attn): Qwen2Attention( │ │
│ │ │ │ (rotary_emb): │ │
│ │ RotaryPositionEmbeddingMultimodalSections() │ │
│ │ │ │ (query_key_value): TensorParallelMultiAdapterLinear( │ │
│ │ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (o_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (mlp): Qwen2MLP( │ │
│ │ │ │ (act): SiLU() │ │
│ │ │ │ (gate_up_proj): TensorParallelMultiAdapterLinear( │ │
│ │ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (down_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (input_layernorm): FastRMSNorm() │ │
│ │ │ │ (post_attention_layernorm): FastRMSNorm() │ │
│ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (norm): FastRMSNorm() │ │
│ │ ) │ │
│ │ (lm_head): SpeculativeHead( │ │
│ │ │ (head): TensorParallelHead( │ │
│ │ │ (linear): FastLinear() │ │
│ │ │ ) │ │
│ │ ) │ │
│ │ ) │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /usr/src/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1940 │
│ in __getattr__ │
│ │
│ 1937 │ │ │ modules = self.__dict__["_modules"] │
│ 1938 │ │ │ if name in modules: │
│ 1939 │ │ │ │ return modules[name] │
│ ❱ 1940 │ │ raise AttributeError( │
│ 1941 │ │ │ f"'{type(self).__name__}' object has no attribute '{name} │
│ 1942 │ │ ) │
│ 1943 │
│ │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ _buffers = {} │ │
│ │ _parameters = {} │ │
│ │ modules = { │ │
│ │ │ 'layers': ModuleList( │ │
│ │ (0-35): 36 x Qwen2Layer( │ │
│ │ │ (self_attn): Qwen2Attention( │ │
│ │ │ (rotary_emb): │ │
│ │ RotaryPositionEmbeddingMultimodalSections() │ │
│ │ │ (query_key_value): TensorParallelMultiAdapterLinear( │ │
│ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (o_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (mlp): Qwen2MLP( │ │
│ │ │ (act): SiLU() │ │
│ │ │ (gate_up_proj): TensorParallelMultiAdapterLinear( │ │
│ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (down_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (input_layernorm): FastRMSNorm() │ │
│ │ │ (post_attention_layernorm): FastRMSNorm() │ │
│ │ ) │ │
│ │ ), │ │
│ │ │ 'norm': FastRMSNorm() │ │
│ │ } │ │
│ │ name = 'model' │ │
│ │ self = Qwen2Model( │ │
│ │ (layers): ModuleList( │ │
│ │ │ (0-35): 36 x Qwen2Layer( │ │
│ │ │ (self_attn): Qwen2Attention( │ │
│ │ │ │ (rotary_emb): │ │
│ │ RotaryPositionEmbeddingMultimodalSections() │ │
│ │ │ │ (query_key_value): │ │
│ │ TensorParallelMultiAdapterLinear( │ │
│ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (o_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (mlp): Qwen2MLP( │ │
│ │ │ │ (act): SiLU() │ │
│ │ │ │ (gate_up_proj): TensorParallelMultiAdapterLinear( │ │
│ │ │ │ (base_layer): TensorParallelColumnLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ │ (down_proj): TensorParallelAdapterRowLinear( │ │
│ │ │ │ (base_layer): TensorParallelRowLinear( │ │
│ │ │ │ │ (linear): FastLinear() │ │
│ │ │ │ ) │ │
│ │ │ │ ) │ │
│ │ │ ) │ │
│ │ │ (input_layernorm): FastRMSNorm() │ │
│ │ │ (post_attention_layernorm): FastRMSNorm() │ │
│ │ │ ) │ │
│ │ ) │ │
│ │ (norm): FastRMSNorm() │ │
│ │ ) │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Qwen2Model' object has no attribute 'model' rank=0
Error: ShardCannotStart
2025-10-10T04:56:56.565882Z ERROR text_generation_launcher: Shard 0 failed to start
2025-10-10T04:56:56.565887Z INFO text_generation_launcher: Shutting down shards
Here is the command I used:
sudo docker run --gpus all --shm-size 16g -p 8080:80 \
-v $PWD/output:/data/output \
-v /mnt/nvme_ssd/models:/data/models \
ghcr.io/huggingface/text-generation-inference \
--model-id "/data/models/Qwen/Qwen2.5-VL-3B-Instruct/" \
--lora-adapters "/data/output/Qwen2.5-VL-3B-Instruct-LoRA-Oceanarium/checkpoint-103"Expected behavior
The server should start and serve requests with the LoRA adapter loaded, just as it does with the base model alone.
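For context, the traceback points at build_layer_weight_lookup in server/text_generation_server/utils/adapter.py: it takes the `model.text_model.model` branch, but according to the locals dump, Qwen2_5VLForConditionalGeneration's `text_model` attribute is already the inner Qwen2Model (it exposes `layers` and `norm`, not a nested `model`). A minimal sketch of a guard that would avoid the AttributeError, assuming that mismatch is the only issue (untested, the helper name is hypothetical, just to illustrate the idea):

# Hypothetical helper mirroring the branch logic in build_layer_weight_lookup.
# Assumption: for Qwen2.5-VL, `model.text_model` already is the Qwen2Model
# (no nested `.model`), so fall back to it directly instead of failing.
def unwrap_text_model(model):
    if hasattr(model, "language_model"):
        return model.language_model.model
    if hasattr(model, "text_model"):
        inner = model.text_model
        return inner.model if hasattr(inner, "model") else inner
    return model.model

If that reading is correct, either adjusting this branch or wrapping text_model so it exposes a `.model` attribute should let the LoRA weight lookup proceed for Qwen2.5-VL.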