Your current environment
main
🐛 Describe the bug
Previously, for dense models, FX graph splitting (the model is split at the attention operator) produced ~50 graph pieces, of which only 3 were unique, so vLLM only needed to compile 3 of the 50 graphs.
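(For context, that dedup amounts to something like the sketch below, with a hypothetical cache keyed on the generated FX code; vLLM's actual cache key may differ.)

```python
from typing import Callable

import torch
from torch import fx

# Hypothetical compile cache: pieces whose generated code is byte-identical
# share one compiled artifact, so ~50 pieces falling into 3 structural
# classes only pay for 3 compilations.
_compiled: dict[str, Callable] = {}

def compile_piece(gm: fx.GraphModule) -> Callable:
    key = gm.code  # stand-in for a real structural hash of the graph
    if key not in _compiled:
        _compiled[key] = torch.compile(gm)  # expensive; once per unique graph
    return _compiled[key]
```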
Looking at a tlparse for llama4 maverick, which is an MoE model:
- every other layer has an MoE block (instead of an nn.Linear feedforward)
- this means there should be at most 6 unique graphs
However, every graph piece that contains an MoE operator (e.g. torch.ops.vllm.moe_forward) is unique, so there are at least 25 unique graphs that need to be compiled.
The only difference between these graphs is the name of the MoE layer.
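A minimal repro of the failure mode, using a plain Python function as a stand-in for torch.ops.vllm.moe_forward (the stand-in op and layer names here are hypothetical):

```python
import torch
from torch import fx

def fake_moe(x, layer_name):
    # Hypothetical stand-in for torch.ops.vllm.moe_forward, which takes
    # the layer name as an argument.
    return torch.relu(x)

def make_graph(layer_name):
    g = fx.Graph()
    x = g.placeholder("x")
    out = g.call_function(fake_moe, args=(x, layer_name))
    g.output(out)
    return fx.GraphModule(torch.nn.Module(), g)

g0 = make_graph("model.layers.0.mlp")
g1 = make_graph("model.layers.1.mlp")
# The two graphs compute the same thing, but the embedded layer name makes
# their generated code differ, so they will not deduplicate.
print(g0.code == g1.code)  # False
```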
We should hide the MoE layer name so it does not appear in the graph (maybe via a context), which would let these graphs deduplicate and bring the number of unique graphs back to <= 6 for MoE models.
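One possible shape of the fix, as a sketch (the context variable and op below are hypothetical, not vLLM's implementation): resolve the layer through a context set by the caller, so the traced graph stays name-free:

```python
import contextvars
import torch
from torch import fx

# Hypothetical context variable carrying "which MoE layer is running now".
_current_moe_layer = contextvars.ContextVar("current_moe_layer")

def moe_forward_ctx(x):
    # Resolve the layer at call time instead of baking its name into the graph.
    layer_name = _current_moe_layer.get()
    # ... look up the expert weights registered under `layer_name` ...
    return torch.relu(x)  # placeholder compute

def make_graph():
    g = fx.Graph()
    x = g.placeholder("x")
    out = g.call_function(moe_forward_ctx, args=(x,))
    g.output(out)
    return fx.GraphModule(torch.nn.Module(), g)

# Every MoE layer now traces to byte-identical code, so all of them can
# share one compiled graph.
print(make_graph().code == make_graph().code)  # True

token = _current_moe_layer.set("model.layers.0.mlp")
try:
    make_graph()(torch.randn(4, 8))
finally:
    _current_moe_layer.reset(token)
```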
Also, this potentially has implications for switching to inductor graph partition: depending on which model we were actually benchmarking (I hope it was a dense model?), the compile-time speedup/slowdown numbers might change after this fix.