Improve Qwen3-Next Speed #17585

lovedheart · 2025-11-29T00:38:11Z

Improve Qwen3Next inference speed

Fix Qwen3Next inference speed.

@pwilkin: I don't have a discrete GPU. Could you check whether this PR improves the speed?

main

============================================================
                    Performance Report
============================================================
Operation Name           Total(ms)   Calls     Avg(ms)        Pct(%)
------------------------------------------------------------
attn_calc                26.204      3408      0.008          22.78
ffn_calc                 23.581      3408      0.007          20.50
attn_calc_linear         19.204      2556      0.008          16.69
fwd_exp_upd_recur_sts    15.883      2556      0.006          13.81   <- Bottleneck
build_delta_net          11.294      2556      0.004          9.82
build_dn_recur           8.553       2484      0.003          7.43
attn_calc_full           6.771       852       0.008          5.89
build_dn_chunking        2.575       72        0.036          2.24
decay_mask               0.314       2484      0.000          0.27
broadcast                0.160       2484      0.000          0.14
ssm_conv                 0.144       2556      0.000          0.13
value_mul_mat            0.107       2484      0.000          0.09
solve_tri_recur          0.097       2484      0.000          0.08
cumsum_dn                0.074       2484      0.000          0.06
fwd_expand               0.049       71        0.001          0.04
ggml_exp                 0.028       2484      0.000          0.02
------------------------------------------------------------
Total                    115.038                              100.00
============================================================

Performance Bottleneck Analysis:
- Most time-consuming operation: attn_calc (26.204 ms)   <- Bottleneck; this should be ffn_calc
- Longest average time: build_dn_chunking (0.036 ms)

PR

============================================================
                    Performance Report
============================================================
Operation Name           Total(ms)   Calls     Avg(ms)        Pct(%)
------------------------------------------------------------
ffn_calc                 56.509      4800      0.012          47.68
build_delta_net          15.740      3600      0.004          13.28
attn_calc                14.720      4800      0.003          12.42
build_dn_recur           13.227      3528      0.004          11.16
attn_calc_full           9.955       1200      0.008          8.40
attn_calc_linear         4.460       3600      0.001          3.76
build_dn_chunking        2.197       72        0.031          1.85
ssm_conv                 0.540       3600      0.000          0.46
decay_mask               0.312       3528      0.000          0.26
value_mul_mat            0.251       3528      0.000          0.21
broadcast                0.159       3528      0.000          0.13
fwd_exp_upd_recur_sts    0.150       3600      0.000          0.13
solve_tri_recur          0.149       3528      0.000          0.13
fwd_expand               0.082       100       0.001          0.07
ggml_exp                 0.041       3528      0.000          0.03
cumsum_dn                0.037       3528      0.000          0.03
------------------------------------------------------------
Total                    118.529                              100.00
============================================================

Performance Bottleneck Analysis:
- Most time-consuming operation: ffn_calc (56.509 ms)
- Longest average time: build_dn_chunking (0.031 ms)

Added performance metrics tracking to qwen3next model, including operation timing, reporting, and CSV export functionality.
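For readers who want a picture of what this kind of tracking involves, here is a minimal sketch of a per-operation timing accumulator that produces the same columns as the reports above (total, calls, average, percentage) and a CSV dump. It is an illustration only: `PerfTracker`, `ScopedTimer`, and the CSV column names are hypothetical and are not taken from the PR's code.

```cpp
#include <chrono>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical accumulator; names are illustrative, not taken from the PR.
struct OpStats {
    double total_ms = 0.0;
    long   calls    = 0;
};

class PerfTracker {
public:
    // Record one timed call of `op` that took `ms` milliseconds.
    void record(const std::string & op, double ms) {
        OpStats & s = stats_[op];
        s.total_ms += ms;
        s.calls    += 1;
    }

    // Sum of the recorded time over all operations.
    double total_ms() const {
        double sum = 0.0;
        for (const auto & kv : stats_) sum += kv.second.total_ms;
        return sum;
    }

    // Share of the overall recorded time spent in `op`, in percent.
    double pct(const std::string & op) const {
        auto it = stats_.find(op);
        const double sum = total_ms();
        if (it == stats_.end() || sum <= 0.0) return 0.0;
        return 100.0 * it->second.total_ms / sum;
    }

    // One CSV row per operation: op,total_ms,calls,avg_ms,pct
    std::string to_csv() const {
        std::ostringstream out;
        out << "op,total_ms,calls,avg_ms,pct\n";
        for (const auto & kv : stats_) {
            const OpStats & s = kv.second;
            const double avg = s.calls ? s.total_ms / s.calls : 0.0;
            out << kv.first << ',' << s.total_ms << ',' << s.calls << ','
                << avg << ',' << pct(kv.first) << '\n';
        }
        return out.str();
    }

private:
    std::map<std::string, OpStats> stats_;
};

// RAII helper: times the enclosing scope and records it on destruction.
class ScopedTimer {
public:
    ScopedTimer(PerfTracker & tracker, std::string op)
        : tracker_(tracker), op_(std::move(op)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        tracker_.record(op_, std::chrono::duration<double, std::milli>(end - start_).count());
    }

private:
    PerfTracker & tracker_;
    std::string   op_;
    std::chrono::steady_clock::time_point start_;
};
```

A scope is timed by declaring `ScopedTimer t(tracker, "ffn_calc");` at its top. Note that a timer like this measures only the code that runs inside the scope where it is placed, so the numbers depend entirely on where the timers sit.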

pwilkin · 2025-11-29T01:05:13Z

This performance-measuring function only measures the time spent building the graph, which is completely negligible compared to the time spent computing it. I don't think you're getting anything meaningful this way.
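The distinction can be illustrated with a toy deferred-execution graph. This is a sketch under stated assumptions, not the ggml API: `ToyGraph`, `add_node`, `compute`, and `measure_build_vs_compute` are hypothetical names. Building only records what should run later, so a timer wrapped around the build step sees none of the real work.

```cpp
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deferred-execution graph; this is not the ggml API.
struct ToyGraph {
    std::vector<std::function<void()>> nodes;

    // "Building" only records what should run later; no work happens here.
    void add_node(std::function<void()> fn) { nodes.push_back(std::move(fn)); }

    // "Computing" is where the recorded work actually executes.
    void compute() {
        for (auto & fn : nodes) fn();
    }
};

struct BuildVsCompute {
    double build_ms;            // time spent recording nodes
    double compute_ms;          // time spent executing them
    long   work_after_build;    // units of work done once building has finished
    long   work_after_compute;  // units of work done once compute() has returned
};

static double ms_between(std::chrono::steady_clock::time_point a,
                         std::chrono::steady_clock::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

// Times the build phase and the compute phase separately.
BuildVsCompute measure_build_vs_compute(int n_nodes, int work_per_node) {
    ToyGraph g;
    long counter = 0;  // counts units of "real" work actually performed

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n_nodes; ++i) {
        g.add_node([&counter, work_per_node] {
            for (int k = 0; k < work_per_node; ++k) counter += 1;
        });
    }
    const auto t1 = std::chrono::steady_clock::now();
    const long after_build = counter;  // still 0: nothing has executed yet

    g.compute();
    const auto t2 = std::chrono::steady_clock::now();

    return { ms_between(t0, t1), ms_between(t1, t2), after_build, counter };
}
```

In this toy, `work_after_build` is always zero however expensive the nodes are. Timers placed in the build phase therefore reflect only bookkeeping cost, and a change that shifts time between build-phase entries says little about inference speed.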

Improve Qwen3-Next Speed

1a77d97

Added performance metrics tracking to qwen3next model, including operation timing, reporting, and CSV export functionality.

github-actions bot added the model (Model specific) label on Nov 29, 2025

loci-dev mentioned this pull request Nov 29, 2025

UPSTREAM PR #17585: Improve Qwen3-Next Speed auroralabs-loci/llama.cpp#356

Open
