convert: support Mistral 3 Large MoE #17730
base: master
Conversation
So far so good with this, in a couple of hours I will be able to test generation.
Seems to work and produce coherent results!
This PR still needs to be cleaned up before it is ready for review 😅
```python
# remap hparams from Mistral MoE format to DeepseekV2 format
# we do it this way to be able to reuse the DeepseekV2Model set_gguf_parameters logic
```
Somewhat ugly but an acceptable trade-off.
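To make the trade-off concrete, here is a minimal, self-contained sketch of what such an hparams remap can look like. It is not the PR's actual code, and the key names on both sides are assumptions for illustration only:

```python
# Hypothetical sketch only; the real key names used by the PR may differ.
# Idea: rename Mistral MoE hparams to their DeepseekV2 equivalents so that the
# existing DeepseekV2Model set_gguf_parameters logic can be reused unchanged.
def remap_mistral_moe_to_deepseek(hparams: dict) -> dict:
    """Return a copy of hparams with Mistral-style keys renamed to DeepseekV2-style keys."""
    rename = {
        # assumed Mistral key   -> assumed DeepseekV2 key
        "num_experts":          "n_routed_experts",
        "expert_hidden_dim":    "moe_intermediate_size",
        "experts_per_token":    "num_experts_per_tok",
    }
    remapped = dict(hparams)
    for src, dst in rename.items():
        if src in remapped:
            remapped[dst] = remapped.pop(src)
    return remapped
```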
@ngxson Thank you so much for this. I've tried your Q4_K_M, it seems to be working just fine. Is there any other setting or change needed for the conversion?
It disappeared?? 👀 I can re-upload if necessary I guess .. Only difference is using
Yeah, I've used the mistral format. Then I guess I have a corrupted bf16 version (I cannot think of anything else): https://github.com/csabakecskemeti/ministral-3_dequantizer_fp8-bf16
I can see it here: https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-BF16
You're right, they just removed it from the collection (if it was ever there :p), that's where I was looking. My bad.
It looks like @ngxson forgot |
CISC left a comment:
@csabakecskemeti This should work.
```python
name = name.replace(".qscale_act", ".input_scale")
if name.endswith(".qscale_weight"):
    name = name.replace(".qscale_weight", ".weight_scale")
if ".experts." in name:
```
| if ".experts." in name: | |
| if ".wkv_b." in name: | |
| name = name.replace(".wkv_b.", ".kv_b_proj.") | |
| if ".experts." in name: |
This change gave me:
ValueError: Can not map tensor 'layers.32.attention.k_b_proj.weight'
Yeah, you need the changes below as well (cannot be applied directly because GitHub's "new experience" is useless).
Working so far with the other change included :)
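Putting the review comments above together, the rename step could be collected into one small helper. A minimal sketch, where only the replace rules come from the snippets above and the function itself is hypothetical:

```python
# Hypothetical helper, not the PR's actual code: apply the tensor-name rewrites
# discussed in this thread before handing the name to the DeepseekV2 tensor mapping.
def remap_tensor_name(name: str) -> str:
    # scale tensors get the names the existing mapping expects
    name = name.replace(".qscale_act", ".input_scale")
    if name.endswith(".qscale_weight"):
        name = name.replace(".qscale_weight", ".weight_scale")
    # the joint kv projection is named wkv_b in the Mistral checkpoint
    if ".wkv_b." in name:
        name = name.replace(".wkv_b.", ".kv_b_proj.")
    return name
```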
```python
MODEL_TENSOR.ATTN_KV_B: (
    "model.layers.{bid}.self_attn.kv_b_proj",  # deepseek2
    "layers.{bid}.attention.wkv_b",            # mistral-large
),

MODEL_TENSOR.ATTN_K_B: (
    "model.layers.{bid}.self_attn.k_b_proj",   # deepseek2
),

MODEL_TENSOR.ATTN_V_B: (
    "model.layers.{bid}.self_attn.v_b_proj",   # deepseek2
),
```
Suggested change:

```python
MODEL_TENSOR.ATTN_KV_B: (
    "model.layers.{bid}.self_attn.kv_b_proj",  # deepseek2
),
MODEL_TENSOR.ATTN_K_B: (
    "model.layers.{bid}.self_attn.k_b_proj",   # deepseek2
    "layers.{bid}.attention.k_b_proj",         # mistral-large
),
MODEL_TENSOR.ATTN_V_B: (
    "model.layers.{bid}.self_attn.v_b_proj",   # deepseek2
    "layers.{bid}.attention.v_b_proj",         # mistral-large
),
```
GitHub will mess up the diff here, but you get the gist.
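For readers not familiar with gguf-py's tensor map, the entries above are per-block name templates. The following stand-alone snippet is only a rough approximation of that lookup (the real one lives in gguf-py's TensorNameMap; the template dict and block count here are simplified assumptions), showing why the tensor from the earlier ValueError resolves once the mistral-large patterns are added:

```python
# Rough illustration only: expand the {bid} templates and match a checkpoint
# tensor name against them, returning a GGUF-style block tensor name.
templates = {
    "attn_k_b": ("model.layers.{bid}.self_attn.k_b_proj", "layers.{bid}.attention.k_b_proj"),
    "attn_v_b": ("model.layers.{bid}.self_attn.v_b_proj", "layers.{bid}.attention.v_b_proj"),
}

def lookup(name: str, n_blocks: int) -> str | None:
    """Resolve a checkpoint tensor name to a GGUF block tensor name, or None if unmapped."""
    base = name.removesuffix(".weight")
    for gguf_suffix, patterns in templates.items():
        for pattern in patterns:
            for bid in range(n_blocks):
                if base == pattern.format(bid=bid):
                    return f"blk.{bid}.{gguf_suffix}"
    return None

# The tensor from the ValueError above now matches (n_blocks is a placeholder):
print(lookup("layers.32.attention.k_b_proj.weight", n_blocks=64))  # -> blk.32.attn_k_b
```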
Ah I guess I needed this one too
Hmm yeah, I didn't notice that the changes were overwritten by the git merge. Thanks!
(feel free to ping me when these changes are OK to be added)
It's working so far @ngxson, but I can wait until I have a quant I can run and confirm with that first.
WIP, the code is quite ugly for now, but I just want to get it to work.
Remember to convert with the --mistral-format argument, as the weights are not yet transformers-compatible.
Output: the F16 weights are 1.35 terabytes, the Q8_0 weights are 716 GB, and I don't have enough hardware to test it. Edit: thanks @bartowski1182 for testing it!
NOTE: this PR only covers the conversion to GGUF; the C++ code is still missing the Llama 4 scaling to work, but that will be another PR.
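For reference, a conversion invocation along these lines should work, assuming the usual convert_hf_to_gguf.py entry point; the local model path and output file name are placeholders, and --outtype/--outfile are the converter's standard options (only --mistral-format is specific to this checkpoint format):

```sh
# Example invocation; paths and output type are placeholders.
python convert_hf_to_gguf.py /path/to/Mistral-Large-3-675B-Instruct-2512-BF16 \
    --mistral-format \
    --outtype f16 \
    --outfile mistral-large-3-f16.gguf
```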