Perplexity vs Size Graphs for the recent quants (DeepSeek-V3.1-Terminus, DeepSeek-R1, Qwen3-Coder, Kimi-K2, Chimera, etc.) #715
9 comments · 75 replies
-
@magikRUKKOLA Thank you for these graphs, very useful! Can one do something to improve discoverability? I personally find it a bit hard to tell which point corresponds to which quantization.
-
Thanks @magikRUKKOLA for putting these together. It's always interesting to see which quantization types perform well on some of these big models. I just added a few more data points to my DeepSeek-V3.1 collection. The IQ4_KSS is doing unreasonably well again, right around 4.0 BPW. I went back and re-read this earlier discussion on QAT and IQ4_KS here: #359 (comment), and I'm speculating wildly that it could have something to do with ~4.0 BPW being a "sweet spot" in the size vs. perplexity trade-off curve (a rough way to quantify that is sketched after the JSON below).
JSON data:
[
{
"name": "BF16",
"ppl": "3.3469 +/- 0.01936",
"size": 1250.084,
"bpw": 16.003,
"legend": "pure"
},
{
"name": "Q8_0",
"ppl": "3.3473 +/- 0.01935",
"size": 664.295,
"bpw": 8.504,
"legend": "pure",
"skip": true
},
{
"name": "IQ5_K",
"ppl": "3.3550 +/- 0.01942",
"size": 465.075,
"bpw": 5.944,
"legend": "ubergarm"
},
{
"name": "IQ4_K",
"ppl": "3.3715 +/- 0.01956",
"size": 384.765,
"bpw": 4.925,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ4_KS",
"ppl": "3.3806 +/- 0.01966",
"size": 363.151,
"bpw": 4.649,
"legend": "ubergarm",
"comment": ""
},
{
"name": "Q4_0",
"ppl": "3.4277 +/- 0.02000",
"size": 352.096,
"bpw": 4.507,
"legend": "pure",
"comment": "q4_K embd, q6_K head"
},
{
"name": "IQ4_KSS",
"ppl": "3.3887 +/- 0.01968",
"size": 325.088,
"bpw": 4.162,
"legend": "ubergarm",
"comment": ""
},
{
"name": "smol-IQ4_KSS",
"ppl": "3.3898 +/- 0.01964",
"size": 318.745,
"bpw": 4.080,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ3_K",
"ppl": "3.4260 +/- 0.01995",
"size": 293.177,
"bpw": 3.753,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ3_KS",
"ppl": "3.4534 +/- 0.02019",
"size": 277.397,
"bpw": 3.551,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KL",
"ppl": "3.6312 +/- 0.02161",
"size": 231.206,
"bpw": 2.960,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KT",
"ppl": "3.8109 +/- 0.02294",
"size": 204.592,
"bpw": 2.619,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks + PR to fix KT quantization"
},
{
"name": "IQ2_KS",
"ppl": "3.9583 +/- 0.02433",
"size": 193.144,
"bpw": 2.472,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ1_KT",
"ppl": "4.3987 +/- 0.02786",
"size": 154.968,
"bpw": 1.984,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ1_S",
"ppl": "5.3113 +/- 0.03507",
"size": 133.610,
"bpw": 1.710,
"legend": "ubergarm",
"comment": ""
}
]
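To put a number on that "sweet spot" hunch, here is a minimal Python sketch (my own illustration, not the script used for the published graphs) that parses the JSON above and prints the perplexity paid per GiB saved relative to BF16; the file name is a placeholder:

import json

# assumes the JSON array above was saved as v31_data.json (hypothetical name)
with open("v31_data.json") as f:
    data = json.load(f)

def ppl(e):
    # "ppl" fields are strings like "3.3469 +/- 0.01936"; take the central value
    return float(e["ppl"].split()[0])

base = next(e for e in data if e["name"] == "BF16")  # full-precision reference
for e in sorted(data, key=lambda e: e["bpw"]):
    if e["name"] == "BF16":
        continue
    dppl = ppl(e) - ppl(base)        # perplexity paid vs BF16
    dgib = base["size"] - e["size"]  # GiB saved vs BF16
    print(f"{e['name']:>14}  {e['bpw']:6.3f} bpw  +{dppl:.4f} ppl  "
          f"-{dgib:6.1f} GiB  ({1000 * dppl / dgib:.3f} mPPL/GiB)")

If the last column has a local minimum near 4.0 BPW, that would support the sweet-spot reading.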
-
Adding my test result: Kimi-K2-Instruct-UD-Q3_K_XL: PPL = 3.2330 +/- 0.01668
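For readers wondering what the "+/-" means: figures like this are the exponential of the mean per-token negative log-likelihood over the evaluation text, with the uncertainty obtained by propagating the standard error of that mean. A schematic sketch (not the actual llama.cpp implementation; logprobs is a hypothetical list of per-token log-probabilities):

import math

def perplexity(logprobs):
    # mean negative log-likelihood over the evaluation tokens
    n = len(logprobs)
    nll = -sum(logprobs) / n
    ppl = math.exp(nll)
    # sample variance of the per-token NLL, propagated through exp()
    var = sum((-lp - nll) ** 2 for lp in logprobs) / (n - 1)
    return ppl, ppl * math.sqrt(var / n)  # (estimate, +/- term)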
-
I tried to calculate perplexity for Kimi-K2-Instruct-0905-THIREUS-IQ3_K-SPECIAL_SPLIT and got very bad results: PPL 2.7851 at 3.4325 bpw? Seems like something is very wrong. I made sure all the split files are valid. Anyway, honestly, it's unlikely I will be using Kimi-K2 personally. The DeepSeek-V3.1-Terminus just got released. :)
-
I was thinking about an automated tool that finds the optimal quant given input parameters such as RAM/VRAM limits, prefill/decode speed, max context length, and perplexity, so that everyone could find the exact quant they want with very little effort. A sketch of the core selection logic follows.
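A minimal sketch of that selection logic, assuming the per-quant JSON format used in the DATA SOURCES below; the file name and the fixed overhead allowance are placeholders, and the speed/context constraints are left out since those have to be benchmarked per machine:

import json

def pick_quant(entries, budget_gib, overhead_gib=24.0):
    """Lowest-perplexity quant whose weights plus a rough allowance for
    KV cache and compute buffers fit the memory budget. overhead_gib is
    a made-up placeholder; the real figure depends on context length,
    cache type, and batch size."""
    fits = [e for e in entries if e["size"] + overhead_gib <= budget_gib]
    return min(fits, key=lambda e: e["ppl"]) if fits else None

with open("terminus.json") as f:  # one DATA SOURCES block saved to a file
    entries = json.load(f)["data"]

best = pick_quant(entries, budget_gib=256.0)  # e.g. 192 GB RAM + 64 GB VRAM
print(best["name"] if best else "nothing fits")

A real tool would additionally filter on measured prefill/decode speeds and on the KV-cache footprint of the requested context length.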
-
Qwen3-Coder added. |
-
@magikRUKKOLA: Amazing work! Could you please, if it's not too much of a hassle, add the sizes you collected for your graphs to your DATA SOURCES JSON for readability? I invested in a new mobo/CPU set with 192 GB DDR5 (plus my existing 64 GB of VRAM), and your data is very helpful for figuring out roughly which quant I'll need to run those big models.
-
So what should I stick to for speed? I have been going by <200 GB, yet that seems not to be working so well after trying a bunch of GLM REAP quants: despite using less memory, many REAP quants were slower than the full model. Even though I have 4x3090, my PP speeds are between 100 and 200 t/s with a smaller batch like 1024 + RTR. And yes, using a high batch size can artificially pump up that number, but it will absolutely decimate your latency on small prompts. When I load, I only use 32k context and fill the rest of the GPUs with layers or pieces of the experts. Yet an IQ3_XXS pruned quant (130 GB) is slower than Q4_K or Q3_K_XL (160-180 GB). I'd try them all, but I don't have the bandwidth to keep downloading, so I can't get a feel for it.
-
Related to #715 (reply in thread): sweep-bench with 4k batches on THIREUS-R1-3.5652bpw, 124k ctx, two RTX 3090s. The prefill is about 150 t/s; I have no idea why you're getting only 40-something.
-
GRAPHS: (perplexity vs. size plots, one per model in the DATA SOURCES below)
DATA SOURCES:
{ "title": "DeepSeek-V3.1-Terminus (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S", "bpw": 1.745, "ppl": 5.4829, "size": 134.45, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ1_S"}, {"name": "IQ1_KT", "bpw": 1.987, "ppl": 4.5310, "size": 154.61, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ1_KT"}, {"name": "IQ2_KS", "bpw": 2.472, "ppl": 4.0280, "size": 190.56, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ2_KS"}, {"name": "IQ2_KL", "bpw": 2.962, "ppl": 3.7112, "size": 228.54, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_KS", "bpw": 3.545, "ppl": 3.5174, "size": 276.89, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ3_KS"}, {"name": "IQ3_K", "bpw": 3.724, "ppl": 3.4781, "size": 290.56, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ3_K"}, {"name": "smol-IQ4_KSS", "bpw": 4.080, "ppl": 3.4445, "size": 317.96, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "IQ4_K", "bpw": 4.896, "ppl": 3.4198, "size": 380.57, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ4_K"}, {"name": "smol-IQ5_KS", "bpw": 5.339, "ppl": 3.4059, "size": 416.27, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/smol-IQ5_KS"}, {"name": "THIREUS-5.4498bpw-R4", "bpw": 5.4498, "ppl": 3.3961, "size": 426.07, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14579570"}, {"name": "IQ5_K", "bpw": 5.941, "ppl": 3.4000, "size": 462.87, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ5_K"}, {"name": "THIREUS-6.2212bpw", "bpw": 6.2212, "ppl": 3.3949, "size": 485.07, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14554951"}, {"name": "Q8_0", "bpw": 8.504, "ppl": 3.3929, "size": 660.30, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/Q8_0"} ] } { "title": "DeepSeek-R1-0528 (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S_R4", "bpw": 1.664, "ppl": 4.8831, "size": 129.53, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ1_S_R4"}, {"name": "THIREUS-1.9364", "bpw": 1.9364, "ppl": 4.3533, "size": 150.75, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_14GB-GPU_203GB-CPU.3c88ec6_9fd615d.recipe"}, {"name": "IQ2_KT", "bpw": 2.514, "ppl": 3.6378, "size": 197.70, "url": null}, {"name": "THIREUS-2.7840", "bpw": 2.7840, "ppl": 3.4341, "size": 216.55, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-2.7840bpw-3.4341ppl.217GB-GGUF_14GB-GPU_203GB-CPU.3c88ec6_02247be.recipe"}, {"name": "IQ2_K_R4", "bpw": 2.799, "ppl": 3.5069, "size": 217.76, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ2_K_R4"}, {"name": "JWNoctis/R1-0528/IQ2_KL", "bpw": 2.930, "ppl": 3.4379, "size": 227.64, "url": "https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826/354"}, {"name": "UD_Q2_K_XL", "bpw": 2.994, "ppl": 3.5278, "size": 232.02, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-Q2_K_XL"}, 
{"name": "THIREUS-3.1027", "bpw": 3.1027, "ppl": 3.3372, "size": 240.96, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1027bpw-3.3372ppl.242GB-GGUF_11GB-GPU_231GB-CPU.3c88ec6_adc8101.recipe"}, {"name": "THIREUS-3.1446", "bpw": 3.1446, "ppl": 3.3257, "size": 244.39, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1446bpw-3.3257ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_7d1efe1.recipe"}, {"name": "THIREUS-3.1447", "bpw": 3.1447, "ppl": 3.3269, "size": 244.40, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1447bpw-3.3269ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_4b1254a.recipe"}, {"name": "THIREUS-3.1525", "bpw": 3.1525, "ppl": 3.3251, "size": 245.07, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1525bpw-3.3251ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_5a3fc0f.recipe"}, {"name": "THIREUS-3.1740", "bpw": 3.1740, "ppl": 3.3253, "size": 246.76, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1740bpw-3.3253ppl.248GB-GGUF_17GB-GPU_231GB-CPU.3c88ec6_6cf3a72.recipe"}, {"name": "THIREUS-3.1858", "bpw": 3.1858, "ppl": 3.3261, "size": 247.60, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1858bpw-3.3261ppl.249GB-GGUF_18GB-GPU_231GB-CPU.3c88ec6_027b7ff.recipe"}, {"name": "THIREUS-3.2564", "bpw": 3.2564, "ppl": 3.2985, "size": 253.18, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.2564bpw-3.2985ppl.254GB-GGUF_15GB-GPU_239GB-CPU.3c88ec6_7c0be1e.recipe"}, {"name": "IQ3_KT", "bpw": 3.483, "ppl": 3.3056, "size": 267.63, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_KT"}, {"name": "THIREUS-3.5652", "bpw": 3.5652, "ppl": 3.2734, "size": 284.90, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.5652bpw-3.2734ppl.278GB-GGUF_14GB-GPU_264GB-CPU.3c88ec6_9b5660b.recipe"}, {"name": "IQ3_KS", "bpw": 3.598, "ppl": 3.2991, "size": 287.54, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_KS"}, {"name": "THIREUS-3.6766", "bpw": 3.6766, "ppl": 3.2741, "size": 293.80, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13781700"}, {"name": "IQ3_K_R4", "bpw": 3.847, "ppl": 3.2730, "size": 306.52, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_K_R4"}, {"name": "THIREUS-3.976", "bpw": 3.976, "ppl": 3.2452, "size": 315.18, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13798329"}, {"name": "IQ4_XS (unsloth)", "bpw": 4.2683, "ppl": 3.2598, "size": 337.03, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/IQ4_XS"}, {"name": "q4_0", "bpw": 4.508, "ppl": 3.2895, "size": 356.27, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/Q4_0"}, {"name": "UD_Q4_K_XL", "bpw": 4.578, "ppl": 3.2483, "size": 361.92, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-Q4_K_XL"}, {"name": "IQ4_KS_R4", "bpw": 4.701, "ppl": 3.2286, "size": 371.94, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ4_KS_R4"}, {"name":"THIREUS-5.0601","bpw":5.0601,"ppl":3.2223,"size": 397.12, "url":"https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14625973"}, 
{"name": "DQ4_K_R4", "bpw": 5.289, "ppl": 3.2276, "size": 415.04, "url": "https://huggingface.co/anikifoss/DeepSeek-R1-0528-DQ4_K_R4"}, {"name": "THIREUS-6.2218", "bpw": 6.2218, "ppl": 3.2240, "size": 486.97, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13781560"}, {"name": "THIREUS-6.4296", "bpw": 6.4296, "ppl": 3.2231, "size": 503.65, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/718#discussioncomment-14193821"}, {"name": "THIREUS-6.5522", "bpw": 6.5522, "ppl": 3.2227, "size": 512.60, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/718#discussioncomment-14193821"}, {"name": "Q8_0", "bpw": 8.5259260, "ppl": 3.2130, "size": 664.33, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/Q8_0"} ] } { "title": "DeepSeek-V3.1 (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ { "name": "IQ1_S", "bpw": 1.710, "ppl": 5.3113, "size": 132.84, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ1_S" }, { "name": "IQ1_KT", "bpw": 1.984, "ppl": 4.3987, "size": 153.62, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ1_KT" }, { "name": "IQ2_KS", "bpw": 2.472, "ppl": 3.9583, "size": 190.56, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ2_KS" }, { "name": "IQ2_KT", "bpw": 2.619, "ppl": 3.8109, "size": 201.47, "url": "" }, { "name": "IQ2_KL", "bpw": 2.960, "ppl": 3.6312, "size": 225.58, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ2_KL" }, { "name": "IQ3_KS", "bpw": 3.551, "ppl": 3.4534, "size": 273.06, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ3_KS" }, { "name": "IQ3_K", "bpw": 3.753, "ppl": 3.4260, "size": 287.09, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ3_K" }, { "name": "smol-IQ4_KSS", "bpw": 4.080, "ppl": 3.3898, "size": 317.96, "url": "" }, { "name": "IQ4_KSS", "bpw": 4.162, "ppl": 3.3887, "size": 325.03, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_KSS" }, { "name": "Q4_0", "bpw": 4.507, "ppl": 3.4277, "size": 355.86, "url": "" }, { "name": "UD-Q4_K_XL", "bpw": 4.507, "ppl": 3.4013, "size": 355.86, "url": "https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF/tree/main/UD-Q4_K_XL" }, { "name": "IQ4_KS", "bpw": 4.649, "ppl": 3.3806, "size": 367.52, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_KS" }, { "name": "IQ4_K", "bpw": 4.925, "ppl": 3.3715, "size": 389.94, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_K" }, { "name": "IQ5_K", "bpw": 5.944, "ppl": 3.3550, "size": 462.98, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ5_K" }, { "name": "Q8_0", "bpw": 8.504, "ppl": 3.3473, "size": 660.30, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/Q8_0" }, { "name": "BF16", "bpw": 16.003, "ppl": 3.3469, "size": 1257.33, "url": "" } ] } { "title": "DeepSeek-TNG-R1T2-Chimera (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S", "bpw": 1.699, "ppl": 4.9878, "size": 132.25, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ1_S"}, {"name": "THIREUS-1.6693", "bpw": 1.6693, "ppl": 4.9676, "size": 130.58, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "THIREUS-1.7067", "bpw": 1.7067, "ppl": 4.9199, "size": 
132.75, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "THIREUS-2.0622", "bpw": 2.0622, "ppl": 4.0622, "size": 159.84, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "IQ2_XSS", "bpw": 2.168, "ppl": 4.0078, "size": 168.55, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_XSS"}, {"name": "IQ2_KT", "bpw": 2.188, "ppl": 3.8887, "size": 170.29, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_KT"}, {"name": "THIREUS-2.5961", "bpw": 2.5961, "ppl": 3.6768, "size": 204.39, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "IQ2_KS", "bpw": 2.602, "ppl": 3.6254, "size": 204.91, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_KS"}, {"name": "THIREUS-2.6261", "bpw": 2.6261, "ppl": 3.5627, "size": 207.12, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "THIREUS-3.5753", "bpw": 3.5753, "ppl": 3.3187, "size": 280.66, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "THIREUS-3.5858", "bpw": 3.5858, "ppl": 3.3063, "size": 281.55, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "IQ3_KS", "bpw": 3.598, "ppl": 3.3167, "size": 282.60, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ3_KS"} ] } { "title": "Kimi-K2-Instruct-0905 (1026B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 1026000000000, "data": [ {"name": "smol-IQ1_KT", "bpw": 1.832, "ppl": 4.2224, "size": 227.95, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ1_KT"}, {"name": "smol-IQ2_KS", "bpw": 2.261, "ppl": 3.4977, "size": 281.47, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ2_KS"}, {"name": "IQ2_KS", "bpw": 2.425, "ppl": 3.2478, "size": 303.61, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ2_KS"}, {"name": "smol-IQ2_KL", "bpw": 2.755, "ppl": 2.9294, "size": 342.81, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ2_KL"}, {"name": "IQ2_KL", "bpw": 3.000, "ppl": 2.7993, "size": 371.48, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ2_KL"}, {"name": "smol-IQ3_KS", "bpw": 3.249, "ppl": 2.5902, "size": 401.87, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ3_KS"}, {"name": "IQ3_KS", "bpw": 3.520, "ppl": 2.5640, "size": 431.87, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ3_KS"}, {"name": "UD-Q3_K_XL", "bpw": 3.521, "ppl": 2.6706, "size": 432.02, "url": "https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-GGUF/tree/main/UD-Q3_K_XL"}, {"name": "THIREUS-4.0285", "bpw": 4.034, "ppl": 2.493, "size": 494.61, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14485602"}, {"name": "smol-IQ4_KSS", "bpw": 4.059, "ppl": 2.5185, "size": 498.63, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "IQ4_KS", "bpw": 4.633, "ppl": 2.4641, "size": 567.88, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ4_KS"}, {"name": "smol-IQ5_KS", "bpw": 5.295, "ppl": 2.4526, "size": 651.80, "url": 
"https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ5_KS"} ] } { "title": "GLM-4.6 Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 357000000000, "data": [ {"name": "smol-IQ4_KSS", "bpw": 4.090, "ppl": 3.5911, "size": 169.82, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "smol-IQ1_KT", "bpw": 1.948, "ppl": 5.9034, "size": 82.31, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ1_KT"}, {"name": "smol-IQ2_KS", "bpw": 2.359, "ppl": 5.2760, "size": 98.97, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ2_KS"}, {"name": "IQ2_KL", "bpw": 3.070, "ppl": 4.1456, "size": 129.13, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_KS", "bpw": 3.573, "ppl": 3.6427, "size": 150.98, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ3_KS"}, {"name": "IQ4_KS", "bpw": 4.646, "ppl": 3.5309, "size": 196.27, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ4_KS"}, {"name": "IQ4_K", "bpw": 5.001, "ppl": 3.4758, "size": 210.95, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ4_K"}, {"name": "THIREUS-5.5774bpw", "bpw": 5.5774, "ppl": 3.4486, "size": 234.10, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14572398"}, {"name": "UD-Q5_K_XL(unsloth)", "bpw": 5.6471, "ppl": 3.4807, "size": 238.97, "url": "https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/UD-Q5_K_XL"}, {"name": "IQ5_K", "bpw": 5.997, "ppl": 3.4428, "size": 254.10, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ5_K"}, {"name": "Q8_0", "bpw": 8.505, "ppl": 3.4471, "size": 359.26, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/Q8_0"}, {"name": "BF16", "bpw": 16.003, "ppl": 3.4454, "size": 672.09, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF"} ] } { "title": "Qwen3-Coder-480B-A35B-Instruct Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 480000000000, "data": [ {"name": "IQ1_KT", "bpw": 1.945, "ppl": 6.3370, "size": 108.26, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ1_KT"}, {"name": "IQ2_KS", "bpw": 2.578, "ppl": 5.6658, "size": 142.90, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_KS"}, {"name": "IQ2_K", "bpw": 2.588, "ppl": 5.6578, "size": 143.49, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_K"}, {"name": "IQ2_KL", "bpw": 3.034, "ppl": 5.4113, "size": 169.76, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_K", "bpw": 3.865, "ppl": 5.1808, "size": 214.77, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ3_K"}, {"name": "IQ4_KSS", "bpw": 4.180, "ppl": 5.1579, "size": 236.73, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ4_KSS"}, {"name": "IQ4_K", "bpw": 4.885, "ppl": 5.1257, "size": 276.05, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ4_K"}, {"name": "THIREUS-5.1546bpw", "bpw": 5.1546, "ppl": 5.1057, "size": 289.53, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14670424"}, {"name": "IQ5_K", "bpw": 5.900, "ppl": 5.1073, "size": 334.84, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ5_K"}, {"name": 
"Q8_0", "bpw": 8.503, "ppl": 5.0975, "size": 480.12, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/Q8_0"} ] }CODE: #477 (comment)