Fix YARN mscale_all_dim for long-context (mirror upstream Mistral fix)
Change text_config.rope_parameters.mscale_all_dim from 1.0 to 0.0 to match the upstream mistralai/Mistral-Medium-3.5-128B repo (commit c4be198050, 2026-05-01).
The original value of 1.0 caused HF transformers' _compute_yarn_parameters to evaluate get_mscale(64, 1) / get_mscale(64, 1) = 1.0, silently disabling YARN attention scaling. With mscale_all_dim = 0.0, the value is falsy and routing falls through to the else branch, which returns 1 + 0.1 * ln(64) ≈ 1.4159, matching vLLM and the YARN paper.
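For reference, a minimal sketch of that routing (function name and formula follow transformers' modeling_rope_utils, but the signature is simplified here; the real _compute_yarn_parameters reads these values from the config object):

```python
import math

def get_mscale(scale: float, mscale: float = 1.0) -> float:
    # YARN magnitude-scaling term, as in transformers' modeling_rope_utils.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def yarn_attention_factor(factor: float, mscale: float, mscale_all_dim: float) -> float:
    # A truthy mscale_all_dim selects the ratio branch; a falsy value (0.0)
    # falls through to the plain get_mscale(factor) default.
    if mscale and mscale_all_dim:
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    return get_mscale(factor)

print(yarn_attention_factor(64.0, mscale=1.0, mscale_all_dim=1.0))  # 1.0 (old config: scaling disabled)
print(yarn_attention_factor(64.0, mscale=1.0, mscale_all_dim=0.0))  # ~1.4159 (fixed config)
```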
This fixes long-context generation degeneration (repetition loops past ~600-800 tokens) under HF transformers and in pre-existing llama.cpp GGUF builds. params.json apply_scale=true was already correct and is unchanged.
config.json (+1 -1)

```diff
@@ -42,7 +42,7 @@
     "factor": 64.0,
     "llama_4_scaling_beta": 0,
     "mscale": 1.0,
-    "mscale_all_dim": 1.0,
+    "mscale_all_dim": 0.0,
     "original_max_position_embeddings": 4096,
     "rope_theta": 1000000.0,
     "rope_type": "yarn",
```
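To sanity-check the fix, the attention factor can be recomputed from the patched config. A hedged sketch: it assumes a transformers version exposing ROPE_INIT_FUNCTIONS in modeling_rope_utils (and that it accepts this config's rope-parameter layout), and the repo id is taken from the commit message above; substitute the repo this commit actually lands in.

```python
from transformers import AutoConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

# Repo id from the commit message above; adjust to the actual repo.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-Medium-3.5-128B")
text_cfg = getattr(cfg, "text_config", cfg)  # multimodal configs nest the LM config

# ROPE_INIT_FUNCTIONS["yarn"] is transformers' _compute_yarn_parameters;
# it returns (inv_freq, attention_factor).
inv_freq, attention_factor = ROPE_INIT_FUNCTIONS["yarn"](text_cfg, device="cpu")
print(attention_factor)  # ~1.4159 after this fix; 1.0 with the old config
```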