danielhanchen committed · Commit 02fdd45 · verified · 1 Parent(s): 78b8734

Fix YARN mscale_all_dim for long-context (mirror upstream Mistral fix)


Change text_config.rope_parameters.mscale_all_dim from 1.0 to 0.0 to match the upstream mistralai/Mistral-Medium-3.5-128B repo (commit c4be198050, 2026-05-01).

The original value of 1.0 caused HF transformers' _compute_yarn_parameters to evaluate get_mscale(64, 1) / get_mscale(64, 1) = 1.0, silently disabling YARN attention scaling. With mscale_all_dim=0.0, the falsy value routes to the else branch, which returns 1 + 0.1 * ln(64) ≈ 1.4159, matching vLLM and the YARN paper.

This fixes long-context generation degeneration (repetition loops past ~600-800 tokens) under HF transformers and in pre-existing llama.cpp GGUF builds. params.json apply_scale=true was already correct and is unchanged.
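The branch behavior above can be sketched in a few lines of Python. This is a paraphrase of the mscale logic in HF transformers' _compute_yarn_parameters, not the exact upstream code; the function names and signatures are simplified for illustration.

```python
import math

def get_mscale(scale: float, mscale: float = 1.0) -> float:
    # YARN attention scaling: 1 + 0.1 * mscale * ln(scale) for scale > 1.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

def attention_factor(factor: float, mscale: float, mscale_all_dim: float) -> float:
    if mscale and mscale_all_dim:
        # Both truthy: ratio of the two mscales. With mscale == mscale_all_dim
        # (the old config: 1.0 / 1.0) this is exactly 1.0, so YARN attention
        # scaling is silently disabled.
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    # mscale_all_dim == 0.0 is falsy, so the YARN-paper default applies.
    return get_mscale(factor)

print(attention_factor(64.0, 1.0, 1.0))  # old config: 1.0 (scaling disabled)
print(attention_factor(64.0, 1.0, 0.0))  # new config: ~1.4159
```

With factor=64.0, the fixed config yields 1 + 0.1 * ln(64) ≈ 1.4159, the value the commit message cites.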

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -42,7 +42,7 @@
   "factor": 64.0,
   "llama_4_scaling_beta": 0,
   "mscale": 1.0,
- "mscale_all_dim": 1.0,
+ "mscale_all_dim": 0.0,
   "original_max_position_embeddings": 4096,
   "rope_theta": 1000000.0,
   "rope_type": "yarn",