The default setting is 1.5, which yields roughly 12% salience. If you want ~25%, decrease it to 0.75; for ~33%, use 0.4; and 0.0 results in 50% salience. `arcee_fusion` hardcodes this to 1.5, but you can simply edit the Python script before merging (easier), or update the code to expose it as a YAML parameter (more complex).

[Here is a scanner to audit Arcee_Fusion merge salience](https://huggingface.co/spaces/Naphula/model_tools/blob/main/arcee_fusion_salience_scanner.py)

---

In the context of merging models with the `mergekit` framework, the provided code implements a **Dynamic Threshold Fusion** mechanism. Here's an overview of how modifying the **Tukey fence** parameter (from **1.5** to **0.75** or **3.0**) affects the model merging process.

## Impact of Changing the Tukey Fence Parameter

### Current Implementation

In your code, the dynamic threshold is set as:

```python
dynamic_threshold = median + 1.0 * iqr  # Tukey fence
```

This uses **1.0** as the multiplier instead of **1.5**, but let's assume it was meant to be **1.5**, the standard Tukey fence.

### Changes to k Values

#### 1. **If k = 0.75**

- **Increased sensitivity**: The dynamic threshold is lower, so more values are included in the fusion mask.
- **More features merged**: Merging becomes more aggressive, incorporating even slight differences between the model weights.
- **Use cases**: Ideal for exploratory merges or applications where capturing subtle variations is critical.

#### Implementation

You would change the line in `calculate_dynamic_threshold`:

```python
dynamic_threshold = median + 0.75 * iqr  # Tukey fence with k = 0.75
```

---

#### 2. **If k = 3.0**

- **Decreased sensitivity**: The dynamic threshold is much higher, so only significant differences are considered.
- **Fewer features merged**: Merging becomes conservative; only clear outliers are merged, possibly overlooking important but smaller differences.
- **Use cases**: Useful in high-stakes settings where only distinctly different parameters should contribute to the final model.

#### Implementation

You would change the same line to:

```python
dynamic_threshold = median + 3.0 * iqr  # Tukey fence with k = 3.0
```

---

## Summary of Effects on the Fusion Process

- **k = 0.75**: A more inclusive fusion mask. More of the difference is merged, potentially capturing more nuanced information.
- **k = 3.0**: A restrictive fusion mask. Smaller differences are ignored, possibly yielding a model that retains more of the base model's characteristics if the other model deviates little.

### Final Thought

The choice of **k** significantly dictates the behavior and performance of the merged model. Whether to accommodate subtle differences or enforce strict merging conditions depends on the specific use case and on which characteristics you want the merge to inherit from each model.

---

The consistent **~12%** you are seeing is not a hardcoded number; it falls out of the statistical formula used to identify "outliers" in the weight distribution. In `arcee_fusion.py`, the density is controlled by the **Tukey's fence** method. Here are the specific lines of code and the math behind them:

### 1. The Threshold Calculation

In `arcee_fusion.py`, inside the `DynamicThresholdFusion` class:

```python
# Lines 41-42: calculate the interquartile range (IQR)
iqr = q3 - q1

# Line 45: the "magic" formula
dynamic_threshold = median + 1.5 * iqr
```

### 2. The Technical Analysis of "12%"

The value **`1.5`** is the standard multiplier for **Tukey's fences**, a statistical method for flagging outliers.

* **The logic:** Arcee Fusion assumes that most weight changes in a fine-tune are noise or jitter. It only wants to keep the "salient" changes, the ones that are statistically significant.
* **The distribution:** Importance scores in neural networks usually follow a **Laplace** or heavy-tailed Gaussian distribution.
* **The result:** In these kinds of distributions, the data points above `median + 1.5 * IQR` (the "upper fence") typically represent the top **10% to 15%** of the values.

Your result of **12.1%** means that in every layer, roughly 12% of the weights were "important enough" to cross that statistical fence. The other 88% were treated as noise and reverted to the Mistral base weights.

### 3. Why is it so consistent across layers?

The scanner shows almost no variation (11.9% to 12.2%). This reveals something about your fine-tunes: **the "knowledge" is spread evenly.** If you were merging a model that learned only a specific task (like coding), you would see a spike in the middle layers (where logic is processed) and near 0% in the early layers. Because all of your layers sit at ~12%, the fine-tuned models you are using modified the entire "brain" of the model uniformly.

### 4. How to change this (the "patch")

If you want a higher density (e.g., 25% or 50% "new info"), modify the multiplier in `arcee_fusion.py`:

* **To get MORE new info:** lower the `1.5` to `0.5`.
* **To get LESS new info:** raise the `1.5` to `3.0` (the "extreme outlier" threshold).
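The mapping from k to "percent of weights kept" can be sanity-checked with a quick simulation. The sketch below is hypothetical and not from `arcee_fusion.py`: it draws exponential "importance scores" as a stand-in for heavy-tailed fine-tune deltas (an assumption, since real score distributions vary) and measures what fraction clears the upper fence `median + k * IQR` for several k values.

```python
import random
import statistics

def fraction_above_fence(scores, k):
    """Fraction of scores above Tukey's upper fence: median + k * IQR."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    threshold = median + k * (q3 - q1)
    return sum(s > threshold for s in scores) / len(scores)

random.seed(0)
# Hypothetical stand-in for per-weight importance scores: absolute
# fine-tune deltas are roughly exponential (nonnegative, heavy-tailed).
scores = [random.expovariate(1.0) for _ in range(100_000)]

for k in (0.0, 0.4, 0.75, 1.5, 3.0):
    print(f"k = {k:<4} -> {fraction_above_fence(scores, k):5.1%} of weights kept")
```

Under this exponential assumption the analytic answer is `0.5 * 3**(-k)`: exactly 50% at k = 0, about 32% at k = 0.4, 22% at k = 0.75, and roughly 10% at k = 1.5. The exact fraction on real importance scores depends on how heavy the tails are, which is why the scanner reports ~12% rather than the textbook value.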