---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
- Vortex5/Prototype-X-12b
- Vortex5/Stellar-Witch-12B
- Vortex5/Celestial-Queen-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Wicked-Nebula-12B
library_name: transformers
tags:
- mergekit
- merge
- mistral
- nemo
- karcher_stock
widget:
- text: "Geodesic-Phantom-12B"
  output:
    url: https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg
---
# 👻 Geodesic Phantom 12B

![geodesic-phantom](https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg)

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The merge ran for 7 hours on a RunPod A40 using an [adaptive VRAM chunking script](https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18_runpod_A40.py) (based on `measure.py` by [GrimJim](https://huggingface.co/grimjim)).

```bat
WARNING:mergekit.graph:OOM at chunk 65536, reducing to 32768 (attempt 1, progress: 0/131075)
WARNING:mergekit.graph:OOM at chunk 32768, reducing to 16384 (attempt 2, progress: 0/131075)

[Karcher_Stock Audit] Layer: lm_head.weight
Stats: Cos(θ): 0.564 | t-factor: 0.8843 | Karcher Iters: 2960
  (Base)  mistralai--Mistral-Nemo-Instruct-2407 : █████   ( 11.57%)
  (Donor) Vortex5--Prototype-X-12b              : ███████ ( 14.74%)
  (Donor) Vortex5--Stellar-Witch-12B            : ███████ ( 14.74%)
  (Donor) Vortex5--Celestial-Queen-12B          : ███████ ( 14.74%)
  (Donor) Vortex5--Moonlit-Mirage-12B           : ███████ ( 14.74%)
  (Donor) Vortex5--Crimson-Constellation-12B    : ███████ ( 14.74%)
  (Donor) Vortex5--Wicked-Nebula-12B            : ███████ ( 14.74%)
```

The following patch was also required for this merge:

# `karcher_stock` Adaptive Tanh Soft-Clamp v11

```py
# ── 11. Model Stock t factor with Adaptive Soft-Clamp ─────────────
N = len(ws_2d)
ct = cos_theta.unsqueeze(-1) if cos_theta.dim() > 0 else cos_theta

# Raw Model Stock formula
denom = 1.0 + (N - 1) * ct
# Add a tiny epsilon to prevent literal division by zero
t_raw = (N * ct) / denom.clamp(min=1e-6)

# --- BULLETPROOF TANH CLAMP ---
# 1. Prevent negative infinity spikes (fallback to base model)
t_clamped_bottom = torch.clamp(t_raw, min=0.0)

# 2. Smoothly asymptote positive spikes to L (maximum allowed t-factor)
L = 1.5
excess = torch.clamp(t_clamped_bottom - 1.0, min=0.0)
t_soft_top = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))

# 3. Apply: if t <= 1.0, use exact math. If t > 1.0, use the soft curve.
t = torch.where(t_clamped_bottom <= 1.0, t_clamped_bottom, t_soft_top)
# ------------------------------
```

## Example of the clamp preventing merge corruption

![tanh_clamp](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/eRdxOMhKsRysDgP-6Pkw0.png)

## Merge Details

### Merge Method

This model was merged using the `karcher_stock` merge method, with /workspace/models/mistralai--Mistral-Nemo-Instruct-2407 as the base.
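For intuition, the v11 soft-clamp above is a piecewise mapping on the raw Model Stock t-factor: negative spikes fall back to the base model (t = 0), values up to 1.0 pass through exactly, and anything larger saturates smoothly toward L = 1.5 instead of corrupting the merge. A minimal, self-contained sketch of that mapping (the raw t values below are invented purely for illustration; only `torch` is assumed):

```py
import torch

# Illustration only: invented raw t values, pushed through the v11 clamp.
L = 1.5
t_raw = torch.tensor([-5.0, 0.0, 0.5, 0.88, 1.0, 1.2, 2.0, 10.0])

t_bottom = torch.clamp(t_raw, min=0.0)                      # 1. floor negative spikes at 0 (base model)
excess = torch.clamp(t_bottom - 1.0, min=0.0)               # 2. how far each value sits above 1.0...
t_soft = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))   #    ...squashed smoothly toward L
t = torch.where(t_bottom <= 1.0, t_bottom, t_soft)          # 3. exact below 1.0, soft curve above

for raw, clamped in zip(t_raw.tolist(), t.tolist()):
    print(f"raw t = {raw:7.2f}  ->  clamped t = {clamped:.4f}")
```

Running this shows -5.0 floored to 0.0, everything at or below 1.0 returned unchanged, 2.0 mapped to roughly 1.48, and 10.0 saturating just under the L = 1.5 ceiling.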
### Models Merged

The following models were included in the merge:
* /workspace/models/Vortex5--Wicked-Nebula-12B
* /workspace/models/Vortex5--Celestial-Queen-12B
* /workspace/models/Vortex5--Moonlit-Mirage-12B
* /workspace/models/Vortex5--Stellar-Witch-12B
* /workspace/models/Vortex5--Prototype-X-12b
* /workspace/models/Vortex5--Crimson-Constellation-12B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
architecture: MistralForCausalLM
base_model: /workspace/models/mistralai--Mistral-Nemo-Instruct-2407
models:
  - model: /workspace/models/Vortex5--Prototype-X-12b
  - model: /workspace/models/Vortex5--Celestial-Queen-12B
  - model: /workspace/models/Vortex5--Wicked-Nebula-12B
  - model: /workspace/models/Vortex5--Stellar-Witch-12B
  - model: /workspace/models/Vortex5--Moonlit-Mirage-12B
  - model: /workspace/models/Vortex5--Crimson-Constellation-12B
merge_method: karcher_stock # v8
parameters:
  filter_wise: true
  max_iter: 10000
  min_iter: 1000
  tol: 1.0e-11
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: 👻 Geodesic Phantom 12B
```
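A configuration like this can be driven through mergekit's documented Python entry point. The sketch below is illustrative rather than a verbatim record of the run: the `config.yaml` and output paths are placeholders, `karcher_stock` requires a mergekit checkout carrying the patched method described above (it is not a stock merge method), and exact `MergeOptions` fields can vary between mergekit versions.

```py
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Hypothetical paths: "config.yaml" would contain the YAML shown above.
with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Geodesic-Phantom-12B",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # run the merge on GPU when one is present
        copy_tokenizer=True,             # write the union tokenizer into the output
        lazy_unpickle=True,              # lower peak memory while loading shards
    ),
)
```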