---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
- Vortex5/Prototype-X-12b
- Vortex5/Stellar-Witch-12B
- Vortex5/Celestial-Queen-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Wicked-Nebula-12B
library_name: transformers
tags:
- mergekit
- merge
- mistral
- nemo
- karcher_stock
widget:
- text: "Geodesic-Phantom-12B"
  output:
    url: https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg
---
# 👻 Geodesic Phantom 12B

![Geodesic-Phantom-12B](https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg)

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The merge ran for 7 hours on a RunPod A40 using an [adaptive VRAM chunking script](https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18_runpod_A40.py) (based on `measure.py` by [GrimJim](https://huggingface.co/grimjim)).

```text
WARNING:mergekit.graph:OOM at chunk 65536, reducing to 32768 (attempt 1, progress: 0/131075)
WARNING:mergekit.graph:OOM at chunk 32768, reducing to 16384 (attempt 2, progress: 0/131075)

[Karcher_Stock Audit] Layer: lm_head.weight
Stats: Cos(θ): 0.564 | t-factor: 0.8843 | Karcher Iters: 2960
(Base)  mistralai--Mistral-Nemo-Instruct-2407 : █████   ( 11.57%)
(Donor) Vortex5--Prototype-X-12b              : ███████ ( 14.74%)
(Donor) Vortex5--Stellar-Witch-12B            : ███████ ( 14.74%)
(Donor) Vortex5--Celestial-Queen-12B          : ███████ ( 14.74%)
(Donor) Vortex5--Moonlit-Mirage-12B           : ███████ ( 14.74%)
(Donor) Vortex5--Crimson-Constellation-12B    : ███████ ( 14.74%)
(Donor) Vortex5--Wicked-Nebula-12B            : ███████ ( 14.74%)
```
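
The chunking behaviour in the log is a simple halve-on-OOM loop. Below is a minimal sketch of that pattern (a hypothetical `apply_chunked` helper, not the actual script): process matching row blocks at a fixed chunk size and halve it whenever CUDA reports out-of-memory.

```py
import torch

def apply_chunked(fn, tensors, chunk=65536):
    """Halve-on-OOM chunking sketch (hypothetical; not graph_v18_runpod_A40.py).

    Applies fn to matching row blocks of the input tensors, halving the
    chunk size on CUDA OOM, e.g. 65536 -> 32768 -> 16384 as in the log."""
    rows = tensors[0].shape[0]
    out, i = [], 0
    while i < rows:
        try:
            out.append(fn([t[i:i + chunk] for t in tensors]))
            i += chunk
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release the failed allocation
            chunk //= 2
            if chunk == 0:
                raise  # a single row does not fit; nothing left to halve
    return torch.cat(out)
```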

The following patch was also required for this merge:

## `karcher_stock` Adaptive Tanh Soft-Clamp v11

```py
# ── 11. Model Stock t factor with Adaptive Soft-Clamp ─────────────
# Patch context: ws_2d (the per-model 2-D weight tensors) and cos_theta
# are computed earlier in the karcher_stock implementation.
N = len(ws_2d)
ct = cos_theta.unsqueeze(-1) if cos_theta.dim() > 0 else cos_theta

# Raw Model Stock formula: t = N*cos(θ) / (1 + (N - 1)*cos(θ))
denom = 1.0 + (N - 1) * ct
# Add a tiny epsilon to prevent literal division by zero
t_raw = (N * ct) / denom.clamp(min=1e-6)

# --- BULLETPROOF TANH CLAMP ---
# 1. Prevent negative-infinity spikes (fall back to the base model)
t_clamped_bottom = torch.clamp(t_raw, min=0.0)

# 2. Smoothly asymptote positive spikes to L (maximum allowed t-factor)
L = 1.5
excess = torch.clamp(t_clamped_bottom - 1.0, min=0.0)
t_soft_top = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))

# 3. Apply: if t <= 1.0, use the exact math; if t > 1.0, use the soft curve.
t = torch.where(t_clamped_bottom <= 1.0, t_clamped_bottom, t_soft_top)
# ------------------------------
```
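
To see what the clamp does, here is a small self-contained check with synthetic `t_raw` values (not taken from a real merge): negative spikes fall back to the base model (t = 0), values at or below 1.0 pass through exactly, and positive spikes asymptote smoothly toward L = 1.5.

```py
import torch

# Synthetic t_raw values exercising all three branches of the clamp.
t_raw = torch.tensor([-50.0, 0.5, 0.9, 1.2, 3.0, 40.0])
L = 1.5

t_bottom = torch.clamp(t_raw, min=0.0)                     # kill negative spikes
excess = torch.clamp(t_bottom - 1.0, min=0.0)
t_soft = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))  # asymptote toward L
t = torch.where(t_bottom <= 1.0, t_bottom, t_soft)

print(t)  # tensor([0.0000, 0.5000, 0.9000, 1.1900, 1.4997, 1.5000])
```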

## Example of the clamp preventing merge corruption

## Merge Details

### Merge Method

This model was merged using the `karcher_stock` merge method, with /workspace/models/mistralai--Mistral-Nemo-Instruct-2407 as the base.
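
The audit percentages above are consistent with a Model Stock-style blend: with t ≈ 0.8843, the base keeps 1 - t ≈ 11.57% and each of the six donors contributes t/6 ≈ 14.74% through their shared geodesic (Karcher) mean. A minimal sketch of that final blend, assuming the donor mean has already been computed (the iterative Karcher mean itself is omitted):

```py
import torch

def model_stock_blend(base: torch.Tensor, donor_mean: torch.Tensor, t: float) -> torch.Tensor:
    """Interpolate between the base weights and the Karcher mean of the donors.

    Sketch only -- the real karcher_stock derives donor_mean iteratively and
    computes t per tensor (see the t-factor patch above)."""
    return (1.0 - t) * base + t * donor_mean

# With t = 0.8843: base keeps 11.57%, each of the 6 donors gets 14.74%.
t = 0.8843
print(f"base: {1 - t:.2%}, per donor: {t / 6:.2%}")  # base: 11.57%, per donor: 14.74%
```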

### Models Merged

The following models were included in the merge:
* /workspace/models/Vortex5--Wicked-Nebula-12B
* /workspace/models/Vortex5--Celestial-Queen-12B
* /workspace/models/Vortex5--Moonlit-Mirage-12B
* /workspace/models/Vortex5--Stellar-Witch-12B
* /workspace/models/Vortex5--Prototype-X-12b
* /workspace/models/Vortex5--Crimson-Constellation-12B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
architecture: MistralForCausalLM
base_model: /workspace/models/mistralai--Mistral-Nemo-Instruct-2407
models:
  - model: /workspace/models/Vortex5--Prototype-X-12b
  - model: /workspace/models/Vortex5--Celestial-Queen-12B
  - model: /workspace/models/Vortex5--Wicked-Nebula-12B
  - model: /workspace/models/Vortex5--Stellar-Witch-12B
  - model: /workspace/models/Vortex5--Moonlit-Mirage-12B
  - model: /workspace/models/Vortex5--Crimson-Constellation-12B
merge_method: karcher_stock # v8
parameters:
  filter_wise: true
  max_iter: 10000
  min_iter: 1000
  tol: 1.0e-11
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: 👻 Geodesic Phantom 12B
```