---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
- Vortex5/Prototype-X-12b
- Vortex5/Stellar-Witch-12B
- Vortex5/Celestial-Queen-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Wicked-Nebula-12B
library_name: transformers
tags:
- mergekit
- merge
- mistral
- nemo
- karcher_stock
widget:
- text: "Geodesic-Phantom-12B"
output:
url: https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg
---
# 👻 Geodesic Phantom 12B

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
This was merged in 7 hours on a RunPod A40 using an [adaptive VRAM chunking script](https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18_runpod_A40.py) (based on `measure.py` by [GrimJim](https://huggingface.co/grimjim)).
```bat
WARNING:mergekit.graph:OOM at chunk 65536, reducing to 32768 (attempt 1, progress: 0/131075)
WARNING:mergekit.graph:OOM at chunk 32768, reducing to 16384 (attempt 2, progress: 0/131075)
[Karcher_Stock Audit] Layer: lm_head.weight
Stats: Cos(θ): 0.564 | t-factor: 0.8843 | Karcher Iters: 2960
(Base)  mistralai--Mistral-Nemo-Instruct-2407    : █████   ( 11.57%)
(Donor) Vortex5--Prototype-X-12b                 : ███████ ( 14.74%)
(Donor) Vortex5--Stellar-Witch-12B               : ███████ ( 14.74%)
(Donor) Vortex5--Celestial-Queen-12B             : ███████ ( 14.74%)
(Donor) Vortex5--Moonlit-Mirage-12B              : ███████ ( 14.74%)
(Donor) Vortex5--Crimson-Constellation-12B       : ███████ ( 14.74%)
(Donor) Vortex5--Wicked-Nebula-12B               : ███████ ( 14.74%)
```
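For reference, the audit percentages follow from the reported t-factor: assuming the standard Model Stock interpolation (merged = t · mean(donors) + (1 − t) · base), the base model keeps 1 − t of the weight and the remaining t is split evenly across the six donors. A quick sanity check against the `lm_head.weight` numbers above:
```py
# Sanity check of the lm_head.weight audit above (not part of the merge script).
t = 0.8843        # t-factor reported for lm_head.weight
n_donors = 6

base_share = 1.0 - t          # 0.1157  -> 11.57%
donor_share = t / n_donors    # 0.14738 -> 14.74%

print(f"base  : {base_share:.2%}")   # base  : 11.57%
print(f"donor : {donor_share:.2%}")  # donor : 14.74%
```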
The following patch was also required for this merge:
## `karcher_stock` Adaptive Tanh Soft-Clamp v11
```py
# ── 11. Model Stock t factor with Adaptive Soft-Clamp ─────────────
# Context: `ws_2d` is the list of per-model weight tensors being merged (N of
# them) and `cos_theta` the cosine-similarity term computed earlier in the
# karcher_stock routine; torch is already imported there.
N = len(ws_2d)
ct = cos_theta.unsqueeze(-1) if cos_theta.dim() > 0 else cos_theta
# Raw Model Stock formula: t = N*cos(θ) / (1 + (N-1)*cos(θ))
denom = 1.0 + (N - 1) * ct
# Add a tiny epsilon to prevent literal division by zero
t_raw = (N * ct) / denom.clamp(min=1e-6)
# --- BULLETPROOF TANH CLAMP ---
# 1. Prevent negative infinity spikes (fall back to the base model)
t_clamped_bottom = torch.clamp(t_raw, min=0.0)
# 2. Smoothly asymptote positive spikes to L (maximum allowed t-factor)
L = 1.5
excess = torch.clamp(t_clamped_bottom - 1.0, min=0.0)
t_soft_top = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))
# 3. Apply: if t <= 1.0, use the exact math; if t > 1.0, use the soft curve.
t = torch.where(t_clamped_bottom <= 1.0, t_clamped_bottom, t_soft_top)
# ------------------------------
```
## Example of the clamp preventing merge corruption
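As a minimal illustration (with made-up `t_raw` values, not numbers from the actual merge logs), the snippet below runs a few extreme spikes through the clamp logic from the patch above:
```py
import torch

# Hypothetical raw t-factors: a negative spike (near-antiparallel models),
# some normal values, and a large positive spike.
t_raw = torch.tensor([-4.0, 0.3, 0.95, 1.2, 8.0])

L = 1.5  # maximum allowed t-factor, as in the patch above
t_bot = torch.clamp(t_raw, min=0.0)
excess = torch.clamp(t_bot - 1.0, min=0.0)
t_soft = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))
t = torch.where(t_bot <= 1.0, t_bot, t_soft)

print(t)  # tensor([0.0000, 0.3000, 0.9500, 1.1900, 1.5000])
# -4.0 -> 0.0   (falls back to the base model instead of extrapolating)
#  8.0 -> ~1.5  (asymptotes to L instead of blowing up the merged weights)
```
Without the clamp, the -4.0 and 8.0 spikes would push the merged tensor far outside the span of the source models, which is the corruption the patch guards against.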

## Merge Details
### Merge Method
This model was merged using the `karcher_stock` merge method, with /workspace/models/mistralai--Mistral-Nemo-Instruct-2407 as the base.
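For readers unfamiliar with the method, the rough idea behind a Karcher-mean + Model Stock hybrid is sketched below. This is an illustrative approximation only (the function names, the per-tensor treatment, the choice of cosine statistic, and the simplified hard clamp are assumptions, not the actual mergekit `karcher_stock` implementation): donor tensors are averaged along the geodesic on the unit hypersphere (Karcher mean), and the result is pulled toward the base model by the Model Stock t-factor shown earlier.
```py
import torch

def karcher_mean(units, max_iter=1000, tol=1e-11):
    """Riemannian (Karcher) mean of unit vectors on the hypersphere."""
    mu = torch.stack(units).mean(dim=0)
    mu = mu / mu.norm().clamp_min(1e-12)
    for _ in range(max_iter):
        # Log-map every point into the tangent space at mu, average, exp-map back.
        tangents = []
        for u in units:
            cos = torch.clamp(torch.dot(mu, u), -1.0, 1.0)
            theta = torch.arccos(cos)
            scale = theta / torch.sin(theta) if theta > 1e-12 else torch.ones(())
            tangents.append(scale * (u - cos * mu))
        step = torch.stack(tangents).mean(dim=0)
        norm = step.norm()
        if norm < tol:
            break
        mu = torch.cos(norm) * mu + torch.sin(norm) * step / norm
        mu = mu / mu.norm().clamp_min(1e-12)
    return mu

def karcher_stock_tensor(base, donors, L=1.5):
    """Toy per-tensor blend: Karcher-mean direction of the donors, pulled
    toward the base by a Model Stock t-factor (hard-clamped to L here)."""
    b = base.flatten()
    ds = [d.flatten() for d in donors]
    units = [d / d.norm().clamp_min(1e-12) for d in ds]
    mean_dir = karcher_mean(units)
    donor_mean = mean_dir * torch.stack([d.norm() for d in ds]).mean()
    # Average donor/base cosine (one of several possible choices) drives t.
    cos = torch.stack([
        torch.dot(d, b) / (d.norm() * b.norm()).clamp_min(1e-12) for d in ds
    ]).mean()
    N = len(ds)
    t = (N * cos) / (1.0 + (N - 1) * cos).clamp(min=1e-6)
    t = torch.clamp(t, 0.0, L)  # simplified; the real patch uses the tanh soft-clamp
    return (t * donor_mean + (1.0 - t) * b).reshape(base.shape)
```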
### Models Merged
The following models were included in the merge:
* /workspace/models/Vortex5--Wicked-Nebula-12B
* /workspace/models/Vortex5--Celestial-Queen-12B
* /workspace/models/Vortex5--Moonlit-Mirage-12B
* /workspace/models/Vortex5--Stellar-Witch-12B
* /workspace/models/Vortex5--Prototype-X-12b
* /workspace/models/Vortex5--Crimson-Constellation-12B
### Configuration
The following YAML configuration was used to produce this model:
```yaml
architecture: MistralForCausalLM
base_model: /workspace/models/mistralai--Mistral-Nemo-Instruct-2407
models:
- model: /workspace/models/Vortex5--Prototype-X-12b
- model: /workspace/models/Vortex5--Celestial-Queen-12B
- model: /workspace/models/Vortex5--Wicked-Nebula-12B
- model: /workspace/models/Vortex5--Stellar-Witch-12B
- model: /workspace/models/Vortex5--Moonlit-Mirage-12B
- model: /workspace/models/Vortex5--Crimson-Constellation-12B
merge_method: karcher_stock # v8
parameters:
  filter_wise: true
  max_iter: 10000
  min_iter: 1000
  tol: 1.0e-11
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: 👻 Geodesic Phantom 12B
```