---
base_model: 
- mistralai/Mistral-Nemo-Instruct-2407   
- Vortex5/Prototype-X-12b
- Vortex5/Stellar-Witch-12B
- Vortex5/Celestial-Queen-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Wicked-Nebula-12B
library_name: transformers
tags:
- mergekit
- merge
- mistral
- nemo
- karcher_stock
widget:
  - text: "Geodesic-Phantom-12B"
    output:
      url: https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg
---
# πŸ‘» Geodesic Phantom 12B

![geodesic-phantom](https://cdn-uploads.huggingface.co/production/uploads/69e46bb84df2a2575b60a527/7tnIXKdUUtGLGkbcGPRGK.jpeg)

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The merge ran for 7 hours on a RunPod A40 using an [adaptive VRAM chunking script](https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18_runpod_A40.py) (based on `measure.py` by [GrimJim](https://huggingface.co/grimjim)) that halves the working chunk size whenever it hits an out-of-memory error:

```text
WARNING:mergekit.graph:OOM at chunk 65536, reducing to 32768 (attempt 1, progress: 0/131075)
WARNING:mergekit.graph:OOM at chunk 32768, reducing to 16384 (attempt 2, progress: 0/131075)

[Karcher_Stock Audit] Layer: lm_head.weight
Stats: Cos(ΞΈ): 0.564 | t-factor: 0.8843 | Karcher Iters: 2960
  (Base)  mistralai--Mistral-Nemo-Instruct-2407               : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                              ( 11.57%)
  (Donor) Vortex5--Prototype-X-12b                            : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
  (Donor) Vortex5--Stellar-Witch-12B                          : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
  (Donor) Vortex5--Celestial-Queen-12B                        : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
  (Donor) Vortex5--Moonlit-Mirage-12B                         : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
  (Donor) Vortex5--Crimson-Constellation-12B                  : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
  (Donor) Vortex5--Wicked-Nebula-12B                          : β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                            ( 14.74%)
```
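
The warnings above show the chunk size halving on OOM. A minimal sketch of that halve-on-OOM retry pattern (function and variable names here are hypothetical, not the actual script's API):

```py
import torch

def run_chunked(op, tensor, chunk=65536, min_chunk=1024):
    """Apply `op` to `tensor` in chunks along dim 0, halving the chunk
    size and retrying whenever CUDA runs out of memory."""
    attempt = 0
    while True:
        try:
            return torch.cat([op(part) for part in tensor.split(chunk)])
        except torch.cuda.OutOfMemoryError:
            if chunk <= min_chunk:
                raise  # give up: even the smallest chunk does not fit
            attempt += 1
            chunk //= 2
            torch.cuda.empty_cache()
            print(f"OOM, reducing chunk to {chunk} (attempt {attempt})")
```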

The following patch was also required for this merge:

## `karcher_stock` Adaptive Tanh Soft-Clamp v11

```py
import torch

# ── 11. Model Stock t factor with Adaptive Soft-Clamp ─────────────
# Excerpt from the patched merge routine; assumed in scope:
#   ws_2d     - list of donor weight tensors being merged
#   cos_theta - per-layer cosine-similarity statistic computed earlier
N = len(ws_2d)
ct = cos_theta.unsqueeze(-1) if cos_theta.dim() > 0 else cos_theta

# Raw Model Stock formula: t = N*cos(theta) / (1 + (N-1)*cos(theta))
denom = 1.0 + (N - 1) * ct
# Clamp the denominator away from zero to avoid division blow-ups
t_raw = (N * ct) / denom.clamp(min=1e-6)

# --- BULLETPROOF TANH CLAMP ---
# 1. Prevent negative spikes (fall back to the base model)
t_clamped_bottom = torch.clamp(t_raw, min=0.0)

# 2. Smoothly asymptote positive spikes toward L, the maximum allowed t factor
L = 1.5
excess = torch.clamp(t_clamped_bottom - 1.0, min=0.0)
t_soft_top = 1.0 + (L - 1.0) * torch.tanh(excess / (L - 1.0))

# 3. Apply: if t <= 1.0 keep the exact value; if t > 1.0 use the soft curve
t = torch.where(t_clamped_bottom <= 1.0, t_clamped_bottom, t_soft_top)
# ------------------------------
```
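
As a sanity check, plugging the audited `lm_head.weight` numbers into the formula above roughly reproduces the listed shares (small differences come from display rounding and the filter-wise averaging). This assumes `len(ws_2d)` counts the six donors, with the base receiving the remaining `1 - t`:

```py
import math

# Model Stock t factor for the audited lm_head.weight layer
N = 6                    # assumption: len(ws_2d) counts the six donors
cos_theta = 0.564        # Cos(theta) reported in the audit
t = (N * cos_theta) / (1.0 + (N - 1) * cos_theta)
print(f"t-factor    : {t:.4f}")      # ~0.886  (audit: 0.8843)
print(f"base share  : {1 - t:.2%}")  # ~11.4%  (audit: 11.57%)
print(f"donor share : {t / N:.2%}")  # ~14.8%  (audit: 14.74% each)

# Soft-clamp behaviour for a pathological layer with t_raw >> 1:
t_raw = 4.0
t_soft = 1.0 + 0.5 * math.tanh((t_raw - 1.0) / 0.5)
print(f"t_soft      : {t_soft:.4f}")  # ~1.5000: capped near L = 1.5
```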

## Example of the clamp preventing merge corruption
![tanh_clamp](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/eRdxOMhKsRysDgP-6Pkw0.png)

## Merge Details
### Merge Method

This model was merged using the `karcher_stock` merge method, with `/workspace/models/mistralai--Mistral-Nemo-Instruct-2407` as the base.
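
As the name suggests, `karcher_stock` combines the Model Stock t factor with a Karcher (Riemannian) mean of the donors on the hypersphere; the `max_iter`/`min_iter`/`tol` parameters in the configuration below bound that iteration. A minimal sketch of a Karcher mean on the unit sphere, assuming flattened 1-D weight vectors (illustrative only, not mergekit's actual implementation):

```py
import torch

def karcher_mean_sphere(ws, max_iter=10000, min_iter=1000, tol=1e-11):
    """Karcher (Riemannian) mean of 1-D weight vectors on the unit sphere."""
    units = [w / w.norm().clamp(min=1e-12) for w in ws]
    mu = torch.stack(units).mean(dim=0)
    mu = mu / mu.norm().clamp(min=1e-12)
    for i in range(max_iter):
        # Log map: lift each point into the tangent space at mu
        tangents = []
        for u in units:
            cos = torch.clamp(torch.dot(mu, u), -1.0, 1.0)
            theta = torch.arccos(cos)
            perp = u - cos * mu
            tangents.append(theta * perp / perp.norm().clamp(min=1e-12))
        step = torch.stack(tangents).mean(dim=0)
        s = step.norm()
        # Exp map: walk along the geodesic in the mean tangent direction
        if s > 0:
            mu = torch.cos(s) * mu + torch.sin(s) * step / s
        # Iterate at least min_iter times, then stop once the update is tiny
        if i + 1 >= min_iter and s < tol:
            break
    return mu
```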

### Models Merged

The following models were included in the merge:
* `/workspace/models/Vortex5--Wicked-Nebula-12B`
* `/workspace/models/Vortex5--Celestial-Queen-12B`
* `/workspace/models/Vortex5--Moonlit-Mirage-12B`
* `/workspace/models/Vortex5--Stellar-Witch-12B`
* `/workspace/models/Vortex5--Prototype-X-12b`
* `/workspace/models/Vortex5--Crimson-Constellation-12B`

### Configuration

The following YAML configuration was used to produce this model:

```yaml
architecture: MistralForCausalLM
base_model: /workspace/models/mistralai--Mistral-Nemo-Instruct-2407
models:
  - model: /workspace/models/Vortex5--Prototype-X-12b
  - model: /workspace/models/Vortex5--Celestial-Queen-12B
  - model: /workspace/models/Vortex5--Wicked-Nebula-12B
  - model: /workspace/models/Vortex5--Stellar-Witch-12B
  - model: /workspace/models/Vortex5--Moonlit-Mirage-12B
  - model: /workspace/models/Vortex5--Crimson-Constellation-12B
merge_method: karcher_stock # v8
parameters:  
  filter_wise: true
  max_iter: 10000
  min_iter: 1000
  tol: 1.0e-11
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: πŸ‘» Geodesic Phantom 12B
```
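
A configuration like this is normally fed to mergekit's `mergekit-yaml` CLI. A hypothetical invocation (the config filename and output path are placeholders, and the patched `karcher_stock` method must be available in the installed mergekit):

```sh
mergekit-yaml geodesic-phantom.yaml /workspace/output --cuda
```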