Lgr54HFi
/

chimera-gguf-import

Lgr54HFi committed 16 days ago

Commit

a4a6375

verified ·

1 Parent(s): 74be8b4

Upload README.md

Files changed (1)

README.md +98 -72

README.md CHANGED

@@ -1,117 +1,122 @@
-# Chimera GGUF Import
-**Universal weight importer**: convert ANY GGUF model (any quantization, any architecture) into Chimera 5.1 ternary format.
-## What it does
 ```
-GGUF (Q4_0/Q5_1/Q8_0/F16/F32/BF16...)
-  → Dequantize to FP32 (lossless for the given quant)
-  → Smart noise reduction (outlier-aware, median-robust)
-  → Ternary conversion {-1, 0, +1} with per-row AbsMean
-  → 2-bit pack (4 weights/byte = 16× memory reduction)
-  → Chimera 5.1 checkpoint (.pt)
 ```
-## Install
 ```bash
 pip install gguf torch numpy
 ```
-## Usage
 ```bash
 python gguf_import.py \
-    --gguf /path/to/any-model.gguf \
-    --config /path/to/chimera/config.json \
     --scale tiny \
     --output ./imported_chimera.pt
 ```
-### With noise reduction tuning
 ```bash
 python gguf_import.py \
     --gguf model.gguf \
     --config config.json \
-    --scale small \
-    --noise-method outlier_clip \
-    --noise-sigma 2.5 \
-    --output ./model_chimera.pt
 ```
-### Supported GGUF quantizations
-| Type | Status |
-|---|---|
-| F32 | ✅ Direct |
-| F16 | ✅ Cast |
-| BF16 | ✅ Cast |
-| Q8_0 / Q8_1 | ✅ Dequantize |
-| Q5_0 / Q5_1 | ✅ Dequantize |
-| Q4_0 / Q4_1 | ✅ Dequantize |
-| Q2_K / Q3_K / Q4_K / Q5_K / Q6_K | ✅ Via `gguf.dequantize` |
-### Supported source architectures
-Any GGUF model: LLaMA, Qwen, Mistral, Phi, Gemma, DeepSeek, etc.
-## How the conversion works
-### 1. Noise reduction
-Before ternary conversion, weights are pre-processed to remove outliers that would distort the AbsMean scale:
-- **`outlier_clip`** (default): clip values beyond `mean ± 3σ`. Keeps 99.7% of mass, removes extreme outliers.
-- **`median_center`**: center by median, scale by MAD. Robust to heavy-tailed distributions.
-- **`none`**: passthrough (useful if source is already clean).
-### 2. Ternary conversion
-Per-row AbsMean scaling:
-```
-α_m = mean(|W_m,:|)
-W̃ = W / α
-W_q = round_STE(W̃) ∈ {-1, 0, +1}
-```
-Straight-Through Estimator (STE) allows the ternary grid to learn from the original weight distribution.
-### 3. 2-bit packing
-```
--1 → bits 10 (2)
- 0 → bits 00 (0)
-+1 → bits 01 (1)
-```
-4 weights packed into 1 uint8 byte = **16× memory reduction** vs FP32.
-### 4. Shape adaptation
-When source and Chimera dimensions differ (e.g., 4096 → 256 hidden size), weights are resized via bilinear interpolation, preserving the spatial structure of the weight distribution.
-## Output format
-```python
-ckpt = torch.load("imported_chimera.pt")
-ckpt["model"]          # state_dict with FP32 latent weights for BitLinear + norms/embeds
-ckpt["config"]         # Chimera config used
-ckpt["source"]         # GGUF path, scale, noise params
-```
-## Architecture note
-Chimera 5.1 uses hybrid recurrent layers (GatedDeltaNet, xLSTM mLSTM, Titans MAC, TSP Span Knot) — NOT standard transformer attention. Weight mapping from standard transformer QKV → Chimera attention is inherently lossy. The importer maps:
-- **Embeddings** (`token_embd` → `embed`)
-- **Output** (`output` → `lm_head`)
-- **Norms** (`attn_norm`, `ffn_norm` → `attn_norm`, `mlp_norm`)
-- **MLP** (`ffn_gate`, `ffn_up`, `ffn_down` → `mlp.gate_proj`, `mlp.up_proj`, `mlp.down_proj`)
-- **Attention projections** (`attn_q`, `attn_k`, `attn_v`, `attn_output` → `attn.q_proj`, `k_proj`, `v_proj`, `o_proj`)
-After import, **fine-tuning is strongly recommended** to adapt the transplanted weights to Chimera's recurrence dynamics. Use the patched `train.py` with MeZO:
 ```bash
 OMP_NUM_THREADS=20 python train.py \
@@ -120,3 +125,24 @@ OMP_NUM_THREADS=20 python train.py \
   --max_tokens 50000000 --compile --no-bf16 --num_workers 0 \
   --output_dir ./finetune_imported
 ```

+# Chimera GGUF Import — v2.0 Optimized
+Universal importer: converts **any GGUF model** (any quantization, any architecture) into a Chimera 5.1-compatible checkpoint.
+## What the script does
 ```
+GGUF (Q4_0, Q5_1, Q8_0, F16, F32, BF16...)
+  → Dequantize to FP32 via gguf.dequantize()
+  → Noise reduction (outlier-aware, per-row or global)
+  → Ternary conversion {-1, 0, +1} with per-row AbsMean
+  → 2-bit packing (4 weights/byte = 16× memory reduction vs FP32)
+  → Chimera 5.1 checkpoint (.pt)
 ```
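The ternary step in the pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `gguf_import.py`: the function name `ternarize_rows`, the `eps` guard, and the way `threshold` is applied (mirroring the `--ternary-threshold 0.5` flag shown in the full example) are all assumptions.

```python
import numpy as np

def ternarize_rows(w, threshold=0.5, eps=1e-8):
    """Hypothetical sketch: per-row AbsMean scaling, then snap to {-1, 0, +1}."""
    w = np.asarray(w, dtype=np.float32)
    # One scale per output row: alpha_m = mean(|W[m, :]|)
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    scaled = w / (alpha + eps)  # eps avoids division by zero on all-zero rows
    # Values whose scaled magnitude exceeds the threshold become +/-1, the rest 0
    q = np.where(scaled > threshold, 1, np.where(scaled < -threshold, -1, 0))
    return q.astype(np.int8), alpha.squeeze(1)

q, alpha = ternarize_rows([[0.9, -0.1, 0.05, -1.0], [0.0, 0.0, 0.0, 0.0]])
```

The dequantized approximation of a row is then `alpha[m] * q[m]`, which is why one scale per row has to be stored next to the ternary codes.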
+## Installation
 ```bash
 pip install gguf torch numpy
 ```
+## Quick start
 ```bash
 python gguf_import.py \
+    --gguf /path/to/any-model.gguf \
+    --config /path/to/chimera/config.json \
     --scale tiny \
     --output ./imported_chimera.pt
 ```
+## Storage modes
+| Mode | Description | When to use |
+|---|---|---|
+| `fp32` (default) | Saves Chimera-native FP32 latent weights. Compatible with `Chimera51ForCausalLM.load_state_dict()`. | **Recommended**: the simplest option. |
+| `packed` | Saves only `packed_weight` + `alpha` for linear layers. **Requires a custom loader** in Chimera. | Experimental: ultra-compact checkpoint. |
+| `both` | Saves both the FP32 weights and `packed_weight` + `alpha`. | For gradual migration to packed. |
 ```bash
+# Packed mode (experimental)
 python gguf_import.py \
     --gguf model.gguf \
     --config config.json \
+    --scale tiny \
+    --storage packed \
+    --output ./chimera_packed.pt
 ```
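To make the `packed` mode concrete, here is a hedged sketch of 2-bit packing and unpacking. It uses the code assignment listed in the previous README revision (`-1 → 10`, `0 → 00`, `+1 → 01`); whether v2.0 keeps that assignment, and the bit order within each byte (first weight in the lowest two bits), are assumptions. The function names are hypothetical.

```python
import numpy as np

def pack_ternary(q):
    """Pack ternary values {-1, 0, +1} into uint8, four weights per byte."""
    q = np.asarray(q, dtype=np.int8).ravel()
    pad = (-len(q)) % 4                      # pad with zeros to a multiple of 4
    q = np.concatenate([q, np.zeros(pad, dtype=np.int8)])
    codes = np.where(q < 0, 2, q).astype(np.uint8)   # -1 -> 0b10, 0 -> 0b00, +1 -> 0b01
    codes = codes.reshape(-1, 4)
    # Assumed layout: weight 0 in bits 0-1, weight 1 in bits 2-3, and so on
    return codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary; n is the original number of weights."""
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = ((packed[:, None] >> shifts) & 3).astype(np.int8)
    return np.where(codes == 2, -1, codes).astype(np.int8).ravel()[:n]

packed = pack_ternary([1, -1, 0, 1, -1])
restored = unpack_ternary(packed, 5)
```

A custom loader for `packed` checkpoints would need the original weight count (or shape) and the per-row `alpha` in addition to the packed bytes, since padding and scale are not recoverable from the bytes alone.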
+## Configurable noise reduction
+| Method | Description | Default |
+|---|---|---|
+| `row_outlier_clip` | Per-row clip at `mean ± 3σ`; preserves the local structure of the weights. | **✅ default** |
+| `global_clip` | Global clip at `mean ± σ`; more aggressive, less granular. | |
+| `median_center` | Centers by median and scales by MAD; robust to heavy-tailed distributions. | |
+| `none` | Passthrough, for sources that are already clean. | |
+```bash
+python gguf_import.py \
+    --gguf model.gguf \
+    --config config.json \
+    --noise-method row_outlier_clip \
+    --noise-sigma 2.5 \
+    --output ./model_chimera.pt
+```
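The difference between the per-row and global clipping methods can be illustrated as follows. This is a sketch under assumptions: the function names reuse the method names from the table, `sigma` stands in for `--noise-sigma`, and the exact statistics used by the script (population vs sample standard deviation, for instance) are not confirmed by the README.

```python
import numpy as np

def row_outlier_clip(w, sigma=3.0):
    """Clip each row to its own mean +/- sigma * std."""
    w = np.asarray(w, dtype=np.float32)
    mean = w.mean(axis=1, keepdims=True)
    std = w.std(axis=1, keepdims=True)
    return np.clip(w, mean - sigma * std, mean + sigma * std)

def global_clip(w, sigma=3.0):
    """Clip the whole tensor to a single mean +/- sigma * std band."""
    w = np.asarray(w, dtype=np.float32)
    mean, std = w.mean(), w.std()
    return np.clip(w, mean - sigma * std, mean + sigma * std)

# Row 0 has one large outlier, row 1 is constant
w = np.array([[0.0] * 9 + [100.0], [1.0] * 10], dtype=np.float32)
per_row = row_outlier_clip(w, sigma=1.0)
overall = global_clip(w, sigma=1.0)
```

Per-row clipping leaves the constant row untouched because its own spread is zero, while the global variant judges every row against the statistics of the full tensor.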
+## Resize strategies
+| Strategy | Description | Default |
+|---|---|---|
+| `crop_pad` | Copies the overlapping region and initializes the remainder with Gaussian noise of the same std. | **✅ default** |
+| `interpolate` | Bilinear interpolation (preserves spatial structure). | |
+| `strict` | Fails unless the shapes match exactly. | |
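A rough sketch of what `crop_pad` describes, for 2-D weights only. The zero mean of the fill, the use of the source tensor's overall std, and the `seed` parameter are assumptions made for the illustration; the function name simply reuses the strategy name.

```python
import numpy as np

def crop_pad(src, target_shape, seed=0):
    """Copy the overlapping top-left region; fill the rest with Gaussian noise."""
    src = np.asarray(src, dtype=np.float32)
    rng = np.random.default_rng(seed)
    # Assumption: zero-mean noise with the same std as the source tensor
    out = rng.normal(0.0, float(src.std()), size=target_shape).astype(np.float32)
    rows = min(src.shape[0], target_shape[0])
    cols = min(src.shape[1], target_shape[1])
    out[:rows, :cols] = src[:rows, :cols]   # crop where src is larger, pad elsewhere
    return out

src = np.arange(6, dtype=np.float32).reshape(2, 3)
resized = crop_pad(src, (3, 2))   # one dimension grows, the other shrinks
```

Compared with `interpolate`, this keeps the copied entries bit-identical to the source, at the cost of discarding everything outside the overlap.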
+## Auto-transpose
+Automatically detects when the source and target dimensions are swapped (`[out, in]` vs `[in, out]`) and transposes silently.
+Disable with `--no-auto-transpose`.
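The shape check behind auto-transpose might look like the sketch below. The function name `maybe_transpose` and its return convention are hypothetical; only the behaviour (transpose when the 2-D shape is exactly reversed) follows the description above.

```python
import numpy as np

def maybe_transpose(w, target_shape, auto_transpose=True):
    """Return (tensor, transposed_flag); transpose only when shapes are reversed."""
    target_shape = tuple(target_shape)
    if w.shape == target_shape:
        return w, False                      # already in the expected orientation
    if auto_transpose and w.ndim == 2 and w.shape[::-1] == target_shape:
        return w.T, True                     # [in, out] stored where [out, in] is expected
    return w, False                          # leave it to the resize strategy

w = np.zeros((2, 3), dtype=np.float32)
fixed, flipped = maybe_transpose(w, (3, 2))
```

Note that a shape check cannot detect a swapped square matrix, since `[n, n]` looks identical in both orientations; such tensors pass through unchanged in this sketch.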
+## Supported GGUF quantizations
+| Type | Status |
+|---|---|
+| F32 | ✅ Direct |
+| F16 | ✅ Cast |
+| BF16 | ✅ Cast |
+| Q8_0 / Q8_1 | ✅ Dequantize |
+| Q5_0 / Q5_1 | ✅ Dequantize |
+| Q4_0 / Q4_1 | ✅ Dequantize |
+| Q2_K / Q3_K / Q4_K / Q5_K / Q6_K | ✅ Via `gguf.dequantize` |
+## Supported source architectures
+Any GGUF model: LLaMA, Qwen, Mistral, Phi, Gemma, DeepSeek, etc.
+## GGUF → Chimera mapping
+| GGUF source | Chimera target |
+|---|---|
+| `token_embd` | `embed.weight` |
+| `output` | `lm_head.weight` |
+| `output_norm` | `norm.weight` |
+| `blk.N.attn_q/k/v/output` | `layers.N.attn.q/k/v/o_proj` |
+| `blk.N.ffn_gate/up/down` | `layers.N.mlp.gate/up/down_proj` |
+| `blk.N.attn_norm` | `layers.N.attn_norm` |
+| `blk.N.ffn_norm` | `layers.N.mlp_norm` |
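The mapping table can be expressed as a small lookup plus one regular expression. This is an illustrative sketch, not the importer's code: `map_name` is a hypothetical helper, and appending `.weight` to per-block targets is an assumption (the table only shows the suffix on the global entries).

```python
import re

# Global tensors, taken from the table above
GLOBAL = {
    "token_embd": "embed.weight",
    "output": "lm_head.weight",
    "output_norm": "norm.weight",
}
# Per-block tensors: GGUF suffix -> Chimera sub-module
BLOCK = {
    "attn_q": "attn.q_proj", "attn_k": "attn.k_proj",
    "attn_v": "attn.v_proj", "attn_output": "attn.o_proj",
    "ffn_gate": "mlp.gate_proj", "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
    "attn_norm": "attn_norm", "ffn_norm": "mlp_norm",
}
BLOCK_RE = re.compile(r"blk\.(\d+)\.(\w+)")

def map_name(gguf_name):
    """Translate a GGUF tensor name to a Chimera key, or None if unmapped."""
    stem = gguf_name[: -len(".weight")] if gguf_name.endswith(".weight") else gguf_name
    if stem in GLOBAL:
        return GLOBAL[stem]
    m = BLOCK_RE.fullmatch(stem)
    if m and m.group(2) in BLOCK:
        # Assumption: per-block targets also end in ".weight"
        return f"layers.{m.group(1)}.{BLOCK[m.group(2)]}.weight"
    return None

print(map_name("blk.3.attn_q.weight"))   # layers.3.attn.q_proj.weight
```

Tensors that return `None` here have no counterpart in the table and would be skipped on import.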
+## Missing keys
+By default (that is, unless `--no-init-missing` is passed), every Chimera layer absent from the source GGUF is initialized automatically:
+- Norms: `torch.ones(...)`
+- Embeddings/head: `normal_(0, 0.02)`
+- BitLinear layers: `normal_(0, sqrt(2/fan_in))`, then ternarization
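The three initialization rules above can be mirrored in NumPy as a quick sanity check. This is a sketch only: the README describes the rules in terms of `torch` calls, `init_missing` and its `kind` argument are hypothetical, `fan_in` is assumed to be the second dimension of an `[out, in]` weight, and the ternarization that follows for BitLinear layers is left out.

```python
import math
import numpy as np

def init_missing(shape, kind, rng):
    """Create a default tensor for a Chimera key that has no GGUF source."""
    if kind == "norm":
        return np.ones(shape, dtype=np.float32)             # like torch.ones(...)
    if kind == "embed":
        return rng.normal(0.0, 0.02, size=shape).astype(np.float32)
    if kind == "linear":
        fan_in = shape[1]                                    # assumes [out, in] layout
        std = math.sqrt(2.0 / fan_in)
        return rng.normal(0.0, std, size=shape).astype(np.float32)
    raise ValueError(f"unknown kind: {kind}")

rng = np.random.default_rng(0)
norm_w = init_missing((256,), "norm", rng)
linear_w = init_missing((256, 512), "linear", rng)
```

With `fan_in = 512` the target std is `sqrt(2/512) = 0.0625`, so the freshly initialized layers start at a scale comparable to He-initialized weights before they are ternarized.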
+## After import: fine-tuning required
+Chimera uses hybrid recurrent layers (GatedDeltaNet, xLSTM, Titans, TSP Span Knot), **not** standard transformer attention. The mapping from QKV projections to recurrent layers is inherently lossy. Fine-tune with MeZO on CPU:
 ```bash
 OMP_NUM_THREADS=20 python train.py \
   --max_tokens 50000000 --compile --no-bf16 --num_workers 0 \
   --output_dir ./finetune_imported
 ```
+## Full example
+```bash
+python gguf_import.py \
+    --gguf ./models/mistral-7b-q4_0.gguf \
+    --config ./chimera/config.json \
+    --scale tiny \
+    --storage fp32 \
+    --param-dtype fp32 \
+    --noise-method row_outlier_clip \
+    --noise-sigma 3.0 \
+    --ternary-threshold 0.5 \
+    --resize-strategy crop_pad \
+    --output ./mistral_chimera_tiny.pt
+```
+## Repository
+- Hugging Face repo: [`Lgr54HFi/chimera-gguf-import`](https://huggingface.co/Lgr54HFi/chimera-gguf-import)
+- Version: 2.0-optimized