Lgr54HFi / chomera
chimera51 · custom_code · arxiv: 12 papers
main / chomera / chimera · 179 kB
1 contributor · History: 23 commits
Lgr54HFi · fix: MoE intermediate_size not scaled for tiny → 158M→4M MoE params · 6cb7b4d (verified) · 11 days ago
training · fix: MoE intermediate_size not scaled for tiny → 158M→4M MoE params · 11 days ago
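The fix above concerns deriving the tiny debug variant from the full model: if only hidden_size is shrunk, each expert's FFN keeps its full intermediate_size and the MoE parameter count barely drops. A minimal sketch of that scaling; all names here (ChimeraConfig, tiny_config, the default widths) are assumptions, not taken from this repo's config.py:

```python
from dataclasses import dataclass, replace

@dataclass
class ChimeraConfig:                # hypothetical stand-in for the repo's config
    hidden_size: int = 2048
    intermediate_size: int = 8192   # per-expert FFN width
    num_experts: int = 8

def tiny_config(base: ChimeraConfig, scale: float = 0.125) -> ChimeraConfig:
    # Both widths must shrink: scaling hidden_size alone leaves every expert
    # at full width, which is the bug the commit message describes.
    return replace(
        base,
        hidden_size=int(base.hidden_size * scale),
        intermediate_size=int(base.intermediate_size * scale),
    )
```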
__init__.py · Safe · 2.43 kB · Upload folder using huggingface_hub · 12 days ago
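Most rows below share this same commit message; upload_folder is the standard huggingface_hub call for pushing a local directory in a single commit. A sketch of what that invocation likely looked like (the local path and path_in_repo are assumptions):

```python
from huggingface_hub import upload_folder

upload_folder(
    repo_id="Lgr54HFi/chomera",
    folder_path="./chimera",   # assumed local checkout of the package
    path_in_repo="chimera",
    commit_message="Upload folder using huggingface_hub",
)
```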
__main__.py · Safe · 894 Bytes · Upload folder using huggingface_hub · 12 days ago
cli.py · Safe · 1.97 kB · Upload folder using huggingface_hub · 12 days ago
config.py · Safe · 3.11 kB · Upload folder using huggingface_hub · 12 days ago
evolution.py · Safe · 23.3 kB · perf: eliminate .item() graph breaks in evolution.py → use tensor comparisons for torch.compile compat · 12 days ago
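The evolution.py commit targets a common torch.compile pitfall: calling .item() on a GPU tensor forces a host-device sync and splits the compiled graph. Keeping the comparison as a tensor op avoids both. A minimal illustration with made-up variable names:

```python
import torch

# Before: host sync + graph break under torch.compile.
#   if candidate_fitness.item() > best_fitness.item():
#       best_fitness = candidate_fitness

# After: the selection stays on-device and traces cleanly.
def update_best(best_fitness: torch.Tensor,
                candidate_fitness: torch.Tensor) -> torch.Tensor:
    return torch.where(candidate_fitness > best_fitness,
                       candidate_fitness, best_fitness)
```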
hyper.py · Safe · 18.7 kB · Upload folder using huggingface_hub · 12 days ago
inference.py · Safe · 15.1 kB · Upload folder using huggingface_hub · 12 days ago
layers.py · Safe · 21.1 kB · Upload folder using huggingface_hub · 12 days ago
looping.py · Safe · 2.82 kB · Upload folder using huggingface_hub · 12 days ago
model.py · Safe · 15.9 kB · Skip SpanEngine/Grammar/DebtLedger during training (inference-only ops on 200K logits) · 11 days ago
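The model.py change gates decode-time logit processors behind the module's train/eval flag, since running them over ~200K-wide logit rows inside the training loop is pure overhead. A self-contained sketch of that pattern; the wrapper class is hypothetical, and only the op names come from the commit:

```python
import torch
import torch.nn as nn

class InferenceOnlyLogitOps(nn.Module):
    """Applies inference-only logit ops (SpanEngine/Grammar/DebtLedger in the
    commit) only in eval mode; training passes the logits through untouched."""

    def __init__(self, post_ops: list[nn.Module]):
        super().__init__()
        self.post_ops = nn.ModuleList(post_ops)

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        if self.training:             # set by model.train(): skip expensive ops
            return logits
        for op in self.post_ops:      # model.eval(): full decode-time pipeline
            logits = op(logits)
        return logits
```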
moe.py · Safe · 4.29 kB · Upload folder using huggingface_hub · 12 days ago
multimodal.py · Safe · 5.15 kB · Upload folder using huggingface_hub · 12 days ago
paths.py · Safe · 358 Bytes · Upload folder using huggingface_hub · 12 days ago
quantization.py · 17.4 kB · fix: NaN at step 150 → add gradient clamping to STE detach trick + lower max_grad_norm to 0.5 · 12 days ago

    The pure detach() STE passes gradients through unbounded, causing
    gradient explosion around step 140-150 when loss is still high.

    Fix: clamp the gradient contribution within the detach trick:
        w_q = clamp(w_scaled, -1, 1) + (round(clamped) - clamped).detach()
    This ensures gradients are zero outside [-1, 1] (weights already at the
    quantization boundary get no gradient push) while keeping the STE
    identity pass-through inside the valid range.

    Also reduces max_grad_norm from 1.0 to 0.5 for additional stability.

    Ref: 4-bit CPU training paper (2603.13931) uses tanh soft clipping
    for the same reason.
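The commit body gives the exact formula; transcribed into a runnable helper, with the function name and scaling convention assumed and the math kept verbatim:

```python
import torch

def ste_quantize(w_scaled: torch.Tensor) -> torch.Tensor:
    """Straight-through estimator with clamped gradients.

    Forward: round(clamp(w, -1, 1)). Backward: identity inside [-1, 1] and
    zero outside, so weights sitting at the quantization boundary stop
    receiving gradient pushes, which is the explosion mode the commit fixes.
    """
    clamped = torch.clamp(w_scaled, -1.0, 1.0)
    return clamped + (torch.round(clamped) - clamped).detach()

# Companion change from the same commit: tighter global clipping.
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
```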
tokenizer.py · Safe · 6.84 kB · Upload folder using huggingface_hub · 12 days ago