chomera / chimera

Commit History

fix: MoE intermediate_size not scaled for tiny — 158M→4M MoE params
6cb7b4d
verified

Lgr54HFi committed on
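
To illustrate why the fix above matters, a minimal sketch of how MoE expert width should track model width. All names and sizes below are hypothetical, not the repository's config, and the 158M→4M figures are not reproduced; the point is only that a fixed intermediate_size makes the expert FFNs dominate a tiny model.

```python
# Hypothetical illustration of scaling MoE expert width with model size.
# None of these names or values come from the chimera codebase.

def moe_ffn_params(hidden_size: int, intermediate_size: int,
                   num_experts: int, gated: bool = True) -> int:
    """Parameter count of the expert FFNs (ignoring router and biases)."""
    mats_per_expert = 3 if gated else 2   # gate/up/down vs. up/down projections
    return num_experts * mats_per_expert * hidden_size * intermediate_size

# Full-size config: intermediate_size derived from hidden_size.
full = moe_ffn_params(hidden_size=2048, intermediate_size=2048 * 4, num_experts=8)

# Tiny config with the bug: hidden_size shrank but intermediate_size stayed fixed.
tiny_unscaled = moe_ffn_params(hidden_size=256, intermediate_size=2048 * 4, num_experts=8)

# Tiny config with the fix: intermediate_size scales with hidden_size.
tiny_scaled = moe_ffn_params(hidden_size=256, intermediate_size=256 * 4, num_experts=8)

print(f"full: {full/1e6:.0f}M  unscaled tiny: {tiny_unscaled/1e6:.0f}M  "
      f"scaled tiny: {tiny_scaled/1e6:.0f}M")   # full: 403M  unscaled tiny: 50M  scaled tiny: 6M
```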

fix: print every step + first-step timing to diagnose slow forward
5b5a08d
verified

Lgr54HFi committed on

fix: OOM at batch=256 — cap batch by logits memory, enable grad ckpt
5bfbb8a
verified

Lgr54HFi committed on
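
A rough sketch of capping the batch size by the memory the [batch, seq, vocab] logits tensor would occupy, as the commit above describes. The helper name, the memory budget, and the sizes are illustrative assumptions, not the repository's code.

```python
import torch

def max_batch_for_logits(seq_len: int, vocab_size: int,
                         budget_bytes: int, dtype=torch.float32) -> int:
    """Largest batch whose [batch, seq_len, vocab] logits tensor fits in budget_bytes.

    Illustrative only: real usage also needs headroom for activations,
    gradients, and optimizer state.
    """
    bytes_per_sample = seq_len * vocab_size * torch.finfo(dtype).bits // 8
    return max(1, budget_bytes // bytes_per_sample)

# e.g. ~200K-entry vocab, 512-token sequences, 4 GiB set aside for logits
cap = max_batch_for_logits(seq_len=512, vocab_size=200_000, budget_bytes=4 << 30)
batch_size = min(256, cap)   # cap works out to 10 with these example numbers

# Gradient checkpointing trades compute for activation memory, e.g.:
# model.gradient_checkpointing_enable()            # HF-style API, if available
# or torch.utils.checkpoint.checkpoint(block, x)   # per-block
```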

perf: tune train_hyper_loop for 300-step convergence
9d8c566
verified

Lgr54HFi committed on

Fix loss rebound: lower Muon LR (0.02→0.008), clamp ternary latents, steeper cosine decay
e4d9588
verified

Lgr54HFi committed on
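
A minimal sketch of the ingredients named in the commit above: a lower peak LR with a steeper cosine decay, and clamping of ternary latent weights. The peak LR of 0.008 comes from the title; every other value, the decay power, and the parameter-name filter are assumptions.

```python
import math
import torch

# Illustrative only: min LR, warmup, total steps, and decay power are guesses
# around the numbers in the commit title, not chimera's actual schedule.
PEAK_LR, MIN_LR, TOTAL_STEPS, WARMUP = 0.008, 0.0008, 300, 30

def lr_at(step: int, power: float = 2.0) -> float:
    """Linear warmup, then a 'steeper' cosine decay (cosine factor raised to `power`)."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    t = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    cos = 0.5 * (1 + math.cos(math.pi * t))
    return MIN_LR + (PEAK_LR - MIN_LR) * cos ** power

print([round(lr_at(s), 4) for s in (0, 30, 150, 300)])   # [0.0, 0.008, 0.0033, 0.0008]

@torch.no_grad()
def clamp_ternary_latents(model: torch.nn.Module, limit: float = 1.0) -> None:
    """Keep latent weights of ternary layers inside [-limit, limit]."""
    for name, p in model.named_parameters():
        if "ternary" in name:   # hypothetical naming convention
            p.clamp_(-limit, limit)
```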

Skip SpanEngine/Grammar/DebtLedger during training (inference-only ops on 200K logits)
dda344d
verified

Lgr54HFi committed on

Upload chimera/training/loops.py
6d5c935
verified

Lgr54HFi committed on

Fix NaN loss reporting: show nan instead of 0.0 when all steps in window are NaN
8e41f12
verified

Lgr54HFi committed on
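
A small sketch of the reporting behaviour described above; the function name is invented and the repository's actual windowing code is not shown.

```python
import math

def window_loss(losses: list[float]) -> float:
    """Average of the finite losses in the window; nan if every step was NaN.

    Sketch of the reporting fix above: previously the all-NaN case showed 0.0.
    """
    finite = [x for x in losses if math.isfinite(x)]
    if not finite:
        return float("nan")
    return sum(finite) / len(finite)

print(window_loss([2.1, float("nan"), 1.9]))       # 2.0
print(window_loss([float("nan"), float("nan")]))   # nan
```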

Upload chimera/model.py
310c416
verified

Lgr54HFi committed on

Upload chimera/training/hyper.py
6a7521a
verified

Lgr54HFi committed on

Upload chimera/training/loops.py
edcdcb3
verified

Lgr54HFi committed on

feat: loops.py v11 — aligned with GENESIS engine, no distiller overhead
3859a82
verified

Lgr54HFi committed on

feat: loops.py — integrate Muon + MTP + EMA distillation in training loop
9897d01
verified

Lgr54HFi committed on

feat: train_hyper_loop with progressive looping, evolution loss feedback, no progressive_unfreeze default

Activates dormant ch1mera paradigms:
1. Progressive looping: 1→2→3 Parcae loops during training
2. Evolution receives prev_loss for surprise-based memory writes
3. progressive_unfreeze disabled by default (all layers train from start)
4. Logs loop count and NaN-safe averaging
b6bcd75
verified

Lgr54HFi committed on
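
A minimal sketch of a progressive loop schedule like the one described above. Only the ProgressiveLoopScheduler name (exported in the commit below) and the 1→2→3 progression come from the commits; the step thresholds and interface are guesses.

```python
class ProgressiveLoopScheduler:
    """Ramp the number of Parcae loops from 1 to 3 as training advances.

    The 1 -> 2 -> 3 progression comes from the commit messages; the milestone
    fractions and method names here are illustrative assumptions.
    """

    def __init__(self, total_steps: int, milestones=(0.3, 0.7), max_loops: int = 3):
        self.total_steps = total_steps
        self.milestones = milestones
        self.max_loops = max_loops

    def loops_at(self, step: int) -> int:
        frac = step / max(1, self.total_steps)
        loops = 1 + sum(frac >= m for m in self.milestones)
        return min(loops, self.max_loops)

sched = ProgressiveLoopScheduler(total_steps=300)
print([sched.loops_at(s) for s in (0, 100, 150, 250)])   # [1, 2, 2, 3]
```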

feat: export ProgressiveLoopScheduler
945c5bf
verified

Lgr54HFi committed on

feat: activate dormant paradigms — progressive looping, evolution with loss feedback, no progressive_unfreeze

With STE+AdamW (not MeZO), we can afford multi-loop training.
Progressive loop schedule: 1→2→3 loops as training advances.
Evolution engine now receives previous step loss for surprise detection and memory writes.
Progressive unfreeze disabled by default (counterproductive with backprop).
5fd9d22
verified

Lgr54HFi committed on
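
A hypothetical sketch of surprise-based memory writes driven by the previous step's loss, as described above. The threshold, the class, and the write API are all invented for illustration; only the idea of threading prev_loss into the evolution engine comes from the commit.

```python
class EvolutionStub:
    """Toy stand-in for an evolution engine that writes memory on surprising loss jumps."""

    def __init__(self, surprise_threshold: float = 0.5):
        self.surprise_threshold = surprise_threshold
        self.memory = []

    def step(self, hidden_summary, loss: float, prev_loss: float | None) -> None:
        if prev_loss is None:        # first step: nothing to compare against
            return
        surprise = abs(loss - prev_loss)
        if surprise > self.surprise_threshold:
            # a surprising loss change marks this state as worth remembering
            self.memory.append((surprise, hidden_summary))

# Training-loop side: thread the previous step's loss through.
evo = EvolutionStub()
prev_loss = None
for loss in [5.0, 4.9, 3.8, 3.7]:
    evo.step(hidden_summary=None, loss=loss, prev_loss=prev_loss)
    prev_loss = loss
print(len(evo.memory))   # 1: only the 4.9 -> 3.8 jump exceeds the threshold
```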

fix: loops.py — use chimera_turbo v8 defaults (wd=0.01, warmup=750, β2=0.98) instead of hardcoded values
e2f5e25
verified

Lgr54HFi committed on
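
A sketch of wiring the three defaults named above into an optimizer and warmup schedule. Only wd=0.01, warmup=750, and β2=0.98 come from the commit title; the optimizer class, base LR, and schedule shape are assumptions.

```python
import torch

WEIGHT_DECAY, WARMUP_STEPS, BETA2 = 0.01, 750, 0.98   # values from the commit title

def build_optimizer(model: torch.nn.Module, lr: float = 3e-4):
    """Illustrative AdamW + linear-warmup setup using the v8-style defaults."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr,
                            betas=(0.9, BETA2), weight_decay=WEIGHT_DECAY)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda step: min(1.0, (step + 1) / WARMUP_STEPS))
    return opt, sched
```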

fix: NaN at step 150 — add gradient clamping to STE detach trick + lower max_grad_norm to 0.5

The pure detach() STE passes gradients through unbounded, causing gradient explosion around step 140-150 when loss is still high.

Fix: clamp the gradient contribution within the detach trick:
    w_q = clamp(w_scaled, -1, 1) + (round(clamped) - clamped).detach()
This ensures gradients are zero outside [-1, 1] (weights already at the quantization boundary get no gradient push) while keeping the STE identity pass-through inside the valid range.

Also reduces max_grad_norm from 1.0 to 0.5 for additional stability.

Ref: 4-bit CPU training paper (2603.13931) uses tanh soft clipping for the same reason.
ec200d2
verified

Lgr54HFi committed on
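
The clamped STE identity quoted above, expanded into runnable form; the per-tensor scaling step, its detachment, and the tensor names are illustrative additions.

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Ternary quantization with a clamped straight-through estimator.

    Forward value: round(clamp(w / scale, -1, 1)) * scale, i.e. {-1, 0, 1} * scale.
    Backward: identity where |w / scale| <= 1 and zero where the clamp
    saturates, because the clamp is the only non-detached term.
    """
    scale = w.abs().mean().clamp(min=1e-5).detach()   # illustrative scale, detached for clarity
    w_scaled = w / scale
    clamped = w_scaled.clamp(-1, 1)
    w_q = clamped + (clamped.round() - clamped).detach()
    return w_q * scale

w = (torch.randn(4, 4) * 2).requires_grad_()
ternary_quantize_ste(w).sum().backward()
print(w.grad)   # 1.0 where |w / scale| <= 1, 0.0 where the clamp saturated
```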

perf: eliminate .item() graph breaks in evolution.py — use tensor comparisons for torch.compile compat
fc678ef
verified

Lgr54HFi committed on
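
A small before/after illustration of the change described above: .item() pulls the value into Python and forces a graph break under torch.compile, while a tensor comparison stays inside the compiled graph. The gate function itself is invented, not evolution.py's code.

```python
import torch

# Before: .item() moves the value to Python, forcing a graph break
# whenever this runs under torch.compile.
def surprise_gate_py(loss: torch.Tensor, prev_loss: torch.Tensor,
                     threshold: float = 0.5) -> bool:
    return abs(loss.item() - prev_loss.item()) > threshold

# After: stay in tensor land; the comparison is itself a tensor and can be
# consumed via torch.where / masking without leaving the compiled graph.
def surprise_gate_tensor(loss: torch.Tensor, prev_loss: torch.Tensor,
                         threshold: float = 0.5) -> torch.Tensor:
    return (loss - prev_loss).abs() > threshold

mask = surprise_gate_tensor(torch.tensor(3.8), torch.tensor(4.9))
update = torch.where(mask, torch.tensor(1.0), torch.tensor(0.0))
```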

fix: re-enable torch.compile in train_hyper_loop (STE graph breaks fixed)
f6670ea
verified

Lgr54HFi committed on

perf: replace _RoundTernarySTE autograd.Function with detach() trick — zero graph breaks for torch.compile

The detach() identity pattern (w + (round(clamp(w)) - w).detach()) is mathematically equivalent to the old STE but uses only standard aten ops that torch.compile/Inductor can trace through. This eliminates 84+ graph breaks, enabling full kernel fusion of quantize+linear.

Pattern from official BitNet b1.58 implementation (1bitLLM/bitnet_b1_58-large).
Ref: arXiv 2402.17764
31b0fdf
verified

Lgr54HFi committed on
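
For contrast, a minimal sketch of the custom-Function style this commit replaced, next to the detach() identity it quotes. Only the _RoundTernarySTE name and the detach expression come from the commit; the bodies are illustrative.

```python
import torch

# The custom-Function style being replaced: same math, but opaque to
# torch.compile, which breaks the graph at the custom backward.
class _RoundTernarySTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return w.clamp(-1, 1).round()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out   # straight-through: pass the gradient unchanged

# The replacement from the commit: same forward value, built only from
# standard ops (clamp, round, sub, add, detach) that Inductor can trace and fuse.
def round_ternary_ste(w: torch.Tensor) -> torch.Tensor:
    return w + (w.clamp(-1, 1).round() - w).detach()

x = torch.randn(3, requires_grad=True)
print(torch.allclose(_RoundTernarySTE.apply(x), round_ternary_ste(x)))   # True
```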

fix: train_hyper_loop grad_accum=1 (DataLoader already batches), better tok/s logging
31d69ba
verified

Lgr54HFi committed on

Upload folder using huggingface_hub
11c11f8
verified

Lgr54HFi committed on