aethera-gp/kotodama-108m-base

Text Generation

attention-residuals

nca-pretraining

geometric-monitoring

Eval Results (legacy)


LuxiaSL committed 3 days ago

Commit e0b18a0 (verified) · 1 parent: d264b3c

Update README.md

Files changed (1)

README.md +1 -1

@@ -62,7 +62,7 @@ The model combines three techniques not previously studied together at this scal
 - **Muon optimizer** -- spectral-norm steepest descent via Newton-Schulz orthogonalization, producing 2-4x higher stable rank than AdamW at matched loss, with Gram-NS optimized coefficients.
 **Organization:** [aethera-gp](https://huggingface.co/aethera-gp)
-**Training code:** [github.com/aethera-gp/kotodama](https://github.com/aethera-gp/kotodama) (pretraining/)
+**Training code:** [github.com/LuxiaSL/kotodama](https://github.com/LuxiaSL/kotodama)
 ## Architecture