Update README.md
README.md
CHANGED
@@ -62,7 +62,7 @@ The model combines three techniques not previously studied together at this scal
 - **Muon optimizer** -- spectral-norm steepest descent via Newton-Schulz orthogonalization, producing 2-4x higher stable rank than AdamW at matched loss, with Gram-NS optimized coefficients.
 
 **Organization:** [aethera-gp](https://huggingface.co/aethera-gp)
-**Training code:** [github.com/
+**Training code:** [github.com/LuxiaSL/kotodama](https://github.com/LuxiaSL/kotodama)
 
 ## Architecture
 