alexandretl/dragon
main
367 kB
2 contributors
History: 62 commits
alexandretl
working resume | classic input embed | nGPT logit scaling | XSA | del M3 as_strided
10aee3a
21 days ago
optimizers (2 months ago): ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
.gitattributes (1.52 kB, 8 months ago): initial commit
.gitignore (54 Bytes, 6 months ago): CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR
__init__.py (0 Bytes, 7 months ago): fixes+refactoring
compute_loss.py (16.1 kB, 5 months ago): alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
configuration_dragon.py (13.6 kB, 21 days ago): working resume | classic input embed | nGPT logit scaling | XSA | del M3 as_strided
coordcheck_utils.py (20 kB, 5 months ago): mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
coordchecking_dragon.py (4.76 kB, 5 months ago): mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
inspecting_dragon.py (12.3 kB, 5 months ago): mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
modeling_dragon.py (136 kB, 21 days ago): working resume | classic input embed | nGPT logit scaling | XSA | del M3 as_strided
nsa_utils.py (18.8 kB, 6 months ago): CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR
training_dragon.py (76.7 kB, 21 days ago): working resume | classic input embed | nGPT logit scaling | XSA | del M3 as_strided