Test-shakespeare
Trained with transformer-toolkit.
Architecture
| param | value |
|---|---|
| vocab_size | 32000 |
| dim | 512 |
| n_layers | 12 |
| n_heads | 8 |
| max_seq | 512 |
| attn | gqa |
| n_kv_heads | 2 |
| latent_dim | 64 |
| ffn | swiglu |
| hidden_dim | 1536 |
| n_experts | 8 |
| top_k | 2 |
| moe_aux_weight | 0.01 |
| moe_capacity | 1.0 |
| moe_n_shared | 2 |
| moe_n_routed | 6 |
| norm | rmsnorm |
| eps | 1e-06 |
| pos_enc | rope |
| dropout | 0.1 |
| tie_weights | False |
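
The parameters above imply a few derived quantities, such as the per-head dimension and the number of query heads that share each key/value head under GQA. The following is a minimal sketch that checks these relationships. The `ModelConfig` class is hypothetical and is not the transformer-toolkit API; it only mirrors a subset of the field names from the table. Reading `moe_n_shared + moe_n_routed` as summing to `n_experts` is also an assumption, although the listed values are consistent with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for a subset of the values in the table."""
    vocab_size: int = 32000
    dim: int = 512
    n_layers: int = 12
    n_heads: int = 8
    n_kv_heads: int = 2
    max_seq: int = 512
    hidden_dim: int = 1536
    n_experts: int = 8
    top_k: int = 2
    moe_n_shared: int = 2
    moe_n_routed: int = 6

    @property
    def head_dim(self) -> int:
        # Model width split evenly across the query heads.
        return self.dim // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Under GQA, each key/value head serves a group of query heads.
        return self.n_heads // self.n_kv_heads


cfg = ModelConfig()

# Sanity checks on the listed values.
assert cfg.dim % cfg.n_heads == 0
assert cfg.n_heads % cfg.n_kv_heads == 0
# Assumption: shared and routed experts together make up n_experts.
assert cfg.moe_n_shared + cfg.moe_n_routed == cfg.n_experts

print("head_dim:", cfg.head_dim)
print("query heads per KV head:", cfg.queries_per_kv_head)
```

With these values, each of the 2 key/value heads is shared by a group of 4 query heads, and each head works on 64 dimensions.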
Metrics
| metric | value |
|---|---|
| val_loss | 3.949221694469452 |
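
A validation loss is often easier to interpret as a perplexity. The sketch below assumes that `val_loss` is a mean per-token cross-entropy measured in nats; the model card does not state this, so treat the conversion as illustrative. For comparison, it also computes the loss of a uniform guess over the 32000-token vocabulary.

```python
import math

val_loss = 3.949221694469452  # value reported in the metrics table
vocab_size = 32000            # value reported in the architecture table

# Assumption: val_loss is mean cross-entropy per token, in nats.
perplexity = math.exp(val_loss)

# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(vocab_size)

print(f"perplexity: {perplexity:.2f}")
print(f"uniform baseline loss: {uniform_loss:.2f}")
```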