Scratch

Trained with transformer-toolkit.

Architecture

| param | value |
|---|---|
| vocab_size | 32000 |
| dim | 512 |
| n_layers | 12 |
| n_heads | 8 |
| max_seq | 512 |
| attn | gqa |
| n_kv_heads | 2 |
| latent_dim | 64 |
| ffn | swiglu |
| hidden_dim | 1536 |
| n_experts | 8 |
| top_k | 2 |
| moe_aux_weight | 0.01 |
| moe_capacity | 1.0 |
| moe_n_shared | 2 |
| moe_n_routed | 6 |
| norm | rmsnorm |
| eps | 1e-06 |
| pos_enc | rope |
| dropout | 0.1 |
| tie_weights | False |
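
The values above imply a few derived quantities that are useful when sanity-checking a loader. The following is a minimal sketch, not the transformer-toolkit API: `ScratchConfig` and its methods are hypothetical names introduced here, and the reading that shared plus routed experts add up to `n_experts` is an assumption based only on the numbers in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScratchConfig:
    """Hypothetical container for a subset of the card's hyperparameters."""
    vocab_size: int = 32000
    dim: int = 512
    n_layers: int = 12
    n_heads: int = 8
    max_seq: int = 512
    n_kv_heads: int = 2
    n_experts: int = 8
    top_k: int = 2
    moe_n_shared: int = 2
    moe_n_routed: int = 6

    @property
    def head_dim(self) -> int:
        # Per-head width, assuming dim is split evenly across query heads.
        return self.dim // self.n_heads

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads that share one key/value head under GQA.
        return self.n_heads // self.n_kv_heads

    def kv_cache_values_per_token(self) -> int:
        # Keys plus values, for every layer and every KV head.
        return 2 * self.n_layers * self.n_kv_heads * self.head_dim


cfg = ScratchConfig()

# Consistency checks on the table values.
assert cfg.dim % cfg.n_heads == 0
assert cfg.n_heads % cfg.n_kv_heads == 0
# Assumption: shared and routed experts together make up n_experts.
assert cfg.moe_n_shared + cfg.moe_n_routed == cfg.n_experts

print(cfg.head_dim, cfg.gqa_group_size, cfg.kv_cache_values_per_token())
# prints: 64 4 3072
```

With 2 KV heads instead of 8, the cache holds 3072 values per token rather than the 12288 that full multi-head attention would need at the same width and depth.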

Metrics

| metric | value |
|---|---|
| val_loss | 3.944817864894867 |
| step | 37000 |
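
To put the validation loss in more familiar terms, it can be converted to perplexity. This sketch assumes `val_loss` is a mean per-token cross-entropy in nats, which the card does not state; if the loss includes the MoE auxiliary term or uses a different base, the figure would differ.

```python
import math

val_loss = 3.944817864894867   # from the Metrics table
vocab_size = 32000             # from the Architecture table

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
perplexity = math.exp(val_loss)

# Reference point: loss of a uniform guess over the vocabulary.
uniform_loss = math.log(vocab_size)

print(f"perplexity   ~ {perplexity:.2f}")
print(f"uniform loss ~ {uniform_loss:.2f} nats")
```

Under that assumption the perplexity comes out a little under 52, well below the uniform baseline of 32000.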