# mHC-GPT Checkpoint

Trained with nanochat-mHC (multi-head communication).
## Model Config

```json
{
  "sequence_len": 2048,
  "vocab_size": 65536,
  "n_layer": 12,
  "n_head": 6,
  "n_kv_head": 6,
  "n_embd": 768,
  "mhc_enabled": true,
  "mhc_num_streams": 4,
  "mhc_sinkhorn_iters": 50,
  "mhc_sinkhorn_tau": 0.1,
  "mhc_gate_noise": false
}
```
## Training Config

```json
{
  "run": "mhc-sanity-20260117-062153",
  "seed": 42,
  "device_type": "",
  "skip_compile": false,
  "depth": 12,
  "max_seq_len": 2048,
  "mhc_enabled": true,
  "mhc_num_streams": 4,
  "mhc_sinkhorn_iters": 50,
  "mhc_sinkhorn_tau": 0.1,
  "mhc_gate_noise": false,
  "num_iterations": 5000,
  "target_flops": -1.0,
  "target_param_data_ratio": 20,
  "device_batch_size": 16,
  "total_batch_size": 131072,
  "embedding_lr": 0.2,
  "unembedding_lr": 0.004,
  "weight_decay": 0.0,
  "matrix_lr": 0.02,
  "grad_clip": 1.0,
  "warmup_ratio": 0.0,
  "warmdown_ratio": 0.2,
  "final_lr_frac": 0.0,
  "resume_from_step": -1,
  "eval_every": 500,
  "eval_tokens": 10485760,
  "core_metric_every": -1,
  "core_metric_max_per_task": 500,
  "sample_every": 100,
  "save_every": 5000,
  "model_tag": ""
}
```
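Assuming `total_batch_size` is counted in tokens per optimizer step (the nanochat convention), these numbers imply gradient accumulation on a single device. A quick back-of-envelope check:

```python
# Back-of-envelope batch accounting (assumes total_batch_size is in tokens).
max_seq_len = 2048
device_batch_size = 16        # sequences per device per micro-step
total_batch_size = 131072     # tokens per optimizer step

tokens_per_micro_step = device_batch_size * max_seq_len   # 32768 tokens
grad_accum_steps = total_batch_size // tokens_per_micro_step
print(grad_accum_steps)  # 4 micro-steps per optimizer step on one device
```

On a multi-GPU run the accumulation factor would be divided further across devices.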
## Results

- Step: 5000
- Val BPB: 0.9955225661074094
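For intuition, bits-per-byte converts directly to a per-byte perplexity of `2 ** bpb`, so a BPB just under 1.0 means the model assigns each byte slightly less than one bit of surprise on average:

```python
import math

val_bpb = 0.9955225661074094        # reported validation BPB at step 5000
per_byte_ppl = 2 ** val_bpb         # per-byte perplexity
nats_per_byte = val_bpb * math.log(2)  # same quantity in nats
print(round(per_byte_ppl, 3))       # ≈ 1.994
```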
## Usage

```python
from nanochat.checkpoint_manager import build_model

model, tokenizer = build_model("path/to/checkpoint", step=5000, device="cuda", phase="inference")
```