# mHC-GPT Checkpoint

Trained with nanochat-mHC (multi-head communication).
## Model Config

```json
{
  "sequence_len": 2048,
  "vocab_size": 65536,
  "n_layer": 12,
  "n_head": 6,
  "n_kv_head": 6,
  "n_embd": 768,
  "mhc_enabled": true,
  "mhc_num_streams": 4,
  "mhc_sinkhorn_iters": 50,
  "mhc_sinkhorn_tau": 0.1,
  "mhc_gate_noise": false
}
```
## Training Config

```json
{
  "run": "mhc-sanity-20260117-062153",
  "seed": 42,
  "device_type": "",
  "skip_compile": false,
  "depth": 12,
  "max_seq_len": 2048,
  "mhc_enabled": true,
  "mhc_num_streams": 4,
  "mhc_sinkhorn_iters": 50,
  "mhc_sinkhorn_tau": 0.1,
  "mhc_gate_noise": false,
  "num_iterations": 5000,
  "target_flops": -1.0,
  "target_param_data_ratio": 20,
  "device_batch_size": 16,
  "total_batch_size": 131072,
  "embedding_lr": 0.2,
  "unembedding_lr": 0.004,
  "weight_decay": 0.0,
  "matrix_lr": 0.02,
  "grad_clip": 1.0,
  "warmup_ratio": 0.0,
  "warmdown_ratio": 0.2,
  "final_lr_frac": 0.0,
  "resume_from_step": -1,
  "eval_every": 500,
  "eval_tokens": 10485760,
  "core_metric_every": -1,
  "core_metric_max_per_task": 500,
  "sample_every": 100,
  "save_every": 5000,
  "model_tag": ""
}
```
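Assuming `total_batch_size` is counted in tokens per optimizer step (the nanochat convention), these numbers imply gradient accumulation on a single device. A quick back-of-envelope check:

```python
# Back-of-envelope batch accounting (assumes total_batch_size is in tokens).
max_seq_len = 2048
device_batch_size = 16        # sequences per device per micro-step
total_batch_size = 131072     # tokens per optimizer step

tokens_per_micro_step = device_batch_size * max_seq_len   # 32768 tokens
grad_accum_steps = total_batch_size // tokens_per_micro_step
print(grad_accum_steps)  # 4 micro-steps per optimizer step on one device
```

On a multi-GPU run the accumulation factor would be divided further across devices.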
## Results

- Step: 5000
- Val BPB: 0.9955225661074094
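For intuition, bits-per-byte converts directly to a per-byte perplexity of `2 ** bpb`, so a BPB just under 1.0 means the model assigns each byte slightly less than one bit of surprise on average:

```python
import math

val_bpb = 0.9955225661074094        # reported validation BPB at step 5000
per_byte_ppl = 2 ** val_bpb         # per-byte perplexity
nats_per_byte = val_bpb * math.log(2)  # same quantity in nats
print(round(per_byte_ppl, 3))       # ≈ 1.994
```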
## Usage

```python
from nanochat.checkpoint_manager import build_model

model, tokenizer = build_model("path/to/checkpoint", step=5000, device="cuda", phase="inference")
```