# Llama-Hopper-Reasoning (Experimental)
This model is a research artifact from the blog post *Hopper: The Optimizer That Learns Parallelism 2x Faster Than Adam*.
It was trained to benchmark Hopper (a modified Muon optimizer with `ns_steps=1` plus variance normalization) against AdamW in a reinforcement-learning (DAPO) setting.
## Why this model exists
We found that standard Muon (`ns_steps=5`) causes entropy collapse in RL, while Adam (effectively `ns_steps=0`) falls back on linear heuristics.
This checkpoint (cp-200) demonstrates a distinctive property: it solves parallel-reasoning tasks that the Adam baseline fails, even though it remains slightly clumsy at arithmetic.
## The "Towel Test"
- Prompt: "It takes 1 hour to dry 3 towels. How long for 6 towels?"
- Adam (baseline): fails ($6 \to 2$ hours, scaling linearly).
- This model: answers correctly ($6 \to 1$ hour, recognizing that towels dry in parallel).
## Technical Details
- Base Model: Qwen/Qwen2.5-0.5B-Instruct
- Optimizer: Hopper (Muon variant)
- Newton-Schulz Steps: 1 ("lazy orthogonality")
- Checkpoint: Step 200 (early-stopped at the emergence of parallel reasoning)
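To make the optimizer settings concrete, here is a minimal sketch of a single Hopper-style update step. The Newton-Schulz coefficients are the standard Muon quintic values; the exact form of the variance-normalization step is an assumption (rescaling the orthogonalized update to unit RMS), since the blog post is the authoritative source.

```python
import numpy as np

def hopper_update(grad: np.ndarray, ns_steps: int = 1, eps: float = 1e-8) -> np.ndarray:
    """Sketch of a Hopper update: `ns_steps` Newton-Schulz iterations
    ("lazy orthogonality" when ns_steps=1) followed by an assumed
    variance-normalization step."""
    a, b, c = 3.4445, -4.7750, 2.0315  # standard Muon quintic coefficients
    X = grad / (np.linalg.norm(grad) + eps)  # pre-scale so the iteration converges
    transposed = grad.shape[0] > grad.shape[1]
    if transposed:
        X = X.T  # iterate on the short side for efficiency
    for _ in range(ns_steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # one quintic Newton-Schulz step
    if transposed:
        X = X.T
    # Variance normalization (assumed form): rescale the update to unit RMS
    return X / (X.std() + eps)
```

With `ns_steps=1` the update is only approximately orthogonal, which is the "lazy orthogonality" regime the checkpoint was trained in.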
## Uploading the Checkpoint
```python
from huggingface_hub import HfApi, login

# 1. Log in to the Hugging Face Hub
login()

# 2. Upload only the files needed for inference
api = HfApi()
repo_id = "JenWei/Llama-Hopper-Reasoning-v1"
checkpoint_path = "./hopper-cp-200"  # folder containing the checkpoint

print("🚀 Uploading ONLY inference files...")
api.upload_folder(
    folder_path=checkpoint_path,
    repo_id=repo_id,
    repo_type="model",
)
```
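To run inference on the uploaded checkpoint, a standard `transformers` loading pattern should work; this is a sketch assuming the repo id above and a checkpoint saved in the usual Hugging Face format (the function name `run_towel_test` is illustrative, not part of the release).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = "It takes 1 hour to dry 3 towels. How long for 6 towels?"

def run_towel_test(repo_id: str = "JenWei/Llama-Hopper-Reasoning-v1") -> str:
    """Download the checkpoint and generate an answer to the Towel Test."""
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": PROMPT}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Call `run_towel_test()` to download the model and print its answer; the base Qwen chat template is applied via `apply_chat_template`.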