Llama-Hopper-Reasoning (Experimental)

This model is a research artifact from the blog post: Hopper: The Optimizer That Learns Parallelism 2x Faster Than Adam.

It was trained to benchmark Hopper (a modified Muon optimizer with ns_steps=1 + Variance Normalization) against AdamW in a Reinforcement Learning (DAPO) setting.

Why this model exists

We found that standard Muon (ns_steps=5) causes entropy collapse in RL, while Adam (ns_steps=0) falls back on linear heuristics. This checkpoint (cp-200) demonstrates a unique property: it solves parallel reasoning tasks that the Adam baseline fails, even though it remains slightly clumsy at arithmetic.

The "Towel Test"

  • Prompt: "It takes 1 hour to dry 3 towels. How long for 6 towels?"
  • Adam (Baseline): Fails, scaling time linearly ($6 \to 2$ hours).
  • This Model: Solves it correctly, recognizing that the towels dry in parallel ($6 \to 1$ hour).
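The two answers correspond to two different mental models of the problem. A toy sketch of both (function names are ours, purely for illustration):

```python
import math

def linear_heuristic(base_towels, base_hours, towels):
    """Adam-style answer: drying time scales with towel count (wrong here)."""
    return base_hours * towels / base_towels

def parallel_model(base_towels, base_hours, towels, capacity=None):
    """Towels dry simultaneously, so time depends only on the number of
    batches needed (one batch if everything fits on the rack at once)."""
    capacity = capacity or towels  # assume the rack holds all towels
    return base_hours * math.ceil(towels / capacity)

# linear_heuristic(3, 1, 6) gives 2.0 hours; parallel_model(3, 1, 6) gives 1 hour.
```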

Technical Details

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Optimizer: Hopper (Muon variant)
  • Newton-Schulz Steps: 1 (Lazy Orthogonality)
  • Checkpoint: Step 200 (Early Stopped for reasoning emergence).
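Hopper's exact update rule is not spelled out here. As a rough illustration of the two listed ingredients, the sketch below runs a single Newton-Schulz orthogonalization pass (the standard Muon quintic iteration with its published coefficients) and then variance-normalizes the result. The function names, the placement of the normalization, and the `eps` constant are our assumptions, not Hopper's actual implementation.

```python
import numpy as np

# Quintic Newton-Schulz coefficients, as published for Muon.
NS_COEFFS = (3.4445, -4.7750, 2.0315)

def newton_schulz(G, steps=1):
    """Approximately orthogonalize a 2D gradient G.

    With steps=1 ("lazy orthogonality") the singular values are only
    pushed toward 1, not fully flattened."""
    a, b, c = NS_COEFFS
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize so the iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:  # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X

def hopper_update(G, steps=1, eps=1e-8):
    """Sketch of a Hopper-style step: one NS pass, then variance
    normalization (here: divide by the update's std; an assumption)."""
    U = newton_schulz(G, steps=steps)
    return U / (U.std() + eps)
```

Dividing by the standard deviation keeps the update's scale roughly constant across layers, which is one plausible reading of "Variance Normalization."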

Uploading the Checkpoint

from huggingface_hub import HfApi, login

# 1. Login
login()

# 2. Upload only the essentials
api = HfApi()
repo_id = "JenWei/Llama-Hopper-Reasoning-v1"
checkpoint_path = "./hopper-cp-200"  # local folder containing the checkpoint

print("🚀 Uploading ONLY inference files...")

api.upload_folder(
    folder_path=checkpoint_path,
    repo_id=repo_id,
    repo_type="model",
)
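Running Inference

To actually query the model, a minimal sketch using the standard transformers API (the repo id is taken from the upload script above; generation settings are illustrative, not tuned):

```python
REPO_ID = "JenWei/Llama-Hopper-Reasoning-v1"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint from the Hub and generate one greedy reply."""
    # Lazy import so the heavy dependencies load only when needed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID)
    messages = [{"role": "user", "content": prompt}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For example, `generate_reply("It takes 1 hour to dry 3 towels. How long for 6 towels?")` reproduces the Towel Test from above.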