metadata
license: apache-2.0
base_model: sapientinc/HRM-Text-1B
library_name: mlx
pipeline_tag: text-generation
inference: false
tags:
  - mlx
  - apple-silicon
  - text-generation
  - bf16
  - hrm
  - reasoning

HRM-Text-1B MLX BF16

This is a BF16 MLX checkpoint for sapientinc/HRM-Text-1B. It is intended for use with HRM-mlx on Apple Silicon.

This is not a new finetune. It is a format conversion of the public HRM-Text-1B weights for native MLX inference.

The checkpoint keeps the full HRM recurrent inference loop:

H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
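
As a quick check of the arithmetic above, here is a minimal sketch that evaluates the same formula. The function name `stack_passes_per_token` is hypothetical and is not part of HRM-mlx; the values 2 and 3 are the ones shown in the formula.

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Evaluate H_cycles * (L_cycles + 1), the per-token pass count stated above."""
    return h_cycles * (l_cycles + 1)


# Values taken from the formula in this card: H_cycles = 2, L_cycles = 3.
passes = stack_passes_per_token(h_cycles=2, l_cycles=3)
print(passes)  # 8
```

Each generated token therefore costs roughly eight passes through the layer stack, which is worth keeping in mind when comparing decode speed against a non-recurrent model of similar size.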

Files

  • model.safetensors: MLX-format BF16 weights
  • config.json: HRM-Text config with MLX metadata
  • tokenizer.json, tokenizer_config.json: tokenizer files copied from the base model

Usage

git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Download this checkpoint:

python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-BF16",
    local_dir="exports/hrm-text-1b-mlx-bf16",
)
PY
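
Before generating, it may help to confirm that the download is complete. The sketch below checks only for the files listed under Files; the helper name `missing_files` is hypothetical, and the directory is the `local_dir` used in the download step.

```python
from pathlib import Path

# File names taken from the "Files" section of this card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files("exports/hrm-text-1b-mlx-bf16")
print("missing files:", missing if missing else "none")
```

An empty result only shows that the files exist; it does not validate their contents.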

Generate:

hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-bf16 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.0 \
  --dtype bfloat16 \
  --metal-swiglu

Expected final expression:

x(2 ln(x) - 1) / (ln(x))^2
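
The expected expression follows from the quotient rule: (2x ln(x) - x) / (ln(x))^2, with x factored out. A minimal numerical sanity check, using only the standard library and a central finite difference (the helper names here are illustrative), could look like this:

```python
import math


def f(x):
    """The function from the prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x):
    """The closed form stated above: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Compare the closed form against the numerical derivative at a few points x > 1.
for x in (2.0, 3.0, 10.0):
    numeric = central_difference(f, x)
    closed = expected_derivative(x)
    assert math.isclose(numeric, closed, rel_tol=1e-6), (x, numeric, closed)
print("closed form matches the numerical derivative")
```

This verifies the reference answer, not the model output; the generated text still has to be compared against the expression by hand.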

Benchmark

On a MacBook Pro with an M4 Max (32-core GPU), this checkpoint reaches about 28 decode tokens/sec with HRM-mlx's BF16 path.

Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary by chip and system load.
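
To put the figure in context, the following back-of-the-envelope sketch derives decode time and stack-pass throughput from the numbers quoted in this card. It assumes the approximate 28 tokens/sec rate holds for the whole run and ignores the time spent processing the 512 prompt tokens.

```python
# Figures quoted in this card; the decode rate is approximate.
DECODE_TOKENS_PER_SEC = 28
GENERATED_TOKENS = 128
STACK_PASSES_PER_TOKEN = 8  # H_cycles * (L_cycles + 1) = 2 * (3 + 1)

# Time spent in decode only (prompt processing is not included).
decode_seconds = GENERATED_TOKENS / DECODE_TOKENS_PER_SEC

# Passes through the layer stack per second implied by the decode rate.
stack_passes_per_sec = DECODE_TOKENS_PER_SEC * STACK_PASSES_PER_TOKEN

print(f"decode time: about {decode_seconds:.1f} s")      # about 4.6 s
print(f"stack passes/sec: about {stack_passes_per_sec}")  # about 224
```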

Notes

For faster decode and a smaller download, use the 4-bit MXFP4 checkpoint.

HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can produce incomplete or unstable answers on some prompts, especially when the prompt is underspecified or contradictory.