HRM-Text-1B MLX BF16
This is a BF16 MLX checkpoint for sapientinc/HRM-Text-1B. It is intended for use with HRM-mlx on Apple Silicon.
This is not a new finetune. It is a format conversion of the public HRM-Text-1B weights for native MLX inference.
The checkpoint keeps the full HRM recurrent inference loop:
H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
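The count above can be reproduced with a short sketch. The function name `stack_passes_per_token` is hypothetical and not part of HRM-mlx; it only restates the formula with the values quoted in this card.

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Return H_cycles * (L_cycles + 1), the formula quoted in this card."""
    return h_cycles * (l_cycles + 1)


# Values quoted in this card: H_cycles = 2, L_cycles = 3.
passes = stack_passes_per_token(h_cycles=2, l_cycles=3)
print(passes)  # 8
```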
Files
- model.safetensors: MLX-format BF16 weights
- config.json: HRM-Text config with MLX metadata
- tokenizer.json, tokenizer_config.json: tokenizer files copied from the base model
Usage
Install HRM-mlx:
git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Download this checkpoint:
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-BF16",
    local_dir="exports/hrm-text-1b-mlx-bf16",
)
PY
Generate:
hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-bf16 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.0 \
  --dtype bfloat16 \
  --metal-swiglu
Expected final expression:
x(2 ln(x) - 1) / (ln(x))^2
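One way to sanity-check a generated answer against this expression is a numerical comparison. The sketch below uses only the Python standard library and compares a central finite difference of x^2 / ln(x) with the closed form. The helper names, sample points, and tolerance are illustrative choices, not part of HRM-mlx.

```python
import math


def f(x: float) -> float:
    """The function from the prompt: x^2 / ln(x), defined for x > 0, x != 1."""
    return x * x / math.log(x)


def expected_derivative(x: float) -> float:
    """The expected final expression: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2.0 * math.log(x) - 1.0) / math.log(x) ** 2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def matches(x: float) -> bool:
    """True if the numerical and closed-form derivatives agree at x."""
    return math.isclose(central_difference(f, x), expected_derivative(x), rel_tol=1e-6)


# Sample points chosen away from x = 1, where ln(x) = 0.
for x in (1.5, 2.0, 3.0, 10.0):
    assert matches(x)
```

To check a different candidate expression, replace the body of `expected_derivative` and rerun the loop.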
Benchmark
On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about 28 decode tokens/sec with HRM-mlx's BF16 path.
Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary by chip and system load.
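As a rough illustration of what the quoted rate means for this benchmark shape, the sketch below estimates decode time for the 128 generated tokens. It covers decode only, not processing of the 512 prompt tokens, and the function name `decode_seconds` is hypothetical.

```python
def decode_seconds(generated_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode wall time from a token count and a decode rate."""
    return generated_tokens / tokens_per_second


# Figures quoted in this card: 128 generated tokens at about 28 tokens/sec.
estimate = decode_seconds(generated_tokens=128, tokens_per_second=28.0)
print(f"{estimate:.1f} s")  # 4.6 s
```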
Notes
For faster decode and a smaller download, use the 4-bit MXFP4 checkpoint.
HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can produce incomplete or unstable answers on some prompts, especially when the prompt is underspecified or contradictory.