---
license: apache-2.0
base_model: sapientinc/HRM-Text-1B
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
inference: false
tags:
- mlx
- apple-silicon
- text-generation
- quantized
- mxfp4
- hrm
- reasoning
---
# HRM-Text-1B MLX 4-bit
This is a persisted 4-bit MXFP4 MLX checkpoint for
[sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
Apple Silicon.
This is not a new finetune. It is a quantized inference checkpoint derived from
the public HRM-Text-1B weights.
The checkpoint keeps the full HRM recurrent inference loop. With `H_cycles = 2` and `L_cycles = 3`, each generated token costs:
```text
H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
```
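The count can be reproduced with a short sketch. It only illustrates the arithmetic above, under the assumption that each of the `H_cycles` outer steps runs `L_cycles` low-level passes followed by one high-level pass. The function name is hypothetical and is not part of HRM-mlx.

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Count stack passes by walking a nested loop (illustrative only)."""
    passes = 0
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            passes += 1  # assumed: one low-level pass
        passes += 1  # assumed: one high-level pass closing the outer cycle
    return passes


print(stack_passes_per_token(h_cycles=2, l_cycles=3))  # 8
```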
## Files
- `model.safetensors`: MLX-format 4-bit MXFP4 weights
- `config.json`: HRM-Text config with MLX metadata
- `quantization.json`: quantization metadata
- `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
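After downloading, a quick check along these lines can confirm that the files listed above are present. This is a minimal sketch: the helper name is hypothetical, and the directory is the `local_dir` used in the Usage section, so adjust it if you download elsewhere.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "quantization.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_checkpoint_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location: the local_dir from the download step in Usage.
    missing = missing_checkpoint_files("exports/hrm-text-1b-mlx-mxfp4")
    print("missing files:", missing or "none")
```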
## Usage
Install HRM-mlx:
```bash
git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```
Download this checkpoint:
```bash
python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
    local_dir="exports/hrm-text-1b-mlx-mxfp4",
)
PY
```
Generate:
```bash
hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-mxfp4 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.7 \
  --dtype bfloat16 \
  --metal-swiglu
```
Expected final expression:
```text
x(2 ln(x) - 1) / (ln(x))^2
```
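The expected expression follows from the quotient rule: the numerator `2x ln(x) - x^2 * (1/x)` simplifies to `x(2 ln(x) - 1)`. If you want to check a model answer independently, a finite-difference comparison is one option. The sketch below uses only the Python standard library; the function names are illustrative and not part of HRM-mlx.

```python
import math


def f(x: float) -> float:
    """The function from the prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x: float) -> float:
    """The closed form given above: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Numerical derivative estimate via a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Sample points away from x = 1, where ln(x) = 0 and f is undefined.
for x in (1.5, 2.0, 5.0, 10.0):
    numeric = central_difference(f, x)
    closed = expected_derivative(x)
    assert math.isclose(numeric, closed, rel_tol=1e-6), (x, numeric, closed)

print("closed form matches the finite-difference estimate")
```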
## Benchmark
On a MacBook Pro with an M4 Max (32-core GPU), this checkpoint decodes at about
56 tokens/sec using HRM-mlx's fast path:
```text
MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
```
Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
by chip and system load.
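To put the figure in context, the arithmetic below derives a few rough quantities from the reported numbers. It is a back-of-the-envelope sketch, not a measurement: it assumes the 56 tokens/sec rate covers decode only (prompt processing excluded) and reuses the 8 stack passes/token from the recurrent loop described earlier.

```python
DECODE_TOKENS_PER_SEC = 56   # approximate figure reported above
GENERATED_TOKENS = 128       # benchmark shape reported above
STACK_PASSES_PER_TOKEN = 8   # H_cycles * (L_cycles + 1) = 2 * (3 + 1)

# Time spent decoding, excluding the 512-token prompt (assumption).
decode_seconds = GENERATED_TOKENS / DECODE_TOKENS_PER_SEC
ms_per_token = 1000 / DECODE_TOKENS_PER_SEC
stack_passes_per_sec = DECODE_TOKENS_PER_SEC * STACK_PASSES_PER_TOKEN

print(f"decode time for {GENERATED_TOKENS} tokens: ~{decode_seconds:.2f} s")
print(f"latency per token: ~{ms_per_token:.1f} ms")
print(f"stack passes per second: ~{stack_passes_per_sec}")
```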
## Quality Notes
This checkpoint has not been evaluated with a formal benchmark suite. In a small
qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
prompts, including the derivative of `(x^2) / ln(x)`.
HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
produce incomplete or unstable answers on some prompts, especially when the
prompt is underspecified or contradictory.