---
license: apache-2.0
base_model: sapientinc/HRM-Text-1B
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
inference: false
tags:
  - mlx
  - apple-silicon
  - text-generation
  - quantized
  - mxfp4
  - hrm
  - reasoning
---

# HRM-Text-1B MLX 4-bit

This is a persisted 4-bit MXFP4 MLX checkpoint for
[sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
Apple Silicon.

This is not a new finetune. It is a quantized inference checkpoint derived from
the public HRM-Text-1B weights.

The checkpoint keeps the full HRM recurrent inference loop:

```text
H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
```
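
The pass count can be reproduced with a few lines of Python. This is only an arithmetic sketch: the function name `stack_passes_per_token` is hypothetical, and the values 2 and 3 are copied from the formula above rather than read from `config.json`.

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Return H_cycles * (L_cycles + 1), the formula quoted above."""
    return h_cycles * (l_cycles + 1)


# Values taken from the formula above (assumed, not read from config.json).
passes = stack_passes_per_token(h_cycles=2, l_cycles=3)
print(passes)  # 8
```

Each generated token therefore costs roughly eight passes through the layer stack instead of one, which is worth keeping in mind when reading the decode throughput in the Benchmark section.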

## Files

- `model.safetensors`: MLX-format 4-bit MXFP4 weights
- `config.json`: HRM-Text config with MLX metadata
- `quantization.json`: quantization metadata
- `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
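
After downloading, a quick sanity check can confirm that the files listed above are present. The sketch below is an illustration, not part of HRM-mlx: the helper name `missing_files` is hypothetical, and the directory is assumed to be the `local_dir` used in the Usage section. The repository may also contain other files (for example this README) that the check ignores.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "quantization.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the snapshot was saved.
    missing = missing_files("exports/hrm-text-1b-mlx-mxfp4")
    print("missing:", missing or "none")
```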

## Usage

Install HRM-mlx from source:

```bash
git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Download this checkpoint:

```bash
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
    local_dir="exports/hrm-text-1b-mlx-mxfp4",
)
PY
```

Generate:

```bash
hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-mxfp4 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.7 \
  --dtype bfloat16 \
  --metal-swiglu
```

Expected final expression:

```text
x(2 ln(x) - 1) / (ln(x))^2
```
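
The expected expression can be checked independently of the model. The sketch below compares it against a central finite difference of `x^2 / ln(x)` using only the Python standard library. The function names, sample points, and tolerance are illustrative choices; the check is valid for `x > 0`, `x != 1`.

```python
import math


def f(x: float) -> float:
    """The function from the prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x: float) -> float:
    """The expected final expression: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Numerical derivative estimate (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Sample points chosen away from the singularity at x = 1.
for x in (2.0, 3.0, 10.0):
    numeric = central_difference(f, x)
    closed_form = expected_derivative(x)
    assert math.isclose(numeric, closed_form, rel_tol=1e-6), (x, numeric, closed_form)
```

The closed form follows from the quotient rule: the numerator `2x ln(x) - x^2 * (1/x)` simplifies to `x(2 ln(x) - 1)`.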

## Benchmark

On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about
56 decode tokens/sec with HRM-mlx's fast path:

```text
MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
```

Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
by chip and system load.
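
As a rough reading of these numbers, the arithmetic below estimates the decode time for the benchmark shape and the implied stack-pass rate. It is a back-of-the-envelope sketch based only on the figures quoted in this card: it ignores prompt processing time and assumes the 8 stack passes/token formula applies uniformly during decode.

```python
DECODE_TOKENS_PER_SEC = 56   # approximate figure reported above
GENERATED_TOKENS = 128       # from the benchmark shape
STACK_PASSES_PER_TOKEN = 8   # H_cycles * (L_cycles + 1) = 2 * (3 + 1)

# Time spent generating tokens only (prompt processing is not included).
decode_seconds = GENERATED_TOKENS / DECODE_TOKENS_PER_SEC

# Passes through the layer stack per second implied by the recurrent loop.
stack_passes_per_sec = DECODE_TOKENS_PER_SEC * STACK_PASSES_PER_TOKEN

print(f"{decode_seconds:.2f} s")  # 2.29 s
print(stack_passes_per_sec)       # 448
```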

## Quality Notes

This checkpoint has not been evaluated with a formal benchmark suite. In a small
qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
prompts, including the derivative of `(x^2) / ln(x)`.

HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
produce incomplete or unstable answers on some prompts, especially when the
prompt is underspecified or contradictory.