Aryagm committed on
Commit 0aec310 · verified · 1 Parent(s): 82c7754

Use full Hugging Face model card

  1. README.md +90 -18
README.md CHANGED
@@ -1,6 +1,7 @@
1
  ---
2
  license: apache-2.0
3
  base_model: sapientinc/HRM-Text-1B
 
4
  library_name: mlx
5
  pipeline_tag: text-generation
6
  inference: false
@@ -11,14 +12,27 @@ tags:
11
  - quantized
12
  - mxfp4
13
  - hrm
 
14
  ---
15
 
16
- # HRM-Text-1B MLX 4-bit
17
 
18
- This is a persisted 4-bit MXFP4 MLX checkpoint for
 
 
19
  [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
20
- It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
21
- Apple Silicon.
22
 
23
  The checkpoint keeps the full HRM recurrent inference loop:
24
 
@@ -33,7 +47,21 @@ H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
33
  - `quantization.json`: quantization metadata
34
  - `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
35
 
36
- ## Usage
37
 
38
  ```bash
39
  git clone https://github.com/Aryagm/HRM-mlx.git
@@ -52,7 +80,6 @@ from huggingface_hub import snapshot_download
52
  snapshot_download(
53
  repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
54
  local_dir="exports/hrm-text-1b-mlx-mxfp4",
55
- local_dir_use_symlinks=False,
56
  )
57
  PY
58
  ```
@@ -69,24 +96,69 @@ hrm-mlx \
69
  --metal-swiglu
70
  ```
71
 
72
- ## Benchmark
73
 
74
- On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about
75
- 56 decode tokens/sec with HRM-mlx's fast path:
76
 
77
  ```text
78
  MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
79
  ```
80
 
81
- Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
82
- by chip and system load.
83
 
84
- ## Quality Notes
85
 
86
- This is a quantized inference checkpoint, not a new finetune. In a small
87
- qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
88
- prompts, including the derivative of `(x^2) / ln(x)`. This is not a formal eval.
89
 
90
- HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
91
- produce incomplete or unstable answers on some prompts, especially when the
92
- prompt is underspecified or contradictory.
 
1
  ---
2
  license: apache-2.0
3
  base_model: sapientinc/HRM-Text-1B
4
+ base_model_relation: quantized
5
  library_name: mlx
6
  pipeline_tag: text-generation
7
  inference: false
 
12
  - quantized
13
  - mxfp4
14
  - hrm
15
+ - reasoning
16
  ---
17
 
18
+ # HRM-Text-1B-MLX-4bit
19
 
20
+ ## Model Details
21
+
22
+ This repository contains a persisted 4-bit MXFP4 MLX checkpoint for
23
  [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
24
+ It is intended for fast local inference on Apple Silicon with
25
+ [HRM-mlx](https://github.com/Aryagm/HRM-mlx).
26
+
27
+ This is not a new finetune. It is a quantized inference checkpoint derived from
28
+ the public HRM-Text-1B weights.
29
+
30
+ - **Base model:** `sapientinc/HRM-Text-1B`
31
+ - **Runtime:** MLX
32
+ - **Quantization:** 4-bit MXFP4
33
+ - **Group size:** 32
34
+ - **Primary target:** Apple Silicon
35
+ - **License:** Apache-2.0
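
Editor's sketch: a rough illustration of what the quantization settings listed above imply for storage. It assumes the MXFP4 layout stores one shared 8-bit scale per group of 32 four-bit values, and that every parameter is quantized (in practice some tensors may be kept at higher precision). The parameter count is back-derived from the 2.2 GB original checkpoint size mentioned in this model card, assuming 16 bits per weight. The function names `bits_per_weight` and `estimated_size_gb` are hypothetical and not part of HRM-mlx.

```python
def bits_per_weight(bits: int = 4, group_size: int = 32, scale_bits: int = 8) -> float:
    """Effective storage per weight: element bits plus the amortized shared scale."""
    return bits + scale_bits / group_size


def estimated_size_gb(num_params: float, bpw: float) -> float:
    """Size in decimal gigabytes if every parameter were stored at `bpw` bits."""
    return num_params * bpw / 8 / 1e9


if __name__ == "__main__":
    bpw = bits_per_weight()
    # Assumption: the 2.2 GB original checkpoint stores 16 bits per weight.
    approx_params = 2.2e9 * 8 / 16
    print(f"{bpw:.2f} bits/weight")
    print(f"~{estimated_size_gb(approx_params, bpw):.2f} GB if all weights are quantized")
```

The estimate is a lower bound under these assumptions; the actual size of the files in this repository may differ.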
36
 
37
  The checkpoint keeps the full HRM recurrent inference loop:
38
 
 
47
  - `quantization.json`: quantization metadata
48
  - `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
49
 
50
+ ## Intended Use
51
+
52
+ Use this checkpoint for local HRM-Text inference on Apple Silicon through
53
+ HRM-mlx. It is useful when you want the HRM recurrent reasoning architecture
54
+ without downloading the original 2.2 GB checkpoint and quantizing it locally.
55
+
56
+ ## Out-of-Scope Use
57
+
58
+ This model card does not claim general assistant quality, safety alignment, or
59
+ production suitability. HRM-Text-1B is a base reasoning model, not a polished
60
+ chat assistant.
61
+
62
+ ## Quickstart
63
+
64
+ Install HRM-mlx:
65
 
66
  ```bash
67
  git clone https://github.com/Aryagm/HRM-mlx.git
 
80
  snapshot_download(
81
  repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
82
  local_dir="exports/hrm-text-1b-mlx-mxfp4",
 
83
  )
84
  PY
85
  ```
 
96
  --metal-swiglu
97
  ```
98
 
99
+ Expected final expression:
100
+
101
+ ```text
102
+ x(2 ln(x) - 1) / (ln(x))^2
103
+ ```
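
Editor's note: assuming the quickstart prompt asks for the derivative of `(x^2) / ln(x)`, as the Evaluation section of this model card suggests, the expected expression follows from the quotient rule. A short derivation for checking the model's output by hand:

```latex
\begin{aligned}
\frac{d}{dx}\left[\frac{x^{2}}{\ln x}\right]
  &= \frac{(2x)\,\ln x - x^{2}\cdot\frac{1}{x}}{(\ln x)^{2}} \\
  &= \frac{2x\ln x - x}{(\ln x)^{2}} \\
  &= \frac{x\,(2\ln x - 1)}{(\ln x)^{2}}
\end{aligned}
```

The last line matches the expected final expression shown above, valid for x > 0 and x ≠ 1.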
104
+
105
+ ## Performance
106
+
107
+ Measured on a MacBook Pro M4 Max with a 32-core GPU:
108
 
109
+ | Runtime | Decode tok/s | vs CPU |
110
+ |---|---:|---:|
111
+ | PyTorch CPU FP32 | 5.2 | 1.0x |
112
+ | PyTorch MPS BF16 | 22.0 | 4.3x |
113
+ | MLX BF16 | 24.7 | 4.8x |
114
+ | MLX 4-bit | 38.5 | 7.5x |
115
+ | HRM-mlx fast path | 56.0 | 10.9x |
116
+
117
+ Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
118
+ by chip, MLX version, thermals, and system load.
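
Editor's sketch: the "vs CPU" column is each runtime's decode throughput divided by the PyTorch CPU FP32 baseline. The snippet below recomputes those ratios from the rounded tok/s values in the table. The results can differ from the table in the last digit (for example, 56.0 / 5.2 is about 10.8 rather than 10.9), presumably because the table's ratios were derived from unrounded measurements; that explanation is an assumption. The name `speedups` is hypothetical.

```python
# Decode throughput (tokens/sec) copied from the table above.
DECODE_TOK_S = {
    "PyTorch CPU FP32": 5.2,
    "PyTorch MPS BF16": 22.0,
    "MLX BF16": 24.7,
    "MLX 4-bit": 38.5,
    "HRM-mlx fast path": 56.0,
}


def speedups(rates: dict, baseline: str = "PyTorch CPU FP32") -> dict:
    """Return each runtime's throughput relative to the baseline runtime."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


if __name__ == "__main__":
    for name, ratio in speedups(DECODE_TOK_S).items():
        print(f"{name:<20} {ratio:.2f}x")
```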
119
+
120
+ Fastest tested configuration:
121
 
122
  ```text
123
  MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
124
  ```
125
 
126
+ ## Evaluation
127
+
128
+ This checkpoint has not been evaluated with a formal benchmark suite.
129
+
130
+ In a small qualitative check, 4-bit MXFP4 matched BF16 on simple math and short
131
+ reasoning prompts, including the derivative of `(x^2) / ln(x)`. A contradictory
132
+ functional-equation prompt was unstable for both BF16 and 4-bit, which appears
133
+ to be a base-model or prompting limitation rather than a quantization-specific
134
+ failure.
135
+
136
+ ## Limitations
137
+
138
+ - HRM-Text-1B is a base model and can produce incomplete or unstable answers.
139
+ - Long answers may need a generous `--max-tokens` value because the model often
140
+ reasons before giving a final expression.
141
+ - This checkpoint is currently intended for HRM-mlx, not generic Transformers
142
+ loading.
143
+ - The Hugging Face hosted inference widget is disabled because this is an MLX
144
+ checkpoint with a custom runtime path.
145
+
146
+ ## How This Checkpoint Was Produced
147
+
148
+ The checkpoint was generated with HRM-mlx:
149
+
150
+ ```bash
151
+ hrm-mlx-quantize \
152
+ --model-dir exports/hrm-text-1b-hf \
153
+ --out-dir exports/hrm-text-1b-mlx-mxfp4 \
154
+ --bits 4 \
155
+ --group-size 32 \
156
+ --mode mxfp4
157
+ ```
158
 
159
+ ## Citation
160
 
161
+ Please cite the upstream HRM-Text release when using this checkpoint:
 
 
162
 
163
+ - Base model: [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B)
164
+ - MLX runtime: [Aryagm/HRM-mlx](https://github.com/Aryagm/HRM-mlx)