Aryagm/HRM-Text-1B-MLX-4bit

@@ -1,6 +1,7 @@
 ---
 license: apache-2.0
 base_model: sapientinc/HRM-Text-1B
 library_name: mlx
 pipeline_tag: text-generation
 inference: false
@@ -11,14 +12,27 @@ tags:
   - quantized
   - mxfp4
   - hrm
 ---
-# HRM-Text-1B MLX 4-bit
-This is a persisted 4-bit MXFP4 MLX checkpoint for
 [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
-It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
-Apple Silicon.
 The checkpoint keeps the full HRM recurrent inference loop:
@@ -33,7 +47,21 @@ H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
 - `quantization.json`: quantization metadata
 - `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
-## Usage
 ```bash
 git clone https://github.com/Aryagm/HRM-mlx.git
@@ -52,7 +80,6 @@ from huggingface_hub import snapshot_download
 snapshot_download(
     repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
     local_dir="exports/hrm-text-1b-mlx-mxfp4",
-    local_dir_use_symlinks=False,
 )
 PY
 ```
@@ -69,24 +96,69 @@ hrm-mlx \
   --metal-swiglu
 ```
-## Benchmark
-On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about
-56 decode tokens/sec with HRM-mlx's fast path:
 ```text
 MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
 ```
-Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
-by chip and system load.
-## Quality Notes
-This is a quantized inference checkpoint, not a new finetune. In a small
-qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
-prompts, including the derivative of `(x^2) / ln(x)`. This is not a formal eval.
-HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
-produce incomplete or unstable answers on some prompts, especially when the
-prompt is underspecified or contradictory.

 ---
 license: apache-2.0
 base_model: sapientinc/HRM-Text-1B
+base_model_relation: quantized
 library_name: mlx
 pipeline_tag: text-generation
 inference: false
   - quantized
   - mxfp4
   - hrm
+  - reasoning
 ---
+# HRM-Text-1B-MLX-4bit
+## Model Details
+This repository contains a persisted 4-bit MXFP4 MLX checkpoint for
 [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
+It is intended for fast local inference on Apple Silicon with
+[HRM-mlx](https://github.com/Aryagm/HRM-mlx).
+This is not a new finetune. It is a quantized inference checkpoint derived from
+the public HRM-Text-1B weights.
+- **Base model:** `sapientinc/HRM-Text-1B`
+- **Runtime:** MLX
+- **Quantization:** 4-bit MXFP4
+- **Group size:** 32
+- **Primary target:** Apple Silicon
+- **License:** Apache-2.0
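To size the quantized weights roughly, the sketch below estimates MXFP4 storage per weight. It assumes the OCP Microscaling (MX) layout, where each group of 32 elements stores 4-bit E2M1 values plus one shared 8-bit (E8M0) power-of-two scale. The helper names are hypothetical. Real checkpoint size also depends on which tensors, if any, stay unquantized.

```python
def bits_per_weight(element_bits: int, scale_bits: int, group_size: int) -> float:
    """Effective storage per weight: element bits plus the shared scale amortized over the group."""
    return element_bits + scale_bits / group_size


def compression_vs(reference_bits: int, element_bits: int, scale_bits: int, group_size: int) -> float:
    """How many times smaller a quantized matrix is than the same matrix at reference_bits."""
    return reference_bits / bits_per_weight(element_bits, scale_bits, group_size)


# Assumed MXFP4 layout: 4-bit elements, one 8-bit shared scale per group of 32.
mxfp4_bits = bits_per_weight(element_bits=4, scale_bits=8, group_size=32)
ratio_vs_bf16 = compression_vs(16, element_bits=4, scale_bits=8, group_size=32)
print(f"MXFP4: {mxfp4_bits} bits/weight, {ratio_vs_bf16:.2f}x smaller than 16-bit weights")
# prints "MXFP4: 4.25 bits/weight, 3.76x smaller than 16-bit weights"
```

Under these assumptions, a quantized matrix costs 4.25 bits per weight. That makes it about 3.76x smaller than BF16. Any tensors kept at higher precision raise the total.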
 The checkpoint keeps the full HRM recurrent inference loop:
 - `quantization.json`: quantization metadata
 - `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
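After downloading, you can quickly confirm that the metadata and tokenizer files listed above are in the local directory. This is a minimal sketch. The path matches the `local_dir` used in the Quickstart, and `missing_files` is a hypothetical helper. It checks only the files this card names explicitly, because the weight file names are not listed here.

```python
from pathlib import Path

# Files this card names explicitly; weight shards are omitted because their names are not listed.
EXPECTED_FILES = ("quantization.json", "tokenizer.json", "tokenizer_config.json")


def missing_files(local_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("exports/hrm-text-1b-mlx-mxfp4")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected metadata and tokenizer files are present.")
```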
+## Intended Use
+Use this checkpoint for local HRM-Text inference on Apple Silicon through
+HRM-mlx. It is useful when you want the HRM recurrent reasoning architecture
+without downloading the original 2.2 GB checkpoint and quantizing it locally.
+## Out-of-Scope Use
+This model card does not claim general assistant quality, safety alignment, or
+production suitability. HRM-Text-1B is a base reasoning model, not a polished
+chat assistant.
+## Quickstart
+Install HRM-mlx:
 ```bash
 git clone https://github.com/Aryagm/HRM-mlx.git
 snapshot_download(
     repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
     local_dir="exports/hrm-text-1b-mlx-mxfp4",
 )
 PY
 ```
   --metal-swiglu
 ```
+Expected final expression for the derivative of `(x^2) / ln(x)`:
+```text
+x(2 ln(x) - 1) / (ln(x))^2
+```
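You can check an answer like this without a computer algebra system by comparing the closed form against a numerical derivative. The sketch below applies the quotient rule in a comment, then evaluates a central finite difference at a few points where x > 0 and x != 1, so that ln(x) is nonzero. The function names, sample points, and tolerance are arbitrary illustrative choices.

```python
import math


def f(x):
    return x**2 / math.log(x)


def expected_derivative(x):
    # Quotient rule: (2x ln x - x^2 * (1/x)) / (ln x)^2 = x(2 ln x - 1) / (ln x)^2
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(g, x, h=1e-6):
    """Second-order numerical derivative of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (1.5, 2.0, 5.0, 10.0):
    approx = central_difference(f, x)
    exact = expected_derivative(x)
    assert math.isclose(approx, exact, rel_tol=1e-6), (x, approx, exact)
print("Closed form matches the numerical derivative at all sample points.")
```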
+## Performance
+Measured on a MacBook Pro M4 Max with a 32-core GPU:
+| Runtime | Decode tok/s | Speedup vs PyTorch CPU FP32 |
+|---|---:|---:|
+| PyTorch CPU FP32 | 5.2 | 1.0x |
+| PyTorch MPS BF16 | 22.0 | 4.3x |
+| MLX BF16 | 24.7 | 4.8x |
+| MLX 4-bit | 38.5 | 7.5x |
+| HRM-mlx fast path | 56.0 | 10.9x |
+Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
+by chip, MLX version, thermals, and system load.
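The speedup column is each runtime's decode throughput divided by the PyTorch CPU FP32 baseline. The sketch below recomputes it from the rounded tok/s values in the table. With those rounded inputs, a few rows come out 0.1x lower than the published ratios. For example, the fast path gives 10.8x instead of 10.9x. This likely reflects rounding in the tok/s column.

```python
# Decode throughput in tokens/sec, copied from the table above (rounded values).
DECODE_TOK_S = {
    "PyTorch CPU FP32": 5.2,
    "PyTorch MPS BF16": 22.0,
    "MLX BF16": 24.7,
    "MLX 4-bit": 38.5,
    "HRM-mlx fast path": 56.0,
}
BASELINE = "PyTorch CPU FP32"


def speedups(tok_s_by_runtime, baseline):
    """Throughput of each runtime divided by the baseline runtime's throughput."""
    base = tok_s_by_runtime[baseline]
    return {name: tok_s / base for name, tok_s in tok_s_by_runtime.items()}


for name, ratio in speedups(DECODE_TOK_S, BASELINE).items():
    print(f"{name:<18} {ratio:4.1f}x")
```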
+Fastest tested configuration:
 ```text
 MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
 ```
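Each generated token runs the stack several times. This card's formula gives H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes per token. The sketch below counts those passes and converts the fast path's 56 tok/s into a rough per-pass budget. It assumes that each H cycle runs L_cycles low-level passes followed by one high-level pass, which is one reading consistent with that formula. It also treats all decode time as stack passes, a simplification that ignores sampling and other overhead. The function names are hypothetical.

```python
H_CYCLES = 2  # values from the card's formula H_cycles * (L_cycles + 1)
L_CYCLES = 3


def stack_passes_per_token(h_cycles, l_cycles):
    """Count stack passes, assuming each H cycle = l_cycles low-level passes + 1 high-level pass."""
    passes = 0
    for _ in range(h_cycles):
        passes += l_cycles  # low-level (L) updates within this cycle
        passes += 1         # one high-level (H) update closing the cycle
    return passes


def ms_per_pass(decode_tok_s, passes_per_token):
    """Rough per-pass latency, assuming decode time is spent entirely in stack passes."""
    return 1000.0 / decode_tok_s / passes_per_token


passes = stack_passes_per_token(H_CYCLES, L_CYCLES)
print(f"{passes} stack passes/token")  # prints "8 stack passes/token"
print(f"~{1000.0 / 56.0:.1f} ms/token, ~{ms_per_pass(56.0, passes):.2f} ms/pass at 56 tok/s")
# prints "~17.9 ms/token, ~2.23 ms/pass at 56 tok/s"
```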
+## Evaluation
+This checkpoint has not been evaluated with a formal benchmark suite.
+In a small qualitative check, 4-bit MXFP4 matched BF16 on simple math and short
+reasoning prompts, including the derivative of `(x^2) / ln(x)`. A contradictory
+functional-equation prompt was unstable for both BF16 and 4-bit, which appears
+to be a base-model or prompting limitation rather than a quantization-specific
+failure.
+## Limitations
+- HRM-Text-1B is a base model and can produce incomplete or unstable answers.
+- Long answers may need a generous `--max-tokens` value because the model often
+  reasons before giving a final expression.
+- This checkpoint is currently intended for HRM-mlx, not generic Transformers
+  loading.
+- The Hugging Face hosted inference widget is disabled because this is an MLX
+  checkpoint with a custom runtime path.
+## How This Checkpoint Was Produced
+The checkpoint was generated with HRM-mlx:
+```bash
+hrm-mlx-quantize \
+  --model-dir exports/hrm-text-1b-hf \
+  --out-dir exports/hrm-text-1b-mlx-mxfp4 \
+  --bits 4 \
+  --group-size 32 \
+  --mode mxfp4
+```
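To build intuition for `--mode mxfp4` with `--group-size 32`, here is a pure-Python sketch of MXFP4 block quantization as described in the OCP Microscaling (MX) specification. Each block of 32 values shares one power-of-two scale, and each value is stored as a 4-bit E2M1 float. This illustrates the format only. It is not HRM-mlx's or MLX's implementation. In particular, ties round toward the smaller magnitude here rather than to nearest-even, and limits on the scale's exponent range are ignored. The function names are hypothetical.

```python
import math

# Non-negative magnitudes representable by the FP4 E2M1 element type.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32  # MX block size, matching --group-size 32


def round_to_e2m1(x):
    """Round x to the nearest signed E2M1 value; magnitudes beyond 6.0 saturate to 6.0."""
    magnitude = min(FP4_E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(magnitude, x)


def quantize_block(block):
    """Return (shared_exponent, fp4_values) for one block of up to BLOCK_SIZE floats."""
    if len(block) > BLOCK_SIZE:
        raise ValueError(f"block has {len(block)} values; max is {BLOCK_SIZE}")
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the block maximum lands in the top E2M1 binade.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e2m1(v / scale) for v in block]


def dequantize_block(shared_exp, fp4_values):
    """Multiply each FP4 value back by the shared power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in fp4_values]


weights = [0.75, -3.0, 1.2, 0.0] + [0.0] * 28
exp, codes = quantize_block(weights)
print(exp, codes[:4])                    # -1 [1.5, -6.0, 2.0, 0.0]
print(dequantize_block(exp, codes)[:4])  # [0.75, -3.0, 1.0, 0.0]
```

In this example, 1.2 comes back as 1.0. This shows the precision lost when only eight magnitudes are available per shared scale.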
+## Citation
+Please cite the upstream HRM-Text release when using this checkpoint:
+- Base model: [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B)
+- MLX runtime: [Aryagm/HRM-mlx](https://github.com/Aryagm/HRM-mlx)