---
license: apache-2.0
base_model: sapientinc/HRM-Text-1B
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
inference: false
tags:
- mlx
- apple-silicon
- text-generation
- quantized
- mxfp4
- hrm
- reasoning
---
# HRM-Text-1B MLX 4-bit

This is a persisted 4-bit MXFP4 MLX checkpoint for
[sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
Apple Silicon.

This is not a new finetune. It is a quantized inference checkpoint derived from
the public HRM-Text-1B weights.

The checkpoint keeps the full HRM recurrent inference loop:

```text
H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
```
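
The pass count above is plain arithmetic over two cycle counts. A minimal sketch, assuming the counts are called `H_cycles` and `L_cycles` as in the formula; the function name is illustrative and not part of HRM-mlx:

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Return H_cycles * (L_cycles + 1), the formula quoted above."""
    return h_cycles * (l_cycles + 1)


# Values taken from the formula in this card: H_cycles = 2, L_cycles = 3.
passes = stack_passes_per_token(h_cycles=2, l_cycles=3)
print(passes)  # 8
```
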
## Files

- `model.safetensors`: MLX-format 4-bit MXFP4 weights
- `config.json`: HRM-Text config with MLX metadata
- `quantization.json`: quantization metadata
- `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model
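
After downloading, it can help to confirm that every file listed above is present before loading. The following is a small sketch using only the standard library; the helper name `missing_checkpoint_files` is hypothetical and not part of HRM-mlx, and the directory passed in is whatever local path holds the checkpoint:

```python
from pathlib import Path

# File names as listed in the Files section of this card.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "quantization.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_checkpoint_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Example path only; point this at your local checkpoint directory.
    missing = missing_checkpoint_files("exports/hrm-text-1b-mlx-mxfp4")
    print("missing files:", missing or "none")
```

An empty list means all five files were found; the check says nothing about whether their contents are valid.
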
## Usage

```bash
git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Download this checkpoint:

```bash
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
    local_dir="exports/hrm-text-1b-mlx-mxfp4",
)
PY
```

Generate:

```bash
hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-mxfp4 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.7 \
  --dtype bfloat16 \
  --metal-swiglu
```

Expected final expression:

```text
x(2 ln(x) - 1) / (ln(x))^2
```
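
The expected expression follows from the quotient rule: the derivative of `x^2 / ln(x)` is `(2x ln(x) - x) / (ln(x))^2`, which factors to the form above. As a sanity check independent of the model, the sketch below compares that closed form against a central finite-difference estimate at a few points with `x > 1`. The sample points, step size, and tolerance are arbitrary choices for illustration:

```python
import math


def f(x: float) -> float:
    """The function from the prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x: float) -> float:
    """The closed form quoted above: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Second-order finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (1.5, 2.0, 5.0, 10.0):
    numeric = central_difference(f, x)
    closed = expected_derivative(x)
    assert math.isclose(numeric, closed, rel_tol=1e-6), (x, numeric, closed)

print("closed form agrees with the finite-difference estimate")
```
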

## Benchmark

On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about
56 decode tokens/sec with HRM-mlx's fast path:

```text
MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
```

Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
by chip and system load.
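
To put the reported rate in more familiar terms, the sketch below converts it into decode time, per-token latency, and stack passes per second. It assumes a constant decode rate of 56 tokens/sec and ignores the time spent processing the 512-token prompt, so treat the results as rough estimates rather than measurements:

```python
DECODE_TOKENS_PER_SEC = 56   # approximate figure reported above
GENERATED_TOKENS = 128       # from the benchmark shape
STACK_PASSES_PER_TOKEN = 8   # H_cycles * (L_cycles + 1) = 2 * (3 + 1)

# Derived quantities, assuming the decode rate stays constant.
decode_seconds = GENERATED_TOKENS / DECODE_TOKENS_PER_SEC
ms_per_token = 1000 / DECODE_TOKENS_PER_SEC
stack_passes_per_sec = DECODE_TOKENS_PER_SEC * STACK_PASSES_PER_TOKEN

print(f"decode time for {GENERATED_TOKENS} tokens: {decode_seconds:.2f} s")  # 2.29 s
print(f"latency per token: {ms_per_token:.1f} ms")                           # 17.9 ms
print(f"stack passes per second: {stack_passes_per_sec}")                    # 448
```
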

## Quality Notes

This checkpoint has not been evaluated with a formal benchmark suite. In a small
qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
prompts, including the derivative of `(x^2) / ln(x)`.

HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
produce incomplete or unstable answers on some prompts, especially when the
prompt is underspecified or contradictory.