HRM-Text-1B MLX 4-bit

This is a 4-bit MXFP4 MLX checkpoint of sapientinc/HRM-Text-1B, quantized ahead of time and saved to disk. It is intended for use with HRM-mlx on Apple Silicon.

This is not a new finetune. It is a quantized inference checkpoint derived from the public HRM-Text-1B weights.

The checkpoint keeps the full HRM recurrent inference loop:

H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
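
As a quick sanity check, the count above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the function name `stack_passes_per_token` is hypothetical and not part of HRM-mlx; the formula and the values H_cycles = 2 and L_cycles = 3 are taken from the line above.

```python
def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Apply the formula stated above: H_cycles * (L_cycles + 1)."""
    return h_cycles * (l_cycles + 1)


# Values quoted in this model card.
passes = stack_passes_per_token(h_cycles=2, l_cycles=3)
print(passes)  # 8
```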

Files

  • model.safetensors: MLX-format 4-bit MXFP4 weights
  • config.json: HRM-Text config with MLX metadata
  • quantization.json: quantization metadata
  • tokenizer.json, tokenizer_config.json: tokenizer files copied from the base model

Usage

git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Download this checkpoint:

python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
    local_dir="exports/hrm-text-1b-mlx-mxfp4",
)
PY
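
Before generating, it can help to confirm that the download produced every file named in the Files list. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the directory is the `local_dir` used in the download step above.

```python
from pathlib import Path

# File names taken from the Files list in this model card.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "quantization.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Same path as local_dir in the snapshot_download call above.
    missing = missing_files("exports/hrm-text-1b-mlx-mxfp4")
    print("missing:", missing or "none")
```

An empty result means all five files are present; it does not validate their contents.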

Generate:

hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-mxfp4 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.7 \
  --dtype bfloat16 \
  --metal-swiglu

Expected final expression:

x(2 ln(x) - 1) / (ln(x))^2
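
The expected expression follows from the quotient rule: the derivative of x^2 / ln(x) is (2x ln(x) - x) / (ln(x))^2, which factors to the form above. A short numerical sketch can confirm this independently of the model by comparing the closed form against a central finite difference. The function names and sample points here are illustrative choices, not part of HRM-mlx.

```python
import math


def f(x: float) -> float:
    """The function from the example prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x: float) -> float:
    """The expected final expression: x(2 ln(x) - 1) / (ln(x))^2."""
    return x * (2 * math.log(x) - 1) / math.log(x) ** 2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Approximate func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Sample points with x > 1, away from the singularity at ln(x) = 0.
for x in (2.0, 3.0, 10.0):
    numeric = central_difference(f, x)
    closed = expected_derivative(x)
    assert math.isclose(numeric, closed, rel_tol=1e-6)
    print(f"x={x}: numeric={numeric:.6f} closed={closed:.6f}")
```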

Benchmark

On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about 56 decode tokens/sec with HRM-mlx's fast path:

MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU

Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary by chip and system load.
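
To put the figure in context, the sketch below works out what roughly 56 decode tokens/sec implies for the 128 generated tokens, and for the 8 stack passes per token stated earlier. It is back-of-the-envelope arithmetic on the approximate numbers quoted in this card, not a measurement, and it excludes prompt-processing time, which is not reported here.

```python
DECODE_TOKENS_PER_SEC = 56      # approximate figure quoted above
GENERATED_TOKENS = 128          # from the benchmark shape
PASSES_PER_TOKEN = 2 * (3 + 1)  # H_cycles * (L_cycles + 1)

# Time spent in the decode phase only.
decode_seconds = GENERATED_TOKENS / DECODE_TOKENS_PER_SEC

# Stack passes executed per second during decode.
passes_per_sec = DECODE_TOKENS_PER_SEC * PASSES_PER_TOKEN

print(f"decode time: {decode_seconds:.2f} s")  # decode time: 2.29 s
print(f"stack passes/sec: {passes_per_sec}")   # stack passes/sec: 448
```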

Quality Notes

This checkpoint has not been evaluated with a formal benchmark suite. In a small qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning prompts, including the derivative of (x^2) / ln(x).

HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can produce incomplete or unstable answers on some prompts, especially when the prompt is underspecified or contradictory.
