---
license: apache-2.0
base_model: sapientinc/HRM-Text-1B
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-generation
inference: false
tags:
- mlx
- apple-silicon
- text-generation
- quantized
- mxfp4
- hrm
- reasoning
---

# HRM-Text-1B MLX 4-bit

This is a persisted 4-bit MXFP4 MLX checkpoint for [sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B). It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on Apple Silicon.

This is not a new finetune. It is a quantized inference checkpoint derived from the public HRM-Text-1B weights.

The checkpoint keeps the full HRM recurrent inference loop:

```text
H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
```

## Files

- `model.safetensors`: MLX-format 4-bit MXFP4 weights
- `config.json`: HRM-Text config with MLX metadata
- `quantization.json`: quantization metadata
- `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model

## Usage

```bash
git clone https://github.com/Aryagm/HRM-mlx.git
cd HRM-mlx
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Download this checkpoint:

```bash
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
    local_dir="exports/hrm-text-1b-mlx-mxfp4",
)
PY
```

Generate:

```bash
hrm-mlx \
  --model-dir exports/hrm-text-1b-mlx-mxfp4 \
  --prompt '<|im_start|><|quad_end|><|object_ref_end|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<|im_end|>' \
  --max-tokens 420 \
  --temperature 0.7 \
  --dtype bfloat16 \
  --metal-swiglu
```

Expected final expression:

```text
x(2 ln(x) - 1) / (ln(x))^2
```

## Benchmark

On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about 56 decode tokens/sec with HRM-mlx's fast path:

```text
MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
```

Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary by chip and system load.

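
To put the benchmark figure in context, the following is a rough back-of-the-envelope sketch, not a measurement. It combines the recurrent loop formula (`H_cycles = 2`, `L_cycles = 3`) with the reported decode rate of about 56 tokens/sec. The helper names `stack_passes_per_token` and `decode_seconds` are illustrative only and are not part of HRM-mlx. The time estimate covers decoding only and ignores the 512-token prompt prefill.

```python
# Figures copied from this card; everything derived below is plain arithmetic.
H_CYCLES = 2
L_CYCLES = 3
DECODE_TOKENS_PER_SEC = 56.0  # approximate, MacBook Pro M4 Max, 32-core GPU
GENERATED_TOKENS = 128        # benchmark shape: 128 generated tokens


def stack_passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Stack passes per token: H_cycles * (L_cycles + 1)."""
    return h_cycles * (l_cycles + 1)


def decode_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Estimated decode time, assuming a constant decode rate."""
    return num_tokens / tokens_per_sec


passes = stack_passes_per_token(H_CYCLES, L_CYCLES)
print(passes)                                   # 8 stack passes per token
print(passes * DECODE_TOKENS_PER_SEC)           # 448.0 stack passes per second
print(round(decode_seconds(GENERATED_TOKENS, DECODE_TOKENS_PER_SEC), 2))  # 2.29
```

Because each token costs 8 stack passes, the decode rate corresponds to roughly 448 stack passes per second on the benchmarked machine, under the assumption that the rate is steady across the 128 generated tokens.
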
## Quality Notes

This checkpoint has not been evaluated with a formal benchmark suite. In a small qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning prompts, including the derivative of `(x^2) / ln(x)`.

HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can produce incomplete or unstable answers on some prompts, especially when the prompt is underspecified or contradictory.
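
When spot-checking model output on the derivative prompt, it can help to confirm the reference answer independently. The sketch below uses only the Python standard library to compare the expected expression `x(2 ln(x) - 1) / (ln(x))^2` against a central finite difference of `x^2 / ln(x)`. The function names, sample points, and step size are arbitrary choices for illustration. This is not part of HRM-mlx, and it checks the reference expression, not the model.

```python
import math


def f(x: float) -> float:
    """The function from the example prompt: x^2 / ln(x)."""
    return x**2 / math.log(x)


def expected_derivative(x: float) -> float:
    """The expected final expression: x(2 ln(x) - 1) / (ln(x))^2."""
    ln_x = math.log(x)
    return x * (2 * ln_x - 1) / ln_x**2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Numerical derivative via a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


def max_relative_error(points) -> float:
    """Largest relative gap between the numeric and closed-form derivative."""
    worst = 0.0
    for x in points:
        exact = expected_derivative(x)
        approx = central_difference(f, x)
        worst = max(worst, abs(approx - exact) / abs(exact))
    return worst


# Sample points avoid x = 1 (where ln(x) = 0) and x = sqrt(e) (where f'(x) = 0).
print(max_relative_error([1.5, 2.0, 3.0, 10.0]) < 1e-6)
```

A small relative error at every sample point is consistent with the expected expression being the correct derivative on the sampled range.
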