Aryagm
/

HRM-Text-1B-MLX-4bit

Text Generation

Model card Files Files and versions

HRM-Text-1B-MLX-4bit / README.md

Aryagm's picture

Use concise model card

054f232 verified 2 days ago

|

history blame contribute delete

2.48 kB

	---
	license: apache-2.0
	base_model: sapientinc/HRM-Text-1B
	base_model_relation: quantized
	library_name: mlx
	pipeline_tag: text-generation
	inference: false
	tags:
	- mlx
	- apple-silicon
	- text-generation
	- quantized
	- mxfp4
	- hrm
	- reasoning
	---

	# HRM-Text-1B MLX 4-bit

	This is a persisted 4-bit MXFP4 MLX checkpoint for
	[sapientinc/HRM-Text-1B](https://huggingface.co/sapientinc/HRM-Text-1B).
	It is intended for use with [HRM-mlx](https://github.com/Aryagm/HRM-mlx) on
	Apple Silicon.

	This is not a new finetune. It is a quantized inference checkpoint derived from
	the public HRM-Text-1B weights.

	The checkpoint keeps the full HRM recurrent inference loop:

	```text
	H_cycles * (L_cycles + 1) = 2 * (3 + 1) = 8 stack passes/token
	```

	## Files

	- `model.safetensors`: MLX-format 4-bit MXFP4 weights
	- `config.json`: HRM-Text config with MLX metadata
	- `quantization.json`: quantization metadata
	- `tokenizer.json`, `tokenizer_config.json`: tokenizer files copied from the base model

	## Usage

	```bash
	git clone https://github.com/Aryagm/HRM-mlx.git
	cd HRM-mlx
	python3 -m venv .venv
	source .venv/bin/activate
	pip install -e .
	```

	Download this checkpoint:

	```bash
	python - <<'PY'
	from huggingface_hub import snapshot_download

	snapshot_download(
	repo_id="Aryagm/HRM-Text-1B-MLX-4bit",
	local_dir="exports/hrm-text-1b-mlx-mxfp4",
	)
	PY
	```

	Generate:

	```bash
	hrm-mlx \
	--model-dir exports/hrm-text-1b-mlx-mxfp4 \
	--prompt '<\|im_start\|><\|quad_end\|><\|object_ref_end\|>What is the derivative of (x^2) / ln(x)? Give the final simplified expression.<\|im_end\|>' \
	--max-tokens 420 \
	--temperature 0.7 \
	--dtype bfloat16 \
	--metal-swiglu
	```

	Expected final expression:

	```text
	x(2 ln(x) - 1) / (ln(x))^2
	```

	## Benchmark

	On a MacBook Pro M4 Max, 32-core GPU, this checkpoint reaches about
	56 decode tokens/sec with HRM-mlx's fast path:

	```text
	MXFP4 weights + MLX fast RMSNorm/RoPE/SDPA + custom Metal SwiGLU
	```

	Benchmark shape: 512 prompt tokens, 128 generated tokens. Absolute numbers vary
	by chip and system load.

	## Quality Notes

	This checkpoint has not been evaluated with a formal benchmark suite. In a small
	qualitative check, 4-bit MXFP4 matched BF16 on simple math and short reasoning
	prompts, including the derivative of `(x^2) / ln(x)`.

	HRM-Text-1B is a base reasoning model, not a polished chat assistant. It can
	produce incomplete or unstable answers on some prompts, especially when the
	prompt is underspecified or contradictory.