HRM-Text-1B GGUF

This repository contains a BF16 GGUF conversion of sapientinc/HRM-Text-1B and validated Q8_0, Q6_K, and Q5_K_M quantizations derived from that BF16 GGUF.

The GGUF files use:

  • general.architecture = hrm_text
  • BF16 source tensor storage or standard llama.cpp quantized tensor storage
  • the original tokenizer from tokenizer.json
  • no injected chat template
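
The properties listed above can be checked without a patched runtime by reading the GGUF metadata header directly. The following is a minimal sketch, assuming a little-endian GGUF version 2 or 3 file; the function name `read_gguf_metadata` and the `_version` / `_tensor_count` keys are illustrative and not part of this repository. Arrays are decoded element by element, so large tokenizer arrays may take a while to read.

```python
import struct
from typing import Any, BinaryIO

# GGUF metadata value type IDs mapped to little-endian struct formats.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path: str) -> dict:
    """Return the key/value metadata of a GGUF file (hypothetical helper)."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        tensor_count = _read(f, "<Q")
        kv_count = _read(f, "<Q")
        meta = {"_version": version, "_tensor_count": tensor_count}
        for _ in range(kv_count):
            key = _read_string(f)
            vtype = _read(f, "<I")
            meta[key] = _read_value(f, vtype)
        return meta
```

With a downloaded file, `read_gguf_metadata("HRM-Text-1B-BF16.gguf").get("general.architecture")` would be expected to return `hrm_text`, and the standard `tokenizer.chat_template` key would be expected to be absent, given the list above.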

This is not a chat model and is not instruction tuned. "Useful output" for this repository means alignment with the original Transformers model on the same prompt, not chat-assistant behavior.

Compatibility Notice

Standard upstream llama.cpp, Ollama, LM Studio, and llama-cpp-python are not expected to load these files until hrm_text is supported upstream.

Use the included patch:

runtime/llama.cpp-hrm_text.patch

The patch was built against:

ggml-org/llama.cpp commit 6a257d44633d4a752183ed778b88d2924d0a6b9d

Only the normal causal generation path is implemented in the patched runtime. Prefix-LM bidirectional token_type_ids are not supported by the llama.cpp path in this release.
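
To make the distinction concrete, the sketch below builds both kinds of attention mask for a short sequence. It is a conceptual illustration only, not code from the patch or from the original model. The convention that `token_type_ids == 1` marks prefix tokens that attend to each other in both directions is an assumption made for this example.

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_lm_mask(token_type_ids: List[int]) -> List[List[bool]]:
    """Causal mask plus bidirectional attention among prefix tokens.

    Assumed convention (not confirmed by the source): type 1 marks
    prefix tokens, type 0 marks ordinary causal tokens.
    """
    n = len(token_type_ids)
    return [
        [
            j <= i or (token_type_ids[i] == 1 and token_type_ids[j] == 1)
            for j in range(n)
        ]
        for i in range(n)
    ]
```

With all token types set to 0, `prefix_lm_mask` reduces to `causal_mask`, which corresponds to the only path the patched runtime implements.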

Files

| File | Description |
| --- | --- |
| HRM-Text-1B-BF16.gguf | BF16 GGUF conversion of sapientinc/HRM-Text-1B |
| HRM-Text-1B-Q8_0.gguf | Validated Q8_0 quantization from BF16 |
| HRM-Text-1B-Q6_K.gguf | Validated Q6_K quantization from BF16 |
| HRM-Text-1B-Q5_K_M.gguf | Validated Q5_K_M quantization from BF16 |
| runtime/llama.cpp-hrm_text.patch | Patch adding hrm_text conversion and runtime support to the clean llama.cpp base commit |
| reports/validation/final_report.md | Human-readable conversion and validation report |
| reports/validation/quantization_report.md | Quantization report, hashes, and pass/fail summary |
| reports/validation/baseline_transformers.json | Transformers baseline prompts, logits, and continuations |
| reports/validation/bf16_tensor_validation.json | Tensor-level GGUF validation |
| reports/validation/bf16_vs_hf.json | Runtime logit and text validation |
| reports/validation/q8_0_vs_bf16.json | Q8_0 vs BF16 runtime validation |
| reports/validation/q6_k_vs_bf16.json | Q6_K vs BF16 runtime validation |
| reports/validation/q5_k_m_vs_bf16.json | Q5_K_M vs BF16 runtime validation |

Provenance

| Item | Value |
| --- | --- |
| Source model | sapientinc/HRM-Text-1B |
| Source snapshot SHA | 2285b999f6fb8a5b16e0cc313a9e8e4fe447140d |
| Source model.safetensors SHA256 | F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584 |
| BF16 GGUF SHA256 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010 |
| BF16 GGUF size | 2,367,995,648 bytes |
| llama.cpp base commit | 6a257d44633d4a752183ed778b88d2924d0a6b9d |

Available GGUF Files

| Variant | File | Size (bytes) | SHA256 |
| --- | --- | --- | --- |
| BF16 | HRM-Text-1B-BF16.gguf | 2367995648 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010 |
| Q8_0 | HRM-Text-1B-Q8_0.gguf | 1259126560 | C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D |
| Q6_K | HRM-Text-1B-Q6_K.gguf | 972668704 | 24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C |
| Q5_K_M | HRM-Text-1B-Q5_K_M.gguf | 851509024 | F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7 |
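
A downloaded copy can be checked against the hashes listed above. The sketch below is one possible way to do that with the Python standard library; the helper names `sha256_of_file` and `verify_directory` are illustrative, and the expected values are copied from the table.

```python
import hashlib
from pathlib import Path

# Expected SHA256 values, copied from the table in this README.
EXPECTED_SHA256 = {
    "HRM-Text-1B-BF16.gguf": "2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010",
    "HRM-Text-1B-Q8_0.gguf": "C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D",
    "HRM-Text-1B-Q6_K.gguf": "24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C",
    "HRM-Text-1B-Q5_K_M.gguf": "F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7",
}


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_directory(directory) -> dict:
    """Map each file name to True/False, or None when the file is not present."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = (sha256_of_file(path) == expected) if path.is_file() else None
    return results
```

Calling `verify_directory("HRM-Text-1B-GGUF")` after the download returns one entry per variant, so a partial download (for example, only Q8_0) shows `None` for the files that were skipped.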

Validation Summary

Validation was performed from a clean source snapshot and a clean llama.cpp base checkout.

| Check | Result |
| --- | --- |
| Tensor validation | Pass, 259/259 tensors found and compared |
| Tensor values | BF16 tensor bits match HF after expected BF16 conversion |
| Prompt token IDs | Match for all validation prompts |
| Next-token top-1 | Match on 4/4 prompts |
| Top-10 overlap | 10/10 for all prompts |
| Text validation | BF16 GGUF continuations are aligned with Transformers baseline |

Quantized variants were validated against the BF16 GGUF:

| Variant | Token IDs | Top-1 matches | Min top-10 overlap | New loop check | Result |
| --- | --- | --- | --- | --- | --- |
| Q8_0 | Pass | 4/4 | 9/10 | Pass | Pass |
| Q6_K | Pass | 4/4 | 9/10 | Pass | Pass |
| Q5_K_M | Pass | 4/4 | 9/10 | Pass | Pass |
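
The top-1 and top-10 columns can be read as comparisons between two next-token logit vectors for the same prompt. The sketch below shows one plausible way to compute them; it is not the script used for the reports, and the function names are illustrative.

```python
from typing import List, Sequence


def top_k_ids(logits: Sequence[float], k: int) -> List[int]:
    """Token IDs of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def compare_next_token(reference: Sequence[float],
                       candidate: Sequence[float],
                       k: int = 10) -> dict:
    """Compare two next-token logit vectors over the same vocabulary."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    ref_top = top_k_ids(reference, k)
    cand_top = top_k_ids(candidate, k)
    return {
        "top1_match": ref_top[0] == cand_top[0],
        "topk_overlap": len(set(ref_top) & set(cand_top)),
    }
```

Under this reading, a "Min top-10 overlap" of 9/10 would mean that, on the worst prompt, nine of the ten highest-ranked tokens are shared between the quantized file and the BF16 GGUF.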

Full-vocab mean absolute logit error:

| Prompt | MAE |
| --- | --- |
| The quick brown fox | 0.0199148655 |
| In a distant future, humanity | 0.0051696529 |
| Question: What is 2+2?\nAnswer: | 0.0076530445 |
| def fibonacci(n): | 0.0045031775 |
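
For reference, a full-vocabulary mean absolute error is the average of the absolute per-token logit differences between two runs on the same prompt. A minimal sketch, with `reference` and `candidate` as neutral placeholder names for the two logit vectors being compared:

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float],
                        candidate: Sequence[float]) -> float:
    """Average absolute difference across every vocabulary entry."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    if not reference:
        raise ValueError("logit vectors must not be empty")
    total = sum(abs(r - c) for r, c in zip(reference, candidate))
    return total / len(reference)
```

Because every vocabulary entry contributes equally, a small MAE can coexist with a reordering among near-tied tokens, which is why the top-1 and top-10 checks are reported separately.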

The original model already repeats on some prompts. Repetition by itself is not treated as a conversion failure unless it is newly introduced by the GGUF runtime. The BF16 GGUF validation did not reproduce the unrelated garbage pattern seen in a previous broken conversion attempt.
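
The idea of "newly introduced" repetition can be expressed as a comparison between two continuations of the same prompt. The heuristic below is a hypothetical sketch of such a check, not the criterion used in the validation reports; the period and repeat thresholds are arbitrary illustration values.

```python
from typing import Sequence


def has_loop(token_ids: Sequence[int],
             max_period: int = 8,
             min_repeats: int = 3) -> bool:
    """True if some window repeats with a period <= max_period
    for at least min_repeats consecutive cycles."""
    n = len(token_ids)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        for start in range(n - span + 1):
            # The window is periodic if each token equals the one a period later.
            if all(token_ids[i] == token_ids[i + period]
                   for i in range(start, start + span - period)):
                return True
    return False


def newly_introduced_loop(baseline_ids: Sequence[int],
                          candidate_ids: Sequence[int],
                          **kwargs) -> bool:
    """Flag a loop only when the baseline continuation does not loop as well."""
    return has_loop(candidate_ids, **kwargs) and not has_loop(baseline_ids, **kwargs)
```

Under this sketch, a prompt on which the original model already repeats would not be flagged, matching the policy described above.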

Example Runtime Setup

Download this repository:

pip install -U huggingface_hub
hf download sinimiini/HRM-Text-1B-GGUF --local-dir HRM-Text-1B-GGUF

Patch and build llama.cpp:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 6a257d44633d4a752183ed778b88d2924d0a6b9d
git apply ..\HRM-Text-1B-GGUF\runtime\llama.cpp-hrm_text.patch
cmake -B build -S . -DGGML_NATIVE=OFF
cmake --build build --config Release --target llama-cli llama-completion llama-results

Run a short causal-generation smoke test:

.\build\bin\Release\llama-cli.exe -m ..\HRM-Text-1B-GGUF\HRM-Text-1B-BF16.gguf -p "The quick brown fox" -n 32 --temp 0 --no-conversation

Depending on the CMake generator and build configuration, the executable may be under build\bin\llama-cli.exe instead of build\bin\Release\llama-cli.exe.
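
A wrapper script can handle both layouts by probing the two locations in order. The helper below is an illustrative sketch; `find_llama_cli` is not part of llama.cpp or this repository, and the default file name assumes a Windows build.

```python
from pathlib import Path
from typing import Optional


def find_llama_cli(build_dir, name: str = "llama-cli.exe") -> Optional[Path]:
    """Return the first existing llama-cli path, or None if neither exists."""
    candidates = [
        Path(build_dir) / "bin" / "Release" / name,  # multi-config generators
        Path(build_dir) / "bin" / name,              # single-config generators
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

On non-Windows builds the same helper could be called with `name="llama-cli"`.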

Limitations

  • hrm_text is a custom GGUF architecture in this conversion.
  • Generic GGUF runners will not work until they implement the HRM runtime graph.
  • Prefix-LM bidirectional attention with token_type_ids is not implemented in the patched llama.cpp path.

License

The source model is released under the Apache 2.0 license. See LICENSE.
