---
license: apache-2.0
language:
- en
library_name: gguf
pipeline_tag: text-generation
base_model: sapientinc/HRM-Text-1B
base_model_relation: quantized
tags:
- gguf
- bf16
- q8_0
- q6_k
- q5_k_m
- quantized
- llama.cpp
- hrm
- hierarchical-reasoning
- prefix-lm
- pre-alignment
- non-chat
- non-instruction-tuned
---

# HRM-Text-1B GGUF

This repository contains a BF16 GGUF conversion of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B) and validated `Q8_0`, `Q6_K`, and `Q5_K_M` quantizations derived from that BF16 GGUF.

The GGUF files use:

- `general.architecture = hrm_text`
- BF16 source tensor storage or standard `llama.cpp` quantized tensor storage
- the original tokenizer from `tokenizer.json`
- no injected chat template
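
To confirm the architecture key without a patched runtime, the GGUF metadata header can be read directly. The following is a minimal sketch, assuming a little-endian GGUF file of version 2 or later (64-bit counts and string lengths); the helper names are hypothetical and are not part of this repository.

```python
import struct

# struct formats for the scalar GGUF metadata value types (little-endian).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(stream, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF header")
    return data.decode("utf-8")


def _read_value(stream, value_type):
    """Read one metadata value, recursing into arrays."""
    if value_type == _STRING:
        return _read_string(stream)
    if value_type == _ARRAY:
        item_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, item_type) for _ in range(count)]
    return _read(stream, _SCALAR_FORMATS[value_type])


def read_gguf_architecture(stream):
    """Return the value of general.architecture, or None if it is absent."""
    if stream.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _version = _read(stream, "<I")
    _tensor_count = _read(stream, "<Q")
    kv_count = _read(stream, "<Q")
    for _ in range(kv_count):
        key = _read_string(stream)
        value = _read_value(stream, _read(stream, "<I"))
        if key == "general.architecture":
            return value
    return None
```

Opening one of the `.gguf` files in binary mode and passing the file object to `read_gguf_architecture` should return `hrm_text` for the files in this repository, per the list above.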

This is not a chat model and is not instruction tuned. "Useful output" for this repository means alignment with the original Transformers model on the same prompt, not chat-assistant behavior.

## Compatibility Notice

Standard upstream `llama.cpp`, Ollama, LM Studio, and `llama-cpp-python` are not expected to load these files until `hrm_text` is supported upstream.

Use the included patch:

```text
runtime/llama.cpp-hrm_text.patch
```

The patch was built against:

```text
ggml-org/llama.cpp commit 6a257d44633d4a752183ed778b88d2924d0a6b9d
```

Only the normal causal generation path is implemented in the patched runtime. Prefix-LM bidirectional `token_type_ids` are not supported by the `llama.cpp` path in this release.

## Files

| File | Description |
| --- | --- |
| `HRM-Text-1B-BF16.gguf` | BF16 GGUF conversion of `sapientinc/HRM-Text-1B` |
| `HRM-Text-1B-Q8_0.gguf` | Validated `Q8_0` quantization from BF16 |
| `HRM-Text-1B-Q6_K.gguf` | Validated `Q6_K` quantization from BF16 |
| `HRM-Text-1B-Q5_K_M.gguf` | Validated `Q5_K_M` quantization from BF16 |
| `runtime/llama.cpp-hrm_text.patch` | Patch adding `hrm_text` conversion and runtime support to the clean `llama.cpp` base commit |
| `reports/validation/final_report.md` | Human-readable conversion and validation report |
| `reports/validation/quantization_report.md` | Quantization report, hashes, and pass/fail summary |
| `reports/validation/baseline_transformers.json` | Transformers baseline prompts, logits, and continuations |
| `reports/validation/bf16_tensor_validation.json` | Tensor-level GGUF validation |
| `reports/validation/bf16_vs_hf.json` | Runtime logit and text validation |
| `reports/validation/q8_0_vs_bf16.json` | `Q8_0` vs BF16 runtime validation |
| `reports/validation/q6_k_vs_bf16.json` | `Q6_K` vs BF16 runtime validation |
| `reports/validation/q5_k_m_vs_bf16.json` | `Q5_K_M` vs BF16 runtime validation |

## Provenance

| Item | Value |
| --- | --- |
| Source model | `sapientinc/HRM-Text-1B` |
| Source snapshot SHA | `2285b999f6fb8a5b16e0cc313a9e8e4fe447140d` |
| Source `model.safetensors` SHA256 | `F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584` |
| BF16 GGUF SHA256 | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
| BF16 GGUF size | `2,367,995,648` bytes |
| llama.cpp base commit | `6a257d44633d4a752183ed778b88d2924d0a6b9d` |

## Available GGUF Files

| Variant | File | Size (bytes) | SHA256 |
| --- | --- | ---: | --- |
| BF16 | `HRM-Text-1B-BF16.gguf` | `2367995648` | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
| Q8_0 | `HRM-Text-1B-Q8_0.gguf` | `1259126560` | `C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D` |
| Q6_K | `HRM-Text-1B-Q6_K.gguf` | `972668704` | `24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C` |
| Q5_K_M | `HRM-Text-1B-Q5_K_M.gguf` | `851509024` | `F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7` |
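
The digests above can be checked after download. This is a minimal sketch using only the Python standard library; the function names and the `EXPECTED_SHA256` dictionary are illustrative, and the local directory layout is an assumption.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the table above (upper-case hex).
EXPECTED_SHA256 = {
    "HRM-Text-1B-BF16.gguf": "2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010",
    "HRM-Text-1B-Q8_0.gguf": "C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D",
    "HRM-Text-1B-Q6_K.gguf": "24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C",
    "HRM-Text-1B-Q5_K_M.gguf": "F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_downloads(directory, expected=EXPECTED_SHA256):
    """Return {file name: True, False, or None}; None means the file is absent."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (sha256_of(path) == want.upper())
        else:
            results[name] = None
    return results
```

Calling `verify_downloads("HRM-Text-1B-GGUF")` on the download directory reports each variant separately, so a partial download (for example only `Q6_K`) shows the missing files as `None` rather than as failures.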

## Validation Summary

Validation was performed from a clean source snapshot and a clean `llama.cpp` base checkout.

| Check | Result |
| --- | --- |
| Tensor validation | Pass, `259/259` tensors found and compared |
| Tensor values | BF16 tensor bits match HF after expected BF16 conversion |
| Prompt token IDs | Match for all validation prompts |
| Next-token top-1 | Match on `4/4` prompts |
| Top-10 overlap | `10/10` for all prompts |
| Text validation | BF16 GGUF continuations are aligned with the Transformers baseline |

Quantized variants were validated against the BF16 GGUF:

| Variant | Token IDs | Top-1 matches | Min top-10 overlap | New loop check | Result |
| --- | --- | ---: | ---: | --- | --- |
| Q8_0 | Pass | `4/4` | `9/10` | Pass | Pass |
| Q6_K | Pass | `4/4` | `9/10` | Pass | Pass |
| Q5_K_M | Pass | `4/4` | `9/10` | Pass | Pass |
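
The top-1 and top-10 columns can be reproduced from two next-token logit vectors for the same prompt. The sketch below is one plausible way to compute them, not necessarily the exact procedure used for the reports; the function names and the tie-breaking rule are assumptions.

```python
def top_k_ids(logits, k):
    """Indices of the k largest logits, highest first (ties go to the lower id)."""
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    return order[:k]


def compare_next_token(reference, candidate, k=10):
    """Compare two logit vectors over the same vocabulary."""
    ref_top = top_k_ids(reference, k)
    cand_top = top_k_ids(candidate, k)
    return {
        # True when both runtimes pick the same greedy next token.
        "top1_match": ref_top[0] == cand_top[0],
        # Number of token ids shared by the two top-k sets (order ignored).
        "topk_overlap": len(set(ref_top) & set(cand_top)),
    }
```

Under this definition, a `9/10` overlap means that one token in the reference top-10 was replaced by a different token in the quantized top-10, while the greedy choice can still match.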

Full-vocab mean absolute logit error:

| Prompt | MAE |
| --- | ---: |
| `The quick brown fox` | `0.0199148655` |
| `In a distant future, humanity` | `0.0051696529` |
| `Question: What is 2+2?\nAnswer:` | `0.0076530445` |
| `def fibonacci(n):` | `0.0045031775` |
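
For reference, a full-vocabulary mean absolute error between two logit vectors can be computed as below. This is a minimal sketch of the metric itself; which pair of runtimes the table compares, and how the logits were extracted, is documented in the validation reports rather than assumed here.

```python
def mean_absolute_logit_error(reference, candidate):
    """Mean of |reference[i] - candidate[i]| over every vocabulary entry."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must cover the same vocabulary")
    if not reference:
        raise ValueError("logit vectors must not be empty")
    total = sum(abs(r - c) for r, c in zip(reference, candidate))
    return total / len(reference)
```

Because the average runs over the whole vocabulary rather than only the top-ranked tokens, a small MAE can coexist with a changed token near the top-10 boundary.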

The original model already repeats on some prompts. Repetition by itself is not treated as a conversion failure unless it is newly introduced by the GGUF runtime. The BF16 GGUF validation did not reproduce the unrelated garbage pattern seen in a previous broken conversion attempt.
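
The idea of flagging only newly introduced repetition can be sketched as follows. This is an illustrative heuristic under stated assumptions, not the check used for the reports: the n-gram length, the repeat threshold, and both function names are hypothetical.

```python
def has_repeated_ngram(token_ids, n=4, min_repeats=3):
    """True if some n-gram occurs at least min_repeats times back to back."""
    span = n * min_repeats
    for start in range(len(token_ids) - span + 1):
        gram = token_ids[start:start + n]
        if all(
            token_ids[start + r * n:start + (r + 1) * n] == gram
            for r in range(1, min_repeats)
        ):
            return True
    return False


def is_new_loop(baseline_ids, candidate_ids, n=4, min_repeats=3):
    """Flag a loop only when the candidate repeats and the baseline does not."""
    return (
        has_repeated_ngram(candidate_ids, n, min_repeats)
        and not has_repeated_ngram(baseline_ids, n, min_repeats)
    )
```

With this rule, a continuation that loops in both the baseline and the GGUF runtime is not counted against the conversion, which matches the policy described above.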

## Example Runtime Setup

Download this repository:

```powershell
pip install -U huggingface_hub
hf download sinimiini/HRM-Text-1B-GGUF --local-dir HRM-Text-1B-GGUF
```

Patch and build `llama.cpp`:

```powershell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 6a257d44633d4a752183ed778b88d2924d0a6b9d
git apply ..\HRM-Text-1B-GGUF\runtime\llama.cpp-hrm_text.patch
cmake -B build -S . -DGGML_NATIVE=OFF
cmake --build build --config Release --target llama-cli llama-completion llama-results
```

Run a short causal-generation smoke test:

```powershell
.\build\bin\Release\llama-cli.exe -m ..\HRM-Text-1B-GGUF\HRM-Text-1B-BF16.gguf -p "The quick brown fox" -n 32 --temp 0 --no-conversation
```

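Depending on the CMake generator and build type, the executable may be under `build\bin\llama-cli.exe` instead of `build\bin\Release\llama-cli.exe`. Multi-config generators such as Visual Studio add the `Release` subdirectory; single-config generators such as Ninja do not.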
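
A script that wraps the smoke test can probe both locations instead of hard-coding one. This is a minimal sketch; `find_llama_binary` is a hypothetical helper, and the two candidate paths are the ones named above.

```python
from pathlib import Path


def find_llama_binary(build_dir, name="llama-cli.exe"):
    """Return the first existing candidate path for the binary, or None."""
    candidates = [
        Path(build_dir) / "bin" / "Release" / name,  # e.g. Visual Studio layout
        Path(build_dir) / "bin" / name,              # e.g. Ninja or Makefile layout
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

On non-Windows builds the same helper can be called with `name="llama-cli"`.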

## Limitations

- `hrm_text` is a custom GGUF architecture in this conversion.
- Generic GGUF runners will not work until they implement the HRM runtime graph.
- Prefix-LM bidirectional attention with `token_type_ids` is not implemented in the patched `llama.cpp` path.

## License

The source model is released under the Apache 2.0 license. See [`LICENSE`](./LICENSE).