sinimiini committed
Commit 050ba04 · verified
1 Parent(s): 1c10575

Update model card for validated quantizations

Files changed (1)
  1. README.md +37 -6
README.md CHANGED
@@ -9,6 +9,9 @@ base_model_relation: quantized
9
  tags:
10
  - gguf
11
  - bf16
 
 
 
12
  - quantized
13
  - llama.cpp
14
  - hrm
@@ -21,12 +24,12 @@ tags:
21
 
22
  # HRM-Text-1B GGUF
23
 
24
- This repository contains a BF16 GGUF conversion of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B).
25
 
26
- The GGUF uses:
27
 
28
  - `general.architecture = hrm_text`
29
- - BF16 tensor storage
30
  - the original tokenizer from `tokenizer.json`
31
  - no injected chat template
32
 
@@ -55,11 +58,19 @@ Only the normal causal generation path is implemented in the patched runtime. Pr
55
  | File | Description |
56
  | --- | --- |
57
  | `HRM-Text-1B-BF16.gguf` | BF16 GGUF conversion of `sapientinc/HRM-Text-1B` |
 
 
 
58
  | `runtime/llama.cpp-hrm_text.patch` | Patch adding `hrm_text` conversion and runtime support to the clean `llama.cpp` base commit |
59
  | `reports/validation/final_report.md` | Human-readable conversion and validation report |
 
60
  | `reports/validation/baseline_transformers.json` | Transformers baseline prompts, logits, and continuations |
61
  | `reports/validation/bf16_tensor_validation.json` | Tensor-level GGUF validation |
62
  | `reports/validation/bf16_vs_hf.json` | Runtime logit and text validation |
 
 
 
 
63
 
64
  ## Provenance
65
 
@@ -68,10 +79,21 @@ Only the normal causal generation path is implemented in the patched runtime. Pr
68
  | Source model | `sapientinc/HRM-Text-1B` |
69
  | Source snapshot SHA | `2285b999f6fb8a5b16e0cc313a9e8e4fe447140d` |
70
  | Source `model.safetensors` SHA256 | `F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584` |
71
- | GGUF SHA256 | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
72
- | GGUF size | `2,367,995,648` bytes |
73
  | llama.cpp base commit | `6a257d44633d4a752183ed778b88d2924d0a6b9d` |
74
 
 
 
 
 
 
 
 
 
 
 
 
75
  ## Validation Summary
76
 
77
  Validation was performed from a clean source snapshot and a clean `llama.cpp` base checkout.
@@ -85,6 +107,15 @@ Validation was performed from a clean source snapshot and a clean `llama.cpp` ba
85
  | Top-10 overlap | `10/10` for all prompts |
86
  | Text validation | BF16 GGUF continuations are aligned with Transformers baseline |
87
 
 
 
 
 
 
 
 
 
 
88
  Full-vocab mean absolute logit error:
89
 
90
  | Prompt | MAE |
@@ -129,7 +160,7 @@ Depending on the generator binary and `llama.cpp` build type, the executable may
129
  - `hrm_text` is a custom GGUF architecture in this conversion.
130
  - Generic GGUF runners will not work until they implement the HRM runtime graph.
131
  - Prefix-LM bidirectional attention with `token_type_ids` is not implemented in the patched `llama.cpp` path.
132
- - Q8_0 and other quantized variants are intentionally not included in this repository.
133
 
134
  ## License
135
 
 
9
  tags:
10
  - gguf
11
  - bf16
12
+ - q8_0
13
+ - q6_k
14
+ - q5_k_m
15
  - quantized
16
  - llama.cpp
17
  - hrm
 
24
 
25
  # HRM-Text-1B GGUF
26
 
27
+ This repository contains a BF16 GGUF conversion of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B) and validated `Q8_0`, `Q6_K`, and `Q5_K_M` quantizations derived from that BF16 GGUF.
28
 
29
+ The GGUF files use:
30
 
31
  - `general.architecture = hrm_text`
32
+ - BF16 source tensor storage or standard `llama.cpp` quantized tensor storage
33
  - the original tokenizer from `tokenizer.json`
34
  - no injected chat template
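The metadata properties listed above can be checked without a patched runtime by reading the GGUF header directly. The following is a minimal sketch, assuming the GGUF version 2 or 3 little-endian layout (magic, version, tensor count, key/value count, then typed key/value pairs). The helper names (`read_gguf_metadata`, `_read_value`) are hypothetical and are not part of this repository or of `llama.cpp`.

```python
import struct

# GGUF metadata value type codes mapped to struct formats (little-endian).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file while reading GGUF metadata")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    """Read a scalar, string, or (possibly nested) array value."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f, wanted=("general.architecture",)):
    """Return (version, {key: value}) for the requested metadata keys."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF version 2 or later")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    found = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key in wanted:
            found[key] = value
        if len(found) == len(wanted):
            break  # stop early once every requested key was seen
    return version, found
```

For a local copy of one of the files, `read_gguf_metadata(open("HRM-Text-1B-BF16.gguf", "rb"))` would be expected to report `hrm_text` for `general.architecture`. The sketch only inspects metadata and does not validate tensors.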
35
 
 
58
  | File | Description |
59
  | --- | --- |
60
  | `HRM-Text-1B-BF16.gguf` | BF16 GGUF conversion of `sapientinc/HRM-Text-1B` |
61
+ | `HRM-Text-1B-Q8_0.gguf` | Validated `Q8_0` quantization from BF16 |
62
+ | `HRM-Text-1B-Q6_K.gguf` | Validated `Q6_K` quantization from BF16 |
63
+ | `HRM-Text-1B-Q5_K_M.gguf` | Validated `Q5_K_M` quantization from BF16 |
64
  | `runtime/llama.cpp-hrm_text.patch` | Patch adding `hrm_text` conversion and runtime support to the clean `llama.cpp` base commit |
65
  | `reports/validation/final_report.md` | Human-readable conversion and validation report |
66
+ | `reports/validation/quantization_report.md` | Quantization report, hashes, and pass/fail summary |
67
  | `reports/validation/baseline_transformers.json` | Transformers baseline prompts, logits, and continuations |
68
  | `reports/validation/bf16_tensor_validation.json` | Tensor-level GGUF validation |
69
  | `reports/validation/bf16_vs_hf.json` | Runtime logit and text validation |
70
+ | `reports/validation/q8_0_vs_bf16.json` | `Q8_0` vs BF16 runtime validation |
71
+ | `reports/validation/q6_k_vs_bf16.json` | `Q6_K` vs BF16 runtime validation |
72
+ | `reports/validation/q5_k_m_vs_bf16.json` | `Q5_K_M` vs BF16 runtime validation |
73
+ | `reports/validation/q4_k_m_vs_bf16.json` | Failed `Q4_K_M` validation report; the `Q4_K_M` GGUF is not uploaded |
74
 
75
  ## Provenance
76
 
 
79
  | Source model | `sapientinc/HRM-Text-1B` |
80
  | Source snapshot SHA | `2285b999f6fb8a5b16e0cc313a9e8e4fe447140d` |
81
  | Source `model.safetensors` SHA256 | `F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584` |
82
+ | BF16 GGUF SHA256 | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
83
+ | BF16 GGUF size | `2,367,995,648` bytes |
84
  | llama.cpp base commit | `6a257d44633d4a752183ed778b88d2924d0a6b9d` |
85
 
86
+ ## Available GGUF Files
87
+
88
+ | Variant | File | Size (bytes) | SHA256 |
89
+ | --- | --- | ---: | --- |
90
+ | BF16 | `HRM-Text-1B-BF16.gguf` | `2367995648` | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
91
+ | Q8_0 | `HRM-Text-1B-Q8_0.gguf` | `1259126560` | `C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D` |
92
+ | Q6_K | `HRM-Text-1B-Q6_K.gguf` | `972668704` | `24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C` |
93
+ | Q5_K_M | `HRM-Text-1B-Q5_K_M.gguf` | `851509024` | `F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7` |
94
+
95
+ `Q4_K_M` was generated and tested locally but is not uploaded. It introduced a new single-token repetition loop for one validation prompt, so it failed the release gate.
96
+
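A downloaded file can be checked against the sizes and SHA256 digests in the table above. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `verify` are hypothetical, and the files are assumed to keep their repository names.

```python
import hashlib
from pathlib import Path

# (size in bytes, uppercase SHA256) copied from the "Available GGUF Files" table.
EXPECTED = {
    "HRM-Text-1B-BF16.gguf": (
        2367995648,
        "2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010",
    ),
    "HRM-Text-1B-Q8_0.gguf": (
        1259126560,
        "C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D",
    ),
    "HRM-Text-1B-Q6_K.gguf": (
        972668704,
        "24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C",
    ),
    "HRM-Text-1B-Q5_K_M.gguf": (
        851509024,
        "F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7",
    ),
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte GGUFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify(path):
    """Return True only if both the size and the SHA256 match the table."""
    path = Path(path)
    size, digest = EXPECTED[path.name]
    # The cheap size check runs first; hashing is skipped on a size mismatch.
    return path.stat().st_size == size and sha256_of(path) == digest
```

Calling `verify("HRM-Text-1B-Q8_0.gguf")` on a complete download should return `True`. A truncated or corrupted file returns `False`.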
97
  ## Validation Summary
98
 
99
  Validation was performed from a clean source snapshot and a clean `llama.cpp` base checkout.
 
107
  | Top-10 overlap | `10/10` for all prompts |
108
  | Text validation | BF16 GGUF continuations are aligned with Transformers baseline |
109
 
110
+ Quantized variants were validated against the BF16 GGUF:
111
+
112
+ | Variant | Token IDs | Top-1 matches | Min top-10 overlap | New loop check | Result |
113
+ | --- | --- | ---: | ---: | --- | --- |
114
+ | Q8_0 | Pass | `4/4` | `9/10` | Pass | Pass |
115
+ | Q6_K | Pass | `4/4` | `9/10` | Pass | Pass |
116
+ | Q5_K_M | Pass | `4/4` | `9/10` | Pass | Pass |
117
+ | Q4_K_M | Pass | `3/4` | `8/10` | Fail | Not uploaded |
118
+
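The columns in the table above can be illustrated with a short sketch. It assumes each prompt yields one full-vocabulary logit vector per model and one list of generated token IDs. The actual validation scripts and thresholds are not shown in this commit, so the function names and the `min_run=8` loop threshold are hypothetical.

```python
def top_k_ids(logits, k=10):
    """Indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def top1_match(ref_logits, cand_logits):
    """True if both models rank the same token first."""
    return top_k_ids(ref_logits, 1) == top_k_ids(cand_logits, 1)


def top_k_overlap(ref_logits, cand_logits, k=10):
    """Number of token IDs shared by the two top-k sets (order ignored)."""
    return len(set(top_k_ids(ref_logits, k)) & set(top_k_ids(cand_logits, k)))


def has_single_token_loop(token_ids, min_run=8):
    """True if any token repeats at least min_run times in a row."""
    run = 1
    for prev, cur in zip(token_ids, token_ids[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_run:
            return True
    return False


def introduces_new_loop(ref_tokens, cand_tokens, min_run=8):
    """A loop is 'new' if the candidate loops where the reference does not."""
    return (has_single_token_loop(cand_tokens, min_run)
            and not has_single_token_loop(ref_tokens, min_run))
```

Under these assumptions, a variant passes when every prompt has a top-1 match, the minimum `top_k_overlap` across prompts stays high, and `introduces_new_loop` is false for every prompt against the BF16 continuation.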
119
  Full-vocab mean absolute logit error:
120
 
121
  | Prompt | MAE |
 
160
  - `hrm_text` is a custom GGUF architecture in this conversion.
161
  - Generic GGUF runners will not work until they implement the HRM runtime graph.
162
  - Prefix-LM bidirectional attention with `token_type_ids` is not implemented in the patched `llama.cpp` path.
163
+ - `Q4_K_M` is intentionally not included because strict validation found a new single-token repetition loop.
164
 
165
  ## License
166