sinimiini committed on
Commit 1c10575 · verified · 1 Parent(s): b1b4cfa

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ HRM-Text-1B-BF16.gguf filter=lfs diff=lfs merge=lfs -text
HRM-Text-1B-BF16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2dd5e2ef55e40c46db0d0cb4cf1427a4e72da34fee36f0d2b73d081d0e1c2010
+ size 2367995648
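The pointer above records the SHA-256 digest and byte size of the real GGUF file. A downloaded copy can be checked against those two values; the following is a minimal sketch, assuming Python is available. The helper names `sha256_of_file` and `verify_lfs_object` are hypothetical, and the chunk size is arbitrary.

```python
import hashlib
import os

# Values copied from the LFS pointer in this commit.
EXPECTED_SHA256 = "2dd5e2ef55e40c46db0d0cb4cf1427a4e72da34fee36f0d2b73d081d0e1c2010"
EXPECTED_SIZE = 2367995648


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so a multi-GB GGUF never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_lfs_object(path, expected_sha256, expected_size):
    """Return True only if both the byte size and the SHA-256 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    # The README lists the digest in upper case, so compare case-insensitively.
    return sha256_of_file(path) == expected_sha256.lower()
```

A call such as `verify_lfs_object("HRM-Text-1B-BF16.gguf", EXPECTED_SHA256, EXPECTED_SIZE)` checks the size first because it is cheap, and only hashes the file when the size already matches.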
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: gguf
+ pipeline_tag: text-generation
+ base_model: sapientinc/HRM-Text-1B
+ base_model_relation: quantized
+ tags:
+ - gguf
+ - bf16
+ - quantized
+ - llama.cpp
+ - hrm
+ - hierarchical-reasoning
+ - prefix-lm
+ - pre-alignment
+ - non-chat
+ - non-instruction-tuned
+ ---
+
+ # HRM-Text-1B GGUF
+
+ This repository contains a BF16 GGUF conversion of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B).
+
+ The GGUF uses:
+
+ - `general.architecture = hrm_text`
+ - BF16 tensor storage
+ - the original tokenizer from `tokenizer.json`
+ - no injected chat template
+
+ This is not a chat model and is not instruction tuned. "Useful output" for this repository means alignment with the original Transformers model on the same prompt, not chat-assistant behavior.
+
+ ## Compatibility Notice
+
+ Standard upstream `llama.cpp`, Ollama, LM Studio, and `llama-cpp-python` are expected not to load this file until `hrm_text` is supported upstream.
+
+ Use the included patch:
+
+ ```text
+ runtime/llama.cpp-hrm_text.patch
+ ```
+
+ The patch was built against:
+
+ ```text
+ ggml-org/llama.cpp commit 6a257d44633d4a752183ed778b88d2924d0a6b9d
+ ```
+
+ Only the normal causal generation path is implemented in the patched runtime. Prefix-LM bidirectional `token_type_ids` are not supported by the `llama.cpp` path in this release.
+
+ ## Files
+
+ | File | Description |
+ | --- | --- |
+ | `HRM-Text-1B-BF16.gguf` | BF16 GGUF conversion of `sapientinc/HRM-Text-1B` |
+ | `runtime/llama.cpp-hrm_text.patch` | Patch adding `hrm_text` conversion and runtime support to the clean `llama.cpp` base commit |
+ | `reports/validation/final_report.md` | Human-readable conversion and validation report |
+ | `reports/validation/baseline_transformers.json` | Transformers baseline prompts, logits, and continuations |
+ | `reports/validation/bf16_tensor_validation.json` | Tensor-level GGUF validation |
+ | `reports/validation/bf16_vs_hf.json` | Runtime logit and text validation |
+
+ ## Provenance
+
+ | Item | Value |
+ | --- | --- |
+ | Source model | `sapientinc/HRM-Text-1B` |
+ | Source snapshot SHA | `2285b999f6fb8a5b16e0cc313a9e8e4fe447140d` |
+ | Source `model.safetensors` SHA256 | `F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584` |
+ | GGUF SHA256 | `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010` |
+ | GGUF size | `2,367,995,648` bytes |
+ | llama.cpp base commit | `6a257d44633d4a752183ed778b88d2924d0a6b9d` |
+
+ ## Validation Summary
+
+ Validation was performed from a clean source snapshot and a clean `llama.cpp` base checkout.
+
+ | Check | Result |
+ | --- | --- |
+ | Tensor validation | Pass, `259/259` tensors found and compared |
+ | Tensor values | BF16 tensor bits match HF after expected BF16 conversion |
+ | Prompt token IDs | Match for all validation prompts |
+ | Next-token top-1 | Match on `4/4` prompts |
+ | Top-10 overlap | `10/10` for all prompts |
+ | Text validation | BF16 GGUF continuations are aligned with Transformers baseline |
+
+ Full-vocab mean absolute logit error:
+
+ | Prompt | MAE |
+ | --- | ---: |
+ | `The quick brown fox` | `0.0199148655` |
+ | `In a distant future, humanity` | `0.0051696529` |
+ | `Question: What is 2+2?\nAnswer:` | `0.0076530445` |
+ | `def fibonacci(n):` | `0.0045031775` |
+
+ The original model already repeats on some prompts. Repetition by itself is not treated as a conversion failure unless it is newly introduced by the GGUF runtime. The BF16 GGUF validation did not reproduce the unrelated garbage pattern seen in a previous broken conversion attempt.
+
+ ## Example Runtime Setup
+
+ Download this repository:
+
+ ```powershell
+ pip install -U huggingface_hub
+ hf download sinimiini/HRM-Text-1B-GGUF --local-dir HRM-Text-1B-GGUF
+ ```
+
+ Patch and build `llama.cpp`:
+
+ ```powershell
+ git clone https://github.com/ggml-org/llama.cpp
+ cd llama.cpp
+ git checkout 6a257d44633d4a752183ed778b88d2924d0a6b9d
+ git apply ..\HRM-Text-1B-GGUF\runtime\llama.cpp-hrm_text.patch
+ cmake -B build -S . -DGGML_NATIVE=OFF
+ cmake --build build --config Release --target llama-cli llama-completion llama-results
+ ```
+
+ Run a short causal-generation smoke test:
+
+ ```powershell
+ .\build\bin\Release\llama-cli.exe -m ..\HRM-Text-1B-GGUF\HRM-Text-1B-BF16.gguf -p "The quick brown fox" -n 32 --temp 0 --no-conversation
+ ```
+
+ Depending on the CMake generator and build configuration, the executable may be under `build\bin\llama-cli.exe` instead of `build\bin\Release\llama-cli.exe`.
+
+ ## Limitations
+
+ - `hrm_text` is a custom GGUF architecture in this conversion.
+ - Generic GGUF runners will not work until they implement the HRM runtime graph.
+ - Prefix-LM bidirectional attention with `token_type_ids` is not implemented in the patched `llama.cpp` path.
+ - Q8_0 and other quantized variants are intentionally not included in this repository.
+
+ ## License
+
+ The source model is released under the Apache 2.0 license. See [`LICENSE`](./LICENSE).
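The README's validation summary reports three runtime metrics per prompt: next-token top-1 agreement, top-10 overlap, and full-vocabulary mean absolute logit error. The sketch below shows one way such metrics could be computed from two logit vectors. It is not the author's validation script: the function names are hypothetical, and the thresholds (overlap of at least 8, MAE of at most 0.03) are inferred from the check names in `reports/validation/bf16_vs_hf.json`.

```python
def top_k_ids(logits, k):
    """Return the k token IDs with the largest logits, best first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def top_k_overlap(ref_logits, test_logits, k=10):
    """Count token IDs shared by the two top-k sets (order is ignored)."""
    return len(set(top_k_ids(ref_logits, k)) & set(top_k_ids(test_logits, k)))


def mean_abs_error(ref_logits, test_logits):
    """Mean absolute difference over the full vocabulary."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("logit vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(ref_logits, test_logits)) / len(ref_logits)


def prompt_passes(ref_logits, test_logits, min_overlap=8, max_mae=0.03):
    """Apply the three per-prompt gates: same top-1, top-10 overlap, MAE."""
    same_top1 = top_k_ids(ref_logits, 1) == top_k_ids(test_logits, 1)
    overlap_ok = top_k_overlap(ref_logits, test_logits, 10) >= min_overlap
    mae_ok = mean_abs_error(ref_logits, test_logits) <= max_mae
    return same_top1 and overlap_ok and mae_ok


# MAE values reported in the README table; all sit under the 0.03 gate.
REPORTED_MAE = [0.0199148655, 0.0051696529, 0.0076530445, 0.0045031775]
assert all(mae <= 0.03 for mae in REPORTED_MAE)
```

Here `ref_logits` stands for the Transformers output and `test_logits` for the GGUF runtime output at the same position. Both are plain Python lists indexed by token ID.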
reports/validation/baseline_transformers.json ADDED
@@ -0,0 +1,521 @@
+ {
+ "model_dir": "D:\\Projects\\Active\\HRM-Text\\hrm-text-gguf\\fresh\\models\\hf\\HRM-Text-1B",
+ "created_at_unix": 1779317536.288255,
+ "elapsed_seconds": 110.19880509376526,
+ "torch_dtype": "float32",
+ "prompts": [
+ {
+ "prompt": "The quick brown fox",
+ "input_ids": [
+ 341,
+ 3568,
+ 9054,
+ 20102
+ ],
+ "next_top20": [
+ {
+ "token_id": 20102,
+ "logit": 14.282323837280273
+ },
+ {
+ "token_id": 40036,
+ "logit": 7.984903335571289
+ },
+ {
+ "token_id": 11979,
+ "logit": 6.678519248962402
+ },
+ {
+ "token_id": 3568,
+ "logit": 6.593014717102051
+ },
+ {
+ "token_id": 4349,
+ "logit": 6.3277907371521
+ },
+ {
+ "token_id": 42,
+ "logit": 6.296438694000244
+ },
+ {
+ "token_id": 3152,
+ "logit": 5.423942565917969
+ },
+ {
+ "token_id": 2301,
+ "logit": 5.353857517242432
+ },
+ {
+ "token_id": 44,
+ "logit": 5.17809534072876
+ },
+ {
+ "token_id": 3558,
+ "logit": 4.873218059539795
+ },
+ {
+ "token_id": 3412,
+ "logit": 4.821749210357666
+ },
+ {
+ "token_id": 5002,
+ "logit": 4.811767578125
+ },
+ {
+ "token_id": 761,
+ "logit": 4.7139716148376465
+ },
+ {
+ "token_id": 19444,
+ "logit": 4.702877521514893
+ },
+ {
+ "token_id": 446,
+ "logit": 4.699982166290283
+ },
+ {
+ "token_id": 16001,
+ "logit": 4.579901218414307
+ },
+ {
+ "token_id": 322,
+ "logit": 4.416025638580322
+ },
+ {
+ "token_id": 3297,
+ "logit": 4.394478797912598
+ },
+ {
+ "token_id": 8636,
+ "logit": 4.318077564239502
+ },
+ {
+ "token_id": 12672,
+ "logit": 4.272546768188477
+ }
+ ],
+ "generated_ids": [
+ 341,
+ 3568,
+ 9054,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102,
+ 20102
+ ],
+ "generated_text": "The quick brown fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox"
+ },
+ {
+ "prompt": "In a distant future, humanity",
+ "input_ids": [
+ 931,
+ 236,
+ 95,
+ 9494,
+ 2025,
+ 42,
+ 13279
+ ],
+ "next_top20": [
+ {
+ "token_id": 385,
+ "logit": 8.0965576171875
+ },
+ {
+ "token_id": 42,
+ "logit": 7.368254661560059
+ },
+ {
+ "token_id": 322,
+ "logit": 7.224921226501465
+ },
+ {
+ "token_id": 5106,
+ "logit": 6.963152885437012
+ },
+ {
+ "token_id": 5835,
+ "logit": 6.56166934967041
+ },
+ {
+ "token_id": 769,
+ "logit": 6.253977298736572
+ },
+ {
+ "token_id": 599,
+ "logit": 6.24298620223999
+ },
+ {
+ "token_id": 1980,
+ "logit": 6.066655158996582
+ },
+ {
+ "token_id": 295,
+ "logit": 6.027203559875488
+ },
+ {
+ "token_id": 2156,
+ "logit": 5.429737091064453
+ },
+ {
+ "token_id": 379,
+ "logit": 5.259855270385742
+ },
+ {
+ "token_id": 5226,
+ "logit": 5.044851303100586
+ },
+ {
+ "token_id": 10755,
+ "logit": 4.90900993347168
+ },
+ {
+ "token_id": 446,
+ "logit": 4.904206275939941
+ },
+ {
+ "token_id": 236,
+ "logit": 4.781126976013184
+ },
+ {
+ "token_id": 2026,
+ "logit": 4.646615982055664
+ },
+ {
+ "token_id": 307,
+ "logit": 4.629412651062012
+ },
+ {
+ "token_id": 1287,
+ "logit": 4.488158702850342
+ },
+ {
+ "token_id": 3891,
+ "logit": 4.160832405090332
+ },
+ {
+ "token_id": 5671,
+ "logit": 4.064340114593506
+ }
+ ],
+ "generated_ids": [
+ 931,
+ 236,
+ 95,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494,
+ 2025,
+ 42,
+ 13279,
+ 385,
+ 9494
+ ],
+ "generated_text": "In a distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant"
+ },
+ {
+ "prompt": "Question: What is 2+2?\nAnswer:",
+ "input_ids": [
+ 1069,
+ 56,
+ 866,
+ 322,
+ 236,
+ 48,
+ 41,
+ 48,
+ 1047,
+ 1180,
+ 56
+ ],
+ "next_top20": [
+ {
+ "token_id": 236,
+ "logit": 9.961008071899414
+ },
+ {
+ "token_id": 391,
+ "logit": 6.013479709625244
+ },
+ {
+ "token_id": 315,
+ "logit": 5.94870138168335
+ },
+ {
+ "token_id": 866,
+ "logit": 5.673839569091797
+ },
+ {
+ "token_id": 395,
+ "logit": 5.604125499725342
+ },
+ {
+ "token_id": 5444,
+ "logit": 5.225252151489258
+ },
+ {
+ "token_id": 4746,
+ "logit": 4.788036346435547
+ },
+ {
+ "token_id": 1827,
+ "logit": 4.560125827789307
+ },
+ {
+ "token_id": 369,
+ "logit": 3.9722537994384766
+ },
+ {
+ "token_id": 970,
+ "logit": 3.7374000549316406
+ },
+ {
+ "token_id": 725,
+ "logit": 3.7227249145507812
+ },
+ {
+ "token_id": 421,
+ "logit": 3.6337995529174805
+ },
+ {
+ "token_id": 731,
+ "logit": 3.230243444442749
+ },
+ {
+ "token_id": 401,
+ "logit": 3.176685094833374
+ },
+ {
+ "token_id": 2286,
+ "logit": 3.169259786605835
+ },
+ {
+ "token_id": 1422,
+ "logit": 3.161581516265869
+ },
+ {
+ "token_id": 10528,
+ "logit": 3.150108575820923
+ },
+ {
+ "token_id": 1544,
+ "logit": 3.148630380630493
+ },
+ {
+ "token_id": 973,
+ "logit": 3.134061336517334
+ },
+ {
+ "token_id": 4286,
+ "logit": 3.118112325668335
+ }
+ ],
+ "generated_ids": [
+ 1069,
+ 56,
+ 866,
+ 322,
+ 236,
+ 48,
+ 41,
+ 48,
+ 1047,
+ 1180,
+ 56,
+ 236,
+ 52,
+ 11
+ ],
+ "generated_text": "Question: What is 2+2?\nAnswer: 6<|box_end|>"
+ },
+ {
+ "prompt": "def fibonacci(n):",
+ "input_ids": [
+ 26763,
+ 8430,
+ 21728,
+ 3145,
+ 10018
+ ],
+ "next_top20": [
+ {
+ "token_id": 56,
+ "logit": 10.336507797241211
+ },
+ {
+ "token_id": 2904,
+ "logit": 7.036001205444336
+ },
+ {
+ "token_id": 11,
+ "logit": 6.869954586029053
+ },
+ {
+ "token_id": 10018,
+ "logit": 6.520134449005127
+ },
+ {
+ "token_id": 108,
+ "logit": 6.356081008911133
+ },
+ {
+ "token_id": 318,
+ "logit": 6.157332420349121
+ },
+ {
+ "token_id": 236,
+ "logit": 6.079721927642822
+ },
+ {
+ "token_id": 38,
+ "logit": 5.885505676269531
+ },
+ {
+ "token_id": 68,
+ "logit": 5.052194118499756
+ },
+ {
+ "token_id": 39,
+ "logit": 4.714491844177246
+ },
+ {
+ "token_id": 76,
+ "logit": 4.675869464874268
+ },
+ {
+ "token_id": 57,
+ "logit": 4.6666717529296875
+ },
+ {
+ "token_id": 59,
+ "logit": 4.655692100524902
+ },
+ {
+ "token_id": 1941,
+ "logit": 4.59184455871582
+ },
+ {
+ "token_id": 401,
+ "logit": 4.578482151031494
+ },
+ {
+ "token_id": 395,
+ "logit": 4.54650354385376
+ },
+ {
+ "token_id": 8430,
+ "logit": 4.537847518920898
+ },
+ {
+ "token_id": 95,
+ "logit": 4.34259033203125
+ },
+ {
+ "token_id": 364,
+ "logit": 4.28465461730957
+ },
+ {
+ "token_id": 3145,
+ "logit": 4.255484580993652
+ }
+ ],
+ "generated_ids": [
+ 26763,
+ 8430,
+ 21728,
+ 3145,
+ 10018,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56,
+ 56
+ ],
+ "generated_text": "def fibonacci(n):::::::::::::::::::::::::::::::::"
+ }
+ ]
+ }
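Several of the baseline continuations above end in a repeating cycle, which the README treats as a property of the original model rather than a conversion fault. A small sketch of how such loops could be flagged automatically when comparing runs follows. The helper `tail_period` and its `min_repeats` default are hypothetical and not part of the validation tooling in this commit.

```python
def tail_period(ids, min_repeats=3):
    """Return the shortest cycle length p such that the last p * min_repeats
    tokens repeat with period p, or None if the tail does not loop."""
    n = len(ids)
    for p in range(1, n // min_repeats + 1):
        tail = ids[n - p * min_repeats:]
        # Every token in the tail must equal the token one period earlier,
        # which is the same as matching the first p tokens of the tail.
        if all(tail[i] == tail[i % p] for i in range(len(tail))):
            return p
    return None
```

Applied to the `generated_ids` lists above, this returns 1 for the first prompt (a single repeated token), 5 for the second (a five-token cycle), and `None` for the third, which stops after a short answer.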
reports/validation/bf16_tensor_validation.json ADDED
@@ -0,0 +1,18 @@
+ {
+ "gguf": "fresh\\out\\gguf\\HRM-Text-1B-BF16.gguf",
+ "hf_dir": "fresh\\models\\hf\\HRM-Text-1B",
+ "metadata": {
+ "general.architecture": "hrm_text",
+ "hrm_text.block_count": 128,
+ "hrm_text.layers_per_stack": 16,
+ "hrm_text.h_cycles": 2,
+ "hrm_text.l_cycles": 3,
+ "hrm_text.prefix_lm": true,
+ "hrm_text.embedding_scale": 39.191837310791016
+ },
+ "expected_tensor_count": 259,
+ "actual_tensor_count": 259,
+ "compared_tensor_count": 259,
+ "passed": true,
+ "failures": []
+ }
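This report records only tensor counts and a pass/fail flag, not how tensors were compared. One way a bit-level BF16 comparison could work is sketched below, assuming float32 source values and round-to-nearest-even conversion. Neither assumption is confirmed by the report, and the names `f32_to_bf16_bits` and `bf16_bits_match` are hypothetical.

```python
import struct


def f32_to_bf16_bits(value):
    """Convert a float to its 16-bit BF16 pattern using round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN: set a mantissa bit that survives truncation.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_match(source_values, stored_bits):
    """True if every stored BF16 pattern equals the converted source value."""
    if len(source_values) != len(stored_bits):
        return False
    return all(f32_to_bf16_bits(v) == b for v, b in zip(source_values, stored_bits))
```

Comparing bit patterns rather than decoded floats avoids tolerance choices: a tensor either round-trips exactly or it does not.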
reports/validation/bf16_vs_hf.json ADDED
@@ -0,0 +1,782 @@
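+ {
+ "hf_dir": "fresh\\models\\hf\\HRM-Text-1B",
+ "gguf": "fresh\\out\\gguf\\HRM-Text-1B-BF16.gguf",
+ "created_at_unix": 1779321040.6954496,
+ "elapsed_seconds": 128.6622235774994,
+ "hf_dtype": "float32",
+ "checks": {
+ "prompt_tokens_match": true,
+ "same_next_top1_all_prompts": true,
+ "top10_overlap_at_least_8_all_prompts": true,
+ "full_vocab_mean_abs_logit_error_lte_0_03_all_prompts": true,
+ "passed": true
+ },
+ "prompts": [
+ {
+ "prompt": "The quick brown fox",
+ "hf_input_ids": [
+ 341,
+ 3568,
+ 9054,
+ 20102
+ ],
+ "gguf_input_ids": [
+ 341,
+ 3568,
+ 9054,
+ 20102
+ ],
+ "tokens_match": true,
+ "hf_next_top20": [
+ {
+ "token_id": 20102,
+ "logit": 14.282323837280273
+ },
+ {
+ "token_id": 40036,
+ "logit": 7.984903335571289
+ },
+ {
+ "token_id": 11979,
+ "logit": 6.678519248962402
+ },
+ {
+ "token_id": 3568,
+ "logit": 6.593014717102051
+ },
+ {
+ "token_id": 4349,
+ "logit": 6.3277907371521
+ },
+ {
+ "token_id": 42,
+ "logit": 6.296438694000244
+ },
+ {
+ "token_id": 3152,
+ "logit": 5.423942565917969
+ },
+ {
+ "token_id": 2301,
+ "logit": 5.353857517242432
+ },
+ {
+ "token_id": 44,
+ "logit": 5.17809534072876
+ },
+ {
+ "token_id": 3558,
+ "logit": 4.873218059539795
+ },
+ {
+ "token_id": 3412,
+ "logit": 4.821749210357666
+ },
+ {
+ "token_id": 5002,
+ "logit": 4.811767578125
+ },
+ {
+ "token_id": 761,
+ "logit": 4.7139716148376465
+ },
+ {
+ "token_id": 19444,
+ "logit": 4.702877521514893
+ },
+ {
+ "token_id": 446,
+ "logit": 4.699982166290283
+ },
+ {
+ "token_id": 16001,
+ "logit": 4.579901218414307
+ },
+ {
+ "token_id": 322,
+ "logit": 4.416025638580322
+ },
+ {
+ "token_id": 3297,
+ "logit": 4.394478797912598
+ },
+ {
+ "token_id": 8636,
+ "logit": 4.318077564239502
+ },
+ {
+ "token_id": 12672,
+ "logit": 4.272546768188477
+ }
+ ],
+ "gguf_next_top20": [
+ {
+ "token_id": 20102,
+ "logit": 14.297157287597656
+ },
+ {
+ "token_id": 40036,
+ "logit": 7.994879722595215
+ },
+ {
+ "token_id": 11979,
+ "logit": 6.699657440185547
+ },
+ {
+ "token_id": 3568,
+ "logit": 6.626206398010254
+ },
+ {
+ "token_id": 4349,
+ "logit": 6.35108757019043
+ },
+ {
+ "token_id": 42,
+ "logit": 6.271071434020996
+ },
+ {
+ "token_id": 3152,
+ "logit": 5.424627304077148
+ },
+ {
+ "token_id": 2301,
+ "logit": 5.3609843254089355
+ },
+ {
+ "token_id": 44,
+ "logit": 5.178180694580078
+ },
+ {
+ "token_id": 3558,
+ "logit": 4.875790596008301
+ },
+ {
+ "token_id": 5002,
+ "logit": 4.839702606201172
+ },
+ {
+ "token_id": 3412,
+ "logit": 4.824095249176025
+ },
+ {
+ "token_id": 761,
+ "logit": 4.718862056732178
+ },
+ {
+ "token_id": 19444,
+ "logit": 4.698058128356934
+ },
+ {
+ "token_id": 446,
+ "logit": 4.689098358154297
+ },
+ {
+ "token_id": 16001,
+ "logit": 4.590203762054443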
+ "logit": 4.699982166290283
90
+ },
91
+ {
92
+ "token_id": 16001,
93
+ "logit": 4.579901218414307
94
+ },
95
+ {
96
+ "token_id": 322,
97
+ "logit": 4.416025638580322
98
+ },
99
+ {
100
+ "token_id": 3297,
101
+ "logit": 4.394478797912598
102
+ },
103
+ {
104
+ "token_id": 8636,
105
+ "logit": 4.318077564239502
106
+ },
107
+ {
108
+ "token_id": 12672,
109
+ "logit": 4.272546768188477
110
+ }
111
+ ],
112
+ "gguf_next_top20": [
113
+ {
114
+ "token_id": 20102,
115
+ "logit": 14.297157287597656
116
+ },
117
+ {
118
+ "token_id": 40036,
119
+ "logit": 7.994879722595215
120
+ },
121
+ {
122
+ "token_id": 11979,
123
+ "logit": 6.699657440185547
124
+ },
125
+ {
126
+ "token_id": 3568,
127
+ "logit": 6.626206398010254
128
+ },
129
+ {
130
+ "token_id": 4349,
131
+ "logit": 6.35108757019043
132
+ },
133
+ {
134
+ "token_id": 42,
135
+ "logit": 6.271071434020996
136
+ },
137
+ {
138
+ "token_id": 3152,
139
+ "logit": 5.424627304077148
140
+ },
141
+ {
142
+ "token_id": 2301,
143
+ "logit": 5.3609843254089355
144
+ },
145
+ {
146
+ "token_id": 44,
147
+ "logit": 5.178180694580078
148
+ },
149
+ {
150
+ "token_id": 3558,
151
+ "logit": 4.875790596008301
152
+ },
153
+ {
154
+ "token_id": 5002,
155
+ "logit": 4.839702606201172
156
+ },
157
+ {
158
+ "token_id": 3412,
159
+ "logit": 4.824095249176025
160
+ },
161
+ {
162
+ "token_id": 761,
163
+ "logit": 4.718862056732178
164
+ },
165
+ {
166
+ "token_id": 19444,
167
+ "logit": 4.698058128356934
168
+ },
169
+ {
170
+ "token_id": 446,
171
+ "logit": 4.689098358154297
172
+ },
173
+ {
174
+ "token_id": 16001,
175
+ "logit": 4.590203762054443
176
+ },
177
+ {
178
+ "token_id": 3297,
179
+ "logit": 4.408023357391357
180
+ },
181
+ {
182
+ "token_id": 322,
183
+ "logit": 4.394444942474365
184
+ },
185
+ {
186
+ "token_id": 8636,
187
+ "logit": 4.318354606628418
188
+ },
189
+ {
190
+ "token_id": 12672,
191
+ "logit": 4.2834038734436035
192
+ }
193
+ ],
194
+ "top1_match": true,
195
+ "top10_overlap": 10,
196
+ "full_vocab_mean_abs_logit_error": 0.019914865493774414,
197
+ "top20_union_mean_abs_logit_error": 0.012285495176911354,
198
+ "hf_generated_suffix": " fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox",
199
+ "gguf_generated_suffix": " fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox fox\n\n"
200
+ },
201
+ {
202
+ "prompt": "In a distant future, humanity",
203
+ "hf_input_ids": [
204
+ 931,
205
+ 236,
206
+ 95,
207
+ 9494,
208
+ 2025,
209
+ 42,
210
+ 13279
211
+ ],
212
+ "gguf_input_ids": [
213
+ 931,
214
+ 236,
215
+ 95,
216
+ 9494,
217
+ 2025,
218
+ 42,
219
+ 13279
220
+ ],
221
+ "tokens_match": true,
222
+ "hf_next_top20": [
223
+ {
224
+ "token_id": 385,
225
+ "logit": 8.0965576171875
226
+ },
227
+ {
228
+ "token_id": 42,
229
+ "logit": 7.368254661560059
230
+ },
231
+ {
232
+ "token_id": 322,
233
+ "logit": 7.224921226501465
234
+ },
235
+ {
236
+ "token_id": 5106,
237
+ "logit": 6.963152885437012
238
+ },
239
+ {
240
+ "token_id": 5835,
241
+ "logit": 6.56166934967041
242
+ },
243
+ {
244
+ "token_id": 769,
245
+ "logit": 6.253977298736572
246
+ },
247
+ {
248
+ "token_id": 599,
249
+ "logit": 6.24298620223999
250
+ },
251
+ {
252
+ "token_id": 1980,
253
+ "logit": 6.066655158996582
254
+ },
255
+ {
256
+ "token_id": 295,
257
+ "logit": 6.027203559875488
258
+ },
259
+ {
260
+ "token_id": 2156,
261
+ "logit": 5.429737091064453
262
+ },
263
+ {
264
+ "token_id": 379,
265
+ "logit": 5.259855270385742
266
+ },
267
+ {
268
+ "token_id": 5226,
269
+ "logit": 5.044851303100586
270
+ },
271
+ {
272
+ "token_id": 10755,
273
+ "logit": 4.90900993347168
274
+ },
275
+ {
276
+ "token_id": 446,
277
+ "logit": 4.904206275939941
278
+ },
279
+ {
280
+ "token_id": 236,
281
+ "logit": 4.781126976013184
282
+ },
283
+ {
284
+ "token_id": 2026,
285
+ "logit": 4.646615982055664
286
+ },
287
+ {
288
+ "token_id": 307,
289
+ "logit": 4.629412651062012
290
+ },
291
+ {
292
+ "token_id": 1287,
293
+ "logit": 4.488158702850342
294
+ },
295
+ {
296
+ "token_id": 3891,
297
+ "logit": 4.160832405090332
298
+ },
299
+ {
300
+ "token_id": 5671,
301
+ "logit": 4.064340114593506
302
+ }
303
+ ],
304
+ "gguf_next_top20": [
305
+ {
306
+ "token_id": 385,
307
+ "logit": 8.094616889953613
308
+ },
309
+ {
310
+ "token_id": 42,
311
+ "logit": 7.381193161010742
312
+ },
313
+ {
314
+ "token_id": 322,
315
+ "logit": 7.22892951965332
316
+ },
317
+ {
318
+ "token_id": 5106,
319
+ "logit": 6.968372344970703
320
+ },
321
+ {
322
+ "token_id": 5835,
323
+ "logit": 6.5758562088012695
324
+ },
325
+ {
326
+ "token_id": 769,
327
+ "logit": 6.294882297515869
328
+ },
329
+ {
330
+ "token_id": 599,
331
+ "logit": 6.259007453918457
332
+ },
333
+ {
334
+ "token_id": 1980,
335
+ "logit": 6.067761421203613
336
+ },
337
+ {
338
+ "token_id": 295,
339
+ "logit": 6.029950141906738
340
+ },
341
+ {
342
+ "token_id": 2156,
343
+ "logit": 5.428796768188477
344
+ },
345
+ {
346
+ "token_id": 379,
347
+ "logit": 5.263803005218506
348
+ },
349
+ {
350
+ "token_id": 5226,
351
+ "logit": 5.066612243652344
352
+ },
353
+ {
354
+ "token_id": 446,
355
+ "logit": 4.910100936889648
356
+ },
357
+ {
358
+ "token_id": 10755,
359
+ "logit": 4.904460430145264
360
+ },
361
+ {
362
+ "token_id": 236,
363
+ "logit": 4.739671230316162
364
+ },
365
+ {
366
+ "token_id": 2026,
367
+ "logit": 4.661045074462891
368
+ },
369
+ {
370
+ "token_id": 307,
371
+ "logit": 4.627893447875977
372
+ },
373
+ {
374
+ "token_id": 1287,
375
+ "logit": 4.507448196411133
376
+ },
377
+ {
378
+ "token_id": 3891,
379
+ "logit": 4.162245273590088
380
+ },
381
+ {
382
+ "token_id": 5671,
383
+ "logit": 4.059610366821289
384
+ }
385
+ ],
386
+ "top1_match": true,
387
+ "top10_overlap": 10,
388
+ "full_vocab_mean_abs_logit_error": 0.005169652868062258,
389
+ "top20_union_mean_abs_logit_error": 0.010950112715363503,
390
+ "hf_generated_suffix": "'s distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant",
391
+ "gguf_generated_suffix": "'s distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant future, humanity's distant\n\n"
392
+ },
393
+ {
394
+ "prompt": "Question: What is 2+2?\nAnswer:",
395
+ "hf_input_ids": [
396
+ 1069,
397
+ 56,
398
+ 866,
399
+ 322,
400
+ 236,
401
+ 48,
402
+ 41,
403
+ 48,
404
+ 1047,
405
+ 1180,
406
+ 56
407
+ ],
408
+ "gguf_input_ids": [
409
+ 1069,
410
+ 56,
411
+ 866,
412
+ 322,
413
+ 236,
414
+ 48,
415
+ 41,
416
+ 48,
417
+ 1047,
418
+ 1180,
419
+ 56
420
+ ],
421
+ "tokens_match": true,
422
+ "hf_next_top20": [
423
+ {
424
+ "token_id": 236,
425
+ "logit": 9.961008071899414
426
+ },
427
+ {
428
+ "token_id": 391,
429
+ "logit": 6.013479709625244
430
+ },
431
+ {
432
+ "token_id": 315,
433
+ "logit": 5.94870138168335
434
+ },
435
+ {
436
+ "token_id": 866,
437
+ "logit": 5.673839569091797
438
+ },
439
+ {
440
+ "token_id": 395,
441
+ "logit": 5.604125499725342
442
+ },
443
+ {
444
+ "token_id": 5444,
445
+ "logit": 5.225252151489258
446
+ },
447
+ {
448
+ "token_id": 4746,
449
+ "logit": 4.788036346435547
450
+ },
451
+ {
452
+ "token_id": 1827,
453
+ "logit": 4.560125827789307
454
+ },
455
+ {
456
+ "token_id": 369,
457
+ "logit": 3.9722537994384766
458
+ },
459
+ {
460
+ "token_id": 970,
461
+ "logit": 3.7374000549316406
462
+ },
463
+ {
464
+ "token_id": 725,
465
+ "logit": 3.7227249145507812
466
+ },
467
+ {
468
+ "token_id": 421,
469
+ "logit": 3.6337995529174805
470
+ },
471
+ {
472
+ "token_id": 731,
473
+ "logit": 3.230243444442749
474
+ },
475
+ {
476
+ "token_id": 401,
477
+ "logit": 3.176685094833374
478
+ },
479
+ {
480
+ "token_id": 2286,
481
+ "logit": 3.169259786605835
482
+ },
483
+ {
484
+ "token_id": 1422,
485
+ "logit": 3.161581516265869
486
+ },
487
+ {
488
+ "token_id": 10528,
489
+ "logit": 3.150108575820923
490
+ },
491
+ {
492
+ "token_id": 1544,
493
+ "logit": 3.148630380630493
494
+ },
495
+ {
496
+ "token_id": 973,
497
+ "logit": 3.134061336517334
498
+ },
499
+ {
500
+ "token_id": 4286,
501
+ "logit": 3.118112325668335
502
+ }
503
+ ],
504
+ "gguf_next_top20": [
505
+ {
506
+ "token_id": 236,
507
+ "logit": 9.97547435760498
508
+ },
509
+ {
510
+ "token_id": 391,
511
+ "logit": 6.012524604797363
512
+ },
513
+ {
514
+ "token_id": 315,
515
+ "logit": 5.968222618103027
516
+ },
517
+ {
518
+ "token_id": 866,
519
+ "logit": 5.672043323516846
520
+ },
521
+ {
522
+ "token_id": 395,
523
+ "logit": 5.606063365936279
524
+ },
525
+ {
526
+ "token_id": 5444,
527
+ "logit": 5.2098188400268555
528
+ },
529
+ {
530
+ "token_id": 4746,
531
+ "logit": 4.777042388916016
532
+ },
533
+ {
534
+ "token_id": 1827,
535
+ "logit": 4.54682731628418
536
+ },
537
+ {
538
+ "token_id": 369,
539
+ "logit": 3.9637503623962402
540
+ },
541
+ {
542
+ "token_id": 970,
543
+ "logit": 3.765744686126709
544
+ },
545
+ {
546
+ "token_id": 725,
547
+ "logit": 3.7245254516601562
548
+ },
549
+ {
550
+ "token_id": 421,
551
+ "logit": 3.630314350128174
552
+ },
553
+ {
554
+ "token_id": 731,
555
+ "logit": 3.2161593437194824
556
+ },
557
+ {
558
+ "token_id": 401,
559
+ "logit": 3.1881794929504395
560
+ },
561
+ {
562
+ "token_id": 2286,
563
+ "logit": 3.168586254119873
564
+ },
565
+ {
566
+ "token_id": 1422,
567
+ "logit": 3.151968240737915
568
+ },
569
+ {
570
+ "token_id": 10528,
571
+ "logit": 3.148939847946167
572
+ },
573
+ {
574
+ "token_id": 1544,
575
+ "logit": 3.1278254985809326
576
+ },
577
+ {
578
+ "token_id": 973,
579
+ "logit": 3.1256284713745117
580
+ },
581
+ {
582
+ "token_id": 4286,
583
+ "logit": 3.0892391204833984
584
+ }
585
+ ],
586
+ "top1_match": true,
587
+ "top10_overlap": 10,
588
+ "full_vocab_mean_abs_logit_error": 0.0076530445367097855,
589
+ "top20_union_mean_abs_logit_error": 0.01078406535089016,
590
+ "hf_generated_suffix": " 6<|box_end|>",
591
+ "gguf_generated_suffix": " 6 [end of text]\n\n\n"
592
+ },
593
+ {
594
+ "prompt": "def fibonacci(n):",
595
+ "hf_input_ids": [
596
+ 26763,
597
+ 8430,
598
+ 21728,
599
+ 3145,
600
+ 10018
601
+ ],
602
+ "gguf_input_ids": [
603
+ 26763,
604
+ 8430,
605
+ 21728,
606
+ 3145,
607
+ 10018
608
+ ],
609
+ "tokens_match": true,
610
+ "hf_next_top20": [
611
+ {
612
+ "token_id": 56,
613
+ "logit": 10.336507797241211
614
+ },
615
+ {
616
+ "token_id": 2904,
617
+ "logit": 7.036001205444336
618
+ },
619
+ {
620
+ "token_id": 11,
621
+ "logit": 6.869954586029053
622
+ },
623
+ {
624
+ "token_id": 10018,
625
+ "logit": 6.520134449005127
626
+ },
627
+ {
628
+ "token_id": 108,
629
+ "logit": 6.356081008911133
630
+ },
631
+ {
632
+ "token_id": 318,
633
+ "logit": 6.157332420349121
634
+ },
635
+ {
636
+ "token_id": 236,
637
+ "logit": 6.079721927642822
638
+ },
639
+ {
640
+ "token_id": 38,
641
+ "logit": 5.885505676269531
642
+ },
643
+ {
644
+ "token_id": 68,
645
+ "logit": 5.052194118499756
646
+ },
647
+ {
648
+ "token_id": 39,
649
+ "logit": 4.714491844177246
650
+ },
651
+ {
652
+ "token_id": 76,
653
+ "logit": 4.675869464874268
654
+ },
655
+ {
656
+ "token_id": 57,
657
+ "logit": 4.6666717529296875
658
+ },
659
+ {
660
+ "token_id": 59,
661
+ "logit": 4.655692100524902
662
+ },
663
+ {
664
+ "token_id": 1941,
665
+ "logit": 4.59184455871582
666
+ },
667
+ {
668
+ "token_id": 401,
669
+ "logit": 4.578482151031494
670
+ },
671
+ {
672
+ "token_id": 395,
673
+ "logit": 4.54650354385376
674
+ },
675
+ {
676
+ "token_id": 8430,
677
+ "logit": 4.537847518920898
678
+ },
679
+ {
680
+ "token_id": 95,
681
+ "logit": 4.34259033203125
682
+ },
683
+ {
684
+ "token_id": 364,
685
+ "logit": 4.28465461730957
686
+ },
687
+ {
688
+ "token_id": 3145,
689
+ "logit": 4.255484580993652
690
+ }
691
+ ],
692
+ "gguf_next_top20": [
693
+ {
694
+ "token_id": 56,
695
+ "logit": 10.327644348144531
696
+ },
697
+ {
698
+ "token_id": 2904,
699
+ "logit": 7.035848140716553
700
+ },
701
+ {
702
+ "token_id": 11,
703
+ "logit": 6.860950946807861
704
+ },
705
+ {
706
+ "token_id": 10018,
707
+ "logit": 6.495933532714844
708
+ },
709
+ {
710
+ "token_id": 108,
711
+ "logit": 6.341927528381348
712
+ },
713
+ {
714
+ "token_id": 318,
715
+ "logit": 6.146999359130859
716
+ },
717
+ {
718
+ "token_id": 236,
719
+ "logit": 6.0702805519104
720
+ },
721
+ {
722
+ "token_id": 38,
723
+ "logit": 5.865842819213867
724
+ },
725
+ {
726
+ "token_id": 68,
727
+ "logit": 5.02256965637207
728
+ },
729
+ {
730
+ "token_id": 39,
731
+ "logit": 4.66829252243042
732
+ },
733
+ {
734
+ "token_id": 76,
735
+ "logit": 4.6676859855651855
736
+ },
737
+ {
738
+ "token_id": 59,
739
+ "logit": 4.662688255310059
740
+ },
741
+ {
742
+ "token_id": 57,
743
+ "logit": 4.66217041015625
744
+ },
745
+ {
746
+ "token_id": 1941,
747
+ "logit": 4.598584175109863
748
+ },
749
+ {
750
+ "token_id": 401,
751
+ "logit": 4.574923515319824
752
+ },
753
+ {
754
+ "token_id": 395,
755
+ "logit": 4.532867431640625
756
+ },
757
+ {
758
+ "token_id": 8430,
759
+ "logit": 4.5034050941467285
760
+ },
761
+ {
762
+ "token_id": 95,
763
+ "logit": 4.318176746368408
764
+ },
765
+ {
766
+ "token_id": 364,
767
+ "logit": 4.2872209548950195
768
+ },
769
+ {
770
+ "token_id": 48,
771
+ "logit": 4.243288516998291
772
+ }
773
+ ],
774
+ "top1_match": true,
775
+ "top10_overlap": 10,
776
+ "full_vocab_mean_abs_logit_error": 0.0045031774789094925,
777
+ "top20_union_mean_abs_logit_error": 0.01457904651761055,
778
+ "hf_generated_suffix": "::::::::::::::::::::::::::::::::",
779
+ "gguf_generated_suffix": "::::::::::::::::::::::::::::::::\n\n"
780
+ }
781
+ ]
782
+ }
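The pass/fail flags in the `checks` object can be re-derived from the per-prompt data. The following is a minimal Python sketch, not the validation tool's actual code. `recompute_checks` and `recompute_report` are hypothetical helper names. The field names are taken from the JSON above, the overlap threshold of 8 is read off the key `top10_overlap_at_least_8_all_prompts`, and each top-20 list is assumed to be sorted by descending logit.

```python
def recompute_checks(prompt_entry, k=10, min_overlap=8):
    """Re-derive rank-based checks for one prompt entry of the report.

    Assumes the stored top-20 lists are ordered by descending logit,
    so the first k items of each list are that backend's top-k tokens.
    """
    hf_ids = [t["token_id"] for t in prompt_entry["hf_next_top20"]]
    gguf_ids = [t["token_id"] for t in prompt_entry["gguf_next_top20"]]
    # Overlap ignores order inside the top-k: only set membership counts.
    overlap = len(set(hf_ids[:k]) & set(gguf_ids[:k]))
    return {
        "tokens_match": prompt_entry["hf_input_ids"] == prompt_entry["gguf_input_ids"],
        "top1_match": hf_ids[0] == gguf_ids[0],
        "top10_overlap": overlap,
        "overlap_ok": overlap >= min_overlap,
    }


def recompute_report(report):
    """Return True only if every prompt passes all rank-based checks."""
    results = [recompute_checks(p) for p in report["prompts"]]
    return all(
        r["tokens_match"] and r["top1_match"] and r["overlap_ok"]
        for r in results
    )
```

To use it, load a report of this shape with `json.load` and pass the resulting dict to `recompute_report`. The sketch does not cover the full-vocabulary logit error, which cannot be recomputed from the top-20 lists alone.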
reports/validation/final_report.md ADDED
@@ -0,0 +1,48 @@
+ # HRM-Text-1B BF16 GGUF Validation Report
+
+ Date: 2026-05-21
+
+ ## Sources
+
+ - HF model: `sapientinc/HRM-Text-1B`
+ - HF snapshot SHA: `2285b999f6fb8a5b16e0cc313a9e8e4fe447140d`
+ - HF `model.safetensors` SHA256: `F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584`
+ - llama.cpp base commit: `6a257d44633d4a752183ed778b88d2924d0a6b9d`
+
+ ## Artifacts
+
+ - GGUF: `fresh\out\gguf\HRM-Text-1B-BF16.gguf`
+ - GGUF SHA256: `2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010`
+ - Baseline report: `fresh\reports\validation\baseline_transformers.json`
+ - Tensor validation: `fresh\reports\validation\bf16_tensor_validation.json`
+ - Runtime validation: `fresh\reports\validation\bf16_vs_hf.json`
+
+ Q8_0 quantization was intentionally skipped.
+
+ ## Commands
+
+ ```powershell
+ python fresh\tools\baseline_transformers.py --model-dir fresh\models\hf\HRM-Text-1B --out fresh\reports\validation\baseline_transformers.json --hf-modules-cache fresh\cache\hf_modules --threads 4
+
+ python fresh\third_party\llama.cpp\convert_hf_to_gguf.py fresh\models\hf\HRM-Text-1B --outfile fresh\out\gguf\HRM-Text-1B-BF16.gguf --outtype bf16 --model-name HRM-Text-1B
+
+ cmd.exe /c "call ""C:\Program Files\Microsoft Visual Studio\18\Community\Common7\Tools\VsDevCmd.bat"" -arch=x64 -host_arch=x64 && cmake --build build-hrm-nmake --target llama-cli llama-completion llama-results"
+
+ python fresh\tools\validate_gguf_tensors.py --hf-dir fresh\models\hf\HRM-Text-1B --gguf fresh\out\gguf\HRM-Text-1B-BF16.gguf --out fresh\reports\validation\bf16_tensor_validation.json
+
+ python fresh\tools\validate_bf16_runtime.py --hf-dir fresh\models\hf\HRM-Text-1B --gguf fresh\out\gguf\HRM-Text-1B-BF16.gguf --llama-results fresh\third_party\llama.cpp\build-hrm-nmake\bin\llama-results.exe --llama-completion fresh\third_party\llama.cpp\build-hrm-nmake\bin\llama-completion.exe --out fresh\reports\validation\bf16_vs_hf.json --work-dir fresh\reports\validation\runtime_tmp --hf-modules-cache fresh\cache\hf_modules --threads 4 --n-generate 32 --hf-dtype float32
+ ```
+
+ ## Results
+
+ - Build: pass.
+ - Tensor validation: pass. 259/259 tensors found and compared; BF16 tensor bits match HF after expected BF16 conversion.
+ - Runtime validation: pass. Prompt token IDs match for all prompts. Next-token top-1 matches 4/4 prompts. Top-10 overlap is 10/10 for all prompts.
+ - Full-vocab mean absolute logit error:
+   - `The quick brown fox`: `0.0199148655`
+   - `In a distant future, humanity`: `0.0051696529`
+   - `Question: What is 2+2?\nAnswer:`: `0.0076530445`
+   - `def fibonacci(n):`: `0.0045031775`
+ - Text validation: pass. The BF16 GGUF continuations match the Transformers baseline apart from trailing newlines and end-of-text rendering. The repetition in some continuations also appears in the Transformers output, so it is inherited from the source model, not introduced by conversion.
+
+ The runtime comparison uses HF weights loaded as Float32 from the BF16 checkpoint, matching llama.cpp CPU behavior: BF16-stored weights with Float32 compute/accumulation.
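The full-vocabulary metric in the report is a plain mean of absolute differences between the two backends' next-token logits. The sketch below illustrates that calculation and the pass gate under stated assumptions; it is not the code of `validate_bf16_runtime.py`. `mean_abs_logit_error` and `all_within` are hypothetical names, the 0.03 threshold is read off the JSON key `full_vocab_mean_abs_logit_error_lte_0_03_all_prompts`, and the `reported` values are copied from the Results list above.

```python
def mean_abs_logit_error(hf_logits, gguf_logits):
    """Mean absolute difference between two equal-length logit vectors."""
    if len(hf_logits) != len(gguf_logits) or len(hf_logits) == 0:
        raise ValueError("logit vectors must be non-empty and the same length")
    total = sum(abs(a - b) for a, b in zip(hf_logits, gguf_logits))
    return total / len(hf_logits)


def all_within(errors, threshold=0.03):
    """True if every per-prompt error is at or below the threshold."""
    return all(err <= threshold for err in errors.values())


# Per-prompt errors as listed in the report's Results section.
reported = {
    "The quick brown fox": 0.0199148655,
    "In a distant future, humanity": 0.0051696529,
    "Question: What is 2+2?\nAnswer:": 0.0076530445,
    "def fibonacci(n):": 0.0045031775,
}

gate_passed = all_within(reported)                # all four values are <= 0.03
worst_prompt = max(reported, key=reported.get)    # prompt with the largest error
```

Gating on the worst prompt rather than on an average across prompts means a single badly converted case cannot be hidden by three good ones, which is consistent with the `_all_prompts` suffix on the check names.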
runtime/llama.cpp-hrm_text.patch ADDED
@@ -0,0 +1,556 @@
1
+ diff --git a/conversion/__init__.py b/conversion/__init__.py
2
+ index 2c38123df..ecf1be2db 100644
3
+ --- a/conversion/__init__.py
4
+ +++ b/conversion/__init__.py
5
+ @@ -95,6 +95,7 @@ TEXT_MODEL_MAP: dict[str, str] = {
6
+ "HunYuanDenseV1ForCausalLM": "hunyuan",
7
+ "HunYuanMoEV1ForCausalLM": "hunyuan",
8
+ "HunYuanVLForConditionalGeneration": "hunyuan",
9
+ + "HrmTextForCausalLM": "hrm_text",
10
+ "IQuestCoderForCausalLM": "llama",
11
+ "InternLM2ForCausalLM": "internlm",
12
+ "InternLM3ForCausalLM": "internlm",
13
+ diff --git a/conversion/hrm_text.py b/conversion/hrm_text.py
14
+ new file mode 100644
15
+ index 000000000..1f29ab55e
16
+ --- /dev/null
17
+ +++ b/conversion/hrm_text.py
18
+ @@ -0,0 +1,120 @@
19
+ +from __future__ import annotations
20
+ +
21
+ +import re
22
+ +import json
23
+ +
24
+ +from typing import Iterable, TYPE_CHECKING
25
+ +
26
+ +import torch
27
+ +
28
+ +if TYPE_CHECKING:
29
+ + from torch import Tensor
30
+ +
31
+ +from .base import ModelBase, TextModel, gguf, logger
32
+ +
33
+ +
34
+ +@ModelBase.register("HrmTextForCausalLM")
35
+ +class HrmTextModel(TextModel):
36
+ + model_arch = gguf.MODEL_ARCH.HRM_TEXT
37
+ +
38
+ + def __init__(self, *args, **kwargs):
39
+ + super().__init__(*args, **kwargs)
40
+ +
41
+ + with open(self.dir_model / "config.json", "r", encoding="utf-8") as f:
42
+ + self.raw_hparams = json.load(f)
43
+ +
44
+ + self.layers_per_stack = self.raw_hparams["num_hidden_layers"]
45
+ + self.h_cycles = self.raw_hparams["H_cycles"]
46
+ + self.l_cycles = self.raw_hparams["L_cycles"]
47
+ + self.physical_block_count = self.layers_per_stack * 2
48
+ + self.cache_block_count = self.layers_per_stack * self.h_cycles * (self.l_cycles + 1)
49
+ +
50
+ + # GGUF tensors store one physical L stack followed by one physical H stack.
51
+ + # The runtime expands these 32 physical layers across 128 KV-cache slots.
52
+ + self.block_count = self.physical_block_count
53
+ + self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)
54
+ +
55
+ + def set_vocab(self):
56
+ + # HRM-Text ships a Qwen2-style tokenizer.json. Keep it as a plain tokenizer;
57
+ + # do not add a chat template for validation GGUFs.
58
+ + self._set_vocab_gpt2()
59
+ +
60
+ + def get_vocab_base_pre(self, tokenizer) -> str:
61
+ + del tokenizer
62
+ + return "qwen2"
63
+ +
64
+ + def set_gguf_parameters(self):
65
+ + hp = self.raw_hparams
66
+ + head_dim = hp["head_dim"]
67
+ +
68
+ + self.gguf_writer.add_context_length(hp["max_position_embeddings"])
69
+ + self.gguf_writer.add_embedding_length(hp["hidden_size"])
70
+ + self.gguf_writer.add_block_count(self.cache_block_count)
71
+ + self.gguf_writer.add_feed_forward_length(hp["intermediate_size"])
72
+ + self.gguf_writer.add_head_count(hp["num_attention_heads"])
73
+ + self.gguf_writer.add_head_count_kv(hp["num_key_value_heads"])
74
+ + self.gguf_writer.add_key_length(head_dim)
75
+ + self.gguf_writer.add_value_length(head_dim)
76
+ + self.gguf_writer.add_rope_dimension_count(head_dim)
77
+ + self.gguf_writer.add_rope_freq_base(hp.get("rope_theta", 10000.0))
78
+ + self.gguf_writer.add_layer_norm_rms_eps(hp["rms_norm_eps"])
79
+ + self.gguf_writer.add_embedding_scale(hp["embedding_scale"])
80
+ +
81
+ + arch = self.gguf_writer.arch
82
+ + self.gguf_writer.add_uint32(gguf.Keys.LLM.HRM_LAYERS_PER_STACK.format(arch=arch), self.layers_per_stack)
83
+ + self.gguf_writer.add_uint32(gguf.Keys.LLM.HRM_H_CYCLES.format(arch=arch), self.h_cycles)
84
+ + self.gguf_writer.add_uint32(gguf.Keys.LLM.HRM_L_CYCLES.format(arch=arch), self.l_cycles)
85
+ + self.gguf_writer.add_bool(gguf.Keys.LLM.HRM_PREFIX_LM.format(arch=arch), bool(hp.get("prefix_lm", False)))
86
+ +
87
+ + def _format(self, key: gguf.MODEL_TENSOR, bid: int | None = None, suffix: str = ".weight") -> str:
88
+ + return self.format_tensor_name(key, bid=bid, suffix=suffix)
89
+ +
90
+ + def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
91
+ + if name == "model.embed_tokens.weight":
92
+ + yield self._format(gguf.MODEL_TENSOR.TOKEN_EMBD), data_torch
93
+ + return
94
+ +
95
+ + if name == "lm_head.weight":
96
+ + yield self._format(gguf.MODEL_TENSOR.OUTPUT), data_torch
97
+ + return
98
+ +
99
+ + if name == "model.z_L_init":
100
+ + yield self._format(gguf.MODEL_TENSOR.HRM_Z_L_INIT, suffix=""), data_torch
101
+ + return
102
+ +
103
+ + match = re.fullmatch(r"model\.([LH])_module\.layers\.(\d+)\.(.+)", name)
104
+ + if match is None:
105
+ + raise ValueError(f"Can not map tensor {name!r}")
106
+ +
107
+ + stack, layer_s, tensor_name = match.groups()
108
+ + layer_idx = int(layer_s)
109
+ + if layer_idx >= self.layers_per_stack:
110
+ + raise ValueError(f"Layer index {layer_idx} outside HRM stack size {self.layers_per_stack}")
111
+ +
112
+ + physical_bid = layer_idx + (self.layers_per_stack if stack == "H" else 0)
113
+ +
114
+ + if tensor_name == "attn.gqkv_proj.weight":
115
+ + gate, q, k, v = torch.chunk(data_torch, 4, dim=0)
116
+ + logger.debug("Split %s as gate, q, k, v", name)
117
+ + yield self._format(gguf.MODEL_TENSOR.ATTN_GATE, physical_bid), gate.contiguous()
118
+ + yield self._format(gguf.MODEL_TENSOR.ATTN_Q, physical_bid), q.contiguous()
119
+ + yield self._format(gguf.MODEL_TENSOR.ATTN_K, physical_bid), k.contiguous()
120
+ + yield self._format(gguf.MODEL_TENSOR.ATTN_V, physical_bid), v.contiguous()
121
+ + return
122
+ +
123
+ + if tensor_name == "attn.o_proj.weight":
124
+ + yield self._format(gguf.MODEL_TENSOR.ATTN_OUT, physical_bid), data_torch
125
+ + return
126
+ +
127
+ + if tensor_name == "mlp.gate_up_proj.weight":
128
+ + gate, up = torch.chunk(data_torch, 2, dim=0)
129
+ + logger.debug("Split %s as gate, up", name)
130
+ + yield self._format(gguf.MODEL_TENSOR.FFN_GATE, physical_bid), gate.contiguous()
131
+ + yield self._format(gguf.MODEL_TENSOR.FFN_UP, physical_bid), up.contiguous()
132
+ + return
133
+ +
134
+ + if tensor_name == "mlp.down_proj.weight":
135
+ + yield self._format(gguf.MODEL_TENSOR.FFN_DOWN, physical_bid), data_torch
136
+ + return
137
+ +
138
+ + raise ValueError(f"Can not map tensor {name!r}")
139
+ diff --git a/gguf-py/gguf/constants.py b/gguf-py/gguf/constants.py
140
+ index 7fdcf03d7..b84cc8827 100644
141
+ --- a/gguf-py/gguf/constants.py
142
+ +++ b/gguf-py/gguf/constants.py
143
+ @@ -144,6 +144,10 @@ class Keys:
144
+ TOKEN_SHIFT_COUNT = "{arch}.token_shift_count"
145
+ INTERLEAVE_MOE_LAYER_STEP = "{arch}.interleave_moe_layer_step"
146
+ FULL_ATTENTION_INTERVAL = "{arch}.full_attention_interval"
147
+ + HRM_LAYERS_PER_STACK = "{arch}.layers_per_stack"
148
+ + HRM_H_CYCLES = "{arch}.h_cycles"
149
+ + HRM_L_CYCLES = "{arch}.l_cycles"
150
+ + HRM_PREFIX_LM = "{arch}.prefix_lm"
151
+ ACTIVATION_SPARSITY_SCALE = "{arch}.activation_sparsity_scale"
152
+ ALTUP_ACTIVE_IDX = "{arch}.altup.active_idx"
153
+ ALTUP_NUM_INPUTS = "{arch}.altup.num_inputs"
154
+ @@ -410,6 +414,7 @@ class MODEL_ARCH(IntEnum):
155
+ QWEN3 = auto()
156
+ QWEN3MOE = auto()
157
+ QWEN3NEXT = auto()
158
+ + HRM_TEXT = auto()
159
+ QWEN3VL = auto()
160
+ QWEN3VLMOE = auto()
161
+ QWEN35 = auto()
162
+ @@ -527,6 +532,7 @@ class MODEL_TENSOR(IntEnum):
163
+ TOKEN_TYPES = auto()
164
+ POS_EMBD = auto()
165
+ OUTPUT = auto()
166
+ + HRM_Z_L_INIT = auto()
167
+ DENSE_2_OUT = auto() # embeddinggemma 2_Dense
168
+ DENSE_3_OUT = auto() # embeddinggemma 3_Dense
169
+ OUTPUT_NORM = auto()
170
+ @@ -925,6 +931,7 @@ MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
171
+ MODEL_ARCH.QWEN3: "qwen3",
172
+ MODEL_ARCH.QWEN3MOE: "qwen3moe",
173
+ MODEL_ARCH.QWEN3NEXT: "qwen3next",
174
+ + MODEL_ARCH.HRM_TEXT: "hrm_text",
175
+ MODEL_ARCH.QWEN3VL: "qwen3vl",
176
+ MODEL_ARCH.QWEN3VLMOE: "qwen3vlmoe",
177
+ MODEL_ARCH.QWEN35: "qwen35",
178
+ @@ -1042,6 +1049,7 @@ TENSOR_NAMES: dict[MODEL_TENSOR, str] = {
179
+ MODEL_TENSOR.POS_EMBD: "position_embd",
180
+ MODEL_TENSOR.OUTPUT_NORM: "output_norm",
181
+ MODEL_TENSOR.OUTPUT: "output",
182
+ + MODEL_TENSOR.HRM_Z_L_INIT: "hrm.z_l_init",
183
+ MODEL_TENSOR.DENSE_2_OUT: "dense_2", # embeddinggemma 2_Dense
184
+ MODEL_TENSOR.DENSE_3_OUT: "dense_3", # embeddinggemma 2_Dense
185
+ MODEL_TENSOR.ROPE_FREQS: "rope_freqs",
186
+ @@ -2057,6 +2065,19 @@ MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
187
+ MODEL_TENSOR.SSM_BETA_ALPHA,
188
+ MODEL_TENSOR.SSM_OUT
189
+ ],
190
+ + MODEL_ARCH.HRM_TEXT: [
191
+ + MODEL_TENSOR.TOKEN_EMBD,
192
+ + MODEL_TENSOR.OUTPUT,
193
+ + MODEL_TENSOR.HRM_Z_L_INIT,
194
+ + MODEL_TENSOR.ATTN_Q,
195
+ + MODEL_TENSOR.ATTN_K,
196
+ + MODEL_TENSOR.ATTN_V,
197
+ + MODEL_TENSOR.ATTN_GATE,
198
+ + MODEL_TENSOR.ATTN_OUT,
199
+ + MODEL_TENSOR.FFN_GATE,
200
+ + MODEL_TENSOR.FFN_DOWN,
201
+ + MODEL_TENSOR.FFN_UP,
202
+ + ],
203
+ MODEL_ARCH.QWEN3VL: [
204
+ MODEL_TENSOR.TOKEN_EMBD,
205
+ MODEL_TENSOR.OUTPUT_NORM,
206
+ diff --git a/src/llama-arch.cpp b/src/llama-arch.cpp
207
+ index c9eead18a..5b8ee3781 100644
208
+ --- a/src/llama-arch.cpp
209
+ +++ b/src/llama-arch.cpp
210
+ @@ -37,6 +37,7 @@ static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
211
+ { LLM_ARCH_QWEN3, "qwen3" },
212
+ { LLM_ARCH_QWEN3MOE, "qwen3moe" },
213
+ { LLM_ARCH_QWEN3NEXT, "qwen3next" },
214
+ + { LLM_ARCH_HRM_TEXT, "hrm_text" },
215
+ { LLM_ARCH_QWEN3VL, "qwen3vl" },
216
+ { LLM_ARCH_QWEN3VLMOE, "qwen3vlmoe" },
217
+ { LLM_ARCH_QWEN35, "qwen35" },
218
+ @@ -209,6 +210,10 @@ static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
219
+ { LLM_KV_TOKEN_SHIFT_COUNT, "%s.token_shift_count" },
220
+ { LLM_KV_INTERLEAVE_MOE_LAYER_STEP, "%s.interleave_moe_layer_step" },
221
+ { LLM_KV_FULL_ATTENTION_INTERVAL, "%s.full_attention_interval" },
222
+ + { LLM_KV_HRM_LAYERS_PER_STACK, "%s.layers_per_stack" },
223
+ + { LLM_KV_HRM_H_CYCLES, "%s.h_cycles" },
224
+ + { LLM_KV_HRM_L_CYCLES, "%s.l_cycles" },
225
+ + { LLM_KV_HRM_PREFIX_LM, "%s.prefix_lm" },
226
+
227
+ { LLM_KV_ATTENTION_HEAD_COUNT, "%s.attention.head_count" },
228
+ { LLM_KV_ATTENTION_HEAD_COUNT_KV, "%s.attention.head_count_kv" },
229
+ @@ -346,6 +351,7 @@ static const std::map<llm_tensor, const char *> LLM_TENSOR_NAMES = {
230
+ { LLM_TENSOR_OUTPUT_NORM, "output_norm" },
231
+ { LLM_TENSOR_OUTPUT_NORM_LFM2, "token_embd_norm" }, // fix for wrong tensor name
232
+ { LLM_TENSOR_OUTPUT, "output" },
233
+ + { LLM_TENSOR_HRM_Z_L_INIT, "hrm.z_l_init" },
234
+ { LLM_TENSOR_ROPE_FREQS, "rope_freqs" },
235
+ { LLM_TENSOR_ATTN_NORM, "blk.%d.attn_norm" },
236
+ { LLM_TENSOR_ATTN_Q, "blk.%d.attn_q" },
237
+ @@ -565,6 +571,7 @@ static const std::map<llm_tensor, llm_tensor_info> LLM_TENSOR_INFOS = {
238
+ {LLM_TENSOR_POS_EMBD, {LLM_TENSOR_LAYER_INPUT, GGML_OP_GET_ROWS}},
239
+ {LLM_TENSOR_TOKEN_TYPES, {LLM_TENSOR_LAYER_INPUT, GGML_OP_GET_ROWS}},
240
+ {LLM_TENSOR_TOKEN_EMBD_NORM, {LLM_TENSOR_LAYER_REPEATING, GGML_OP_MUL}}, // do the norms on the first layer (not the input layer)
241
+ + {LLM_TENSOR_HRM_Z_L_INIT, {LLM_TENSOR_LAYER_INPUT, GGML_OP_MUL}},
242
+ {LLM_TENSOR_OUTPUT, {LLM_TENSOR_LAYER_OUTPUT, GGML_OP_MUL_MAT}},
243
+ {LLM_TENSOR_CLS, {LLM_TENSOR_LAYER_OUTPUT, GGML_OP_MUL_MAT}},
244
+ {LLM_TENSOR_CLS_OUT, {LLM_TENSOR_LAYER_OUTPUT, GGML_OP_MUL_MAT}},
245
+ diff --git a/src/llama-arch.h b/src/llama-arch.h
246
+ index 89cf16cc3..fa04b684b 100644
247
+ --- a/src/llama-arch.h
248
+ +++ b/src/llama-arch.h
249
+ @@ -41,6 +41,7 @@ enum llm_arch {
250
+ LLM_ARCH_QWEN3,
251
+ LLM_ARCH_QWEN3MOE,
252
+ LLM_ARCH_QWEN3NEXT,
253
+ + LLM_ARCH_HRM_TEXT,
254
+ LLM_ARCH_QWEN3VL,
255
+ LLM_ARCH_QWEN3VLMOE,
256
+ LLM_ARCH_QWEN35,
257
+ @@ -213,6 +214,10 @@ enum llm_kv {
258
+ LLM_KV_TOKEN_SHIFT_COUNT,
259
+ LLM_KV_INTERLEAVE_MOE_LAYER_STEP,
260
+ LLM_KV_FULL_ATTENTION_INTERVAL,
261
+ + LLM_KV_HRM_LAYERS_PER_STACK,
262
+ + LLM_KV_HRM_H_CYCLES,
263
+ + LLM_KV_HRM_L_CYCLES,
264
+ + LLM_KV_HRM_PREFIX_LM,
265
+
266
+ LLM_KV_ATTENTION_HEAD_COUNT,
267
+ LLM_KV_ATTENTION_HEAD_COUNT_KV,
268
+ @@ -354,6 +359,7 @@ enum llm_tensor {
269
+ LLM_TENSOR_DENSE_2_OUT,
270
+ LLM_TENSOR_DENSE_3_OUT,
271
+ LLM_TENSOR_OUTPUT,
272
+ + LLM_TENSOR_HRM_Z_L_INIT,
273
+ LLM_TENSOR_OUTPUT_NORM,
274
+ LLM_TENSOR_OUTPUT_NORM_LFM2, // fix for wrong tensor name
275
+ LLM_TENSOR_ROPE_FREQS,
276
+ diff --git a/src/llama-context.cpp b/src/llama-context.cpp
277
+ index ad36c0666..fa80f4260 100644
278
+ --- a/src/llama-context.cpp
279
+ +++ b/src/llama-context.cpp
280
+ @@ -2208,6 +2208,9 @@ uint32_t llama_context::graph_max_nodes(uint32_t n_tokens) const {
281
+ if (model.arch == LLM_ARCH_QWEN3NEXT || model.arch == LLM_ARCH_KIMI_LINEAR || model.arch == LLM_ARCH_QWEN35 || model.arch == LLM_ARCH_QWEN35MOE) {
282
+ return std::max<uint32_t>(n_tokens * 40, 32u * model.n_tensors());
283
+ }
284
+ + if (model.arch == LLM_ARCH_HRM_TEXT) {
285
+ + return std::max<uint32_t>(n_tokens * 80, 64u * model.n_tensors());
286
+ + }
287
+ uint32_t res = std::max<uint32_t>(1024u, 8u*model.n_tensors());
288
+ for (const auto & lora : model.loras) {
289
+ res += lora->get_n_nodes();
290
+ diff --git a/src/llama-hparams.h b/src/llama-hparams.h
291
+ index e2d051edc..812598f69 100644
292
+ --- a/src/llama-hparams.h
293
+ +++ b/src/llama-hparams.h
294
+ @@ -164,6 +164,12 @@ struct llama_hparams {
295
+ float f_embedding_scale = 0.0f;
296
+ float f_attention_scale = 0.0f;
297
+
298
+ + // HRM-Text recurrence metadata. n_layer remains the expanded KV-cache slot count.
299
+ + uint32_t n_hrm_layer_per_stack = 0;
300
+ + uint32_t n_hrm_h_cycles = 0;
301
+ + uint32_t n_hrm_l_cycles = 0;
302
+ + bool hrm_prefix_lm = false;
303
+ +
304
+ // grok-2
305
+ float f_attn_out_scale = 0.0f;
306
+ uint32_t attn_temp_length = 0;
307
+ diff --git a/src/llama-model-saver.cpp b/src/llama-model-saver.cpp
308
+ index 528e4c9c0..8a6e009c6 100644
309
+ --- a/src/llama-model-saver.cpp
310
+ +++ b/src/llama-model-saver.cpp
311
+ @@ -245,6 +245,10 @@ void llama_model_saver::add_kv_from_model() {
312
+ add_kv(LLM_KV_TOKEN_SHIFT_COUNT, hparams.token_shift_count);
313
+ add_kv(LLM_KV_INTERLEAVE_MOE_LAYER_STEP, hparams.n_moe_layer_step);
314
+ // add_kv(LLM_KV_FULL_ATTENTION_INTERVAL, ???);
315
+ + add_kv(LLM_KV_HRM_LAYERS_PER_STACK, hparams.n_hrm_layer_per_stack);
316
+ + add_kv(LLM_KV_HRM_H_CYCLES, hparams.n_hrm_h_cycles);
317
+ + add_kv(LLM_KV_HRM_L_CYCLES, hparams.n_hrm_l_cycles);
318
+ + add_kv(LLM_KV_HRM_PREFIX_LM, hparams.hrm_prefix_lm);
319
+
320
+ add_kv(LLM_KV_ATTENTION_HEAD_COUNT, hparams.n_head_arr, true);
321
+ add_kv(LLM_KV_ATTENTION_HEAD_COUNT_KV, hparams.n_head_kv_arr, true);
322
+ diff --git a/src/llama-model.cpp b/src/llama-model.cpp
323
+ index 8bf20a716..a3cc996aa 100644
324
+ --- a/src/llama-model.cpp
325
+ +++ b/src/llama-model.cpp
326
+ @@ -96,6 +96,8 @@ static llama_model * llama_model_mapping(llm_arch arch, const llama_model_params
327
+ return new llama_model_qwen2moe(params);
328
+ case LLM_ARCH_QWEN3:
329
+ return new llama_model_qwen3(params);
330
+ + case LLM_ARCH_HRM_TEXT:
331
+ + return new llama_model_hrm_text(params);
332
+ case LLM_ARCH_QWEN3MOE:
333
+ return new llama_model_qwen3moe(params);
334
+ case LLM_ARCH_QWEN3VL:
335
+ @@ -2339,6 +2341,7 @@ llama_rope_type llama_model_rope_type(const llama_model * model) {
336
+ case LLM_ARCH_PANGU_EMBED:
337
+ case LLM_ARCH_AFMOE:
338
+ case LLM_ARCH_QWEN3NEXT:
339
+ + case LLM_ARCH_HRM_TEXT:
340
+ case LLM_ARCH_MIMO2:
341
+ case LLM_ARCH_STEP35:
342
+ return LLAMA_ROPE_TYPE_NEOX;
343
+ diff --git a/src/models/hrm-text.cpp b/src/models/hrm-text.cpp
344
+ new file mode 100644
345
+ index 000000000..e0a3e9f59
346
+ --- /dev/null
347
+ +++ b/src/models/hrm-text.cpp
348
+ @@ -0,0 +1,183 @@
349
+ +#include "models.h"
350
+ +
351
+ +#include <cmath>
352
+ +#include <vector>
353
+ +
354
+ +void llama_model_hrm_text::load_arch_hparams(llama_model_loader & ml) {
355
+ + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
356
+ + ml.get_key(LLM_KV_EMBEDDING_SCALE, hparams.f_embedding_scale);
357
+ + ml.get_key(LLM_KV_HRM_LAYERS_PER_STACK, hparams.n_hrm_layer_per_stack);
358
+ + ml.get_key(LLM_KV_HRM_H_CYCLES, hparams.n_hrm_h_cycles);
359
+ + ml.get_key(LLM_KV_HRM_L_CYCLES, hparams.n_hrm_l_cycles);
360
+ + ml.get_key(LLM_KV_HRM_PREFIX_LM, hparams.hrm_prefix_lm, false);
361
+ +
362
+ + switch (hparams.n_embd) {
363
+ + case 1536: type = LLM_TYPE_1B; break;
364
+ + default: type = LLM_TYPE_UNKNOWN;
365
+ + }
366
+ +}
367
+ +
368
+ +void llama_model_hrm_text::load_arch_tensors(llama_model_loader &) {
369
+ + LLAMA_LOAD_LOCALS;
370
+ +
371
+ + const int64_t n_stack = hparams.n_hrm_layer_per_stack;
372
+ + const int64_t n_cycle_slots = n_stack * (hparams.n_hrm_l_cycles + 1);
373
+ +
374
+ + tok_embd = create_tensor(tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, 0);
375
+ + output = create_tensor(tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, 0);
376
+ +
377
+ + hrm_z_l_init = create_tensor(tn(LLM_TENSOR_HRM_Z_L_INIT), {n_embd}, 0);
378
+ +
379
+ + std::vector<bool> loaded_physical(2 * n_stack, false);
380
+ +
381
+ + for (int il = 0; il < n_layer; ++il) {
382
+ + auto & layer = layers[il];
383
+ +
384
+ + const int64_t layer_in_stack = il % n_stack;
385
+ + const int64_t phase = (il % n_cycle_slots) / n_stack;
386
+ + const bool is_h_stack = phase == int64_t(hparams.n_hrm_l_cycles);
387
+ + const int physical_bid = int((is_h_stack ? n_stack : 0) + layer_in_stack);
388
+ +
389
+ + const int flags = loaded_physical[physical_bid] ? TENSOR_DUPLICATED : 0;
390
+ + loaded_physical[physical_bid] = true;
391
+ +
392
+ + create_tensor_qkv(layer, physical_bid,
393
+ + n_embd,
394
+ + n_embd_head_k * n_head,
395
+ + n_embd_k_gqa,
396
+ + n_embd_v_gqa,
397
+ + flags);
398
+ +
399
+ + layer.wqkv_gate = create_tensor(tn(LLM_TENSOR_ATTN_GATE, "weight", physical_bid), {n_embd, n_embd_head_k * n_head}, flags);
400
+ + layer.wo = create_tensor(tn(LLM_TENSOR_ATTN_OUT, "weight", physical_bid), {n_embd_head_k * n_head, n_embd}, flags);
401
+ +
402
+ + layer.ffn_gate = create_tensor(tn(LLM_TENSOR_FFN_GATE, "weight", physical_bid), {n_embd, n_ff}, flags);
403
+ + layer.ffn_down = create_tensor(tn(LLM_TENSOR_FFN_DOWN, "weight", physical_bid), {n_ff, n_embd}, flags);
404
+ + layer.ffn_up = create_tensor(tn(LLM_TENSOR_FFN_UP, "weight", physical_bid), {n_embd, n_ff}, flags);
405
+ + }
406
+ +}
407
+ +
408
+ +std::unique_ptr<llm_graph_context> llama_model_hrm_text::build_arch_graph(const llm_graph_params & params) const {
409
+ + return std::make_unique<graph>(*this, params);
410
+ +}
411
+ +
412
+ +llama_model_hrm_text::graph::graph(const llama_model & model_, const llm_graph_params & params) : llm_graph_context(params) {
413
+ + const auto & model = static_cast<const llama_model_hrm_text &>(model_);
414
+ +
415
+ + GGML_ASSERT(model.tok_embd != nullptr);
416
+ + GGML_ASSERT(model.output != nullptr);
417
+ + GGML_ASSERT(model.hrm_z_l_init != nullptr);
418
+ +
419
+ + const int64_t n_embd_head = hparams.n_embd_head_v();
420
+ + GGML_ASSERT(n_embd_head == hparams.n_embd_head_k());
421
+ + GGML_ASSERT(n_embd_head == n_rot);
422
+ +
423
+ + const int64_t n_stack = hparams.n_hrm_layer_per_stack;
424
+ + const int64_t h_cycles = hparams.n_hrm_h_cycles;
425
+ + const int64_t l_cycles = hparams.n_hrm_l_cycles;
426
+ +
427
+ + ggml_tensor * inp_pos = build_inp_pos();
428
+ + auto * inp_attn = build_attn_inp_kv();
429
+ + ggml_tensor * inp_out_ids = build_inp_out_ids();
430
+ +
431
+ + ggml_tensor * hidden_high = build_inp_embd(model.tok_embd);
432
+ + ggml_tensor * hidden_low = ggml_repeat(ctx0, model.hrm_z_l_init, hidden_high);
433
+ + cb(hidden_low, "hrm_z_l_init", -1);
434
+ +
435
+ + const float kq_scale = 1.0f / std::sqrt(float(n_embd_head));
436
+ +
437
+ + auto build_stack = [&](ggml_tensor * stack_inp, int slot_offset) -> ggml_tensor * {
438
+ + ggml_tensor * stack_cur = stack_inp;
439
+ +
440
+ + for (int layer_idx = 0; layer_idx < n_stack; ++layer_idx) {
441
+ + const int il = slot_offset + layer_idx;
442
+ + const auto & layer = model.layers[il];
443
+ +
444
+ + ggml_tensor * inpSA = stack_cur;
445
+ + ggml_tensor * cur = build_norm(stack_cur, nullptr, nullptr, LLM_NORM_RMS, il);
446
+ + cb(cur, "attn_norm", il);
447
+ +
448
+ + {
449
+ + ggml_tensor * attn_inp = cur;
450
+ + auto [Qcur, Kcur, Vcur] = build_qkv(layer, cur, n_embd_head, n_head, n_head_kv, il);
451
+ +
452
+ + ggml_tensor * gate = build_lora_mm(layer.wqkv_gate, attn_inp, layer.wqkv_gate_s);
453
+ + cb(gate, "attn_gate_proj", il);
454
+ +
455
+ + Qcur = ggml_rope_ext(
456
+ + ctx0, Qcur, inp_pos, nullptr,
457
+ + n_rot, rope_type, n_ctx_orig, freq_base, freq_scale,
458
+ + ext_factor, attn_factor, beta_fast, beta_slow);
459
+ + cb(Qcur, "Qcur_rope", il);
460
+ +
461
+ + Kcur = ggml_rope_ext(
462
+ + ctx0, Kcur, inp_pos, nullptr,
463
+ + n_rot, rope_type, n_ctx_orig, freq_base, freq_scale,
464
+ + ext_factor, attn_factor, beta_fast, beta_slow);
465
+ + cb(Kcur, "Kcur_rope", il);
466
+ +
467
+ + cur = build_attn(inp_attn,
468
+ + nullptr, nullptr, nullptr,
469
+ + Qcur, Kcur, Vcur, nullptr, nullptr, nullptr, kq_scale, il);
470
+ + cb(cur, "attn_out", il);
471
+ +
472
+ + gate = ggml_sigmoid(ctx0, gate);
473
+ + cb(gate, "attn_gate_sig", il);
474
+ +
475
+ + cur = ggml_mul(ctx0, cur, gate);
476
+ + cb(cur, "attn_gated", il);
477
+ +
478
+ + cur = build_lora_mm(layer.wo, cur, layer.wo_s);
479
+ + cb(cur, "attn_o_proj", il);
480
+ + }
481
+ +
482
+ + ggml_tensor * ffn_inp = ggml_add(ctx0, cur, inpSA);
483
+ + cb(ffn_inp, "ffn_inp", il);
484
+ +
485
+ + cur = build_norm(ffn_inp, nullptr, nullptr, LLM_NORM_RMS, il);
486
+ + cb(cur, "ffn_norm", il);
487
+ +
488
+ + cur = build_ffn(cur,
489
+ + layer.ffn_up, nullptr, layer.ffn_up_s,
490
+ + layer.ffn_gate, nullptr, layer.ffn_gate_s,
491
+ + layer.ffn_down, nullptr, layer.ffn_down_s,
492
+ + nullptr,
493
+ + LLM_FFN_SILU, LLM_FFN_PAR, il);
494
+ + cb(cur, "ffn_out", il);
495
+ +
496
+ + cur = ggml_add(ctx0, cur, ffn_inp);
497
+ + cur = build_cvec(cur, il);
498
+ + cb(cur, "hrm_layer_out", il);
499
+ +
500
+ + stack_cur = cur;
501
+ + }
502
+ +
503
+ + stack_cur = build_norm(stack_cur, nullptr, nullptr, LLM_NORM_RMS, slot_offset);
504
+ + cb(stack_cur, "stack_final_norm", slot_offset);
505
+ + return stack_cur;
506
+ + };
507
+ +
508
+ + for (int h = 0; h < h_cycles; ++h) {
509
+ + for (int l = 0; l < l_cycles; ++l) {
510
+ + const int slot_offset = int((h * (l_cycles + 1) + l) * n_stack);
511
+ + hidden_low = build_stack(ggml_add(ctx0, hidden_low, hidden_high), slot_offset);
512
+ + }
513
+ +
514
+ + const int slot_offset = int((h * (l_cycles + 1) + l_cycles) * n_stack);
515
+ + hidden_high = build_stack(ggml_add(ctx0, hidden_high, hidden_low), slot_offset);
516
+ + }
517
+ +
518
+ + ggml_tensor * cur = hidden_high;
519
+ +
520
+ + if (inp_out_ids) {
521
+ + cur = ggml_get_rows(ctx0, cur, inp_out_ids);
522
+ + }
523
+ +
524
+ + res->t_embd = cur;
525
+ +
526
+ + cur = build_lora_mm(model.output, cur, model.output_s);
527
+ + cb(cur, "result_output", -1);
528
+ +
529
+ + res->t_logits = cur;
530
+ + ggml_build_forward_expand(gf, cur);
531
+ +}
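Note on the slot arithmetic in `src/models/hrm-text.cpp` above: the loader maps every expanded layer slot `il` onto one of `2 * n_stack` physical weight blocks, and the graph walks the slots in L-then-H order within each H cycle. The following is a minimal standalone sketch of that index arithmetic only. The helper names `hrm_physical_block` and `hrm_slot_offsets` are hypothetical and do not appear in the patch, and the sketch assumes `n_layer == h_cycles * (l_cycles + 1) * n_stack`, which the `llama-hparams.h` comment suggests but the patch does not assert.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch): maps an expanded layer slot `il`
// to a physical weight block, mirroring the arithmetic in load_arch_tensors().
// Blocks [0, n_stack) belong to the L stack, [n_stack, 2 * n_stack) to the H stack.
int64_t hrm_physical_block(int64_t il, int64_t n_stack, int64_t l_cycles) {
    assert(il >= 0 && n_stack > 0 && l_cycles >= 0);
    const int64_t n_cycle_slots  = n_stack * (l_cycles + 1); // slots used by one H cycle
    const int64_t layer_in_stack = il % n_stack;
    const int64_t phase          = (il % n_cycle_slots) / n_stack;
    const bool    is_h_stack     = phase == l_cycles;        // last phase of a cycle is the H stack
    return (is_h_stack ? n_stack : 0) + layer_in_stack;
}

// Hypothetical helper: slot offsets in the order the graph constructor visits them
// (l_cycles passes over the L stack, then one pass over the H stack, per H cycle).
std::vector<int64_t> hrm_slot_offsets(int64_t n_stack, int64_t h_cycles, int64_t l_cycles) {
    std::vector<int64_t> offsets;
    for (int64_t h = 0; h < h_cycles; ++h) {
        for (int64_t l = 0; l < l_cycles; ++l) {
            offsets.push_back((h * (l_cycles + 1) + l) * n_stack);    // L-stack pass
        }
        offsets.push_back((h * (l_cycles + 1) + l_cycles) * n_stack); // H-stack pass
    }
    return offsets;
}
```

For illustrative values `n_stack = 2`, `h_cycles = 2`, `l_cycles = 2` (not taken from the model's metadata), this gives 12 slots: slots 0-3 and 6-9 reuse physical blocks 0-1 (L stack), while slots 4-5 and 10-11 reuse blocks 2-3 (H stack). Each slot keeps its own index `il`, so every pass would get a separate KV-cache entry even though the weights are shared.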
532
+ diff --git a/src/models/models.h b/src/models/models.h
533
+ index 7e551eb96..7da6b7f7f 100644
534
+ --- a/src/models/models.h
535
+ +++ b/src/models/models.h
536
+ @@ -515,6 +515,20 @@ struct llama_model_qwen3 : public llama_model_base {
537
+ std::unique_ptr<llm_graph_context> build_arch_graph(const llm_graph_params & params) const override;
538
+ };
539
+
540
+ +struct llama_model_hrm_text : public llama_model_base {
541
+ + llama_model_hrm_text(const struct llama_model_params & params) : llama_model_base(params) {}
542
+ + void load_arch_hparams(llama_model_loader & ml) override;
543
+ + void load_arch_tensors(llama_model_loader & ml) override;
544
+ +
545
+ + ggml_tensor * hrm_z_l_init = nullptr;
546
+ +
547
+ + struct graph : public llm_graph_context {
548
+ + graph(const llama_model & model, const llm_graph_params & params);
549
+ + };
550
+ +
551
+ + std::unique_ptr<llm_graph_context> build_arch_graph(const llm_graph_params & params) const override;
552
+ +};
553
+ +
554
+
555
+ struct llama_model_qwen3moe : public llama_model_base {
556
+ llama_model_qwen3moe(const struct llama_model_params & params) : llama_model_base(params) {}