sapientinc/HRM-Text-1B

@@ -8,7 +8,9 @@ tags:
 - hrm
 - hierarchical-reasoning
 - prefix-lm
-- base-model
+- pre-alignment
+- non-chat
+- non-instruction-tuned
 ---
 ![HRM-Text banner](banner.jpg)
@@ -21,20 +23,20 @@ tags:
 # HRM-Text-1B
-A 1 B-parameter base language model built on the **Hierarchical Reasoning Model (HRM)** architecture, trained from scratch on a curated text corpus by Sapient Intelligence.
-HRM is a dual-timescale recurrent architecture: two Transformer modules (H = high-level / slow, L = low-level / fast) iterate over the same input embeddings for `H_cycles × L_cycles` steps, with additive state injection (`z_L + z_H`). This gives effectively unbounded compute depth at bounded parameter count.
+A 1 B-parameter language model checkpoint built on the **Hierarchical Reasoning Model (HRM)** architecture, trained from scratch on a curated text corpus by Sapient Intelligence.
+HRM is a dual-timescale recurrent architecture: two Transformer modules (H = high-level / slow, L = low-level / fast) iterate over the same input embeddings for `H_cycles × (L_cycles + 1)` steps, with additive state injection (`z_L + z_H`). This gives effectively unbounded compute depth at bounded parameter count.
 ## Disclaimer
-This is a **base** model. It is pre-trained on a PrefixLM objective with condition prefix tokens and has **not** been instruction-tuned, RLHF'd, or otherwise post-trained. For any serious downstream use we recommend post-training (SFT and/or RL) on task-specific data; the base checkpoint is meant as a starting point, not a finished assistant.
-Practical guidance for prompting the raw base model:
+This is a **pre-alignment** model checkpoint, not a chat or instruction-following assistant. It is pre-trained on a PrefixLM objective with condition prefix tokens and has **not** been multi-turn dialogue tuned, long-context adapted, instruction-tuned, RLHF-trained, or otherwise aligned for assistant-style use. If you want to use HRM-Text like a chat model, you should perform further alignment, such as SFT and/or RL, on task-specific data. This checkpoint is meant as a starting point, not a finished assistant.
+Practical guidance for prompting the raw checkpoint:
 - **NLP tasks (classification, extraction, structured output, short-form QA)**: use the `direct` condition with 2–8 few-shot in-context examples. `direct` + few-shot is the strongest zero-extra-training setup we have measured; pure zero-shot is noticeably weaker.
 - **Reasoning / math / open-ended generation**: use the **composite condition** `synth,cot`. This is *one* composite prefix, not two alternatives — at tokenization time the comma-separated tags are mapped to their prefix tokens and concatenated, in order, into a single prefix block. So `synth,cot` produces the two-token prefix `<|quad_end|><|object_ref_end|>` (synth first, then cot), wrapped in the usual `<|im_start|>` … `<|im_end|>` envelope. Under this composite the model exhibits some chain-of-thought / instruct-like behavior — enough to answer many zero-shot math and reasoning prompts in a step-by-step style — but quality is uneven and below an instruction-tuned model of comparable size. Treat this "instruct" ability as a side effect of the pre-training mix, not a guaranteed capability.
-The four single tags and their prefix tokens (for reference; you can compose any subset, comma-separated, in the order you want them emitted):
+The four single condition tags and their assigned tokenizer special tokens (token names are legacy implementation details; you can compose any subset, comma-separated, in the order you want them emitted):
 - `direct` → `<|object_ref_start|>` — direct answer, no CoT
 - `cot` → `<|object_ref_end|>` — chain-of-thought
@@ -43,7 +45,7 @@ The four single tags and their prefix tokens (for reference; you can compose any
 ## Requirements
-The `hrm_text` model class has been merged into Transformers `main`. The PyPI release containing it may still be in flight; until then, install Transformers directly from the upstream `main` branch:
+Use a Transformers build that includes the `hrm_text` model class. If your installed release does not include it yet, install Transformers directly from the upstream `main` branch:
 ```bash
 pip install --upgrade "git+https://github.com/huggingface/transformers.git@main"
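This commit changes the step count in the architecture description from `H_cycles × L_cycles` to `H_cycles × (L_cycles + 1)`. A toy unrolled loop is one way to see where one extra step per cycle could come from. This is a minimal sketch, not the HRM-Text implementation: it assumes each H cycle runs the L module `L_cycles` times and then the H module once, and it models "additive state injection" as summing states (plus the input embedding for the L module) before each call. The names `hrm_unroll`, `f_H`, and `f_L` are hypothetical, and scalars stand in for embedding tensors.

```python
import math
from typing import Callable, Tuple


def hrm_unroll(
    x: float,
    f_H: Callable[[float], float],
    f_L: Callable[[float], float],
    H_cycles: int,
    L_cycles: int,
) -> Tuple[float, float, int]:
    """Toy dual-timescale recurrence (hypothetical, for illustration only).

    Assumptions, not taken from the HRM-Text code:
    - every H cycle runs f_L L_cycles times, then f_H once, which gives
      H_cycles * (L_cycles + 1) module calls in total;
    - additive state injection is modelled by summing states before a call.
    """
    z_H, z_L = 0.0, 0.0
    calls = 0
    for _ in range(H_cycles):
        for _ in range(L_cycles):
            # Fast (L) module: its own state plus the slow state and the input.
            z_L = f_L(z_L + z_H + x)
            calls += 1
        # Slow (H) module: updated once per cycle from its state plus the fast state.
        z_H = f_H(z_H + z_L)
        calls += 1
    return z_H, z_L, calls


if __name__ == "__main__":
    _, _, calls = hrm_unroll(0.5, math.tanh, math.tanh, H_cycles=2, L_cycles=3)
    print(calls)  # 8 == 2 * (3 + 1)
```

The same two callables are reused on every iteration, so raising `H_cycles` or `L_cycles` adds compute depth without adding parameters, which is the trade-off the architecture line describes.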
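The composite-condition rule in the prompting guidance (comma-separated tags mapped to their special tokens and concatenated in order into one prefix block) can be sketched as a small helper. Only the three tags whose tokens are visible in this diff (`direct`, `cot`, `synth`) are included; the fourth single tag is not shown here and is left out rather than guessed. Two parts are assumptions: that the `<|im_start|>` … `<|im_end|>` envelope wraps only the prefix tokens, and the `Input:`/`Output:` layout used for few-shot examples. `build_condition_prefix` and `build_few_shot_prompt` are hypothetical names, not part of the released tokenizer.

```python
from typing import Iterable, Tuple

# Tag -> special token, as listed in the model card. The fourth single tag
# is not shown in this excerpt and is deliberately omitted.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "synth": "<|quad_end|>",
}


def build_condition_prefix(condition: str) -> str:
    """Turn e.g. 'synth,cot' into one prefix block, preserving tag order.

    Assumption: the envelope wraps only the concatenated prefix tokens.
    """
    tags = [t.strip() for t in condition.split(",") if t.strip()]
    unknown = [t for t in tags if t not in CONDITION_TOKENS]
    if not tags or unknown:
        raise ValueError(f"unsupported condition: {condition!r}")
    return "<|im_start|>" + "".join(CONDITION_TOKENS[t] for t in tags) + "<|im_end|>"


def build_few_shot_prompt(
    examples: Iterable[Tuple[str, str]], query: str, condition: str = "direct"
) -> str:
    """Hypothetical few-shot layout: prefix, solved examples, then the open query."""
    parts = [build_condition_prefix(condition)]
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}\n")
    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_condition_prefix("synth,cot"))
    # <|im_start|><|quad_end|><|object_ref_end|><|im_end|>
```

Passing 2 to 8 `(input, output)` pairs to `build_few_shot_prompt` with the default `direct` condition matches the guidance for NLP tasks; the helper itself does not enforce that range.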
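Before falling back to the `main`-branch install in the Requirements section, you can check whether the installed Transformers already registers the architecture. This sketch assumes the model type key is `hrm_text` (the name the card uses for the model class) and looks it up in the `CONFIG_MAPPING` registry that Transformers exposes; `has_hrm_text` is a hypothetical helper name.

```python
def has_hrm_text() -> bool:
    """Return True if the installed Transformers registers 'hrm_text'.

    Assumption: 'hrm_text' is the config/model_type key for this architecture.
    """
    try:
        from transformers import CONFIG_MAPPING
    except ImportError:
        # Transformers is missing, or too old to expose the registry.
        return False
    return "hrm_text" in CONFIG_MAPPING


if __name__ == "__main__":
    print("hrm_text available:", has_hrm_text())
```

If this prints `False`, the `pip install` command from the Requirements section is the fallback.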