Update README.md
README.md
CHANGED
@@ -8,7 +8,9 @@ tags:
 - hrm
 - hierarchical-reasoning
 - prefix-lm
--
+- pre-alignment
+- non-chat
+- non-instruction-tuned
 ---
 
 
@@ -21,20 +23,20 @@ tags:
 
 # HRM-Text-1B
 
-A 1B-parameter
+A 1B-parameter language model checkpoint built on the **Hierarchical Reasoning Model (HRM)** architecture, trained from scratch on a curated text corpus by Sapient Intelligence.
 
-HRM is a dual-timescale recurrent architecture: two Transformer modules (H = high-level / slow, L = low-level / fast) iterate over the same input embeddings for `H_cycles × L_cycles` steps, with additive state injection (`z_L + z_H`). This gives effectively unbounded compute depth at bounded parameter count.
+HRM is a dual-timescale recurrent architecture: two Transformer modules (H = high-level / slow, L = low-level / fast) iterate over the same input embeddings for `H_cycles × (L_cycles + 1)` steps, with additive state injection (`z_L + z_H`). This gives effectively unbounded compute depth at bounded parameter count.
 
 ## Disclaimer
 
-This is a **
+This is a **pre-alignment** model checkpoint, not a chat or instruction-following assistant. It is pre-trained on a PrefixLM objective with condition prefix tokens and has **not** been multi-turn dialogue tuned, long-context adapted, instruction-tuned, RLHF-trained, or otherwise aligned for assistant-style use. If you want to use HRM-Text like a chat model, you should perform further alignment, such as SFT and/or RL, on task-specific data. This checkpoint is meant as a starting point, not a finished assistant.
 
-Practical guidance for prompting the raw
+Practical guidance for prompting the raw checkpoint:
 
 - **NLP tasks (classification, extraction, structured output, short-form QA)**: use the `direct` condition with 2–8 few-shot in-context examples. `direct` + few-shot is the strongest zero-extra-training setup we have measured; pure zero-shot is noticeably weaker.
 - **Reasoning / math / open-ended generation**: use the **composite condition** `synth,cot`. This is *one* composite prefix, not two alternatives: at tokenization time the comma-separated tags are mapped to their prefix tokens and concatenated, in order, into a single prefix block. So `synth,cot` produces the two-token prefix `<|quad_end|><|object_ref_end|>` (synth first, then cot), wrapped in the usual `<|im_start|>` … `<|im_end|>` envelope. Under this composite the model exhibits some chain-of-thought / instruct-like behavior (enough to answer many zero-shot math and reasoning prompts in a step-by-step style), but quality is uneven and below an instruction-tuned model of comparable size. Treat this "instruct" ability as a side effect of the pre-training mix, not a guaranteed capability.
 
-The four single tags and their
+The four single condition tags and their assigned tokenizer special tokens (token names are legacy implementation details; you can compose any subset, comma-separated, in the order you want them emitted):
 
 - `direct` → `<|object_ref_start|>`: direct answer, no CoT
 - `cot` → `<|object_ref_end|>`: chain-of-thought
@@ -43,7 +45,7 @@ The four single tags and their prefix tokens (for reference; you can compose any
 
 ## Requirements
 
-
+Use a Transformers build that includes the `hrm_text` model class. If your installed release does not include it yet, install Transformers directly from the upstream `main` branch:
 
 ```bash
 pip install --upgrade "git+https://github.com/huggingface/transformers.git@main"
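The step-count correction in this commit (`H_cycles × L_cycles` becoming `H_cycles × (L_cycles + 1)`) can be sanity-checked with a small counting sketch. It rests on an assumption the README text does not spell out: that each high-level cycle runs `L_cycles` low-level updates followed by one high-level update. The function name `count_module_steps` and the example values are hypothetical, and this is not the model's actual forward pass.

```python
def count_module_steps(h_cycles: int, l_cycles: int) -> int:
    """Count module applications under the assumed schedule:
    per high-level cycle, l_cycles low-level updates then one high-level update."""
    if h_cycles < 0 or l_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    steps = 0
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            steps += 1  # one low-level (L) update
        steps += 1  # one high-level (H) update closing the cycle
    return steps


if __name__ == "__main__":
    h_cycles, l_cycles = 2, 3  # illustrative values only, not the checkpoint's config
    total = count_module_steps(h_cycles, l_cycles)
    # Under the assumed schedule the loop count matches the corrected formula.
    assert total == h_cycles * (l_cycles + 1)
    print(total)
```

Under this reading, the old expression `H_cycles × L_cycles` would undercount by exactly one high-level update per cycle.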
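The composite-condition rule in the prompting guidance (comma-separated tags mapped to their prefix tokens and concatenated in order) can be sketched as follows. This is a minimal illustration, not the tokenizer's actual code: `CONDITION_TOKENS` and `compose_condition_prefix` are hypothetical names, the mapping covers only the three tags whose tokens appear in this diff, and the `<|im_start|>` … `<|im_end|>` envelope is left out because its exact layout is not shown here.

```python
# Tag-to-token mapping restricted to the tags named in the README text above.
# The fourth single tag is not visible in this diff, so it is deliberately omitted.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "synth": "<|quad_end|>",
}


def compose_condition_prefix(condition: str) -> str:
    """Map a comma-separated condition string to its concatenated prefix tokens,
    preserving the order in which the tags were written."""
    tags = [tag.strip() for tag in condition.split(",") if tag.strip()]
    if not tags:
        raise ValueError("condition must name at least one tag")
    unknown = [tag for tag in tags if tag not in CONDITION_TOKENS]
    if unknown:
        raise ValueError(f"unknown condition tag(s): {unknown}")
    return "".join(CONDITION_TOKENS[tag] for tag in tags)


if __name__ == "__main__":
    # Matches the example in the README: synth first, then cot.
    print(compose_condition_prefix("synth,cot"))
    # A single tag yields a one-token prefix.
    print(compose_condition_prefix("direct"))
```

Because order is preserved, `cot,synth` and `synth,cot` yield different prefixes, which is why the README stresses writing tags in the order you want them emitted.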