LLM-OS-Models/HRM-Text-Ko-Terminal-B-SWE-GLM-Pilot

gyung committed 1 day ago

Commit 0d86d1b (verified) · 1 Parent(s): 25a637c

Add files using upload-large-folder tool

Files changed (22)

.gitattributes +9 -0
README.md +64 -0
all_config.yaml +40 -0
carry_epoch_1.0.pt +3 -0
carry_epoch_1.1.pt +3 -0
carry_epoch_1.2.pt +3 -0
carry_epoch_1.3.pt +3 -0
carry_epoch_1.4.pt +3 -0
carry_epoch_1.5.pt +3 -0
carry_epoch_1.6.pt +3 -0
carry_epoch_1.7.pt +3 -0
fsdp2_epoch_1/.metadata +3 -0
fsdp2_epoch_1/__0_0.distcp +3 -0
fsdp2_epoch_1/__1_0.distcp +3 -0
fsdp2_epoch_1/__2_0.distcp +3 -0
fsdp2_epoch_1/__3_0.distcp +3 -0
fsdp2_epoch_1/__4_0.distcp +3 -0
fsdp2_epoch_1/__5_0.distcp +3 -0
fsdp2_epoch_1/__6_0.distcp +3 -0
fsdp2_epoch_1/__7_0.distcp +3 -0
hrm_nocarry_bp_warmup.py +100 -0
train_metadata.yaml +13 -0

.gitattributes CHANGED

@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/.metadata filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__5_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__7_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__3_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__0_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__4_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__6_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__1_0.distcp filter=lfs diff=lfs merge=lfs -text
+fsdp2_epoch_1/__2_0.distcp filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,64 @@

+# HRM-Text Ko Terminal B SWE+GLM Pilot
+Date: 2026-05-23
+This repository contains a raw HRM-Text FSDP2 training checkpoint, not a
+Transformers-ready model. Convert it with `HRM-Text/conversion/convert_to_hf.py`
+after selecting a checkpoint for release.
+## Run
+| Item | Value |
+|---|---:|
+| Architecture | HRM-Text B |
+| Parameters | 435,159,040 |
+| GPUs | 8 x NVIDIA H200 |
+| Epochs | 1 |
+| Global batch | 262,144 tokens |
+| Context | 4,096 train tokens |
+| Wall time | about 7m 38s |
+| Final train loss | 3.00653 |
+| Final token accuracy | 0.46379 |
+Command:
+```bash
+WANDB_MODE=offline WANDB_DIR=/home/work/.data/wandb \
+TOKENIZERS_PARALLELISM=false OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 \
+NCCL_DEBUG=WARN TORCH_NCCL_ASYNC_ERROR_HANDLING=1 \
+torchrun --standalone --nproc_per_node=8 pretrain.py \
+  arch/size@arch=B \
+  data.path=/home/work/.data/hrm_text_prepared/sft_swe_glm_mix_v1 \
+  +checkpoint_path=/home/work/.data/hrm_text_checkpoints/koterm_b_swe_glm_pilot_v1 \
+  +project_name=HRM-Ko-Terminal \
+  +run_name=koterm_b_swe_glm_pilot_v1 \
+  epochs=1 \
+  global_batch_size=262144 \
+  lr_warmup_steps=100 \
+  +log_interval=5 \
+  checkpoint_interval=1
+```
+## Data
+Prepared dataset:
+`/home/work/.data/hrm_text_prepared/sft_swe_glm_mix_v1`
+| Source | Samples | Tokens | Processing |
+|---|---:|---:|---|
+| SWE-ZERO terminal/code trajectories | 53,868 | 182,717,999 | long instructions middle-truncated |
+| GLM-5.1 reasoning cleaned sample | 56,021 | 68,452,781 | `<think>...</think>` stripped, direct answers |
+| Total | 109,889 | 251,170,780 | PrefixLM, response-only loss |
+Tokenizer:
+`https://huggingface.co/LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K`
+## Files
+- `fsdp2_epoch_1/`: distributed FSDP2 model and optimizer checkpoint.
+- `carry_epoch_1.*.pt`: per-rank carry state.
+- `all_config.yaml`: resolved training config.
+- `train_metadata.yaml`: tokenizer and dataset metadata.
+- `hrm_nocarry_bp_warmup.py`: copied model source for reproducibility.
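
Reviewer note: the figures in the README tables can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the README. The step count and throughput are rough estimates that assume every one of the 251,170,780 tokens is packed into 262,144-token batches and that the wall time is exactly 7m 38s; the trainer's actual packing and step accounting are not shown in this commit, so treat those two values as approximations.

```python
import math

# Figures copied from the README tables.
sources = {
    "SWE-ZERO terminal/code trajectories": (53_868, 182_717_999),  # (samples, tokens)
    "GLM-5.1 reasoning cleaned sample": (56_021, 68_452_781),
}
total_tokens = 251_170_780
global_batch_tokens = 262_144
context_len = 4_096
num_gpus = 8
wall_seconds = 7 * 60 + 38  # "about 7m 38s"

# The per-source rows should add up to the "Total" row.
sum_samples = sum(samples for samples, _ in sources.values())
sum_tokens = sum(tokens for _, tokens in sources.values())

# Batch geometry, assuming full-length 4,096-token sequences.
seqs_per_step = global_batch_tokens // context_len   # 64
seqs_per_gpu = seqs_per_step // num_gpus              # 8

# Rough estimates only; see the assumptions stated above.
approx_steps = math.ceil(total_tokens / global_batch_tokens)  # 959
tokens_per_sec = total_tokens / wall_seconds                   # roughly 5.5e5

print(sum_samples, sum_tokens == total_tokens)
print(seqs_per_step, seqs_per_gpu, approx_steps, round(tokens_per_sec))
```

With roughly 959 optimizer steps in the epoch, the 100-step learning-rate warmup would cover about the first tenth of training under these assumptions.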

all_config.yaml ADDED

	@@ -0,0 +1,40 @@

+arch:
+  H_cycles: 2
+  H_override: {}
+  L_cycles: 3
+  bp_max_steps: 5
+  bp_warmup_ratio: 0.2
+  expansion: 4
+  half_layers: true
+  head: lm_head@LMHead
+  hidden_size: 1024
+  init_type: lecun_normal
+  n_layers: 12
+  name: baselines.hrm_nocarry_bp_warmup@HierarchicalReasoningModel
+  norm_eps: 1.0e-06
+  norm_type: pre
+  num_heads: 8
+  pos_emb_type: rope
+  rope_theta: 10000.0
+beta1: 0.9
+beta2: 0.95
+checkpoint_interval: 1
+checkpoint_path: /home/work/.data/hrm_text_checkpoints/koterm_b_swe_glm_pilot_v1
+data:
+  path: /home/work/.data/hrm_text_prepared/sft_swe_glm_mix_v1
+  target_only: true
+ema: 0.9999
+epochs: 1
+fwd_bwd_dtype: bfloat16
+global_batch_size: 262144
+log_interval: 5
+lr: 0.00022
+lr_min_ratio: 1.0
+lr_warmup_steps: 100
+project_name: HRM-Ko-Terminal
+resume_epoch: null
+resume_from: null
+run_name: koterm_b_swe_glm_pilot_v1
+seed: 0
+weight_decay: 0.1
+weights_only_resume_from_ema: false

carry_epoch_1.0.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e3b089e35eacca121e8bc850c0fc138c4ef45a63cf9e370e2f852d6245db36b
+size 1309

carry_epoch_1.1.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66952aadc5b6f1d9d38cd436e6dba2c3b6a487138c6960beb815672ddf699495
+size 1309

carry_epoch_1.2.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a7da182d5d1dfe900b2018b6e4fe6d318c69f791b7a3c94c1727e2112a5f57d
+size 1309

carry_epoch_1.3.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45156a7f9cef48f22d7a3c46c59d92394c3da40b2b596f2681cecece9156177e
+size 1309

carry_epoch_1.4.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34356bd35b48ac6ce98742241b2c0e1c96147c36743415c7b0e432ae28f8bfc8
+size 1309

carry_epoch_1.5.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f5934083e16382c0d96bab003d7f577ced0da026175aff5a4ad2aaf31c603f6
+size 1309

carry_epoch_1.6.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93480251cc3a3285f6cea88b5f8b7c6d46672f04cacd0623127885ff4469e7d8
+size 1309

carry_epoch_1.7.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2de90149405f8416a3f1c4bb6b69b52843f8a2f835c1b874bd13c13129fc3f5
+size 1309

fsdp2_epoch_1/.metadata ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:984cd5ca047ff6b6320306732b5cb74da526e625ed003ea674bffd0d9227368c
+size 377453

fsdp2_epoch_1/__0_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f991350c2018ff046cf9cb64e91e6cda0691dd9df1a4095b7c2a65e7773f625f
+size 870637105

fsdp2_epoch_1/__1_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df90800179941bc59922b15a6365b7d1d00a567e1aeca89cc41deb6d6709a2fa
+size 870646096

fsdp2_epoch_1/__2_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0bc9f737b46b876c5ce2b8cfea94af87c0d62e11b4c44209cdafbab8874a7e49
+size 870646096

fsdp2_epoch_1/__3_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3e15eb5a237bbd4ece55c0e81c958018d858f7736a6170def544c6d227c9446
+size 870645780

fsdp2_epoch_1/__4_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ed8a0de5afb4be166fbd23ae8f1d6f268842b5c9bb39d2734e1b6b6c73a73a7
+size 870645780

fsdp2_epoch_1/__5_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5de517605507b4aba4e970c0b3ca88156bc32f0700eb9a6647ea597975899e6a
+size 870645780

fsdp2_epoch_1/__6_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68acea8b343e1765c75d56903d948cab297479da6934af6c59bb7c6f2d145ef9
+size 870648468

fsdp2_epoch_1/__7_0.distcp ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d357e127548f5f89b30289ce50f4ae3f1ed87ce732db5331b51a61bf00afc57
+size 870644519
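
Reviewer note: each of the files above is a Git LFS pointer, a three-line key/value text file recording the real object's SHA-256 and byte size. The sketch below parses one pointer and sums the eight `.distcp` shard sizes listed in this commit. The helper name `parse_lfs_pointer` is made up for illustration. The closing interpretation, that about 16 bytes per parameter matches four float32 tensors per parameter (weights, two Adam moments, and an EMA copy), is only a guess based on `ema: 0.9999` in `all_config.yaml` and the README's description of a "model and optimizer checkpoint"; the commit itself does not state the checkpoint layout.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"],
        "size": int(fields["size"]),
    }

# Pointer contents of fsdp2_epoch_1/__0_0.distcp, copied from the diff above.
example_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f991350c2018ff046cf9cb64e91e6cda0691dd9df1a4095b7c2a65e7773f625f
size 870637105
"""
parsed = parse_lfs_pointer(example_pointer)

# Sizes of the eight shards __0_0 through __7_0, copied from the diff above.
shard_sizes = [
    870_637_105, 870_646_096, 870_646_096, 870_645_780,
    870_645_780, 870_645_780, 870_648_468, 870_644_519,
]
total_bytes = sum(shard_sizes)

# Parameter count quoted in the README.
num_params = 435_159_040
bytes_per_param = total_bytes / num_params

print(parsed["size"], total_bytes, round(bytes_per_param, 3))
```

The shards total roughly 6.97 GB, or about 16 bytes per parameter, so the checkpoint is several times larger than a converted bfloat16 inference model of the same parameter count would be.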

hrm_nocarry_bp_warmup.py ADDED

	@@ -0,0 +1,100 @@

+from typing import Tuple, Dict, Any, Optional
+import torch
+from torch import nn
+from torch import Tensor
+from models.common import trunc_normal_init_
+from models.transformer import Transformer, Cache, TransformerConfig
+class HierarchicalReasoningModelConfig(TransformerConfig):
+    half_layers: bool = False
+    H_cycles: int
+    L_cycles: int
+    bp_warmup_ratio: float = 0.0
+    bp_min_steps: int = 2
+    bp_max_steps: int = 5
+    # Change some Transformer config of H-level
+    # TODO: Try asymmetric H and L module, such as different size, hidden dims, architecture, attention type, etc.
+    H_override: Dict[str, Any] = {}
+class HierarchicalReasoningModelRecurrentBlock(nn.Module):
+    def __init__(self, config: TransformerConfig) -> None:
+        super().__init__()
+        self.core = Transformer(config)
+        # Create cache function
+        self.create_cache = self.core.create_cache
+    def forward(self, hidden_states: Tensor, input_injection: Tensor, **kwargs) -> Tensor:
+        # Input injection (add)
+        # TODO: Try better alternatives, such as GRU / gating in the following papers
+        # Alternatively, "fixed" gating that does not depend on hidden state is also worth trying
+        # E.g. only depends on position and index of hidden_states dimension
+        # https://arxiv.org/pdf/1910.06764
+        # https://arxiv.org/pdf/2202.10447
+        # TODO: Asymmetric fusion is also worth trying. assign different number of tokens to H and L.
+        return self.core(hidden_states + input_injection, **kwargs)
+class HierarchicalReasoningModel(nn.Module):
+    def __init__(self, config_dict: dict) -> None:
+        super().__init__()
+        config = HierarchicalReasoningModelConfig(**config_dict)
+        if config.half_layers:
+            assert config.n_layers % 2 == 0, "n_layers must be divisible by 2."
+            config.n_layers //= 2
+        # Reasoning Layers
+        # TODO: Asymmetric.
+        self.H_level = HierarchicalReasoningModelRecurrentBlock(TransformerConfig(**(config.model_dump() | config.H_override)))
+        self.L_level = HierarchicalReasoningModelRecurrentBlock(config)
+        # Config
+        self.H_cycles = config.H_cycles
+        self.L_cycles = config.L_cycles
+        self.bp_warmup_ratio = config.bp_warmup_ratio
+        self.bp_min_steps = config.bp_min_steps
+        self.bp_max_steps = config.bp_max_steps
+        self.hidden_size = config.hidden_size
+        self.head_hint = self.H_level.core.head_hint  # Hint for LMHead init (inherit from H)
+        self.zL_init = nn.Buffer(trunc_normal_init_(torch.empty(config.hidden_size, dtype=torch.bfloat16), std=1.0), persistent=True)  # NOTE: hardcoded dtype.
+        # Create cache function
+        self.create_cache = lambda **kwargs: dict(H=[self.H_level.create_cache(**kwargs) for _i in range(self.H_cycles)],
+                                                  L=[self.L_level.create_cache(**kwargs) for _i in range(self.H_cycles * self.L_cycles)])
+    def forward(self, carry: None, x: torch.Tensor, cache: Optional[dict[str, list[list[Cache]]]] = None, bp_steps: int = 2, **seq_info) -> Tuple[None, torch.Tensor]:
+        z_H, z_L = x, self.zL_init
+        # Calculate H and L bp_steps
+        # Prioritize H, and at least 1 is allocated to L.
+        H_bp_steps = min(self.H_cycles, bp_steps - 1)
+        L_bp_steps = bp_steps - H_bp_steps
+        for i in range(self.H_cycles):
+            for k in range(i * self.L_cycles, (i + 1) * self.L_cycles):
+                with torch.set_grad_enabled(torch.is_grad_enabled() and (k >= self.H_cycles * self.L_cycles - L_bp_steps)):
+                    z_L = self.L_level(z_L, z_H, **seq_info, cache=cache["L"][k] if cache is not None else None)
+            with torch.set_grad_enabled(torch.is_grad_enabled() and (i >= self.H_cycles - H_bp_steps)):
+                z_H = self.H_level(z_H, z_L, **seq_info, cache=cache["H"][i] if cache is not None else None)
+        return None, z_H
+    def compute_train_extra_args(self, train_state: Any) -> dict[str, Any]:
+        warmup_steps = train_state.total_steps * self.bp_warmup_ratio
+        progress = min(1.0, train_state.step / warmup_steps) if warmup_steps > 0 else 1.0
+        return dict(bp_steps=self.bp_min_steps + int(progress * (self.bp_max_steps - self.bp_min_steps)))
+    def initial_carry(self, batch_size: int, dtype: torch.dtype) -> None:
+        return None
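
Reviewer note: the interaction between `compute_train_extra_args` and the `set_grad_enabled` conditions in `forward` is easier to follow with the tensors removed. The sketch below re-implements only the integer logic from the file above in plain Python, using the values from `all_config.yaml` (`H_cycles=2`, `L_cycles=3`, `bp_warmup_ratio=0.2`, `bp_max_steps=5`) and the class default `bp_min_steps=2`. The function names `bp_steps_at` and `grad_mask` and the `total_steps=1000` figure are illustrative and do not appear in the source.

```python
def bp_steps_at(step: int, total_steps: int, warmup_ratio: float = 0.2,
                bp_min: int = 2, bp_max: int = 5) -> int:
    """Mirror of compute_train_extra_args: ramp bp_steps linearly during warmup."""
    warmup_steps = total_steps * warmup_ratio
    progress = min(1.0, step / warmup_steps) if warmup_steps > 0 else 1.0
    return bp_min + int(progress * (bp_max - bp_min))


def grad_mask(bp_steps: int, H_cycles: int = 2, L_cycles: int = 3):
    """Mirror of the grad-enable conditions in forward.

    Returns (H_mask, L_mask): True where that recurrent step runs with
    gradients enabled, assuming gradients are enabled globally.
    """
    H_bp_steps = min(H_cycles, bp_steps - 1)  # H gets priority
    L_bp_steps = bp_steps - H_bp_steps        # L always gets at least 1
    total_L = H_cycles * L_cycles
    L_mask = [k >= total_L - L_bp_steps for k in range(total_L)]
    H_mask = [i >= H_cycles - H_bp_steps for i in range(H_cycles)]
    return H_mask, L_mask


# Hypothetical run length, chosen only to make the ramp easy to read.
total_steps = 1000
for step in (0, 100, 199, 200, 999):
    steps = bp_steps_at(step, total_steps)
    print(step, steps, grad_mask(steps))
```

At the start of training (`bp_steps=2`) only the final H step and the final L step carry gradients. Once the warmup ends (`bp_steps=5`) both H steps and the last three L steps do, while the first three L steps always run without gradients under this config.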

train_metadata.yaml ADDED

	@@ -0,0 +1,13 @@

+max_seq_len: 4096
+tokenizer_info:
+  boq: <|im_start|>
+  condition_mapping:
+    cot: <|object_ref_end|>
+    direct: <|object_ref_start|>
+    noisy: <|quad_start|>
+    synth: <|quad_end|>
+  eoa: <|box_end|>
+  eoq: <|im_end|>
+  tokenizer_path: /home/work/.data/hrm_text_prepared/sft_swe_glm_mix_v1
+total_length: 251170780
+vocab_size: 131072