gyung committed
Commit b7daae3 · verified · 1 Parent(s): 48e9e29

Add files using upload-large-folder tool

Files changed (6)
  1. .gitattributes +1 -0
  2. README.md +9 -8
  3. config.json +29 -0
  4. model.safetensors +3 -0
  5. tokenizer.json +3 -0
  6. tokenizer_config.json +8 -0
.gitattributes CHANGED
@@ -43,3 +43,4 @@ fsdp2_epoch_1/__1_0.distcp filter=lfs diff=lfs merge=lfs -text
  fsdp2_epoch_1/__6_0.distcp filter=lfs diff=lfs merge=lfs -text
  fsdp2_epoch_1/__3_0.distcp filter=lfs diff=lfs merge=lfs -text
  fsdp2_epoch_1/.metadata filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: text-generation
  
  `KoHRM-Text-1.4B` is a model being pretrained from scratch on the PrefixLM training structure of `sapientinc/HRM-Text`, aiming at usability for Korean, English, coding, terminal, and tool-call tasks.
  
- This card is a work-in-progress model card draft as of 2026-05-23. The epoch artifact currently being uploaded is a raw HRM-Text FSDP2 checkpoint, not the final distribution format that loads directly in Transformers.
+ This card is a work-in-progress model card draft as of 2026-05-23. The current main artifact is a `model.safetensors` safe-format conversion. The raw HRM-Text FSDP2 checkpoint is kept for local resume/recovery and is not uploaded to the main Hugging Face repo, to avoid unsafe-scan warnings.
  
  ## Model information
  
@@ -51,7 +51,7 @@ Tokenizer repo: `LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K`
  
  ## Training data
  
- The stage-0 input is a preprocessed 711.3M token mix.
+ The stage-0/stage0b input is a preprocessed 711.3M token mix.
  
  | Data | Tokens |
  |---|---:|
@@ -61,7 +61,7 @@ The stage-0 input is a preprocessed 711.3M token mix.
  | ToolBench train tool-call task | 127.0M |
  | Total | 711.3M |
  
- Later stages sequentially add the retokenized original HRM cleaned dataset, a local terminal dataset, and additional Korean/coding/tool-call data. Evaluation-type data (`tb2_lite`, Terminal Bench 2, ToolBench eval, chi-bench) is excluded from training.
+ Stage-1 is currently training on the HRM cleaned fast-cap V1Dataset (14.55B tokens). Later stages sequentially add a local terminal dataset and additional Korean/coding/tool-call data. Evaluation-type data (`tb2_lite`, Terminal Bench 2, ToolBench eval, chi-bench) is excluded from training.
  
  ## Training method
  
@@ -69,17 +69,18 @@ The stage-0 input is a preprocessed 711.3M token mix.
  - Optimizer: HRM-Text upstream Adam-atan2
  - Context: 4096 tokens
  - Hardware: 8 x NVIDIA H200
- - Current stable global batch: 172,032 tokens
- - Checkpoint policy: epoch-level raw FSDP2 checkpoint upload
+ - Current stage-1 global batch: 262,144 tokens
+ - Checkpoint policy: upload the `model.safetensors` conversion to the main repo; keep raw FSDP2 checkpoints locally
  
- The paper's default global batch was 196,608 tokens, but this model's vocabulary is larger (131,072), so the final logits take more memory. For long runs, 172,032 tokens is used as the default to leave headroom against OOM.
+ Stage-1 is running on 8 x H200 with `global_batch_size=262144`; observed VRAM is about 118GB on GPU0 and about 116GB on the others. Steady-state speed is about `1.09-1.10 sec/step`, or about 238k-240k tokens/sec. If problems occur, the run reverts to a `196608` batch and resumes.
  
  In staged pretraining, each stage inherits the model/optimizer/EMA/carry from the checkpoint, and `resume_step_offset` and `total_steps_override` align the LR schedule with the full pretraining run. In other words, training restarts whenever new data is ready, but the optimizer state and schedule are not interrupted.
  
  ## Current status
  
- - stage-0 training: in progress
- - HF upload: epoch checkpoint watcher active
+ - stage-0/stage0b training: complete
+ - stage0b safetensors HF upload: complete
+ - stage-1 HRM fast-cap training: in progress
  - final Transformers conversion: not yet produced
  - public benchmark score: not yet evaluated for this model
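The stage-1 throughput figures in the README hunk can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the README (`global_batch_size=262144`, `1.09-1.10 sec/step`, 4096-token context, 8 GPUs, 14.55B tokens). Two things are assumptions and not stated in the source: that every sequence fills the full context, and that stage-1 makes exactly one pass over the dataset.

```python
import math

# Values quoted in the README diff.
GLOBAL_BATCH_TOKENS = 262_144   # global_batch_size=262144
CONTEXT_TOKENS = 4_096          # Context: 4096 tokens
NUM_GPUS = 8                    # 8 x NVIDIA H200
STAGE1_TOKENS = 14.55e9         # 14.55B tokens


def tokens_per_second(batch_tokens: int, sec_per_step: float) -> float:
    """Throughput implied by one optimizer step consuming one global batch."""
    return batch_tokens / sec_per_step


def steps_for_one_pass(total_tokens: float, batch_tokens: int) -> int:
    """Optimizer steps needed to see every token once (ASSUMPTION: one pass)."""
    return math.ceil(total_tokens / batch_tokens)


# ASSUMPTION: every sequence is packed to the full context length.
sequences_per_step = GLOBAL_BATCH_TOKENS // CONTEXT_TOKENS
sequences_per_gpu = sequences_per_step // NUM_GPUS

if __name__ == "__main__":
    for sec in (1.09, 1.10):
        rate = tokens_per_second(GLOBAL_BATCH_TOKENS, sec)
        print(f"{sec:.2f} sec/step -> {rate:,.0f} tokens/sec")
    steps = steps_for_one_pass(STAGE1_TOKENS, GLOBAL_BATCH_TOKENS)
    print(f"{sequences_per_step} sequences/step, {sequences_per_gpu} per GPU")
    print(f"{steps:,} steps for one pass over stage-1 data")
```

Dividing the batch size by the step time gives roughly 238k to 240k tokens/sec, which agrees with the range the README reports.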
 
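The staged-pretraining paragraph says `resume_step_offset` and `total_steps_override` keep the LR schedule aligned with the whole run. A minimal sketch of that idea follows. Only the two parameter names come from the README; the function name `lr_at`, the warmup-plus-cosine shape, and every numeric default are hypothetical and are not taken from the HRM-Text code.

```python
import math


def lr_at(
    local_step: int,
    *,
    resume_step_offset: int,
    total_steps_override: int,
    peak_lr: float = 1e-4,      # hypothetical value
    warmup_steps: int = 1_000,  # hypothetical value
    min_ratio: float = 0.1,     # hypothetical value
) -> float:
    """Evaluate an assumed warmup + cosine schedule on the global step.

    The stage-local step is shifted by resume_step_offset, and decay is
    measured against total_steps_override instead of the stage length, so a
    resumed stage continues the curve rather than restarting it.
    """
    global_step = resume_step_offset + local_step
    if global_step < warmup_steps:
        return peak_lr * (global_step + 1) / warmup_steps
    span = max(1, total_steps_override - warmup_steps)
    progress = min((global_step - warmup_steps) / span, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)


if __name__ == "__main__":
    total = 100_000  # hypothetical length of the full pretraining run
    # A stage that stopped after 5,000 steps, then a resumed stage.
    end_of_stage = lr_at(5_000, resume_step_offset=0, total_steps_override=total)
    resumed = lr_at(0, resume_step_offset=5_000, total_steps_override=total)
    naive_restart = lr_at(0, resume_step_offset=0, total_steps_override=total)
    print(end_of_stage == resumed)       # schedule continues without a jump
    print(naive_restart < end_of_stage)  # without the offset, warmup restarts
```

Under these assumptions, the resumed stage picks up at the same learning rate where the previous stage stopped, whereas a restart without the offset would drop back into warmup.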
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "model_type": "hrm_text",
+   "architectures": [
+     "HrmTextForCausalLM"
+   ],
+   "vocab_size": 131072,
+   "hidden_size": 1536,
+   "intermediate_size": 4096,
+   "num_hidden_layers": 32,
+   "num_attention_heads": 12,
+   "num_key_value_heads": 12,
+   "head_dim": 128,
+   "H_cycles": 2,
+   "L_cycles": 3,
+   "L_bp_steps": [
+     0,
+     3
+   ],
+   "max_position_embeddings": 4096,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "initializer_range": 0.025515518153991442,
+   "embedding_scale": 39.191835884530846,
+   "prefix_lm": true,
+   "pad_token_id": 0,
+   "bos_token_id": 2,
+   "eos_token_id": 35
+ }
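Several values in this `config.json` appear to be derived from `hidden_size`. The check below is a sketch that confirms the numeric relationships only: `embedding_scale` matches `sqrt(hidden_size)`, `initializer_range` matches `1/sqrt(hidden_size)`, and `num_attention_heads * head_dim` equals `hidden_size`. How the HRM-Text model code actually consumes these fields is not shown in this commit, so the key names in the result dictionary are descriptive labels, not claims about the implementation.

```python
import math

# Subset of the config.json added in this commit.
CONFIG = {
    "vocab_size": 131072,
    "hidden_size": 1536,
    "num_attention_heads": 12,
    "num_key_value_heads": 12,
    "head_dim": 128,
    "initializer_range": 0.025515518153991442,
    "embedding_scale": 39.191835884530846,
}


def check_config(cfg: dict) -> dict:
    """Return a name -> bool map of numeric consistency checks."""
    hidden = cfg["hidden_size"]
    return {
        # Attention width equals the model width.
        "heads_times_head_dim_is_hidden":
            cfg["num_attention_heads"] * cfg["head_dim"] == hidden,
        # Equal head counts: no grouped-query sharing is configured.
        "kv_heads_equal_attention_heads":
            cfg["num_key_value_heads"] == cfg["num_attention_heads"],
        "embedding_scale_is_sqrt_hidden":
            math.isclose(cfg["embedding_scale"], math.sqrt(hidden), rel_tol=1e-9),
        "initializer_range_is_inv_sqrt_hidden":
            math.isclose(cfg["initializer_range"], 1.0 / math.sqrt(hidden), rel_tol=1e-9),
        # 131072 == 2**17
        "vocab_size_is_power_of_two":
            cfg["vocab_size"] & (cfg["vocab_size"] - 1) == 0,
    }


if __name__ == "__main__":
    for name, ok in check_config(CONFIG).items():
        print(f"{name}: {ok}")
```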
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbbb870b21983eebac4215d1b613709e5cd6c45f7e2bf830ae2910037e5781c9
+ size 2768259784
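The three added lines are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real file. The sketch below parses such a pointer and checks a byte stream against it. The helper names `parse_lfs_pointer` and `verify_stream` are hypothetical, and the demo verifies a small in-memory payload, not the actual 2.7GB file.

```python
import hashlib
import io

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:dbbb870b21983eebac4215d1b613709e5cd6c45f7e2bf830ae2910037e5781c9
size 2768259784
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of a Git LFS pointer into typed fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_stream(stream, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks and compare digest and size to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])

    # Demo on a tiny payload with a pointer built for it.
    payload = b"demo weights"
    demo_pointer = {
        "algo": "sha256",
        "digest": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
    }
    print(verify_stream(io.BytesIO(payload), demo_pointer))
```

To check a downloaded `model.safetensors`, the same function can be given `open(path, "rb")`. As a rough sanity check, if the weights are stored at 2 bytes per parameter (an assumption; the dtype is not stated in this commit), 2,768,259,784 bytes corresponds to roughly 1.38B parameters, in line with the 1.4B in the model name.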
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8f544a7ef438e3589b0448ca9532824cbcb2fa43e6ad36642781803490f7ffb
+ size 11458193
tokenizer_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "backend": "tokenizers",
+   "bos_token": "<|im_start|>",
+   "eos_token": "<|box_end|>",
+   "is_local": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "tokenizer_class": "TokenizersBackend"
+ }
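The `model_max_length` value here is not a real limit: 1000000000000000019884624838656 is what `int(1e30)` evaluates to, which Transformers uses as a "no maximum set" sentinel. A caller that truncates inputs may therefore want to fall back to the model's `max_position_embeddings` (4096 in the `config.json` above). The helper `effective_max_length` below is a hypothetical sketch of that fallback, not part of any library.

```python
# int(1e30) is the "effectively unlimited" sentinel seen in tokenizer configs.
NO_LIMIT_SENTINEL = int(1e30)

TOKENIZER_CONFIG = {"model_max_length": 1000000000000000019884624838656}
MODEL_CONFIG = {"max_position_embeddings": 4096}


def effective_max_length(tokenizer_cfg: dict, model_cfg: dict) -> int:
    """Pick a usable truncation length.

    Falls back to the model's position limit when the tokenizer declares no
    limit, or only the sentinel; otherwise takes the smaller of the two.
    """
    position_limit = model_cfg["max_position_embeddings"]
    declared = tokenizer_cfg.get("model_max_length")
    if declared is None or declared >= NO_LIMIT_SENTINEL:
        return position_limit
    return min(declared, position_limit)


if __name__ == "__main__":
    print(TOKENIZER_CONFIG["model_max_length"] == NO_LIMIT_SENTINEL)
    print(effective_max_length(TOKENIZER_CONFIG, MODEL_CONFIG))
```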