gyung committed on
Commit
8235e2e
·
verified ·
1 Parent(s): 4fd3d07

Update model card with corrected TB2-lite evaluation

Files changed (1)
  1. README.md +130 -1
README.md CHANGED
@@ -20,9 +20,138 @@ base_model: LiquidAI/LFM2-2.6B
  - Base model: `LiquidAI/LFM2-2.6B`
  - Training setup: `2 epochs, Unsloth SFT`
- - Evaluation snapshot: `2026-05-08 16:04:00 UTC`
  - Evaluation result id: `lfm2_2p6b_sft_unsloth_e2`
 
  ## Evaluation results
 
  Evaluation was run with vLLM on the corrected TB2-lite replay set. The ranking score uses only `100 * avg_command_f1`; `first_cmd_exact_pct` is treated as an auxiliary metric only.
 
  - Base model: `LiquidAI/LFM2-2.6B`
  - Training setup: `2 epochs, Unsloth SFT`
+ - Evaluation snapshot: `2026-05-08 16:08:38 UTC`
  - Evaluation result id: `lfm2_2p6b_sft_unsloth_e2`
 
+ ## Quickstart
+
+ Install and log in:
+
+ ```bash
+ pip install -U vllm transformers huggingface_hub
+ huggingface-cli login
+ ```
+
+ Related code:
+
+ - GitHub: https://github.com/LLM-OS-Models/Terminal
+ - vLLM evaluation runner: `tb2_lite/scripts/replay_eval.py`
+ - Chat template/fallback generation: `tb2_lite/scripts/prompt_builder.py`
+ - JSON/command scoring: `tb2_lite/scripts/replay_metrics.py`
+
+ Example of running vLLM directly. As in the evaluation code, the tokenizer's chat template is used first; if no template is available, the ChatML/Gemma fallback is used.
+
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_id = "LLM-OS-Models/LFM2-2.6B-Terminal-SFT-2Epoch-Unsloth"
+ tp = 1
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ llm = LLM(
+     model=model_id,
+     tokenizer=model_id,
+     trust_remote_code=True,
+     dtype="bfloat16",
+     tensor_parallel_size=tp,
+     max_model_len=49152,
+     gpu_memory_utilization=0.92,
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a terminal automation assistant. Return JSON only."},
+     {"role": "user", "content": "Inspect the current directory and list Python files."},
+ ]
+
+ def render_chatml(messages):
+     parts = []
+     for message in messages:
+         role = "assistant" if message["role"] == "assistant" else message["role"]
+         if role == "tool":
+             role = "user"
+         parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
+     parts.append("<|im_start|>assistant\n")
+     return "".join(parts)
+
+ def render_gemma4_turn(messages, empty_thought_channel=False):
+     parts = ["<bos>"]
+     for message in messages:
+         role = "model" if message["role"] == "assistant" else message["role"]
+         if role == "tool":
+             role = "user"
+         parts.append(f"<|turn>{role}\n{message['content'].strip()}<turn|>\n")
+     parts.append("<|turn>model\n")
+     if empty_thought_channel:
+         parts.append("<|channel>thought\n<channel|>")
+     return "".join(parts)
+
+ def render_prompt(model_id, tokenizer, messages):
+     model_key = model_id.lower()
+     if "gemma-4" in model_key:
+         try:
+             return tokenizer.apply_chat_template(
+                 messages,
+                 tokenize=False,
+                 add_generation_prompt=True,
+                 enable_thinking=False,
+             )
+         except Exception:
+             return render_gemma4_turn(
+                 messages,
+                 empty_thought_channel=("26b" in model_key or "31b" in model_key),
+             )
+     try:
+         return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     except Exception:
+         return render_chatml(messages)
+
+ prompt = render_prompt(model_id, tokenizer, messages)
+ sampling = SamplingParams(
+     temperature=0.0,
+     top_p=1.0,
+     max_tokens=1024,
+     repetition_penalty=1.0,
+ )
+ outputs = llm.generate([prompt], sampling_params=sampling)
+ print(outputs[0].outputs[0].text)
+ ```
+
+ Recommended output format:
+
+ ```json
+ {
+   "analysis": "brief reasoning about the next terminal action",
+   "plan": "short execution plan",
+   "commands": [
+     {"keystrokes": "ls -la\n", "duration": 0.1}
+   ],
+   "task_complete": false
+ }
+ ```
+
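The recommended output format can be consumed with a small parser. The following is a minimal sketch, not the logic in `tb2_lite/scripts/replay_metrics.py`: the function name `parse_response` and its validation rules are assumptions made for illustration. It pulls the first JSON object out of the generated text and returns the keystrokes and the completion flag.

```python
import json


def parse_response(text):
    """Extract the first JSON object from model output (hypothetical helper)."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and ignores any text that follows it.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value is not an object")

    commands = obj.get("commands", [])
    if not isinstance(commands, list):
        raise ValueError("'commands' must be a list")

    keystrokes = []
    for command in commands:
        # Each command is expected to carry a string under "keystrokes".
        if not isinstance(command, dict) or not isinstance(command.get("keystrokes"), str):
            raise ValueError("each command needs a string 'keystrokes' field")
        keystrokes.append(command["keystrokes"])

    return {
        "keystrokes": keystrokes,
        "task_complete": bool(obj.get("task_complete", False)),
    }
```

Called on the text printed by the vLLM example, this would return a dictionary such as `{"keystrokes": ["ls -la\n"], "task_complete": False}`; malformed output raises `ValueError`.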
+ Replay command identical to the one used for evaluation:
+
+ ```bash
+ python tb2_lite/scripts/replay_eval.py \
+   --model LLM-OS-Models/LFM2-2.6B-Terminal-SFT-2Epoch-Unsloth \
+   --model-short lfm2_2p6b_sft_unsloth_e2 \
+   --eval-path tb2_lite/data/replay_full.jsonl \
+   --output-dir /home/work/.data/tb2_lite_eval/corrected_readme_models_vllm \
+   --dtype bfloat16 \
+   --tp 1 \
+   --max-model-len 49152 \
+   --max-tokens 1024 \
+   --temperature 0.0 \
+   --top-p 1.0 \
+   --gpu-memory-utilization 0.92 \
+   --language-model-only
+ ```
+
+ - Recommended default tensor parallel size: `1`. On OOM, raise `--tp` and `tensor_parallel_size` to 2/4/8.
+ - The corrected TB2-lite evaluation was run with fixed `temperature=0.0`, `top_p=1.0`, `max_tokens=1024`.
+ - Gemma 4 uses `enable_thinking=False` for JSON output, and for the 26B/31B variants the evaluation code automatically applies the empty thought channel handling.
+
  ## Evaluation results
 
  Evaluation was run with vLLM on the corrected TB2-lite replay set. The ranking score uses only `100 * avg_command_f1`; `first_cmd_exact_pct` is treated as an auxiliary metric only.
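
To make the ranking score concrete, here is a hedged sketch of how `100 * avg_command_f1` could be computed. The matching rule is an assumption: commands are compared as exact strings after whitespace stripping, as a multiset. The actual scoring in `tb2_lite/scripts/replay_metrics.py` is not shown here and may normalize or match commands differently. The names `command_f1` and `ranking_score` are hypothetical.

```python
from collections import Counter


def command_f1(predicted, reference):
    """F1 between two command lists, assuming exact-match multiset overlap."""
    pred = Counter(command.strip() for command in predicted)
    ref = Counter(command.strip() for command in reference)
    if not pred and not ref:
        return 1.0  # both empty: treated as a perfect match in this sketch
    overlap = sum((pred & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def ranking_score(pairs):
    """Average per-example F1 over (predicted, reference) pairs, scaled to 0-100."""
    scores = [command_f1(predicted, reference) for predicted, reference in pairs]
    if not scores:
        return 0.0
    return 100 * sum(scores) / len(scores)
```

Under this assumption, a prediction of `["ls -la\n", "pwd\n"]` against a reference of `["ls -la\n"]` has precision 0.5 and recall 1.0, so its F1 is 2/3.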