gyung committed
Commit 195e494 · verified · 1 Parent(s): eb9d417

Update model card with corrected TB2-lite evaluation

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -20,7 +20,7 @@ base_model: google/gemma-4-E2B
 
  - Base model: `google/gemma-4-E2B`
  - Training setup: `2 epochs, Gemma native Liquid preprocessing`
- - Evaluation snapshot: `2026-05-09 00:58:37 UTC`
+ - Evaluation snapshot: `2026-05-10 13:03:28 UTC`
  - Evaluation result id: `gemma4_e2b_base_native_e2`
 
  ## Quickstart
@@ -159,7 +159,7 @@ python tb2_lite/scripts/replay_eval.py \
 
  Evaluation was run with vLLM on the corrected TB2-lite replay set. The ranking score uses only `100 * avg_command_f1`; `first_cmd_exact_pct` is treated only as a secondary metric.
 
- - Rank: `4 / 6`
+ - Rank: `6 / 8`
  - Score: `16.22`
  - Command F1: `0.1622`
  - Command precision: `0.2747`
@@ -172,7 +172,7 @@ python tb2_lite/scripts/replay_eval.py \
  - Template status: `model_specific_or_mixed`
  - Rank eligible: `True`
  - Eval timestamp: `2026-05-09T00:56:16.652395`
- - Number of evaluation results currently aggregated: `6`
+ - Number of evaluation results currently aggregated: `8`
 
  Prompt/template audit:
 
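
The README text in this diff states that the ranking score uses only `100 * avg_command_f1`. As a rough illustration of how the unchanged `Score` and the updated `Rank` relate to that rule, here is a minimal sketch. The function names `ranking_score` and `rank_of`, the two-decimal rounding, and the tie handling are assumptions made for illustration; they are not taken from `tb2_lite/scripts/replay_eval.py`.

```python
def ranking_score(avg_command_f1: float) -> float:
    """Leaderboard score, assumed to be 100 * avg_command_f1 rounded to 2 decimals."""
    return round(100 * avg_command_f1, 2)


def rank_of(result_id: str, f1_by_result: dict[str, float]) -> tuple[int, int]:
    """Return (rank, total) for result_id; rank 1 is the highest score.

    Hypothetical helper: ties keep the insertion order of f1_by_result.
    """
    ordered = sorted(
        f1_by_result,
        key=lambda rid: ranking_score(f1_by_result[rid]),
        reverse=True,
    )
    return ordered.index(result_id) + 1, len(ordered)


# Command F1 reported in the model card.
print(ranking_score(0.1622))  # 16.22
```

Under this reading, adding results to the aggregate leaves the model's own score at `16.22` while its rank can still move, which matches the change from `4 / 6` to `6 / 8` in this commit.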