# EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq

MCQ-only SFT fine-tune of `nvidia/Cosmos-Reason2-2B` on the EgoNormia social-norm benchmark. This v7b run trains only the 3 MCQ tasks, with glued long-form CoT traces inserted in `<think>` format.

## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (MCQ-only) |
| Train samples | 4890 |
| Training file | data/egonormia_llava_v7_cot_mcq3_train.json |
| CoT style | Glued long CoT in `<think>` blocks |
| CoT length | mean 84.1 words, median 87 |
| Epochs | 6 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | 8 frames |
| Hardware | 8x GPU |
| Run dir | outputs/egonormia_sft_v7b_cot_mcq3_stepmatched_seed42/20260305030715/ |
| Uploaded checkpoint | step_90 / 456 total steps |
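The table lists "Video input: 8 frames". As an illustrative sketch (not the actual training pipeline), uniform sampling of 8 frames from a clip might look like this; the midpoint-of-segment strategy is an assumption, not taken from the repo:

```python
# Hypothetical sketch: evenly sample num_frames indices from a clip,
# matching the "Video input: 8 frames" setting in the table above.
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick num_frames indices evenly spread across the clip."""
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments.
    return [int(step * (i + 0.5)) for i in range(num_frames)]

print(uniform_frame_indices(120))  # -> [7, 22, 37, 52, 67, 82, 97, 112]
```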
## MCQ Evaluation (200 verified test samples)

### No-think
| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b step_90 | 82.0% | 90.0% | 75.0% | 0.585 | 97.5% |
### Think mode
| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b step_180 + think | 74.0% | 95.5% | 71.5% | 0.623 | 100.0% |
## Notes
- v7b no-think has the best peak action accuracy in this repo family (82.0%), but does not pass the 77% joint-accuracy gate.
- Think mode fixes formatting and parse stability, but costs about 7.5-8 action points relative to the no-think best checkpoint.
- The main failure mode is prompt mismatch: training examples always include the `<think>` formatting instruction, while no-think eval removes it. At some checkpoints the model drifts into free-form justification text and the parse rate collapses.
- Relative to v6b, long CoT traces are much less effective in think mode: v7b think peaks at 71.5% joint ("both") accuracy, while v6b think reaches 77.5%.
## Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq"
)
```
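To run an MCQ query, a message in the Qwen-VL-style chat format can be built as below. This is a hedged sketch: the frame paths and question text are placeholders, and the exact option wording is not from the benchmark.

```python
# Hypothetical example inputs: 8 frame paths (matching the training setup)
# and a placeholder MCQ question. Neither comes from the EgoNormia data.
frames = [f"frame_{i}.jpg" for i in range(8)]
question = (
    "What is the most sensible next action?\n"
    "(A) ... (B) ... (C) ... (D) ...\n"
    "Answer with the option letter."
)

# Qwen-VL-style chat message: a video entry (list of frames) plus the text.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": frames},
            {"type": "text", "text": question},
        ],
    }
]

# With the processor loaded above, this would produce model-ready tensors:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# )
```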