# EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq

An MCQ-only SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social-norm benchmark. The v7b run trains only the three MCQ tasks, with glued long-form CoT traces inserted in `<think>` format.

## Training

| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (MCQ-only) |
| Train samples | 4,890 |
| Training file | `data/egonormia_llava_v7_cot_mcq3_train.json` |
| CoT style | Glued long CoT in `<think>` blocks |
| CoT length | mean 84.1 words, median 87 |
| Epochs | 6 |
| Global batch | 64 (8 replicas × 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | 8 frames |
| Hardware | 8× GPU |
| Run dir | `outputs/egonormia_sft_v7b_cot_mcq3_stepmatched_seed42/20260305030715/` |
| Uploaded checkpoint | `step_90` (of 456 total steps) |
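The exact schema of the training file is not shown in this card; as a hedged sketch, assuming a standard LLaVA-style conversation record (field names like `"video"` and `"conversations"` are assumptions, not taken from `data/egonormia_llava_v7_cot_mcq3_train.json` itself), one MCQ sample with a glued CoT trace might look like:

```python
# Hypothetical sketch of a single MCQ training record in LLaVA
# conversation format; all field names and wording are illustrative.
sample = {
    "video": "egonormia/clips/0001.mp4",
    "conversations": [
        {
            "from": "human",
            "value": (
                "<video>\nWhat is the most socially appropriate action?\n"
                "A. ...\nB. ...\nC. ...\nD. ...\n"
                "Reason inside <think></think>, then give your answer."
            ),
        },
        {
            "from": "gpt",
            "value": (
                "<think>The person is entering a shared kitchen while the "
                "host is mid-conversation, so greeting first is "
                "expected.</think>\nAnswer: B"
            ),
        },
    ],
}

# The "glued" long CoT lives entirely inside the <think> block of the
# assistant turn, followed by a parseable MCQ answer line.
assert sample["conversations"][1]["value"].startswith("<think>")
```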

## MCQ Evaluation (200 verified test samples)

### No-think

| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b `step_90` | 82.0% | 90.0% | 75.0% | 0.585 | 97.5% |

### Think mode

| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b `step_180` + think | 74.0% | 95.5% | 71.5% | 0.623 | 100.0% |
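The table columns follow from simple per-sample metrics. As a hedged sketch (the benchmark's actual scoring script is not shown here, and the S-IoU definition as a set intersection-over-union is an assumption), "Both" accuracy and a sensibility IoU could be computed as:

```python
def joint_accuracy(action_correct, justification_correct):
    """Fraction of samples where action AND justification are both right
    (the "Both" column)."""
    pairs = list(zip(action_correct, justification_correct))
    return sum(a and j for a, j in pairs) / len(pairs)

def s_iou(pred_options, gold_options):
    """Intersection-over-union between predicted and gold option sets
    (assumed definition of the S-IoU column for the multi-select
    sensibility task)."""
    pred, gold = set(pred_options), set(gold_options)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# e.g. predicting {A, C} when the gold set is {A, B, C}:
print(round(s_iou({"A", "C"}, {"A", "B", "C"}), 3))  # -> 0.667
```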

## Notes

- v7b no-think has the best peak action accuracy in this repo family (82.0%), but does not pass the 77% joint-accuracy gate.
- Think mode fixes formatting and parse stability (100% parse), but costs about 8 action points relative to the no-think best checkpoint (82.0% → 74.0%).
- The main failure mode is a prompt mismatch: training examples always include the `<think>` formatting instruction, while the no-think eval removes it. At some checkpoints the model drifts into free-form justification text and the parse rate collapses.
- Relative to v6b, long CoT traces are much less effective in think mode: v7b think peaks at 71.5% "both", while v6b think reaches 77.5%.
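The prompt mismatch in the notes above can be made concrete. A minimal sketch, assuming an instruction suffix of roughly this shape (the exact wording used in training is an assumption):

```python
# Hypothetical prompts illustrating the train/eval mismatch; the
# instruction text below is illustrative, not the actual SFT wording.
THINK_INSTRUCTION = (
    "First reason inside <think></think> tags, then reply with "
    "'Answer: <letter>'."
)

question = "What should the person do next?\nA. ...\nB. ...\nC. ...\nD. ..."

train_prompt = f"{question}\n{THINK_INSTRUCTION}"  # always present in SFT data
nothink_eval_prompt = question                     # instruction removed at eval

# Every training example carries the instruction, so the no-think eval
# prompt is out of distribution for the model, which is where the
# answer format (and hence the parse rate) can collapse.
assert THINK_INSTRUCTION in train_prompt
assert THINK_INSTRUCTION not in nothink_eval_prompt
```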

## Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq"
)
```