# EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq

An MCQ-only SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social-norm benchmark. The v7b run trains only the three MCQ tasks, with glued long-form CoT traces inserted in `<think>` format.

## Training

| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (MCQ-only) |
| Train samples | 4,890 |
| Training file | `data/egonormia_llava_v7_cot_mcq3_train.json` |
| CoT style | Glued long CoT in `<think>` blocks |
| CoT length | mean 84.1 words, median 87 |
| Epochs | 6 |
| Global batch | 64 (8 replicas × 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | 8 frames |
| Hardware | 8× GPU |
| Run dir | `outputs/egonormia_sft_v7b_cot_mcq3_stepmatched_seed42/20260305030715/` |
| Uploaded checkpoint | `step_90` (of 456 total steps) |
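The exact schema of the training file is not shown in this card; as a hedged sketch, assuming a standard LLaVA-style conversation record (field names like `"video"` and `"conversations"` are assumptions, not taken from `data/egonormia_llava_v7_cot_mcq3_train.json` itself), one MCQ sample with a glued CoT trace might look like:

```python
# Hypothetical sketch of a single MCQ training record in LLaVA
# conversation format; all field names and wording are illustrative.
sample = {
    "video": "egonormia/clips/0001.mp4",
    "conversations": [
        {
            "from": "human",
            "value": (
                "<video>\nWhat is the most socially appropriate action?\n"
                "A. ...\nB. ...\nC. ...\nD. ...\n"
                "Reason inside <think></think>, then give your answer."
            ),
        },
        {
            "from": "gpt",
            "value": (
                "<think>The person is entering a shared kitchen while the "
                "host is mid-conversation, so greeting first is "
                "expected.</think>\nAnswer: B"
            ),
        },
    ],
}

# The "glued" long CoT lives entirely inside the <think> block of the
# assistant turn, followed by a parseable MCQ answer line.
assert sample["conversations"][1]["value"].startswith("<think>")
```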

## MCQ Evaluation (200 verified test samples)

### No-think

| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b `step_90` | 82.0% | 90.0% | 75.0% | 0.585 | 97.5% |

### Think mode

| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v7b `step_180` + think | 74.0% | 95.5% | 71.5% | 0.623 | 100.0% |
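The table columns follow from simple per-sample metrics. As a hedged sketch (the benchmark's actual scoring script is not shown here, and the S-IoU definition as a set intersection-over-union is an assumption), "Both" accuracy and a sensibility IoU could be computed as:

```python
def joint_accuracy(action_correct, justification_correct):
    """Fraction of samples where action AND justification are both right
    (the "Both" column)."""
    pairs = list(zip(action_correct, justification_correct))
    return sum(a and j for a, j in pairs) / len(pairs)

def s_iou(pred_options, gold_options):
    """Intersection-over-union between predicted and gold option sets
    (assumed definition of the S-IoU column for the multi-select
    sensibility task)."""
    pred, gold = set(pred_options), set(gold_options)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# e.g. predicting {A, C} when the gold set is {A, B, C}:
print(round(s_iou({"A", "C"}, {"A", "B", "C"}), 3))  # -> 0.667
```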

## Notes

- v7b no-think has the best peak action accuracy in this repo family (82.0%), but does not pass the 77% joint-accuracy gate.
- Think mode fixes formatting and parse stability (100% parse), but costs about 8 action points relative to the no-think best checkpoint (82.0% → 74.0%).
- The main failure mode is a prompt mismatch: training examples always include the `<think>` formatting instruction, while the no-think eval removes it. At some checkpoints the model drifts into free-form justification text and the parse rate collapses.
- Relative to v6b, long CoT traces are much less effective in think mode: v7b think peaks at 71.5% "both", while v6b think reaches 77.5%.
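The prompt mismatch in the notes above can be made concrete. A minimal sketch, assuming an instruction suffix of roughly this shape (the exact wording used in training is an assumption):

```python
# Hypothetical prompts illustrating the train/eval mismatch; the
# instruction text below is illustrative, not the actual SFT wording.
THINK_INSTRUCTION = (
    "First reason inside <think></think> tags, then reply with "
    "'Answer: <letter>'."
)

question = "What should the person do next?\nA. ...\nB. ...\nC. ...\nD. ..."

train_prompt = f"{question}\n{THINK_INSTRUCTION}"  # always present in SFT data
nothink_eval_prompt = question                     # instruction removed at eval

# Every training example carries the instruction, so the no-think eval
# prompt is out of distribution for the model, which is where the
# answer format (and hence the parse rate) can collapse.
assert THINK_INSTRUCTION in train_prompt
assert THINK_INSTRUCTION not in nothink_eval_prompt
```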

## Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v7b-cot-mcq"
)
```