Module 0 Scene Encoder (v2)
Joint pretraining for movie-scene understanding with:
- supervised contrastive learning (temperature warmup + 80-negative effective batch)
- masked language modelling on annotation scene_text
- auxiliary scene-label prediction (SmoothL1 regression, label-smoothed classification)
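The three supervised objectives listed above can be illustrated with a small, framework-free sketch. It is not the training code for this model: the card does not say how positives are defined, how the losses are weighted, or which hyperparameters were used, so the integer group labels, the `temperature`, `beta`, and `smoothing` defaults, and all function names here are assumptions. The sketch uses the standard supervised contrastive formulation (each anchor is contrasted against every other sample in the batch) and the usual definitions of SmoothL1 and label-smoothed cross-entropy.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch.

    Samples sharing a label are treated as positives (an assumption).
    Anchors with no positive in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def smooth_l1(pred, target, beta=1.0):
    """SmoothL1 for one scalar: quadratic below beta, linear above."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def label_smoothed_ce(logits, target, smoothing=0.1):
    """Cross-entropy against a target that puts `smoothing` mass uniformly on all classes."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    k = len(logits)
    loss = 0.0
    for idx, x in enumerate(logits):
        q = smoothing / k + (1.0 - smoothing if idx == target else 0.0)
        loss -= q * (x - log_z)
    return loss
```

In this sketch a lower `temperature` sharpens the softmax over similarities, so the contrastive term penalises hard negatives more strongly. Warming the temperature up, as the list above mentions, would correspond to changing that argument over the course of training.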
Backbone
- base_model: microsoft/deberta-v3-base
- embedding_dim: 256
Dataset Summary
- all_scenes: 11118
- train_scenes: 9593
- val_scenes: 1525
- all_films: 80
- train_positive_avg_candidates: 6.0
- val_positive_avg_candidates: 6.0
- mlm_chunks: 15118
Best Validation Metrics
- val_loss: 3.6035
- val_contrastive_loss: 3.0713
- val_aux_loss: 0.8188
- val_r_at_1: 0.0766
- val_temperature: 0.1349
- avg_cls_acc: 0.5151
- avg_reg_mae: 0.9068
- avg_bin_acc: 0.7234
- val_task/emotional_valence: 1.1655
- val_task/scene_interaction_tone: 1.2965
- val_task/conflict_nature: 1.3281
- val_task/acoustic_space: 1.6019
- val_task/reality_layer: 0.3974
- val_task/score_dynamic_shape: 1.1894
- val_task/narrative_arc_position: 1.1239
- val_task/foreshadowing_type: 1.2793
- val_task/transition_type: 1.2792
- val_task/scene_tension_raw: 0.5380
- val_task/scene_arousal: 0.4447
- val_task/scene_valence_continuous: 0.4835
- val_task/pacing_intensity: 0.4516
- val_task/action_intensity: 0.5817
- val_task/emotional_shift_trigger: 0.5833
- val_task/emotion_tags: 0.4391
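For readers unfamiliar with the `val_r_at_1` metric, the sketch below shows one common way to compute recall at 1 for retrieval with several valid positives per query. The card does not document the exact evaluation protocol, so this is an assumption: a query counts as a hit when its single highest-scoring candidate is one of its positives, and the query itself is assumed to be already excluded from its candidate list. The function name and input layout are hypothetical.

```python
def recall_at_1(similarity, positives):
    """Fraction of queries whose top-scoring candidate is a positive.

    similarity[i][j] is the score of candidate j for query i.
    positives[i] is the set of candidate indices that are correct for query i.
    """
    if not similarity:
        return 0.0
    hits = 0
    for scores, pos in zip(similarity, positives):
        # Index of the best-scoring candidate for this query.
        top = max(range(len(scores)), key=scores.__getitem__)
        if top in pos:
            hits += 1
    return hits / len(similarity)
```

Under this definition, the reported 0.0766 would mean that the nearest neighbour is a valid positive for roughly 7.7% of validation queries.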
Intended Use
Use this encoder as the backbone for Module 1 by setting:
CFG["backbone"] = "/kaggle/working/module0_backbone"
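Before pointing Module 1 at that path, it can help to confirm the directory actually holds a saved checkpoint. The sketch below assumes that `CFG` is a plain dictionary and that the backbone was exported in the Hugging Face `save_pretrained` layout, which writes a `config.json` file. Neither is stated in this card, and `set_backbone` is a hypothetical helper, not part of Module 1.

```python
import os


def set_backbone(cfg, path):
    """Store `path` under cfg["backbone"] after a basic sanity check.

    Assumes a save_pretrained-style directory that contains config.json.
    """
    config_file = os.path.join(path, "config.json")
    if not os.path.isfile(config_file):
        raise FileNotFoundError(f"No config.json found in {path!r}")
    cfg["backbone"] = path
    return cfg


# Example (path taken from the setting above):
# CFG = set_backbone({}, "/kaggle/working/module0_backbone")
```

Failing early here gives a clearer error than a load failure deep inside the Module 1 training loop.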