Module 0 Scene Encoder (v2)

Joint pretraining for movie-scene understanding with:

  • supervised contrastive learning, with a temperature warmup and an effective batch of 80 negatives
  • masked language modelling (MLM) on the scene_text field of the annotations
  • auxiliary scene-label prediction (SmoothL1 regression, label-smoothed classification)
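As a rough illustration of the contrastive objective, here is a minimal pure-Python sketch of the supervised contrastive (SupCon) loss for a single anchor, together with a linear temperature warmup. The schedule shape, the start and end temperatures, and the helper names temperature_at and supcon_anchor_loss are hypothetical; the card only states that a temperature warmup and 80 effective negatives are used.

```python
import math


def temperature_at(step, warmup_steps, tau_start, tau_end):
    """Hypothetical schedule: move linearly from tau_start to tau_end
    over warmup_steps, then hold tau_end."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return tau_end
    frac = step / warmup_steps
    return tau_start + frac * (tau_end - tau_start)


def supcon_anchor_loss(pos_sims, neg_sims, tau):
    """SupCon loss for one anchor.

    pos_sims: similarities to same-label candidates (positives).
    neg_sims: similarities to negatives (e.g. 80 per anchor).
    Every positive shares one denominator over all non-anchor candidates.
    """
    if not pos_sims:
        raise ValueError("anchor needs at least one positive")
    logits = [s / tau for s in pos_sims + neg_sims]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    # Average -log p(positive) over the anchor's positives.
    return -sum(s / tau - log_denom for s in pos_sims) / len(pos_sims)


# Example: mid-warmup temperature, 6 positives and 80 negatives per anchor.
tau = temperature_at(step=50, warmup_steps=100, tau_start=0.3, tau_end=0.1)
loss = supcon_anchor_loss(pos_sims=[0.8] * 6, neg_sims=[0.1] * 80, tau=tau)
```

Because all positives share the same denominator, more negatives raise the loss unless the positives stay clearly ahead of them, which is why a larger effective negative pool makes the task harder.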
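The card does not say how scene_text tokens are masked for MLM. The sketch below assumes the common BERT recipe: select about 15% of non-special tokens, replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. This is also the default behaviour of Hugging Face's DataCollatorForLanguageModeling. The helper mask_tokens and its parameters are hypothetical.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids, mlm_prob=0.15, rng=None):
    """BERT-style masking (assumed recipe, not confirmed by the card).

    Returns (inputs, labels). labels holds the original token at selected
    positions and -100 elsewhere (PyTorch cross-entropy's default ignore_index).
    """
    rng = rng or random.Random()
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with probability mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                   # 80%: mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels
```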
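The auxiliary head pairs a regression loss with a classification loss. Here is a minimal per-example sketch of both. SmoothL1 follows the definition used by PyTorch's SmoothL1Loss, and the classification loss is cross-entropy against a label-smoothed target, which is the definition PyTorch's CrossEntropyLoss uses for its label_smoothing argument. The beta and eps defaults are assumptions, because the card does not state them.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """SmoothL1: quadratic for |diff| < beta, linear beyond it."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def label_smoothed_ce(logits, target, eps=0.1):
    """Cross-entropy against (1 - eps) * one_hot(target) + eps / K * uniform."""
    k = len(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    log_probs = [x - log_z for x in logits]
    nll = -log_probs[target]            # standard cross-entropy term
    uniform = -sum(log_probs) / k       # cross-entropy against the uniform distribution
    return (1 - eps) * nll + eps * uniform
```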

Backbone

  • base_model: microsoft/deberta-v3-base
  • embedding_dim: 256

Dataset Summary

  • all_scenes: 11118
  • train_scenes: 9593
  • val_scenes: 1525
  • all_films: 80
  • train_positive_avg_candidates: 6.0
  • val_positive_avg_candidates: 6.0
  • mlm_chunks: 15118
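The split counts are internally consistent. A quick check, using only arithmetic on the numbers listed above, also gives the validation share and the average number of scenes per film.

```python
all_scenes, train_scenes, val_scenes, all_films = 11118, 9593, 1525, 80

# Train and validation together account for every scene.
assert train_scenes + val_scenes == all_scenes

val_share = val_scenes / all_scenes       # roughly 13.7% of scenes held out
scenes_per_film = all_scenes / all_films  # roughly 139 scenes per film on average
print(f"val share: {val_share:.1%}, scenes per film: {scenes_per_film:.1f}")
```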

Best Validation Metrics

  • val_loss: 3.6035
  • val_contrastive_loss: 3.0713
  • val_aux_loss: 0.8188
  • val_r_at_1: 0.0766
  • val_temperature: 0.1349
  • avg_cls_acc: 0.5151
  • avg_reg_mae: 0.9068
  • avg_bin_acc: 0.7234
  • val_task/emotional_valence: 1.1655
  • val_task/scene_interaction_tone: 1.2965
  • val_task/conflict_nature: 1.3281
  • val_task/acoustic_space: 1.6019
  • val_task/reality_layer: 0.3974
  • val_task/score_dynamic_shape: 1.1894
  • val_task/narrative_arc_position: 1.1239
  • val_task/foreshadowing_type: 1.2793
  • val_task/transition_type: 1.2792
  • val_task/scene_tension_raw: 0.5380
  • val_task/scene_arousal: 0.4447
  • val_task/scene_valence_continuous: 0.4835
  • val_task/pacing_intensity: 0.4516
  • val_task/action_intensity: 0.5817
  • val_task/emotional_shift_trigger: 0.5833
  • val_task/emotion_tags: 0.4391
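val_r_at_1 is presumably retrieval recall@1, meaning the fraction of validation queries whose top-ranked candidate is a positive. The card does not define the candidate pool. The sketch below therefore assumes a precomputed similarity matrix and a set of positive candidate indices per query. The recall_at_1 helper is hypothetical.

```python
def recall_at_1(sim_matrix, positive_sets):
    """Fraction of queries whose highest-scoring candidate is a positive.

    sim_matrix[i][j]: similarity of query i to candidate j.
    positive_sets[i]: indices of candidates counted as correct for query i.
    Ties go to the lowest candidate index (max() keeps the first maximum).
    """
    hits = 0
    for sims, positives in zip(sim_matrix, positive_sets):
        best = max(range(len(sims)), key=sims.__getitem__)
        if best in positives:
            hits += 1
    return hits / len(sim_matrix)
```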

Intended Use

Use this encoder as the backbone for Module 1 by setting:

CFG["backbone"] = "/kaggle/working/module0_backbone"
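The following sketch shows one way Module 1 might use that setting. It assumes the directory was written with save_pretrained() and contains both the DeBERTa weights and the tokenizer. The CFG contents other than "backbone", and the load_backbone helper, are hypothetical. If the 256-dimensional projection head is a custom module, AutoModel will not restore it, so it would need to be loaded separately.

```python
# Hypothetical Module 1 config: only the "backbone" entry comes from this card.
CFG = {
    "backbone": "/kaggle/working/module0_backbone",
}


def load_backbone(cfg):
    """Load the saved encoder and tokenizer from cfg["backbone"].

    Assumes a save_pretrained()-style directory; transformers is imported
    lazily so the config can be defined without it installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(cfg["backbone"])
    model = AutoModel.from_pretrained(cfg["backbone"])
    return tokenizer, model
```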