Cinematic Music Descriptor: Module 1 – Local Scene Encoder

A RoBERTa-base model fine-tuned to encode individual movie-script scenes into 768-dimensional vectors, with multi-task heads that predict scene-level cinematic attributes.

Label Schema

Classification

  • emotional_valence: 4 classes
  • conflict_nature: 6 classes
  • acoustic_space: 6 classes
  • reality_layer: 5 classes

Regression

  • pacing_intensity: 1–10
  • scene_arousal: 0.0–1.0
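
The schema above can be read as four classification heads and two regression heads on top of the 768-dimensional scene vector. The following is a minimal sketch under that reading, not the project's actual head definition: the class name `SceneHeads`, the single linear layer per task, and the dictionary output are all assumptions made for illustration.

```python
import torch
from torch import nn

# Class counts and regression targets as listed in the label schema.
CLASSIFICATION_TASKS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}
REGRESSION_TASKS = ("pacing_intensity", "scene_arousal")


class SceneHeads(nn.Module):
    """Hypothetical multi-task heads over a 768-dim scene embedding."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # One linear classifier per categorical attribute (assumed layout).
        self.classifiers = nn.ModuleDict(
            {name: nn.Linear(hidden_size, n) for name, n in CLASSIFICATION_TASKS.items()}
        )
        # One scalar output per regression attribute (assumed layout).
        self.regressors = nn.ModuleDict(
            {name: nn.Linear(hidden_size, 1) for name in REGRESSION_TASKS}
        )

    def forward(self, scene_vec: torch.Tensor) -> dict:
        # Classification heads return raw logits of shape (batch, n_classes).
        out = {name: head(scene_vec) for name, head in self.classifiers.items()}
        # Regression heads return one value per scene, shape (batch,).
        for name, head in self.regressors.items():
            out[name] = head(scene_vec).squeeze(-1)
        return out


heads = SceneHeads()
outputs = heads(torch.randn(2, 768))  # two random stand-in scene vectors
```

Each classification entry in `outputs` holds unnormalized logits; applying `argmax` over the last dimension gives a predicted class index, whose mapping to label names is not specified in this card.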

Training Details

  • Base model: roberta-base
  • Dataset: ~11,000 scenes from 60–80 movies
  • Framework: PyTorch + HuggingFace Transformers
  • Logging: Weights & Biases
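
The card does not state the training objective. One common way to train heads like these is to sum a cross-entropy term per classification task and a mean-squared-error term per regression task. The sketch below assumes exactly that, with equal task weights and with `pacing_intensity` rescaled from 1–10 to 0–1 so both regression targets share a range. The function names `scale_pacing` and `multitask_loss` are hypothetical, and the tensors are random stand-ins rather than real data.

```python
import torch
import torch.nn.functional as F

# Class counts from the label schema.
CLASSIFICATION_TASKS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}


def scale_pacing(pacing: torch.Tensor) -> torch.Tensor:
    """Map pacing_intensity from the 1-10 scale onto 0-1 (assumed preprocessing)."""
    return (pacing - 1.0) / 9.0


def multitask_loss(outputs: dict, targets: dict) -> torch.Tensor:
    """Equal-weight sum of per-task losses (an assumption, not the documented recipe)."""
    loss = sum(
        F.cross_entropy(outputs[task], targets[task]) for task in CLASSIFICATION_TASKS
    )
    loss = loss + F.mse_loss(
        outputs["pacing_intensity"], scale_pacing(targets["pacing_intensity"])
    )
    loss = loss + F.mse_loss(outputs["scene_arousal"], targets["scene_arousal"])
    return loss


# Random stand-in predictions and labels for a batch of three scenes.
torch.manual_seed(0)
batch = 3
outputs = {t: torch.randn(batch, n) for t, n in CLASSIFICATION_TASKS.items()}
outputs["pacing_intensity"] = torch.rand(batch)
outputs["scene_arousal"] = torch.rand(batch)

targets = {t: torch.randint(0, n, (batch,)) for t, n in CLASSIFICATION_TASKS.items()}
targets["pacing_intensity"] = torch.tensor([1.0, 5.0, 10.0])
targets["scene_arousal"] = torch.rand(batch)

loss = multitask_loss(outputs, targets)
```

In practice the per-task terms are often weighted differently; equal weights are used here only to keep the sketch short.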

Usage

import torch
from huggingface_hub import hf_hub_download
import config as C

# Download weights  (repo ID is built from HF_REPO_ID in config.py)
# e.g. "suyashnpande/cinematic-music-descriptor-v2-module3"
repo_id = f"{C.HF_REPO_ID}-module3"
path = hf_hub_download(repo_id=repo_id,
                       filename="module3_final.pt",
                       token=C.HF_READ_TOKEN)
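
The snippet above stops once the file is downloaded. The layout of the `.pt` file is not documented here, so the following sketch only shows one cautious way to open such a file on CPU and list the tensors it contains. The helper name `load_state_dict` and the wrapper key `"model_state_dict"` are assumptions, and a small temporary checkpoint stands in for the real download.

```python
import os
import tempfile

import torch


def load_state_dict(path, key="model_state_dict"):
    """Load a checkpoint on CPU and return its tensor dictionary.

    `key` is a hypothetical wrapper key: some training scripts save
    {"model_state_dict": ...}, others save the state dict directly.
    """
    # weights_only=True restricts unpickling to tensors and basic containers.
    obj = torch.load(path, map_location="cpu", weights_only=True)
    if isinstance(obj, dict) and key in obj:
        return obj[key]
    return obj


# Stand-in checkpoint so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pt")
    torch.save({"model_state_dict": {"w": torch.zeros(2, 768)}}, demo_path)
    state = load_state_dict(demo_path)
    shapes = {name: tuple(t.shape) for name, t in state.items()}
```

With the real file, passing `path` from `hf_hub_download` in place of `demo_path` and printing `shapes` is a quick way to see which parameter names the matching model class must expose before calling its own `load_state_dict`.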

Citation

If you use this model, please cite the project.
