Cinematic Music Descriptor — Module 1 – Local Scene Encoder

A RoBERTa-base model fine-tuned to encode individual movie-script scenes into 768-dimensional vectors, with multi-task heads that predict scene-level cinematic attributes.

Label Schema

Classification

emotional_valence: 4 classes
conflict_nature: 6 classes
acoustic_space: 6 classes
reality_layer: 5 classes

Regression

pacing_intensity: 1–10
scene_arousal: 0.0–1.0
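
The schema above can be written down as a small lookup table and used to sanity-check annotations before training. The following is only a sketch: the class counts and numeric ranges come from this card, while the helper names (validate_labels, normalize_regression), the dict layout, and the assumption that class labels are stored as integer indices 0..n-1 are illustrative and not taken from the project code.

```python
# Class counts and ranges are from the model card; everything else is hypothetical.
CLASSIFICATION_HEADS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}
REGRESSION_RANGES = {
    "pacing_intensity": (1.0, 10.0),
    "scene_arousal": (0.0, 1.0),
}


def validate_labels(labels: dict) -> list:
    """Return a list of problems in one scene's labels (empty list if valid).

    Assumes class labels are integer indices in [0, n_classes).
    """
    problems = []
    for name, n_classes in CLASSIFICATION_HEADS.items():
        value = labels.get(name)
        is_index = isinstance(value, int) and not isinstance(value, bool)
        if not is_index or not 0 <= value < n_classes:
            problems.append(f"{name}: expected index in [0, {n_classes}), got {value!r}")
    for name, (low, high) in REGRESSION_RANGES.items():
        value = labels.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name}: expected value in [{low}, {high}], got {value!r}")
    return problems


def normalize_regression(name: str, value: float) -> float:
    """Rescale a regression target to [0, 1] using its documented range."""
    low, high = REGRESSION_RANGES[name]
    return (value - low) / (high - low)
```

Rescaling pacing_intensity to the same 0–1 range as scene_arousal is one possible design choice for keeping the two regression losses on a comparable scale; the card does not say whether the original training did this.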

Training Details

Base model: roberta-base
Dataset: ~11,000 scenes from 60–80 movies
Framework: PyTorch + HuggingFace Transformers
Logging: Weights & Biases
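
To make the multi-task setup more concrete, here is a minimal PyTorch sketch of prediction heads sitting on top of a 768-dim scene vector, together with a combined loss. It is an assumption-laden illustration, not the project's training code: the class name MultiTaskHeads, the function multitask_loss, the use of single linear layers, cross-entropy plus MSE, and equal loss weighting are all hypothetical. Only the head names, class counts, and the 768-dim input size come from this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Head names and class counts are from the model card.
CLS_HEADS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}
REG_HEADS = ["pacing_intensity", "scene_arousal"]


class MultiTaskHeads(nn.Module):
    """Hypothetical heads: one linear layer per attribute over the scene vector."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.cls_heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, n) for name, n in CLS_HEADS.items()}
        )
        self.reg_heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, 1) for name in REG_HEADS}
        )

    def forward(self, scene_vec: torch.Tensor) -> dict:
        # Classification heads return logits of shape (batch, n_classes).
        out = {name: head(scene_vec) for name, head in self.cls_heads.items()}
        # Regression heads return one scalar per scene, shape (batch,).
        for name, head in self.reg_heads.items():
            out[name] = head(scene_vec).squeeze(-1)
        return out


def multitask_loss(outputs: dict, targets: dict) -> torch.Tensor:
    """Unweighted sum of cross-entropy (classification) and MSE (regression)."""
    loss = sum(F.cross_entropy(outputs[n], targets[n]) for n in CLS_HEADS)
    loss = loss + sum(F.mse_loss(outputs[n], targets[n]) for n in REG_HEADS)
    return loss
```

In a full pipeline, scene_vec would be the encoder's pooled output for a scene; how the original model pools RoBERTa's token states is not stated here, so that step is left out of the sketch.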

Usage

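import torch
from huggingface_hub import hf_hub_download
import config as C  # project-local config.py; must define HF_REPO_ID and HF_READ_TOKEN

# Download weights (repo ID is built from HF_REPO_ID in config.py)
# e.g. "suyashnpande/cinematic-music-descriptor-v2-module3"
repo_id = f"{C.HF_REPO_ID}-module3"
path = hf_hub_download(repo_id=repo_id,
                       filename="module3_final.pt",
                       token=C.HF_READ_TOKEN)

# Load the checkpoint on CPU. Its internal layout (plain state dict vs. wrapper dict)
# is not documented in this card, so inspect it before calling load_state_dict().
checkpoint = torch.load(path, map_location="cpu")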

Citation

If you use this model, please cite the project.
