RLAC - Radio Live A La Carte
Collection
A collection of models for automated segmentation of live radio broadcasts, leveraging both raw audio signals and text transcriptions to identify show
This model uses a hybrid deep learning approach to detect radio chronicle segments in textual transcriptions (SRT files). It combines the semantic representations of CamemBERT with the sequence modeling of a Bi-LSTM + CRF architecture.
Hugging Face link: eglantinefonrose/rlac-audiotranscript-segmenter-chroniques-hybrid
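Since the model works on SRT transcriptions, it can help to see what reading such a file involves. The following is a minimal sketch, assuming standard SRT blocks (index line, timestamp line, then text lines). The names SrtEntry and parse_srt are hypothetical and are not part of this repository.

```python
import re
from dataclasses import dataclass


@dataclass
class SrtEntry:
    index: int
    start: float  # seconds from the beginning of the broadcast
    end: float    # seconds from the beginning of the broadcast
    text: str


# SRT timestamps look like 00:01:02,345 (some tools write a dot instead of a comma).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[SrtEntry]:
    """Parse SRT text into a list of timed entries (hypothetical helper)."""
    entries = []
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        entries.append(
            SrtEntry(int(lines[0].strip()), _to_seconds(start), _to_seconds(end), text)
        )
    return entries
```

Each entry can then be passed on as one unit of the sequence to classify. How the actual predict.py script reads SRT files is not shown here.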
This approach relies on two components that must be used together:
Base extractor (radio_chronique_hybrid_base.pkl)
This file is a Python object (serialized with joblib) that transforms raw text into numerical feature vectors.
It includes the scaler and tfidf_vectorizer used during training.
Hybrid model (radio_chronique_hybrid_hybrid.pt)
This is a PyTorch model that takes the features prepared by the base extractor to make the final sequence-aware decision.
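To illustrate what a sequence-aware decision means for a CRF layer, here is a generic Viterbi decoding sketch in plain Python. It is not the code of this model. The function name viterbi_decode, the score values, and the label convention (0 = other, 1 = chronicle) are assumptions made for the example.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][k]   : score of label k at position t (e.g. from a Bi-LSTM)
    transitions[j][k] : score of moving from label j to label k
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for k in range(n_labels):
            # Best previous label for ending in label k at position t.
            best_prev = max(
                range(n_labels), key=lambda j: scores[j] + transitions[j][k]
            )
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev] + transitions[best_prev][k] + emissions[t][k]
            )
        scores = new_scores
        backpointers.append(pointers)
    # Walk back from the best final label to recover the full path.
    best = max(range(n_labels), key=lambda k: scores[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

With transition scores that penalize switching labels, an isolated low-confidence segment in the middle of a chronicle is absorbed into its neighbors, whereas choosing the best label at each position independently would split the chronicle in two.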
Both files are loaded and used by the predict.py script:
from train import RadioChroniqueClassifier, HybridSequenceClassifier
base_extractor = RadioChroniqueClassifier.load_model("models/radio_chronique_hybrid_base.pkl")
hybrid_model = HybridSequenceClassifier.load("models/radio_chronique_hybrid_hybrid.pt")
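Because the two files only work as a pair, a small check before loading can give a clearer error than a failed load. This is a minimal sketch: the helper missing_model_files is hypothetical, and the "models" directory is assumed from the paths in the snippet above.

```python
from pathlib import Path

# Filenames taken from the loading snippet above.
BASE_FILE = "radio_chronique_hybrid_base.pkl"
HYBRID_FILE = "radio_chronique_hybrid_hybrid.pt"


def missing_model_files(model_dir="models"):
    """Return the names of required model files absent from model_dir."""
    directory = Path(model_dir)
    return [
        name
        for name in (BASE_FILE, HYBRID_FILE)
        if not (directory / name).is_file()
    ]
```

An empty list means both components are present and the loading calls can proceed.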
Maintained by eglantinefonrose.