This repository contains a set of models for topic modeling on tweets from Surabaya. It has two key components:
- fasttext.model: A gensim FastText model trained on processed tweet text. It is used to generate semantic vector embeddings for documents.
- kmeans.joblib: A scikit-learn K-Means model trained on the vectors produced by the FastText model. It contains the final topic cluster centroids.
Load the models using gensim, joblib, and huggingface_hub:
import joblib
from gensim.models import FastText
from huggingface_hub import hf_hub_download
# Download and load the models
REPO_ID = "Kiuyha/surabaya-opinion-tweet-clusters"
kmeans_path = hf_hub_download(repo_id=REPO_ID, filename="kmeans.joblib")
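# The n-gram vectors live in a companion file; fetch it so FastText.load can find it next to fasttext.model
hf_hub_download(repo_id=REPO_ID, filename="fasttext.model.wv.vectors_ngrams.npy")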
fasttext_path = hf_hub_download(repo_id=REPO_ID, filename="fasttext.model")
kmeans_model = joblib.load(kmeans_path)
fasttext_model = FastText.load(fasttext_path)
print(f"K-Means model loaded with {kmeans_model.n_clusters} clusters.")
print(f"FastText model loaded with vector size {fasttext_model.vector_size}.")
# You can now use these models for inference.
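As a rough illustration of inference, the sketch below assigns a tweet to a topic cluster. It assumes that a document vector is the mean of its token vectors and that lowercasing plus whitespace splitting is adequate preprocessing. Neither assumption is confirmed by this repository, so match the original training pipeline if you know it. The helper names `embed_tweet` and `predict_topic` are hypothetical.

```python
import numpy as np


def embed_tweet(text, fasttext_model):
    """Return one vector for a tweet by averaging its token vectors.

    Assumption: mean pooling over lowercased, whitespace-split tokens.
    The pooling and preprocessing used in training may differ.
    """
    tokens = text.lower().split()
    if not tokens:
        # No tokens: fall back to a zero vector of the right size.
        return np.zeros(fasttext_model.vector_size, dtype=np.float32)
    # FastText builds vectors for unseen words from character n-grams.
    return np.mean([fasttext_model.wv[token] for token in tokens], axis=0)


def predict_topic(text, fasttext_model, kmeans_model):
    """Return the index of the nearest K-Means centroid for a tweet."""
    vector = embed_tweet(text, fasttext_model).reshape(1, -1)  # shape (1, dim)
    return int(kmeans_model.predict(vector)[0])
```

With the models loaded as above, `predict_topic("some tweet text", fasttext_model, kmeans_model)` returns an integer cluster index between 0 and `kmeans_model.n_clusters - 1`. The index carries no label by itself; interpreting it as a topic requires inspecting the tweets assigned to each cluster.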