Topic Models for Surabaya Tweet Analysis

This repository contains two models that together perform topic modeling on tweets from Surabaya: one embeds each tweet as a vector, the other assigns that vector to a topic cluster.

Models Included

This repository contains two key components:

  1. fasttext.model: A gensim FastText model trained on processed tweet text. It is used to generate semantic vector embeddings for documents.
  2. kmeans.joblib: A scikit-learn K-Means model trained on the vectors produced by the FastText model. It contains the final topic cluster centroids.
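
How the two components fit together can be sketched in a few lines. The example below is a minimal, library-free illustration with made-up 2-dimensional vectors, not the repository's actual pipeline. It assumes a document vector is the mean of its token vectors (the pooling method is not stated here) and assigns the document to the nearest centroid by Euclidean distance, which is how K-Means prediction works. The names `toy_vectors`, `toy_centroids`, `embed`, and `nearest_centroid` are hypothetical.

```python
import math

# Toy stand-ins for the real models (hypothetical values, 2 dimensions).
toy_vectors = {
    "banjir": [0.9, 0.1],
    "hujan": [0.8, 0.2],
    "macet": [0.1, 0.9],
}
toy_centroids = [[1.0, 0.0], [0.0, 1.0]]  # one centroid per topic


def embed(tokens, vectors):
    """Average the vectors of the known tokens (assumed pooling method)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens to embed")
    return [sum(dim) / len(known) for dim in zip(*known)]


def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest to the vector."""
    distances = [math.dist(vector, c) for c in centroids]
    return distances.index(min(distances))


doc_vector = embed(["banjir", "hujan"], toy_vectors)
topic = nearest_centroid(doc_vector, toy_centroids)
print(topic)
```

Unlike this toy lookup table, a real FastText model can also build vectors for words it has never seen from their character n-grams, so unknown tokens need not be skipped.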

How to Use

Download the model files with huggingface_hub, then load them with joblib (K-Means) and gensim (FastText).

import joblib
from gensim.models import FastText
from huggingface_hub import hf_hub_download

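# Download the model files from the Hugging Face Hub
REPO_ID = "Kiuyha/surabaya-opinion-tweet-clusters"
kmeans_path = hf_hub_download(repo_id=REPO_ID, filename="kmeans.joblib")
# gensim stores the n-gram vectors in a separate .npy file and looks for it
# next to fasttext.model, so download it too (the returned path is not needed).
hf_hub_download(repo_id=REPO_ID, filename="fasttext.model.wv.vectors_ngrams.npy")
fasttext_path = hf_hub_download(repo_id=REPO_ID, filename="fasttext.model")

# Load the models from the downloaded files
kmeans_model = joblib.load(kmeans_path)
fasttext_model = FastText.load(fasttext_path)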

print(f"K-Means model loaded with {kmeans_model.n_clusters} clusters.")
print(f"FastText model loaded with vector size {fasttext_model.vector_size}.")

# You can now use these models for inference.
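
As one possible way to run inference, the sketch below assigns a single tweet to a cluster. It assumes lowercasing plus whitespace tokenization and mean pooling of token vectors. The preprocessing and pooling used during training are not documented here, so the output is only meaningful if they match. `tweet_to_vector` and `predict_topic` are hypothetical helper names; the models are passed in as arguments so the functions work with the `fasttext_model` and `kmeans_model` objects loaded above.

```python
import numpy as np


def tweet_to_vector(text, fasttext_model):
    """Embed a tweet as the mean of its token vectors (assumed pooling)."""
    tokens = text.lower().split()  # assumed preprocessing
    if not tokens:
        # Empty input: fall back to a zero vector of the right size.
        return np.zeros(fasttext_model.vector_size, dtype=np.float32)
    # FastText builds vectors for unseen words from character n-grams.
    return np.mean([fasttext_model.wv[t] for t in tokens], axis=0)


def predict_topic(text, fasttext_model, kmeans_model):
    """Return the index of the nearest K-Means centroid for a tweet."""
    vector = tweet_to_vector(text, fasttext_model)
    # Match the centroid dtype to avoid float32/float64 mismatches.
    vector = vector.astype(kmeans_model.cluster_centers_.dtype)
    # scikit-learn expects a 2-D array of shape (n_samples, n_features).
    return int(kmeans_model.predict(vector.reshape(1, -1))[0])
```

For example, `predict_topic("jalan macet di surabaya", fasttext_model, kmeans_model)` returns an integer cluster index between 0 and `kmeans_model.n_clusters - 1`.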