Update README.md
README.md
CHANGED
|
@@ -6,10 +6,21 @@ library_name: bertopic
|
|
| 6 |
pipeline_tag: text-classification
|
| 7 |
---
|
| 8 |
|
| 9 |
-
#
|
| 10 |
|
| 11 |
-
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
|
| 12 |
-
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
|
| 13 |
|
| 14 |
## Usage
|
| 15 |
|
|
@@ -177,3 +188,11 @@ Source: [Hugging Face dataset page](https://huggingface.co/datasets/inparallel/s
|
|
| 177 |
* Numba: 0.60.0
|
| 178 |
* Plotly: 5.24.1
|
| 179 |
* Python: 3.10.12
|
| 6 |
pipeline_tag: text-classification
|
| 7 |
---
|
| 8 |
|
| 9 |
+
# BERTopic_Arab_news
|
| 10 |
+
|
| 11 |
+
A modular implementation of BERTopic for topic modeling, trained specifically on Arabic news articles. Each layer of the topic modeling pipeline can be configured with a different component.
|
| 12 |
+
|
| 13 |
+

|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
BERTopic is the core of this project and performs topic modeling on the processed text. The pipeline runs the following steps:
|
| 17 |
+
|
| 18 |
+
* Topic Modeling: BERTopic is trained on the cleaned dataset to identify topics in the articles.
|
| 19 |
+
* Fine-tuning with KeyBERT: KeyBERT-inspired representations make the topics clearer and easier to interpret.
|
| 20 |
+
* Topic Extraction: The most frequent topics are extracted, and each document is assigned a topic.
|
| 21 |
+
* Topic Updates: Topics can be refined by updating them with n-grams, which captures more domain-specific phrases.
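
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code used to train this model: the function name `fit_and_refine`, the `language="multilingual"` setting, and the `(1, 3)` n-gram range are assumptions chosen for the example.

```python
def fit_and_refine(docs, topic_model=None, n_gram_range=(1, 3)):
    """Fit a topic model on `docs`, then refresh topic words using n-grams.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if topic_model is None:
        # Imported lazily because bertopic pulls in heavy dependencies.
        from bertopic import BERTopic
        from bertopic.representation import KeyBERTInspired

        topic_model = BERTopic(
            language="multilingual",  # assumption: multilingual embeddings for Arabic text
            representation_model=KeyBERTInspired(),  # KeyBERT-inspired topic words
        )

    # Topic modeling and extraction: one topic id is assigned per document.
    topics, _probabilities = topic_model.fit_transform(docs)

    # Topic update: recompute topic representations over the given n-gram range.
    topic_model.update_topics(docs, n_gram_range=n_gram_range)
    return topic_model, topics
```

Calling `fit_and_refine(cleaned_articles)` on a list of cleaned article strings returns the fitted model and the per-document topic ids; `topic_model.get_topic_info()` then lists topics by size. Whether the KeyBERT-inspired representation survives `update_topics` may depend on the BERTopic version, so it is worth inspecting the topic words afterwards.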
|
| 22 |
+
|
| 23 |
|
| 24 |
|
| 25 |
## Usage
|
| 26 |
|
|
|
|
| 188 |
* Numba: 0.60.0
|
| 189 |
* Plotly: 5.24.1
|
| 190 |
* Python: 3.10.12
|
| 191 |
+
|
| 192 |
+
|
| 193 |
+
|
| 194 |
+
|
| 195 |
+
# Visualization: Displays topic distribution and document-level information
|
| 196 |
+
|
| 197 |
+
|
| 198 |
+

|
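
As a rough sketch of how the topic distribution and document-level information mentioned in the Visualization section might be retrieved from a fitted model: the helper name `summarize_topics` is hypothetical, and the choice of `visualize_topics()` is an assumption about which plot the image shows.

```python
def summarize_topics(topic_model, docs):
    """Collect topic-level and document-level summaries from a fitted BERTopic model.

    Hypothetical helper; `topic_model` is assumed to be already fitted on `docs`.
    """
    # One row per topic, including how many documents each topic contains.
    topic_info = topic_model.get_topic_info()

    # One row per document, including its assigned topic.
    document_info = topic_model.get_document_info(docs)

    # Interactive Plotly figure of the topics (assumed to match the image above).
    figure = topic_model.visualize_topics()

    return topic_info, document_info, figure
```

The returned figure is a Plotly object, so `figure.write_html("topics.html")` would save it for viewing in a browser.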