---
license: mit
language:
- ar
- en
library_name: keras
---
# Arabic Semantic / Sentiment Classification using BiLSTM

This repository contains a TensorFlow/Keras-based **Bidirectional LSTM (BiLSTM)** model for Arabic text classification.  
The model is designed for **binary classification tasks** such as sentiment or semantic polarity detection.

## Overview
- **Language:** Arabic
- **Task:** Binary text classification
- **Model:** BiLSTM neural network
- **Framework:** TensorFlow / Keras
- **Focus:** Emoji-aware preprocessing and Arabic stemming

This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.

---

## Model Architecture
The neural network architecture consists of:

- Embedding layer (vocabulary size = 10,000, embedding dim = 128)
- Bidirectional LSTM (128 units, `return_sequences=True`)
- Dropout (0.5)
- Bidirectional LSTM (64 units)
- Dense layer (32 units, ReLU)
- Output layer (1 unit, Sigmoid)

Loss function: **Binary Crossentropy**  
Optimizer: **Adam (learning rate = 0.001)**
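
The layer list above can be sketched in Keras roughly as follows. This is a reconstruction from the description, not the original training script: the input length of 100 is taken from the padding step in the preprocessing section, "128 units" and "64 units" are assumed to mean units per direction, and the names `build_model` and `expected_param_count` are hypothetical.

```python
VOCAB_SIZE = 10_000  # vocabulary size from the layer list
EMBED_DIM = 128      # embedding dimension from the layer list
MAX_LEN = 100        # padding length from the preprocessing section


def build_model():
    """Build the BiLSTM stack described above (a sketch under the stated assumptions)."""
    # Imported here so the constants and the helper below work without TensorFlow.
    from tensorflow.keras import layers, models, optimizers

    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer=optimizers.Adam(learning_rate=0.001),
    )
    return model


def expected_param_count():
    """Trainable parameters implied by the layer sizes, assuming Keras defaults (biases on)."""
    def lstm(input_dim, units):
        # Four gates, each with input weights, recurrent weights, and a bias vector.
        return 4 * ((input_dim + units) * units + units)

    embedding = VOCAB_SIZE * EMBED_DIM
    bilstm_1 = 2 * lstm(EMBED_DIM, 128)   # two directions
    bilstm_2 = 2 * lstm(2 * 128, 64)      # input is the concatenated 256-dim output
    dense = (2 * 64) * 32 + 32
    output = 32 * 1 + 1
    return embedding + bilstm_1 + bilstm_2 + dense + output
```

If the assumptions hold, `build_model().count_params()` should agree with `expected_param_count()`, which gives a quick check that a rebuilt model matches the sizes listed here.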

---

## Preprocessing Pipeline
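The preprocessing steps are critical: at inference time they must be applied **exactly as during training**, in the order below, so that the stemmed tokens match the vocabulary the tokenizer was fitted on: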

1. Emoji conversion using `demoji`
2. Whitespace and regex normalization
3. Tokenization using NLTK
4. Arabic stemming using **ISRIStemmer**
5. Keras tokenization and padding (max length = 100)

This pipeline allows the model to better handle:
- Informal Arabic
- Social media text
- Emoji-heavy content
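
Steps 1 to 4 of the pipeline above can be sketched as follows. The exact calls used in training are not documented here, so several choices are assumptions: `demoji.replace_with_desc` for the emoji conversion, a single whitespace-collapsing regex for the normalization step, and NLTK's `wordpunct_tokenize` as the tokenizer (chosen because it needs no downloaded resources). The function names `normalize_text` and `preprocess` are hypothetical.

```python
import re


def normalize_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends.

    Assumed minimal rule; the regexes used in training are not specified.
    """
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Apply emoji conversion, normalization, tokenization, and stemming."""
    # Third-party imports are local so normalize_text stays usable on its own.
    import demoji
    from nltk.stem.isri import ISRIStemmer
    from nltk.tokenize import wordpunct_tokenize

    # Step 1: replace each emoji with its text description (assumed variant).
    text = demoji.replace_with_desc(text, sep=" ")
    # Step 2: whitespace normalization.
    text = normalize_text(text)
    # Step 3: tokenization with NLTK (assumed tokenizer).
    tokens = wordpunct_tokenize(text)
    # Step 4: Arabic stemming with ISRIStemmer.
    stemmer = ISRIStemmer()
    return " ".join(stemmer.stem(token) for token in tokens)
```

The returned string is what would then be passed to the Keras tokenizer and padded to length 100 (step 5). If training used a different emoji call or NLTK tokenizer, swap it in here, since any mismatch changes the tokens the model sees.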

---

## Files in This Repository

| File | Description |
|-----|------------|
| `lstm_text_model.h5` | Trained BiLSTM model |
| `tokenizer.pkl` | Fitted Keras tokenizer (use this exact file so word indices match training) |
| `label_encoder.pkl` | Label encoder for output mapping |
| `requirements.txt` | Python dependencies |
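
A minimal inference sketch using these files might look like the following. It assumes both `.pkl` files were written with `pickle`, that the label encoder exposes `inverse_transform` (as scikit-learn's `LabelEncoder` does), that padding used the `pad_sequences` defaults, and that 0.5 is the decision threshold; none of these are confirmed by this repository. The function names `scores_to_class_ids` and `predict` are hypothetical, and the input texts are assumed to have already gone through the preprocessing pipeline.

```python
import pickle

MAX_LEN = 100  # padding length from the preprocessing section


def scores_to_class_ids(scores, threshold=0.5):
    """Turn sigmoid scores into 0/1 class ids (the 0.5 threshold is an assumption)."""
    return [int(score >= threshold) for score in scores]


def predict(texts,
            model_path="lstm_text_model.h5",
            tokenizer_path="tokenizer.pkl",
            encoder_path="label_encoder.pkl"):
    """Load the saved artifacts and return one label per preprocessed text."""
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow.keras.models import load_model
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    model = load_model(model_path)
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)
    with open(encoder_path, "rb") as f:
        label_encoder = pickle.load(f)

    # Map words to the indices learned during training, then pad to MAX_LEN.
    sequences = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(sequences, maxlen=MAX_LEN)

    # One sigmoid score per text, mapped back to the original label names.
    scores = model.predict(padded).ravel()
    return label_encoder.inverse_transform(scores_to_class_ids(scores))
```

Calling `predict([...])` with a list of preprocessed strings should return the corresponding labels in their original form, provided the package versions in `requirements.txt` match those used to save the model and pickles.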