---
license: mit
language:
- ar
- en
library_name: keras
---

# Arabic Semantic / Sentiment Classification using BiLSTM

This repository contains a TensorFlow/Keras-based **Bidirectional LSTM (BiLSTM)** model for Arabic text classification. The model is designed for **binary classification tasks** such as sentiment or semantic polarity detection.

## Overview

- **Language:** Arabic
- **Task:** Binary text classification
- **Model:** BiLSTM neural network
- **Framework:** TensorFlow / Keras
- **Focus:** Emoji-aware preprocessing and Arabic stemming

This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.

---

## Model Architecture

The neural network architecture consists of:

- Embedding layer (vocabulary size = 10,000, embedding dim = 128)
- Bidirectional LSTM (128 units, return sequences)
- Dropout (0.5)
- Bidirectional LSTM (64 units)
- Dense layer (32 units, ReLU)
- Output layer (1 unit, Sigmoid)

Loss function: **Binary Crossentropy**

Optimizer: **Adam (lr = 0.001)**

---

## Preprocessing Pipeline

The preprocessing steps are critical and must be applied **exactly as during training**:

1. Emoji conversion using `demoji`
2. Whitespace and regex normalization
3. Tokenization using NLTK
4. Arabic stemming using **ISRIStemmer**
5. Keras tokenization and padding (max length = 100)

This pipeline allows the model to better handle:

- Informal Arabic
- Social media text
- Emoji-heavy content

---

## Files in This Repository

| File | Description |
|------|-------------|
| `lstm_text_model.h5` | Trained BiLSTM model |
| `tokenizer.pkl` | Keras tokenizer (must match training) |
| `label_encoder.pkl` | Label encoder for output mapping |
| `requirements.txt` | Python dependencies |
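
The layer list under Model Architecture can be sketched in Keras roughly as follows. This is a minimal reconstruction from the description above, not the original training script: the constant and function names are hypothetical, the input length of 100 is taken from the padding step of the preprocessing pipeline, and the `Bidirectional` wrappers are assumed to use their default concatenation of forward and backward outputs.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Values taken from the README; the names themselves are illustrative.
VOCAB_SIZE = 10_000   # embedding vocabulary size
EMBED_DIM = 128       # embedding dimension
MAX_LEN = 100         # padded sequence length (assumed to be the model input length)


def build_model():
    """Build a BiLSTM binary classifier matching the documented layer list."""
    model = keras.Sequential([
        keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # First BiLSTM returns the full sequence so a second LSTM can follow.
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        # Second BiLSTM returns only the final state of each direction.
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
    )
    return model


model = build_model()
model.summary()
```

The single sigmoid unit outputs a probability for the positive class, so a threshold (commonly 0.5) is needed to turn it into a label before mapping through the label encoder.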
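
The layer sizes under Model Architecture also imply a trainable parameter count that can be worked out by hand, which is a quick way to check that a loaded model matches the description. The sketch below assumes standard Keras layers with bias terms, that "vocabulary size" is the embedding's input dimension, and that each bidirectional layer concatenates its two directions; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM direction: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * units + units


VOCAB_SIZE, EMBED_DIM = 10_000, 128

counts = {
    "embedding": VOCAB_SIZE * EMBED_DIM,
    # Bidirectional doubles the LSTM parameters (forward + backward).
    "bilstm_1": 2 * lstm_params(EMBED_DIM, 128),
    # Input to the second BiLSTM is the concatenated 2 * 128 features.
    "bilstm_2": 2 * lstm_params(2 * 128, 64),
    # Input to the dense layer is the concatenated 2 * 64 features.
    "dense": dense_params(2 * 64, 32),
    "output": dense_params(32, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>10}: {n:>9,}")
print(f"{'total':>10}: {total:>9,}")
```

Under these assumptions the total comes to 1,711,681 parameters (the dropout layer contributes none). If `model.count_params()` on the loaded `lstm_text_model.h5` reports a different number, one of the assumptions above does not hold for the shipped model.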