---
language:
- ar
- en
tags:
- medical
- moe
- reasoning
- bilingual
license: apache-2.0
library_name: transformers
metrics:
- accuracy
---
# 🩺 Bilingual Medical Reasoning MoE
A specialized **1.5B-parameter Mixture-of-Experts (MoE)** Transformer optimized for Arabic-English clinical reasoning and medical decision support.
### 🏗️ Model Architecture
- **Parameters:** 1.5B (Total), ~68M (Active per token).
- **Structure:** 6 layers, 8 heads per layer, Grouped-Query Attention (GQA).
- **MoE System:** 4 experts per FFN layer; each token is routed to the top-2 experts.
- **Reasoning:** Native Chain-of-Thought (CoT) support, with reasoning marked by `<|think|>` tags.
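The top-2 routing described above can be sketched in plain Python. This is a minimal illustration, not this model's implementation: it assumes a linear router, softmax gating, and renormalization of the two selected gate weights (one common top-k variant). The card does not specify these details, and the toy router weights, experts, and input vector below are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_moe(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Expert],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token vector x through the k highest-scoring experts."""
    # Hypothetical linear router: one logit per expert, w_e . x
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    probs = softmax(logits)
    # Keep the k most probable experts, highest first.
    chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts so they sum to 1.
    norm = sum(probs[e] for e in chosen)
    out = [0.0] * len(x)
    for e in chosen:
        gate = probs[e] / norm
        y = experts[e](x)  # only the chosen experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: 4 hypothetical "experts" that scale the input by 1, 2, 3, 4.
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(4)]
router_weights = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [2.0, 0.0]]
x = [1.0, 2.0]

out, chosen = topk_moe(x, router_weights, experts, k=2)
print(chosen)  # [3, 2]
print(out)
```

With 4 experts and top-2 routing, each token runs through only 2 of the 4 expert FFNs in a layer, which is why the active parameter count per token is lower than the total.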
### 🚀 Usage
Load the model with the custom `DeepThinkingModel` class, which is defined in `model.py` in this repository.
```python
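# DeepThinkingModel is defined in model.py in this repository;
# place that file on your Python path before importing.
from model import DeepThinkingModel
import torch

model = DeepThinkingModel.from_pretrained("gijl/Bilingual-Medical-Reasoning-MoE")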
```
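Because the model marks its reasoning with `<|think|>` tags, downstream code may want to separate the reasoning trace from the final answer. The helper below is a minimal string-level sketch that works on already-decoded text, not on any `DeepThinkingModel` API. It assumes the trace is closed by a `<|/think|>` tag; that closing tag is a hypothetical placeholder, so check the special tokens shipped with this repository's tokenizer for the real delimiter.

```python
import re
from typing import Tuple

THINK_OPEN = "<|think|>"    # opening tag named in this model card
THINK_CLOSE = "<|/think|>"  # HYPOTHETICAL closing tag; verify against the tokenizer


def split_reasoning(
    text: str, open_tag: str = THINK_OPEN, close_tag: str = THINK_CLOSE
) -> Tuple[str, str]:
    """Return (reasoning, answer) from decoded model output."""
    # Non-greedy match so several think blocks are captured separately;
    # DOTALL lets a block span multiple lines.
    pattern = re.compile(
        re.escape(open_tag) + r"(.*?)" + re.escape(close_tag), re.DOTALL
    )
    reasoning = "\n".join(block.strip() for block in pattern.findall(text))
    answer = pattern.sub("", text).strip()
    return reasoning, answer


sample = "<|think|>Compare the two lab values.<|/think|> Value B is higher."
reasoning, answer = split_reasoning(sample)
print(reasoning)  # Compare the two lab values.
print(answer)     # Value B is higher.
```

If the real closing delimiter differs, pass it as `close_tag` instead of editing the pattern.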
### 📊 Training Data
- **AceGPT-Instruction** (Arabic instruction-following data)
- **Helsinki-NLP OPUS-100** (Arabic-English parallel corpus for translation)
- **Oasst1** (Conversational grounding)