
	---
	language:
	- ar
	- en
	tags:
	- medical
	- moe
	- reasoning
	- bilingual
	license: apache-2.0
	library_name: transformers
	metrics:
	- accuracy
	---

	# 🩺 Bilingual Medical Reasoning MoE

	A specialized 1.5B-parameter Mixture-of-Experts (MoE) Transformer optimized for Arabic-English clinical reasoning and medical decision support.

	### 🏗️ Model Architecture
	- Parameters: 1.5B total, ~68M active per token.
	- Structure: 6 layers, 8 heads per layer, Grouped-Query Attention (GQA).
	- MoE System: 4 experts per FFN layer with Top-2 active routing.
	- Reasoning: Native support for Chain-of-Thought (CoT) using `<|think|>` tags.
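
To make the "4 experts, Top-2 routing" item concrete, here is a minimal sketch of how a top-2 router might combine expert outputs for one token. It is written in plain Python for readability and is not taken from this repository's code. The function names `top2_route` and `moe_ffn`, the toy experts, and the choice to renormalize the softmax over only the two selected logits are all assumptions; the actual model may weight experts differently.

```python
import math

def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for the top-2 experts."""
    # Indices of the two largest router logits (ties keep the lower index first).
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Assumption: softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]

def moe_ffn(token, router_logits, experts):
    """Weighted sum of the two selected experts' outputs for a single token."""
    out = [0.0] * len(token)
    for idx, weight in top2_route(router_logits):
        expert_out = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out

# Four toy "experts" (hypothetical): each scales the token by a constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

logits = [0.1, 2.0, -1.0, 2.0]          # one router score per expert
routes = top2_route(logits)             # experts 1 and 3 are selected
output = moe_ffn([1.0, 1.0], logits, experts)
```

Only the two selected experts run for a given token, which is why the active parameter count per token is much smaller than the total.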

	### 🚀 Usage
	This model is designed to be used with the custom `DeepThinkingModel` architecture defined in this repository.

	```python
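	import torch

	# DeepThinkingModel is the custom class from this repository's `model` module,
	# so run this from a checkout of the repository (or with it on PYTHONPATH).
	from model import DeepThinkingModel

	model = DeepThinkingModel.from_pretrained("gijl/Bilingual-Medical-Reasoning-MoE")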
	```
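
Since the model emits its chain-of-thought inside `<|think|>` tags, downstream code will usually want to separate the reasoning from the final answer. The helper below is a rough sketch of one way to do that on already-decoded text. `split_reasoning` is a hypothetical name, and the closing tag `<|/think|>` is an assumption: this card only names the opening tag, so check the repository's tokenizer for the real closing marker and pass it as `close_tag`.

```python
def split_reasoning(text, open_tag="<|think|>", close_tag="<|/think|>"):
    """Split decoded model output into (reasoning, answer).

    close_tag is an assumed default, not confirmed by the model card.
    """
    start = text.find(open_tag)
    if start == -1:
        # No reasoning block: the whole text is the answer.
        return "", text.strip()
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        # Unterminated block: treat everything after the tag as reasoning.
        return text[body_start:].strip(), ""
    reasoning = text[body_start:end].strip()
    # The answer is whatever surrounds the reasoning block.
    answer = (text[:start] + text[end + len(close_tag):]).strip()
    return reasoning, answer

# Made-up sample string, not real model output.
sample = "<|think|>step one, then step two<|/think|>Final answer."
reasoning, answer = split_reasoning(sample)
```

Keeping the tags as parameters means the same helper still works if the actual markers differ from the defaults assumed here.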

	### 📊 Training Data
	- AceGPT-Instruction (specialized Arabic instructions)
	- Helsinki-NLP OPUS-100 (bilingual translation and reasoning)
	- OASST1 (conversational grounding)