# Bilingual Medical Reasoning MoE
A specialized 1.5B Mixture-of-Experts (MoE) Transformer model optimized for Arabic-English clinical reasoning and medical decision support.
## Model Architecture
- Parameters: 1.5B total, ~68M active per token.
- Structure: 6 layers, 8 attention heads per layer, Grouped-Query Attention (GQA).
- MoE System: 4 experts per FFN layer with top-2 routing (see the sketch after this list).
- Reasoning: Native support for Chain-of-Thought (CoT) via `<|think|>` tags.
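To make the routing concrete, below is a minimal, illustrative sketch of a top-2 routed MoE feed-forward layer with 4 experts. The hidden sizes (`d_model`, `d_ff`) and the expert/router structure are placeholder assumptions for illustration only; the actual `DeepThinkingModel` implementation in this repository's `model.py` may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    """Illustrative top-2 routed MoE feed-forward layer (4 experts)."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq, d_model)
        logits = self.router(x)                           # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)              # normalize the kept scores

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Because each token passes through only two of the four experts, only a fraction of the total parameters is active per token, which is how the card's 1.5B-total / ~68M-active split arises.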
## Usage
This model is designed to be used with the custom `DeepThinkingModel` architecture defined in this repository.
```python
import torch

from model import DeepThinkingModel  # custom architecture defined in this repository

model = DeepThinkingModel.from_pretrained("gijl/Bilingual-Medical-Reasoning-MoE")
model.eval()  # inference mode
```
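A hedged end-to-end sketch follows. It assumes the tokenizer is published in the same repository and that `DeepThinkingModel` exposes a Hugging Face-style `generate()` method; neither is confirmed by this card, so check `model.py` for the actual interface. Per the card, the model's chain of thought is delimited by `<|think|>` tags, so special tokens are kept in the decoded output.

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer ships with the model repository.
tokenizer = AutoTokenizer.from_pretrained("gijl/Bilingual-Medical-Reasoning-MoE")

prompt = "A 58-year-old patient presents with chest pain and shortness of breath. What is the differential diagnosis?"
inputs = tokenizer(prompt, return_tensors="pt")

# Assumption: the custom class provides a Hugging Face-style generate() method.
output_ids = model.generate(**inputs, max_new_tokens=256)

# Keep special tokens so any <|think|> reasoning span stays visible.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```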
## Training Data
- AceGPT-Instruction (Specialized Arabic instructions)
- Helsinki-NLP OPUS-100 (Bilingual translation & reasoning)
- Oasst1 (Conversational grounding)