# Mizan: Gated Parallel Arabic Injection Layers

This repository contains LoRA adapters from Mizan architecture ablations on Qwen3-8B. Each variant tests one combination of injection frequency and gate initialization value.

## Variants

| Adapter | Injection freq | Gate init | Notes |
|---|---|---|---|
| `mizan_every2_gate0.05` | every 2 | 0.05 | Dense injection |
| `mizan_every4_gate0.0` | every 4 | 0.0 | Zero gate init |
| `mizan_every4_gate0.01` | every 4 | 0.01 | Small gate init |
| `mizan_every4_gate0.05` | every 4 | 0.05 | Best configuration |
| `mizan_every4_gate0.05_pretrained` | every 4 | 0.05 | Pretrained weights |
| `mizan_every4_gate0.1` | every 4 | 0.1 | Large gate init |
| `mizan_every8_gate0.05` | every 8 | 0.05 | Sparse injection |
| `mizan_every0_gate0.0_zero` | none | 0.0 | No-injection baseline |
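The exact Mizan layer is not spelled out in this card, but the table's two knobs can be illustrated with a minimal sketch: a low-rank parallel branch whose output is scaled by a learned scalar gate, attached to every N-th transformer block. All names and the low-rank form here are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn


class GatedParallelInjection(nn.Module):
    """Illustrative sketch (not the released code): a low-rank branch added
    in parallel to a transformer block and scaled by a scalar gate.
    "Gate init" is the starting value of that scalar, so gate_init=0.0
    makes the injection an exact no-op at the start of training."""

    def __init__(self, hidden_size: int, rank: int = 16, gate_init: float = 0.05):
        super().__init__()
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)
        self.gate = nn.Parameter(torch.tensor(gate_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Parallel residual path: block output + gate * adapter(x).
        return x + self.gate * self.up(torch.relu(self.down(x)))


# "Injection freq = every N" would mean attaching one such module to every
# N-th transformer block; "every 0" (the baseline row) attaches none.
```

Under this reading, a zero gate init guarantees the model starts exactly at base-model behavior, while a larger init lets the injected branch influence outputs from the first training step.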

## Usage


## Model tree

The adapters in `mariklolik228/Mizan-Qwen3-8B-Adapters` are finetuned from the base model [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).