We trained Mistral 7B, Qwen 8B, and Gemma 9B sequentially on 5 domains to test catastrophic forgetting, and achieved zero forgetting: medical knowledge was retained at 100% after stacking enterprise, finance, military, and real estate domains on top. Most fine-tuned models catastrophically forget prior knowledge the moment you train them on something new. We built a continual learning engine that prevents this, the first of its kind, and we're shipping it as a SaaS platform at modelbrew.ai: dataset optimization, fine-tuning, and continual learning in one pipeline. I'm looking for ML fine-tuning engineers and researchers who want to test it. DM me or comment below.
Zero Forgetting in LLM Fine-Tuning: 4 Benchmarks, All Domains Retained
We tested sequential fine-tuning on Mistral-7B across 4 independent benchmarks (5, 4, 5, and 8 domains). Standard LoRA forgets 38–49% of prior knowledge per domain. Our continual learning adapter: −0.17% drift.
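For readers who want to sanity-check numbers like these themselves, here is a minimal sketch (not our production code) of how per-domain forgetting is typically scored in sequential fine-tuning: record a retention matrix of eval scores after each training stage, then compare each domain's peak score to its score after the final stage. The scores below are illustrative placeholders, not our benchmark data.

```python
def forgetting(retention):
    """Per-domain forgetting in a sequential run.

    retention[i][j] = eval score on domain j after training stage i
    (only defined for j <= i). Forgetting for domain j is its peak
    score at any stage minus its score after the final stage.
    """
    n = len(retention)
    final = retention[-1]
    drops = []
    for j in range(n - 1):  # the last-trained domain can't be forgotten yet
        peak = max(retention[i][j] for i in range(j, n))
        drops.append(peak - final[j])
    return drops

# Toy 3-domain retention matrix (illustrative numbers only):
R = [
    [0.90],
    [0.55, 0.88],        # domain 0 dropped sharply after training domain 1
    [0.50, 0.60, 0.91],  # further drop on domain 0, drop on domain 1
]
print([round(d, 2) for d in forgetting(R)])  # [0.4, 0.28]
```

Dividing each drop by the peak score gives the percentage-forgotten figures quoted above.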
The Salesforce 5-domain test showed positive backward transfer: the model got better at old domains as it learned new ones (retention BERTScore: 0.889 → 0.907).
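Backward transfer can be computed from the same kind of retention matrix. A minimal sketch with illustrative numbers (not our Salesforce benchmark data): compare each domain's final score to its score right after that domain was trained; a positive mean means later training improved earlier domains.

```python
def backward_transfer(retention):
    """Standard continual-learning BWT metric.

    retention[i][j] = eval score on domain j after training stage i.
    BWT = mean over earlier domains j of
          (score after final stage) - (score right after training j).
    Positive BWT means later training *helped* earlier domains.
    """
    n = len(retention)
    final = retention[-1]
    return sum(final[j] - retention[j][j] for j in range(n - 1)) / (n - 1)

# Toy 3-domain run where earlier domains improve (illustrative numbers only):
R = [
    [0.889],
    [0.895, 0.880],
    [0.907, 0.901, 0.912],
]
print(backward_transfer(R))  # positive: earlier domains got better
```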
No replay buffers. No EWC. No knowledge distillation. Spectral norm locked at 1.0. Naive LoRA's gradient norm spiked to 263; ours stayed under 6.
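As a rough illustration of what capping the spectral norm at 1.0 means (a generic power-iteration sketch, not our engine's actual mechanism): estimate the largest singular value of a weight update and rescale the matrix so that value never exceeds 1.0.

```python
import numpy as np

def clamp_spectral_norm(W, max_sn=1.0, iters=50):
    """Rescale W so its largest singular value is at most max_sn.

    The top singular value is estimated with power iteration, the same
    trick used in spectral normalization; if it already fits under the
    cap, W is returned unchanged (scale factor 1.0).
    """
    v = np.random.default_rng(0).standard_normal(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    sigma = u @ W @ v  # estimate of the leading singular value
    return W * min(1.0, max_sn / sigma)

# A random matrix with spectral norm well above 1.0 gets clamped to ~1.0:
W = np.random.default_rng(1).standard_normal((16, 8)) * 3.0
Wc = clamp_spectral_norm(W)
print(np.linalg.svd(Wc, compute_uv=False)[0])  # ~1.0
```

Keeping every update's spectral norm at or below 1.0 bounds how much any single training step can stretch activations, which is one way such a constraint can tame the gradient-norm blow-ups mentioned above.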