# Qwen2-57B-A14B-Instruct-MoNE-48-gsm8k-100

This repository contains a structurally pruned variant of Qwen2-57B-A14B-Instruct, produced with the MoNE (Mixture-of-Novice Experts) framework proposed in our paper.

## Model Overview

- **Base Model:** Qwen2-57B-A14B-Instruct
- **Method:** MoNE structured expert pruning
- **Remaining Experts:** 48
- **Calibration Set:** gsm8k-100
- **Architecture:** Mixture-of-Experts (MoE)
- **Framework:** Transformers-compatible

This checkpoint replaces redundant experts with lightweight novice experts via structured pruning, aiming to reduce compute while preserving performance.
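To give a rough intuition for replacing a full expert with a lightweight novice, here is a toy numpy sketch. It is **not** the MoNE method from the paper: it stands in a low-rank SVD truncation for the novice, and all names (`make_expert`, `noviceify`, the dimensions) are illustrative assumptions. It only shows the structural idea of swapping a heavy expert MLP for a smaller module with the same input/output shape.

```python
import numpy as np

def make_expert(d, h, rng):
    # A toy expert: two-layer ReLU MLP weights (d -> h -> d).
    return (rng.standard_normal((d, h)) / np.sqrt(d),
            rng.standard_normal((h, d)) / np.sqrt(h))

def expert_forward(x, w1, w2):
    # Full expert forward pass.
    return np.maximum(x @ w1, 0.0) @ w2

def noviceify(w1, w2, rank):
    # Illustrative "novice": truncate each weight matrix to a low-rank
    # factorization via SVD, shrinking the parameter count while keeping
    # the expert's input/output interface unchanged.
    def truncate(w):
        u, s, vt = np.linalg.svd(w, full_matrices=False)
        return u[:, :rank] * s[:rank], vt[:rank]
    a1, b1 = truncate(w1)
    a2, b2 = truncate(w2)
    return (a1, b1, a2, b2)

def novice_forward(x, params):
    # Novice forward pass: same signature as the full expert.
    a1, b1, a2, b2 = params
    return np.maximum((x @ a1) @ b1, 0.0) @ a2 @ b2
```

With `d=16, h=64, rank=4`, the full expert holds 2·16·64 = 2048 parameters, while the novice holds 2·(16·4 + 4·64) = 640, yet produces outputs of the same shape, so it can be dropped into the MoE layer in place of the original expert.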

## Paper

- **Title:** MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
- **Authors:** Geng Zhang, Yuxuan Han, Yuxuan Lou, Yiqi Zhang, Wangbo Zhao, Yang You
- **arXiv:** arXiv:2507.00390

## Model Details

- **Format:** Safetensors
- **Model size:** 45B params
- **Tensor type:** BF16