https://huggingface.co/YuanLabAI/Yuan3.0-Ultra

#2031
by VaLtEc-BoY - opened
  1. Introduction
    Yuan3.0 Ultra employs a unified multimodal model architecture, integrating a vision encoder, a language backbone, and a multimodal alignment module to enable synergistic modeling of visual and linguistic information. The language backbone is built on a Mixture-of-Experts (MoE) architecture with 103 Transformer layers. The model was originally pre-trained from scratch with 1515B parameters. Through the innovative Layer-Adaptive Expert Pruning (LAEP) algorithm, the parameter count was reduced to 1010B during pre-training, improving pre-training efficiency by 49%. The activated parameter count of Yuan3.0 Ultra is 68.8B. Furthermore, the model incorporates a Localized Filtering-based Attention (LFA) mechanism, which effectively enhances the modeling of semantic relationships and achieves higher accuracy than classical attention architectures.
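The post does not spell out how LFA works internally; a minimal sketch, assuming LFA applies a causal local filter over the token sequence before standard scaled dot-product attention (the filter kernel, weights, and shapes below are illustrative, not the model's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def localized_filter(x, kernel):
    # Causal local filtering over the sequence dimension: each token's
    # features are mixed with those of its immediate predecessors,
    # injecting local dependencies before global attention is applied.
    seq_len, _ = x.shape
    k = len(kernel)
    out = np.zeros_like(x)
    for t in range(seq_len):
        for i, w in enumerate(kernel):
            src = t - (k - 1 - i)  # only look backward (causal)
            if src >= 0:
                out[t] += w * x[src]
    return out

def lfa_attention(x, wq, wk, wv, kernel=(0.25, 0.25, 0.5)):
    # Filter locally, then run standard causal attention on the
    # filtered representations.
    h = localized_filter(x, kernel)
    q, k_, v = h @ wq, h @ wk, h @ wv
    scores = q @ k_.T / np.sqrt(k_.shape[-1])
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -1e9  # causal mask: no attention to future tokens
    return softmax(scores) @ v
```

The intuition is that the local filter biases representations toward nearby context before attention computes global relationships, which is one plausible reading of "enhances the modeling of semantic relationships."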

Yuan3.0 Ultra enhances the Reflection Inhibition Reward Mechanism (RIRM) proposed in Yuan3.0 Flash. By incorporating reward constraints based on the number of reflection steps, the model actively reduces ineffective reflections after arriving at the "first correct answer," while retaining the reasoning depth necessary for complex problems. This approach effectively mitigates the "overthinking" phenomenon in fast-thinking reinforcement learning. Training results demonstrate that under this controlled fast-thinking strategy, the model's accuracy improves significantly while the number of tokens generated during reasoning steadily decreases, achieving simultaneous gains in both accuracy and computational efficiency.
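The exact reward formulation is not given in the post; a toy sketch of the idea, assuming a reward that penalizes only the reflections occurring *after* the first correct intermediate answer (the function name, penalty values, and answer-matching scheme are all hypothetical):

```python
def rirm_reward(steps, correct_answer, base_reward=1.0, penalty=0.1):
    """Toy reflection-inhibition reward (hypothetical formulation).

    `steps` is the sequence of intermediate answers a rollout produced.
    Reflections before the first correct answer are not penalized, so
    hard problems keep their reasoning depth; every extra step after
    the first correct answer counts as an ineffective reflection and
    shrinks the reward.
    """
    try:
        first_correct = steps.index(correct_answer)
    except ValueError:
        return 0.0  # never reached a correct answer: no reward
    wasted_reflections = len(steps) - 1 - first_correct
    return max(base_reward - penalty * wasted_reflections, 0.0)
```

Under such a constraint, a policy maximizing reward learns to stop generating once the answer has stabilized, which matches the reported outcome of fewer reasoning tokens at higher accuracy.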

Additionally, the technical report for Yuan3.0 Ultra has been released; it provides more detailed technical specifications and evaluation results.

