Experimental SeerAttention AttnGates for Qwen3-8B, trained on the RedPajama-1T-Sample dataset.
Base model: Qwen3-8B