Qwen3-4B-Thinking-RYS-Duplicated-L25-27

Model Description

This model is a modified version of Qwen/Qwen3-4B-Thinking-2507, optimized for mathematical reasoning and analytical performance through structural layer duplication.

Architecture

The model features a 39-layer architecture, expanded from the original 36 layers. The modification process involved:

  • Deep-Copy Duplication: Layers 25–27 (indices 25–27) of the original model were deep-copied and inserted to extend the model's depth.
  • Independent Parameters: Unlike simple pointer-based duplication, the duplicated block contains unique parameters, allowing for stable serialization and potential further fine-tuning.
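The duplication procedure can be sketched as follows. This is a minimal illustration of the deep-copy-and-insert technique using a toy layer stack, not the actual script used to build the checkpoint; the slice positions follow the zero-based indices stated above.

```python
import copy
import torch.nn as nn

# Toy stand-in for a 36-layer transformer decoder stack.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(36)])

# Deep-copy layers at indices 25-27 (slice 25:28) so the copies own
# independent parameter tensors rather than sharing storage with the
# originals -- this is what allows stable serialization and further
# fine-tuning of the duplicated block.
block = [copy.deepcopy(layers[i]) for i in range(25, 28)]

# Insert the copies immediately after the source block, giving 39 layers.
new_layers = nn.ModuleList(list(layers[:28]) + block + list(layers[28:]))

assert len(new_layers) == 39
# The copy at index 28 does not share memory with its source at index 25.
assert new_layers[28].weight.data_ptr() != new_layers[25].weight.data_ptr()
```

A pointer-based duplication (appending `layers[25]` itself) would instead make both positions share one parameter tensor, which breaks independent fine-tuning of the new block.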

Benchmarks

The model was evaluated using a set of complex mathematical proxies (cube roots, large multiplications, etc.) to measure reasoning accuracy. The results demonstrate a significant improvement over the base model:

| Model Version | Average Proxy Score | Improvement (Delta) |
|---|---|---|
| Base Model (36 Layers) | 0.1635 | – |
| RYS Duplicated Model (39 Layers) | 0.3465 | +0.1830 |
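The exact benchmark harness is not published with the card, but a proxy evaluation of this kind can be sketched as below. The problem generator and exact-match scorer are illustrative assumptions, not the authors' code.

```python
import random

def make_proxy_problems(n, seed=0):
    """Generate arithmetic proxies of the kind described in the card:
    cube roots of perfect cubes and large multiplications.
    (Problem phrasing and ranges are assumptions.)"""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        if rng.random() < 0.5:
            x = rng.randint(10, 99)
            problems.append((f"What is the cube root of {x**3}?", str(x)))
        else:
            a, b = rng.randint(1000, 9999), rng.randint(1000, 9999)
            problems.append((f"What is {a} * {b}?", str(a * b)))
    return problems

def proxy_score(answers, problems):
    """Fraction of exact-match answers; the scores in the table would be
    averages of a metric like this over the problem set."""
    correct = sum(a.strip() == gold for a, (_, gold) in zip(answers, problems))
    return correct / len(problems)
```

Feeding each model's final answer string into `proxy_score` yields an accuracy in [0, 1], comparable to the averages reported above.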

Usage

This model can be loaded using the standard AutoModelForCausalLM and AutoTokenizer classes from the Hugging Face transformers library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gatepoet/Qwen3-4B-Thinking-RYS-Duplicated-L25-27"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native dtype (BF16)
    device_map="auto",    # place layers across available devices
)
```
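Once loaded, the model can be prompted like any Qwen3 chat model. The prompt text and decoding settings below are illustrative, not prescribed by the card (this snippet assumes the `tokenizer` and `model` objects from the loading code above).

```python
# Minimal generation sketch; requires the model to be loaded as shown above.
messages = [{"role": "user", "content": "What is the cube root of 59319?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```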