Qwen3-4B-Thinking-RYS-Duplicated-L25-27

Model Description

This model is a modified version of Qwen/Qwen3-4B-Thinking-2507, optimized for mathematical reasoning and analytical performance through structural layer duplication.

Architecture

The model features a 39-layer architecture, expanded from the original 36 layers. The modification process involved:

  • Deep-Copy Duplication: Layers 25–27 (indices 25–27) of the original model were deep-copied and inserted to extend the model's depth.
  • Independent Parameters: Unlike simple pointer-based duplication, the duplicated block contains unique parameters, allowing for stable serialization and potential further fine-tuning.
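The duplication procedure can be sketched as follows. This is a minimal illustration of the deep-copy-and-insert technique using a toy layer stack, not the actual script used to build the checkpoint; the slice positions follow the zero-based indices stated above.

```python
import copy
import torch.nn as nn

# Toy stand-in for a 36-layer transformer decoder stack.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(36)])

# Deep-copy layers at indices 25-27 (slice 25:28) so the copies own
# independent parameter tensors rather than sharing storage with the
# originals -- this is what allows stable serialization and further
# fine-tuning of the duplicated block.
block = [copy.deepcopy(layers[i]) for i in range(25, 28)]

# Insert the copies immediately after the source block, giving 39 layers.
new_layers = nn.ModuleList(list(layers[:28]) + block + list(layers[28:]))

assert len(new_layers) == 39
# The copy at index 28 does not share memory with its source at index 25.
assert new_layers[28].weight.data_ptr() != new_layers[25].weight.data_ptr()
```

A pointer-based duplication (appending `layers[25]` itself) would instead make both positions share one parameter tensor, which breaks independent fine-tuning of the new block.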

Benchmarks

The model was evaluated using a set of complex mathematical proxies (cube roots, large multiplications, etc.) to measure reasoning accuracy. The results demonstrate a significant improvement over the base model:

| Model Version | Average Proxy Score | Improvement (Delta) |
|---|---|---|
| Base Model (36 Layers) | 0.1635 | – |
| RYS Duplicated Model (39 Layers) | 0.3465 | +0.1830 |
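The exact benchmark harness is not published with the card, but a proxy evaluation of this kind can be sketched as below. The problem generator and exact-match scorer are illustrative assumptions, not the authors' code.

```python
import random

def make_proxy_problems(n, seed=0):
    """Generate arithmetic proxies of the kind described in the card:
    cube roots of perfect cubes and large multiplications.
    (Problem phrasing and ranges are assumptions.)"""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        if rng.random() < 0.5:
            x = rng.randint(10, 99)
            problems.append((f"What is the cube root of {x**3}?", str(x)))
        else:
            a, b = rng.randint(1000, 9999), rng.randint(1000, 9999)
            problems.append((f"What is {a} * {b}?", str(a * b)))
    return problems

def proxy_score(answers, problems):
    """Fraction of exact-match answers; the scores in the table would be
    averages of a metric like this over the problem set."""
    correct = sum(a.strip() == gold for a, (_, gold) in zip(answers, problems))
    return correct / len(problems)
```

Feeding each model's final answer string into `proxy_score` yields an accuracy in [0, 1], comparable to the averages reported above.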

Usage

This model can be loaded using the standard AutoModelForCausalLM and AutoTokenizer classes from the Hugging Face transformers library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gatepoet/Qwen3-4B-Thinking-RYS-Duplicated-L25-27"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native dtype (BF16)
    device_map="auto",    # place layers across available devices
)
```
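Once loaded, the model can be prompted like any Qwen3 chat model. The prompt text and decoding settings below are illustrative, not prescribed by the card (this snippet assumes the `tokenizer` and `model` objects from the loading code above).

```python
# Minimal generation sketch; requires the model to be loaded as shown above.
messages = [{"role": "user", "content": "What is the cube root of 59319?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```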