# Qwen3-4B-Thinking-RYS-Duplicated-L25-27
## Model Description
This model is a modified version of the `Qwen/Qwen3-4B-Thinking-2507` base model, intended to enhance mathematical reasoning and analytical performance through structural layer duplication.
## Architecture
The model features a 39-layer architecture, expanded from the original 36 layers. The modification process involved:
- Deep-Copy Duplication: Layers 25–27 (indices 25–27) of the original model were deep-copied and inserted immediately after the original block, extending the model's depth from 36 to 39 layers.
- Independent Parameters: Unlike simple pointer-based duplication, the duplicated block contains its own unique parameter tensors, allowing stable serialization and further fine-tuning of the copies independently of the originals.
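The duplication procedure above can be sketched as follows. This is a minimal illustration using `copy.deepcopy` on a toy `nn.ModuleList` of stand-in `nn.Linear` layers (the real model's decoder layers would take their place); it is not the exact script used to build this checkpoint.

```python
import copy

import torch
import torch.nn as nn


def duplicate_layers(layers: nn.ModuleList, start: int, end: int) -> nn.ModuleList:
    """Deep-copy layers[start..end] (inclusive) and insert the copies
    immediately after position `end`, so each copy owns its own tensors."""
    block = [copy.deepcopy(layers[i]) for i in range(start, end + 1)]
    new_layers = list(layers[: end + 1]) + block + list(layers[end + 1 :])
    return nn.ModuleList(new_layers)


# Toy stand-in: 36 "decoder layers", mirroring the base model's depth.
layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(36))
expanded = duplicate_layers(layers, start=25, end=27)  # -> 39 layers
```

After the insert, `expanded[28]` holds the same values as layer 25 but is backed by separate storage, which is what makes the checkpoint serializable and the copies independently trainable.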
## Benchmarks
The model was evaluated on a set of mathematical proxy tasks (cube roots, large multiplications, etc.) to measure reasoning accuracy. The results show a substantial improvement over the base model:
| Model Version | Average Proxy Score | Improvement (Delta) |
|---|---|---|
| Base Model (36 Layers) | 0.1635 | - |
| RYS Duplicated Model (39 Layers) | 0.3465 | +0.1830 |
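The delta column in the table is simply the difference of the two averages, which here corresponds to roughly a 2.1× relative improvement:

```python
base_score = 0.1635  # 36-layer base model, average proxy score
rys_score = 0.3465   # 39-layer duplicated model, average proxy score

delta = rys_score - base_score
ratio = rys_score / base_score

print(round(delta, 4))  # 0.183
print(round(ratio, 1))  # 2.1
```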
## Usage
This model can be loaded with the standard `AutoModelForCausalLM` and `AutoTokenizer` classes from the Hugging Face `transformers` library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gatepoet/Qwen3-4B-Thinking-RYS-Duplicated-L25-27"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # place layers across available devices
)
```