---
license: apache-2.0
language:
- en
- zh
- ko
- ja
- fr
- es
- de
- it
- ru
- ar
- multilingual
pipeline_tag: text-generation
tags:
- chat
- suzhou
- merged
- reasoning
- tool-use
- agent
library_name: transformers
base_model:
- tripplet-research/suzhou3.1
- Qwen/Qwen2.5-3B-Instruct
---

# Suzhou 3.2

A 12-billion-parameter instruction-tuned language model by **Triplet Research**. Suzhou 3.2 is a weighted merge of Suzhou 3.1 and Qwen2.5-3B-Instruct, designed to improve reasoning and math capabilities.

## Merge Details

- **Method**: Weighted blending (70% Suzhou 3.1 + 30% Qwen2.5-3B-Instruct); see the sketch after this list
- **Model A**: Suzhou 3.1 - strong agent/tool-use and reasoning
- **Model B**: Qwen2.5-3B-Instruct - math reasoning and general knowledge
- **Target**: 12B parameters
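
The 70/30 blend above is a parameter-wise weighted average. The snippet below is a minimal, hypothetical sketch of that operation, not the exact recipe used for this release: it assumes both checkpoints expose identically named and identically shaped tensors, and the output path `suzhou-3.2-merged` is made up for the example.

```python
# Minimal sketch of weighted parameter blending (70% A + 30% B).
# Hypothetical illustration: assumes both checkpoints share the same
# architecture, parameter names, and tensor shapes.
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.7  # weight for Model A (Suzhou 3.1); Model B gets 1 - ALPHA

model_a = AutoModelForCausalLM.from_pretrained(
    "tripplet-research/suzhou3.1", torch_dtype=torch.bfloat16
)
model_b = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype=torch.bfloat16
)

state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Parameter-wise linear interpolation: 70% A + 30% B for every tensor.
merged = {
    name: ALPHA * state_a[name] + (1.0 - ALPHA) * state_b[name]
    for name in state_a
}

model_a.load_state_dict(merged)
model_a.save_pretrained("suzhou-3.2-merged")  # hypothetical output path
```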

## Key Features

- **12B parameters**
- **262K context window**
- Strong **reasoning** and **chain-of-thought** capabilities
- **Tool calling** and **agent** support (a sketch follows this list)
- **Multilingual** support (29+ languages)
- Mixed attention architecture (linear + full attention layers)
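
As a quick illustration of the tool-calling support, the sketch below passes a tool schema through the tokenizer's chat template. It is a hypothetical example: the `get_weather` tool and its schema are invented for illustration, and it assumes the bundled chat template accepts a `tools` argument, as Qwen-family templates typically do.

```python
# Hypothetical tool-calling sketch; the tool schema is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Triplet-Research/suzhou-3.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Example tool schema (made up for this sketch, not part of the model card).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Suzhou right now?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model is expected to emit a structured tool call, which the caller
# parses, executes, and feeds back as a "tool" message for the final answer.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```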

## Architecture

- Type: Causal Language Model
- Architecture: Qwen3.5 Text
- Layers: 32
- Parameters: 12B

## Safetensors

- 12B parameters

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Triplet-Research/suzhou-3.2"

# Load the merged model and its tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
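
The snippet below continues from the Quickstart and runs a short chat-style generation. The prompt is illustrative only, and it assumes the tokenizer ships a chat template, as instruction-tuned Qwen-family models do.

```python
# Illustrative chat-style generation with the model loaded above.
messages = [
    {"role": "user", "content": "Explain the Pythagorean theorem in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```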