Invalid JSON: No number after minus sign in JSONat line 1, column 2
| --- | |
| license: apache-2.0 | |
| language: | |
| - en | |
| - zh | |
| - ko | |
| - ja | |
| - fr | |
| - es | |
| - de | |
| - it | |
| - ru | |
| - ar | |
| - multilingual | |
| pipeline_tag: text-generation | |
| tags: | |
| - chat | |
| - suzhou | |
| - merged | |
| - reasoning | |
| - tool-use | |
| - agent | |
| library_name: transformers | |
| base_model: | |
| - tripplet-research/suzhou3.1 | |
| - Qwen/Qwen2.5-3B-Instruct | |
| --- | |
| # Suzhou 3.2 | |
| A 12 billion parameter instruction-tuned language model by **Triplet Research**. Suzhou 3.2 is a weighted merge of Suzhou 3.1 and Qwen2.5-3B, designed to improve reasoning and math capabilities. | |
| ## Merge Details | |
| - **Method**: Weighted blending (70% Suzhou 3.1 + 30% Qwen2.5-3B) | |
| - **Model A**: Suzhou 3.1 - strong agent/tool-use, reasoning | |
| - **Model B**: Qwen2.5-3B-Instruct - math reasoning, general knowledge | |
| - **Target**: 12B parameters | |
| ## Key Features | |
| - **12B parameters** | |
| - **262K context window** | |
| - Strong **reasoning** and **chain-of-thought** capabilities | |
| - **Tool calling** and **agent** support | |
| - **Multilingual** support (29+ languages) | |
| - Mixed attention architecture (linear + full attention layers) | |
| ## Architecture | |
| - Type: Causal Language Model | |
| - Architecture: Qwen3.5 Text | |
| - Layers: 32 | |
| - Parameters: 12B | |
| ## Quickstart | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model = AutoModelForCausalLM.from_pretrained("Triplet-Research/suzhou-3.2") | |
| tokenizer = AutoTokenizer.from_pretrained("Triplet-Research/suzhou-3.2") | |
| ``` | |