# Sn-Logicer-0.8B

A fine-tune of Qwen/Qwen3.5-0.8B optimized for grade-school math reasoning, trained on ~7k synthetic math word problems generated by DeepSeek v3.2.

No GSM8K data was used for training; GSM8K serves solely as a held-out evaluation benchmark.
## Results

Evaluated with lm-eval-harness (`gsm8k_cot_llama`, 8-shot CoT):

| Model | Flexible Extract | Strict Match |
|---|---|---|
| Qwen3.5-0.8B (base) | 48.45% | 47.69% |
| Sn-Logicer-0.8B | 50.57% | 50.42% |
| Δ | +2.12 | +2.73 |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("SnurfyAI/Sn-Logicer-0.8B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("SnurfyAI/Sn-Logicer-0.8B", trust_remote_code=True)

messages = [
    {"role": "user", "content": "A store sells 3 shirts at $15 each and 2 pants at $25 each. If a customer buys all of them with a 10% discount, how much do they pay?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details

### Dataset

- 7,077 synthetic math word problems generated via DeepSeek v3.2 through OpenRouter
- Problems cover arithmetic, fractions, percentages, rates, money, time/distance, geometry, combinatorics, and unit conversions
- Each example includes step-by-step reasoning ending with `#### <answer>`
- Training data is entirely synthetic; no existing math benchmarks were used
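The answer-marker convention above can be checked mechanically when filtering synthetic data. The snippet below is a hypothetical validation sketch (the actual data pipeline is not published), shown on toy examples:

```python
import re

# Hypothetical examples in the shape described above: reasoning ending with '#### <answer>'.
examples = [
    "2 + 3 apples = 5 apples.\n#### 5",
    "Half of 10 is 5, plus 2 is 7.\n#### 7",
    "This one is malformed and has no final answer marker.",
]

# An example is well-formed only if it ends with a '#### <answer>' line.
ANSWER_RE = re.compile(r"####\s*\S+\s*$")

valid = [ex for ex in examples if ANSWER_RE.search(ex)]
print(f"{len(valid)}/{len(examples)} examples end with a '####' answer marker")
```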
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (effective) | 16 (4 x 4 grad accum) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Precision | bfloat16 |
| Optimizer | AdamW |
| Gradient checkpointing | Enabled |
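As a back-of-envelope check, the effective batch size of 16 over 7,077 examples for 3 epochs works out to roughly 1,329 optimizer steps, so the 50 warmup steps cover about 4% of training:

```python
import math

examples = 7_077      # dataset size from the card
per_device_batch = 4  # micro-batch
grad_accum = 4        # gradient accumulation steps
epochs = 3

effective_batch = per_device_batch * grad_accum          # 16
steps_per_epoch = math.ceil(examples / effective_batch)  # 443
total_steps = steps_per_epoch * epochs                   # 1329
print(effective_batch, steps_per_epoch, total_steps)
```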
### Infrastructure
- Hardware: NVIDIA RTX 5090 (32GB)
- Training time: ~3 hours
- Data generation: DeepSeek v3.2 via OpenRouter API
### Framework Versions
- TRL: 0.29.0
- Transformers: 5.3.0
- PyTorch: 2.10.0+cu130
- Datasets: 4.8.3
- Tokenizers: 0.22.2
## Evaluation Command

```shell
lm_eval --model hf \
    --model_args pretrained=SnurfyAI/Sn-Logicer-0.8B,trust_remote_code=True \
    --tasks gsm8k_cot_llama \
    --num_fewshot 8 \
    --apply_chat_template \
    --fewshot_as_multiturn \
    --batch_size auto
```
## Limitations

- Trained only on synthetic grade-school math; may not generalize to advanced mathematics
- The +2.12-point (flexible extract) improvement over the base model is modest; more and higher-quality training data would likely yield larger gains
- Inherits all limitations of the base Qwen3.5-0.8B model
## Citations

Cite Qwen3.5 as:

```bibtex
@misc{qwen3.5,
    title = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month = {February},
    year = {2026},
    url = {https://qwen.ai/blog?id=qwen3.5}
}
```

Cite TRL as:

```bibtex
@software{vonwerra2020trl,
    title = {{TRL: Transformers Reinforcement Learning}},
    author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
    license = {Apache-2.0},
    url = {https://github.com/huggingface/trl},
    year = {2020}
}
```

Cite this model as:

```bibtex
@misc{snurfyai2026snlogicer,
    title = {Sn-Logicer-0.8B: Math Reasoning Fine-tune of Qwen3.5-0.8B},
    author = {SnurfyAI},
    year = {2026},
    url = {https://huggingface.co/SnurfyAI/Sn-Logicer-0.8B}
}
```