# Sn-Logicer-0.8B

A fine-tune of Qwen/Qwen3.5-0.8B optimized for grade-school math reasoning, trained on ~7k synthetic math word problems generated by DeepSeek v3.2.

No GSM8K data was used for training; GSM8K serves solely as a held-out evaluation benchmark.
## Results

Evaluated with lm-eval-harness (`gsm8k_cot_llama`, 8-shot CoT):

| Model | Flexible Extract | Strict Match |
|---|---|---|
| Qwen3.5-0.8B (base) | 48.45% | 47.69% |
| Sn-Logicer-0.8B | 50.57% | 50.42% |
| Δ | +2.12 | +2.73 |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("SnurfyAI/Sn-Logicer-0.8B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("SnurfyAI/Sn-Logicer-0.8B", trust_remote_code=True)

messages = [
    {"role": "user", "content": "A store sells 3 shirts at $15 each and 2 pants at $25 each. If a customer buys all of them with a 10% discount, how much do they pay?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details

### Dataset

- 7,077 synthetic math word problems generated via DeepSeek v3.2 through OpenRouter
- Problems cover arithmetic, fractions, percentages, rates, money, time/distance, geometry, combinatorics, and unit conversions
- Each example includes step-by-step reasoning ending with `#### <answer>`
- Training data is entirely synthetic; no existing math benchmarks were used
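The answer-marker convention above can be checked mechanically when filtering synthetic data. The snippet below is a hypothetical validation sketch (the actual data pipeline is not published), shown on toy examples:

```python
import re

# Hypothetical examples in the shape described above: reasoning ending with '#### <answer>'.
examples = [
    "2 + 3 apples = 5 apples.\n#### 5",
    "Half of 10 is 5, plus 2 is 7.\n#### 7",
    "This one is malformed and has no final answer marker.",
]

# An example is well-formed only if it ends with a '#### <answer>' line.
ANSWER_RE = re.compile(r"####\s*\S+\s*$")

valid = [ex for ex in examples if ANSWER_RE.search(ex)]
print(f"{len(valid)}/{len(examples)} examples end with a '####' answer marker")
```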
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (effective) | 16 (4 x 4 grad accum) |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Precision | bfloat16 |
| Optimizer | AdamW |
| Gradient checkpointing | Enabled |
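As a back-of-envelope check, the effective batch size of 16 over 7,077 examples for 3 epochs works out to roughly 1,329 optimizer steps, so the 50 warmup steps cover about 4% of training:

```python
import math

examples = 7_077      # dataset size from the card
per_device_batch = 4  # micro-batch
grad_accum = 4        # gradient accumulation steps
epochs = 3

effective_batch = per_device_batch * grad_accum          # 16
steps_per_epoch = math.ceil(examples / effective_batch)  # 443
total_steps = steps_per_epoch * epochs                   # 1329
print(effective_batch, steps_per_epoch, total_steps)
```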
### Infrastructure
- Hardware: NVIDIA RTX 5090 (32GB)
- Training time: ~3 hours
- Data generation: DeepSeek v3.2 via OpenRouter API
### Framework Versions
- TRL: 0.29.0
- Transformers: 5.3.0
- PyTorch: 2.10.0+cu130
- Datasets: 4.8.3
- Tokenizers: 0.22.2
## Evaluation Command

```shell
lm_eval --model hf \
    --model_args pretrained=SnurfyAI/Sn-Logicer-0.8B,trust_remote_code=True \
    --tasks gsm8k_cot_llama \
    --num_fewshot 8 \
    --apply_chat_template \
    --fewshot_as_multiturn \
    --batch_size auto
```
## Limitations

- Trained only on synthetic grade-school math; may not generalize to advanced mathematics
- The +2.12-point (flexible extract) improvement over the base model is modest; more and higher-quality training data would likely yield larger gains
- Inherits all limitations of the base Qwen3.5-0.8B model
## Citations

Cite Qwen3.5 as:

```bibtex
@misc{qwen3.5,
    title = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month = {February},
    year = {2026},
    url = {https://qwen.ai/blog?id=qwen3.5}
}
```

Cite TRL as:

```bibtex
@software{vonwerra2020trl,
    title = {{TRL: Transformers Reinforcement Learning}},
    author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
    license = {Apache-2.0},
    url = {https://github.com/huggingface/trl},
    year = {2020}
}
```

Cite this model as:

```bibtex
@misc{snurfyai2026snlogicer,
    title = {Sn-Logicer-0.8B: Math Reasoning Fine-tune of Qwen3.5-0.8B},
    author = {SnurfyAI},
    year = {2026},
    url = {https://huggingface.co/SnurfyAI/Sn-Logicer-0.8B}
}
```