Qwen3-1.8B Semantic ID Baseline (New Data, 2048 seq)

Overview

Qwen3-1.8B fully fine-tuned for generative product recommendation via Semantic IDs. Trained on new pipeline data (63K items, 4.7M conversations). This is the baseline model (0% general data mixing) for the H2 experiment comparing the effect of reasoning data mixing.

Training

Stage 1: Vocabulary Expansion

Stage 2: Full Fine-tuning

  • Dataset: 4,719,994 SID conversations (Amazon Pet Supplies)
  • General data: None (0% — baseline)
  • Epochs: 3
  • Final loss: 1.687
  • LR: 2×10⁻⁵, cosine with min LR (0.2×peak)
  • Warmup: 3%
  • Optimizer: adamw_8bit
  • Batch: 16 × 8 = 128 effective
  • Max seq length: 2048
  • Packing: yes (custom greedy bin-packing)
  • Gradient checkpointing: yes
  • torch.compile: no
  • Instruction masking: yes (loss only on assistant responses)
  • Runtime: ~17 hours on NVIDIA H100 80GB
  • Hardware: vast.ai, France

Training Curve

Epoch 0.0: loss 2.956
Epoch 0.5: loss 1.80
Epoch 1.0: loss 1.68
Epoch 2.0: loss 1.63
Epoch 3.0: loss 1.59 (final: 1.687 avg)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kalistratov/qwen3-1.8b-sid-baseline")
tokenizer = AutoTokenizer.from_pretrained("kalistratov/qwen3-1.8b-sid-baseline")

Experiment Context

This model is part of hypothesis H2: "Does reasoning data mixing improve SID prediction quality?"

Variant General data Status
This model (baseline) 0% Completed
+ 25% reasoning OpenMathReasoning + OpenCodeReasoning + reasoning-v1 Pending

Citation

Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.

References

  1. Y. Sun et al. "OpenOneRec," arXiv:2512.24762, 2025.
  2. E. Yan. "semantic-ids-llm," GitHub, 2024.
Downloads last month
50
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kalistratov/qwen3-1.8b-sid-baseline

Finetuned
Qwen/Qwen3-1.7B
Finetuned
(613)
this model

Paper for kalistratov/qwen3-1.8b-sid-baseline