Qwen3-8B Fine-tuned for Semantic ID Recommendation

Overview

Qwen3-8B fine-tuned for generative product recommendation via hierarchical semantic identifiers. The model generates 4-level Semantic IDs (<|sid_start|><|A#|><|B#|><|C#|><|D#|><|sid_end|>) given product descriptions, purchase histories, or co-purchase contexts.
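
The Semantic ID format above can be rendered and parsed with a small helper. This is a minimal sketch, not the author's code: it assumes that "#" in each level token stands for an integer codebook index in [0, 255] (from the 4×256 codebook described under Training), and the names format_sid and parse_sid are hypothetical.

```python
import re

# Assumption: "#" in <|A#|> is an integer codebook index in [0, 255].
SID_PATTERN = re.compile(
    r"<\|sid_start\|><\|A(\d+)\|><\|B(\d+)\|><\|C(\d+)\|><\|D(\d+)\|><\|sid_end\|>"
)


def format_sid(codes):
    """Render four codebook indices as a Semantic ID string."""
    if len(codes) != 4 or any(not 0 <= c < 256 for c in codes):
        raise ValueError("expected four indices in [0, 255]")
    a, b, c, d = codes
    return f"<|sid_start|><|A{a}|><|B{b}|><|C{c}|><|D{d}|><|sid_end|>"


def parse_sid(text):
    """Return the first Semantic ID found in `text` as a 4-tuple, or None."""
    match = SID_PATTERN.search(text)
    if match is None:
        return None
    codes = tuple(int(group) for group in match.groups())
    # Reject indices outside the assumed codebook range.
    return codes if all(c < 256 for c in codes) else None
```

Parsing generated text back into a tuple makes it straightforward to compare predictions level by level.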

This is the larger model in a controlled scale comparison (1.8B vs. 8B). The 8B model scores higher in every task category, with the largest gain on text → SID (+4.7 points) and the smallest on co-purchase prediction (+0.3 points).

Training

Stage 1: Vocabulary Expansion

Added 1,027 special tokens (3 structural + 4×256 codebook tokens)
Trained only embedding matrices (untied input/output)
2,000 steps, LR 1×10⁻³, batch 16 × 4 = 64 effective
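
The token count in Stage 1 can be reproduced by enumerating the vocabulary additions. The sketch below is illustrative only: this card does not name the third structural token, so "<|sid_extra|>" is a placeholder, and build_sid_tokens is a hypothetical helper.

```python
def build_sid_tokens(structural, levels="ABCD", codebook_size=256):
    """Return structural tokens followed by one token per (level, index) pair."""
    codebook = [
        f"<|{level}{index}|>"
        for level in levels
        for index in range(codebook_size)
    ]
    return list(structural) + codebook


# Assumption: only <|sid_start|> and <|sid_end|> are named in this card;
# "<|sid_extra|>" is a placeholder for the unnamed third structural token.
STRUCTURAL = ["<|sid_start|>", "<|sid_end|>", "<|sid_extra|>"]

tokens = build_sid_tokens(STRUCTURAL)
print(len(tokens))  # 3 + 4 * 256 = 1027
```

With Hugging Face transformers, a list like this would typically be registered through tokenizer.add_special_tokens and followed by model.resize_token_embeddings(len(tokenizer)); whether this model was prepared exactly that way is not stated here.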

Stage 2: Full Fine-tuning

Dataset: 4,719,994 instruction-formatted conversations (Amazon Pet Supplies)
Task types: text→SID, sequential recommendation, co-purchase prediction
Optimizer: AdamW 8-bit, LR 2×10⁻⁵, cosine decay to a minimum LR of 0.2× peak
Warmup: 3%, weight decay 0.01
Batch: 16 × 8 = 128 effective, 3 epochs
Techniques: Custom instruction masking, greedy sequence packing (~3× throughput), gradient checkpointing
Hardware: NVIDIA H100 80GB (vast.ai), ~10.5 hours
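
Two of the Stage 2 settings can be sketched in plain Python: the learning-rate schedule and greedy sequence packing. Both functions are assumptions made for illustration rather than the training code itself. The warmup is taken to be linear, the packing is taken to be first-fit over a fixed token budget, and total_steps and max_len are hypothetical parameters that this card does not specify.

```python
import math


def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.03, min_ratio=0.2):
    """Linear warmup, then cosine decay from peak_lr down to min_ratio * peak_lr."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)


def pack_greedy(lengths, max_len):
    """First-fit packing: put each sequence into the first bin with enough room.

    Returns a list of bins, each a list of indices into `lengths`.
    """
    bins = []  # each entry is [remaining_capacity, [sequence indices]]
    for index, length in enumerate(lengths):
        if length > max_len:
            raise ValueError(f"sequence {index} exceeds max_len")
        for entry in bins:
            if entry[0] >= length:
                entry[0] -= length
                entry[1].append(index)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([max_len - length, [index]])
    return [entry[1] for entry in bins]
```

Packing short conversations into shared sequences reduces padding, which is the usual source of a throughput gain like the ~3× reported above. A packed batch also needs attention masks or position handling that keep the packed conversations separate, which this sketch leaves out.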

Results

Hierarchical SID prediction accuracy. The A-level column uses greedy decoding; the exact-match column uses beam search with k=10:

Task	A-level	Exact (beam k=10)
title → SID	66.0%	3.3%
description → SID	65.8%	3.3%
features → SID	62.1%	2.7%
seq_last_2	8.7%	6.8%
seq_last_3	10.7%	6.0%
seq_last_5	9.7%	6.0%
copurchase_backward	5.7%	1.8%
copurchase_forward	6.0%	2.0%

Evaluation: 3,000 samples per task across 11 task types; the table above reports 8 of them.
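
The two columns can be computed along the following lines. The definitions are assumptions, since this card does not spell them out: A-level accuracy is taken to mean that the top-1 prediction matches the target's first-level code, and exact match at k is taken to mean that the full 4-level target appears among the top k beam candidates. Semantic IDs are represented as 4-tuples of integers, and both function names are hypothetical.

```python
def a_level_accuracy(predictions, targets):
    """Share of examples whose top-1 prediction matches the target's A-level code.

    A prediction of None (no valid Semantic ID generated) counts as a miss.
    """
    hits = sum(
        1
        for pred, target in zip(predictions, targets)
        if pred is not None and pred[0] == target[0]
    )
    return hits / len(targets)


def exact_match_at_k(beams, targets, k=10):
    """Share of examples whose full target tuple is among the top-k candidates."""
    hits = sum(
        1
        for candidates, target in zip(beams, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)
```

Under these definitions the two columns are not directly comparable: one scores a single greedy prediction on one level, the other scores ten candidates on all four levels.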

Comparison with 1.8B

A-level accuracy, averaged over the tasks in each category:

Task category	1.8B	8B	Δ (pp)
Text → SID (avg)	59.9%	64.6%	+4.7
Sequential (avg)	7.0%	9.7%	+2.7
Co-purchase (avg)	5.5%	5.8%	+0.3
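
The 8B column is consistent with unweighted means of the per-task A-level values in the Results table. The check below is a sketch under that assumption; the per-task 1.8B values are not listed in this card, so only its reported category averages are used.

```python
from statistics import mean

# A-level accuracy (%) of the 8B model, copied from the Results table.
A_LEVEL_8B = {
    "text": [66.0, 65.8, 62.1],
    "sequential": [8.7, 10.7, 9.7],
    "copurchase": [5.7, 6.0],
}

# Category averages (%) reported for the 1.8B model.
REPORTED_1_8B = {"text": 59.9, "sequential": 7.0, "copurchase": 5.5}

# Assumption: each category average is an unweighted mean over its tasks.
avg_8b = {name: mean(values) for name, values in A_LEVEL_8B.items()}
delta = {name: avg_8b[name] - REPORTED_1_8B[name] for name in avg_8b}

for name in avg_8b:
    print(f"{name}: 8B avg {avg_8b[name]:.2f}%, delta {delta[name]:+.2f} pp")
```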

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kalistratov/qwen3-8b-semantic-ids")
tokenizer = AutoTokenizer.from_pretrained("kalistratov/qwen3-8b-semantic-ids")

Citation

Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.
