Qwen3-1.8B Stage 1: Vocabulary Expansion for Semantic IDs
Overview
Qwen3-1.8B after Stage 1 (vocabulary expansion): 1,027 semantic-ID (SID) tokens were added to the tokenizer, and only the embedding matrix was trained (input and output embeddings are tied); all other parameters were kept frozen.
This checkpoint serves as the starting point for the Stage 2 full fine-tuning experiments.
Training
- Base model: Qwen/Qwen3-1.7B (tied embeddings)
- New tokens: 1,027 (3 structural + 4×256 codebook tokens)
- Trainable parameters: 312M / 1.7B (18.2%)
- Dataset: Amazon Pet Supplies (64K samples from 4.7M conversations)
- Steps: 2,000
- LR: 1×10⁻³, cosine scheduler
- Optimizer: adamw_torch_fused
- Batch: 32 per device × 2 gradient-accumulation steps = 64 effective
- Hardware: NVIDIA H100 80GB
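The token arithmetic above (3 structural + 4×256 codebook tokens = 1,027) can be sketched directly. Note that only two of the three structural tokens (`<|sid_start|>` and `<|sid_end|>`) are named in this card; the spelling of the third is not specified here.

```python
# Sketch: enumerate the SID codebook vocabulary described above.
# Four codebook levels (prefixes A-D, matching the SID format example),
# 256 codes per level.
codebook_tokens = [
    f"<|{level}{code}|>" for level in "ABCD" for code in range(256)
]
# Two of the three structural tokens are named in this card; the third
# exists in the tokenizer but is not spelled out here.
known_structural = ["<|sid_start|>", "<|sid_end|>"]

print(len(codebook_tokens))      # 1024 codebook tokens
print(len(codebook_tokens) + 3)  # plus 3 structural tokens = 1027 total
```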
SID Token Format
```
<|sid_start|><|A42|><|B128|><|C64|><|D0|><|sid_end|>
```
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kalistratov/qwen3-1.8b-stage1-sid")
tokenizer = AutoTokenizer.from_pretrained("kalistratov/qwen3-1.8b-stage1-sid")

# Verify that SID tokens round-trip through the tokenizer
sid = "<|sid_start|><|A10|><|B20|><|C30|><|D0|><|sid_end|>"
ids = tokenizer.encode(sid, add_special_tokens=False)
assert tokenizer.decode(ids, skip_special_tokens=False) == sid
```
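Stage 1 trains only the embedding matrix while everything else stays frozen. A minimal sketch of that freezing logic, using a toy module rather than the actual Qwen3 model (parameter names and sizes here are illustrative, not the real ones):

```python
import torch.nn as nn

# Toy stand-in for a causal LM. The real Qwen3 parameter names differ,
# but the freezing pattern is the same: unfreeze only embedding weights.
# With tied embeddings, this is a single shared matrix.
model = nn.ModuleDict({
    "embed_tokens": nn.Embedding(1000, 64),  # 64,000 params
    "layer": nn.Linear(64, 64),              # 4,160 params
})

for name, param in model.named_parameters():
    param.requires_grad = "embed_tokens" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # trainable: 64000 / 68160
```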
Citation
Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.