RQ-VAE for Product Semantic ID Generation

Overview

A Residual-Quantized Variational Autoencoder (RQ-VAE) trained to assign hierarchical semantic identifiers to product embeddings. The model compresses 1024-dimensional item embeddings into compact 3-level discrete codes (A, B, C), where each level draws from its own 256-code codebook. Semantically similar products share common code prefixes.
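
The prefix-sharing property can be illustrated with a few hypothetical IDs (the item names and code values below are made up, not actual model output):

```python
# Hypothetical 3-level semantic IDs (A, B, C), each code in [0, 255].
# Items sharing a longer prefix are semantically closer.
ids = {
    "dog_leash_red":  (17, 42, 201),
    "dog_leash_blue": (17, 42, 88),   # shares (A, B) with the red leash
    "dog_bowl":       (17, 130, 5),   # shares only A (same broad category)
    "cat_tree":       (96, 7, 250),   # different top-level code
}

def shared_prefix_len(a, b):
    """Number of leading code levels two semantic IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

print(shared_prefix_len(ids["dog_leash_red"], ids["dog_leash_blue"]))  # 2
print(shared_prefix_len(ids["dog_leash_red"], ids["cat_tree"]))        # 0
```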

Architecture

  • Encoder: MLP (1024 β†’ 512 β†’ 256 β†’ 128 β†’ 32)
  • Quantization: 3-level residual VQ, codebook size 256 per level, rotation trick gradient estimation
  • Decoder: MLP (32 β†’ 128 β†’ 256 β†’ 512 β†’ 1024)
  • Codebook initialization: k-means on first batch
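
The residual quantization step can be sketched in NumPy (inference only). The codebooks here are random stand-ins for the trained, k-means-initialized ones, and the rotation-trick gradient estimation used in training [2] is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
num_levels, codebook_size, dim = 3, 256, 32

# Random stand-in codebooks (the real model initializes them with
# k-means on the first batch and learns them during training).
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_levels)]

def residual_quantize(z, codebooks):
    """Greedy residual VQ: at each level, pick the code nearest to the
    current residual and subtract it before quantizing the next level."""
    codes, residual = [], z.copy()
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, z - residual  # (A, B, C) codes and the quantized latent

z = rng.normal(size=dim)  # a latent from the 32-d encoder bottleneck
codes, z_q = residual_quantize(z, codebooks)
print(codes)  # three indices in [0, 255]
```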

Training

  • Dataset: Amazon Pet Supplies (63,319 items)
  • Input embeddings: Qwen3-Embedding-0.6B (1024d, L2-normalized)
  • Optimizer: AdamW (LR 3Γ—10⁻⁴ β†’ 1Γ—10⁻⁢, cosine with warmup)
  • Batch size: 4096
  • Epochs: 1000

Metrics

| Metric | Value |
|---|---|
| Codebook utilization | 100% (0 dead codes, all levels) |
| Perplexity | 228–242 per level |
| Cosine similarity (orig ↔ recon) | mean 0.80, p50 0.80 |
| NN overlap@5 | 0.236 |
| Unique (A, B, C) tuples | 55,647 / 63,319 (87.9%) |
| Prefix-1 coincidence (NN / random) | 0.745 / 0.005 |
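
Utilization and perplexity follow the standard definitions and can be recomputed from the per-level code assignments (perplexity is the exponential of the entropy of code usage; uniform usage of all 256 codes gives the maximum of 256):

```python
import numpy as np

def codebook_stats(codes, codebook_size=256):
    """Utilization (fraction of codes assigned at least once) and
    perplexity (exp of the entropy of the code-usage distribution)."""
    counts = np.bincount(codes, minlength=codebook_size)
    probs = counts / counts.sum()
    nz = probs[probs > 0]
    perplexity = float(np.exp(-(nz * np.log(nz)).sum()))
    utilization = float((counts > 0).mean())
    return utilization, perplexity

# Perfectly uniform usage: 100% utilization, perplexity 256.
codes = np.arange(256).repeat(10)
u, p = codebook_stats(codes)
print(u, round(p))  # 1.0 256
```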

Usage

import torch

from mipt_master.src.rqvae.model import RQVAE, RQVAEConfig
from mipt_master.src.rqvae.train import load_checkpoint

# Restore the trained model from a checkpoint.
ckpt = load_checkpoint("best_model.pth")
cfg = RQVAEConfig(**ckpt.config)
model = RQVAE(cfg)
model.load_state_dict(ckpt.model_state_dict)
model.eval()

# Encode embeddings β†’ semantic IDs [N, 3]
with torch.no_grad():
    semantic_ids = model.encode_to_semantic_ids(embeddings)
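
A typical downstream step is to bucket items by a code prefix so that close substitutes land together. A self-contained sketch with made-up item names and codes (in practice the tuples would be rows of the [N, 3] tensor returned above):

```python
from collections import defaultdict

# Hypothetical semantic IDs; real values come from encode_to_semantic_ids.
semantic_ids = {
    "item_0": (17, 42, 201),
    "item_1": (17, 42, 88),
    "item_2": (96, 7, 250),
}

# Bucket by the 2-level prefix (A, B): items in the same bucket are
# near-duplicates / close substitutes under the learned hierarchy.
buckets = defaultdict(list)
for name, (a, b, c) in semantic_ids.items():
    buckets[(a, b)].append(name)

print(dict(buckets))  # {(17, 42): ['item_0', 'item_1'], (96, 7): ['item_2']}
```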

Citation

Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.

References

  1. Y. Sun et al. "OpenOneRec," arXiv:2502.18851, 2025.
  2. C. Huh et al. "Straightening Out the Straight-Through Estimator," arXiv:2410.06424, 2024.