Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is only the name; the algorithm and the weights in this repository are unchanged.

The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.

Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.

Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).

🧊🎬 HY-OmniWeaving – PolarQuant Q5 (Bit-Packed)

First PolarQuant release of Tencent OmniWeaving: unified video generation with reasoning, a 65% smaller download, fully self-contained.

🎬 Video Demo

Prompt: "A polar bear walking on ice". Generated on an NVIDIA RTX PRO 6000 Blackwell (102 GB): 30 steps, 49 frames, 848×480, 2:36 total. Downloaded as a 28 GB PQ5 repo (vs. the 79 GB original plus dependencies), dequantized on GPU in ~2 min.


📊 Compression Results


Component Compression

| Component | Original | PQ5 Packed | Reduction | cos_sim |
|---|---|---|---|---|
| Transformer (MMDiT) | 33.4 GB | 5.6 GB | -83% | 0.9986 |
| Text Encoder (MLLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| Qwen2.5-VL-7B (base LLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| VAE | 5.0 GB | 5.0 GB | – | 1.0000 |
| byt5-small | 1.2 GB | 1.2 GB | – | – |
| Glyph-SDXL-v2 | 1.9 GB | 1.9 GB | – | – |
| SigLIP vision | 1.7 GB | 1.7 GB | – | – |
| Upsampler + configs | 0.3 GB | 0.3 GB | – | – |
| Total | ~79 GB | ~28 GB | -65% | |

A cos_sim of 0.9986 is effectively visually lossless: the dequantized PQ5 model generates video of the same perceptual quality as the original.
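
For reference, a per-layer cosine similarity like the one reported above can be computed as follows. This is a minimal sketch, not the repo's actual verification script; the function name is illustrative:

```python
import torch

def layer_cos_sim(w_orig: torch.Tensor, w_dequant: torch.Tensor) -> float:
    """Cosine similarity between two weight tensors, flattened to 1-D."""
    a = w_orig.flatten().float()
    b = w_dequant.flatten().float()
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```

A value near 1.0 means the dequantized weights point in almost the same direction as the originals, which is the metric the table reports per component.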


🚀 Quick Start (3 commands)

1. Install dependencies

pip install safetensors huggingface_hub scipy transformers==4.57.1 diffusers==0.32.2
pip install einops loguru omegaconf qwen-vl-utils peft imageio imageio-ffmpeg decord

2. Download & setup (28 GB, auto-dequant)

# Download the repo
git clone https://huggingface.co/caiovicentino1/HY-OmniWeaving-PolarQuant-Q5 ./OmniWeaving-PQ5
cd OmniWeaving-PQ5

# Run setup (downloads inference code + dequants PQ5 → BF16)
python setup.py

Setup takes ~3 minutes on GPU. It will:

  • Clone OmniWeaving inference code from GitHub
  • Dequant PQ5 codes → BF16 safetensors (transformer, text encoder, Qwen LLM)
  • Trim Hadamard padding to match original shapes
  • Verify all components are ready
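
The dequant and trim steps above can be sketched roughly as below. This is an illustrative reconstruction from the description, not setup.py's actual internals (the function and argument names are hypothetical), and it assumes the 5-bit codes have already been unpacked to integer indices:

```python
import torch
from scipy.linalg import hadamard  # scipy is already in the dependency list

def dequantize_layer(codes, codebook, block_norms, orig_cols, block=128):
    """codes:       (rows, padded_cols) integer indices into the 32-entry codebook
    codebook:    (32,) Lloyd-Max centroids
    block_norms: (rows, padded_cols // block) FP16 per-block scales
    orig_cols:   column count before Hadamard padding"""
    rows, cols = codes.shape
    # 1. Codebook lookup: 5-bit index -> Lloyd-Max centroid
    w = codebook[codes]
    # 2. Restore per-block magnitude from the FP16 norms
    w = (w.view(rows, cols // block, block)
         * block_norms.float().unsqueeze(-1)).view(rows, cols)
    # 3. Undo the Walsh-Hadamard rotation (H is symmetric and H @ H == cols * I)
    H = torch.from_numpy(hadamard(cols)).float()
    w = w @ H / cols
    # 4. Trim the padding that was added to reach a power-of-two Hadamard size
    return w[:, :orig_cols].to(torch.bfloat16)
```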

3. Generate video!

# Simple
python generate_pq5.py --prompt "A polar bear walking on ice"

# With options
python generate_pq5.py \
    --prompt "A massive wave crashing against cliffs at sunset" \
    --steps 30 --frames 49 --seed 42 \
    --aspect-ratio 16:9 --output wave.mp4

# With reasoning mode
python generate_pq5.py --prompt "A robot solving a puzzle" --think

# Low VRAM mode (RTX 4090 / 24 GB)
python generate_pq5.py --prompt "A cat playing" --gpu-offload

Or use the raw OmniWeaving pipeline:

cd OmniWeaving-PQ5/OmniWeaving_code

torchrun --nproc_per_node=1 generate.py \
    --task t2v \
    --prompt "Your prompt here" \
    --model_path ../  \
    --output_path output.mp4 \
    --pipeline_config omniweaving \
    --num_inference_steps 30 \
    --video_length 49

🖥️ Hardware Requirements

Hardware Compatibility

| GPU | VRAM | Speed | Offloading |
|---|---|---|---|
| RTX PRO 6000 | 102 GB | ~5 s/step | None needed |
| A100 80 GB | 80 GB | ~4 s/step | None needed |
| A100 40 GB | 40 GB | ~6 s/step | Component offload |
| RTX 4090 | 24 GB | ~15 s/step | Group offload (--gpu-offload) |
| RTX 4080 | 16 GB | ~25 s/step | Group offload (tight) |

With the original 79 GB model, only A100 80 GB+ could run without offloading. With PolarQuant PQ5, even RTX 4090 can generate video!


🎬 Supported Tasks

| Task | Flag | Input | Output |
|---|---|---|---|
| Text-to-Video | --task t2v | Text | Video |
| Image-to-Video | --task i2v | Image + Text | Video |
| Interpolation | --task interpolation | 2 Images + Text | Video |
| Reference-to-Video | --task reference2v | 1-4 Images + Text | Video |
| Video Editing | --task editing | Video + Text | Video |
| Reasoning + Video | --task t2v --think | Text | Reasoning + Video |

πŸ—οΈ Architecture

OmniWeaving (HunyuanVideo-1.5 based)
├── MMDiT Transformer
│   ├── 54 double blocks
│   ├── hidden_size=2048, 16 heads, head_dim=128
│   └── 763 layers → PQ5 bit-packed (5.2 GB)
├── MLLM Text Encoder
│   ├── Fine-tuned from Qwen2.5-VL-7B
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── Qwen2.5-VL-7B Base LLM
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── VAE (BF16 preserved, 5.0 GB)
├── SigLIP Vision Encoder (1.7 GB)
├── Glyph-SDXL-v2 / byt5-small
└── Upsampler (720p super-resolution)

🔬 Method: PolarQuant Q5

PolarQuant Pipeline

  1. Hadamard Rotation – a Walsh-Hadamard transform redistributes weight outliers uniformly across dimensions
  2. Lloyd-Max Q5 – 32 optimal centroids computed via iterative refinement on a Gaussian distribution
  3. 5-bit Packing – 8 codes packed into 5 bytes (62.5% of int8 storage)
  4. Per-block Norms – FP16 norms preserve magnitude per 128-element block

cos_sim 0.9986 across all 1,481 quantized layers.
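
The packing arithmetic in step 3 works out exactly: 8 × 5 bits = 40 bits = 5 bytes. A minimal, self-contained pack/unpack pair (illustrative only; the repo's actual bit layout may differ):

```python
import numpy as np

def pack_5bit(codes: np.ndarray) -> np.ndarray:
    """Pack uint8 codes (values < 32, length a multiple of 8) into 5 bytes per 8 codes."""
    groups = codes.reshape(-1, 8).astype(np.uint64)
    word = np.zeros(len(groups), dtype=np.uint64)
    for i in range(8):                       # concatenate 8 x 5-bit fields
        word |= groups[:, i] << np.uint64(5 * i)
    out = np.empty((len(word), 5), dtype=np.uint8)
    for b in range(5):                       # split the 40-bit word, little-endian
        out[:, b] = (word >> np.uint64(8 * b)) & np.uint64(0xFF)
    return out.reshape(-1)

def unpack_5bit(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_5bit: recover the 5-bit codes from the packed bytes."""
    groups = packed.reshape(-1, 5).astype(np.uint64)
    word = np.zeros(len(groups), dtype=np.uint64)
    for b in range(5):
        word |= groups[:, b] << np.uint64(8 * b)
    out = np.empty((len(word), 8), dtype=np.uint8)
    for i in range(8):
        out[:, i] = (word >> np.uint64(5 * i)) & np.uint64(0x1F)
    return out.reshape(-1)
```

The packed output is exactly 5/8 the size of one-byte-per-code storage, which is the 62.5% figure quoted above.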


📦 Repository Structure

HY-OmniWeaving-PolarQuant-Q5/
├── setup.py                    # One-command setup script
├── generate_pq5.py             # Easy generation wrapper
├── app.py                      # Gradio demo
├── polarquant/
│   ├── codes/
│   │   ├── transformer_codes.safetensors     (5.2 GB)
│   │   ├── text_encoder_codes.safetensors    (5.0 GB)
│   │   └── qwen_llm_codes.safetensors        (5.0 GB)
│   └── bf16/
│       ├── transformer_bf16.safetensors      (0.3 GB)
│       ├── text_encoder_bf16.safetensors     (1.1 GB)
│       └── qwen_llm_bf16.safetensors         (1.1 GB)
├── vae/diffusion_pytorch_model.safetensors   (5.0 GB)
├── text_encoder/
│   ├── llm/           (Qwen2.5-VL configs + tokenizer)
│   ├── byt5-small/    (1.2 GB)
│   └── Glyph-SDXL-v2/ (checkpoints + assets)
├── vision_encoder/siglip/
│   ├── image_encoder/ (1.7 GB)
│   └── feature_extractor/
├── transformer/config.json
├── scheduler/
└── upsampler/ (0.3 GB)

🌐 Gradio Demo

Run locally:

pip install gradio
python app.py

Or deploy as a HuggingFace Space with GPU.


⚡ Performance

| Metric | Value |
|---|---|
| Download size | 28 GB (vs 79 GB, -65%) |
| Dequant time | ~3 min (GPU) |
| Generation (30 steps) | 2:36 (RTX PRO 6000) |
| Per-step latency | ~5.2 s/step |
| Resolution | 848×480 (480p) or 1280×720 (720p) |
| Frames | 49 (default), up to 129 |
| FPS | 16 |


📖 Citation

@article{polarquant2026,
  title={PolarQuant: Hadamard-Rotated Lloyd-Max Quantization for Large Language Models},
  author={Vicentino, Caio},
  journal={arXiv preprint arXiv:2603.29078},
  year={2026}
}

79 GB → 28 GB with no visible quality loss. PolarQuant PQ5 makes OmniWeaving video generation possible even on an RTX 4090.
