Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is only the name; the algorithm and the weights in this repository are unchanged.
The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.
Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).
# HY-OmniWeaving: PolarQuant Q5 (Bit-Packed)

First PolarQuant of Tencent OmniWeaving: unified video generation with reasoning, a 65% smaller download, fully self-contained.
## Video Demo

Prompt: "A polar bear walking on ice". Generated on an NVIDIA RTX PRO 6000 Blackwell (102 GB): 30 steps, 49 frames, 848×480, 2:36 total. Downloaded as the 28 GB PQ5 repo (vs. 79 GB for the original plus dependencies), dequantized on GPU in 2 minutes.
## Compression Results
| Component | Original | PQ5 Packed | Reduction | cos_sim |
|---|---|---|---|---|
| Transformer (MMDiT) | 33.4 GB | 5.6 GB | -83% | 0.9986 |
| Text Encoder (MLLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| Qwen2.5-VL-7B (base LLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| VAE | 5.0 GB | 5.0 GB | unchanged | 1.0000 |
| byt5-small | 1.2 GB | 1.2 GB | unchanged | n/a |
| Glyph-SDXL-v2 | 1.9 GB | 1.9 GB | unchanged | n/a |
| SigLIP vision | 1.7 GB | 1.7 GB | unchanged | n/a |
| Upsampler + configs | 0.3 GB | 0.3 GB | unchanged | n/a |
| **Total** | ~79 GB | ~28 GB | **-65%** | |
A cos_sim of 0.9986 against the original weights is effectively visually lossless: the dequantized PQ5 model generates video of the same quality.
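The cos_sim figures above are easy to reproduce once both checkpoints are on disk; a minimal sketch (the function name is ours, not part of the repo):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Flattened cosine similarity between an original and a dequantized weight tensor."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Applied per layer to the original BF16 tensor and its PQ5 reconstruction, values near 1.0 mean the rotation and codebook preserved the weight geometry.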
## Quick Start (3 commands)

### 1. Install dependencies

```bash
pip install safetensors huggingface_hub scipy transformers==4.57.1 diffusers==0.32.2
pip install einops loguru omegaconf qwen-vl-utils peft imageio imageio-ffmpeg decord
```

### 2. Download & set up (28 GB, auto-dequant)

```bash
# Download the repo
git clone https://huggingface.co/caiovicentino1/HY-OmniWeaving-PolarQuant-Q5 ./OmniWeaving-PQ5
cd OmniWeaving-PQ5

# Run setup (downloads inference code + dequants PQ5 -> BF16)
python setup.py
```
Setup takes ~3 minutes on a GPU. It will:

- Clone the OmniWeaving inference code from GitHub
- Dequantize the PQ5 codes to BF16 safetensors (transformer, text encoder, Qwen LLM)
- Trim the Hadamard padding to match the original shapes
- Verify all components are ready
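The dequantization step can be sketched in a few lines of NumPy. Everything here is illustrative: the function names, the 8-codes-per-5-bytes MSB-first packing convention, row-major code layout, and per-128-value block scales are assumptions about the storage format, not the repo's exact code.

```python
import numpy as np
from scipy.linalg import hadamard

def unpack_5bit(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover n 5-bit codes from a uint8 stream (8 codes per 5 bytes, MSB-first)."""
    bits = np.unpackbits(packed)[: n * 5].reshape(n, 5)
    return bits @ (1 << np.arange(4, -1, -1))

def dequant_layer(packed, centroids, norms, rows, padded_cols, orig_cols, block=128):
    """PQ5 codes -> float weights: codebook lookup, per-block rescale, un-rotate, trim."""
    codes = unpack_5bit(packed, rows * padded_cols)
    x = centroids[codes].astype(np.float32)                             # Lloyd-Max lookup
    x = x.reshape(-1, block) * norms.astype(np.float32).reshape(-1, 1)  # per-block norms
    H = hadamard(padded_cols) / np.sqrt(padded_cols)   # orthogonal and its own inverse
    return (x.reshape(rows, padded_cols) @ H)[:, :orig_cols]  # undo rotation, trim padding
```

The final column trim is the "Trim Hadamard padding" step: the rotation requires a power-of-two width, so the stored codes cover a slightly wider matrix than the original layer.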
### 3. Generate video!

```bash
# Simple
python generate_pq5.py --prompt "A polar bear walking on ice"

# With options
python generate_pq5.py \
  --prompt "A massive wave crashing against cliffs at sunset" \
  --steps 30 --frames 49 --seed 42 \
  --aspect-ratio 16:9 --output wave.mp4

# With reasoning mode
python generate_pq5.py --prompt "A robot solving a puzzle" --think

# Low VRAM mode (RTX 4090 / 24 GB)
python generate_pq5.py --prompt "A cat playing" --gpu-offload
```
Or use the raw OmniWeaving pipeline:

```bash
cd OmniWeaving-PQ5/OmniWeaving_code
torchrun --nproc_per_node=1 generate.py \
  --task t2v \
  --prompt "Your prompt here" \
  --model_path ../ \
  --output_path output.mp4 \
  --pipeline_config omniweaving \
  --num_inference_steps 30 \
  --video_length 49
```
## Hardware Requirements
| GPU | VRAM | Speed | Offloading |
|---|---|---|---|
| RTX PRO 6000 | 102 GB | ~5 s/step | None needed |
| A100 80 GB | 80 GB | ~4 s/step | None needed |
| A100 40 GB | 40 GB | ~6 s/step | Component offload |
| RTX 4090 | 24 GB | ~15 s/step | Group offload (--gpu-offload) |
| RTX 4080 | 16 GB | ~25 s/step | Group offload (tight) |
With the original 79 GB model, only an A100 80 GB or larger could run without offloading. With PolarQuant PQ5, even an RTX 4090 can generate video.
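The kind of offloading `--gpu-offload` enables can be approximated with plain PyTorch hooks: park every component on the CPU and stage one onto the GPU only for the duration of its forward pass. This is a generic sketch, not the pipeline's actual implementation (the function name is ours).

```python
import torch
import torch.nn as nn

def enable_component_offload(components: dict, device: str = "cuda"):
    """Park every component on CPU; move one to `device` only while it runs.

    Inputs must already live on `device` when the component is called;
    a real pipeline handles that hand-off itself.
    """
    for module in components.values():
        module.to("cpu")

        def to_gpu(mod, args, dev=device):
            mod.to(dev)                      # stage weights in before forward

        def to_cpu(mod, args, output):
            mod.to("cpu")                    # evict weights right after forward
            torch.cuda.empty_cache()         # no-op when CUDA is uninitialized
            return output

        module.register_forward_pre_hook(to_gpu)
        module.register_forward_hook(to_cpu)
```

The trade-off is exactly what the table shows: each step pays PCIe transfer time for the active component, so per-step latency roughly triples on a 4090 in exchange for fitting in 24 GB.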
## Supported Tasks

| Task | Flag | Input | Output |
|---|---|---|---|
| Text-to-Video | `--task t2v` | Text | Video |
| Image-to-Video | `--task i2v` | Image + Text | Video |
| Interpolation | `--task interpolation` | 2 Images + Text | Video |
| Reference-to-Video | `--task reference2v` | 1-4 Images + Text | Video |
| Video Editing | `--task editing` | Video + Text | Video |
| Reasoning + Video | `--task t2v --think` | Text | Reasoning + Video |
## Architecture

```
OmniWeaving (HunyuanVideo-1.5 based)
├── MMDiT Transformer
│   ├── 54 double blocks
│   ├── hidden_size=2048, 16 heads, head_dim=128
│   └── 763 layers → PQ5 bit-packed (5.2 GB)
├── MLLM Text Encoder
│   ├── Fine-tuned on Qwen2.5-VL-7B
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── Qwen2.5-VL-7B Base LLM
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── VAE (BF16 preserved, 5.0 GB)
├── SigLIP Vision Encoder (1.7 GB)
├── Glyph-SDXL-v2 / byt5-small
└── Upsampler (720p super-resolution)
```
## Method: PolarQuant Q5

- **Hadamard rotation**: a Walsh-Hadamard transform redistributes weight outliers uniformly across dimensions
- **Lloyd-Max Q5**: 32 optimal centroids computed via iterative refinement on a Gaussian distribution
- **5-bit packing**: 8 codes packed into 5 bytes (62.5% of int8 storage)
- **Per-block norms**: FP16 norms preserve magnitude per 128-element block
cos_sim 0.9986 across all 1,481 quantized layers.
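The full quantization path described by those bullets fits in a short NumPy sketch. The function names, the quantile initialization, and the absmax block scale are illustrative choices under stated assumptions, not necessarily the repo's exact implementation:

```python
import numpy as np
from scipy.linalg import hadamard

def lloyd_max(samples: np.ndarray, k: int = 32, iters: int = 50) -> np.ndarray:
    """Lloyd-Max scalar quantizer: k centroids iteratively refined to minimize MSE."""
    cents = np.quantile(samples, np.linspace(0, 1, k))   # spread initial centroids
    for _ in range(iters):
        idx = np.abs(samples[:, None] - cents[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                cents[j] = samples[idx == j].mean()      # move centroid to its cell mean
    return np.sort(cents)

def polarquant_q5(W: np.ndarray, block: int = 128):
    """Rotate, block-normalize, and code a weight matrix with a 32-entry codebook."""
    rows, cols = W.shape
    d = 1 << (cols - 1).bit_length()                     # pad columns to a power of two
    Wp = np.zeros((rows, d), dtype=np.float64)
    Wp[:, :cols] = W
    H = hadamard(d) / np.sqrt(d)                         # orthogonal Walsh-Hadamard rotation
    R = (Wp @ H).reshape(-1, block)                      # rotated weights in 128-value blocks
    scales = np.abs(R).max(axis=1) + 1e-12               # per-block scale (absmax here)
    U = (R / scales[:, None]).ravel()
    rng = np.random.default_rng(0)
    cents = lloyd_max(rng.choice(U, size=min(4096, U.size), replace=False))
    codes = np.abs(U[:, None] - cents[None, :]).argmin(axis=1).astype(np.uint8)
    return codes, cents, scales.astype(np.float16)       # codes are then 5-bit packed
```

Reconstruction reverses each step: codebook lookup, per-block rescale, a second multiplication by H / sqrt(d) (the matrix is symmetric and orthogonal, hence its own inverse), then trimming the padded columns.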
## Repository Structure

```
HY-OmniWeaving-PolarQuant-Q5/
├── setup.py                      # One-command setup script
├── generate_pq5.py               # Easy generation wrapper
├── app.py                        # Gradio demo
├── polarquant/
│   ├── codes/
│   │   ├── transformer_codes.safetensors   (5.2 GB)
│   │   ├── text_encoder_codes.safetensors  (5.0 GB)
│   │   └── qwen_llm_codes.safetensors      (5.0 GB)
│   └── bf16/
│       ├── transformer_bf16.safetensors    (0.3 GB)
│       ├── text_encoder_bf16.safetensors   (1.1 GB)
│       └── qwen_llm_bf16.safetensors       (1.1 GB)
├── vae/diffusion_pytorch_model.safetensors (5.0 GB)
├── text_encoder/
│   ├── llm/            (Qwen2.5-VL configs + tokenizer)
│   ├── byt5-small/     (1.2 GB)
│   └── Glyph-SDXL-v2/  (checkpoints + assets)
├── vision_encoder/siglip/
│   ├── image_encoder/  (1.7 GB)
│   └── feature_extractor/
├── transformer/config.json
├── scheduler/
└── upsampler/          (0.3 GB)
```
## Gradio Demo

Run locally:

```bash
pip install gradio
python app.py
```

Or deploy as a Hugging Face Space with GPU.
## Performance

| Metric | Value |
|---|---|
| Download size | 28 GB (vs 79 GB, -65%) |
| Dequant time | ~3 min (GPU) |
| Generation (30 steps) | 2:36 (RTX PRO 6000) |
| Per-step latency | ~5.2 s/step |
| Resolution | 848×480 (480p) or 1280×720 (720p) |
| Frames | 49 (default), up to 129 |
| FPS | 16 |
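Two implied figures follow directly from the table (restated values, no new measurements):

```python
frames, fps = 49, 16
clip_seconds = frames / fps              # default clip length: 49 frames at 16 FPS
orig_gb, pq5_gb = 79, 28
reduction = 1 - pq5_gb / orig_gb         # download-size reduction
print(f"{clip_seconds:.2f} s clip, {reduction:.0%} smaller")  # -> 3.06 s clip, 65% smaller
```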
## Links

- PolarQuant paper: arXiv:2603.29078
- GitHub: polarengine-vllm
- PyPI: `pip install polarquant`
- Original model: tencent/HY-OmniWeaving
- OmniWeaving paper: arXiv:2603.24458
## Citation

```bibtex
@article{polarquant2026,
  title={PolarQuant: Hadamard-Rotated Lloyd-Max Quantization for Large Language Models},
  author={Vicentino, Caio},
  journal={arXiv preprint arXiv:2603.29078},
  year={2026}
}
```
79 GB → 28 GB with no visible quality loss, bringing video generation within reach of an RTX 4090. Quantized with PolarQuant.