Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is only the name; the algorithm and the weights in this repository are unchanged.

The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ addresses weight quantization with a deterministic Walsh-Hadamard rotation and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.

Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.

Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).

🧊🎬 HY-OmniWeaving – PolarQuant Q5 (Bit-Packed)

First PolarQuant release of Tencent OmniWeaving: unified video generation with reasoning, a 65% smaller download, fully self-contained.

🎬 Video Demo

Prompt: "A polar bear walking on ice". Generated on an NVIDIA RTX PRO 6000 Blackwell (102 GB): 30 steps, 49 frames, 848×480, 2:36 total. Downloaded as a 28 GB PQ5 repo (vs. the 79 GB original plus dependencies), dequantized on GPU in ~2 min.


📊 Compression Results


Component Compression

| Component | Original | PQ5 Packed | Reduction | cos_sim |
|---|---|---|---|---|
| Transformer (MMDiT) | 33.4 GB | 5.6 GB | -83% | 0.9986 |
| Text Encoder (MLLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| Qwen2.5-VL-7B (base LLM) | 16.6 GB | 6.1 GB | -63% | 0.9986 |
| VAE | 5.0 GB | 5.0 GB | – | 1.0000 |
| byt5-small | 1.2 GB | 1.2 GB | – | – |
| Glyph-SDXL-v2 | 1.9 GB | 1.9 GB | – | – |
| SigLIP vision | 1.7 GB | 1.7 GB | – | – |
| Upsampler + configs | 0.3 GB | 0.3 GB | – | – |
| Total | ~79 GB | ~28 GB | -65% | |

A cos_sim of 0.9986 is effectively visually lossless: the dequantized PQ5 model generates video of the same perceptual quality as the original.
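
For reference, a per-layer cosine similarity like the one reported above can be computed as follows. This is a minimal sketch, not the repo's actual verification script; the function name is illustrative:

```python
import torch

def layer_cos_sim(w_orig: torch.Tensor, w_dequant: torch.Tensor) -> float:
    """Cosine similarity between two weight tensors, flattened to 1-D."""
    a = w_orig.flatten().float()
    b = w_dequant.flatten().float()
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```

A value near 1.0 means the dequantized weights point in almost the same direction as the originals, which is the metric the table reports per component.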


🚀 Quick Start (3 commands)

1. Install dependencies

pip install safetensors huggingface_hub scipy transformers==4.57.1 diffusers==0.32.2
pip install einops loguru omegaconf qwen-vl-utils peft imageio imageio-ffmpeg decord

2. Download & setup (28 GB, auto-dequant)

# Download the repo
git clone https://huggingface.co/caiovicentino1/HY-OmniWeaving-PolarQuant-Q5 ./OmniWeaving-PQ5
cd OmniWeaving-PQ5

# Run setup (downloads inference code + dequants PQ5 → BF16)
python setup.py

Setup takes ~3 minutes on GPU. It will:

  • Clone OmniWeaving inference code from GitHub
  • Dequant PQ5 codes → BF16 safetensors (transformer, text encoder, Qwen LLM)
  • Trim Hadamard padding to match original shapes
  • Verify all components are ready
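
The dequant and trim steps above can be sketched roughly as below. This is an illustrative reconstruction from the description, not setup.py's actual internals (the function and argument names are hypothetical), and it assumes the 5-bit codes have already been unpacked to integer indices:

```python
import torch
from scipy.linalg import hadamard  # scipy is already in the dependency list

def dequantize_layer(codes, codebook, block_norms, orig_cols, block=128):
    """codes:       (rows, padded_cols) integer indices into the 32-entry codebook
    codebook:    (32,) Lloyd-Max centroids
    block_norms: (rows, padded_cols // block) FP16 per-block scales
    orig_cols:   column count before Hadamard padding"""
    rows, cols = codes.shape
    # 1. Codebook lookup: 5-bit index -> Lloyd-Max centroid
    w = codebook[codes]
    # 2. Restore per-block magnitude from the FP16 norms
    w = (w.view(rows, cols // block, block)
         * block_norms.float().unsqueeze(-1)).view(rows, cols)
    # 3. Undo the Walsh-Hadamard rotation (H is symmetric and H @ H == cols * I)
    H = torch.from_numpy(hadamard(cols)).float()
    w = w @ H / cols
    # 4. Trim the padding that was added to reach a power-of-two Hadamard size
    return w[:, :orig_cols].to(torch.bfloat16)
```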

3. Generate video!

# Simple
python generate_pq5.py --prompt "A polar bear walking on ice"

# With options
python generate_pq5.py \
    --prompt "A massive wave crashing against cliffs at sunset" \
    --steps 30 --frames 49 --seed 42 \
    --aspect-ratio 16:9 --output wave.mp4

# With reasoning mode
python generate_pq5.py --prompt "A robot solving a puzzle" --think

# Low VRAM mode (RTX 4090 / 24 GB)
python generate_pq5.py --prompt "A cat playing" --gpu-offload

Or use the raw OmniWeaving pipeline:

cd OmniWeaving-PQ5/OmniWeaving_code

torchrun --nproc_per_node=1 generate.py \
    --task t2v \
    --prompt "Your prompt here" \
    --model_path ../  \
    --output_path output.mp4 \
    --pipeline_config omniweaving \
    --num_inference_steps 30 \
    --video_length 49

🖥️ Hardware Requirements

Hardware Compatibility

| GPU | VRAM | Speed | Offloading |
|---|---|---|---|
| RTX PRO 6000 | 102 GB | ~5 s/step | None needed |
| A100 80 GB | 80 GB | ~4 s/step | None needed |
| A100 40 GB | 40 GB | ~6 s/step | Component offload |
| RTX 4090 | 24 GB | ~15 s/step | Group offload (--gpu-offload) |
| RTX 4080 | 16 GB | ~25 s/step | Group offload (tight) |

With the original 79 GB model, only A100 80 GB+ could run without offloading. With PolarQuant PQ5, even RTX 4090 can generate video!


🎬 Supported Tasks

| Task | Flag | Input | Output |
|---|---|---|---|
| Text-to-Video | --task t2v | Text | Video |
| Image-to-Video | --task i2v | Image + Text | Video |
| Interpolation | --task interpolation | 2 Images + Text | Video |
| Reference-to-Video | --task reference2v | 1-4 Images + Text | Video |
| Video Editing | --task editing | Video + Text | Video |
| Reasoning + Video | --task t2v --think | Text | Reasoning + Video |

πŸ—οΈ Architecture

OmniWeaving (HunyuanVideo-1.5 based)
├── MMDiT Transformer
│   ├── 54 double blocks
│   ├── hidden_size=2048, 16 heads, head_dim=128
│   └── 763 layers → PQ5 bit-packed (5.2 GB)
├── MLLM Text Encoder
│   ├── Fine-tuned from Qwen2.5-VL-7B
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── Qwen2.5-VL-7B Base LLM
│   └── 359 layers → PQ5 bit-packed (5.0 GB)
├── VAE (BF16 preserved, 5.0 GB)
├── SigLIP Vision Encoder (1.7 GB)
├── Glyph-SDXL-v2 / byt5-small
└── Upsampler (720p super-resolution)

🔬 Method: PolarQuant Q5

PolarQuant Pipeline

  1. Hadamard Rotation – a Walsh-Hadamard transform redistributes weight outliers uniformly across dimensions
  2. Lloyd-Max Q5 – 32 optimal centroids computed via iterative refinement on a Gaussian distribution
  3. 5-bit Packing – 8 codes packed into 5 bytes (62.5% of int8 storage)
  4. Per-block Norms – FP16 norms preserve magnitude per 128-element block

cos_sim 0.9986 across all 1,481 quantized layers.
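
The packing arithmetic in step 3 works out exactly: 8 × 5 bits = 40 bits = 5 bytes. A minimal, self-contained pack/unpack pair (illustrative only; the repo's actual bit layout may differ):

```python
import numpy as np

def pack_5bit(codes: np.ndarray) -> np.ndarray:
    """Pack uint8 codes (values < 32, length a multiple of 8) into 5 bytes per 8 codes."""
    groups = codes.reshape(-1, 8).astype(np.uint64)
    word = np.zeros(len(groups), dtype=np.uint64)
    for i in range(8):                       # concatenate 8 x 5-bit fields
        word |= groups[:, i] << np.uint64(5 * i)
    out = np.empty((len(word), 5), dtype=np.uint8)
    for b in range(5):                       # split the 40-bit word, little-endian
        out[:, b] = (word >> np.uint64(8 * b)) & np.uint64(0xFF)
    return out.reshape(-1)

def unpack_5bit(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_5bit: recover the 5-bit codes from the packed bytes."""
    groups = packed.reshape(-1, 5).astype(np.uint64)
    word = np.zeros(len(groups), dtype=np.uint64)
    for b in range(5):
        word |= groups[:, b] << np.uint64(8 * b)
    out = np.empty((len(word), 8), dtype=np.uint8)
    for i in range(8):
        out[:, i] = (word >> np.uint64(5 * i)) & np.uint64(0x1F)
    return out.reshape(-1)
```

The packed output is exactly 5/8 the size of one-byte-per-code storage, which is the 62.5% figure quoted above.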


📦 Repository Structure

HY-OmniWeaving-PolarQuant-Q5/
├── setup.py                    # One-command setup script
├── generate_pq5.py             # Easy generation wrapper
├── app.py                      # Gradio demo
├── polarquant/
│   ├── codes/
│   │   ├── transformer_codes.safetensors     (5.2 GB)
│   │   ├── text_encoder_codes.safetensors    (5.0 GB)
│   │   └── qwen_llm_codes.safetensors        (5.0 GB)
│   └── bf16/
│       ├── transformer_bf16.safetensors      (0.3 GB)
│       ├── text_encoder_bf16.safetensors     (1.1 GB)
│       └── qwen_llm_bf16.safetensors         (1.1 GB)
├── vae/diffusion_pytorch_model.safetensors   (5.0 GB)
├── text_encoder/
│   ├── llm/           (Qwen2.5-VL configs + tokenizer)
│   ├── byt5-small/    (1.2 GB)
│   └── Glyph-SDXL-v2/ (checkpoints + assets)
├── vision_encoder/siglip/
│   ├── image_encoder/ (1.7 GB)
│   └── feature_extractor/
├── transformer/config.json
├── scheduler/
└── upsampler/ (0.3 GB)

🌐 Gradio Demo

Run locally:

pip install gradio
python app.py

Or deploy as a HuggingFace Space with GPU.


⚡ Performance

| Metric | Value |
|---|---|
| Download size | 28 GB (vs 79 GB, -65%) |
| Dequant time | ~3 min (GPU) |
| Generation (30 steps) | 2:36 (RTX PRO 6000) |
| Per-step latency | ~5.2 s/step |
| Resolution | 848×480 (480p) or 1280×720 (720p) |
| Frames | 49 (default), up to 129 |
| FPS | 16 |


📖 Citation

@article{polarquant2026,
  title={PolarQuant: Hadamard-Rotated Lloyd-Max Quantization for Large Language Models},
  author={Vicentino, Caio},
  journal={arXiv preprint arXiv:2603.29078},
  year={2026}
}

79 GB → 28 GB with no visible quality loss. PolarQuant PQ5 makes OmniWeaving video generation possible even on an RTX 4090.
