What is PolarQuant, and what difference does it make compared to the actual Animate model?

#1
by Perfs - opened

Hi @Perfs!

What is PolarQuant?

PolarQuant is a quantization method that compresses model weights using:

  1. Walsh-Hadamard rotation: spreads weight outliers uniformly across dimensions
  2. Lloyd-Max Q5: 32 optimal centroids for 5-bit quantization
  3. 5-bit packing: 8 codes packed into 5 bytes (62.5% of int8)
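
The packing arithmetic is straightforward: 8 codes at 5 bits each is 40 bits, which fits exactly into 5 bytes. Here is a minimal NumPy sketch of that pack/unpack step (illustrative only, not the repo's actual implementation):

```python
import numpy as np

def pack_q5(codes: np.ndarray) -> bytes:
    """Pack 5-bit codes (values 0..31) into bytes: 8 codes -> 5 bytes."""
    assert codes.size % 8 == 0
    # unpackbits gives 8 bits per code (MSB first); keep only the low 5 bits
    bits = np.unpackbits(codes.astype(np.uint8).reshape(-1, 1), axis=1)[:, 3:]
    return np.packbits(bits.reshape(-1)).tobytes()

def unpack_q5(data: bytes, n: int) -> np.ndarray:
    """Inverse of pack_q5: recover n 5-bit codes from the packed byte stream."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[: n * 5].reshape(n, 5)
    # re-pad each 5-bit group to a full byte before repacking
    padded = np.concatenate([np.zeros((n, 3), dtype=np.uint8), bits], axis=1)
    return np.packbits(padded, axis=1).reshape(-1)

codes = np.array([3, 31, 0, 16, 7, 20, 9, 1], dtype=np.uint8)
packed = pack_q5(codes)          # 8 codes -> 5 bytes, i.e. 62.5% of int8 storage
```

The 62.5% figure above is just 5/8: five payload bits per code instead of eight.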

Paper: arXiv:2603.29078

What's different from the original Wan2.2-Animate-14B?

Same model, smaller download:

| Component | Original | PolarQuant Q5 |
|---|---|---|
| Transformer | 34.5 GB | 17.6 GB |
| VAE | 508 MB | 508 MB (unchanged) |
| T5 encoder | 11.4 GB | 11.4 GB (unchanged) |
| Total | ~72 GB | ~55 GB |
| Quality (cos_sim) | 1.000 | >0.999 |
  • The transformer weights are compressed to 5-bit codes (PQ5)
  • VAE and text encoders are kept in BF16 (quality-sensitive)
  • cos_sim >0.999 means the dequanted weights are virtually identical to the original
  • You get the same video quality with a smaller download
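
If you want to verify the cos_sim number yourself, it's just the cosine similarity between each original weight tensor and its dequanted counterpart, flattened to vectors. A minimal PyTorch sketch (loading the two weight files is up to you):

```python
import torch

def cosine_similarity_flat(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two weight tensors, flattened to 1-D vectors."""
    a = a.flatten().float()
    b = b.flatten().float()
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Illustration: a tiny perturbation (standing in for quantization error)
# leaves the similarity very close to 1.0
w = torch.randn(1024)
w_hat = w + 1e-3 * torch.randn(1024)
sim = cosine_similarity_flat(w, w_hat)   # close to 1.0
```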

How to use

You need to dequant the PQ5 codes back to BF16 before inference:

  1. Download this repo (55 GB vs 72 GB)
  2. Dequant the transformer codes to BF16 safetensors
  3. Use the standard Wan2.2-Animate inference pipeline
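
Conceptually, the dequant in step 2 is a codebook lookup followed by undoing the Walsh-Hadamard rotation. A minimal NumPy sketch, assuming hypothetical `codes`, `centroids`, and `scale` names (the repo's actual tensor layout and per-block scaling may differ):

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Normalized Walsh-Hadamard matrix (n a power of 2); orthogonal: H @ H.T = I."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def dequant_block(codes: np.ndarray, centroids: np.ndarray, scale: float) -> np.ndarray:
    """Map 5-bit codes through the 32-entry codebook, then undo the rotation."""
    H = hadamard(codes.shape[-1])
    x_rot = centroids[codes] * scale   # Lloyd-Max codebook lookup, rotated space
    return x_rot @ H.T                 # inverse rotation (H orthogonal, so inverse = transpose)
```

Because the normalized Hadamard matrix is orthogonal, the rotation is exactly invertible; the only loss comes from snapping values to the 32 centroids.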

The video output will be identical to the original model; the quantization is visually lossless at cos_sim >0.999.

We have 49+ models quantized with PolarQuant across LLMs, video, and image generation; see the "all models" collection.

Does that mean processing time is going to be slower? Say for a 30-minute render, does this PolarQuant Animate model cut the time?
Could you share a workable ComfyUI workflow? You said it needs to be dequanted to BF16 safetensors, and I have no clue about code stuff. Could you explain how to do that, or simply share a template workflow for Comfy? Thanks so much for your work, I'm impatient to test it out and feed back results.

Hi @Perfs, let me be direct about what this repo gives you and what it doesn't.

Processing time

No, it does not speed up rendering. PolarQuant Q5 is only about download size, not inference speed.

The flow is:

  1. Download ~55 GB (vs ~72 GB original)
  2. One-time dequant: codes → BF16 safetensors
  3. Inference runs at the same speed as the original model

So a 30-minute render stays a 30-minute render. The only upside is saving 17 GB of disk/bandwidth on the initial download.

Honest take for a ComfyUI workflow

This model is not plug-and-play in ComfyUI right now. The dequant step is Python code; there's no custom ComfyUI node for it. Until someone wraps it, the realistic path is:

  1. Download my Q5 repo (55 GB)
  2. Run a Python dequant script → produces transformer.safetensors (BF16)
  3. Point your standard Wan2.2-Animate ComfyUI workflow at the dequanted transformer

If you're not comfortable with Python, the original Wan2.2-Animate-14B is probably the simpler path. It works with existing ComfyUI workflows out of the box. The only reason to use my Q5 repo is disk/bandwidth savings on the initial download; inference speed and output quality are the same.

I can share a dequant script if you want to go that route; just let me know your OS (Linux/Mac/Windows) and whether you have CUDA. But I want to set expectations honestly: this isn't a faster version, it's a smaller download.
