Naming notice (2026-04-10). The "PolarQuant" technique used in this model is being rebranded to HLWQ (Hadamard-Lloyd Weight Quantization). The change is in name only; the algorithm and the weights in this repository are unchanged.

The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant (Han et al., arXiv:2502.02617, 2025). HLWQ (formerly PolarQuant) addresses weight quantization with a deterministic Walsh-Hadamard rotation and a Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses KV cache quantization with a random polar rotation. The two methods are technically distinct.
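
For intuition only, here is a toy NumPy sketch of the two named ingredients: a deterministic (Sylvester-constructed) Walsh-Hadamard rotation followed by a Lloyd-Max scalar codebook with 2^5 = 32 levels. This is an illustrative sketch, not the repository's implementation; sizes and names are made up.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an orthonormal Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max(x, levels=32, iters=25):
    """1-D Lloyd-Max quantizer: alternate nearest-center assignment and centroid update."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, levels))  # spread init over the data
    for _ in range(iters):
        idx = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):                 # skip empty cells
                centers[k] = x[idx == k].mean()
    idx = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)   # stand-in weight matrix

H = hadamard(64)
W_rot = H @ W                                    # rotate to spread outliers
centers, idx = lloyd_max(W_rot.ravel())          # 32 levels -> 5-bit indices
W_hat = H.T @ centers[idx].reshape(W_rot.shape)  # dequantize, rotate back

# Cosine similarity between original and reconstruction; close to 1.0 for a good fit
cos = float((W * W_hat).sum() / (np.linalg.norm(W) * np.linalg.norm(W_hat)))
```

Because the Hadamard rotation is orthonormal, it is inverted exactly by its transpose, so all reconstruction error comes from the scalar codebook.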

Existing code that loads this repository by ID continues to work without changes. Future model uploads will use the HLWQ name.

Reference paper for this technique: arXiv:2603.29078 (v2 in preparation; v1 still uses the old name).

CogVideoX-5b-I2V — HLWQ Q5 (Bit-Packed)

PQ5-quantized CogVideoX-5b Image-to-Video — 19k downloads/month.

21.6 GB → 8.7 GB (-60%) | cos_sim 0.9986 | 422 layers quantized
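
The cos_sim figure is presumably the cosine similarity between each original weight tensor and its dequantized reconstruction, flattened to vectors; for reference, that metric is:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened to 1-D vectors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value of 1.0 means the reconstruction points in exactly the same direction as the original; 0.9986 indicates a small relative error after dequantization.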


| Component               | Original | PQ5 Packed | Reduction |
|-------------------------|----------|------------|-----------|
| Transformer (42 layers) | 11.3 GB  | 3.9 GB     | -65%      |
| T5 Text Encoder         | 9.5 GB   | 3.1 GB     | -67%      |
| VAE                     | 0.9 GB   | 0.9 GB     | BF16      |
| Total                   | 21.6 GB  | 8.7 GB     | -60%      |
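
On the "Bit-Packed" part of the name: a 5-bit code occupies 5/16 of a BF16 weight, so packed index shards alone would shrink quantized tensors by roughly 69%; the somewhat smaller per-component reductions above are consistent with the extra BF16 side shards listed under Files. Below is a minimal NumPy sketch of one possible 5-bit packing (MSB-first layout is an assumption; the actual shard format in this repository may differ):

```python
import numpy as np

def pack5(codes: np.ndarray) -> np.ndarray:
    """Pack uint8 values < 32 into a dense 5-bits-per-entry byte stream."""
    bits = np.unpackbits(codes[:, None], axis=1)[:, 3:]  # keep low 5 of 8 MSB-first bits
    return np.packbits(bits.ravel())                     # trailing bits zero-padded

def unpack5(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack5 for the first n entries."""
    bits = np.unpackbits(packed)[: n * 5].reshape(n, 5)
    pad = np.zeros((n, 3), dtype=np.uint8)               # restore the 3 dropped high bits
    return np.packbits(np.concatenate([pad, bits], axis=1), axis=1).ravel()

codes = np.arange(32, dtype=np.uint8)   # one of each 5-bit index
packed = pack5(codes)                   # 32 entries * 5 bits = 160 bits = 20 bytes
```

Packing cuts index storage from one byte per weight to five bits, i.e. 20 bytes instead of 32 for the demo array above.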

Quick Start

```bash
# 1. Install dependencies
pip install safetensors huggingface_hub scipy diffusers transformers accelerate

# 2. Download & set up (8.7 GB)
git clone https://huggingface.co/caiovicentino1/CogVideoX-5b-I2V-HLWQ-Q5 ./CogVideoX-PQ5
cd CogVideoX-PQ5 && python setup.py

# 3. Generate video from image
python generate_cogvideo.py --image photo.jpg --prompt "A dog running in a park"
```

Diffusers Integration

```python
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import load_image, export_to_video
import torch

# After running setup.py:
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "./CogVideoX-PQ5", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("photo.jpg")
video = pipe(image=image, prompt="A cat playing", num_frames=49).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Architecture

  • CogVideoX 3D Transformer (5B params)
  • 42 layers, 48 heads, head_dim=64
  • Image-to-video generation (image conditioning)
  • 49 frames, rotary + learned positional embeddings
  • T5 text encoder (4096-dim embeddings)
  • Diffusers native (CogVideoXImageToVideoPipeline)

Hardware

| GPU      | VRAM  | Status              |
|----------|-------|---------------------|
| RTX 4090 | 24 GB | Fits after dequant  |
| RTX 3090 | 24 GB | Fits after dequant  |
| A100     | 40 GB | Comfortable         |
| T4       | 16 GB | With CPU offloading |

Files

CogVideoX-5b-I2V-HLWQ-Q5/
├── setup.py                    # One-command setup
├── generate_cogvideo.py        # Easy generation wrapper
├── polarquant/
│   ├── codes/ (5 shards, 6.1 GB total)
│   └── bf16/ (5 shards, 1.7 GB total)
├── vae/ (0.9 GB, BF16)
├── transformer/config.json
├── text_encoder/config.json
├── tokenizer/
└── scheduler/


21.6 GB → 8.7 GB with cos_sim 0.9986. Quantized with HLWQ.
