GitHub | Habr article | Project Page | Technical Report (soon)

KVAE 2.0: Video tokenizers

The KVAE-3D-2.0-t4s8 model compresses video by a factor of 4 in time and 8x8 in space.
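To make the 4x8x8 ratio concrete, here is a minimal sketch of the latent grid size it implies for a given clip. The function name `latent_grid` is hypothetical, and the sketch assumes plain division along each axis. The actual model may treat the first frame separately or pad its inputs, as some causal video autoencoders do, and the number of latent channels is not stated here, so neither is modeled.

```python
def latent_grid(frames, height, width, t_ratio=4, s_ratio=8):
    """Return (frames, height, width) of the latent grid.

    Assumption: each axis is simply divided by its compression ratio.
    This is an illustration, not the model's confirmed behaviour.
    """
    if frames % t_ratio:
        raise ValueError("frame count must be divisible by the temporal ratio")
    if height % s_ratio or width % s_ratio:
        raise ValueError("height and width must be divisible by the spatial ratio")
    return frames // t_ratio, height // s_ratio, width // s_ratio


# A 16-frame 1280x720 clip under the plain-division assumption.
print(latent_grid(16, 720, 1280))  # (4, 90, 160)
```

Under this assumption each latent position covers 4 * 8 * 8 = 256 input pixel positions.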

Evaluation of reconstruction

Reconstruction quality was evaluated on two open datasets: MCL-JCV (video at 1280x720 resolution) and BVI-DVC. Wan-2.1 and HunyuanVideo-1.0 served as baselines for the same 4x8x8 compression format. The comparison below reports PSNR, SSIM, and LPIPS (computed with AlexNet features).
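As an illustration of the simplest of these metrics, the sketch below computes PSNR between a reference clip and its reconstruction using the standard definition, 10 * log10(MAX^2 / MSE). It is not the evaluation code behind the reported results. The function name `psnr`, the 8-bit value range, and the averaging of the error over the whole clip (rather than per frame) are assumptions.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB over a whole array.

    Assumption: pixel values lie in [0, max_value] and the mean squared
    error is taken over all frames, pixels, and channels at once.
    """
    # Convert to float first so uint8 subtraction cannot wrap around.
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("inputs must have the same shape")
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy clip: 2 frames of 4x4 RGB, reconstruction off by 5 everywhere.
ref = np.zeros((2, 4, 4, 3), dtype=np.uint8)
rec = ref + 5
print(round(psnr(ref, rec), 2))
```

Higher PSNR means a closer reconstruction. SSIM and LPIPS need dedicated implementations and are not sketched here.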

Reconstruction comparison of KVAE 2.0, Hunyuan 1.0 and Wan 2.1

KVAE 2.0 is a family of video tokenizers with a temporal compression ratio of 4 and spatial compression ratios of 8 and 16.