Github | Habr article | Project Page | Technical Report (soon)

KVAE 2.0: Video tokenizers

The KVAE-3D-2.0-t4s8 model compresses video by a factor of 4 in time and 8x8 in space.
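The 4x8x8 factors determine the latent grid size for a given input. The sketch below shows that arithmetic. This card does not say how KVAE 2.0 handles the first frame, so the function supports two conventions. The "causal" mode keeps frame 0 on its own and compresses the remaining frames in groups of 4, giving 1 + (T - 1) / 4 latent frames. This is how the Wan 2.1 and HunyuanVideo VAEs count frames. The plain mode simply divides T by 4. Treat both as assumptions about KVAE. The helper name `latent_shape` is hypothetical, and channel count is ignored.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 4, s_factor: int = 8,
                 causal_first_frame: bool = True) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video.

    Hypothetical helper: assumes exact divisibility and ignores channels.
    causal_first_frame=True assumes the (T - 1) / t_factor + 1 convention;
    whether KVAE 2.0 uses it is an assumption, not stated in the card.
    """
    if height % s_factor or width % s_factor:
        raise ValueError("height and width must be divisible by s_factor")
    if causal_first_frame:
        if (frames - 1) % t_factor:
            raise ValueError("frames must equal t_factor * n + 1")
        latent_t = 1 + (frames - 1) // t_factor
    else:
        if frames % t_factor:
            raise ValueError("frames must be divisible by t_factor")
        latent_t = frames // t_factor
    return latent_t, height // s_factor, width // s_factor


# A 17-frame 1280x720 clip (the MCL-JCV resolution):
print(latent_shape(17, 720, 1280))                            # (5, 90, 160)
print(latent_shape(16, 720, 1280, causal_first_frame=False))  # (4, 90, 160)
```

In both modes, each latent position covers about 4 * 8 * 8 = 256 pixel positions of the input video.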

Evaluation of reconstruction

Reconstruction quality was measured on two open datasets: MCL-JCV (1280x720 videos) and BVI-DVC. The baselines are Wan-2.1 and HunyuanVideo-1.0, which use the same 4x8x8 compression. The comparison below reports PSNR, SSIM, and LPIPS, with LPIPS computed from AlexNet features.
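The card does not say how PSNR was aggregated over frames and clips. The sketch below assumes one common choice: compute PSNR for each frame, then average across frames. Inputs are videos shaped (T, H, W, C) with pixel values in [0, 255]. The function name `video_psnr` and the averaging scheme are illustrative assumptions, not the evaluation code used for the results.

```python
import numpy as np


def video_psnr(reference: np.ndarray, reconstruction: np.ndarray,
               max_value: float = 255.0) -> float:
    """Mean per-frame PSNR (dB) for two videos shaped (T, H, W, C).

    Assumption: frames are scored independently and then averaged;
    other evaluations compute a single MSE over the whole clip instead.
    """
    if reference.shape != reconstruction.shape:
        raise ValueError("reference and reconstruction shapes differ")
    ref = reference.astype(np.float64)  # avoid uint8 overflow in the difference
    rec = reconstruction.astype(np.float64)
    # Mean squared error per frame: flatten everything except the time axis.
    mse = ((ref - rec) ** 2).reshape(ref.shape[0], -1).mean(axis=1)
    mse = np.maximum(mse, 1e-12)  # identical frames would otherwise give log(0)
    psnr_per_frame = 10.0 * np.log10(max_value ** 2 / mse)
    return float(psnr_per_frame.mean())


original = np.zeros((2, 4, 4, 3))
decoded = original + 1.0  # every pixel off by 1 -> MSE = 1
print(round(video_psnr(original, decoded), 2))  # 48.13
```

For the LPIPS column, the `lpips` Python package provides `lpips.LPIPS(net='alex')` for AlexNet features. It expects image tensors scaled to [-1, 1].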

Reconstruction comparison of KVAE 2.0, Hunyuan 1.0 and Wan 2.1


Collection including kandinskylab/KVAE-3D-2.0-t4s8