This is an INT8 symmetric per-channel quantization of PAIR/StreamingSVD, the model released by Picsart AI Research (PAIR).
StreamingSVD: Consistent, Dynamic, and Extendable Image-Guided Long Video Generation
Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
[arXiv 2403.14773] · [Project page] · [Code]
| | Size | Dtype |
|---|---|---|
| Original | ~12.5 GB | FP32 |
| This repo | ~3.2 GB | INT8 (norms, biases, embeddings in FP32) |
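As a rough consistency check on the table: an FP32 weight takes 4 bytes and an INT8 weight takes 1 byte, so quantizing everything would shrink ~12.5 GB to about 3.1 GB. The per-channel scales and the tensors kept in FP32 plausibly make up the remaining difference. The sketch below, assuming only PyTorch, sums raw tensor bytes per dtype so the split can be checked on the real checkpoint with `bytes_by_dtype(load_file("model.int8.safetensors"))`. The toy state dict and its names (`proj.weight` and so on) are hypothetical, and storing the scales as FP32 is an assumption.

```python
import torch


def bytes_by_dtype(tensors):
    """Sum raw tensor payload (numel * element_size) per dtype.

    File headers add a little on top, so this slightly underestimates file size.
    """
    totals = {}
    for t in tensors.values():
        key = str(t.dtype)
        totals[key] = totals.get(key, 0) + t.numel() * t.element_size()
    return totals


# Hypothetical toy state dict: one INT8 weight with its per-channel scale
# (assumed FP32 here), plus a bias kept in FP32.
toy = {
    "proj.weight": torch.zeros(320, 1280, dtype=torch.int8),
    "proj.weight.scale": torch.ones(320, dtype=torch.float32),
    "proj.bias": torch.zeros(320, dtype=torch.float32),
}
print(bytes_by_dtype(toy))
```

On a real checkpoint, the INT8 entry should dominate if the large weights were quantized as described.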
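Large weight tensors are quantized to INT8 with symmetric per-channel scaling: each quantized tensor `name` is stored as INT8 values next to a companion `name.scale` tensor holding one scale per output channel (dimension 0), and the original weight is recovered approximately as `int8_values * scale`. Norms, biases, and positional embeddings are kept in FP32 for accuracy.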
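For readers unfamiliar with the scheme, here is a minimal sketch of symmetric per-channel INT8 quantization in PyTorch. It assumes the common max-abs rule (one scale per output channel, `scale = max|w| / 127`, values rounded and clamped to [-127, 127]); the exact rule used to produce this repo's checkpoint is not documented here, and the function names are hypothetical.

```python
import torch


def quantize_per_channel_int8(w: torch.Tensor):
    """Symmetric INT8 quantization with one scale per output channel (dim 0).

    Assumes the max-abs rule: scale = max|w| / 127 for each channel.
    """
    # Reduce |w| over every dim except 0 to get one maximum per output channel.
    flat = w.detach().float().reshape(w.shape[0], -1)
    max_abs = flat.abs().amax(dim=1)
    # Guard all-zero channels so we never divide by zero.
    scale = torch.where(max_abs > 0, max_abs / 127.0, torch.ones_like(max_abs))
    shape = [-1] + [1] * (w.dim() - 1)
    q = torch.round(w.float() / scale.view(shape)).clamp(-127, 127).to(torch.int8)
    return q, scale


def dequantize_per_channel_int8(q: torch.Tensor, scale: torch.Tensor):
    # Broadcast one scale per output channel across the remaining dims.
    shape = [-1] + [1] * (q.dim() - 1)
    return q.float() * scale.view(shape)


torch.manual_seed(0)
w = torch.randn(8, 4, 3, 3)  # e.g. a conv kernel: (out_channels, in_channels, kH, kW)
q, scale = quantize_per_channel_int8(w)
w_hat = dequantize_per_channel_int8(q, scale)
# Per-channel worst-case absolute error after the round trip.
max_err = (w - w_hat).abs().amax(dim=(1, 2, 3))
print(bool((max_err <= scale / 2 + 1e-6).all()))
```

Because values are rounded to the nearest integer step, every dequantized element stays within half a quantization step (`scale / 2`) of the original; the final line checks that bound. Using 127 rather than 128 keeps the integer range symmetric around zero.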
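To load the checkpoint and dequantize the weights back to FP32:

```python
from safetensors.torch import load_file
import torch

# Load every tensor in the checkpoint: INT8 weights, their scales, and FP32 extras.
tensors = load_file("model.int8.safetensors")

weights = {}
for k, v in tensors.items():
    # Scale tensors are consumed together with their weight below.
    if k.endswith(".scale"):
        continue
    scale_key = k + ".scale"
    if scale_key in tensors:
        # Dequantize: broadcast one scale per output channel (dim 0) over the other dims.
        scale = tensors[scale_key].float()
        shape = [-1] + [1] * (v.dim() - 1)
        weights[k] = v.float() * scale.view(shape)
    else:
        # Norms, biases, and positional embeddings were stored unquantized; keep as-is.
        weights[k] = v
```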
```bibtex
@article{henschel2024streamingt2v,
  title={StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text},
  author={Henschel, Roberto and Khachatryan, Levon and Hayrapetyan, Daniil and Poghosyan, Hayk
          and Tadevosyan, Vahram and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
  journal={arXiv preprint arXiv:2403.14773},
  year={2024}
}
```