Remaster DRUNet -- Video Enhancement Models

Tiny neural networks that remove compression artifacts AND recover detail from video at native resolution, faster than real-time on a laptop GPU.

Both models are DRUNet (UNetRes) -- pure Conv+ReLU residual U-Nets with no attention, no normalization layers, no dynamic operations. 100% compatible with TensorRT INT8 quantization and CUDA graph capture.

Models

	Student	Teacher
File	`drunet_student.pth`	`drunet_teacher.pth`
Parameters	1.06M	32.6M
Architecture	nc=[16,32,64,128] nb=2	nc=[64,128,256,512] nb=4
Quality (PSNR)	49.98 dB	53.27 dB
Sharpness	~100% of original	107% of original
Speed (RTX 3060)	57 fps C++ pipeline / 63 fps TRT FP16 / 64 fps TRT INT8	~5 fps
VRAM	~500 MB	~2 GB
Checkpoint size	4 MB	125 MB
Use case	Deployment / real-time	Quality reference / training

Also included: drunet_student.onnx -- ONNX export with dynamic spatial dimensions for TensorRT engine building.
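The checkpoint sizes and speed figures in the table can be sanity-checked with simple arithmetic. The sketch below assumes weights are stored as 32-bit floats (4 bytes per parameter) and takes a 24 fps source as the real-time target. Neither assumption is stated in the table, and the helper names are illustrative only.

```python
def fp32_size_mib(num_params: float) -> float:
    """Approximate checkpoint size in MiB, assuming 4 bytes per parameter."""
    return num_params * 4 / 2**20


def frame_time_ms(fps: float) -> float:
    """Time available (or spent) per frame at a given frame rate."""
    return 1000.0 / fps


student_params, teacher_params = 1.06e6, 32.6e6
print(f"student checkpoint: {fp32_size_mib(student_params):.1f} MiB")
print(f"teacher checkpoint: {fp32_size_mib(teacher_params):.1f} MiB")
print(f"parameter ratio:    {teacher_params / student_params:.1f}x")
print(f"student frame time: {frame_time_ms(57):.1f} ms "
      f"(24 fps budget: {frame_time_ms(24):.1f} ms)")
```

Under these assumptions the sizes come out at about 4.0 MiB and 124.4 MiB, in line with the 4 MB and 125 MB rows, and the student's 57 fps corresponds to roughly 17.5 ms per frame against a 41.7 ms budget at 24 fps.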

How It Works

A large teacher model learns to enhance video using perceptual and pixel-level losses against high-quality targets. A tiny student model, with roughly 30x fewer parameters, then learns to replicate the teacher's output through knowledge distillation with feature matching, and runs more than 10x faster on the same GPU (see the Models table).

Training

Targets: SCUNet GAN (perceptual denoiser) + Unsharp Mask -- denoise AND sharpen in one pass
Losses: Charbonnier pixel + DISTS perceptual (teacher), Charbonnier + feature matching (student)
Optimizer: Prodigy (auto-tuned learning rate)
Data: ~7K paired frames from diverse 1080p content (live action, animation, film)
Fine-tuning: Full 1920x1080 frames to handle edge artifacts and letterboxing
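The student objective listed above (Charbonnier + feature matching) could look roughly like the following PyTorch sketch. This is not the project's training code: the epsilon, the L1 feature distance, the 1x1 projection that bridges the different channel counts, and the 0.1 weight are all assumptions chosen for illustration, and every name here is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def charbonnier(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Smooth L1-like pixel loss: mean of sqrt(diff^2 + eps^2)."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()


class FeatureMatch(nn.Module):
    """Projects student features to the teacher's channel count, then compares them."""

    def __init__(self, student_ch: int, teacher_ch: int):
        super().__init__()
        # 1x1 conv adapter (assumption): needed because e.g. 16 != 64 channels
        self.proj = nn.Conv2d(student_ch, teacher_ch, kernel_size=1, bias=False)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # The teacher is frozen during distillation, so detach its features
        return F.l1_loss(self.proj(f_student), f_teacher.detach())


def student_loss(out_s, out_t, feats_s, feats_t, matchers, fm_weight: float = 0.1):
    """Pixel loss against the teacher output plus weighted feature-matching terms."""
    pixel = charbonnier(out_s, out_t.detach())
    fm = sum(m(fs, ft) for m, fs, ft in zip(matchers, feats_s, feats_t))
    return pixel + fm_weight * fm
```

In practice `feats_s` and `feats_t` would be intermediate activations taken at matching U-Net levels, for example via forward hooks, with one `FeatureMatch` module per level.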

Key Finding

Mixed training data (HEVC artifact removal + synthetic edge-aware blur) produces a model that generalizes beyond its training tasks -- it denoises AND sharpens, often exceeding the quality of the original Blu-ray source material.

Usage

Python Inference

import torch
import sys
sys.path.insert(0, "/path/to/KAIR")  # github.com/cszn/KAIR
from models.network_unet import UNetRes

# Student model (fast, deployment)
model = UNetRes(in_nc=3, out_nc=3, nc=[16, 32, 64, 128], nb=2,
                act_mode='R', bias=False)
ckpt = torch.load("drunet_student.pth", map_location="cpu", weights_only=True)
model.load_state_dict(ckpt["params"])
model.eval().half().cuda()

# Inference: input_tensor is a [0, 1] float tensor in NCHW layout, already on the GPU
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(input_tensor)
output = output.clamp(0, 1)
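The snippet above expects a [0, 1] float tensor in NCHW layout, while decoded video frames usually arrive as HWC uint8 arrays. A minimal conversion sketch follows, assuming RGB channel order and NumPy frames; the helper names `frame_to_tensor` and `tensor_to_frame` are hypothetical and not part of this repository.

```python
import numpy as np
import torch


def frame_to_tensor(frame_rgb: np.ndarray) -> torch.Tensor:
    """HWC uint8 RGB frame -> 1x3xHxW float32 tensor in [0, 1]."""
    t = torch.from_numpy(np.ascontiguousarray(frame_rgb))
    return t.permute(2, 0, 1).unsqueeze(0).float().div(255.0)


def tensor_to_frame(t: torch.Tensor) -> np.ndarray:
    """1x3xHxW tensor in [0, 1] -> HWC uint8 RGB frame."""
    t = t.detach().float().clamp(0, 1).mul(255.0).round()
    return t.squeeze(0).permute(1, 2, 0).to(torch.uint8).cpu().numpy()
```

With these helpers, `input_tensor = frame_to_tensor(frame).cuda()` prepares a frame for the model and `tensor_to_frame(output)` converts the result back for display or encoding.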

Teacher Model

# Same architecture, larger config (reuses the imports from the student example)
model = UNetRes(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4,
                act_mode='R', bias=False)
ckpt = torch.load("drunet_teacher.pth", map_location="cpu", weights_only=True)
model.load_state_dict(ckpt["params"])
model.eval().half().cuda()  # same inference setup as the student

ONNX / TensorRT

# Build TensorRT FP16 engine from ONNX (one-time, ~2 min)
trtexec --onnx=drunet_student.onnx \
    --shapes=input:1x3x1080x1920 --fp16 --useCudaGraph \
    --saveEngine=drunet_student_1080p_fp16.engine

# INT8 quantization (requires calibration data)
trtexec --onnx=drunet_student.onnx \
    --shapes=input:1x3x1080x1920 --int8 --fp16 --useCudaGraph \
    --calib=calibration_data.bin \
    --saveEngine=drunet_student_1080p_int8.engine

VapourSynth Real-Time Playback

The ONNX model works with vs-mlrt for TensorRT inference inside VapourSynth:

import vapoursynth as vs
from vsmlrt import Backend, inference  # vsmlrt.py wrapper shipped with vs-mlrt

core = vs.core
clip = core.bs.VideoSource("input.mkv")
clip = core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s="709")
# Backend objects are defined in the vsmlrt wrapper, not on core.ort
clip = inference(clip, network_path="drunet_student.onnx",
                 backend=Backend.TRT(fp16=True))
clip = core.resize.Bicubic(clip, format=vs.YUV420P10, matrix_s="709")
clip.set_output()

Checkpoint Format

All .pth files use the format {"params": state_dict}. Load with:

state_dict = torch.load("model.pth", map_location="cpu", weights_only=True)["params"]

Architecture Details

DRUNet (UNetRes) from KAIR:

4-level encoder-decoder U-Net with skip connections
Conv 3x3 + ReLU activation (act_mode='R'), no bias
Residual blocks at each level (nb=2 for student, nb=4 for teacher)
Channel progression: student [16,32,64,128], teacher [64,128,256,512]
Input: 3-channel RGB [0,1], Output: 3-channel RGB [0,1]
Spatial dimensions must be divisible by 8 (3 downsampling levels)
No batch normalization, layer normalization, or attention -- pure CNN
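Because of the divisible-by-8 constraint above, frames with other dimensions (a cropped letterbox, for instance) need padding before inference and cropping afterwards. The sketch below only computes the padding amounts; the helper name `pad_to_multiple` is hypothetical, and padding on the bottom and right edges is one possible convention rather than the project's documented behavior.

```python
def pad_to_multiple(height: int, width: int, multiple: int = 8) -> tuple[int, int]:
    """Return (pad_bottom, pad_right) that make both dimensions divisible by `multiple`."""
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    return pad_h, pad_w


for h, w in [(1080, 1920), (720, 1280), (804, 1920)]:
    ph, pw = pad_to_multiple(h, w)
    print(f"{w}x{h}: pad bottom={ph}, right={pw} -> {w + pw}x{h + ph}")
```

Full 1920x1080 and 1280x720 frames need no padding, while a 1920x804 crop needs 4 extra rows. In PyTorch the padding itself could be applied with `torch.nn.functional.pad(x, (0, pw, 0, ph), mode="replicate")`, followed by slicing the output back to the original height and width.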

Requirements

PyTorch 2.0+
KAIR for UNetRes architecture definition
NVIDIA GPU with 1GB+ VRAM (student) or 4GB+ VRAM (teacher)
Optional: TensorRT 10+ for optimized inference, VapourSynth + vs-mlrt for video pipeline

License

MIT

Citation

@misc{remaster-drunet,
  title={Remaster DRUNet: Real-Time Video Enhancement via Teacher-Student Distillation},
  url={https://github.com/seantempesta/remaster},
  year={2026}
}

Acknowledgments

KAIR -- DRUNet/UNetRes architecture (MIT license)
SCUNet -- Training target generation
DISTS -- Perceptual loss function
vs-mlrt -- VapourSynth TensorRT inference
