πŸš€ SESR-M7 on AMD AI PC NPU

Bhardwaj et al. (2022) introduced the Super-Efficient Super Resolution (SESR) model to solve a classic computer vision problem: taking a low-resolution input image and producing a high-resolution output. SESR is based on a "linear overparameterization of CNNs and creates an efficient model architecture for [Single Image Super Resolution (SISR)]." One of the main design goals was computational efficiency. The official code can be found on GitHub: https://github.com/ARM-software/sesr.

This version of the model is the SESR-M7 (Small) variant; it has been converted from PyTorch format to ONNX and then quantized to INT8 to run on an AMD AI PC NPU with Ryzen AI software. The model operates on 512x512 tiles; because tiles can overlap across the image, the model accepts input of almost any size and produces a 2x super-resolution image as output (Figure 1).

Input image: assets/input_ice_climber_0844.png
Output image: assets/output_ice_climber_0844_x2.png

Figure 1: Ice climber image upscaled by 2x with SESR-M7 model running on AMD AI PC NPU. Source: DIV2K dataset (DIV2K_valid_LR_bicubic\X4\0844x4.png).
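The overlapping-tile scheme described above can be sketched in a few lines of NumPy. The overlap width and the simple averaging of overlapping regions are illustrative choices, not necessarily the repo's exact stitching scheme, and `run_tile` stands in for one model invocation:

```python
import numpy as np

def tile_origins(size: int, tile: int, step: int) -> list[int]:
    """Top-left coordinates of overlapping tiles covering one axis."""
    last = max(size - tile, 0)
    origins = list(range(0, last + 1, step))
    if origins[-1] != last:
        origins.append(last)  # final tile flush with the image edge
    return origins

def upscale_tiled(img: np.ndarray, run_tile, tile: int = 512,
                  overlap: int = 32, scale: int = 2) -> np.ndarray:
    """Run a per-tile model over overlapping tiles of `img` (H, W, C)
    and average the overlapping regions of the upscaled outputs."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float64)
    weight = np.zeros_like(out)
    step = tile - overlap
    for y in tile_origins(h, tile, step):
        for x in tile_origins(w, tile, step):
            up = run_tile(img[y:y + tile, x:x + tile])
            ys, xs = y * scale, x * scale
            out[ys:ys + up.shape[0], xs:xs + up.shape[1]] += up
            weight[ys:ys + up.shape[0], xs:xs + up.shape[1]] += 1.0
    return out / weight  # every pixel is covered by at least one tile
```

Because the final tile on each axis is pinned to the image edge, the whole image is covered even when its dimensions are not multiples of the tile size.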

| Model Details | Description |
| --- | --- |
| Person or organization developing model | Yixuan Liu (AMD), Hongwei Qin (AMD), Benjamin Consolvo (AMD) |
| Model date | January 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | The Γ—2 SESR was trained for "300 epochs using ADAM optimizer with a constant learning rate of 5Γ—10⁻⁴ and a batch size of 32 on DIV2K training set." The Γ—4 SESR model starts with the pretrained Γ—2 SESR model and replaces "the final layer of 5Γ—5Γ—fΓ—4 with a 5Γ—5Γ—fΓ—16 and then perform[s] the depth-to-space operation twice" (Bhardwaj et al., 2022). For more training details, refer to the paper. |
| Paper or other resource for more information | Bhardwaj, K., Milosavljevic, M., O'Neil, L., Gope, D., Matas, R., Chalfin, A., ... & Loh, D. (2022). Collapsible linear blocks for super-efficient super resolution. Proceedings of Machine Learning and Systems, 4, 529-547. |
| License | Apache 2.0 |
| Where to send questions or comments about the model | Community Tab and AMD Developer Community Discord |

⚑ Intended Use

| Intended Use | Description |
| --- | --- |
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. It has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to the violation of a human right. |

How to Use

πŸ“ Hardware Prerequisites

Before getting started, make sure you meet the minimum hardware and OS requirements:

| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
| --- | --- | --- | --- | --- | --- |
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | β˜‘οΈ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | β˜‘οΈ | |

Getting Started

  1. Follow the Ryzen AI SW Installation Instructions to download the necessary NPU drivers and Ryzen AI software. Allow around 30 minutes to install all of the necessary components of Ryzen AI SW.

  2. Activate the previously installed conda environment from Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path. Substitute the correct RAI version number for v.v.v, such as 1.7.0.

conda activate ryzen-ai-v.v.v
$Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/v.v.v/'
  3. Clone the Hugging Face model repository:
git clone https://hf.co/amd/sesr-m7-512x512-tiles-amdnpu

Alternatively, you can use the Hugging Face Hub API to download the files with Python:

from huggingface_hub import snapshot_download
snapshot_download("amd/sesr-m7-512x512-tiles-amdnpu")
  4. Install the necessary packages into the existing conda environment:
pip install -r requirements.txt
  5. Data preparation (optional, for evaluation). Download the EDSR benchmark dataset and extract it into the datasets/ directory. Note that you may need to run this script twice, as it has been observed to fail on the first attempt.
python download_edsr_benchmark.py
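Since the script may need a second run, a small wrapper (a hypothetical helper, not part of the repo) can retry it automatically:

```python
import subprocess
import sys

def run_with_retry(cmd: list[str], attempts: int = 2) -> bool:
    """Run a command up to `attempts` times, stopping at the first
    success. Returns True if any attempt exits with code 0."""
    for _ in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return True
    return False

# Example: retry the dataset download script once on failure.
# run_with_retry([sys.executable, "download_edsr_benchmark.py"])
```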

The datasets/ directory should look like this:

datasets/edsr_benchmark
β”œβ”€β”€ B100
β”‚   β”œβ”€β”€ HR
β”‚   β”‚   β”œβ”€β”€ 3096.png
β”‚   β”‚   └── ...
β”‚   └── LR_bicubic/X2
β”‚       β”œβ”€β”€ 3096x2.png
β”‚       └── ...
β”œβ”€β”€ Set5
└── ...
  6. Run inference on a single image or a folder of images. For example:
python onnx_inference.py --onnx onnx-models/sesr_nchw_int8_512x512.onnx --input datasets/edsr_benchmark/B100/HR/3096.png --out-dir outputs  --device npu

Arguments:

--onnx: The ONNX model file path.

--input: Accepts either a single image file path or a directory path. If it's a file, the script will process that image only. If it's a directory, the script will recursively scan for .png, .jpg, and .jpeg files and process all of them.

--out-dir: Output directory where the restored images will be saved.

--device: Accepts "npu" or "cpu". The NPU will attempt to use the VitisAIExecutionProvider; the CPU will attempt to use the CPUExecutionProvider. Note that to use the NPU, the updated NPU drivers and Ryzen AI SW must first be installed.
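The --device mapping can be sketched as a small helper that picks ONNX Runtime execution providers, falling back to CPU when the NPU provider is unavailable. The function name and the fallback behavior are illustrative; the repo's script may handle this differently:

```python
def select_providers(device: str, available: list[str]) -> list[str]:
    """Map a device choice ("npu" or "cpu") to an ONNX Runtime
    execution-provider list, keeping only providers the installed
    runtime actually exposes."""
    wanted = (["VitisAIExecutionProvider", "CPUExecutionProvider"]
              if device == "npu" else ["CPUExecutionProvider"])
    picked = [p for p in wanted if p in available]
    return picked or ["CPUExecutionProvider"]

# Usage (assumes the Ryzen AI build of onnxruntime is installed):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "onnx-models/sesr_nchw_int8_512x512.onnx",
#       providers=select_providers("npu", ort.get_available_providers()))
```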

The model has already been compiled and cached under modelcachekey_sesr_nchw_int8_512x512; if this folder is not present, the model will be recompiled before inference runs.

  7. Evaluate the accuracy of the model on benchmark datasets (optional).

Eval on Set14:

python onnx_eval.py --onnx onnx-models/sesr_nchw_int8_512x512.onnx --hq-dir datasets/edsr_benchmark/Set14/HR --lq-dir datasets/edsr_benchmark/Set14/LR_bicubic/X2 --out-dir outputs/Set14 --device npu -clean

Additional arguments for onnx_eval.py:

--max-samples: (Optional) Limit the number of samples to evaluate. Useful for debugging. If not specified, all samples will be evaluated.

-clean: (Optional) If specified, the generated super-resolution images will be deleted after evaluation to save disk space.

The output is a set of accuracy metrics (PSNR, MS-SSIM, SSIM, and FID) in a JSON file:

{
  "onnx": "onnx-models/sesr_nchw_int8_512x512.onnx",
  "psnr": 30.458898544311523,
  "ms_ssim": 0.9915697759583786,
  "ssim": 0.8916147694010581,
  "fid": 20.698477172553396
}
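The PSNR figure above is the standard 10Β·log10(MAXΒ²/MSE) in decibels, which takes a few lines of NumPy to compute. Note that super-resolution papers often measure PSNR on the luma (Y) channel only; this is a generic sketch, not necessarily the exact procedure in onnx_eval.py:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```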

The following are example scripts to run INT8 models for evaluation of the other datasets on the NPU:

Eval on B100:

python onnx_eval.py --onnx onnx-models/sesr_nchw_int8_512x512.onnx --hq-dir datasets/edsr_benchmark/B100/HR --lq-dir datasets/edsr_benchmark/B100/LR_bicubic/X2 --out-dir outputs/B100 --device npu -clean

Eval on Urban100

python onnx_eval.py --onnx onnx-models/sesr_nchw_int8_512x512.onnx --hq-dir datasets/edsr_benchmark/Urban100/HR --lq-dir datasets/edsr_benchmark/Urban100/LR_bicubic/X2 --out-dir outputs/Urban100 --device npu -clean

πŸ”§ Evaluation Data

Datasets: The AMD ONNX model results were evaluated with the EDSR datasets (Set14, BSD100, Urban100) on peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM), structural similarity (SSIM), and FrΓ©chet Inception Distance (FID). To draw direct comparisons to the original paper by Bhardwaj et al. (2022), we only show the results for PSNR and SSIM (Table 1).

πŸ“š Training Data

The Γ—2 SESR was trained for "300 epochs using ADAM optimizer with a constant learning rate of 5Γ—10⁻⁴ and a batch size of 32 on DIV2K training set" (Bhardwaj et al., 2022). DIV2K is a training set of 800 2K-resolution images for image restoration tasks.

πŸ“ Quantitative Analyses

The evaluation results for the AMD models for PSNR/SSIM (Table 1) show decent accuracy, even after quantization from FP32 to INT8.

| Regime | Model | Parameters | MACs | Set5 (↑) | Set14 (↑) | BSD100 (↑) | Urban100 (↑) | Manga109 (↑) | DIV2K (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Small | AMD-SESR-M7-INT8 | - | - | - | 30.46/0.8916 | 29.80/0.8737 | 28.25/0.8870 | - | - |
| | AMD-SESR-M7-FP32 | - | - | - | 30.98/0.9021 | 30.23/0.8845 | 28.84/0.9004 | - | - |
| | Bicubic | - | - | 33.68/0.9307 | 30.24/0.8693 | 29.56/0.8439 | 26.88/0.8408 | 30.82/0.9349 | 32.45/0.9043 |
| | FSRCNN (our setup) | 12.46K | 6.00G | 36.85/0.9561 | 32.47/0.9076 | 31.37/0.8891 | 29.43/0.8963 | 35.81/0.9689 | 34.73/0.9349 |
| | FSRCNN (Dong et al., 2016) | 12.46K | 6.00G | 36.98/0.9556 | 32.62/0.9087 | 31.50/0.8904 | 29.85/0.9009 | 36.62/0.9710 | 34.74/0.9340 |
| | MOREMNAS-C (Chu et al., 2020) | 25K | 5.5G | 37.06/0.9561 | 32.75/0.9094 | 31.50/0.8904 | 29.92/0.9023 | - | - |
| | SESR-M3 (f=16, m=3) | 8.91K | 2.05G | 37.21/0.9577 | 32.70/0.9100 | 31.56/0.8920 | 29.92/0.9034 | 36.47/0.9717 | 35.03/0.9373 |
| | SESR-M5 (f=16, m=5) | 13.52K | 3.11G | 37.39/0.9585 | 32.84/0.9115 | 31.70/0.8938 | 30.33/0.9087 | 37.07/0.9734 | 35.24/0.9389 |
| | SESR-M7 (f=16, m=7) | 18.12K | 4.17G | 37.47/0.9588 | 32.91/0.9118 | 31.77/0.8946 | 30.49/0.9105 | 37.14/0.9738 | 35.32/0.9395 |
| Medium | TPSR-NoGAN (Lee et al., 2020) | 60K | 14.0G | 37.38/0.9583 | 33.00/0.9123 | 31.75/0.8942 | 30.61/0.9119 | - | - |
| | SESR-M11 (f=16, m=11) | 27.34K | 6.30G | 37.58/0.9593 | 33.03/0.9128 | 31.85/0.8956 | 30.72/0.9136 | 37.40/0.9746 | 35.45/0.9404 |
| Large | VDSR (Kim et al., 2016) | 665K | 612.6G | 37.53/0.9587 | 33.05/0.9127 | 31.90/0.8960 | 30.77/0.9141 | 37.16/0.9740 | 35.43/0.9410 |
| | LapSRN (Lai et al., 2017) | 813K | 29.9G | 37.52/0.9590 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9100 | 37.53/0.9740 | 35.31/0.9400 |
| | BTSRN (Fan et al., 2017) | 410K | 207.7G | 37.75/- | 33.20/- | 32.05/- | 31.63/- | - | - |
| | CARN-M (Ahn et al., 2018) | 412K | 91.2G | 37.53/0.9583 | 33.26/0.9141 | 31.92/0.8960 | 31.23/0.9193 | - | - |
| | MOREMNAS-B (Chu et al., 2020) | 1118K | 256.9G | 37.58/0.9584 | 33.22/0.9135 | 31.91/0.8959 | 31.14/0.9175 | - | - |
| | SESR-XL (f=32, m=11) | 105.37K | 24.27G | 37.77/0.9601 | 33.24/0.9145 | 31.99/0.8976 | 31.16/0.9184 | 38.01/0.9759 | 35.67/0.9420 |

Table 1: "PSNR/SSIM results on Γ—2 super resolution on several benchmark datasets. MACs are reported as the number of multiply-adds needed to convert an image to 720p (1280Γ—720) resolution via Γ—2 SISR." Highlights indicate the best score within each regime. Table from Bhardwaj et al. (2022), with two AMD rows added for comparison.

The following are the performance results of five super-resolution models on the Strix NPU (Table 2).

Table 2: Performance metrics in frames per second (FPS) for AMD Super-Resolution models (higher is better).

βš“ Ethical Considerations

AMD is committed to conducting our business in a fair, ethical and honest manner and in compliance with all applicable laws, rules and regulations. You can find out more at the AMD Ethics and Compliance page.

⚠️ Caveats and Recommendations

The models here perform Γ—2 super-resolution (not Γ—4).

πŸ“Œ Citation Details

@article{bhardwaj2022collapsible,
  title={Collapsible linear blocks for super-efficient super resolution},
  author={Bhardwaj, Kartikeya and Milosavljevic, Milos and O'Neil, Liam and Gope, Dibakar and Matas, Ramon and Chalfin, Alex and Suda, Naveen and Meng, Lingchuan and Loh, Danny},
  journal={Proceedings of machine learning and systems},
  volume={4},
  pages={529--547},
  year={2022}
}