---
license: apache-2.0
datasets:
- lijiayangCS/StableI2I_Bench
language:
- en
base_model:
- Qwen/Qwen3-VL-8B-Instruct
pipeline_tag: image-text-to-text
---
# StableI2I-PLUS
Official score-supported implementation of **StableI2I: Spotting Unintended Changes in Image-to-Image Transition**
**ICML 2026**
> This HuggingFace repository provides the **score-supported checkpoint** of StableI2I.
> Compared with the paper checkpoint, this version supports explicit fidelity score prediction.
> Please note that the score-supported version may show a slight degradation in some fine-grained evaluation abilities, such as semantic / structure / low-level diagnosis accuracy.
> This model is associated with our paper: https://arxiv.org/abs/2605.04453
For the latest code, demo, inference scripts, and usage examples, please refer to the official GitHub repository:
https://github.com/Henry-Lee-real/StableI2I
For any questions, please contact us via email: **lijiayang.cs@gmail.com**
Looking forward to your ⭐!
---
## 📌 TODOs
- [x] Release code
- [x] Release checkpoint
- [x] Release score-supported checkpoint
- [ ] Release pip package
- [ ] Release arXiv version
- [ ] Release ICML camera-ready paper
- [ ] Release HuggingFace project page
---
## 🔥 News
- **StableI2I** has been accepted to **ICML 2026**.
- This repository hosts the **score-supported StableI2I checkpoint**.
- This version can directly output a fidelity score for image-to-image transitions.
- The latest codebase is maintained in the official GitHub repository.
---
## Core Concept
In most real-world image-to-image (I2I) scenarios, existing evaluations focus primarily on instruction following and on the perceptual quality or aesthetics of the generated image. However, they often fail to assess whether the output faithfully preserves the semantic content, spatial structure, and low-level appearance of the input image.
To address this limitation, we propose **StableI2I**, a unified and dynamic evaluation framework for measuring content fidelity and pre–post consistency in image-to-image transitions. StableI2I does not require reference images and can be applied to a wide range of I2I tasks, including image editing and image restoration.
StableI2I evaluates unintended changes from three complementary perspectives:
1. **Semantic Level**
Checks whether the output introduces unintended object-level or meaning-level changes, such as object addition, removal, replacement, or identity drift.
2. **Structure Level**
Checks whether the output preserves spatial layout and geometric consistency, including misalignment, deformation, repainting, and structural distortion.
3. **Low-level Appearance**
Checks whether the output introduces unintended visual degradation, such as blur, noise, color cast, exposure degradation, or artifacts.
In addition to these diagnostic outputs, this score-supported version can produce an explicit **fidelity score**, which provides a compact numerical assessment of the overall content consistency between the input and output images.
---
## Model Checkpoint
This HuggingFace repository provides the **score-supported checkpoint** of StableI2I.
Please note:
- This checkpoint supports explicit score output.
- The score is intended to summarize overall content fidelity and pre–post consistency.
- Compared with the original paper checkpoint, this version may have slightly weaker fine-grained diagnostic performance.
- If you prioritize the most accurate semantic / structure / low-level diagnosis, please consider using the paper checkpoint.
- If you need direct numerical scoring, please use this score-supported checkpoint.
- For the latest inference pipeline and API interface, please refer to the official GitHub repository.
---