Image-to-Image
Diffusers
Safetensors
Diffusion Single File
English
Flux2KleinPipeline
image-generation
image-editing
flux
Instructions to use internlm/ETCHR-FLUX.2-klein-9B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use internlm/ETCHR-FLUX.2-klein-9B with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline from diffusers.utils import load_image # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("internlm/ETCHR-FLUX.2-klein-9B", dtype=torch.bfloat16, device_map="cuda") prompt = "Turn this cat into a dog" input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png") image = pipe(image=input_image, prompt=prompt).images[0] - Diffusion Single File
How to use internlm/ETCHR-FLUX.2-klein-9B with Diffusion Single File:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
Update README.md
Browse files
README.md
CHANGED
|
@@ -2,4 +2,105 @@
|
|
| 2 |
license: other
|
| 3 |
license_name: flux-non-commercial-license
|
| 4 |
license_link: LICENSE
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
---
|
|
|
|
| 2 |
license: other
|
| 3 |
license_name: flux-non-commercial-license
|
| 4 |
license_link: LICENSE
|
| 5 |
+
language:
|
| 6 |
+
- en
|
| 7 |
+
base_model:
|
| 8 |
+
- FLUX.2-klein-base-9B
|
| 9 |
+
pipeline_tag: image-to-image
|
| 10 |
+
tags:
|
| 11 |
+
- image-generation
|
| 12 |
+
- image-editing
|
| 13 |
+
- flux
|
| 14 |
+
- diffusion-single-file
|
| 15 |
+
library_name: diffusers
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# ETCHR-FLUX.2-klein-9B
|
| 19 |
+
<p align="center" style="font-size: 1.2em; margin-top: 0.5em">
|
| 20 |
+
π<a href="https://arxiv.org/abs/">Paper</a>
|
| 21 |
+
| π <a href="https://github.com/InternLM/ETCHR">Homepage</a >
|
| 22 |
+
| π€<a href="https://huggingface.co/internlm/ETCHR-FLUX.2-klein-9B">ETCHR-FLUX.2-klein-9B Model</a >
|
| 23 |
+
| π€<a href="https://huggingface.co/datasets/internlm/ETCHR-SFT-400K">ETCHR SFT-400K Dataset</a >
|
| 24 |
+
| π€<a href="https://huggingface.co/datasets/internlm/ETCHR-GRPO-10K">ETCHR GRPO-10K Dataset</a >
|
| 25 |
+
| π€<a href="https://huggingface.co/datasets/internlm/DL3DV-2k">DL3DV-2K Benchmark</a >
|
| 26 |
+
</p >
|
| 27 |
+
ETCHR-FLUX.2-klein-9B is a novel question-conditioned, reasoning-aware image editor designed to serve as a decoupled visual reasoning assistant for Multimodal Large Language Models. By decoupling the specialized image editor from the downstream understanding model, ETCHR bridges the critical bottleneck where a purely textual chain of thought fails in fine-grained focus or complex spatial transformations.
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
## π’ News
|
| 31 |
+
- π [2026/05/22] We have released the training and evaluation code of ETCHR.
|
| 32 |
+
- π [2026/05/21] We have released the [ETCHR-FLUX.2-klein-9B Model](https://huggingface.co/internlm/ETCHR-FLUX.2-klein-9B), [ETCHR-SFT-400K Dataset](https://huggingface.co/datasets/internlm/ETCHR-SFT-400K) and [ETCHR GRPO-10K Dataset](https://huggingface.co/datasets/internlm/ETCHR-GRPO-10K).
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
## π Overview
|
| 36 |
+
We are thrilled to introduce ETCHR (Editing To Clarify and Harness Reasoning), a novel question-conditioned, reasoning-aware image editor built on [FLUX.2-klein-base-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-9B) designed to serve as a decoupled visual reasoning assistant for Multimodal Large Language Models (MLLMs).
|
| 37 |
+
By decoupling the specialized image editor from the downstream understanding model, ETCHR bridges the critical bottleneck where a purely textual chain of thought fails in fine-grained focus or complex spatial transformations.
|
| 38 |
+
|
| 39 |
+
</p>
|
| 40 |
+
<p style="text-align: center;">
|
| 41 |
+
<img src="assets/overview.png" alt="Teaser" width="100%">
|
| 42 |
+
</p>
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
|
| 46 |
+
## π‘ Highlights
|
| 47 |
+
- π₯ **Decoupled & Plug-and-Play:** ETCHR functions as a separate module, allowing it to assist diverse downstream MLLMs (such as Qwen3-VL-8B, Gemini-3.1-Flash-Lite, or Kimi K2.5) without requiring any task-specific fine-tuning on the understanding models themselves.
|
| 48 |
+
- π₯ **Naturally Reflective Pipeline:** Introduces an Edit-Verify-Reason inference mechanism where the understanding model filters out noisy or flawed edits, reverting safely to the original image when verification fails.
|
| 49 |
+
|
| 50 |
+
## π Results
|
| 51 |
+
We evaluate ETCHR across five distinct task families spanning fine-grained perception, chart understanding, logic reasoning, jigsaw restoration, and 3D understanding. Across all evaluated backbones, ETCHR consistently yields major improvements in Pass@1 accuracy:
|
| 52 |
+
<p style="text-align: center;">
|
| 53 |
+
<img src="assets/result.png" alt="Pipeline" width="100%">
|
| 54 |
+
</p>
|
| 55 |
+
|
| 56 |
+
|
| 57 |
+
## π οΈ Evaluation
|
| 58 |
+
Prepare your environment:
|
| 59 |
+
```bash
|
| 60 |
+
git clone https://github.com/InternLM/ETCHR.git
|
| 61 |
+
conda create -n ETCHR python==3.11
|
| 62 |
+
conda activate ETCHR
|
| 63 |
+
cd RL/Pref-GRPO
|
| 64 |
+
bash env_setup.sh fastvideo
|
| 65 |
+
pip install "vllm>=0.11.0"
|
| 66 |
+
pip install qwen-vl-utils==0.0.14
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
We Provide an example code running ETCHR on [DL3DV-2K Benchmark](https://huggingface.co/datasets/internlm/DL3DV-2k) in ```[Evaluation/inference_dl3dv.py](https://github.com/InternLM/ETCHR/blob/master/Evaluation/inference_dl3dv.py)```, you can start the evaluation with the following two steps:
|
| 70 |
+
|
| 71 |
+
**Step 1:** start a VLLM server for an understanding model (eg. Qwen3-VL-8B, Kimi K2.5, ...).
|
| 72 |
+
```bash
|
| 73 |
+
cd Evaluation
|
| 74 |
+
bash launch_vllm.sh
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
**Step 2:** Run ETCHR atop any understanding model
|
| 78 |
+
```bash
|
| 79 |
+
python inference_dl3dv.py
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
## Cases
|
| 83 |
+
ETCHR can assist with a broad spectrum of understanding tasks, including fine-grained perception, chart reasoning, maze navigation, jigsaw puzzles, and 3D spatial understanding.
|
| 84 |
+
|
| 85 |
+
<p style="text-align: center;">
|
| 86 |
+
<img src="assets/case-3D.png" alt="case3D" width="100%">
|
| 87 |
+
</p>
|
| 88 |
+
<p style="text-align: center;">
|
| 89 |
+
<img src="assets/case-jigsaw.png" alt="casejigsaw" width="100%">
|
| 90 |
+
</p>
|
| 91 |
+
<p style="text-align: center;">
|
| 92 |
+
<img src="assets/case-maze.png" alt="casejigsaw" width="100%">
|
| 93 |
+
</p>
|
| 94 |
+
<p style="text-align: center;">
|
| 95 |
+
<img src="assets/case-chart.png" alt="casejigsaw" width="100%">
|
| 96 |
+
</p>
|
| 97 |
+
|
| 98 |
+
|
| 99 |
+
## π License
|
| 100 |
+
Our work is based on [FLUX.2-klein-base-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-9B), so please follow [FLUX Non-Commercial License](https://github.com/black-forest-labs/flux2/blob/main/model_licenses/LICENSE-FLUX-NON-COMMERICAL).
|
| 101 |
+
|
| 102 |
+
## βοΈCitation
|
| 103 |
+
If you find this project useful, please kindly cite:
|
| 104 |
+
```
|
| 105 |
+
```
|
| 106 |
---
|