base_model: stabilityai/stable-diffusion-3.5-medium
library_name: peft
pipeline_tag: text-to-image
Flow-OPD: On-Policy Distillation for Flow Matching Models
Flow-OPD is the first unified post-training framework that integrates On-Policy Distillation (OPD) into Flow Matching (FM) models. This repository contains the student model weights (LoRA adapter) aligned using the Flow-OPD methodology, built upon Stable Diffusion 3.5 Medium.
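A minimal loading sketch, assuming the adapter targets the pipeline's transformer and is stored in standard PEFT format (the card's metadata lists `peft` as the library, but neither detail is confirmed here). The function name `load_flow_opd_pipeline` and the `adapter_id` argument are illustrative, and the adapter location is left as a placeholder because it is not stated in this card.

```python
def load_flow_opd_pipeline(adapter_id, device="cuda"):
    """Attach a PEFT LoRA adapter to the SD 3.5 Medium transformer (sketch)."""
    # Heavy imports stay local so this file can be imported without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline
    from peft import PeftModel

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-medium",
        torch_dtype=torch.bfloat16,
    )
    # Assumption: the adapter was trained on the transformer, not the text encoders.
    pipe.transformer = PeftModel.from_pretrained(pipe.transformer, adapter_id)
    # Fold the LoRA weights into the base weights for plain inference.
    pipe.transformer = pipe.transformer.merge_and_unload()
    return pipe.to(device)


# Example usage (adapter_id is a placeholder; use this repository's ID or a local path):
# pipe = load_flow_opd_pipeline("<adapter repo id or local path>")
# image = pipe(
#     "a storefront sign that reads 'OPEN'",
#     num_inference_steps=40,   # assumed sampling setting, not taken from this card
#     guidance_scale=4.5,       # assumed sampling setting, not taken from this card
# ).images[0]
# image.save("sample.png")
```

Merging the adapter is optional; keeping the `PeftModel` wrapper instead makes it possible to disable or swap the adapter later.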
Model Description
Flow-OPD addresses two critical bottlenecks in multi-task alignment for text-to-image models: reward sparsity and gradient interference. It adopts a two-stage strategy:
- Cold Start Initialization: Establishes a robust initial policy through SFT or Model Merging.
- Multi-Teacher On-Policy Distillation: Consolidates heterogeneous expertise into a single student through dense trajectory-level supervision from domain-specialized teachers (GenEval, OCR, DeQA, PickScore).
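The cold-start stage above mentions Model Merging as one way to build the initial policy. The sketch below shows the simplest variant, a weighted average of parameter dictionaries. Whether Flow-OPD uses uniform averaging or another merging scheme is not stated in this card, so treat `merge_state_dicts` and the toy parameter values as assumptions.

```python
import numpy as np


def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter dictionaries that share keys and shapes."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("weights must sum to 1")
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy "experts" with one 2x2 parameter each (hypothetical values).
geneval_expert = {"proj.weight": np.full((2, 2), 1.0)}
ocr_expert = {"proj.weight": np.full((2, 2), 3.0)}

student_init = merge_state_dicts([geneval_expert, ocr_expert])
```

With equal weights, each merged parameter is the element-wise mean of the experts' parameters, which gives the student a starting point between the specialists before distillation begins.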
The framework also introduces Manifold Anchor Regularization (MAR), which uses a task-agnostic teacher to anchor generation to a high-quality manifold, effectively mitigating aesthetic degradation.
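One plausible way to combine multi-teacher supervision with MAR is a weighted sum of velocity-matching errors plus an anchor term. The sketch below assumes a mean-squared-error objective, fixed per-teacher weights, and a scalar `mar_weight`; none of these specifics come from this card, and the actual Flow-OPD objective may differ.

```python
import numpy as np


def distillation_loss(v_student, v_teachers, teacher_weights, v_anchor, mar_weight):
    """Weighted MSE to each task teacher plus an MSE pull toward the anchor teacher."""
    task_term = sum(
        w * np.mean((v_student - v_t) ** 2)
        for w, v_t in zip(teacher_weights, v_teachers)
    )
    anchor_term = np.mean((v_student - v_anchor) ** 2)
    return task_term + mar_weight * anchor_term


rng = np.random.default_rng(0)
v_student = rng.normal(size=(4, 8))              # student velocities at 4 sampled states
v_teachers = [v_student + 0.1, v_student - 0.2]  # two toy task teachers
v_anchor = v_student.copy()                      # anchor that already agrees with the student

loss = distillation_loss(v_student, v_teachers, [0.5, 0.5], v_anchor, mar_weight=0.1)
```

In this toy setup the anchor term is zero, so the loss reduces to the weighted teacher errors; moving the student away from the anchor would add a penalty scaled by `mar_weight`.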
Key Results
Compared with the Stable Diffusion 3.5 Medium base model, Flow-OPD improves every reported metric and raises the average score from 0.72 to 0.90:
| Model | GenEval | OCR Acc. | DeQA | PickScore | Average |
|---|---|---|---|---|---|
| SD-3.5-M (base) | 0.63 | 0.59 | 4.07 | 21.64 | 0.72 |
| Flow-OPD (Merge Init) | 0.92 | 0.94 | 4.35 | 23.08 | 0.90 |
- ✨ +18-point gain in the average score over the base model (0.72 → 0.90).
- 📝 0.94 OCR accuracy (improved from 0.59).
- 🚀 0.92 GenEval score (improved from 0.63).
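The per-metric gains can be checked directly from the table. The snippet below only subtracts the reported values. The card does not state how the Average column is normalized across metrics with different scales, so that column is copied from the table rather than recomputed.

```python
# Values copied from the results table above.
base = {"GenEval": 0.63, "OCR Acc.": 0.59, "DeQA": 4.07, "PickScore": 21.64, "Average": 0.72}
flow_opd = {"GenEval": 0.92, "OCR Acc.": 0.94, "DeQA": 4.35, "PickScore": 23.08, "Average": 0.90}

# Absolute gain per metric, rounded to the table's two-decimal precision.
gains = {metric: round(flow_opd[metric] - base[metric], 2) for metric in base}

for metric, gain in gains.items():
    print(f"{metric}: +{gain:.2f}")
```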
Method Highlights
- On-Policy Sampling (SDE): Samples trajectories from the student's own policy with an SDE sampler, whose stochasticity provides exploration and trajectory diversity.
- Dense Trajectory Supervision: Replaces sparse scalar rewards with dense vector field supervision from multiple teachers.
- MAR Regularization: Anchors generation to a high-quality aesthetic manifold.
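The first two highlights can be illustrated with a toy rollout: the student generates a stochastic trajectory, and a teacher is then queried at every state the student visited. This is a generic Euler-Maruyama sketch under simplifying assumptions, not the sampler used by Flow-OPD. The time direction (0 to 1), the constant `noise_scale`, and both velocity functions are hypothetical stand-ins, and the drift correction that a marginal-preserving SDE would need is omitted.

```python
import numpy as np


def rollout(velocity_fn, x0, num_steps, noise_scale, rng):
    """Euler-Maruyama integration from t=0 to t=1; returns visited (t, x) pairs."""
    dt = 1.0 / num_steps
    x, visited = x0, []
    for i in range(num_steps):
        t = i * dt
        visited.append((t, x))
        noise = rng.normal(size=x.shape)
        # Deterministic drift step plus Gaussian noise scaled by sqrt(dt).
        x = x + velocity_fn(x, t) * dt + noise_scale * np.sqrt(dt) * noise
    return visited, x


def student_velocity(x, t):
    """Toy stand-in for the student network."""
    return -x


def teacher_velocity(x, t):
    """Toy stand-in for a specialized teacher."""
    return 1.0 - x


rng = np.random.default_rng(0)
visited, x_final = rollout(student_velocity, rng.normal(size=(3,)), 10, 0.1, rng)

# Dense supervision: one velocity target per visited state, not one scalar per image.
per_step_loss = [
    np.mean((student_velocity(x, t) - teacher_velocity(x, t)) ** 2) for t, x in visited
]
```

A sparse reward would produce a single scalar for `x_final`, whereas this rollout yields one training signal per step, evaluated on states drawn from the student's own distribution.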
Citation
@misc{fang2026flowopdonpolicydistillationflow,
title={Flow-OPD: On-Policy Distillation for Flow Matching Models},
author={Zhen Fang and Wenxuan Huang and Yu Zeng and Yiming Zhao and Shuang Chen and Kaituo Feng and Yunlong Lin and Lin Chen and Zehui Chen and Shaosheng Cao and Feng Zhao},
year={2026},
eprint={2605.08063},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.08063},
}
Acknowledgements
This repository is based on flow-grpo. We thank the authors for their valuable contributions to the AIGC community.