---
base_model: stabilityai/stable-diffusion-3.5-medium
library_name: peft
pipeline_tag: text-to-image
---

Flow-OPD: On-Policy Distillation for Flow Matching Models

Project Page | Paper | Code

Flow-OPD is the first unified post-training framework that integrates On-Policy Distillation (OPD) into Flow Matching (FM) models. This repository contains the student model weights (LoRA adapter) aligned using the Flow-OPD methodology, built upon Stable Diffusion 3.5 Medium.

Model Description

Flow-OPD addresses two critical bottlenecks in multi-task alignment for text-to-image models: reward sparsity and gradient interference. It adopts a two-stage strategy:

  1. Cold Start Initialization: Establishes a robust initial policy through SFT or Model Merging.
  2. Multi-Teacher On-Policy Distillation: Consolidates heterogeneous expertise into a single student through dense trajectory-level supervision from domain-specialized teachers (GenEval, OCR, DeQA, PickScore).

The framework also introduces Manifold Anchor Regularization (MAR), which uses a task-agnostic teacher to anchor generation to a high-quality manifold, effectively mitigating aesthetic degradation.
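To make the supervision signal concrete, here is a minimal NumPy sketch of a dense trajectory-level distillation loss with a MAR term. This is an illustration only, not the paper's implementation: `toy_velocity`, the linear parameterization, and the weighting scheme are all assumptions; the real models are neural velocity fields.

```python
import numpy as np

def toy_velocity(params, x, t):
    # Hypothetical stand-in for a learned velocity field:
    # v(x, t) = a * x + b * t, with params = (a, b).
    a, b = params
    return a * x + b * t

def distill_loss(student_params, teacher_params_list, anchor_params,
                 xs, ts, teacher_weights, mar_weight=0.1):
    """Dense trajectory-level distillation loss (illustrative sketch).

    Instead of one sparse scalar reward per image, the student's
    velocity field is matched to each specialized teacher's velocity
    field at every (x_t, t) point along an on-policy trajectory, plus
    a Manifold Anchor Regularization (MAR) term pulling the student
    toward a task-agnostic anchor teacher.
    """
    v_student = toy_velocity(student_params, xs, ts)
    loss = 0.0
    # Multi-teacher supervision: weighted sum of per-teacher
    # mean-squared velocity-matching errors.
    for w, teacher_params in zip(teacher_weights, teacher_params_list):
        v_teacher = toy_velocity(teacher_params, xs, ts)
        loss += w * np.mean((v_student - v_teacher) ** 2)
    # MAR: anchor the student to a high-quality manifold.
    v_anchor = toy_velocity(anchor_params, xs, ts)
    loss += mar_weight * np.mean((v_student - v_anchor) ** 2)
    return loss
```

The key design point this sketch captures is density: the loss is evaluated at every point along the sampled trajectory rather than once per generated image, which is how the framework sidesteps reward sparsity.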

Key Results

Built upon Stable Diffusion 3.5 Medium, Flow-OPD achieves significant performance gains over the base model:

| Model | GenEval | OCR Acc. | DeQA | PickScore | Average |
|---|---|---|---|---|---|
| SD-3.5-M (base) | 0.63 | 0.59 | 4.07 | 21.64 | 0.72 |
| Flow-OPD (Merge Init) | 0.92 | 0.94 | 4.35 | 23.08 | 0.90 |

  • +18pt average improvement over the base model.
  • 📝 0.94 OCR accuracy (improved from 0.59).
  • 🚀 0.92 GenEval score (improved from 0.63).

Method Highlights

  • On-Policy SDE Sampling: stochastic rollouts of the student's own trajectories via an SDE, enabling diverse exploration.
  • Dense Trajectory Supervision: Replaces sparse scalar rewards with dense vector field supervision from multiple teachers.
  • MAR Regularization: Anchors generation to a high-quality aesthetic manifold.
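The on-policy SDE sampling above can be sketched as a simple Euler-Maruyama rollout. This is a generic illustration under assumed dynamics, not the paper's exact SDE: the drift `velocity_fn`, the unit time horizon, and the constant noise scale `sigma` are all simplifying assumptions.

```python
import numpy as np

def sde_rollout(velocity_fn, x0, n_steps=50, sigma=0.5, seed=0):
    """Euler-Maruyama rollout of a flow-matching trajectory with an
    added diffusion term for stochastic exploration (illustrative).

    velocity_fn(x, t) -> drift at state x and time t in [0, 1].
    sigma scales the injected noise; sigma=0 recovers a plain
    deterministic ODE (Euler) sampler.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for i in range(n_steps):
        t = i * dt
        drift = velocity_fn(x, t)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise  # deterministic drift + stochastic kick
        trajectory.append(x.copy())
    return np.stack(trajectory)
```

The stochastic term is what makes the sampling "exploratory": repeated rollouts from the same start visit different trajectories, giving the teachers diverse on-policy states to supervise.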

Citation

@misc{fang2026flowopdonpolicydistillationflow,
      title={Flow-OPD: On-Policy Distillation for Flow Matching Models},
      author={Zhen Fang and Wenxuan Huang and Yu Zeng and Yiming Zhao and Shuang Chen and Kaituo Feng and Yunlong Lin and Lin Chen and Zehui Chen and Shaosheng Cao and Feng Zhao},
      year={2026},
      eprint={2605.08063},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.08063},
}

Acknowledgements

This repository is based on flow-grpo. We thank the authors for their valuable contributions to the AIGC community.