---
license: mit
library_name: pytorch
tags:
  - image-generation
  - diffusion
  - imagenet
  - frechet-distance
  - fd-loss
datasets:
  - imagenet-1k
---

# Representation Fréchet Loss for Visual Generation

[![arXiv](https://img.shields.io/badge/arXiv-2604.28190-b31b1b.svg)](https://arxiv.org/abs/2604.28190)
[![GitHub](https://img.shields.io/badge/GitHub-code-181717.svg)](https://github.com/Jiawei-Yang/FD-Loss)

This repository hosts the released checkpoints and reference data for
**Representation Fréchet Loss for Visual Generation**.

Paper: [Representation Fréchet Loss for Visual Generation](https://arxiv.org/abs/2604.28190).

Code, training scripts, and evaluation utilities are available at:
[github.com/Jiawei-Yang/FD-Loss](https://github.com/Jiawei-Yang/FD-Loss).

FD-Loss post-trains visual generators by matching the feature distribution of
generated images to that of real images in frozen representation spaces.
This release contains both base and FD-Loss post-trained checkpoints for the
pMF, iMF, and JiT models, together with the reference statistics used in the paper.
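
For intuition, the quantity being matched is the Fréchet distance between
Gaussian fits of the two feature distributions. Below is a minimal NumPy/SciPy
sketch of that computation; it is an illustrative reimplementation, not the
repository's code, which may differ in numerical details:

```python
import numpy as np
from scipy import linalg

def gaussian_stats(features):
    """Mean and covariance of an (N, D) feature matrix."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which we discard.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Here the feature matrices would be frozen-encoder activations for generated
and real images, respectively.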

## Files

```text
checkpoints/
  base/
    iMF-B.pth
    iMF-L.pth
    iMF-XL.pth
    JiT-B.pth
    JiT-L.pth
    JiT-H.pth
    pMF-B_256.pth
    pMF-L_256.pth
    pMF-H_256.pth
    pMF-B_512.pth
    pMF-L_512.pth
    pMF-H_512.pth
  post-trained/
    iMF-B_FD-Inception.pth
    iMF-B_FD-SIM.pth
    iMF-L_FD-Inception.pth
    iMF-L_FD-SIM.pth
    iMF-XL_FD-Inception.pth
    iMF-XL_FD-SIM.pth
    JiT-B_FD-Inception.pth
    JiT-B_FD-SIM.pth
    JiT-L_FD-Inception.pth
    JiT-L_FD-SIM.pth
    JiT-H_FD-Inception.pth
    JiT-H_FD-SIM.pth
    pMF-B_FD-Inception.pth
    pMF-B_FD-SIM.pth
    pMF-L_FD-Inception.pth
    pMF-L_FD-SIM.pth
    pMF-H_FD-Inception.pth
    pMF-H_FD-SIM.pth
    pMF-B_512_FD-SIM.pth
    pMF-L_512_FD-SIM.pth
    pMF-H_512_FD-SIM.pth
data/
  fid_stats/
    paper_ref_stats.pkl
  train.txt
  val.txt
  val_labeled.txt
```

## Download

Install the Hugging Face CLI:

```bash
pip install -U huggingface_hub
```

Download all checkpoints and data files into a clone of the code repository:

```bash
hf download jjiaweiyang/FD-Loss \
  --local-dir . \
  --include "checkpoints/**/*.pth" \
  --include "data/**"
```

To download only what is needed to evaluate the released post-trained checkpoints:

```bash
hf download jjiaweiyang/FD-Loss \
  --local-dir . \
  --include "checkpoints/post-trained/*.pth" \
  --include "data/**"
```
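
If you prefer Python over the CLI, the same filtered download can be expressed
with `huggingface_hub.snapshot_download`; this sketch mirrors the `--include`
filters above via `allow_patterns`:

```python
from huggingface_hub import snapshot_download

# Fetch only the post-trained checkpoints and the reference data files.
snapshot_download(
    repo_id="jjiaweiyang/FD-Loss",
    local_dir=".",
    allow_patterns=["checkpoints/post-trained/*.pth", "data/**"],
)
```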

Then unpack the bundled reference statistics:

```bash
python scripts/extract_paper_ref_stats.py
```

## Evaluation

Run from the root of the GitHub repository:

```bash
PRESET=pMF_H_256 \
CKPT_PATH=checkpoints/post-trained/pMF-H_FD-SIM.pth \
GPUS_PER_NODE=8 \
bash scripts/evaluate_released_ckpt.sh

PRESET=JiT_H \
CKPT_PATH=checkpoints/post-trained/JiT-H_FD-SIM.pth \
GPUS_PER_NODE=8 \
bash scripts/evaluate_released_ckpt.sh

PRESET=iMF_XL \
CKPT_PATH=checkpoints/post-trained/iMF-XL_FD-SIM.pth \
GPUS_PER_NODE=8 \
bash scripts/evaluate_released_ckpt.sh
```

Additional presets, smoke-test settings, and the scripts for the Table 1
ablation, Table 2 repurposing, and Table 3 scalability experiments are
documented in the GitHub repository.

The evaluator reports both raw FD and the paper's normalized metrics. `FDr` is
raw FD divided by the validation-set raw FD for the corresponding representation
space, and `FDr-6` is the arithmetic mean of `FDr` over the Inception, ConvNeXt,
DINOv2, MAE, SigLIP, and CLIP spaces. The released code uses these
validation-set raw FD values:

| Representation | Inception | ConvNeXt | DINOv2 | MAE | SigLIP | CLIP |
|---|---:|---:|---:|---:|---:|---:|
| valFD | 1.68 | 56.87 | 14.19 | 0.04 | 0.60 | 5.60 |
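
As a concrete illustration of the normalization, the helper below applies the
valFD constants from the table; `raw_fds` is a hypothetical per-space
measurement dict, not output produced by the released code:

```python
# Validation-set raw FD per representation space (values from the table above).
VAL_FD = {"Inception": 1.68, "ConvNeXt": 56.87, "DINOv2": 14.19,
          "MAE": 0.04, "SigLIP": 0.60, "CLIP": 5.60}

def fdr(raw_fd: float, space: str) -> float:
    """FDr: raw FD normalized by the validation-set raw FD of the same space."""
    return raw_fd / VAL_FD[space]

def fdr6(raw_fds: dict[str, float]) -> float:
    """FDr-6: arithmetic mean of FDr over the six representation spaces."""
    return sum(fdr(v, k) for k, v in raw_fds.items()) / len(raw_fds)
```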

To reproduce these normalizers from ImageNet validation images:

```bash
DATA_ROOT=/path/to/imagenet \
torchrun --nproc_per_node=8 scripts/compute_valfd.py \
  --data_root "$DATA_ROOT"
```

## Citation

```bibtex
@article{yang2026fdloss,
  title={Representation Fréchet Loss for Visual Generation},
  author={Yang, Jiawei and Geng, Zhengyang and Ju, Xuan and Tian, Yonglong and Wang, Yue},
  journal={arXiv:2604.28190},
  url={https://arxiv.org/abs/2604.28190},
  year={2026}
}
```

If you have any questions, feel free to reach out by email
([yangjiaw@usc.edu](mailto:yangjiaw@usc.edu)).