Commit e23b994 (verified) by BWGZK · 1 parent: cef4ee6

Add ProCrop model card

Files changed (1): README.md (added, +146 -0)
 
---
license: apache-2.0
tags:
- image-cropping
- aesthetic-cropping
- computer-vision
- retrieval-augmented
- conditional-detr
pipeline_tag: image-to-image
library_name: pytorch
datasets:
- BWGZK/procrop_dataset
language:
- en
---

# ProCrop: Learning Aesthetic Image Cropping from Professional Compositions

[![arXiv](https://img.shields.io/badge/arXiv-2505.22490-b31b1b.svg)](https://arxiv.org/abs/2505.22490)
[![GitHub](https://img.shields.io/badge/GitHub-ProCrop-blue)](https://github.com/BWGZK-keke/ProCrop)

This is the **headline supervised checkpoint** for the AAAI 2026 paper "ProCrop: Learning Aesthetic Image Cropping from Professional Compositions" by Zhang et al.

## Model Description

ProCrop is a retrieval-augmented framework for aesthetic image cropping that uses professional photography compositions as guidance. Given a query image, ProCrop:

1. **Retrieves** compositionally similar professional images from a large database (AVA / CGL) using SAM embeddings and Faiss nearest-neighbor search.
2. **Fuses** retrieved features with the query via cross-attention.
3. **Predicts** diverse crop proposals ranked by aesthetic score using a Conditional DETR decoder.
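
The retrieval step above can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: ProCrop uses SAM embeddings and Faiss, whereas this example swaps in a brute-force cosine-similarity search over tiny hand-written vectors. The function names `cosine_similarity` and `retrieve_top_k` and the toy data are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k=2):
    """Return indices of the k database embeddings most similar to the query.

    Hypothetical helper: a brute-force stand-in for a Faiss index lookup.
    """
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(database)]
    # Sort by descending similarity; break ties by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Toy "professional image" embeddings (illustrative values, not SAM features).
database = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

neighbors = retrieve_top_k(query, database, k=2)
print(neighbors)
```

In a real pipeline the exhaustive loop would be replaced by an approximate or exact nearest-neighbor index, which is the role Faiss plays in the description above.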

## Reported Performance (FLMS supervised setting)

| Metric | Value |
|--------|-------|
| **IoU** (higher is better) | **0.843** |
| **BDE**, boundary displacement error (lower is better) | **0.036** |

These numbers correspond to the FLMS row of Table 3 in the paper.
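
For readers unfamiliar with the two metrics, the sketch below computes them for a single predicted crop. It follows the definitions commonly used in the cropping literature (IoU of the two boxes, and BDE as the mean displacement of the four crop edges normalized by image width or height); that the paper's evaluation code matches this exactly is an assumption. Boxes are `(x1, y1, x2, y2)` in pixels, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def boundary_displacement_error(pred, gt, image_width, image_height):
    """Mean normalized displacement of the four crop edges."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    total = (
        abs(px1 - gx1) / image_width
        + abs(px2 - gx2) / image_width
        + abs(py1 - gy1) / image_height
        + abs(py2 - gy2) / image_height
    )
    return total / 4.0


# Illustrative boxes on a 100x100 image (made-up values, not paper data).
ground_truth = (10, 10, 90, 90)
prediction = (10, 10, 90, 80)

print(iou(prediction, ground_truth))
print(boundary_displacement_error(prediction, ground_truth, 100, 100))
```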

## Checkpoint Details

| Property | Value |
|----------|-------|
| File | `procrop_flms_supervised.pth` |
| Size | 512 MB |
| Original filename | `checkpoint0008200.8425250053405762.pth` |
| Trainable params | ~44.8M |
| Backbone | ResNet-50 (DC5) + Transformer encoder/decoder |
| Training data | CPCDataset (supervised) + AVA retrieval references |
| Evaluation | FLMS test set, IoU = 0.8425 |
| Training epoch | 83 |
| Crop queries | 24 (Conditional DETR style) |

## How to Use

### 1. Clone the GitHub repository

```bash
git clone https://github.com/BWGZK-keke/ProCrop.git
cd ProCrop
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
```

### 2. Download this checkpoint

```python
from huggingface_hub import hf_hub_download

# local_dir saves the file under ./checkpoints/, matching the CLI command below
ckpt_path = hf_hub_download(
    repo_id="BWGZK/ProCrop",
    filename="procrop_flms_supervised.pth",
    local_dir="./checkpoints",
)
```

Or with the CLI:

```bash
huggingface-cli download BWGZK/ProCrop procrop_flms_supervised.pth --local-dir ./checkpoints
```

### 3. Run inference on a single image

```bash
# Run from the repository root; after `cd cropping`, the checkpoint
# downloaded in step 2 sits one level up, in ../checkpoints/.
cd cropping
python test_singleimage.py \
  --dataset_root /path/to/your/images \
  --retrieval_cache_dir /path/to/retrieval_tables \
  --retrieval_img_dir /path/to/CGL_images \
  --resume ../checkpoints/procrop_flms_supervised.pth \
  --crop_savepath ./results
```

### 4. Evaluate on FLMS

```bash
# Run from the repository root, as in step 3.
cd cropping
python main_cpc.py \
  --dataset_root /path/to/FLMS \
  --retrieval_cache_dir /path/to/retrieval_tables \
  --resume ../checkpoints/procrop_flms_supervised.pth \
  --eval
```

You will also need:
- **Precomputed retrieval tables** from [BWGZK/procrop_dataset](https://huggingface.co/datasets/BWGZK/procrop_dataset)
- **SAM ViT-B checkpoint**, only if training on GAIC/CAD: [sam_vit_b_01ec64.pth](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth)

## Architecture

ProCrop extends **Conditional DETR** with a retrieval augmentation module:

- **Backbone**: ResNet-50 with dilated C5 stage
- **Encoder**: 6-layer transformer encoder for the query image
- **Retrieval fusion**: Cross-attention between query features and top-K retrieved SAM embeddings (64×256)
- **Decoder**: 6-layer transformer decoder with N=24 learnable crop queries
- **Heads**:
  - 4-dim bounding-box MLP (3 layers)
  - 1-dim aesthetic-score classification head (binary focal loss)
- **EMA self-distillation**: Mean-teacher framework for weakly-supervised training on CAD
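
To make the retrieval-fusion bullet concrete, here is a minimal single-head cross-attention sketch in plain Python. It is a simplification under stated assumptions, not the repository's module: there are no learned query/key/value projections, no multiple heads, and the residual addition at the end is an illustrative choice. Query-image tokens attend over retrieved tokens, and each query token receives a softmax-weighted mix of the retrieved ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, retrieved_tokens):
    """Fuse retrieved tokens into query tokens with scaled dot-product attention.

    Hypothetical helper: keys and values are both the raw retrieved tokens.
    """
    dim = len(query_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in query_tokens:
        # Similarity of this query token to every retrieved token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in retrieved_tokens]
        weights = softmax(scores)
        # Weighted sum of retrieved tokens, one value per feature dimension.
        attended = [
            sum(w * tok[j] for w, tok in zip(weights, retrieved_tokens))
            for j in range(dim)
        ]
        # Residual connection keeps the original query information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy tokens (illustrative values; real features would be much larger).
query_tokens = [[0.0, 0.0], [1.0, 0.0]]
retrieved_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(query_tokens, retrieved_tokens)
print(fused)
```

The output keeps the shape of the query tokens, which is why a fusion step of this kind can sit between an encoder and a decoder without changing either interface.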

Core implementation: [`cropping/models/conditional_detr_cpc.py`](https://github.com/BWGZK-keke/ProCrop/blob/main/cropping/models/conditional_detr_cpc.py)
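
The EMA self-distillation bullet refers to the mean-teacher pattern, in which a teacher copy of the model tracks an exponential moving average of the student's weights. The sketch below shows only that update rule on plain Python lists. The parameter layout, the function name `ema_update`, and the momentum values are assumptions for illustration; the repository's actual schedule may differ.

```python
def ema_update(teacher, student, momentum=0.999):
    """Update teacher weights in place as an EMA of student weights.

    teacher, student: dicts mapping parameter names to lists of floats.
    momentum: hypothetical default; higher values make the teacher change more slowly.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        teacher[name] = [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_values, student_values)
        ]
    return teacher


# Toy parameters (illustrative values only).
teacher = {"w": [1.0, 0.0]}
student = {"w": [0.0, 1.0]}

ema_update(teacher, student, momentum=0.9)
print(teacher["w"])
```

Because the teacher changes slowly, its predictions are typically more stable than the student's, which is what makes them usable as pseudo-labels on weakly annotated data.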

## Related Resources

- **Code (GitHub)**: https://github.com/BWGZK-keke/ProCrop
- **Paper (arXiv)**: https://arxiv.org/abs/2505.22490
- **Dataset (Hugging Face)**: https://huggingface.co/datasets/BWGZK/procrop_dataset
  - CAD dataset (242K weakly annotated images)
  - Precomputed retrieval tables
  - Pre-extracted SAM embedding databases

## Citation

```bibtex
@article{ProCrop2025,
  title={ProCrop: Learning Aesthetic Image Cropping from Professional Compositions},
  author={Zhang, Ke and Ding, Tianyu and Jiang, Jiachen and Chen, Tianyi and Zharkov, Ilya and Patel, Vishal M. and Liang, Luming},
  journal={arXiv preprint arXiv:2505.22490},
  year={2025}
}
```

## License

Apache 2.0. The model builds on [ConditionalDETR](https://github.com/Atten4Vis/ConditionalDETR), [RALF](https://github.com/CyberAgentAILab/RALF), and [Segment Anything](https://github.com/facebookresearch/segment-anything). Please consult their respective licenses.