rakib72642 committed on
Commit 985cbbd · 1 Parent(s): f363644

init commit

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. INSTALL.md +0 -34
  2. README.md +0 -405
  3. cog.yaml +0 -24
  4. cutler/__init__.py +0 -15
  5. cutler/config/__init__.py +0 -3
  6. cutler/config/cutler_config.py +0 -19
  7. cutler/data/__init__.py +0 -15
  8. cutler/data/build.py +0 -561
  9. cutler/data/dataset_mapper.py +0 -193
  10. cutler/data/datasets/__init__.py +0 -16
  11. cutler/data/datasets/builtin.py +0 -216
  12. cutler/data/datasets/builtin_meta.py +0 -389
  13. cutler/data/datasets/coco.py +0 -544
  14. cutler/data/detection_utils.py +0 -650
  15. cutler/data/transforms/__init__.py +0 -15
  16. cutler/data/transforms/augmentation_impl.py +0 -616
  17. cutler/data/transforms/transform.py +0 -355
  18. cutler/demo/__init__.py +0 -5
  19. cutler/demo/demo.py +0 -197
  20. cutler/demo/predictor.py +0 -219
  21. cutler/engine/__init__.py +0 -7
  22. cutler/engine/defaults.py +0 -726
  23. cutler/engine/train_loop.py +0 -360
  24. cutler/evaluation/__init__.py +0 -3
  25. cutler/evaluation/coco_evaluation.py +0 -727
  26. cutler/model_zoo/configs/Base-RCNN-FPN.yaml +0 -42
  27. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_100perc.yaml +0 -40
  28. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_10perc.yaml +0 -40
  29. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_1perc.yaml +0 -42
  30. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_20perc.yaml +0 -40
  31. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_2perc.yaml +0 -42
  32. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_30perc.yaml +0 -40
  33. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_40perc.yaml +0 -40
  34. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_50perc.yaml +0 -40
  35. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_5perc.yaml +0 -42
  36. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_60perc.yaml +0 -40
  37. cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_80perc.yaml +0 -40
  38. cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml +0 -61
  39. cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_demo.yaml +0 -62
  40. cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml +0 -60
  41. cutler/model_zoo/configs/CutLER-ImageNet/mask_rcnn_R_50_FPN.yaml +0 -52
  42. cutler/modeling/__init__.py +0 -16
  43. cutler/modeling/meta_arch/__init__.py +0 -7
  44. cutler/modeling/meta_arch/build.py +0 -27
  45. cutler/modeling/meta_arch/rcnn.py +0 -344
  46. cutler/modeling/roi_heads/__init__.py +0 -16
  47. cutler/modeling/roi_heads/custom_cascade_rcnn.py +0 -338
  48. cutler/modeling/roi_heads/fast_rcnn.py +0 -587
  49. cutler/modeling/roi_heads/roi_heads.py +0 -926
  50. cutler/solver/__init__.py +0 -5
INSTALL.md DELETED
@@ -1,34 +0,0 @@
1
-
2
- # Installation
3
-
4
- ## Requirements
5
- - Linux or macOS with Python ≥ 3.8
6
- - PyTorch ≥ 1.8 and [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation.
7
-   Install them together at [pytorch.org](https://pytorch.org) to make sure of this.
8
-   Note: make sure the PyTorch version matches the one required by Detectron2.
9
- - Detectron2: follow Detectron2 installation instructions.
10
- - OpenCV ≥ 4.6 is needed for the demo and visualization.
11
-
12
- ## Example conda environment setup
13
-
14
- ```bash
15
- conda create --name cutler python=3.8 -y
16
- conda activate cutler
17
- conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -c pytorch
18
- pip install git+https://github.com/lucasb-eyer/pydensecrf.git
19
-
20
- # under your working directory
21
- git clone git@github.com:facebookresearch/detectron2.git
22
- cd detectron2
23
- pip install -e .
24
- pip install git+https://github.com/cocodataset/panopticapi.git
25
- pip install git+https://github.com/mcordts/cityscapesScripts.git
26
-
27
- cd ..
28
- git clone --recursive git@github.com:facebookresearch/CutLER.git
29
- cd CutLER
30
- pip install -r requirements.txt
31
- ```
32
-
33
- ## Datasets
34
- To train or evaluate on the datasets, see [datasets/README.md](datasets/README.md) for how we prepare datasets for this project.
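The requirements above can be checked before installing Detectron2. The following is a minimal sketch and not part of CutLER: the helper names (`parse_version`, `meets_minimum`, `installed_version`) and the distribution names passed to `importlib.metadata` (`torch`, `opencv-python`) are assumptions. OpenCV in particular may be installed under another distribution name, such as `opencv-python-headless`.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release components, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    release = text.split("+", 1)[0]  # drop a local build tag such as '+cu111'
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that '1.10' counts as newer than '1.8'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '3.8' and '3.8.0' compare equal
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python", python_version, "ok" if meets_minimum(python_version, "3.8") else "too old")
    # Minimums come from the Requirements list; the distribution names are assumptions.
    for dist_name, minimum in [("torch", "1.8"), ("opencv-python", "4.6")]:
        found = installed_version(dist_name)
        if found is None:
            print(dist_name, "not installed")
        else:
            print(dist_name, found, "ok" if meets_minimum(found, minimum) else "too old")
```

Comparing parsed tuples rather than raw strings avoids the common mistake of treating "1.10" as older than "1.8".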
README.md DELETED
@@ -1,405 +0,0 @@
1
- # Cut and Learn for Unsupervised Image & Video Object Detection and Instance Segmentation
2
-
3
- **Cut**-and-**LE**a**R**n (**CutLER**) is a simple approach for training object detection and instance segmentation models without human annotations.
4
- It outperforms previous SOTA by **2.7 times** for AP50 and **2.6 times** for AR on **11 benchmarks**.
5
-
6
- <p align="center"> <img src='docs/teaser_img.jpg' align="center" > </p>
7
-
8
- > [**Cut and Learn for Unsupervised Object Detection and Instance Segmentation**](http://people.eecs.berkeley.edu/~xdwang/projects/CutLER/)
9
- > [Xudong Wang](https://people.eecs.berkeley.edu/~xdwang/), [Rohit Girdhar](https://rohitgirdhar.github.io/), [Stella X. Yu](https://www1.icsi.berkeley.edu/~stellayu/), [Ishan Misra](https://imisra.github.io/)
10
- > FAIR, Meta AI; UC Berkeley
11
- > CVPR 2023
12
-
13
- [[`project page`](http://people.eecs.berkeley.edu/~xdwang/projects/CutLER/)] [[`arxiv`](https://arxiv.org/abs/2301.11320)] [[`colab`](https://colab.research.google.com/drive/1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)] [[`bibtex`](#citation)]
14
-
15
- Unsupervised video instance segmentation (**VideoCutLER**) is also supported. ***We demonstrate that video instance segmentation models can be learned without using any human annotations, without relying on natural videos (ImageNet data alone is sufficient), and even without motion estimations!*** The code is available [here](videocutler).
16
-
17
- <p align="center">
18
- <img src="docs/demos_videocutler.gif" width=100%>
19
- </p>
20
-
21
- > [**VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation**](https://people.eecs.berkeley.edu/~xdwang/projects/VideoCutLER/videocutler.pdf)
22
- > [Xudong Wang](https://people.eecs.berkeley.edu/~xdwang/), [Ishan Misra](https://imisra.github.io/), Ziyun Zeng, [Rohit Girdhar](https://rohitgirdhar.github.io/), [Trevor Darrell](https://people.eecs.berkeley.edu/~trevor/)
23
- > UC Berkeley; FAIR, Meta AI
24
- > CVPR 2024
25
-
26
- [[`code`](videocutler/README.md)] [[`PDF`](https://people.eecs.berkeley.edu/~xdwang/projects/VideoCutLER/videocutler.pdf)] [[`arxiv`](https://arxiv.org/abs/2308.14710)] [[`bibtex`](#citation)]
27
-
28
- ## Features
29
- - We propose the MaskCut approach to generate pseudo-masks for multiple objects in an image.
30
- - CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K.
31
- - CutLER exhibits strong robustness to domain shifts when evaluated on 11 different benchmarks across domains like natural images, video frames, paintings, sketches, etc.
32
- - CutLER can serve as a pretrained model for fully/semi-supervised detection and segmentation tasks.
33
- - We also propose VideoCutLER, a surprisingly simple unsupervised video instance segmentation (UVIS) method that does not rely on optical flow. ImageNet-1K is all we need for training a SOTA UVIS model!
34
-
35
- ## Installation
36
- See [installation instructions](INSTALL.md).
37
-
38
- ## Dataset Preparation
39
- See [Preparing Datasets for CutLER](datasets/README.md).
40
-
41
- ## Method Overview
42
- <p align="center">
43
- <img src="docs/pipeline.jpg" width=55%>
44
- </p>
45
- Cut-and-Learn has two stages: 1) generating pseudo-masks with MaskCut and 2) learning unsupervised detectors from pseudo-masks of unlabeled data.
46
-
47
- ### 1. MaskCut
48
-
49
- MaskCut can be used to provide segmentation masks for multiple instances of each image.
50
- <p align="center">
51
- <img src="docs/maskcut.gif" width=100%>
52
- </p>
53
-
54
- ### MaskCut Demo
55
-
56
- Try out the MaskCut demo using Colab (no GPU needed): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1X05lKL_IBRvZB7q6n6pb4w00_tIYjGlf?usp=sharing)
57
-
58
- Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/facebook/MaskCut) (thanks to [@hysts](https://github.com/hysts)!)
59
-
60
-
61
-
62
-
63
- If you want to run MaskCut locally, we provide `demo.py` that is able to visualize the pseudo-masks produced by MaskCut.
64
- Run it with:
65
- ```
66
- cd maskcut
67
- python demo.py --img-path imgs/demo2.jpg \
68
- --N 3 --tau 0.15 --vit-arch base --patch-size 8 \
69
- [--other-options]
70
- ```
71
- We give a few demo images in maskcut/imgs/. To run demo.py on CPU, add "--cpu" when running the demo script.
72
- For imgs/demo4.jpg, you need to use "--N 6" to segment all six instances in the image.
73
- Below are some visualizations of the pseudo-masks on the demo images.
74
- <p align="center">
75
- <img src="docs/maskcut-demo.jpg" width=100%>
76
- </p>
77
-
78
- ### Generating Annotations for ImageNet-1K with MaskCut
79
- To generate pseudo-masks for ImageNet-1K using MaskCut, first set up the ImageNet-1K dataset according to the instructions in [datasets/README.md](datasets/README.md), then execute the following command:
80
- ```
81
- cd maskcut
82
- python maskcut.py \
83
- --vit-arch base --patch-size 8 \
84
- --tau 0.15 --fixed_size 480 --N 3 \
85
- --num-folder-per-job 1000 --job-index 0 \
86
- --dataset-path /path/to/dataset/traindir \
87
- --out-dir /path/to/save/annotations
88
- ```
89
- As the process of generating pseudo-masks for all 1.3 million images in 1,000 folders takes a significant amount of time, it is recommended to use multiple runs. Each run should process the pseudo-mask generation for a smaller number of image folders by setting "--num-folder-per-job" and "--job-index". Once all runs are completed, you can merge all the resulting json files by using the following command:
90
- ```
91
- python merge_jsons.py \
92
- --base-dir /path/to/save/annotations \
93
- --num-folder-per-job 2 --fixed-size 480 \
94
- --tau 0.15 --N 3 \
95
- --save-path imagenet_train_fixsize480_tau0.15_N3.json
96
- ```
97
- The "--num-folder-per-job", "--fixed-size", "--tau" and "--N" of merge_jsons.py should match the ones used to run maskcut.py.
98
-
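To make the roles of "--num-folder-per-job" and "--job-index" concrete, the sketch below splits a sorted folder list into contiguous slices, one per job. It illustrates the idea only and assumes that job i takes folders [i*n, (i+1)*n). `folders_for_job` and `num_jobs` are hypothetical names, and maskcut.py may select folders differently.

```python
import math


def folders_for_job(folders, num_folder_per_job, job_index):
    """Return the slice of class folders a single job would process (assumed scheme)."""
    ordered = sorted(folders)
    start = job_index * num_folder_per_job
    return ordered[start:start + num_folder_per_job]


def num_jobs(total_folders, num_folder_per_job):
    """Number of runs needed so that every folder is covered exactly once."""
    return math.ceil(total_folders / num_folder_per_job)


if __name__ == "__main__":
    # Ten made-up folder names standing in for the 1,000 ImageNet class folders.
    folders = [f"class_{i:02d}" for i in range(10)]
    for job_index in range(num_jobs(len(folders), 4)):
        print(job_index, folders_for_job(folders, 4, job_index))
```

Under this assumed scheme, 1,000 folders with "--num-folder-per-job 2" would need 500 runs, with "--job-index" ranging from 0 to 499.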
99
- We also provide a submitit script to launch the pseudo-mask generation process with multiple nodes.
100
- ```
101
- cd maskcut
102
- bash run_maskcut_with_submitit.sh
103
- ```
104
- After that, you can use "merge_jsons.py" to merge all these json files as described above.
105
-
106
- ### 2. CutLER
107
-
108
- ### Inference Demo for CutLER with Pre-trained Models
109
- Try out the CutLER demo using Colab (no GPU needed): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)
110
-
111
- Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/facebook/CutLER) (thanks to [@hysts](https://github.com/hysts)!)
112
-
113
-
114
- Try out Replicate demo and the API: [![Replicate](https://replicate.com/cjwbw/cutler/badge)](https://replicate.com/cjwbw/cutler)
115
-
116
-
117
- If you want to run CutLER demos locally,
118
- 1. Pick a model and its config file from [model zoo](#model-zoo),
119
- for example, `model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml`.
120
- 2. We provide `demo.py` that is able to demo builtin configs. Run it with:
121
- ```
122
- cd cutler
123
- python demo/demo.py --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_demo.yaml \
124
- --input demo/imgs/*.jpg \
125
- [--other-options]
126
- --opts MODEL.WEIGHTS /path/to/cutler_w_cascade_checkpoint
127
- ```
128
- The configs are written for training, so for evaluation `MODEL.WEIGHTS` must point to a model from the model zoo.
129
- This command will run the inference and show visualizations in an OpenCV window.
130
- <!-- For details of the command line arguments, see `demo.py -h` or look at its source code
131
- to understand its behavior. Some common arguments are: -->
132
- * To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.
133
- * To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
134
-
135
- Below are some visualizations of the model predictions on the demo images.
136
- <p align="center">
137
- <img src="docs/cutler-demo.jpg" width=100%>
138
- </p>
139
-
140
- ### Unsupervised Model Learning
141
- Before training the detector, it is necessary to use MaskCut to generate pseudo-masks for all ImageNet data.
142
- You can either use the pre-generated json file directly by downloading it from [here](http://dl.fbaipublicfiles.com/cutler/maskcut/imagenet_train_fixsize480_tau0.15_N3.json) and placing it under "DETECTRON2_DATASETS/imagenet/annotations/", or generate your own pseudo-masks by following the instructions in [MaskCut](#1-maskcut).
143
-
144
- We provide a script, `train_net.py`, that can train all the configs provided in CutLER.
145
- To train a model with "train_net.py", first set up the ImageNet-1K dataset following [datasets/README.md](datasets/README.md), then run:
146
- ```
147
- cd cutler
148
- export DETECTRON2_DATASETS=/path/to/DETECTRON2_DATASETS/
149
- python train_net.py --num-gpus 8 \
150
- --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml
151
- ```
152
-
153
- If you want to train a model using multiple nodes, you may need to adjust [some model parameters](https://arxiv.org/abs/1706.02677) and some SBATCH command options in "tools/train-1node.sh" and "tools/single-node_run.sh", then run:
154
- ```
155
- cd cutler
156
- sbatch tools/train-1node.sh \
157
- --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
158
- MODEL.WEIGHTS /path/to/dino/d2format/model \
159
- OUTPUT_DIR output/
160
- ```
161
- You can also convert a pre-trained DINO model to detectron2's format by yourself following [this link](https://github.com/facebookresearch/moco/tree/main/detection).
162
-
163
- ### Self-training
164
- We further improve performance by self-training the model on its predictions.
165
-
166
- First, obtain model predictions on ImageNet by running the command below (`MODEL.WEIGHTS` loads the checkpoint from the previous stage/round; `OUTPUT_DIR` is where the model predictions are saved):
167
- ```
168
- python train_net.py --num-gpus 8 \
169
- --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
170
- --test-dataset imagenet_train \
171
- --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
172
- MODEL.WEIGHTS output/model_final.pth \
173
- OUTPUT_DIR output/ # path to save model predictions
174
- ```
175
- Second, run the following command to generate the JSON file for the first round of self-training (`--new-pred` loads the model predictions, `--prev-ann` is the path to the old annotation file, and `--save-path` is where the new annotation file is saved):
176
- ```
177
- python tools/get_self_training_ann.py \
178
- --new-pred output/inference/coco_instances_results.json \
179
- --prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \
180
- --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
181
- --threshold 0.7
182
- ```
183
- Finally, place "cutler_imagenet1k_train_r1.json" under "DETECTRON2_DATASETS/imagenet/annotations/", then launch the self-training process (`MODEL.WEIGHTS` loads the checkpoint from the previous stage/round):
184
- ```
185
- python train_net.py --num-gpus 8 \
186
- --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
187
- --train-dataset imagenet_train_r1 \
188
- MODEL.WEIGHTS output/model_final.pth \
189
- OUTPUT_DIR output/self-train-r1/ # path to save checkpoints
190
- ```
191
-
192
- You can repeat the steps above to perform multiple rounds of self-training and adjust some arguments as needed (e.g., "--threshold" for round 1 and 2 can be set to 0.7 and 0.65, respectively; "--train-dataset" for round 1 and 2 can be set to "imagenet_train_r1" and "imagenet_train_r2", respectively; MODEL.WEIGHTS for round 1 and 2 should point to the previous stage/round checkpoints). Ensure that all annotation files are placed under DETECTRON2_DATASETS/imagenet/annotations/.
193
- Please ensure that "--train-dataset", json file names and locations match the ones specified in "cutler/data/datasets/builtin.py".
194
- Please refer to this [instruction](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html) for guidance on using custom datasets.
195
-
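The role of "--threshold" can be illustrated with a small sketch. It is not the logic of tools/get_self_training_ann.py, which is not shown in this view. It only assumes that predictions are stored as a COCO-style list of dicts with a `score` field, and it keeps those at or above the threshold. The function name and the sample records are made up.

```python
from collections import defaultdict


def select_confident(predictions, threshold=0.7):
    """Group predictions with score >= threshold by image_id."""
    kept = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= threshold:
            kept[pred["image_id"]].append(pred)
    return dict(kept)


if __name__ == "__main__":
    # Made-up predictions in COCO results format: bbox is [x, y, width, height].
    predictions = [
        {"image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 40], "score": 0.92},
        {"image_id": 1, "category_id": 1, "bbox": [0, 0, 15, 15], "score": 0.41},
        {"image_id": 2, "category_id": 1, "bbox": [5, 5, 80, 60], "score": 0.70},
    ]
    for image_id, preds in select_confident(predictions, threshold=0.7).items():
        print(image_id, [p["score"] for p in preds])
```

Lowering the threshold in later rounds (for example from 0.7 to 0.65, as suggested above) lets more predictions through as pseudo-labels.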
196
- You can also directly download the MODEL.WEIGHTS and annotations used for each round of self-training:
197
- <table><tbody>
198
- <!-- START TABLE -->
199
- <!-- TABLE BODY -->
200
- <!-- ROW: round 1 -->
201
- <tr><td align="center">round 1</td>
202
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_r1.pth">cutler_cascade_r1.pth</a></td>
203
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/maskcut/cutler_imagenet1k_train_r1.json">cutler_imagenet1k_train_r1.json</a></td>
204
- </tr>
205
- <!-- ROW: round 2 -->
206
- <tr><td align="center">round 2</td>
207
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_r2.pth">cutler_cascade_r2.pth</a></td>
208
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/maskcut/cutler_imagenet1k_train_r2.json">cutler_imagenet1k_train_r2.json</a></td>
209
- </tr>
210
- </tbody></table>
211
-
212
- ### Unsupervised Zero-shot Evaluation
213
- To evaluate a model's performance on 11 different datasets, please refer to [datasets/README.md](datasets/README.md) for instructions on preparing the datasets. Next, select a model from the model zoo, specify the "model_weights", "config_file" and the path to "DETECTRON2_DATASETS" in `tools/eval.sh`, then run the script.
214
- ```
215
- bash tools/eval.sh
216
- ```
217
-
218
- ### Model Zoo
219
- We show zero-shot unsupervised object detection performance (AP50&nbsp;|&nbsp;AR) on 11 different datasets spanning a variety of domains. ^: CutLER using Mask R-CNN as a detector; *: CutLER using Cascade Mask R-CNN as a detector.
220
- <table><tbody>
221
- <!-- START TABLE -->
222
- <!-- TABLE HEADER -->
223
- <tr><th valign="bottom">Methods</th>
224
- <th valign="bottom">Models</th>
225
- <th valign="bottom">COCO</th>
226
- <th valign="bottom">COCO20K</th>
227
- <th valign="bottom">VOC</th>
228
- <th valign="bottom">LVIS</th>
229
- <th valign="bottom">UVO</th>
230
- <th valign="bottom">Clipart</th>
231
- <th valign="bottom">Comic</th>
232
- <th valign="bottom">Watercolor</th>
233
- <th valign="bottom">KITTI</th>
234
- <th valign="bottom">Objects365</th>
235
- <th valign="bottom">OpenImages</th>
236
- <!-- TABLE BODY -->
237
- </tr>
238
- <tr><td align="center">Prev. SOTA</td>
239
- <td valign="bottom">-</td>
240
- <td align="center">9.6&nbsp;|&nbsp;12.6</td>
241
- <td align="center">9.7&nbsp;|&nbsp;12.6</td>
242
- <td align="center">15.9&nbsp;|&nbsp;21.3</td>
243
- <td align="center">3.8&nbsp;|&nbsp;6.4</td>
244
- <td align="center">10.0&nbsp;|&nbsp;14.2</td>
245
- <td align="center">7.9&nbsp;|&nbsp;15.1</td>
246
- <td align="center">9.9&nbsp;|&nbsp;16.3</td>
247
- <td align="center">6.7&nbsp;|&nbsp;16.2</td>
248
- <td align="center">7.7&nbsp;|&nbsp;7.1</td>
249
- <td align="center">8.1&nbsp;|&nbsp;10.2</td>
250
- <td align="center">9.9&nbsp;|&nbsp;14.9</td>
251
- </tr>
252
- <!-- ROW: Box/Mask AP for CutLER -->
253
-
254
- <tr><td align="center">CutLER^</td>
255
- <td valign="bottom"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_mrcnn_final.pth">download</a></td>
256
- <td align="center">21.1&nbsp;|&nbsp;29.6</td>
257
- <td align="center">21.6&nbsp;|&nbsp;30.0</td>
258
- <td align="center">36.6&nbsp;|&nbsp;41.0</td>
259
- <td align="center">7.7&nbsp;|&nbsp;18.7</td>
260
- <td align="center">29.8&nbsp;|&nbsp;38.4</td>
261
- <td align="center">20.9&nbsp;|&nbsp;38.5</td>
262
- <td align="center">31.2&nbsp;|&nbsp;37.1</td>
263
- <td align="center">37.3&nbsp;|&nbsp;39.9</td>
264
- <td align="center">15.3&nbsp;|&nbsp;25.4</td>
265
- <td align="center">19.5&nbsp;|&nbsp;30.0</td>
266
- <td align="center">17.1&nbsp;|&nbsp;26.4</td>
267
- </tr>
268
- <!-- ROW: Box/Mask AP for CutLER -->
269
-
270
- <tr><td align="center">CutLER*</td>
271
- <td valign="bottom"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth">download</a></td>
272
- <td align="center">21.9&nbsp;|&nbsp;32.7</td>
273
- <td align="center">22.4&nbsp;|&nbsp;33.1</td>
274
- <td align="center">36.9&nbsp;|&nbsp;44.3</td>
275
- <td align="center">8.4&nbsp;|&nbsp;21.8</td>
276
- <td align="center">31.7&nbsp;|&nbsp;42.8</td>
277
- <td align="center">21.1&nbsp;|&nbsp;41.3</td>
278
- <td align="center">30.4&nbsp;|&nbsp;38.6</td>
279
- <td align="center">37.5&nbsp;|&nbsp;44.6</td>
280
- <td align="center">18.4&nbsp;|&nbsp;27.5</td>
281
- <td align="center">21.6&nbsp;|&nbsp;34.2</td>
282
- <td align="center">17.3&nbsp;|&nbsp;29.6</td>
283
- </tr>
284
- </tbody></table>
285
-
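As a quick check on the headline figures, the AP50 and AR values for "Prev. SOTA" and CutLER* can be averaged across the 11 datasets and compared. The README does not state how the 2.7 times and 2.6 times figures were computed, so the ratio of means used here is an assumption. The numbers are copied from the table above.

```python
# (AP50, AR) per dataset, in table order: COCO, COCO20K, VOC, LVIS, UVO, Clipart,
# Comic, Watercolor, KITTI, Objects365, OpenImages.
PREV_SOTA = [(9.6, 12.6), (9.7, 12.6), (15.9, 21.3), (3.8, 6.4), (10.0, 14.2), (7.9, 15.1),
             (9.9, 16.3), (6.7, 16.2), (7.7, 7.1), (8.1, 10.2), (9.9, 14.9)]
CUTLER_CASCADE = [(21.9, 32.7), (22.4, 33.1), (36.9, 44.3), (8.4, 21.8), (31.7, 42.8),
                  (21.1, 41.3), (30.4, 38.6), (37.5, 44.6), (18.4, 27.5), (21.6, 34.2),
                  (17.3, 29.6)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ratio_of_means(new, old, column):
    """Ratio between the dataset-averaged metric of two methods (column 0 = AP50, 1 = AR)."""
    return mean(row[column] for row in new) / mean(row[column] for row in old)


if __name__ == "__main__":
    print(f"AP50 ratio: {ratio_of_means(CUTLER_CASCADE, PREV_SOTA, 0):.2f}")
    print(f"AR ratio:   {ratio_of_means(CUTLER_CASCADE, PREV_SOTA, 1):.2f}")
```

Under this assumption both ratios come out between 2.6 and 2.7, which is in line with the quoted improvements.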
286
- ## Semi-supervised and Fully-supervised Learning
287
- CutLER can also serve as a pretrained model for training fully supervised object detection and instance segmentation models and improves performance on COCO, including on few-shot benchmarks.
288
-
289
- ### Training & Evaluation in Command Line
290
- You can find all the semi-supervised and fully-supervised learning configs provided in CutLER under `model_zoo/configs/COCO-Semisupervised`.
291
-
292
- To train a model using K% labels with `train_net.py`, first set up the COCO dataset according to [datasets/README.md](datasets/README.md) and specify K value in the config file, then run:
293
- ```
294
- python train_net.py --num-gpus 8 \
295
- --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
296
- MODEL.WEIGHTS /path/to/cutler_pretrained_model
297
- ```
298
-
299
- The configs under `model_zoo/configs/COCO-Semisupervised` are made for 8-GPU training.
300
- To train on 1 GPU, you may need to [change some parameters](https://arxiv.org/abs/1706.02677), e.g. the number of GPUs (--num-gpus your_num_gpus), the learning rate (SOLVER.BASE_LR your_base_lr) and the batch size (SOLVER.IMS_PER_BATCH your_batch_size).
301
-
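The linked paper (arXiv:1706.02677) describes the linear scaling rule: when the batch size is multiplied by k, the learning rate is multiplied by k as well. A minimal sketch of that arithmetic follows. The reference values (batch size 16, learning rate 0.02) are placeholders rather than values read from the CutLER configs, and `scaled_overrides` is a hypothetical helper.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: the learning rate changes in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size


def scaled_overrides(base_lr, base_batch_size, new_batch_size):
    """Build KEY VALUE overrides in the form passed after the config file."""
    new_lr = scale_lr(base_lr, base_batch_size, new_batch_size)
    return ["SOLVER.IMS_PER_BATCH", str(new_batch_size), "SOLVER.BASE_LR", f"{new_lr:g}"]


if __name__ == "__main__":
    # Placeholder reference setup: 8 GPUs x 2 images each; target: 1 GPU x 2 images.
    print(" ".join(scaled_overrides(base_lr=0.02, base_batch_size=16, new_batch_size=2)))
```

The same paper pairs the rule with a warmup phase, and with a smaller batch the number of iterations usually has to grow for the model to see the same number of images, so the schedule may need adjusting too.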
302
- ### Evaluation
303
- To evaluate a model's performance, use:
304
- ```
305
- python train_net.py \
306
- --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
307
- --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
308
- ```
309
- For more options, see `python train_net.py -h`.
310
-
311
- ### Model Zoo
312
- We fine-tune a Cascade R-CNN model initialized with CutLER or MoCo-v2 on varying amounts of labeled COCO data, and show results (Box&nbsp;|&nbsp;Mask AP) on the val2017 split below:
313
-
314
- <table><tbody>
315
- <!-- START TABLE -->
316
- <!-- TABLE HEADER -->
317
- <tr><th valign="bottom">% of labels</th>
318
- <th valign="bottom">1%</th>
319
- <th valign="bottom">2%</th>
320
- <th valign="bottom">5%</th>
321
- <th valign="bottom">10%</th>
322
- <th valign="bottom">20%</th>
323
- <th valign="bottom">30%</th>
324
- <th valign="bottom">40%</th>
325
- <th valign="bottom">50%</th>
326
- <th valign="bottom">60%</th>
327
- <th valign="bottom">80%</th>
328
- <th valign="bottom">100%</th></tr>
329
- <!-- TABLE BODY -->
330
- <!-- ROW: Box/Mask AP for CutLER -->
331
- <tr><td align="center">MoCo-v2</td>
332
- <td align="center">11.8&nbsp;|&nbsp;10.0</td>
333
- <td align="center">16.2&nbsp;|&nbsp;13.8</td>
334
- <td align="center">20.5&nbsp;|&nbsp;17.8</td>
335
- <td align="center">26.5&nbsp;|&nbsp;23.0</td>
336
- <td align="center">32.5&nbsp;|&nbsp;28.2</td>
337
- <td align="center">35.5&nbsp;|&nbsp;30.8</td>
338
- <td align="center">37.3&nbsp;|&nbsp;32.3</td>
339
- <td align="center">38.7&nbsp;|&nbsp;33.6</td>
340
- <td align="center">39.9&nbsp;|&nbsp;34.6</td>
341
- <td align="center">41.6&nbsp;|&nbsp;36.0</td>
342
- <td align="center">42.8&nbsp;|&nbsp;37.0</td>
343
- </tr>
344
- <!-- ROW: Mask AP -->
345
- <tr><td align="center">CutLER</td>
346
- <td align="center">16.8&nbsp;|&nbsp;14.6</td>
347
- <td align="center">21.6&nbsp;|&nbsp;18.9</td>
348
- <td align="center">27.8&nbsp;|&nbsp;24.3</td>
349
- <td align="center">32.2&nbsp;|&nbsp;28.1</td>
350
- <td align="center">36.6&nbsp;|&nbsp;31.7</td>
351
- <td align="center">38.2&nbsp;|&nbsp;33.3</td>
352
- <td align="center">39.9&nbsp;|&nbsp;34.7</td>
353
- <td align="center">41.5&nbsp;|&nbsp;35.9</td>
354
- <td align="center">42.3&nbsp;|&nbsp;36.7</td>
355
- <td align="center">43.8&nbsp;|&nbsp;37.9</td>
356
- <td align="center">44.7&nbsp;|&nbsp;38.5</td>
357
- </tr>
358
- <!-- ROW: Model Downloads -->
359
- <tr><td align="center">Download</td>
360
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_1perc.pth">model</a></td>
361
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_2perc.pth">model</a></td>
362
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_5perc.pth">model</a></td>
363
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_10perc.pth">model</a></td>
364
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_20perc.pth">model</a></td>
365
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_30perc.pth">model</a></td>
366
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_40perc.pth">model</a></td>
367
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_50perc.pth">model</a></td>
368
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_60perc.pth">model</a></td>
369
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_semi_80perc.pth">model</a></td>
370
- <td align="center"><a href="http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_fully_100perc.pth">model</a></td>
371
- </tr>
372
- </tbody></table>
373
-
374
- Both MoCo-v2 and our CutLER are trained for the 1x schedule using Detectron2, except for extremely low-shot settings with 1% or 2% labels. When training with 1% or 2% labels, we train both MoCo-v2 and our model for 3,600 iterations with a batch size of 16.
375
-
376
- ## License
377
- The majority of CutLER, Detectron2 and DINO are licensed under the [CC-BY-NC license](LICENSE); however, portions of the project are available under separate license terms: TokenCut, Bilateral Solver and CRF are licensed under the MIT license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.
378
-
379
- ## Ethical Considerations
380
- CutLER's wide range of detection capabilities may introduce similar challenges to many other visual recognition methods.
381
- As the image can contain arbitrary instances, it may impact the model output.
382
-
383
- ## How to get support from us?
384
- If you have any general questions, feel free to email us at [Xudong Wang](mailto:xdwang@eecs.berkeley.edu), [Ishan Misra](mailto:imisra@meta.com) and [Rohit Girdhar](mailto:rgirdhar@meta.com). If you have code or implementation-related questions, please feel free to send emails to us or open an issue in this codebase (We recommend that you open an issue in this codebase, because your questions may help others).
385
-
386
- ## Citation
387
- If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
388
- ```
389
- @inproceedings{wang2023cut,
390
- title={Cut and learn for unsupervised object detection and instance segmentation},
391
- author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},
392
- booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
393
- pages={3124--3134},
394
- year={2023}
395
- }
396
- ```
397
-
398
- ```
399
- @article{wang2023videocutler,
400
- title={VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation},
401
- author={Wang, Xudong and Misra, Ishan and Zeng, Ziyun and Girdhar, Rohit and Darrell, Trevor},
402
- journal={arXiv preprint arXiv:2308.14710},
403
- year={2023}
404
- }
405
- ```
cog.yaml DELETED
@@ -1,24 +0,0 @@
1
- build:
2
-   gpu: true
3
-   cuda: "11.6"
4
-   python_version: "3.8"
5
-   python_packages:
6
-     - "torch==1.11.0"
7
-     - "torchvision==0.12.0"
8
-     - "faiss-gpu==1.7.2"
9
-     - "opencv-python==4.6.0.66"
10
-     - "scikit-image==0.19.2"
11
-     - "scikit-learn==1.1.1"
12
-     - "shapely==1.8.2"
13
-     - "timm==0.5.4"
14
-     - "pyyaml==6.0"
15
-     - "colored==1.4.4"
16
-     - "fvcore==0.1.5.post20220512"
17
-     - "gdown==4.5.4"
18
-     - "pycocotools==2.0.6"
19
-     - "numpy==1.20.0"
20
-
21
-   run:
22
-     - pip install git+https://github.com/lucasb-eyer/pydensecrf.git
23
-
24
- predict: "maskcut/predict.py:Predictor"
cutler/__init__.py DELETED
@@ -1,15 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- import config
4
- import engine
5
- import modeling
6
- import structures
7
- import tools
8
- import demo
9
-
10
- # dataset loading
11
- from . import data # register all new datasets
12
- from data import datasets # register all new datasets
13
- from solver import *
14
-
15
- # from .data import register_all_imagenet
cutler/config/__init__.py DELETED
@@ -1,3 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- from .cutler_config import add_cutler_config
cutler/config/cutler_config.py DELETED
@@ -1,19 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- from detectron2.config import CfgNode as CN
4
-
5
- def add_cutler_config(cfg):
6
-     cfg.DATALOADER.COPY_PASTE = False
7
-     cfg.DATALOADER.COPY_PASTE_RATE = 0.0
8
-     cfg.DATALOADER.COPY_PASTE_MIN_RATIO = 0.5
9
-     cfg.DATALOADER.COPY_PASTE_MAX_RATIO = 1.0
10
-     cfg.DATALOADER.COPY_PASTE_RANDOM_NUM = True
11
-     cfg.DATALOADER.VISUALIZE_COPY_PASTE = False
12
-
13
-     cfg.MODEL.ROI_HEADS.USE_DROPLOSS = False
14
-     cfg.MODEL.ROI_HEADS.DROPLOSS_IOU_THRESH = 0.0
15
-
16
-     cfg.SOLVER.BASE_LR_MULTIPLIER = 1
17
-     cfg.SOLVER.BASE_LR_MULTIPLIER_NAMES = []
18
-
19
-     cfg.TEST.NO_SEGM = False
cutler/data/__init__.py DELETED
@@ -1,15 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
-
- from . import datasets  # ensure the builtin datasets are registered
- from .detection_utils import *  # isort:skip
- from .build import (
-     build_batch_data_loader,
-     build_detection_train_loader,
-     build_detection_test_loader,
-     get_detection_dataset_dicts,
-     load_proposals_into_dataset,
-     print_instances_class_histogram,
- )
- from detectron2.data.common import *
-
- __all__ = [k for k in globals().keys() if not k.startswith("_")]
 
cutler/data/build.py DELETED
@@ -1,561 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/build.py
-
- import itertools
- import logging
- import numpy as np
- import operator
- import pickle
- from typing import Any, Callable, Dict, List, Optional, Union
- import torch
- import torch.utils.data as torchdata
- from tabulate import tabulate
- from termcolor import colored
-
- from detectron2.config import configurable
- from detectron2.structures import BoxMode
- from detectron2.utils.comm import get_world_size
- from detectron2.utils.env import seed_all_rng
- from detectron2.utils.file_io import PathManager
- from detectron2.utils.logger import _log_api_usage, log_first_n
-
- from detectron2.data.catalog import DatasetCatalog, MetadataCatalog
- from detectron2.data.common import AspectRatioGroupedDataset, DatasetFromList, MapDataset, ToIterableDataset
- from data.dataset_mapper import DatasetMapper
- from data.detection_utils import check_metadata_consistency
- from detectron2.data.samplers import (
-     InferenceSampler,
-     RandomSubsetTrainingSampler,
-     RepeatFactorTrainingSampler,
-     TrainingSampler,
- )
-
- """
- This file contains the default logic to build a dataloader for training or testing.
- """
-
- __all__ = [
-     "build_batch_data_loader",
-     "build_detection_train_loader",
-     "build_detection_test_loader",
-     "get_detection_dataset_dicts",
-     "load_proposals_into_dataset",
-     "print_instances_class_histogram",
- ]
-
-
- def filter_images_with_only_crowd_annotations(dataset_dicts):
-     """
-     Filter out images with none annotations or only crowd annotations
-     (i.e., images without non-crowd annotations).
-     A common training-time preprocessing on COCO dataset.
-
-     Args:
-         dataset_dicts (list[dict]): annotations in Detectron2 Dataset format.
-
-     Returns:
-         list[dict]: the same format, but filtered.
-     """
-     num_before = len(dataset_dicts)
-
-     def valid(anns):
-         for ann in anns:
-             if ann.get("iscrowd", 0) == 0:
-                 return True
-         return False
-
-     dataset_dicts = [x for x in dataset_dicts if valid(x["annotations"])]
-     num_after = len(dataset_dicts)
-     logger = logging.getLogger(__name__)
-     logger.info(
-         "Removed {} images with no usable annotations. {} images left.".format(
-             num_before - num_after, num_after
-         )
-     )
-     print("Removed {} images with no usable annotations. {} images left.".format(
-         num_before - num_after, num_after
-     ))
-     return dataset_dicts
-
-
- def filter_images_with_few_keypoints(dataset_dicts, min_keypoints_per_image):
-     """
-     Filter out images with too few number of keypoints.
-
-     Args:
-         dataset_dicts (list[dict]): annotations in Detectron2 Dataset format.
-
-     Returns:
-         list[dict]: the same format as dataset_dicts, but filtered.
-     """
-     num_before = len(dataset_dicts)
-
-     def visible_keypoints_in_image(dic):
-         # Each keypoints field has the format [x1, y1, v1, ...], where v is visibility
-         annotations = dic["annotations"]
-         return sum(
-             (np.array(ann["keypoints"][2::3]) > 0).sum()
-             for ann in annotations
-             if "keypoints" in ann
-         )
-
-     dataset_dicts = [
-         x for x in dataset_dicts if visible_keypoints_in_image(x) >= min_keypoints_per_image
-     ]
-     num_after = len(dataset_dicts)
-     logger = logging.getLogger(__name__)
-     logger.info(
-         "Removed {} images with fewer than {} keypoints.".format(
-             num_before - num_after, min_keypoints_per_image
-         )
-     )
-     return dataset_dicts
-
-
- def load_proposals_into_dataset(dataset_dicts, proposal_file):
-     """
-     Load precomputed object proposals into the dataset.
-
-     The proposal file should be a pickled dict with the following keys:
-
-     - "ids": list[int] or list[str], the image ids
-     - "boxes": list[np.ndarray], each is an Nx4 array of boxes corresponding to the image id
-     - "objectness_logits": list[np.ndarray], each is an N sized array of objectness scores
-       corresponding to the boxes.
-     - "bbox_mode": the BoxMode of the boxes array. Defaults to ``BoxMode.XYXY_ABS``.
-
-     Args:
-         dataset_dicts (list[dict]): annotations in Detectron2 Dataset format.
-         proposal_file (str): file path of pre-computed proposals, in pkl format.
-
-     Returns:
-         list[dict]: the same format as dataset_dicts, but added proposal field.
-     """
-     logger = logging.getLogger(__name__)
-     logger.info("Loading proposals from: {}".format(proposal_file))
-
-     with PathManager.open(proposal_file, "rb") as f:
-         proposals = pickle.load(f, encoding="latin1")
-
-     # Rename the key names in D1 proposal files
-     rename_keys = {"indexes": "ids", "scores": "objectness_logits"}
-     for key in rename_keys:
-         if key in proposals:
-             proposals[rename_keys[key]] = proposals.pop(key)
-
-     # Fetch the indexes of all proposals that are in the dataset
-     # Convert image_id to str since they could be int.
-     img_ids = set({str(record["image_id"]) for record in dataset_dicts})
-     id_to_index = {str(id): i for i, id in enumerate(proposals["ids"]) if str(id) in img_ids}
-
-     # Assuming default bbox_mode of precomputed proposals are 'XYXY_ABS'
-     bbox_mode = BoxMode(proposals["bbox_mode"]) if "bbox_mode" in proposals else BoxMode.XYXY_ABS
-
-     for record in dataset_dicts:
-         # Get the index of the proposal
-         i = id_to_index[str(record["image_id"])]
-
-         boxes = proposals["boxes"][i]
-         objectness_logits = proposals["objectness_logits"][i]
-         # Sort the proposals in descending order of the scores
-         inds = objectness_logits.argsort()[::-1]
-         record["proposal_boxes"] = boxes[inds]
-         record["proposal_objectness_logits"] = objectness_logits[inds]
-         record["proposal_bbox_mode"] = bbox_mode
-
-     return dataset_dicts
-
-
- def print_instances_class_histogram(dataset_dicts, class_names):
-     """
-     Args:
-         dataset_dicts (list[dict]): list of dataset dicts.
-         class_names (list[str]): list of class names (zero-indexed).
-     """
-     num_classes = len(class_names)
-     hist_bins = np.arange(num_classes + 1)
-     histogram = np.zeros((num_classes,), dtype=np.int)
-     for entry in dataset_dicts:
-         annos = entry["annotations"]
-         classes = np.asarray(
-             [x["category_id"] for x in annos if not x.get("iscrowd", 0)], dtype=np.int
-         )
-         if len(classes):
-             assert classes.min() >= 0, f"Got an invalid category_id={classes.min()}"
-             assert (
-                 classes.max() < num_classes
-             ), f"Got an invalid category_id={classes.max()} for a dataset of {num_classes} classes"
-         histogram += np.histogram(classes, bins=hist_bins)[0]
-
-     N_COLS = min(6, len(class_names) * 2)
-
-     def short_name(x):
-         # make long class names shorter. useful for lvis
-         if len(x) > 13:
-             return x[:11] + ".."
-         return x
-
-     data = list(
-         itertools.chain(*[[short_name(class_names[i]), int(v)] for i, v in enumerate(histogram)])
-     )
-     total_num_instances = sum(data[1::2])
-     data.extend([None] * (N_COLS - (len(data) % N_COLS)))
-     if num_classes > 1:
-         data.extend(["total", total_num_instances])
-     data = itertools.zip_longest(*[data[i::N_COLS] for i in range(N_COLS)])
-     table = tabulate(
-         data,
-         headers=["category", "#instances"] * (N_COLS // 2),
-         tablefmt="pipe",
-         numalign="left",
-         stralign="center",
-     )
-     log_first_n(
-         logging.INFO,
-         "Distribution of instances among all {} categories:\n".format(num_classes)
-         + colored(table, "cyan"),
-         key="message",
-     )
-
-
- def get_detection_dataset_dicts(
-     names,
-     filter_empty=True,
-     min_keypoints=0,
-     proposal_files=None,
-     check_consistency=True,
- ):
-     """
-     Load and prepare dataset dicts for instance detection/segmentation and semantic segmentation.
-
-     Args:
-         names (str or list[str]): a dataset name or a list of dataset names
-         filter_empty (bool): whether to filter out images without instance annotations
-         min_keypoints (int): filter out images with fewer keypoints than
-             `min_keypoints`. Set to 0 to do nothing.
-         proposal_files (list[str]): if given, a list of object proposal files
-             that match each dataset in `names`.
-         check_consistency (bool): whether to check if datasets have consistent metadata.
-
-     Returns:
-         list[dict]: a list of dicts following the standard dataset dict format.
-     """
-     if isinstance(names, str):
-         names = [names]
-     assert len(names), names
-     dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
-
-     if isinstance(dataset_dicts[0], torchdata.Dataset):
-         if len(dataset_dicts) > 1:
-             # ConcatDataset does not work for iterable style dataset.
-             # We could support concat for iterable as well, but it's often
-             # not a good idea to concat iterables anyway.
-             return torchdata.ConcatDataset(dataset_dicts)
-         return dataset_dicts[0]
-
-     for dataset_name, dicts in zip(names, dataset_dicts):
-         assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)
-
-     if proposal_files is not None:
-         assert len(names) == len(proposal_files)
-         # load precomputed proposals from proposal files
-         dataset_dicts = [
-             load_proposals_into_dataset(dataset_i_dicts, proposal_file)
-             for dataset_i_dicts, proposal_file in zip(dataset_dicts, proposal_files)
-         ]
-
-     dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts))
-
-     has_instances = "annotations" in dataset_dicts[0]
-     if filter_empty and has_instances:
-         dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)
-     if min_keypoints > 0 and has_instances:
-         dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)
-
-     if check_consistency and has_instances:
-         try:
-             class_names = MetadataCatalog.get(names[0]).thing_classes
-             check_metadata_consistency("thing_classes", names)
-             print_instances_class_histogram(dataset_dicts, class_names)
-         except AttributeError:  # class names are not available for this dataset
-             pass
-
-     assert len(dataset_dicts), "No valid data found in {}.".format(",".join(names))
-     return dataset_dicts
-
-
- def build_batch_data_loader(
-     dataset,
-     sampler,
-     total_batch_size,
-     *,
-     aspect_ratio_grouping=False,
-     num_workers=0,
-     collate_fn=None,
- ):
-     """
-     Build a batched dataloader. The main differences from `torch.utils.data.DataLoader` are:
-     1. support aspect ratio grouping options
-     2. use no "batch collation", because this is common for detection training
-
-     Args:
-         dataset (torch.utils.data.Dataset): a pytorch map-style or iterable dataset.
-         sampler (torch.utils.data.sampler.Sampler or None): a sampler that produces indices.
-             Must be provided iff. ``dataset`` is a map-style dataset.
-         total_batch_size, aspect_ratio_grouping, num_workers, collate_fn: see
-             :func:`build_detection_train_loader`.
-
-     Returns:
-         iterable[list]. Length of each list is the batch size of the current
-             GPU. Each element in the list comes from the dataset.
-     """
-     world_size = get_world_size()
-     assert (
-         total_batch_size > 0 and total_batch_size % world_size == 0
-     ), "Total batch size ({}) must be divisible by the number of gpus ({}).".format(
-         total_batch_size, world_size
-     )
-     batch_size = total_batch_size // world_size
-
-     if isinstance(dataset, torchdata.IterableDataset):
-         assert sampler is None, "sampler must be None if dataset is IterableDataset"
-     else:
-         dataset = ToIterableDataset(dataset, sampler)
-
-     if aspect_ratio_grouping:
-         data_loader = torchdata.DataLoader(
-             dataset,
-             num_workers=num_workers,
-             collate_fn=operator.itemgetter(0),  # don't batch, but yield individual elements
-             worker_init_fn=worker_init_reset_seed,
-         )  # yield individual mapped dict
-         data_loader = AspectRatioGroupedDataset(data_loader, batch_size)
-         if collate_fn is None:
-             return data_loader
-         return MapDataset(data_loader, collate_fn)
-     else:
-         return torchdata.DataLoader(
-             dataset,
-             batch_size=batch_size,
-             drop_last=True,
-             num_workers=num_workers,
-             collate_fn=trivial_batch_collator if collate_fn is None else collate_fn,
-             worker_init_fn=worker_init_reset_seed,
-         )
-
-
- def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler=None):
-     if dataset is None:
-         dataset = get_detection_dataset_dicts(
-             cfg.DATASETS.TRAIN,
-             filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
-             min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
-             if cfg.MODEL.KEYPOINT_ON
-             else 0,
-             proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
-         )
-         _log_api_usage("dataset." + cfg.DATASETS.TRAIN[0])
-
-     if mapper is None:
-         mapper = DatasetMapper(cfg, True)
-
-     if sampler is None:
-         sampler_name = cfg.DATALOADER.SAMPLER_TRAIN
-         logger = logging.getLogger(__name__)
-         if isinstance(dataset, torchdata.IterableDataset):
-             logger.info("Not using any sampler since the dataset is IterableDataset.")
-             sampler = None
-         else:
-             logger.info("Using training sampler {}".format(sampler_name))
-             if sampler_name == "TrainingSampler":
-                 sampler = TrainingSampler(len(dataset))
-             elif sampler_name == "RepeatFactorTrainingSampler":
-                 repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
-                     dataset, cfg.DATALOADER.REPEAT_THRESHOLD
-                 )
-                 sampler = RepeatFactorTrainingSampler(repeat_factors)
-             elif sampler_name == "RandomSubsetTrainingSampler":
-                 sampler = RandomSubsetTrainingSampler(
-                     len(dataset), cfg.DATALOADER.RANDOM_SUBSET_RATIO
-                 )
-             else:
-                 raise ValueError("Unknown training sampler: {}".format(sampler_name))
-
-     return {
-         "dataset": dataset,
-         "sampler": sampler,
-         "mapper": mapper,
-         "total_batch_size": cfg.SOLVER.IMS_PER_BATCH,
-         "aspect_ratio_grouping": cfg.DATALOADER.ASPECT_RATIO_GROUPING,
-         "num_workers": cfg.DATALOADER.NUM_WORKERS,
-     }
-
-
- @configurable(from_config=_train_loader_from_config)
- def build_detection_train_loader(
-     dataset,
-     *,
-     mapper,
-     sampler=None,
-     total_batch_size,
-     aspect_ratio_grouping=True,
-     num_workers=0,
-     collate_fn=None,
- ):
-     """
-     Build a dataloader for object detection with some default features.
-
-     Args:
-         dataset (list or torch.utils.data.Dataset): a list of dataset dicts,
-             or a pytorch dataset (either map-style or iterable). It can be obtained
-             by using :func:`DatasetCatalog.get` or :func:`get_detection_dataset_dicts`.
-         mapper (callable): a callable which takes a sample (dict) from dataset and
-             returns the format to be consumed by the model.
-             When using cfg, the default choice is ``DatasetMapper(cfg, is_train=True)``.
-         sampler (torch.utils.data.sampler.Sampler or None): a sampler that produces
-             indices to be applied on ``dataset``.
-             If ``dataset`` is map-style, the default sampler is a :class:`TrainingSampler`,
-             which coordinates an infinite random shuffle sequence across all workers.
-             Sampler must be None if ``dataset`` is iterable.
-         total_batch_size (int): total batch size across all workers.
-         aspect_ratio_grouping (bool): whether to group images with similar
-             aspect ratio for efficiency. When enabled, it requires each
-             element in dataset be a dict with keys "width" and "height".
-         num_workers (int): number of parallel data loading workers
-         collate_fn: a function that determines how to do batching, same as the argument of
-             `torch.utils.data.DataLoader`. Defaults to do no collation and return a list of
-             data. No collation is OK for small batch size and simple data structures.
-             If your batch size is large and each sample contains too many small tensors,
-             it's more efficient to collate them in data loader.
-
-     Returns:
-         torch.utils.data.DataLoader:
-             a dataloader. Each output from it is a ``list[mapped_element]`` of length
-             ``total_batch_size / num_workers``, where ``mapped_element`` is produced
-             by the ``mapper``.
-     """
-     if isinstance(dataset, list):
-         dataset = DatasetFromList(dataset, copy=False)
-     if mapper is not None:
-         dataset = MapDataset(dataset, mapper)
-
-     if isinstance(dataset, torchdata.IterableDataset):
-         assert sampler is None, "sampler must be None if dataset is IterableDataset"
-     else:
-         if sampler is None:
-             sampler = TrainingSampler(len(dataset))
-         assert isinstance(sampler, torchdata.Sampler), f"Expect a Sampler but got {type(sampler)}"
-     return build_batch_data_loader(
-         dataset,
-         sampler,
-         total_batch_size,
-         aspect_ratio_grouping=aspect_ratio_grouping,
-         num_workers=num_workers,
-         collate_fn=collate_fn,
-     )
-
-
- def _test_loader_from_config(cfg, dataset_name, mapper=None):
-     """
-     Uses the given `dataset_name` argument (instead of the names in cfg), because the
-     standard practice is to evaluate each test set individually (not combining them).
-     """
-     if isinstance(dataset_name, str):
-         dataset_name = [dataset_name]
-
-     dataset = get_detection_dataset_dicts(
-         dataset_name,
-         filter_empty=False,
-         proposal_files=[
-             cfg.DATASETS.PROPOSAL_FILES_TEST[list(cfg.DATASETS.TEST).index(x)] for x in dataset_name
-         ]
-         if cfg.MODEL.LOAD_PROPOSALS
-         else None,
-     )
-     if mapper is None:
-         mapper = DatasetMapper(cfg, False)
-     return {
-         "dataset": dataset,
-         "mapper": mapper,
-         "num_workers": cfg.DATALOADER.NUM_WORKERS,
-         "sampler": InferenceSampler(len(dataset))
-         if not isinstance(dataset, torchdata.IterableDataset)
-         else None,
-     }
-
-
- @configurable(from_config=_test_loader_from_config)
- def build_detection_test_loader(
-     dataset: Union[List[Any], torchdata.Dataset],
-     *,
-     mapper: Callable[[Dict[str, Any]], Any],
-     sampler: Optional[torchdata.Sampler] = None,
-     batch_size: int = 1,
-     num_workers: int = 0,
-     collate_fn: Optional[Callable[[List[Any]], Any]] = None,
- ) -> torchdata.DataLoader:
-     """
-     Similar to `build_detection_train_loader`, with default batch size = 1,
-     and sampler = :class:`InferenceSampler`. This sampler coordinates all workers
-     to produce the exact set of all samples.
-
-     Args:
-         dataset: a list of dataset dicts,
-             or a pytorch dataset (either map-style or iterable). They can be obtained
-             by using :func:`DatasetCatalog.get` or :func:`get_detection_dataset_dicts`.
-         mapper: a callable which takes a sample (dict) from dataset
-             and returns the format to be consumed by the model.
-             When using cfg, the default choice is ``DatasetMapper(cfg, is_train=False)``.
-         sampler: a sampler that produces
-             indices to be applied on ``dataset``. Default to :class:`InferenceSampler`,
-             which splits the dataset across all workers. Sampler must be None
-             if `dataset` is iterable.
-         batch_size: the batch size of the data loader to be created.
-             Default to 1 image per worker since this is the standard when reporting
-             inference time in papers.
-         num_workers: number of parallel data loading workers
-         collate_fn: same as the argument of `torch.utils.data.DataLoader`.
-             Defaults to do no collation and return a list of data.
-
-     Returns:
-         DataLoader: a torch DataLoader, that loads the given detection
-             dataset, with test-time transformation and batching.
-
-     Examples:
-     ::
-         data_loader = build_detection_test_loader(
-             DatasetRegistry.get("my_test"),
-             mapper=DatasetMapper(...))
-
-         # or, instantiate with a CfgNode:
-         data_loader = build_detection_test_loader(cfg, "my_test")
-     """
-     if isinstance(dataset, list):
-         dataset = DatasetFromList(dataset, copy=False)
-     if mapper is not None:
-         dataset = MapDataset(dataset, mapper)
-     if isinstance(dataset, torchdata.IterableDataset):
-         assert sampler is None, "sampler must be None if dataset is IterableDataset"
-     else:
-         if sampler is None:
-             sampler = InferenceSampler(len(dataset))
-     return torchdata.DataLoader(
-         dataset,
-         batch_size=batch_size,
-         sampler=sampler,
-         drop_last=False,
-         num_workers=num_workers,
-         collate_fn=trivial_batch_collator if collate_fn is None else collate_fn,
-     )
-
-
- def trivial_batch_collator(batch):
-     """
-     A batch collator that does nothing.
-     """
-     return batch
-
-
- def worker_init_reset_seed(worker_id):
-     initial_seed = torch.initial_seed() % 2**31
-     seed_all_rng(initial_seed + worker_id)
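
The crowd-annotation filter in the deleted `build.py` above can be illustrated in isolation. The following is a minimal sketch, not the repository's code: the helper names and the toy records are hypothetical, and only the `iscrowd` convention (a missing flag counts as non-crowd) is taken from the diff.

```python
def has_non_crowd_annotation(annotations):
    """Return True if at least one annotation is not marked as crowd."""
    # A missing "iscrowd" key defaults to 0, mirroring ann.get("iscrowd", 0) in the diff.
    return any(ann.get("iscrowd", 0) == 0 for ann in annotations)


def drop_images_without_usable_annotations(dataset_dicts):
    """Keep only records that have at least one non-crowd annotation."""
    return [d for d in dataset_dicts if has_non_crowd_annotation(d["annotations"])]


# Hypothetical toy records; real dataset dicts carry more fields (file_name, bbox, ...).
toy_records = [
    {"image_id": 1, "annotations": []},                                # no annotations: dropped
    {"image_id": 2, "annotations": [{"iscrowd": 1}]},                  # crowd only: dropped
    {"image_id": 3, "annotations": [{"iscrowd": 1}, {"iscrowd": 0}]},  # mixed: kept
    {"image_id": 4, "annotations": [{}]},                              # flag missing: kept
]

kept = drop_images_without_usable_annotations(toy_records)
kept_ids = [d["image_id"] for d in kept]
print(kept_ids)
```

Images with an empty annotation list are dropped by the same test, which is why the log message in the diff speaks of "no usable annotations" and not only of crowd regions.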
 
cutler/data/dataset_mapper.py DELETED
@@ -1,193 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/dataset_mapper.py
-
- import copy
- import logging
- import numpy as np
- from typing import List, Optional, Union
- import torch
-
- from detectron2.config import configurable
-
- import data.detection_utils as utils
- import data.transforms as T
-
- """
- This file contains the default mapping that's applied to "dataset dicts".
- """
-
- __all__ = ["DatasetMapper"]
-
-
- class DatasetMapper:
-     """
-     A callable which takes a dataset dict in Detectron2 Dataset format,
-     and map it into a format used by the model.
-
-     This is the default callable to be used to map your dataset dict into training data.
-     You may need to follow it to implement your own one for customized logic,
-     such as a different way to read or transform images.
-     See :doc:`/tutorials/data_loading` for details.
-
-     The callable currently does the following:
-
-     1. Read the image from "file_name"
-     2. Applies cropping/geometric transforms to the image and annotations
-     3. Prepare data and annotations to Tensor and :class:`Instances`
-     """
-
-     @configurable
-     def __init__(
-         self,
-         is_train: bool,
-         *,
-         augmentations: List[Union[T.Augmentation, T.Transform]],
-         image_format: str,
-         use_instance_mask: bool = False,
-         use_keypoint: bool = False,
-         instance_mask_format: str = "polygon",
-         keypoint_hflip_indices: Optional[np.ndarray] = None,
-         precomputed_proposal_topk: Optional[int] = None,
-         recompute_boxes: bool = False,
-     ):
-         """
-         NOTE: this interface is experimental.
-
-         Args:
-             is_train: whether it's used in training or inference
-             augmentations: a list of augmentations or deterministic transforms to apply
-             image_format: an image format supported by :func:`detection_utils.read_image`.
-             use_instance_mask: whether to process instance segmentation annotations, if available
-             use_keypoint: whether to process keypoint annotations if available
-             instance_mask_format: one of "polygon" or "bitmask". Process instance segmentation
-                 masks into this format.
-             keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
-             precomputed_proposal_topk: if given, will load pre-computed
-                 proposals from dataset_dict and keep the top k proposals for each image.
-             recompute_boxes: whether to overwrite bounding box annotations
-                 by computing tight bounding boxes from instance mask annotations.
-         """
-         if recompute_boxes:
-             assert use_instance_mask, "recompute_boxes requires instance masks"
-         # fmt: off
-         self.is_train = is_train
-         self.augmentations = T.AugmentationList(augmentations)
-         self.image_format = image_format
-         self.use_instance_mask = use_instance_mask
-         self.instance_mask_format = instance_mask_format
-         self.use_keypoint = use_keypoint
-         self.keypoint_hflip_indices = keypoint_hflip_indices
-         self.proposal_topk = precomputed_proposal_topk
-         self.recompute_boxes = recompute_boxes
-         # fmt: on
-         logger = logging.getLogger(__name__)
-         mode = "training" if is_train else "inference"
-         logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
-
-     @classmethod
-     def from_config(cls, cfg, is_train: bool = True):
-         augs = utils.build_augmentation(cfg, is_train)
-         if cfg.INPUT.CROP.ENABLED and is_train:
-             augs.insert(0, T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE))
-             recompute_boxes = cfg.MODEL.MASK_ON
-         else:
-             recompute_boxes = False
-
-         ret = {
-             "is_train": is_train,
-             "augmentations": augs,
-             "image_format": cfg.INPUT.FORMAT,
-             "use_instance_mask": cfg.MODEL.MASK_ON,
-             "instance_mask_format": cfg.INPUT.MASK_FORMAT,
-             "use_keypoint": cfg.MODEL.KEYPOINT_ON,
-             "recompute_boxes": recompute_boxes,
-         }
-
-         if cfg.MODEL.KEYPOINT_ON:
-             ret["keypoint_hflip_indices"] = utils.create_keypoint_hflip_indices(cfg.DATASETS.TRAIN)
-
-         if cfg.MODEL.LOAD_PROPOSALS:
-             ret["precomputed_proposal_topk"] = (
-                 cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN
-                 if is_train
-                 else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST
-             )
-         return ret
-
-     def _transform_annotations(self, dataset_dict, transforms, image_shape):
-         # USER: Modify this if you want to keep them for some reason.
-         for anno in dataset_dict["annotations"]:
-             if not self.use_instance_mask:
-                 anno.pop("segmentation", None)
-             if not self.use_keypoint:
-                 anno.pop("keypoints", None)
-
-         # USER: Implement additional transformations if you have other types of data
-         annos = [
-             utils.transform_instance_annotations(
-                 obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
-             )
-             for obj in dataset_dict.pop("annotations")
-             if obj.get("iscrowd", 0) == 0
-         ]
-         instances = utils.annotations_to_instances(
-             annos, image_shape, mask_format=self.instance_mask_format
-         )
-
-         # After transforms such as cropping are applied, the bounding box may no longer
-         # tightly bound the object. As an example, imagine a triangle object
-         # [(0,0), (2,0), (0,2)] cropped by a box [(1,0),(2,2)] (XYXY format). The tight
-         # bounding box of the cropped triangle should be [(1,0),(2,1)], which is not equal to
-         # the intersection of original bounding box and the cropping box.
-         if self.recompute_boxes:
-             instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
-         dataset_dict["instances"] = utils.filter_empty_instances(instances)
-
-     def __call__(self, dataset_dict):
-         """
-         Args:
-             dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
-
-         Returns:
-             dict: a format that builtin models in detectron2 accept
-         """
-         dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
-         # USER: Write your own image loading if it's not from a file
-         image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
-         utils.check_image_size(dataset_dict, image)
-
-         # USER: Remove if you don't do semantic/panoptic segmentation.
-         if "sem_seg_file_name" in dataset_dict:
-             sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
-         else:
-             sem_seg_gt = None
-
-         aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
-         transforms = self.augmentations(aug_input)
-         image, sem_seg_gt = aug_input.image, aug_input.sem_seg
-
-         image_shape = image.shape[:2]  # h, w
-         # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
-         # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
-         # Therefore it's important to use torch.Tensor.
-         dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
-         if sem_seg_gt is not None:
-             dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))
-
-         # USER: Remove if you don't use pre-computed proposals.
-         # Most users would not need this feature.
-         if self.proposal_topk is not None:
-             utils.transform_proposals(
-                 dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
-             )
-
-         if not self.is_train:
-             # USER: Modify this if you want to keep them for some reason.
-             dataset_dict.pop("annotations", None)
-             dataset_dict.pop("sem_seg_file_name", None)
-             return dataset_dict
-
-         if "annotations" in dataset_dict:
-             self._transform_annotations(dataset_dict, transforms, image_shape)
-
-         return dataset_dict
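
The triangle example in the `recompute_boxes` comment above can be checked numerically. The sketch below is an illustration only, not how the deleted code computes boxes (it calls `gt_masks.get_bounding_boxes()`): it clips the polygon with a plain Sutherland-Hodgman pass, keeps coordinates in the original image frame, and all function names are hypothetical.

```python
def clip_polygon_to_box(polygon, box):
    """Clip a polygon to an axis-aligned box (x0, y0, x1, y1), Sutherland-Hodgman style."""
    x0, y0, x1, y1 = box

    def cross_vertical(x):
        # Intersection of segment p-q with the vertical line at x.
        def intersect(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return intersect

    def cross_horizontal(y):
        # Intersection of segment p-q with the horizontal line at y.
        def intersect(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return intersect

    edges = [
        (lambda p: p[0] >= x0, cross_vertical(x0)),
        (lambda p: p[0] <= x1, cross_vertical(x1)),
        (lambda p: p[1] >= y0, cross_horizontal(y0)),
        (lambda p: p[1] <= y1, cross_horizontal(y1)),
    ]
    points = list(polygon)
    for inside, intersect in edges:
        clipped = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    clipped.append(intersect(prev, cur))
                clipped.append(cur)
            elif inside(prev):
                clipped.append(intersect(prev, cur))
        points = clipped
        if not points:
            break
    return points


def bounding_box(points):
    """Tight XYXY box around a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersect_boxes(a, b):
    """Intersection of two XYXY boxes, assuming they overlap."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


triangle = [(0, 0), (2, 0), (0, 2)]
crop = (1, 0, 2, 2)

tight = bounding_box(clip_polygon_to_box(triangle, crop))
naive = intersect_boxes(bounding_box(triangle), crop)
print("tight:", tight)
print("naive:", naive)
```

The naive box keeps the full crop height, while the box recomputed from the clipped shape stops at y = 1, which is the gap that `recompute_boxes` closes when masks are available.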
 
cutler/data/datasets/__init__.py DELETED
@@ -1,16 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- from .coco import load_coco_json, load_sem_seg, register_coco_instances, convert_to_coco_json
- from .builtin import (
-     register_all_imagenet,
-     register_all_uvo,
-     register_all_coco_ca,
-     register_all_coco_semi,
-     register_all_lvis,
-     register_all_voc,
-     register_all_cross_domain,
-     register_all_kitti,
-     register_all_objects365,
-     register_all_openimages,
- )
-
- __all__ = [k for k in globals().keys() if not k.startswith("_")]
 
cutler/data/datasets/builtin.py DELETED
@@ -1,216 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/datasets/builtin.py
3
-
4
- """
5
- This file registers pre-defined datasets at hard-coded paths, and their metadata.
6
-
7
- We hard-code metadata for common datasets. This enables:
8
- 1. Consistency checks when loading the datasets.
9
- 2. Using models on these standard datasets directly and running demos
10
- without having to download the dataset annotations.
11
-
12
- We hard-code some paths to datasets that are assumed to
13
- exist in "./datasets/".
14
-
15
- Users SHOULD NOT use this file to create new datasets or metadata for new datasets.
16
- To add a new dataset, refer to the tutorial "docs/DATASETS.md".
17
- """
18
-
19
- import os
20
-
21
- from .builtin_meta import _get_builtin_metadata
22
- from .coco import register_coco_instances
23
-
24
- # ==== Predefined datasets and splits for COCO ==========
25
-
26
- _PREDEFINED_SPLITS_COCO_SEMI = {}
27
- _PREDEFINED_SPLITS_COCO_SEMI["coco_semi"] = {
28
- # we use seed 42 to be consistent with previous work on SSL detection and segmentation
29
- "coco_semi_1perc": ("coco/train2017", "coco/annotations/1perc_instances_train2017.json"),
30
- "coco_semi_2perc": ("coco/train2017", "coco/annotations/2perc_instances_train2017.json"),
31
- "coco_semi_5perc": ("coco/train2017", "coco/annotations/5perc_instances_train2017.json"),
32
- "coco_semi_10perc": ("coco/train2017", "coco/annotations/10perc_instances_train2017.json"),
33
- "coco_semi_20perc": ("coco/train2017", "coco/annotations/20perc_instances_train2017.json"),
34
- "coco_semi_30perc": ("coco/train2017", "coco/annotations/30perc_instances_train2017.json"),
35
- "coco_semi_40perc": ("coco/train2017", "coco/annotations/40perc_instances_train2017.json"),
36
- "coco_semi_50perc": ("coco/train2017", "coco/annotations/50perc_instances_train2017.json"),
37
- "coco_semi_60perc": ("coco/train2017", "coco/annotations/60perc_instances_train2017.json"),
38
- "coco_semi_80perc": ("coco/train2017", "coco/annotations/80perc_instances_train2017.json"),
39
- }
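The semi-supervised entries above follow one naming pattern: the split key and the annotation file name differ only in the labeled percentage. As a sketch, the same table could be generated from a list of percentages. `build_semi_splits` is a hypothetical helper. The paths are copied from the entries above.

```python
def build_semi_splits(percentages, image_root="coco/train2017"):
    """Map 'coco_semi_<p>perc' to (image_root, annotation_json) for each p."""
    return {
        f"coco_semi_{p}perc": (
            image_root,
            f"coco/annotations/{p}perc_instances_train2017.json",
        )
        for p in percentages
    }


splits = build_semi_splits([1, 2, 5, 10, 20, 30, 40, 50, 60, 80])
print(splits["coco_semi_10perc"])
```

An explicit table, as in the original, is easier to grep. A generator like this one mainly helps when the list of percentages changes often.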
40
-
41
- _PREDEFINED_SPLITS_COCO_CA = {}
42
- _PREDEFINED_SPLITS_COCO_CA["coco_cls_agnostic"] = {
43
- "cls_agnostic_coco": ("coco/val2017", "coco/annotations/coco_cls_agnostic_instances_val2017.json"),
44
- "cls_agnostic_coco20k": ("coco/train2014", "coco/annotations/coco20k_trainval_gt.json"),
45
- }
46
-
47
- _PREDEFINED_SPLITS_IMAGENET = {}
48
- _PREDEFINED_SPLITS_IMAGENET["imagenet"] = {
49
- # maskcut annotations
50
- "imagenet_train": ("imagenet/train", "imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json"),
51
- # self-training round 1
52
- "imagenet_train_r1": ("imagenet/train", "imagenet/annotations/cutler_imagenet1k_train_r1.json"),
53
- # self-training round 2
54
- "imagenet_train_r2": ("imagenet/train", "imagenet/annotations/cutler_imagenet1k_train_r2.json"),
55
- # self-training round 3
56
- "imagenet_train_r3": ("imagenet/train", "imagenet/annotations/cutler_imagenet1k_train_r3.json"),
57
- }
58
-
59
- _PREDEFINED_SPLITS_VOC = {}
60
- _PREDEFINED_SPLITS_VOC["voc"] = {
61
- 'cls_agnostic_voc': ("voc/", "voc/annotations/trainvaltest_2007_cls_agnostic.json"),
62
- }
63
-
64
- _PREDEFINED_SPLITS_CROSSDOMAIN = {}
65
- _PREDEFINED_SPLITS_CROSSDOMAIN["cross_domain"] = {
66
- 'cls_agnostic_clipart': ("clipart/", "clipart/annotations/traintest_cls_agnostic.json"),
67
- 'cls_agnostic_watercolor': ("watercolor/", "watercolor/annotations/traintest_cls_agnostic.json"),
68
- 'cls_agnostic_comic': ("comic/", "comic/annotations/traintest_cls_agnostic.json"),
69
- }
70
-
71
- _PREDEFINED_SPLITS_KITTI = {}
72
- _PREDEFINED_SPLITS_KITTI["kitti"] = {
73
- 'cls_agnostic_kitti': ("kitti/", "kitti/annotations/trainval_cls_agnostic.json"),
74
- }
75
-
76
- _PREDEFINED_SPLITS_LVIS = {}
77
- _PREDEFINED_SPLITS_LVIS["lvis"] = {
78
- "cls_agnostic_lvis": ("coco/", "coco/annotations/lvis1.0_cocofied_val_cls_agnostic.json"),
79
- }
80
-
81
- _PREDEFINED_SPLITS_OBJECTS365 = {}
82
- _PREDEFINED_SPLITS_OBJECTS365["objects365"] = {
83
- 'cls_agnostic_objects365': ("objects365/val", "objects365/annotations/zhiyuan_objv2_val_cls_agnostic.json"),
84
- }
85
-
86
- _PREDEFINED_SPLITS_OpenImages = {}
87
- _PREDEFINED_SPLITS_OpenImages["openimages"] = {
88
- 'cls_agnostic_openimages': ("openImages/validation", "openImages/annotations/openimages_val_cls_agnostic.json"),
89
- }
90
-
91
- _PREDEFINED_SPLITS_UVO = {}
92
- _PREDEFINED_SPLITS_UVO["uvo"] = {
93
- "cls_agnostic_uvo": ("uvo/all_UVO_frames", "uvo/annotations/val_sparse_cleaned_cls_agnostic.json"),
94
- }
95
-
96
- def register_all_imagenet(root):
97
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_IMAGENET.items():
98
- for key, (image_root, json_file) in splits_per_dataset.items():
99
- # Assume pre-defined datasets live in `./datasets`.
100
- register_coco_instances(
101
- key,
102
- _get_builtin_metadata(dataset_name),
103
- os.path.join(root, json_file) if "://" not in json_file else json_file,
104
- os.path.join(root, image_root),
105
- )
106
-
107
- def register_all_voc(root):
108
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_VOC.items():
109
- for key, (image_root, json_file) in splits_per_dataset.items():
110
- # Assume pre-defined datasets live in `./datasets`.
111
- register_coco_instances(
112
- key,
113
- _get_builtin_metadata(dataset_name),
114
- os.path.join(root, json_file) if "://" not in json_file else json_file,
115
- os.path.join(root, image_root),
116
- )
117
-
118
- def register_all_cross_domain(root):
119
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_CROSSDOMAIN.items():
120
- for key, (image_root, json_file) in splits_per_dataset.items():
121
- # Assume pre-defined datasets live in `./datasets`.
122
- register_coco_instances(
123
- key,
124
- _get_builtin_metadata(dataset_name),
125
- os.path.join(root, json_file) if "://" not in json_file else json_file,
126
- os.path.join(root, image_root),
127
- )
128
-
129
- def register_all_kitti(root):
130
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_KITTI.items():
131
- for key, (image_root, json_file) in splits_per_dataset.items():
132
- # Assume pre-defined datasets live in `./datasets`.
133
- register_coco_instances(
134
- key,
135
- _get_builtin_metadata(dataset_name),
136
- os.path.join(root, json_file) if "://" not in json_file else json_file,
137
- os.path.join(root, image_root),
138
- )
139
-
140
- def register_all_objects365(root):
141
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_OBJECTS365.items():
142
- for key, (image_root, json_file) in splits_per_dataset.items():
143
- # Assume pre-defined datasets live in `./datasets`.
144
- register_coco_instances(
145
- key,
146
- _get_builtin_metadata(dataset_name),
147
- os.path.join(root, json_file) if "://" not in json_file else json_file,
148
- os.path.join(root, image_root),
149
- )
150
-
151
- def register_all_openimages(root):
152
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_OpenImages.items():
153
- for key, (image_root, json_file) in splits_per_dataset.items():
154
- # Assume pre-defined datasets live in `./datasets`.
155
- register_coco_instances(
156
- key,
157
- _get_builtin_metadata(dataset_name),
158
- os.path.join(root, json_file) if "://" not in json_file else json_file,
159
- os.path.join(root, image_root),
160
- )
161
-
162
- def register_all_lvis(root):
163
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_LVIS.items():
164
- for key, (image_root, json_file) in splits_per_dataset.items():
165
- # Assume pre-defined datasets live in `./datasets`.
166
- register_coco_instances(
167
- key,
168
- _get_builtin_metadata(dataset_name),
169
- os.path.join(root, json_file) if "://" not in json_file else json_file,
170
- os.path.join(root, image_root),
171
- )
172
-
173
- def register_all_uvo(root):
174
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_UVO.items():
175
- for key, (image_root, json_file) in splits_per_dataset.items():
176
- # Assume pre-defined datasets live in `./datasets`.
177
- register_coco_instances(
178
- key,
179
- _get_builtin_metadata(dataset_name),
180
- os.path.join(root, json_file) if "://" not in json_file else json_file,
181
- os.path.join(root, image_root),
182
- )
183
-
184
- def register_all_coco_semi(root):
185
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_COCO_SEMI.items():
186
- for key, (image_root, json_file) in splits_per_dataset.items():
187
- # Assume pre-defined datasets live in `./datasets`.
188
- register_coco_instances(
189
- key,
190
- _get_builtin_metadata(dataset_name),
191
- os.path.join(root, json_file) if "://" not in json_file else json_file,
192
- os.path.join(root, image_root),
193
- )
194
-
195
- def register_all_coco_ca(root):
196
- for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_COCO_CA.items():
197
- for key, (image_root, json_file) in splits_per_dataset.items():
198
- # Assume pre-defined datasets live in `./datasets`.
199
- register_coco_instances(
200
- key,
201
- _get_builtin_metadata(dataset_name),
202
- os.path.join(root, json_file) if "://" not in json_file else json_file,
203
- os.path.join(root, image_root),
204
- )
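The ten `register_all_*` functions above share the same body and differ only in the split table they iterate over. A minimal sketch of a single table-driven loop follows. `resolve_json_path`, `register_all`, and the two callbacks are hypothetical stand-ins, so the example runs without detectron2. In the repository the callbacks would correspond to `register_coco_instances` and `_get_builtin_metadata`.

```python
import os


def resolve_json_path(root, json_file):
    """Leave URI-style paths (containing '://') untouched; join others onto root."""
    return json_file if "://" in json_file else os.path.join(root, json_file)


def register_all(splits, root, register_fn, get_metadata):
    """Call register_fn once per split, mirroring the loops above."""
    for dataset_name, splits_per_dataset in splits.items():
        for key, (image_root, json_file) in splits_per_dataset.items():
            register_fn(
                key,
                get_metadata(dataset_name),
                resolve_json_path(root, json_file),
                os.path.join(root, image_root),
            )


# Demo with recording callbacks instead of the real registry
registered = []
demo_splits = {
    "kitti": {
        "cls_agnostic_kitti": ("kitti/", "kitti/annotations/trainval_cls_agnostic.json"),
    },
}
register_all(
    demo_splits,
    "datasets",
    lambda *args: registered.append(args),
    lambda name: {"dataset": name},
)
print(registered[0])
```

Passing the registration function as an argument also makes the loop easy to test in isolation.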
205
-
206
- _root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
207
- register_all_coco_semi(_root)
208
- register_all_coco_ca(_root)
209
- register_all_imagenet(_root)
210
- register_all_uvo(_root)
211
- register_all_voc(_root)
212
- register_all_cross_domain(_root)
213
- register_all_kitti(_root)
214
- register_all_openimages(_root)
215
- register_all_objects365(_root)
216
- register_all_lvis(_root)
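The registration calls above run at import time and take their root from the `DETECTRON2_DATASETS` environment variable, falling back to `datasets`. A small sketch of that lookup follows. The `dataset_root` helper and its `env` parameter are assumptions added so the behavior can be checked without touching the real environment.

```python
import os


def dataset_root(env=None, default="datasets"):
    """Resolve the dataset root the way the module-level code above does."""
    env = os.environ if env is None else env
    # expanduser() lets the variable hold paths such as "~/data"
    return os.path.expanduser(env.get("DETECTRON2_DATASETS", default))


print(dataset_root({}))
print(dataset_root({"DETECTRON2_DATASETS": "/mnt/data"}))
```

Because the variable is read when the module is imported, it has to be set before the import for a custom root to take effect.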
cutler/data/datasets/builtin_meta.py DELETED
@@ -1,389 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/datasets/builtin_meta.py
3
-
4
- """
5
- Note:
6
- For your custom dataset, there is no need to hard-code metadata anywhere in the code.
7
- For example, for a COCO-format dataset, metadata will be obtained automatically
8
- when calling `load_coco_json`. For other datasets, metadata may also be obtained in other ways
9
- during loading.
10
-
11
- However, we hard-coded metadata for a few common datasets here.
12
- The only goal is to allow users who don't have these datasets to use pre-trained models.
13
- Users don't have to download a COCO json (which contains metadata), in order to visualize a
14
- COCO model (with correct class names and colors).
15
- """
16
-
17
-
18
- # All coco categories, together with their nice-looking visualization colors
19
- # It's from https://github.com/cocodataset/panopticapi/blob/master/panoptic_coco_categories.json
20
- COCO_CATEGORIES = [
21
- {"color": [220, 20, 60], "isthing": 1, "id": 1, "name": "person"},
22
- {"color": [119, 11, 32], "isthing": 1, "id": 2, "name": "bicycle"},
23
- {"color": [0, 0, 142], "isthing": 1, "id": 3, "name": "car"},
24
- {"color": [0, 0, 230], "isthing": 1, "id": 4, "name": "motorcycle"},
25
- {"color": [106, 0, 228], "isthing": 1, "id": 5, "name": "airplane"},
26
- {"color": [0, 60, 100], "isthing": 1, "id": 6, "name": "bus"},
27
- {"color": [0, 80, 100], "isthing": 1, "id": 7, "name": "train"},
28
- {"color": [0, 0, 70], "isthing": 1, "id": 8, "name": "truck"},
29
- {"color": [0, 0, 192], "isthing": 1, "id": 9, "name": "boat"},
30
- {"color": [250, 170, 30], "isthing": 1, "id": 10, "name": "traffic light"},
31
- {"color": [100, 170, 30], "isthing": 1, "id": 11, "name": "fire hydrant"},
32
- {"color": [220, 220, 0], "isthing": 1, "id": 13, "name": "stop sign"},
33
- {"color": [175, 116, 175], "isthing": 1, "id": 14, "name": "parking meter"},
34
- {"color": [250, 0, 30], "isthing": 1, "id": 15, "name": "bench"},
35
- {"color": [165, 42, 42], "isthing": 1, "id": 16, "name": "bird"},
36
- {"color": [255, 77, 255], "isthing": 1, "id": 17, "name": "cat"},
37
- {"color": [0, 226, 252], "isthing": 1, "id": 18, "name": "dog"},
38
- {"color": [182, 182, 255], "isthing": 1, "id": 19, "name": "horse"},
39
- {"color": [0, 82, 0], "isthing": 1, "id": 20, "name": "sheep"},
40
- {"color": [120, 166, 157], "isthing": 1, "id": 21, "name": "cow"},
41
- {"color": [110, 76, 0], "isthing": 1, "id": 22, "name": "elephant"},
42
- {"color": [174, 57, 255], "isthing": 1, "id": 23, "name": "bear"},
43
- {"color": [199, 100, 0], "isthing": 1, "id": 24, "name": "zebra"},
44
- {"color": [72, 0, 118], "isthing": 1, "id": 25, "name": "giraffe"},
45
- {"color": [255, 179, 240], "isthing": 1, "id": 27, "name": "backpack"},
46
- {"color": [0, 125, 92], "isthing": 1, "id": 28, "name": "umbrella"},
47
- {"color": [209, 0, 151], "isthing": 1, "id": 31, "name": "handbag"},
48
- {"color": [188, 208, 182], "isthing": 1, "id": 32, "name": "tie"},
49
- {"color": [0, 220, 176], "isthing": 1, "id": 33, "name": "suitcase"},
50
- {"color": [255, 99, 164], "isthing": 1, "id": 34, "name": "frisbee"},
51
- {"color": [92, 0, 73], "isthing": 1, "id": 35, "name": "skis"},
52
- {"color": [133, 129, 255], "isthing": 1, "id": 36, "name": "snowboard"},
53
- {"color": [78, 180, 255], "isthing": 1, "id": 37, "name": "sports ball"},
54
- {"color": [0, 228, 0], "isthing": 1, "id": 38, "name": "kite"},
55
- {"color": [174, 255, 243], "isthing": 1, "id": 39, "name": "baseball bat"},
56
- {"color": [45, 89, 255], "isthing": 1, "id": 40, "name": "baseball glove"},
57
- {"color": [134, 134, 103], "isthing": 1, "id": 41, "name": "skateboard"},
58
- {"color": [145, 148, 174], "isthing": 1, "id": 42, "name": "surfboard"},
59
- {"color": [255, 208, 186], "isthing": 1, "id": 43, "name": "tennis racket"},
60
- {"color": [197, 226, 255], "isthing": 1, "id": 44, "name": "bottle"},
61
- {"color": [171, 134, 1], "isthing": 1, "id": 46, "name": "wine glass"},
62
- {"color": [109, 63, 54], "isthing": 1, "id": 47, "name": "cup"},
63
- {"color": [207, 138, 255], "isthing": 1, "id": 48, "name": "fork"},
64
- {"color": [151, 0, 95], "isthing": 1, "id": 49, "name": "knife"},
65
- {"color": [9, 80, 61], "isthing": 1, "id": 50, "name": "spoon"},
66
- {"color": [84, 105, 51], "isthing": 1, "id": 51, "name": "bowl"},
67
- {"color": [74, 65, 105], "isthing": 1, "id": 52, "name": "banana"},
68
- {"color": [166, 196, 102], "isthing": 1, "id": 53, "name": "apple"},
69
- {"color": [208, 195, 210], "isthing": 1, "id": 54, "name": "sandwich"},
70
- {"color": [255, 109, 65], "isthing": 1, "id": 55, "name": "orange"},
71
- {"color": [0, 143, 149], "isthing": 1, "id": 56, "name": "broccoli"},
72
- {"color": [179, 0, 194], "isthing": 1, "id": 57, "name": "carrot"},
73
- {"color": [209, 99, 106], "isthing": 1, "id": 58, "name": "hot dog"},
74
- {"color": [5, 121, 0], "isthing": 1, "id": 59, "name": "pizza"},
75
- {"color": [227, 255, 205], "isthing": 1, "id": 60, "name": "donut"},
76
- {"color": [147, 186, 208], "isthing": 1, "id": 61, "name": "cake"},
77
- {"color": [153, 69, 1], "isthing": 1, "id": 62, "name": "chair"},
78
- {"color": [3, 95, 161], "isthing": 1, "id": 63, "name": "couch"},
79
- {"color": [163, 255, 0], "isthing": 1, "id": 64, "name": "potted plant"},
80
- {"color": [119, 0, 170], "isthing": 1, "id": 65, "name": "bed"},
81
- {"color": [0, 182, 199], "isthing": 1, "id": 67, "name": "dining table"},
82
- {"color": [0, 165, 120], "isthing": 1, "id": 70, "name": "toilet"},
83
- {"color": [183, 130, 88], "isthing": 1, "id": 72, "name": "tv"},
84
- {"color": [95, 32, 0], "isthing": 1, "id": 73, "name": "laptop"},
85
- {"color": [130, 114, 135], "isthing": 1, "id": 74, "name": "mouse"},
86
- {"color": [110, 129, 133], "isthing": 1, "id": 75, "name": "remote"},
87
- {"color": [166, 74, 118], "isthing": 1, "id": 76, "name": "keyboard"},
88
- {"color": [219, 142, 185], "isthing": 1, "id": 77, "name": "cell phone"},
89
- {"color": [79, 210, 114], "isthing": 1, "id": 78, "name": "microwave"},
90
- {"color": [178, 90, 62], "isthing": 1, "id": 79, "name": "oven"},
91
- {"color": [65, 70, 15], "isthing": 1, "id": 80, "name": "toaster"},
92
- {"color": [127, 167, 115], "isthing": 1, "id": 81, "name": "sink"},
93
- {"color": [59, 105, 106], "isthing": 1, "id": 82, "name": "refrigerator"},
94
- {"color": [142, 108, 45], "isthing": 1, "id": 84, "name": "book"},
95
- {"color": [196, 172, 0], "isthing": 1, "id": 85, "name": "clock"},
96
- {"color": [95, 54, 80], "isthing": 1, "id": 86, "name": "vase"},
97
- {"color": [128, 76, 255], "isthing": 1, "id": 87, "name": "scissors"},
98
- {"color": [201, 57, 1], "isthing": 1, "id": 88, "name": "teddy bear"},
99
- {"color": [246, 0, 122], "isthing": 1, "id": 89, "name": "hair drier"},
100
- {"color": [191, 162, 208], "isthing": 1, "id": 90, "name": "toothbrush"},
101
- {"color": [255, 255, 128], "isthing": 0, "id": 92, "name": "banner"},
102
- {"color": [147, 211, 203], "isthing": 0, "id": 93, "name": "blanket"},
103
- {"color": [150, 100, 100], "isthing": 0, "id": 95, "name": "bridge"},
104
- {"color": [168, 171, 172], "isthing": 0, "id": 100, "name": "cardboard"},
105
- {"color": [146, 112, 198], "isthing": 0, "id": 107, "name": "counter"},
106
- {"color": [210, 170, 100], "isthing": 0, "id": 109, "name": "curtain"},
107
- {"color": [92, 136, 89], "isthing": 0, "id": 112, "name": "door-stuff"},
108
- {"color": [218, 88, 184], "isthing": 0, "id": 118, "name": "floor-wood"},
109
- {"color": [241, 129, 0], "isthing": 0, "id": 119, "name": "flower"},
110
- {"color": [217, 17, 255], "isthing": 0, "id": 122, "name": "fruit"},
111
- {"color": [124, 74, 181], "isthing": 0, "id": 125, "name": "gravel"},
112
- {"color": [70, 70, 70], "isthing": 0, "id": 128, "name": "house"},
113
- {"color": [255, 228, 255], "isthing": 0, "id": 130, "name": "light"},
114
- {"color": [154, 208, 0], "isthing": 0, "id": 133, "name": "mirror-stuff"},
115
- {"color": [193, 0, 92], "isthing": 0, "id": 138, "name": "net"},
116
- {"color": [76, 91, 113], "isthing": 0, "id": 141, "name": "pillow"},
117
- {"color": [255, 180, 195], "isthing": 0, "id": 144, "name": "platform"},
118
- {"color": [106, 154, 176], "isthing": 0, "id": 145, "name": "playingfield"},
119
- {"color": [230, 150, 140], "isthing": 0, "id": 147, "name": "railroad"},
120
- {"color": [60, 143, 255], "isthing": 0, "id": 148, "name": "river"},
121
- {"color": [128, 64, 128], "isthing": 0, "id": 149, "name": "road"},
122
- {"color": [92, 82, 55], "isthing": 0, "id": 151, "name": "roof"},
123
- {"color": [254, 212, 124], "isthing": 0, "id": 154, "name": "sand"},
124
- {"color": [73, 77, 174], "isthing": 0, "id": 155, "name": "sea"},
125
- {"color": [255, 160, 98], "isthing": 0, "id": 156, "name": "shelf"},
126
- {"color": [255, 255, 255], "isthing": 0, "id": 159, "name": "snow"},
127
- {"color": [104, 84, 109], "isthing": 0, "id": 161, "name": "stairs"},
128
- {"color": [169, 164, 131], "isthing": 0, "id": 166, "name": "tent"},
129
- {"color": [225, 199, 255], "isthing": 0, "id": 168, "name": "towel"},
130
- {"color": [137, 54, 74], "isthing": 0, "id": 171, "name": "wall-brick"},
131
- {"color": [135, 158, 223], "isthing": 0, "id": 175, "name": "wall-stone"},
132
- {"color": [7, 246, 231], "isthing": 0, "id": 176, "name": "wall-tile"},
133
- {"color": [107, 255, 200], "isthing": 0, "id": 177, "name": "wall-wood"},
134
- {"color": [58, 41, 149], "isthing": 0, "id": 178, "name": "water-other"},
135
- {"color": [183, 121, 142], "isthing": 0, "id": 180, "name": "window-blind"},
136
- {"color": [255, 73, 97], "isthing": 0, "id": 181, "name": "window-other"},
137
- {"color": [107, 142, 35], "isthing": 0, "id": 184, "name": "tree-merged"},
138
- {"color": [190, 153, 153], "isthing": 0, "id": 185, "name": "fence-merged"},
139
- {"color": [146, 139, 141], "isthing": 0, "id": 186, "name": "ceiling-merged"},
140
- {"color": [70, 130, 180], "isthing": 0, "id": 187, "name": "sky-other-merged"},
141
- {"color": [134, 199, 156], "isthing": 0, "id": 188, "name": "cabinet-merged"},
142
- {"color": [209, 226, 140], "isthing": 0, "id": 189, "name": "table-merged"},
143
- {"color": [96, 36, 108], "isthing": 0, "id": 190, "name": "floor-other-merged"},
144
- {"color": [96, 96, 96], "isthing": 0, "id": 191, "name": "pavement-merged"},
145
- {"color": [64, 170, 64], "isthing": 0, "id": 192, "name": "mountain-merged"},
146
- {"color": [152, 251, 152], "isthing": 0, "id": 193, "name": "grass-merged"},
147
- {"color": [208, 229, 228], "isthing": 0, "id": 194, "name": "dirt-merged"},
148
- {"color": [206, 186, 171], "isthing": 0, "id": 195, "name": "paper-merged"},
149
- {"color": [152, 161, 64], "isthing": 0, "id": 196, "name": "food-other-merged"},
150
- {"color": [116, 112, 0], "isthing": 0, "id": 197, "name": "building-other-merged"},
151
- {"color": [0, 114, 143], "isthing": 0, "id": 198, "name": "rock-merged"},
152
- {"color": [102, 102, 156], "isthing": 0, "id": 199, "name": "wall-other-merged"},
153
- {"color": [250, 141, 255], "isthing": 0, "id": 200, "name": "rug-merged"},
154
- ]
155
-
156
- IMAGENET_CATEGORIES = [
157
- {"color": [220, 20, 60], "isthing": 1, "id": 1, "name": "fg"},
158
- ]
159
-
160
- UVO_CATEGORIES = [
161
- {"color": [220, 20, 60], "isthing": 1, "id": 1, "name": "object"},
162
- ]
163
-
164
- # fmt: off
165
- COCO_PERSON_KEYPOINT_NAMES = (
166
- "nose",
167
- "left_eye", "right_eye",
168
- "left_ear", "right_ear",
169
- "left_shoulder", "right_shoulder",
170
- "left_elbow", "right_elbow",
171
- "left_wrist", "right_wrist",
172
- "left_hip", "right_hip",
173
- "left_knee", "right_knee",
174
- "left_ankle", "right_ankle",
175
- )
176
- # fmt: on
177
-
178
- # Pairs of keypoints that should be exchanged under horizontal flipping
179
- COCO_PERSON_KEYPOINT_FLIP_MAP = (
180
- ("left_eye", "right_eye"),
181
- ("left_ear", "right_ear"),
182
- ("left_shoulder", "right_shoulder"),
183
- ("left_elbow", "right_elbow"),
184
- ("left_wrist", "right_wrist"),
185
- ("left_hip", "right_hip"),
186
- ("left_knee", "right_knee"),
187
- ("left_ankle", "right_ankle"),
188
- )
189
-
190
- # rules for pairs of keypoints to draw a line between, and the line color to use.
191
- KEYPOINT_CONNECTION_RULES = [
192
- # face
193
- ("left_ear", "left_eye", (102, 204, 255)),
194
- ("right_ear", "right_eye", (51, 153, 255)),
195
- ("left_eye", "nose", (102, 0, 204)),
196
- ("nose", "right_eye", (51, 102, 255)),
197
- # upper-body
198
- ("left_shoulder", "right_shoulder", (255, 128, 0)),
199
- ("left_shoulder", "left_elbow", (153, 255, 204)),
200
- ("right_shoulder", "right_elbow", (128, 229, 255)),
201
- ("left_elbow", "left_wrist", (153, 255, 153)),
202
- ("right_elbow", "right_wrist", (102, 255, 224)),
203
- # lower-body
204
- ("left_hip", "right_hip", (255, 102, 0)),
205
- ("left_hip", "left_knee", (255, 255, 77)),
206
- ("right_hip", "right_knee", (153, 255, 204)),
207
- ("left_knee", "left_ankle", (191, 255, 128)),
208
- ("right_knee", "right_ankle", (255, 195, 77)),
209
- ]
210
-
211
- # All Cityscapes categories, together with their nice-looking visualization colors
212
- # It's from https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/helpers/labels.py # noqa
213
- CITYSCAPES_CATEGORIES = [
214
- {"color": (128, 64, 128), "isthing": 0, "id": 7, "trainId": 0, "name": "road"},
215
- {"color": (244, 35, 232), "isthing": 0, "id": 8, "trainId": 1, "name": "sidewalk"},
216
- {"color": (70, 70, 70), "isthing": 0, "id": 11, "trainId": 2, "name": "building"},
217
- {"color": (102, 102, 156), "isthing": 0, "id": 12, "trainId": 3, "name": "wall"},
218
- {"color": (190, 153, 153), "isthing": 0, "id": 13, "trainId": 4, "name": "fence"},
219
- {"color": (153, 153, 153), "isthing": 0, "id": 17, "trainId": 5, "name": "pole"},
220
- {"color": (250, 170, 30), "isthing": 0, "id": 19, "trainId": 6, "name": "traffic light"},
221
- {"color": (220, 220, 0), "isthing": 0, "id": 20, "trainId": 7, "name": "traffic sign"},
222
- {"color": (107, 142, 35), "isthing": 0, "id": 21, "trainId": 8, "name": "vegetation"},
223
- {"color": (152, 251, 152), "isthing": 0, "id": 22, "trainId": 9, "name": "terrain"},
224
- {"color": (70, 130, 180), "isthing": 0, "id": 23, "trainId": 10, "name": "sky"},
225
- {"color": (220, 20, 60), "isthing": 1, "id": 24, "trainId": 11, "name": "person"},
226
- {"color": (255, 0, 0), "isthing": 1, "id": 25, "trainId": 12, "name": "rider"},
227
- {"color": (0, 0, 142), "isthing": 1, "id": 26, "trainId": 13, "name": "car"},
228
- {"color": (0, 0, 70), "isthing": 1, "id": 27, "trainId": 14, "name": "truck"},
229
- {"color": (0, 60, 100), "isthing": 1, "id": 28, "trainId": 15, "name": "bus"},
230
- {"color": (0, 80, 100), "isthing": 1, "id": 31, "trainId": 16, "name": "train"},
231
- {"color": (0, 0, 230), "isthing": 1, "id": 32, "trainId": 17, "name": "motorcycle"},
232
- {"color": (119, 11, 32), "isthing": 1, "id": 33, "trainId": 18, "name": "bicycle"},
233
- ]
234
-
235
- # fmt: off
236
- ADE20K_SEM_SEG_CATEGORIES = [
237
- "wall", "building", "sky", "floor", "tree", "ceiling", "road, route", "bed", "window ", "grass", "cabinet", "sidewalk, pavement", "person", "earth, ground", "door", "table", "mountain, mount", "plant", "curtain", "chair", "car", "water", "painting, picture", "sofa", "shelf", "house", "sea", "mirror", "rug", "field", "armchair", "seat", "fence", "desk", "rock, stone", "wardrobe, closet, press", "lamp", "tub", "rail", "cushion", "base, pedestal, stand", "box", "column, pillar", "signboard, sign", "chest of drawers, chest, bureau, dresser", "counter", "sand", "sink", "skyscraper", "fireplace", "refrigerator, icebox", "grandstand, covered stand", "path", "stairs", "runway", "case, display case, showcase, vitrine", "pool table, billiard table, snooker table", "pillow", "screen door, screen", "stairway, staircase", "river", "bridge, span", "bookcase", "blind, screen", "coffee table", "toilet, can, commode, crapper, pot, potty, stool, throne", "flower", "book", "hill", "bench", "countertop", "stove", "palm, palm tree", "kitchen island", "computer", "swivel chair", "boat", "bar", "arcade machine", "hovel, hut, hutch, shack, shanty", "bus", "towel", "light", "truck", "tower", "chandelier", "awning, sunshade, sunblind", "street lamp", "booth", "tv", "plane", "dirt track", "clothes", "pole", "land, ground, soil", "bannister, banister, balustrade, balusters, handrail", "escalator, moving staircase, moving stairway", "ottoman, pouf, pouffe, puff, hassock", "bottle", "buffet, counter, sideboard", "poster, posting, placard, notice, bill, card", "stage", "van", "ship", "fountain", "conveyer belt, conveyor belt, conveyer, conveyor, transporter", "canopy", "washer, automatic washer, washing machine", "plaything, toy", "pool", "stool", "barrel, cask", "basket, handbasket", "falls", "tent", "bag", "minibike, motorbike", "cradle", "oven", "ball", "food, solid food", "step, stair", "tank, storage tank", "trade name", "microwave", "pot", "animal", "bicycle", "lake", "dishwasher", 
"screen", "blanket, cover", "sculpture", "hood, exhaust hood", "sconce", "vase", "traffic light", "tray", "trash can", "fan", "pier", "crt screen", "plate", "monitor", "bulletin board", "shower", "radiator", "glass, drinking glass", "clock", "flag", # noqa
238
- ]
239
- # After processed by `prepare_ade20k_sem_seg.py`, id 255 means ignore
240
- # fmt: on
241
-
242
-
243
- def _get_coco_instances_meta():
244
- thing_ids = [k["id"] for k in COCO_CATEGORIES if k["isthing"] == 1]
245
- thing_colors = [k["color"] for k in COCO_CATEGORIES if k["isthing"] == 1]
246
- assert len(thing_ids) == 80, len(thing_ids)
247
- # Mapping from the non-contiguous COCO category id to an id in [0, 79]
248
- thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
249
- thing_classes = [k["name"] for k in COCO_CATEGORIES if k["isthing"] == 1]
250
- ret = {
251
- "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
252
- "thing_classes": thing_classes,
253
- "thing_colors": thing_colors,
254
- }
255
- return ret
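COCO category ids have gaps (for example, 12 is unused), so the function above maps them to consecutive training indices. A minimal sketch on three made-up entries in the same record format follows. `contiguous_id_map` is an illustrative name.

```python
def contiguous_id_map(categories):
    """Map each 'thing' dataset id to its position among the thing categories."""
    thing_ids = [c["id"] for c in categories if c["isthing"] == 1]
    return {dataset_id: i for i, dataset_id in enumerate(thing_ids)}


categories = [
    {"id": 11, "isthing": 1, "name": "fire hydrant"},
    {"id": 13, "isthing": 1, "name": "stop sign"},
    {"id": 92, "isthing": 0, "name": "banner"},
]
forward = contiguous_id_map(categories)
# The inverse mapping is what turns predictions back into dataset ids
inverse = {contiguous: dataset_id for dataset_id, contiguous in forward.items()}
print(forward, inverse)
```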
256
-
257
- def _get_imagenet_instances_meta():
258
- thing_ids = [k["id"] for k in IMAGENET_CATEGORIES if k["isthing"] == 1]
259
- thing_colors = [k["color"] for k in IMAGENET_CATEGORIES if k["isthing"] == 1]
260
- assert len(thing_ids) == 1, len(thing_ids)
261
- thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
262
- thing_classes = [k["name"] for k in IMAGENET_CATEGORIES if k["isthing"] == 1]
263
- ret = {
264
- "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
265
- "thing_classes": thing_classes,
266
- "thing_colors": thing_colors,
267
- "class_image_count": [{'id': 1, 'image_count': 116986}]
268
- }
269
- return ret
270
-
271
- def _get_UVO_instances_meta():
272
- thing_ids = [k["id"] for k in UVO_CATEGORIES if k["isthing"] == 1]
273
- thing_colors = [k["color"] for k in UVO_CATEGORIES if k["isthing"] == 1]
274
- assert len(thing_ids) == 1, len(thing_ids)
275
- thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
276
- thing_classes = [k["name"] for k in UVO_CATEGORIES if k["isthing"] == 1]
277
- ret = {
278
- "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
279
- "thing_classes": thing_classes,
280
- "thing_colors": thing_colors,
281
- "class_image_count": [{'id': 1, 'image_count': 116986}]
282
- }
283
- return ret
284
-
285
- def _get_coco_panoptic_separated_meta():
286
- """
287
- Returns metadata for "separated" version of the panoptic segmentation dataset.
288
- """
289
- stuff_ids = [k["id"] for k in COCO_CATEGORIES if k["isthing"] == 0]
290
- assert len(stuff_ids) == 53, len(stuff_ids)
291
-
292
- # For semantic segmentation, this mapping maps from ids in the dataset
293
- # (used for processing results) to contiguous stuff ids (in [0, 53], used in models).
294
- # The id 0 is mapped to an extra category "thing".
295
- stuff_dataset_id_to_contiguous_id = {k: i + 1 for i, k in enumerate(stuff_ids)}
296
- # When converting COCO panoptic annotations to semantic annotations
297
- # We label the "thing" category to 0
298
- stuff_dataset_id_to_contiguous_id[0] = 0
299
-
300
- # 54 names for COCO stuff categories (including "things")
301
- stuff_classes = ["things"] + [
302
- k["name"].replace("-other", "").replace("-merged", "")
303
- for k in COCO_CATEGORIES
304
- if k["isthing"] == 0
305
- ]
306
-
307
- # NOTE: I randomly picked a color for things
308
- stuff_colors = [[82, 18, 128]] + [k["color"] for k in COCO_CATEGORIES if k["isthing"] == 0]
309
- ret = {
310
- "stuff_dataset_id_to_contiguous_id": stuff_dataset_id_to_contiguous_id,
311
- "stuff_classes": stuff_classes,
312
- "stuff_colors": stuff_colors,
313
- }
314
- ret.update(_get_coco_instances_meta())
315
- return ret
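The function above reserves contiguous id 0 for a catch-all "things" class, shifts every stuff category up by one, and strips the `-other` and `-merged` suffixes from the names. A sketch of that logic on a three-entry list follows. `stuff_meta` is a hypothetical helper and the sample categories are a subset of `COCO_CATEGORIES`.

```python
def stuff_meta(categories):
    """Build the stuff id mapping and cleaned class names, with slot 0 for 'things'."""
    stuff = [c for c in categories if c["isthing"] == 0]
    id_map = {c["id"]: i + 1 for i, c in enumerate(stuff)}
    id_map[0] = 0  # dataset id 0 stands for every 'thing' pixel
    names = ["things"] + [
        c["name"].replace("-other", "").replace("-merged", "") for c in stuff
    ]
    return id_map, names


categories = [
    {"id": 1, "isthing": 1, "name": "person"},
    {"id": 187, "isthing": 0, "name": "sky-other-merged"},
    {"id": 200, "isthing": 0, "name": "rug-merged"},
]
id_map, names = stuff_meta(categories)
print(id_map, names)
```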
316
-
317
-
318
- def _get_builtin_metadata(dataset_name):
319
- if dataset_name in ["coco", "coco_semi"]:
320
- return _get_coco_instances_meta()
321
- if dataset_name == "coco_panoptic_separated":
322
- return _get_coco_panoptic_separated_meta()
323
- elif dataset_name in ["imagenet", "kitti", "cross_domain", "lvis", "voc", "coco_cls_agnostic", "objects365", 'openimages']:
324
- return _get_imagenet_instances_meta()
325
- elif dataset_name == "uvo":
326
- return _get_UVO_instances_meta()
327
- elif dataset_name == "coco_panoptic_standard":
328
- meta = {}
329
- # The following metadata maps contiguous id from [0, #thing categories +
330
- # #stuff categories) to their names and colors. We keep two replicas of the
331
- # same names and colors under "thing_*" and "stuff_*" because the current
332
- # visualization function in D2 handles thing and stuff classes differently
333
- # due to some heuristic used in Panoptic FPN. We keep the same naming to
334
- # enable reusing existing visualization functions.
335
- thing_classes = [k["name"] for k in COCO_CATEGORIES]
336
- thing_colors = [k["color"] for k in COCO_CATEGORIES]
337
- stuff_classes = [k["name"] for k in COCO_CATEGORIES]
338
- stuff_colors = [k["color"] for k in COCO_CATEGORIES]
339
-
340
- meta["thing_classes"] = thing_classes
341
- meta["thing_colors"] = thing_colors
342
- meta["stuff_classes"] = stuff_classes
343
- meta["stuff_colors"] = stuff_colors
344
-
345
- # Convert category id for training:
346
- # category id: like semantic segmentation, it is the class id for each
347
- # pixel. Since there are some classes not used in evaluation, the category
348
- # id is not always contiguous and thus we have two sets of category ids:
349
- # - original category id: category id in the original dataset, mainly
350
- # used for evaluation.
351
- # - contiguous category id: [0, #classes), in order to train the linear
352
- # softmax classifier.
353
- thing_dataset_id_to_contiguous_id = {}
354
- stuff_dataset_id_to_contiguous_id = {}
355
-
356
- for i, cat in enumerate(COCO_CATEGORIES):
357
- if cat["isthing"]:
358
- thing_dataset_id_to_contiguous_id[cat["id"]] = i
359
- else:
360
- stuff_dataset_id_to_contiguous_id[cat["id"]] = i
361
-
362
- meta["thing_dataset_id_to_contiguous_id"] = thing_dataset_id_to_contiguous_id
363
- meta["stuff_dataset_id_to_contiguous_id"] = stuff_dataset_id_to_contiguous_id
364
-
365
- return meta
366
- elif dataset_name == "coco_person":
367
- return {
368
- "thing_classes": ["person"],
369
- "keypoint_names": COCO_PERSON_KEYPOINT_NAMES,
370
- "keypoint_flip_map": COCO_PERSON_KEYPOINT_FLIP_MAP,
371
- "keypoint_connection_rules": KEYPOINT_CONNECTION_RULES,
372
- }
373
- elif dataset_name == "cityscapes":
374
- # fmt: off
375
- CITYSCAPES_THING_CLASSES = [
376
- "person", "rider", "car", "truck",
377
- "bus", "train", "motorcycle", "bicycle",
378
- ]
379
- CITYSCAPES_STUFF_CLASSES = [
380
- "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
381
- "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
382
- "truck", "bus", "train", "motorcycle", "bicycle",
383
- ]
384
- # fmt: on
385
- return {
386
- "thing_classes": CITYSCAPES_THING_CLASSES,
387
- "stuff_classes": CITYSCAPES_STUFF_CLASSES,
388
- }
389
- raise KeyError("No built-in metadata for dataset {}".format(dataset_name))
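
The stuff-id remapping in `_get_coco_panoptic_separated_meta` can be sketched in isolation. This is a minimal illustration, not the removed implementation: `CATEGORIES` is a hypothetical, abbreviated stand-in for `COCO_CATEGORIES`, and `build_stuff_meta` is a made-up helper name. It shows how stuff ids are shifted by one so that contiguous id 0 stays reserved for the merged "things" class.

```python
# Hypothetical, abbreviated stand-in for COCO_CATEGORIES (the real table is much longer).
CATEGORIES = [
    {"id": 1, "isthing": 1, "name": "person"},
    {"id": 92, "isthing": 0, "name": "banner"},
    {"id": 93, "isthing": 0, "name": "blanket"},
    {"id": 200, "isthing": 0, "name": "rug-merged"},
]


def build_stuff_meta(categories):
    """Return (dataset id -> contiguous id, class names) for stuff categories."""
    stuff = [c for c in categories if c["isthing"] == 0]
    # Contiguous ids start at 1; id 0 is reserved for the extra "things" class.
    mapping = {c["id"]: i + 1 for i, c in enumerate(stuff)}
    mapping[0] = 0
    # Strip the "-other" / "-merged" suffixes, as the removed code does.
    names = ["things"] + [
        c["name"].replace("-other", "").replace("-merged", "") for c in stuff
    ]
    return mapping, names


mapping, names = build_stuff_meta(CATEGORIES)
print(mapping)
print(names)
```

With the full category table, the same construction would yield 54 class names, which matches the `len(stuff_ids) == 53` assertion plus the extra "things" entry.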
cutler/data/datasets/coco.py DELETED
@@ -1,544 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/datasets/coco.py
3
-
4
- import contextlib
5
- import datetime
6
- import io
7
- import json
8
- import logging
9
- import numpy as np
10
- import os
11
- import shutil
12
- import pycocotools.mask as mask_util
13
- from fvcore.common.timer import Timer
14
- from iopath.common.file_io import file_lock
15
- from PIL import Image
16
-
17
- from detectron2.structures import Boxes, BoxMode, PolygonMasks, RotatedBoxes
18
- from detectron2.utils.file_io import PathManager
19
-
20
- from detectron2.data import DatasetCatalog, MetadataCatalog
21
-
22
- """
23
- This file contains functions to parse COCO-format annotations into dicts in "Detectron2 format".
24
- """
25
-
26
-
27
- logger = logging.getLogger(__name__)
28
-
29
- __all__ = ["load_coco_json", "load_sem_seg", "convert_to_coco_json", "register_coco_instances"]
30
-
31
-
32
- def load_coco_json(json_file, image_root, dataset_name=None, extra_annotation_keys=None):
33
- """
34
- Load a json file with COCO's instances annotation format.
35
- Currently supports instance detection, instance segmentation,
36
- and person keypoints annotations.
37
-
38
- Args:
39
- json_file (str): full path to the json file in COCO instances annotation format.
40
- image_root (str or path-like): the directory where the images in this json file exist.
41
- dataset_name (str or None): the name of the dataset (e.g., coco_2017_train).
42
- When provided, this function will also do the following:
43
-
44
- * Put "thing_classes" into the metadata associated with this dataset.
45
- * Map the category ids into a contiguous range (needed by standard dataset format),
46
- and add "thing_dataset_id_to_contiguous_id" to the metadata associated
47
- with this dataset.
48
-
49
- This option should usually be provided, unless users need to load
50
- the original json content and apply more processing manually.
51
- extra_annotation_keys (list[str]): list of per-annotation keys that should also be
52
- loaded into the dataset dict (besides "iscrowd", "bbox", "keypoints",
53
- "category_id", "segmentation"). The values for these keys will be returned as-is.
54
- For example, the densepose annotations are loaded in this way.
55
-
56
- Returns:
57
- list[dict]: a list of dicts in Detectron2 standard dataset dicts format (See
58
- `Using Custom Datasets </tutorials/datasets.html>`_ ) when `dataset_name` is not None.
59
- If `dataset_name` is None, the returned `category_ids` may be
60
- non-contiguous and may not conform to the Detectron2 standard format.
61
-
62
- Notes:
63
- 1. This function does not read the image files.
64
- The results do not have the "image" field.
65
- """
66
- from pycocotools.coco import COCO
67
-
68
- timer = Timer()
69
- json_file = PathManager.get_local_path(json_file)
70
- with contextlib.redirect_stdout(io.StringIO()):
71
- coco_api = COCO(json_file)
72
- if timer.seconds() > 1:
73
- logger.info("Loading {} takes {:.2f} seconds.".format(json_file, timer.seconds()))
74
-
75
- id_map = None
76
- if dataset_name is not None:
77
- meta = MetadataCatalog.get(dataset_name)
78
- cat_ids = sorted(coco_api.getCatIds())
79
- cats = coco_api.loadCats(cat_ids)
80
- # The categories in a custom json file may not be sorted.
81
- thing_classes = [c["name"] for c in sorted(cats, key=lambda x: x["id"])]
82
- if "imagenet" not in dataset_name and "cls_agnostic" not in dataset_name:
83
- meta.thing_classes = thing_classes
84
-
85
- # In COCO, certain category ids are artificially removed,
86
- # and by convention they are always ignored.
87
- # We deal with COCO's id issue and translate
88
- # the category ids to contiguous ids in [0, 80).
89
-
90
- # It works by looking at the "categories" field in the json, therefore
91
- # if users' own json also has non-contiguous ids, we'll
92
- # apply this mapping as well but print a warning.
93
- if not (min(cat_ids) == 1 and max(cat_ids) == len(cat_ids)):
94
- if "coco" not in dataset_name:
95
- logger.warning(
96
- """
97
- Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
98
- """
99
- )
100
- id_map = {v: i for i, v in enumerate(cat_ids)}
101
- meta.thing_dataset_id_to_contiguous_id = id_map
102
- else:
103
- id_map = meta.thing_dataset_id_to_contiguous_id
104
-
105
- # sort indices for reproducible results
106
- img_ids = sorted(coco_api.imgs.keys())
107
- # imgs is a list of dicts, each looks something like:
108
- # {'license': 4,
109
- # 'url': 'http://farm6.staticflickr.com/5454/9413846304_881d5e5c3b_z.jpg',
110
- # 'file_name': 'COCO_val2014_000000001268.jpg',
111
- # 'height': 427,
112
- # 'width': 640,
113
- # 'date_captured': '2013-11-17 05:57:24',
114
- # 'id': 1268}
115
- imgs = coco_api.loadImgs(img_ids)
116
- # anns is a list[list[dict]], where each dict is an annotation
117
- # record for an object. The inner list enumerates the objects in an image
118
- # and the outer list enumerates over images. Example of anns[0]:
119
- # [{'segmentation': [[192.81,
120
- # 247.09,
121
- # ...
122
- # 219.03,
123
- # 249.06]],
124
- # 'area': 1035.749,
125
- # 'iscrowd': 0,
126
- # 'image_id': 1268,
127
- # 'bbox': [192.81, 224.8, 74.73, 33.43],
128
- # 'category_id': 16,
129
- # 'id': 42986},
130
- # ...]
131
- anns = [coco_api.imgToAnns[img_id] for img_id in img_ids]
132
- total_num_valid_anns = sum([len(x) for x in anns])
133
- total_num_anns = len(coco_api.anns)
134
- if total_num_valid_anns < total_num_anns:
135
- logger.warning(
136
- f"{json_file} contains {total_num_anns} annotations, but only "
137
- f"{total_num_valid_anns} of them match to images in the file."
138
- )
139
-
140
- if "minival" not in json_file:
141
- # The popular valminusminival & minival annotations for COCO2014 contain this bug.
142
- # However the ratio of buggy annotations there is tiny and does not affect accuracy.
143
- # Therefore we explicitly white-list them.
144
- ann_ids = [ann["id"] for anns_per_image in anns for ann in anns_per_image]
145
- assert len(set(ann_ids)) == len(ann_ids), "Annotation ids in '{}' are not unique!".format(
146
- json_file
147
- )
148
-
149
- imgs_anns = list(zip(imgs, anns))
150
- logger.info("Loaded {} images in COCO format from {}".format(len(imgs_anns), json_file))
151
-
152
- dataset_dicts = []
153
-
154
- ann_keys = ["iscrowd", "bbox", "keypoints", "category_id"] + (extra_annotation_keys or [])
155
-
156
- num_instances_without_valid_segmentation = 0
157
-
158
- for (img_dict, anno_dict_list) in imgs_anns:
159
- record = {}
160
- record["file_name"] = os.path.join(image_root, img_dict["file_name"])
161
- record["height"] = img_dict["height"]
162
- record["width"] = img_dict["width"]
163
- image_id = record["image_id"] = img_dict["id"]
164
-
165
- objs = []
166
- for anno in anno_dict_list:
167
- # Check that the image_id in this annotation is the same as
168
- # the image_id we're looking at.
169
- # This fails only when the data parsing logic or the annotation file is buggy.
170
-
171
- # The original COCO valminusminival2014 & minival2014 annotation files
172
- # actually contain bugs that, together with certain ways of using COCO API,
173
- # can trigger this assertion.
174
- assert anno["image_id"] == image_id
175
-
176
- assert anno.get("ignore", 0) == 0, '"ignore" in COCO json file is not supported.'
177
-
178
- obj = {key: anno[key] for key in ann_keys if key in anno}
179
- if "bbox" in obj and len(obj["bbox"]) == 0:
180
- raise ValueError(
181
- f"One annotation of image {image_id} contains empty 'bbox' value! "
182
- "This json does not have valid COCO format."
183
- )
184
-
185
- segm = anno.get("segmentation", None)
186
- if segm: # either list[list[float]] or dict(RLE)
187
- if isinstance(segm, dict):
188
- if isinstance(segm["counts"], list):
189
- # convert to compressed RLE
190
- segm = mask_util.frPyObjects(segm, *segm["size"])
191
- else:
192
- # filter out invalid polygons (< 3 points)
193
- segm = [poly for poly in segm if len(poly) % 2 == 0 and len(poly) >= 6]
194
- if len(segm) == 0:
195
- num_instances_without_valid_segmentation += 1
196
- continue # ignore this instance
197
- obj["segmentation"] = segm
198
-
199
- keypts = anno.get("keypoints", None)
200
- if keypts: # list[int]
201
- for idx, v in enumerate(keypts):
202
- if idx % 3 != 2:
203
- # COCO's segmentation coordinates are floating points in [0, H or W],
204
- # but keypoint coordinates are integers in [0, H-1 or W-1]
205
- # Therefore we assume the coordinates are "pixel indices" and
206
- # add 0.5 to convert to floating point coordinates.
207
- keypts[idx] = v + 0.5
208
- obj["keypoints"] = keypts
209
-
210
- obj["bbox_mode"] = BoxMode.XYWH_ABS
211
- if id_map:
212
- annotation_category_id = obj["category_id"]
213
- try:
214
- obj["category_id"] = id_map[annotation_category_id]
215
- except KeyError as e:
216
- raise KeyError(
217
- f"Encountered category_id={annotation_category_id} "
218
- "but this id does not exist in 'categories' of the json file."
219
- ) from e
220
- objs.append(obj)
221
- record["annotations"] = objs
222
- dataset_dicts.append(record)
223
-
224
- if num_instances_without_valid_segmentation > 0:
225
- logger.warning(
226
- "Filtered out {} instances without valid segmentation. ".format(
227
- num_instances_without_valid_segmentation
228
- )
229
- + "There might be issues in your dataset generation process. Please "
230
- "check https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html carefully"
231
- )
232
- return dataset_dicts
233
-
234
-
235
- def load_sem_seg(gt_root, image_root, gt_ext="png", image_ext="jpg"):
236
- """
237
- Load semantic segmentation datasets. All files under "gt_root" with "gt_ext" extension are
238
- treated as ground truth annotations and all files under "image_root" with "image_ext" extension
239
- as input images. Ground truth and input images are matched using file paths relative to
240
- "gt_root" and "image_root" respectively without taking into account file extensions.
241
- This works for COCO as well as some other datasets.
242
-
243
- Args:
244
- gt_root (str): full path to ground truth semantic segmentation files. Semantic segmentation
245
- annotations are stored as images with integer values in pixels that represent
246
- corresponding semantic labels.
247
- image_root (str): the directory where the input images are.
248
- gt_ext (str): file extension for ground truth annotations.
249
- image_ext (str): file extension for input images.
250
-
251
- Returns:
252
- list[dict]:
253
- a list of dicts in detectron2 standard format without instance-level
254
- annotation.
255
-
256
- Notes:
257
- 1. This function does not read the image and ground truth files.
258
- The results do not have the "image" and "sem_seg" fields.
259
- """
260
-
261
- # We match input images with ground truth based on their relative filepaths (without file
262
- # extensions) starting from 'image_root' and 'gt_root' respectively.
263
- def file2id(folder_path, file_path):
264
- # extract relative path starting from `folder_path`
265
- image_id = os.path.normpath(os.path.relpath(file_path, start=folder_path))
266
- # remove file extension
267
- image_id = os.path.splitext(image_id)[0]
268
- return image_id
269
-
270
- input_files = sorted(
271
- (os.path.join(image_root, f) for f in PathManager.ls(image_root) if f.endswith(image_ext)),
272
- key=lambda file_path: file2id(image_root, file_path),
273
- )
274
- gt_files = sorted(
275
- (os.path.join(gt_root, f) for f in PathManager.ls(gt_root) if f.endswith(gt_ext)),
276
- key=lambda file_path: file2id(gt_root, file_path),
277
- )
278
-
279
- assert len(gt_files) > 0, "No annotations found in {}.".format(gt_root)
280
-
281
- # Use the intersection, so that val2017_100 annotations can run smoothly with val2017 images
282
- if len(input_files) != len(gt_files):
283
- logger.warning(
284
- "Directories {} and {} have {} and {} files, respectively.".format(
285
- image_root, gt_root, len(input_files), len(gt_files)
286
- )
287
- )
288
- input_basenames = [os.path.basename(f)[: -len(image_ext)] for f in input_files]
289
- gt_basenames = [os.path.basename(f)[: -len(gt_ext)] for f in gt_files]
290
- intersect = list(set(input_basenames) & set(gt_basenames))
291
- # sort, otherwise each worker may obtain a list[dict] in different order
292
- intersect = sorted(intersect)
293
- logger.warning("Will use their intersection of {} files.".format(len(intersect)))
294
- input_files = [os.path.join(image_root, f + image_ext) for f in intersect]
295
- gt_files = [os.path.join(gt_root, f + gt_ext) for f in intersect]
296
-
297
- logger.info(
298
- "Loaded {} images with semantic segmentation from {}".format(len(input_files), image_root)
299
- )
300
-
301
- dataset_dicts = []
302
- for (img_path, gt_path) in zip(input_files, gt_files):
303
- record = {}
304
- record["file_name"] = img_path
305
- record["sem_seg_file_name"] = gt_path
306
- dataset_dicts.append(record)
307
-
308
- return dataset_dicts
309
-
310
-
311
- def convert_to_coco_dict(dataset_name):
312
- """
313
- Convert an instance detection/segmentation or keypoint detection dataset
314
- in detectron2's standard format into COCO json format.
315
-
316
- Generic dataset description can be found here:
317
- https://detectron2.readthedocs.io/tutorials/datasets.html#register-a-dataset
318
-
319
- COCO data format description can be found here:
320
- http://cocodataset.org/#format-data
321
-
322
- Args:
323
- dataset_name (str):
324
- name of the source dataset
325
- Must be registered in DatasetCatalog and in detectron2's standard format.
326
- Must have corresponding metadata "thing_classes"
327
- Returns:
328
- coco_dict: serializable dict in COCO json format
329
- """
330
-
331
- dataset_dicts = DatasetCatalog.get(dataset_name)
332
- metadata = MetadataCatalog.get(dataset_name)
333
-
334
- # unmap the category mapping ids for COCO
335
- if hasattr(metadata, "thing_dataset_id_to_contiguous_id"):
336
- reverse_id_mapping = {v: k for k, v in metadata.thing_dataset_id_to_contiguous_id.items()}
337
- reverse_id_mapper = lambda contiguous_id: reverse_id_mapping[contiguous_id] # noqa
338
- else:
339
- reverse_id_mapper = lambda contiguous_id: contiguous_id # noqa
340
-
341
- categories = [
342
- {"id": reverse_id_mapper(id), "name": name}
343
- for id, name in enumerate(metadata.thing_classes)
344
- ]
345
-
346
- logger.info("Converting dataset dicts into COCO format")
347
- coco_images = []
348
- coco_annotations = []
349
-
350
- for image_id, image_dict in enumerate(dataset_dicts):
351
- coco_image = {
352
- "id": image_dict.get("image_id", image_id),
353
- "width": int(image_dict["width"]),
354
- "height": int(image_dict["height"]),
355
- "file_name": str(image_dict["file_name"]),
356
- }
357
- coco_images.append(coco_image)
358
-
359
- anns_per_image = image_dict.get("annotations", [])
360
- for annotation in anns_per_image:
361
- # create a new dict with only COCO fields
362
- coco_annotation = {}
363
-
364
- # COCO requirement: XYWH box format for axis-aligned and XYWHA for rotated
365
- bbox = annotation["bbox"]
366
- if isinstance(bbox, np.ndarray):
367
- if bbox.ndim != 1:
368
- raise ValueError(f"bbox has to be 1-dimensional. Got shape={bbox.shape}.")
369
- bbox = bbox.tolist()
370
- if len(bbox) not in [4, 5]:
371
- raise ValueError(f"bbox has to have length 4 or 5. Got {bbox}.")
372
- from_bbox_mode = annotation["bbox_mode"]
373
- to_bbox_mode = BoxMode.XYWH_ABS if len(bbox) == 4 else BoxMode.XYWHA_ABS
374
- bbox = BoxMode.convert(bbox, from_bbox_mode, to_bbox_mode)
375
-
376
- # COCO requirement: instance area
377
- if "segmentation" in annotation:
378
- # Computing areas for instances by counting the pixels
379
- segmentation = annotation["segmentation"]
380
- # TODO: check segmentation type: RLE, BinaryMask or Polygon
381
- if isinstance(segmentation, list):
382
- polygons = PolygonMasks([segmentation])
383
- area = polygons.area()[0].item()
384
- elif isinstance(segmentation, dict): # RLE
385
- area = mask_util.area(segmentation).item()
386
- else:
387
- raise TypeError(f"Unknown segmentation type {type(segmentation)}!")
388
- else:
389
- # Computing areas using bounding boxes
390
- if to_bbox_mode == BoxMode.XYWH_ABS:
391
- bbox_xy = BoxMode.convert(bbox, to_bbox_mode, BoxMode.XYXY_ABS)
392
- area = Boxes([bbox_xy]).area()[0].item()
393
- else:
394
- area = RotatedBoxes([bbox]).area()[0].item()
395
-
396
- if "keypoints" in annotation:
397
- keypoints = annotation["keypoints"] # list[int]
398
- for idx, v in enumerate(keypoints):
399
- if idx % 3 != 2:
400
- # COCO's segmentation coordinates are floating points in [0, H or W],
401
- # but keypoint coordinates are integers in [0, H-1 or W-1]
402
- # For COCO format consistency we subtract 0.5
403
- # https://github.com/facebookresearch/detectron2/pull/175#issuecomment-551202163
404
- keypoints[idx] = v - 0.5
405
- if "num_keypoints" in annotation:
406
- num_keypoints = annotation["num_keypoints"]
407
- else:
408
- num_keypoints = sum(kp > 0 for kp in keypoints[2::3])
409
-
410
- # COCO requirement:
411
- # linking annotations to images
412
- # "id" field must start with 1
413
- coco_annotation["id"] = len(coco_annotations) + 1
414
- coco_annotation["image_id"] = coco_image["id"]
415
- coco_annotation["bbox"] = [round(float(x), 3) for x in bbox]
416
- coco_annotation["area"] = float(area)
417
- coco_annotation["iscrowd"] = int(annotation.get("iscrowd", 0))
418
- coco_annotation["category_id"] = int(reverse_id_mapper(annotation["category_id"]))
419
-
420
- # Add optional fields
421
- if "keypoints" in annotation:
422
- coco_annotation["keypoints"] = keypoints
423
- coco_annotation["num_keypoints"] = num_keypoints
424
-
425
- if "segmentation" in annotation:
426
- seg = coco_annotation["segmentation"] = annotation["segmentation"]
427
- if isinstance(seg, dict): # RLE
428
- counts = seg["counts"]
429
- if not isinstance(counts, str):
430
- # make it json-serializable
431
- seg["counts"] = counts.decode("ascii")
432
-
433
- coco_annotations.append(coco_annotation)
434
-
435
- logger.info(
436
- "Conversion finished, "
437
- f"#images: {len(coco_images)}, #annotations: {len(coco_annotations)}"
438
- )
439
-
440
- info = {
441
- "date_created": str(datetime.datetime.now()),
442
- "description": "Automatically generated COCO json file for Detectron2.",
443
- }
444
- coco_dict = {"info": info, "images": coco_images, "categories": categories, "licenses": None}
445
- if len(coco_annotations) > 0:
446
- coco_dict["annotations"] = coco_annotations
447
- return coco_dict
448
-
449
-
450
- def convert_to_coco_json(dataset_name, output_file, allow_cached=True):
451
- """
452
- Converts dataset into COCO format and saves it to a json file.
453
- dataset_name must be registered in DatasetCatalog and in detectron2's standard format.
454
-
455
- Args:
456
- dataset_name:
457
- reference from the config file to the catalogs
458
- must be registered in DatasetCatalog and in detectron2's standard format
459
- output_file: path of json file that will be saved to
460
- allow_cached: if json file is already present then skip conversion
461
- """
462
-
463
- # TODO: The dataset or the conversion script *may* change,
464
- # a checksum would be useful for validating the cached data
465
-
466
- PathManager.mkdirs(os.path.dirname(output_file))
467
- with file_lock(output_file):
468
- if PathManager.exists(output_file) and allow_cached:
469
- logger.warning(
470
- f"Using previously cached COCO format annotations at '{output_file}'. "
471
- "You need to clear the cache file if your dataset has been modified."
472
- )
473
- else:
474
- logger.info(f"Converting annotations of dataset '{dataset_name}' to COCO format ...")
475
- coco_dict = convert_to_coco_dict(dataset_name)
476
-
477
- logger.info(f"Caching COCO format annotations at '{output_file}' ...")
478
- tmp_file = output_file + ".tmp"
479
- with PathManager.open(tmp_file, "w") as f:
480
- json.dump(coco_dict, f)
481
- shutil.move(tmp_file, output_file)
482
-
483
-
484
- def register_coco_instances(name, metadata, json_file, image_root):
485
- """
486
- Register a dataset in COCO's json annotation format for
487
- instance detection, instance segmentation and keypoint detection.
488
- (i.e., Type 1 and 2 in http://cocodataset.org/#format-data.
489
- `instances*.json` and `person_keypoints*.json` in the dataset).
490
-
491
- This is an example of how to register a new dataset.
492
- You can do something similar to this function, to register new datasets.
493
-
494
- Args:
495
- name (str): the name that identifies a dataset, e.g. "coco_2014_train".
496
- metadata (dict): extra metadata associated with this dataset. You can
497
- leave it as an empty dict.
498
- json_file (str): path to the json instance annotation file.
499
- image_root (str or path-like): directory which contains all the images.
500
- """
501
- assert isinstance(name, str), name
502
- assert isinstance(json_file, (str, os.PathLike)), json_file
503
- assert isinstance(image_root, (str, os.PathLike)), image_root
504
- # 1. register a function which returns dicts
505
- DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name))
506
-
507
- # 2. Optionally, add metadata about this dataset,
508
- # since they might be useful in evaluation, visualization or logging
509
- MetadataCatalog.get(name).set(
510
- json_file=json_file, image_root=image_root, evaluator_type="coco", **metadata
511
- )
512
-
513
-
514
- if __name__ == "__main__":
515
- """
516
- Test the COCO json dataset loader.
517
-
518
- Usage:
519
- python -m detectron2.data.datasets.coco \
520
- path/to/json path/to/image_root dataset_name
521
-
522
- "dataset_name" can be "coco_2014_minival_100", or other
523
- pre-registered ones
524
- """
525
- from detectron2.utils.logger import setup_logger
526
- from detectron2.utils.visualizer import Visualizer
527
- import detectron2.data.datasets # noqa # add pre-defined metadata
528
- import sys
529
-
530
- logger = setup_logger(name=__name__)
531
- assert sys.argv[3] in DatasetCatalog.list()
532
- meta = MetadataCatalog.get(sys.argv[3])
533
-
534
- dicts = load_coco_json(sys.argv[1], sys.argv[2], sys.argv[3])
535
- logger.info("Done loading {} samples.".format(len(dicts)))
536
-
537
- dirname = "coco-data-vis"
538
- os.makedirs(dirname, exist_ok=True)
539
- for d in dicts:
540
- img = np.array(Image.open(d["file_name"]))
541
- visualizer = Visualizer(img, metadata=meta)
542
- vis = visualizer.draw_dataset_dict(d)
543
- fpath = os.path.join(dirname, os.path.basename(d["file_name"]))
544
- vis.save(fpath)
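
Three small conventions in the removed `coco.py` are easier to see outside the loader: the contiguous category-id map and its inverse, the half-pixel keypoint shift, and the polygon validity filter. The following is a rough sketch under stated assumptions. The helper names (`build_id_map`, `shift_keypoints`, `keep_valid_polygons`) are hypothetical and do not exist in the file, and unlike the original, the keypoint helper returns a new list rather than mutating its input.

```python
def build_id_map(cat_ids):
    """Map sorted dataset category ids to contiguous ids in [0, #categories)."""
    return {dataset_id: i for i, dataset_id in enumerate(sorted(cat_ids))}


def shift_keypoints(keypoints, offset):
    """Shift x and y of flat [x, y, v, x, y, v, ...] keypoints; leave visibility v alone."""
    return [v if idx % 3 == 2 else v + offset for idx, v in enumerate(keypoints)]


def keep_valid_polygons(polygons):
    """Keep polygons with an even number of coordinates and at least 3 points."""
    return [p for p in polygons if len(p) % 2 == 0 and len(p) >= 6]


# Loading direction: dataset ids -> contiguous ids, pixel indices -> float coordinates.
id_map = build_id_map([4, 1, 2])
loaded = shift_keypoints([10, 20, 2, 30, 40, 0], 0.5)

# Export direction: invert the id map and undo the half-pixel shift.
reverse_id_map = {v: k for k, v in id_map.items()}
exported = shift_keypoints(loaded, -0.5)

polygons = keep_valid_polygons([[0, 0, 1, 1], [0, 0, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1, 2]])
print(id_map, reverse_id_map, loaded, exported, polygons)
```

The round trip mirrors how `load_coco_json` and `convert_to_coco_dict` are meant to be inverses of each other for category ids and keypoint coordinates.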
cutler/data/detection_utils.py DELETED
@@ -1,650 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/detection_utils.py
3
-
4
- """
5
- Common data processing utilities that are used in a
6
- typical object detection data pipeline.
7
- """
8
- import logging
9
- import numpy as np
10
- from typing import List, Union
11
- import pycocotools.mask as mask_util
12
- import torch
13
- from PIL import Image
14
-
15
- from detectron2.structures import (
16
- Boxes,
17
- BoxMode,
18
- BitMasks,
19
- Instances,
20
- Keypoints,
21
- PolygonMasks,
22
- RotatedBoxes,
23
- polygons_to_bitmask,
24
- )
25
-
26
- from detectron2.utils.file_io import PathManager
27
-
28
- from data import transforms as T
29
- from detectron2.data.catalog import MetadataCatalog
30
-
31
- __all__ = [
32
- "SizeMismatchError",
33
- "convert_image_to_rgb",
34
- "check_image_size",
35
- "transform_proposals",
36
- "transform_instance_annotations",
37
- "annotations_to_instances",
38
- "annotations_to_instances_rotated",
39
- "build_augmentation",
40
- "build_transform_gen",
41
- "create_keypoint_hflip_indices",
42
- "filter_empty_instances",
43
- "read_image",
44
- ]
45
-
46
-
47
- class SizeMismatchError(ValueError):
48
- """
49
- Raised when the loaded image has a different width/height from its annotation.
50
- """
51
-
52
-
53
- # https://en.wikipedia.org/wiki/YUV#SDTV_with_BT.601
54
- _M_RGB2YUV = [[0.299, 0.587, 0.114], [-0.14713, -0.28886, 0.436], [0.615, -0.51499, -0.10001]]
55
- _M_YUV2RGB = [[1.0, 0.0, 1.13983], [1.0, -0.39465, -0.58060], [1.0, 2.03211, 0.0]]
56
-
57
- # https://www.exiv2.org/tags.html
58
- _EXIF_ORIENT = 274 # exif 'Orientation' tag
59
-
60
-
61
- def convert_PIL_to_numpy(image, format):
62
- """
63
- Convert PIL image to numpy array of target format.
64
-
65
- Args:
66
- image (PIL.Image): a PIL image
67
- format (str): the format of output image
68
-
69
- Returns:
70
- (np.ndarray): also see `read_image`
71
- """
72
- if format is not None:
73
- # PIL only supports RGB, so convert to RGB and flip channels over below
74
- conversion_format = format
75
- if format in ["BGR", "YUV-BT.601"]:
76
- conversion_format = "RGB"
77
- image = image.convert(conversion_format)
78
- image = np.asarray(image)
79
- # PIL squeezes out the channel dimension for "L", so make it HWC
80
- if format == "L":
81
- image = np.expand_dims(image, -1)
82
-
83
- # handle formats not supported by PIL
84
- elif format == "BGR":
85
- # flip channels if needed
86
- image = image[:, :, ::-1]
87
- elif format == "YUV-BT.601":
88
- image = image / 255.0
89
- image = np.dot(image, np.array(_M_RGB2YUV).T)
90
-
91
- return image
92
-
93
-
94
- def convert_image_to_rgb(image, format):
95
- """
96
- Convert an image from given format to RGB.
97
-
98
- Args:
99
- image (np.ndarray or Tensor): an HWC image
100
- format (str): the format of input image, also see `read_image`
101
-
102
- Returns:
103
- (np.ndarray): (H,W,3) RGB image in 0-255 range, can be either float or uint8
104
- """
105
- if isinstance(image, torch.Tensor):
106
- image = image.cpu().numpy()
107
- if format == "BGR":
108
- image = image[:, :, [2, 1, 0]]
109
- elif format == "YUV-BT.601":
110
- image = np.dot(image, np.array(_M_YUV2RGB).T)
111
- image = image * 255.0
112
- else:
113
- if format == "L":
114
- image = image[:, :, 0]
115
- image = image.astype(np.uint8)
116
- image = np.asarray(Image.fromarray(image, mode=format).convert("RGB"))
117
- return image
118
-
119
-
120
- def _apply_exif_orientation(image):
121
- """
122
- Applies the exif orientation correctly.
123
-
124
- This code exists per the bug:
125
- https://github.com/python-pillow/Pillow/issues/3973
126
- with the function `ImageOps.exif_transpose`. The Pillow source raises errors with
127
- various methods, especially `tobytes`
128
-
129
- Function based on:
130
- https://github.com/wkentaro/labelme/blob/v4.5.4/labelme/utils/image.py#L59
131
- https://github.com/python-pillow/Pillow/blob/7.1.2/src/PIL/ImageOps.py#L527
132
-
133
- Args:
134
- image (PIL.Image): a PIL image
135
-
136
- Returns:
137
- (PIL.Image): the PIL image with exif orientation applied, if applicable
138
- """
139
- if not hasattr(image, "getexif"):
140
- return image
141
-
142
- try:
143
- exif = image.getexif()
144
- except Exception: # https://github.com/facebookresearch/detectron2/issues/1885
145
- exif = None
146
-
147
- if exif is None:
148
- return image
149
-
150
- orientation = exif.get(_EXIF_ORIENT)
151
-
152
- method = {
153
- 2: Image.FLIP_LEFT_RIGHT,
154
- 3: Image.ROTATE_180,
155
- 4: Image.FLIP_TOP_BOTTOM,
156
- 5: Image.TRANSPOSE,
157
- 6: Image.ROTATE_270,
158
- 7: Image.TRANSVERSE,
159
- 8: Image.ROTATE_90,
160
- }.get(orientation)
161
-
162
- if method is not None:
163
- return image.transpose(method)
164
- return image
165
-
166
-
167
- def read_image(file_name, format=None):
168
- """
169
- Read an image into the given format.
170
- Will apply rotation and flipping if the image has such exif information.
171
-
172
- Args:
173
- file_name (str): image file path
174
- format (str): one of the supported image modes in PIL, or "BGR" or "YUV-BT.601".
175
-
176
- Returns:
177
- image (np.ndarray):
178
- an HWC image in the given format, which is 0-255, uint8 for
179
- supported image modes in PIL or "BGR"; float (0-1 for Y) for YUV-BT.601.
180
- """
181
- with PathManager.open(file_name, "rb") as f:
182
- image = Image.open(f)
183
-
184
- # work around this bug: https://github.com/python-pillow/Pillow/issues/3973
185
- image = _apply_exif_orientation(image)
186
- return convert_PIL_to_numpy(image, format)
187
-
188
-
189
- def check_image_size(dataset_dict, image):
190
- """
191
- Raise an error if the image does not match the size specified in the dict.
192
- """
193
- if "width" in dataset_dict or "height" in dataset_dict:
194
- image_wh = (image.shape[1], image.shape[0])
195
- expected_wh = (dataset_dict["width"], dataset_dict["height"])
196
- if not image_wh == expected_wh:
197
- expected_wh = (dataset_dict["height"], dataset_dict["width"])
198
- dataset_dict["height"], dataset_dict["width"] = dataset_dict["width"], dataset_dict["height"]
199
- if image_wh != expected_wh:
200
- raise SizeMismatchError(
201
- "Mismatched image shape{}, got {}, expect {}.".format(
202
- " for image " + dataset_dict["file_name"]
203
- if "file_name" in dataset_dict
204
- else "",
205
- image_wh,
206
- expected_wh,
207
- )
208
- + " Please check the width/height in your annotation."
209
- )
210
-
211
- # To ensure bbox always remap to original image size
212
- if "width" not in dataset_dict:
213
- dataset_dict["width"] = image.shape[1]
214
- if "height" not in dataset_dict:
215
- dataset_dict["height"] = image.shape[0]
216
-
217
-
218
- def transform_proposals(dataset_dict, image_shape, transforms, *, proposal_topk, min_box_size=0):
219
- """
220
- Apply transformations to the proposals in dataset_dict, if any.
221
-
222
- Args:
223
- dataset_dict (dict): a dict read from the dataset, possibly
224
- contains fields "proposal_boxes", "proposal_objectness_logits", "proposal_bbox_mode"
225
- image_shape (tuple): height, width
226
- transforms (TransformList):
227
- proposal_topk (int): only keep top-K scoring proposals
228
- min_box_size (int): proposals with either side smaller than this
229
- threshold are removed
230
-
231
- The input dict is modified in-place, with the above-mentioned keys removed. A new
232
- key "proposals" will be added. Its value is an `Instances`
233
- object which contains the transformed proposals in its field
234
- "proposal_boxes" and "objectness_logits".
235
- """
236
- if "proposal_boxes" in dataset_dict:
237
- # Transform proposal boxes
238
- boxes = transforms.apply_box(
239
- BoxMode.convert(
240
- dataset_dict.pop("proposal_boxes"),
241
- dataset_dict.pop("proposal_bbox_mode"),
242
- BoxMode.XYXY_ABS,
243
- )
244
- )
245
- boxes = Boxes(boxes)
246
- objectness_logits = torch.as_tensor(
247
- dataset_dict.pop("proposal_objectness_logits").astype("float32")
248
- )
249
-
250
- boxes.clip(image_shape)
251
- keep = boxes.nonempty(threshold=min_box_size)
252
- boxes = boxes[keep]
253
- objectness_logits = objectness_logits[keep]
254
-
255
- proposals = Instances(image_shape)
256
- proposals.proposal_boxes = boxes[:proposal_topk]
257
- proposals.objectness_logits = objectness_logits[:proposal_topk]
258
- dataset_dict["proposals"] = proposals
259
-
260
-
261
- def transform_instance_annotations(
262
- annotation, transforms, image_size, *, keypoint_hflip_indices=None
263
- ):
264
- """
265
- Apply transforms to box, segmentation and keypoints annotations of a single instance.
266
-
267
- It will use `transforms.apply_box` for the box, and
268
- `transforms.apply_coords` for segmentation polygons & keypoints.
269
- If you need anything more specially designed for each data structure,
270
- you'll need to implement your own version of this function or the transforms.
271
-
272
- Args:
273
- annotation (dict): dict of instance annotations for a single instance.
274
- It will be modified in-place.
275
- transforms (TransformList or list[Transform]):
276
- image_size (tuple): the height, width of the transformed image
277
- keypoint_hflip_indices (ndarray[int]): see `create_keypoint_hflip_indices`.
278
-
279
- Returns:
280
- dict:
281
- the same input dict with fields "bbox", "segmentation", "keypoints"
282
- transformed according to `transforms`.
283
- The "bbox_mode" field will be set to XYXY_ABS.
284
- """
285
- if isinstance(transforms, (tuple, list)):
286
- transforms = T.TransformList(transforms)
287
- # bbox is 1d (per-instance bounding box)
288
- bbox = BoxMode.convert(annotation["bbox"], annotation["bbox_mode"], BoxMode.XYXY_ABS)
289
- # clip transformed bbox to image size
290
- bbox = transforms.apply_box(np.array([bbox]))[0].clip(min=0)
291
- annotation["bbox"] = np.minimum(bbox, list(image_size + image_size)[::-1])
292
- annotation["bbox_mode"] = BoxMode.XYXY_ABS
293
-
294
- if "segmentation" in annotation:
295
- # each instance contains 1 or more polygons
296
- segm = annotation["segmentation"]
297
- if isinstance(segm, list):
298
- # polygons
299
- polygons = [np.asarray(p).reshape(-1, 2) for p in segm]
300
- annotation["segmentation"] = [
301
- p.reshape(-1) for p in transforms.apply_polygons(polygons)
302
- ]
303
- elif isinstance(segm, dict):
304
- # RLE
305
- mask = mask_util.decode(segm)
306
- mask = transforms.apply_segmentation(mask)
307
- assert tuple(mask.shape[:2]) == image_size
308
- annotation["segmentation"] = mask
309
- else:
310
- raise ValueError(
311
- "Cannot transform segmentation of type '{}'! "
312
- "Supported types are: polygons as list[list[float] or ndarray],"
313
- " COCO-style RLE as a dict.".format(type(segm))
314
- )
315
-
316
- if "keypoints" in annotation:
317
- keypoints = transform_keypoint_annotations(
318
- annotation["keypoints"], transforms, image_size, keypoint_hflip_indices
319
- )
320
- annotation["keypoints"] = keypoints
321
-
322
- return annotation
323
-
324
-
325
- def transform_keypoint_annotations(keypoints, transforms, image_size, keypoint_hflip_indices=None):
326
- """
327
- Transform keypoint annotations of an image.
328
- If a keypoint is transformed out of image boundary, it will be marked "unlabeled" (visibility=0)
329
-
330
- Args:
331
- keypoints (list[float]): Nx3 float in Detectron2's Dataset format.
332
- Each point is represented by (x, y, visibility).
333
- transforms (TransformList):
334
- image_size (tuple): the height, width of the transformed image
335
- keypoint_hflip_indices (ndarray[int]): see `create_keypoint_hflip_indices`.
336
- When `transforms` includes horizontal flip, will use the index
337
- mapping to flip keypoints.
338
- """
339
- # (N*3,) -> (N, 3)
340
- keypoints = np.asarray(keypoints, dtype="float64").reshape(-1, 3)
341
- keypoints_xy = transforms.apply_coords(keypoints[:, :2])
342
-
343
- # Set all out-of-boundary points to "unlabeled"
344
- inside = (keypoints_xy >= np.array([0, 0])) & (keypoints_xy <= np.array(image_size[::-1]))
345
- inside = inside.all(axis=1)
346
- keypoints[:, :2] = keypoints_xy
347
- keypoints[:, 2][~inside] = 0
348
-
349
- # This assumes that HFlipTransform is the only transform that flips horizontally
350
- do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms.transforms) % 2 == 1
351
-
352
- # Alternative way: check whether probe points were horizontally flipped.
353
- # probe = np.asarray([[0.0, 0.0], [image_width, 0.0]])
354
- # probe_aug = transforms.apply_coords(probe.copy())
355
- # do_hflip = np.sign(probe[1][0] - probe[0][0]) != np.sign(probe_aug[1][0] - probe_aug[0][0]) # noqa
356
-
357
- # If flipped, swap each keypoint with its opposite-handed equivalent
358
- if do_hflip:
359
- if keypoint_hflip_indices is None:
360
- raise ValueError("Cannot flip keypoints without providing flip indices!")
361
- if len(keypoints) != len(keypoint_hflip_indices):
362
- raise ValueError(
363
- "Keypoint data has {} points, but metadata "
364
- "contains {} points!".format(len(keypoints), len(keypoint_hflip_indices))
365
- )
366
- keypoints = keypoints[np.asarray(keypoint_hflip_indices, dtype=np.int32), :]
367
-
368
- # Maintain COCO convention that if visibility == 0 (unlabeled), then x, y = 0
369
- keypoints[keypoints[:, 2] == 0] = 0
370
- return keypoints
371
-
372
-
373
- def annotations_to_instances(annos, image_size, mask_format="polygon"):
374
- """
375
- Create an :class:`Instances` object used by the models,
376
- from instance annotations in the dataset dict.
377
-
378
- Args:
379
- annos (list[dict]): a list of instance annotations in one image, each
380
- element for one instance.
381
- image_size (tuple): height, width
382
-
383
- Returns:
384
- Instances:
385
- It will contain fields "gt_boxes", "gt_classes",
386
- "gt_masks", "gt_keypoints", if they can be obtained from `annos`.
387
- This is the format that builtin models expect.
388
- """
389
- boxes = (
390
- np.stack(
391
- [BoxMode.convert(obj["bbox"], obj["bbox_mode"], BoxMode.XYXY_ABS) for obj in annos]
392
- )
393
- if len(annos)
394
- else np.zeros((0, 4))
395
- )
396
- target = Instances(image_size)
397
- target.gt_boxes = Boxes(boxes)
398
-
399
- classes = [int(obj["category_id"]) for obj in annos]
400
- classes = torch.tensor(classes, dtype=torch.int64)
401
- target.gt_classes = classes
402
-
403
- if len(annos) and "segmentation" in annos[0]:
404
- segms = [obj["segmentation"] for obj in annos]
405
- if mask_format == "polygon":
406
- try:
407
- masks = PolygonMasks(segms)
408
- except ValueError as e:
409
- raise ValueError(
410
- "Failed to use mask_format=='polygon' from the given annotations!"
411
- ) from e
412
- else:
413
- assert mask_format == "bitmask", mask_format
414
- masks = []
415
- for segm in segms:
416
- if isinstance(segm, list):
417
- # polygon
418
- masks.append(polygons_to_bitmask(segm, *image_size))
419
- elif isinstance(segm, dict):
420
- # COCO RLE
421
- masks.append(mask_util.decode(segm))
422
- elif isinstance(segm, np.ndarray):
423
- assert segm.ndim == 2, "Expect segmentation of 2 dimensions, got {}.".format(
424
- segm.ndim
425
- )
426
- # mask array
427
- masks.append(segm)
428
- else:
429
- raise ValueError(
430
- "Cannot convert segmentation of type '{}' to BitMasks! "
431
- "Supported types are: polygons as list[list[float] or ndarray],"
432
- " COCO-style RLE as a dict, or a binary segmentation mask "
433
- "in a 2D numpy array of shape HxW.".format(type(segm))
434
- )
435
- # torch.from_numpy does not support array with negative stride.
436
- masks = BitMasks(
437
- torch.stack([torch.from_numpy(np.ascontiguousarray(x)) for x in masks])
438
- )
439
- target.gt_masks = masks
440
-
441
- if len(annos) and "keypoints" in annos[0]:
442
- kpts = [obj.get("keypoints", []) for obj in annos]
443
- target.gt_keypoints = Keypoints(kpts)
444
-
445
- return target
446
-
447
-
448
- def annotations_to_instances_rotated(annos, image_size):
449
- """
450
- Create an :class:`Instances` object used by the models,
451
- from instance annotations in the dataset dict.
452
- Compared to `annotations_to_instances`, this function is for rotated boxes only
453
-
454
- Args:
455
- annos (list[dict]): a list of instance annotations in one image, each
456
- element for one instance.
457
- image_size (tuple): height, width
458
-
459
- Returns:
460
- Instances:
461
- Containing fields "gt_boxes", "gt_classes",
462
- if they can be obtained from `annos`.
463
- This is the format that builtin models expect.
464
- """
465
- boxes = [obj["bbox"] for obj in annos]
466
- target = Instances(image_size)
467
- boxes = target.gt_boxes = RotatedBoxes(boxes)
468
- boxes.clip(image_size)
469
-
470
- classes = [obj["category_id"] for obj in annos]
471
- classes = torch.tensor(classes, dtype=torch.int64)
472
- target.gt_classes = classes
473
-
474
- return target
475
-
476
-
477
- def filter_empty_instances(
478
- instances, by_box=True, by_mask=True, box_threshold=1e-5, return_mask=False
479
- ):
480
- """
481
- Filter out empty instances in an `Instances` object.
482
-
483
- Args:
484
- instances (Instances):
485
- by_box (bool): whether to filter out instances with empty boxes
486
- by_mask (bool): whether to filter out instances with empty masks
487
- box_threshold (float): minimum width and height to be considered non-empty
488
- return_mask (bool): whether to return boolean mask of filtered instances
489
-
490
- Returns:
491
- Instances: the filtered instances.
492
- tensor[bool], optional: boolean mask of filtered instances
493
- """
494
- assert by_box or by_mask
495
- r = []
496
- if by_box:
497
- r.append(instances.gt_boxes.nonempty(threshold=box_threshold))
498
- if instances.has("gt_masks") and by_mask:
499
- r.append(instances.gt_masks.nonempty())
500
-
501
- # TODO: can also filter visible keypoints
502
-
503
- if not r:
504
- return instances
505
- m = r[0]
506
- for x in r[1:]:
507
- m = m & x
508
- if return_mask:
509
- return instances[m], m
510
- return instances[m]
511
-
512
-
513
- def create_keypoint_hflip_indices(dataset_names: Union[str, List[str]]) -> List[int]:
514
- """
515
- Args:
516
- dataset_names: list of dataset names
517
-
518
- Returns:
519
- list[int]: a list of size=#keypoints, storing the
520
- horizontally-flipped keypoint indices.
521
- """
522
- if isinstance(dataset_names, str):
523
- dataset_names = [dataset_names]
524
-
525
- check_metadata_consistency("keypoint_names", dataset_names)
526
- check_metadata_consistency("keypoint_flip_map", dataset_names)
527
-
528
- meta = MetadataCatalog.get(dataset_names[0])
529
- names = meta.keypoint_names
530
- # TODO flip -> hflip
531
- flip_map = dict(meta.keypoint_flip_map)
532
- flip_map.update({v: k for k, v in flip_map.items()})
533
- flipped_names = [i if i not in flip_map else flip_map[i] for i in names]
534
- flip_indices = [names.index(i) for i in flipped_names]
535
- return flip_indices
536
-
537
-
538
- def get_fed_loss_cls_weights(dataset_names: Union[str, List[str]], freq_weight_power=1.0):
539
- """
540
- Get frequency weight for each class sorted by class id.
541
- We now calculate the frequency weight using image_count to the power freq_weight_power.
542
-
543
- Args:
544
- dataset_names: list of dataset names
545
- freq_weight_power: power value
546
- """
547
- if isinstance(dataset_names, str):
548
- dataset_names = [dataset_names]
549
-
550
- check_metadata_consistency("class_image_count", dataset_names)
551
-
552
- meta = MetadataCatalog.get(dataset_names[0])
553
- class_freq_meta = meta.class_image_count
554
- class_freq = torch.tensor(
555
- [c["image_count"] for c in sorted(class_freq_meta, key=lambda x: x["id"])]
556
- )
557
- class_freq_weight = class_freq.float() ** freq_weight_power
558
- return class_freq_weight
559
-
560
-
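The weighting in `get_fed_loss_cls_weights` can be illustrated without torch. This is a minimal sketch under stated assumptions, not the function above: `class_frequency_weights` is a hypothetical helper, plain lists stand in for the tensor, and the sample `class_image_count` entries are invented for illustration.

```python
from typing import Dict, List


def class_frequency_weights(
    class_image_count: List[Dict[str, int]], freq_weight_power: float = 1.0
) -> List[float]:
    """Return one weight per class, ordered by class id."""
    # Sort by class id so that position i in the result corresponds to class i.
    ordered = sorted(class_image_count, key=lambda entry: entry["id"])
    # Raise each image count to the requested power.
    return [float(entry["image_count"]) ** freq_weight_power for entry in ordered]


# Hypothetical metadata: two classes listed out of id order.
sample_counts = [
    {"id": 1, "image_count": 4},
    {"id": 0, "image_count": 100},
]
weights = class_frequency_weights(sample_counts, freq_weight_power=0.5)
print(weights)  # [10.0, 2.0]
```

With a power below 1.0 the gap between frequent and rare classes shrinks: here a 25x difference in image count becomes a 5x difference in weight.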
561
- def gen_crop_transform_with_instance(crop_size, image_size, instance):
562
- """
563
- Generate a CropTransform so that the cropping region contains
564
- the center of the given instance.
565
-
566
- Args:
567
- crop_size (tuple): h, w in pixels
568
- image_size (tuple): h, w
569
- instance (dict): an annotation dict of one instance, in Detectron2's
570
- dataset format.
571
- """
572
- crop_size = np.asarray(crop_size, dtype=np.int32)
573
- bbox = BoxMode.convert(instance["bbox"], instance["bbox_mode"], BoxMode.XYXY_ABS)
574
- center_yx = (bbox[1] + bbox[3]) * 0.5, (bbox[0] + bbox[2]) * 0.5
575
- assert (
576
- image_size[0] >= center_yx[0] and image_size[1] >= center_yx[1]
577
- ), "The annotation bounding box is outside of the image!"
578
- assert (
579
- image_size[0] >= crop_size[0] and image_size[1] >= crop_size[1]
580
- ), "Crop size is larger than image size!"
581
-
582
- min_yx = np.maximum(np.floor(center_yx).astype(np.int32) - crop_size, 0)
583
- max_yx = np.maximum(np.asarray(image_size, dtype=np.int32) - crop_size, 0)
584
- max_yx = np.minimum(max_yx, np.ceil(center_yx).astype(np.int32))
585
-
586
- y0 = np.random.randint(min_yx[0], max_yx[0] + 1)
587
- x0 = np.random.randint(min_yx[1], max_yx[1] + 1)
588
- return T.CropTransform(x0, y0, crop_size[1], crop_size[0])
589
-
590
-
591
- def check_metadata_consistency(key, dataset_names):
592
- """
593
- Check that the datasets have consistent metadata.
594
-
595
- Args:
596
- key (str): a metadata key
597
- dataset_names (list[str]): a list of dataset names
598
-
599
- Raises:
600
- AttributeError: if the key does not exist in the metadata
601
- ValueError: if the given datasets do not have the same metadata values defined by key
602
- """
603
- if len(dataset_names) == 0:
604
- return
605
- logger = logging.getLogger(__name__)
606
- entries_per_dataset = [getattr(MetadataCatalog.get(d), key) for d in dataset_names]
607
- for idx, entry in enumerate(entries_per_dataset):
608
- if entry != entries_per_dataset[0]:
609
- logger.error(
610
- "Metadata '{}' for dataset '{}' is '{}'".format(key, dataset_names[idx], str(entry))
611
- )
612
- logger.error(
613
- "Metadata '{}' for dataset '{}' is '{}'".format(
614
- key, dataset_names[0], str(entries_per_dataset[0])
615
- )
616
- )
617
- raise ValueError("Datasets have different metadata '{}'!".format(key))
618
-
619
-
620
- def build_augmentation(cfg, is_train):
621
- """
622
- Create a list of default :class:`Augmentation` from config.
623
- Now it includes resizing and flipping.
624
-
625
- Returns:
626
- list[Augmentation]
627
- """
628
- if is_train:
629
- min_size = cfg.INPUT.MIN_SIZE_TRAIN
630
- max_size = cfg.INPUT.MAX_SIZE_TRAIN
631
- sample_style = cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING
632
- else:
633
- min_size = cfg.INPUT.MIN_SIZE_TEST
634
- max_size = cfg.INPUT.MAX_SIZE_TEST
635
- sample_style = "choice"
636
- augmentation = [T.ResizeShortestEdge(min_size, max_size, sample_style)]
637
- if is_train and cfg.INPUT.RANDOM_FLIP != "none":
638
- augmentation.append(
639
- T.RandomFlip(
640
- horizontal=cfg.INPUT.RANDOM_FLIP == "horizontal",
641
- vertical=cfg.INPUT.RANDOM_FLIP == "vertical",
642
- )
643
- )
644
- return augmentation
645
-
646
-
647
- build_transform_gen = build_augmentation
648
- """
649
- Alias for backward-compatibility.
650
- """
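The index mapping built by `create_keypoint_hflip_indices` can be illustrated without Detectron2. The following is a minimal sketch, not the library's code path: `build_hflip_indices` is a hypothetical helper, and the keypoint names and flip pairs are invented sample metadata standing in for `MetadataCatalog` entries.

```python
from typing import List, Sequence, Tuple


def build_hflip_indices(
    names: Sequence[str], flip_map: Sequence[Tuple[str, str]]
) -> List[int]:
    """Return, for each keypoint, the index of its horizontally flipped partner."""
    # Make the mapping symmetric: ("left_eye", "right_eye") also implies the reverse.
    pairs = dict(flip_map)
    pairs.update({right: left for left, right in flip_map})
    # Keypoints without a partner (e.g. "nose") map to themselves.
    flipped_names = [pairs.get(name, name) for name in names]
    return [list(names).index(name) for name in flipped_names]


# Hypothetical metadata for a three-keypoint skeleton.
names = ["nose", "left_eye", "right_eye"]
flip_map = [("left_eye", "right_eye")]
indices = build_hflip_indices(names, flip_map)
print(indices)  # [0, 2, 1]
```

Reordering an (N, 3) keypoint array with these indices swaps each left/right pair, which is the step `transform_keypoint_annotations` performs when the transform list contains an odd number of `HFlipTransform` instances.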
 
cutler/data/transforms/__init__.py DELETED
@@ -1,15 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/__init__.py
-
- from fvcore.transforms.transform import *
- from .transform import *
- from detectron2.data.transforms.augmentation import *
- from .augmentation_impl import *
-
- __all__ = [k for k in globals().keys() if not k.startswith("_")]
-
-
- from detectron2.utils.env import fixup_module_metadata
-
- fixup_module_metadata(__name__, globals(), __all__)
- del fixup_module_metadata
 
cutler/data/transforms/augmentation_impl.py DELETED
@@ -1,616 +0,0 @@
1
- # -*- coding: utf-8 -*-
2
- # Copyright (c) Meta Platforms, Inc. and affiliates.
3
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py
4
-
5
- """
6
- Implement many useful :class:`Augmentation`.
7
- """
8
- import numpy as np
9
- import sys
10
- from typing import Tuple
11
- import torch
12
- from fvcore.transforms.transform import (
13
- BlendTransform,
14
- CropTransform,
15
- HFlipTransform,
16
- NoOpTransform,
17
- PadTransform,
18
- Transform,
19
- TransformList,
20
- VFlipTransform,
21
- )
22
- from PIL import Image
23
-
24
- from detectron2.data.transforms.augmentation import Augmentation, _transform_to_aug
25
- from .transform import ExtentTransform, ResizeTransform, RotationTransform
26
-
27
- __all__ = [
28
- "FixedSizeCrop",
29
- "RandomApply",
30
- "RandomBrightness",
31
- "RandomContrast",
32
- "RandomCrop",
33
- "RandomExtent",
34
- "RandomFlip",
35
- "RandomSaturation",
36
- "RandomLighting",
37
- "RandomRotation",
38
- "Resize",
39
- "ResizeScale",
40
- "ResizeShortestEdge",
41
- "RandomCrop_CategoryAreaConstraint",
42
- ]
43
-
44
-
45
- class RandomApply(Augmentation):
46
- """
47
- Randomly apply an augmentation with a given probability.
48
- """
49
-
50
- def __init__(self, tfm_or_aug, prob=0.5):
51
- """
52
- Args:
53
- tfm_or_aug (Transform, Augmentation): the transform or augmentation
54
- to be applied. It can either be a `Transform` or `Augmentation`
55
- instance.
56
- prob (float): probability between 0.0 and 1.0 that
57
- the wrapper transformation is applied
58
- """
59
- super().__init__()
60
- self.aug = _transform_to_aug(tfm_or_aug)
61
- assert 0.0 <= prob <= 1.0, f"Probability must be between 0.0 and 1.0 (given: {prob})"
62
- self.prob = prob
63
-
64
- def get_transform(self, *args):
65
- do = self._rand_range() < self.prob
66
- if do:
67
- return self.aug.get_transform(*args)
68
- else:
69
- return NoOpTransform()
70
-
71
- def __call__(self, aug_input):
72
- do = self._rand_range() < self.prob
73
- if do:
74
- return self.aug(aug_input)
75
- else:
76
- return NoOpTransform()
77
-
78
-
79
- class RandomFlip(Augmentation):
80
- """
81
- Flip the image horizontally or vertically with the given probability.
82
- """
83
-
84
- def __init__(self, prob=0.5, *, horizontal=True, vertical=False):
85
- """
86
- Args:
87
- prob (float): probability of flip.
88
- horizontal (boolean): whether to apply horizontal flipping
89
- vertical (boolean): whether to apply vertical flipping
90
- """
91
- super().__init__()
92
-
93
- if horizontal and vertical:
94
- raise ValueError("Cannot do both horiz and vert. Please use two Flip instead.")
95
- if not horizontal and not vertical:
96
- raise ValueError("At least one of horiz or vert has to be True!")
97
- self._init(locals())
98
-
99
- def get_transform(self, image):
100
- h, w = image.shape[:2]
101
- do = self._rand_range() < self.prob
102
- if do:
103
- if self.horizontal:
104
- return HFlipTransform(w)
105
- elif self.vertical:
106
- return VFlipTransform(h)
107
- else:
108
- return NoOpTransform()
109
-
110
-
111
- class Resize(Augmentation):
112
- """Resize image to a fixed target size"""
113
-
114
- def __init__(self, shape, interp=Image.BILINEAR):
115
- """
116
- Args:
117
- shape: (h, w) tuple or an int
118
- interp: PIL interpolation method
119
- """
120
- if isinstance(shape, int):
121
- shape = (shape, shape)
122
- shape = tuple(shape)
123
- self._init(locals())
124
-
125
- def get_transform(self, image):
126
- return ResizeTransform(
127
- image.shape[0], image.shape[1], self.shape[0], self.shape[1], self.interp
128
- )
129
-
130
-
131
- class ResizeShortestEdge(Augmentation):
132
- """
133
- Resize the image while keeping the aspect ratio unchanged.
134
- It attempts to scale the shorter edge to the given `short_edge_length`,
135
- as long as the longer edge does not exceed `max_size`.
136
- If `max_size` is reached, then downscale so that the longer edge does not exceed max_size.
137
- """
138
-
139
- @torch.jit.unused
140
- def __init__(
141
- self, short_edge_length, max_size=sys.maxsize, sample_style="range", interp=Image.BILINEAR
142
- ):
143
- """
144
- Args:
145
- short_edge_length (list[int]): If ``sample_style=="range"``,
146
- a [min, max] interval from which to sample the shortest edge length.
147
- If ``sample_style=="choice"``, a list of shortest edge lengths to sample from.
148
- max_size (int): maximum allowed longest edge length.
149
- sample_style (str): either "range" or "choice".
150
- """
151
- super().__init__()
152
- assert sample_style in ["range", "choice"], sample_style
153
-
154
- self.is_range = sample_style == "range"
155
- if isinstance(short_edge_length, int):
156
- short_edge_length = (short_edge_length, short_edge_length)
157
- if self.is_range:
158
- assert len(short_edge_length) == 2, (
159
- "short_edge_length must be two values using 'range' sample style."
160
- f" Got {short_edge_length}!"
161
- )
162
- self._init(locals())
163
-
164
- @torch.jit.unused
165
- def get_transform(self, image):
166
- h, w = image.shape[:2]
167
- if self.is_range:
168
- size = np.random.randint(self.short_edge_length[0], self.short_edge_length[1] + 1)
169
- else:
170
- size = np.random.choice(self.short_edge_length)
171
- if size == 0:
172
- return NoOpTransform()
173
-
174
- newh, neww = ResizeShortestEdge.get_output_shape(h, w, size, self.max_size)
175
- return ResizeTransform(h, w, newh, neww, self.interp)
176
-
177
- @staticmethod
178
- def get_output_shape(
179
- oldh: int, oldw: int, short_edge_length: int, max_size: int
180
- ) -> Tuple[int, int]:
181
- """
182
- Compute the output size given input size and target short edge length.
183
- """
184
- h, w = oldh, oldw
185
- size = short_edge_length * 1.0
186
- scale = size / min(h, w)
187
- if h < w:
188
- newh, neww = size, scale * w
189
- else:
190
- newh, neww = scale * h, size
191
- if max(newh, neww) > max_size:
192
- scale = max_size * 1.0 / max(newh, neww)
193
- newh = newh * scale
194
- neww = neww * scale
195
- neww = int(neww + 0.5)
196
- newh = int(newh + 0.5)
197
- return (newh, neww)
198
-
199
-
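A worked example may make the arithmetic in `ResizeShortestEdge.get_output_shape` easier to follow. The sketch below restates the same steps as a standalone function so it runs without Detectron2. The name `shortest_edge_output_shape` is hypothetical, and the image sizes are arbitrary illustrations.

```python
from typing import Tuple


def shortest_edge_output_shape(
    old_h: int, old_w: int, short_edge_length: int, max_size: int
) -> Tuple[int, int]:
    """Scale the shorter edge to `short_edge_length`, capping the longer edge at `max_size`."""
    size = float(short_edge_length)
    scale = size / min(old_h, old_w)
    # Step 1: the shorter edge becomes exactly `size`; the other edge keeps the aspect ratio.
    if old_h < old_w:
        new_h, new_w = size, scale * old_w
    else:
        new_h, new_w = scale * old_h, size
    # Step 2: if the longer edge overshoots `max_size`, shrink both edges by the same factor.
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    # Step 3: round to the nearest integer.
    return int(new_h + 0.5), int(new_w + 0.5)


# The cap is not reached: the short edge becomes 800.
print(shortest_edge_output_shape(480, 640, 800, 1333))  # (800, 1067)
# A wide image hits the cap: the long edge becomes 1333 and the short edge ends up below 800.
print(shortest_edge_output_shape(400, 1200, 800, 1333))  # (444, 1333)
```

The second call shows why the class docstring says it only "attempts" to reach `short_edge_length`: the `max_size` cap takes priority for very elongated images.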
200
- class ResizeScale(Augmentation):
201
- """
202
- Takes target size as input and randomly scales the given target size between `min_scale`
203
- and `max_scale`. It then scales the input image such that it fits inside the scaled target
204
- box, keeping the aspect ratio constant.
205
- This implements the resize part of Google's 'resize_and_crop' data augmentation:
206
- https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/input_utils.py#L127
207
- """
208
-
209
- def __init__(
210
- self,
211
- min_scale: float,
212
- max_scale: float,
213
- target_height: int,
214
- target_width: int,
215
- interp: int = Image.BILINEAR,
216
- ):
217
- """
218
- Args:
219
- min_scale: minimum image scale range.
220
- max_scale: maximum image scale range.
221
- target_height: target image height.
222
- target_width: target image width.
223
- interp: image interpolation method.
224
- """
225
- super().__init__()
226
- self._init(locals())
227
-
228
- def _get_resize(self, image: np.ndarray, scale: float) -> Transform:
229
- input_size = image.shape[:2]
230
-
231
- # Compute new target size given a scale.
232
- target_size = (self.target_height, self.target_width)
233
- target_scale_size = np.multiply(target_size, scale)
234
-
235
- # Compute actual rescaling applied to input image and output size.
236
- output_scale = np.minimum(
237
- target_scale_size[0] / input_size[0], target_scale_size[1] / input_size[1]
238
- )
239
- output_size = np.round(np.multiply(input_size, output_scale)).astype(int)
240
-
241
- return ResizeTransform(
242
- input_size[0], input_size[1], output_size[0], output_size[1], self.interp
243
- )
244
-
245
- def get_transform(self, image: np.ndarray) -> Transform:
246
- random_scale = np.random.uniform(self.min_scale, self.max_scale)
247
- return self._get_resize(image, random_scale)
248
-
249
-
250
- class RandomRotation(Augmentation):
251
- """
252
- This method returns a copy of this image, rotated the given
253
- number of degrees counter clockwise around the given center.
254
- """
255
-
256
- def __init__(self, angle, expand=True, center=None, sample_style="range", interp=None):
257
- """
258
- Args:
259
- angle (list[float]): If ``sample_style=="range"``,
260
- a [min, max] interval from which to sample the angle (in degrees).
261
- If ``sample_style=="choice"``, a list of angles to sample from
262
- expand (bool): choose if the image should be resized to fit the whole
263
- rotated image (default), or simply cropped
264
- center (list[[float, float]]): If ``sample_style=="range"``,
265
- a [[minx, miny], [maxx, maxy]] relative interval from which to sample the center,
266
- [0, 0] being the top left of the image and [1, 1] the bottom right.
267
- If ``sample_style=="choice"``, a list of centers to sample from
268
- Default: None, which means that the center of rotation is the center of the image
269
- center has no effect if expand=True because it only affects shifting
270
- """
271
- super().__init__()
272
- assert sample_style in ["range", "choice"], sample_style
273
- self.is_range = sample_style == "range"
274
- if isinstance(angle, (float, int)):
275
- angle = (angle, angle)
276
- if center is not None and isinstance(center[0], (float, int)):
277
- center = (center, center)
278
- self._init(locals())
279
-
280
- def get_transform(self, image):
281
- h, w = image.shape[:2]
282
- center = None
283
- if self.is_range:
284
- angle = np.random.uniform(self.angle[0], self.angle[1])
285
- if self.center is not None:
286
- center = (
287
- np.random.uniform(self.center[0][0], self.center[1][0]),
288
- np.random.uniform(self.center[0][1], self.center[1][1]),
289
- )
290
- else:
291
- angle = np.random.choice(self.angle)
292
- if self.center is not None:
293
- center = np.random.choice(self.center)
294
-
295
- if center is not None:
296
- center = (w * center[0], h * center[1]) # Convert to absolute coordinates
297
-
298
- if angle % 360 == 0:
299
- return NoOpTransform()
300
-
301
- return RotationTransform(h, w, angle, expand=self.expand, center=center, interp=self.interp)
302
-
303
-
304
- class FixedSizeCrop(Augmentation):
305
- """
306
- If `crop_size` is smaller than the input image size, then it uses a random crop of
307
- the crop size. If `crop_size` is larger than the input image size, then it pads
308
- the right and the bottom of the image to the crop size if `pad` is True, otherwise
309
- it returns the smaller image.
310
- """
311
-
312
- def __init__(self, crop_size: Tuple[int], pad: bool = True, pad_value: float = 128.0):
313
- """
314
- Args:
315
- crop_size: target image (height, width).
316
- pad: if True, will pad images smaller than `crop_size` up to `crop_size`
317
- pad_value: the padding value.
318
- """
319
- super().__init__()
320
- self._init(locals())
321
-
322
- def _get_crop(self, image: np.ndarray) -> Transform:
323
- # Compute the image scale and scaled size.
324
- input_size = image.shape[:2]
325
- output_size = self.crop_size
326
-
327
- # Add random crop if the image is scaled up.
328
- max_offset = np.subtract(input_size, output_size)
329
- max_offset = np.maximum(max_offset, 0)
330
- offset = np.multiply(max_offset, np.random.uniform(0.0, 1.0))
331
- offset = np.round(offset).astype(int)
332
- return CropTransform(
333
- offset[1], offset[0], output_size[1], output_size[0], input_size[1], input_size[0]
334
- )
335
-
336
- def _get_pad(self, image: np.ndarray) -> Transform:
337
- # Compute the image scale and scaled size.
338
- input_size = image.shape[:2]
339
- output_size = self.crop_size
340
-
341
- # Add padding if the image is scaled down.
342
- pad_size = np.subtract(output_size, input_size)
343
- pad_size = np.maximum(pad_size, 0)
344
- original_size = np.minimum(input_size, output_size)
345
- return PadTransform(
346
- 0, 0, pad_size[1], pad_size[0], original_size[1], original_size[0], self.pad_value
347
- )
348
-
349
- def get_transform(self, image: np.ndarray) -> TransformList:
350
- transforms = [self._get_crop(image)]
351
- if self.pad:
352
- transforms.append(self._get_pad(image))
353
- return TransformList(transforms)
354
-
355
-
356
- class RandomCrop(Augmentation):
357
- """
358
- Randomly crop a rectangle region out of an image.
359
- """
360
-
361
- def __init__(self, crop_type: str, crop_size):
362
- """
363
- Args:
364
- crop_type (str): one of "relative_range", "relative", "absolute", "absolute_range".
365
- crop_size (tuple[float, float]): two floats, explained below.
366
-
367
- - "relative": crop a (H * crop_size[0], W * crop_size[1]) region from an input image of
368
- size (H, W). crop size should be in (0, 1]
369
- - "relative_range": uniformly sample two values from [crop_size[0], 1]
370
- and [crop_size[1], 1], and use them as in "relative" crop type.
371
- - "absolute": crop a (crop_size[0], crop_size[1]) region from input image.
372
- crop_size must be smaller than the input image size.
373
- - "absolute_range": for an input of size (H, W), uniformly sample H_crop in
374
- [crop_size[0], min(H, crop_size[1])] and W_crop in [crop_size[0], min(W, crop_size[1])].
375
- Then crop a region (H_crop, W_crop).
376
- """
377
- # TODO style of relative_range and absolute_range are not consistent:
378
- # one takes (h, w) but another takes (min, max)
379
- super().__init__()
380
- assert crop_type in ["relative_range", "relative", "absolute", "absolute_range"]
381
- self._init(locals())
382
-
383
- def get_transform(self, image):
384
- h, w = image.shape[:2]
385
- croph, cropw = self.get_crop_size((h, w))
386
- assert h >= croph and w >= cropw, "Shape computation in {} has bugs.".format(self)
387
- h0 = np.random.randint(h - croph + 1)
388
- w0 = np.random.randint(w - cropw + 1)
389
- return CropTransform(w0, h0, cropw, croph)
390
-
391
- def get_crop_size(self, image_size):
392
- """
393
- Args:
394
- image_size (tuple): height, width
395
-
396
- Returns:
397
- crop_size (tuple): height, width in absolute pixels
398
- """
399
- h, w = image_size
400
- if self.crop_type == "relative":
401
- ch, cw = self.crop_size
402
- return int(h * ch + 0.5), int(w * cw + 0.5)
403
- elif self.crop_type == "relative_range":
404
- crop_size = np.asarray(self.crop_size, dtype=np.float32)
405
- ch, cw = crop_size + np.random.rand(2) * (1 - crop_size)
406
- return int(h * ch + 0.5), int(w * cw + 0.5)
407
- elif self.crop_type == "absolute":
408
- return (min(self.crop_size[0], h), min(self.crop_size[1], w))
409
- elif self.crop_type == "absolute_range":
410
- assert self.crop_size[0] <= self.crop_size[1]
411
- ch = np.random.randint(min(h, self.crop_size[0]), min(h, self.crop_size[1]) + 1)
412
- cw = np.random.randint(min(w, self.crop_size[0]), min(w, self.crop_size[1]) + 1)
413
- return ch, cw
414
- else:
415
- raise NotImplementedError("Unknown crop type {}".format(self.crop_type))
416
-
417
-
418
- class RandomCrop_CategoryAreaConstraint(Augmentation):
419
- """
420
- Similar to :class:`RandomCrop`, but find a cropping window such that no single category
421
- occupies a ratio of more than `single_category_max_area` in semantic segmentation ground
422
- truth, which can cause instability in training. The function attempts to find such a valid
423
- cropping window for at most 10 times.
424
- """
425
-
426
- def __init__(
427
- self,
428
- crop_type: str,
429
- crop_size,
430
- single_category_max_area: float = 1.0,
431
- ignored_category: int = None,
432
- ):
433
- """
434
- Args:
435
- crop_type, crop_size: same as in :class:`RandomCrop`
436
- single_category_max_area: the maximum allowed area ratio of a
437
- category. Set to 1.0 to disable
438
- ignored_category: allow this category in the semantic segmentation
439
- ground truth to exceed the area ratio. Usually set to the category
440
- that's ignored in training.
441
- """
442
- self.crop_aug = RandomCrop(crop_type, crop_size)
443
- self._init(locals())
444
-
445
- def get_transform(self, image, sem_seg):
446
- if self.single_category_max_area >= 1.0:
447
- return self.crop_aug.get_transform(image)
448
- else:
449
- h, w = sem_seg.shape
450
- for _ in range(10):
451
- crop_size = self.crop_aug.get_crop_size((h, w))
452
- y0 = np.random.randint(h - crop_size[0] + 1)
453
- x0 = np.random.randint(w - crop_size[1] + 1)
454
- sem_seg_temp = sem_seg[y0 : y0 + crop_size[0], x0 : x0 + crop_size[1]]
455
- labels, cnt = np.unique(sem_seg_temp, return_counts=True)
456
- if self.ignored_category is not None:
457
- cnt = cnt[labels != self.ignored_category]
458
- if len(cnt) > 1 and np.max(cnt) < np.sum(cnt) * self.single_category_max_area:
459
- break
460
- crop_tfm = CropTransform(x0, y0, crop_size[1], crop_size[0])
461
- return crop_tfm
462
-
463
-
464
- class RandomExtent(Augmentation):
465
- """
466
- Outputs an image by cropping a random "subrect" of the source image.
467
-
468
- The subrect can be parameterized to include pixels outside the source image,
469
- in which case they will be set to zeros (i.e. black). The size of the output
470
- image will vary with the size of the random subrect.
471
- """
472
-
473
- def __init__(self, scale_range, shift_range):
474
- """
475
- Args:
476
- output_size (h, w): Dimensions of output image
477
- scale_range (l, h): Range of input-to-output size scaling factor
478
- shift_range (x, y): Range of shifts of the cropped subrect. The rect
479
- is shifted by [w / 2 * Uniform(-x, x), h / 2 * Uniform(-y, y)],
480
- where (w, h) is the (width, height) of the input image. Set each
481
- component to zero to crop at the image's center.
482
- """
483
- super().__init__()
484
- self._init(locals())
485
-
486
- def get_transform(self, image):
487
- img_h, img_w = image.shape[:2]
488
-
489
- # Initialize src_rect to fit the input image.
490
- src_rect = np.array([-0.5 * img_w, -0.5 * img_h, 0.5 * img_w, 0.5 * img_h])
491
-
492
- # Apply a random scaling to the src_rect.
493
- src_rect *= np.random.uniform(self.scale_range[0], self.scale_range[1])
494
-
495
- # Apply a random shift to the coordinates origin.
496
- src_rect[0::2] += self.shift_range[0] * img_w * (np.random.rand() - 0.5)
497
- src_rect[1::2] += self.shift_range[1] * img_h * (np.random.rand() - 0.5)
498
-
499
- # Map src_rect coordinates into image coordinates (center at corner).
500
- src_rect[0::2] += 0.5 * img_w
501
- src_rect[1::2] += 0.5 * img_h
502
-
503
- return ExtentTransform(
504
- src_rect=(src_rect[0], src_rect[1], src_rect[2], src_rect[3]),
505
- output_size=(int(src_rect[3] - src_rect[1]), int(src_rect[2] - src_rect[0])),
506
- )
507
-
508
-
509
- class RandomContrast(Augmentation):
510
- """
511
- Randomly transforms image contrast.
512
-
513
- Contrast intensity is uniformly sampled in (intensity_min, intensity_max).
514
- - intensity < 1 will reduce contrast
515
- - intensity = 1 will preserve the input image
516
- - intensity > 1 will increase contrast
517
-
518
- See: https://pillow.readthedocs.io/en/3.0.x/reference/ImageEnhance.html
519
- """
520
-
521
- def __init__(self, intensity_min, intensity_max):
522
- """
523
- Args:
524
- intensity_min (float): Minimum augmentation
525
- intensity_max (float): Maximum augmentation
526
- """
527
- super().__init__()
528
- self._init(locals())
529
-
530
- def get_transform(self, image):
531
- w = np.random.uniform(self.intensity_min, self.intensity_max)
532
- return BlendTransform(src_image=image.mean(), src_weight=1 - w, dst_weight=w)
533
-
534
-
535
- class RandomBrightness(Augmentation):
536
- """
537
- Randomly transforms image brightness.
538
-
539
- Brightness intensity is uniformly sampled in (intensity_min, intensity_max).
540
- - intensity < 1 will reduce brightness
541
- - intensity = 1 will preserve the input image
542
- - intensity > 1 will increase brightness
543
-
544
- See: https://pillow.readthedocs.io/en/3.0.x/reference/ImageEnhance.html
545
- """
546
-
547
- def __init__(self, intensity_min, intensity_max):
548
- """
549
- Args:
550
- intensity_min (float): Minimum augmentation
551
- intensity_max (float): Maximum augmentation
552
- """
553
- super().__init__()
554
- self._init(locals())
555
-
556
- def get_transform(self, image):
557
- w = np.random.uniform(self.intensity_min, self.intensity_max)
558
- return BlendTransform(src_image=0, src_weight=1 - w, dst_weight=w)
559
-
560
-
561
- class RandomSaturation(Augmentation):
562
- """
563
- Randomly transforms saturation of an RGB image.
564
- Input images are assumed to have 'RGB' channel order.
565
-
566
- Saturation intensity is uniformly sampled in (intensity_min, intensity_max).
567
- - intensity < 1 will reduce saturation (make the image more grayscale)
568
- - intensity = 1 will preserve the input image
569
- - intensity > 1 will increase saturation
570
-
571
- See: https://pillow.readthedocs.io/en/3.0.x/reference/ImageEnhance.html
572
- """
573
-
574
- def __init__(self, intensity_min, intensity_max):
575
- """
576
- Args:
577
- intensity_min (float): Minimum augmentation (1 preserves input).
578
- intensity_max (float): Maximum augmentation (1 preserves input).
579
- """
580
- super().__init__()
581
- self._init(locals())
582
-
583
- def get_transform(self, image):
584
- assert image.shape[-1] == 3, "RandomSaturation only works on RGB images"
585
- w = np.random.uniform(self.intensity_min, self.intensity_max)
586
- grayscale = image.dot([0.299, 0.587, 0.114])[:, :, np.newaxis]
587
- return BlendTransform(src_image=grayscale, src_weight=1 - w, dst_weight=w)
588
-
589
-
590
- class RandomLighting(Augmentation):
591
- """
592
- The "lighting" augmentation described in AlexNet, using fixed PCA over ImageNet.
593
- Input images are assumed to have 'RGB' channel order.
594
-
595
- The degree of color jittering is randomly sampled via a normal distribution,
596
- with standard deviation given by the scale parameter.
597
- """
598
-
599
- def __init__(self, scale):
600
- """
601
- Args:
602
- scale (float): Standard deviation of principal component weighting.
603
- """
604
- super().__init__()
605
- self._init(locals())
606
- self.eigen_vecs = np.array(
607
- [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]]
608
- )
609
- self.eigen_vals = np.array([0.2175, 0.0188, 0.0045])
610
-
611
- def get_transform(self, image):
612
- assert image.shape[-1] == 3, "RandomLighting only works on RGB images"
613
- weights = np.random.normal(scale=self.scale, size=3)
614
- return BlendTransform(
615
- src_image=self.eigen_vecs.dot(weights * self.eigen_vals), src_weight=1.0, dst_weight=1.0
616
- )
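Review note: RandomContrast, RandomBrightness and RandomSaturation above all reduce to one weighted blend with `src_weight = 1 - w` and `dst_weight = w`. The sketch below works through that arithmetic on single pixel values in plain Python. It assumes that `BlendTransform` computes `src_weight * src_image + dst_weight * img` and clips to [0, 255] for 8-bit images; the helper names `blend`, `brightness` and `contrast` are hypothetical and not part of the deleted file.

```python
def blend(pixel, src, src_weight, dst_weight):
    """Weighted blend of one pixel value with a source value.

    Assumption: mirrors BlendTransform for uint8 images, i.e.
    src_weight * src + dst_weight * pixel, clipped to [0, 255].
    """
    value = src_weight * src + dst_weight * pixel
    return min(255.0, max(0.0, value))


def brightness(pixel, w):
    # RandomBrightness blends with a black source (src_image=0),
    # so the result is simply w * pixel before clipping.
    return blend(pixel, 0.0, 1.0 - w, w)


def contrast(pixel, image_mean, w):
    # RandomContrast blends with the image mean, pulling pixels toward
    # the mean for w < 1 and pushing them away from it for w > 1.
    return blend(pixel, image_mean, 1.0 - w, w)


if __name__ == "__main__":
    print(brightness(100.0, 0.5))       # half brightness
    print(contrast(100.0, 120.0, 0.5))  # halfway toward the mean
    print(contrast(200.0, 120.0, 2.0))  # overshoots and is clipped
```

With `w = 1` both helpers return the input unchanged, which matches the "intensity = 1 will preserve the input image" lines in the docstrings.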
 
cutler/data/transforms/transform.py DELETED
@@ -1,355 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/transform.py
3
-
4
- """
5
- See "Data Augmentation" tutorial for an overview of the system:
6
- https://detectron2.readthedocs.io/tutorials/augmentation.html
7
- """
8
-
9
- import numpy as np
10
- import torch
11
- import torch.nn.functional as F
12
- from fvcore.transforms.transform import (
13
- CropTransform,
14
- HFlipTransform,
15
- NoOpTransform,
16
- Transform,
17
- TransformList,
18
- )
19
- from PIL import Image
20
-
21
- try:
22
- import cv2 # noqa
23
- except ImportError:
24
- # OpenCV is an optional dependency at the moment
25
- pass
26
-
27
- __all__ = [
28
- "ExtentTransform",
29
- "ResizeTransform",
30
- "RotationTransform",
31
- "ColorTransform",
32
- "PILColorTransform",
33
- ]
34
-
35
-
36
- class ExtentTransform(Transform):
37
- """
38
- Extracts a subregion from the source image and scales it to the output size.
39
-
40
- The fill color is used to map pixels from the source rect that fall outside
41
- the source image.
42
-
43
- See: https://pillow.readthedocs.io/en/latest/PIL.html#PIL.ImageTransform.ExtentTransform
44
- """
45
-
46
- def __init__(self, src_rect, output_size, interp=Image.BILINEAR, fill=0):
47
- """
48
- Args:
49
- src_rect (x0, y0, x1, y1): src coordinates
50
- output_size (h, w): dst image size
51
- interp: PIL interpolation methods
52
- fill: Fill color used when src_rect extends outside image
53
- """
54
- super().__init__()
55
- self._set_attributes(locals())
56
-
57
- def apply_image(self, img, interp=None):
58
- h, w = self.output_size
59
- if len(img.shape) > 2 and img.shape[2] == 1:
60
- pil_image = Image.fromarray(img[:, :, 0], mode="L")
61
- else:
62
- pil_image = Image.fromarray(img)
63
- pil_image = pil_image.transform(
64
- size=(w, h),
65
- method=Image.EXTENT,
66
- data=self.src_rect,
67
- resample=interp if interp else self.interp,
68
- fill=self.fill,
69
- )
70
- ret = np.asarray(pil_image)
71
- if len(img.shape) > 2 and img.shape[2] == 1:
72
- ret = np.expand_dims(ret, -1)
73
- return ret
74
-
75
- def apply_coords(self, coords):
76
- # Transform image center from source coordinates into output coordinates
77
- # and then map the new origin to the corner of the output image.
78
- h, w = self.output_size
79
- x0, y0, x1, y1 = self.src_rect
80
- new_coords = coords.astype(np.float32)
81
- new_coords[:, 0] -= 0.5 * (x0 + x1)
82
- new_coords[:, 1] -= 0.5 * (y0 + y1)
83
- new_coords[:, 0] *= w / (x1 - x0)
84
- new_coords[:, 1] *= h / (y1 - y0)
85
- new_coords[:, 0] += 0.5 * w
86
- new_coords[:, 1] += 0.5 * h
87
- return new_coords
88
-
89
- def apply_segmentation(self, segmentation):
90
- segmentation = self.apply_image(segmentation, interp=Image.NEAREST)
91
- return segmentation
92
-
93
-
94
- class ResizeTransform(Transform):
95
- """
96
- Resize the image to a target size.
97
- """
98
-
99
- def __init__(self, h, w, new_h, new_w, interp=None):
100
- """
101
- Args:
102
- h, w (int): original image size
103
- new_h, new_w (int): new image size
104
- interp: PIL interpolation methods, defaults to bilinear.
105
- """
106
- # TODO decide on PIL vs opencv
107
- super().__init__()
108
- if interp is None:
109
- interp = Image.BILINEAR
110
- self._set_attributes(locals())
111
-
112
- def apply_image(self, img, interp=None):
113
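- # A bare comparison never raises, so check the shape explicitly and
114
- # swap the stored (h, w) only when it does not match the input.
115
- if img.shape[:2] != (self.h, self.w):
116
- (self.h, self.w) = (self.w, self.h)
117
- assert img.shape[:2] == (self.h, self.w)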
118
- assert len(img.shape) <= 4
119
- interp_method = interp if interp is not None else self.interp
120
-
121
- if img.dtype == np.uint8:
122
- if len(img.shape) > 2 and img.shape[2] == 1:
123
- pil_image = Image.fromarray(img[:, :, 0], mode="L")
124
- else:
125
- pil_image = Image.fromarray(img)
126
- pil_image = pil_image.resize((self.new_w, self.new_h), interp_method)
127
- ret = np.asarray(pil_image)
128
- if len(img.shape) > 2 and img.shape[2] == 1:
129
- ret = np.expand_dims(ret, -1)
130
- else:
131
- # PIL only supports uint8
132
- if any(x < 0 for x in img.strides):
133
- img = np.ascontiguousarray(img)
134
- img = torch.from_numpy(img)
135
- shape = list(img.shape)
136
- shape_4d = shape[:2] + [1] * (4 - len(shape)) + shape[2:]
137
- img = img.view(shape_4d).permute(2, 3, 0, 1) # hw(c) -> nchw
138
- _PIL_RESIZE_TO_INTERPOLATE_MODE = {
139
- Image.NEAREST: "nearest",
140
- Image.BILINEAR: "bilinear",
141
- Image.BICUBIC: "bicubic",
142
- }
143
- mode = _PIL_RESIZE_TO_INTERPOLATE_MODE[interp_method]
144
- align_corners = None if mode == "nearest" else False
145
- img = F.interpolate(
146
- img, (self.new_h, self.new_w), mode=mode, align_corners=align_corners
147
- )
148
- shape[:2] = (self.new_h, self.new_w)
149
- ret = img.permute(2, 3, 0, 1).view(shape).numpy() # nchw -> hw(c)
150
-
151
- return ret
152
-
153
- def apply_coords(self, coords):
154
- coords[:, 0] = coords[:, 0] * (self.new_w * 1.0 / self.w)
155
- coords[:, 1] = coords[:, 1] * (self.new_h * 1.0 / self.h)
156
- return coords
157
-
158
- def apply_segmentation(self, segmentation):
159
- segmentation = self.apply_image(segmentation, interp=Image.NEAREST)
160
- return segmentation
161
-
162
- def inverse(self):
163
- return ResizeTransform(self.new_h, self.new_w, self.h, self.w, self.interp)
164
-
165
-
166
- class RotationTransform(Transform):
167
- """
168
- This method returns a copy of this image, rotated the given
169
- number of degrees counter clockwise around its center.
170
- """
171
-
172
- def __init__(self, h, w, angle, expand=True, center=None, interp=None):
173
- """
174
- Args:
175
- h, w (int): original image size
176
- angle (float): degrees for rotation
177
- expand (bool): choose if the image should be resized to fit the whole
178
- rotated image (default), or simply cropped
179
- center (tuple (width, height)): coordinates of the rotation center
180
- if left to None, the center will be fit to the center of each image
181
- center has no effect if expand=True because it only affects shifting
182
- interp: cv2 interpolation method, default cv2.INTER_LINEAR
183
- """
184
- super().__init__()
185
- image_center = np.array((w / 2, h / 2))
186
- if center is None:
187
- center = image_center
188
- if interp is None:
189
- interp = cv2.INTER_LINEAR
190
- abs_cos, abs_sin = (abs(np.cos(np.deg2rad(angle))), abs(np.sin(np.deg2rad(angle))))
191
- if expand:
192
- # find the new width and height bounds
193
- bound_w, bound_h = np.rint(
194
- [h * abs_sin + w * abs_cos, h * abs_cos + w * abs_sin]
195
- ).astype(int)
196
- else:
197
- bound_w, bound_h = w, h
198
-
199
- self._set_attributes(locals())
200
- self.rm_coords = self.create_rotation_matrix()
201
- # Needed because of this problem https://github.com/opencv/opencv/issues/11784
202
- self.rm_image = self.create_rotation_matrix(offset=-0.5)
203
-
204
- def apply_image(self, img, interp=None):
205
- """
206
- img should be a numpy array, formatted as Height * Width * Nchannels
207
- """
208
- if len(img) == 0 or self.angle % 360 == 0:
209
- return img
210
- assert img.shape[:2] == (self.h, self.w)
211
- interp = interp if interp is not None else self.interp
212
- return cv2.warpAffine(img, self.rm_image, (self.bound_w, self.bound_h), flags=interp)
213
-
214
- def apply_coords(self, coords):
215
- """
216
- coords should be a N * 2 array-like, containing N couples of (x, y) points
217
- """
218
- coords = np.asarray(coords, dtype=float)
219
- if len(coords) == 0 or self.angle % 360 == 0:
220
- return coords
221
- return cv2.transform(coords[:, np.newaxis, :], self.rm_coords)[:, 0, :]
222
-
223
- def apply_segmentation(self, segmentation):
224
- segmentation = self.apply_image(segmentation, interp=cv2.INTER_NEAREST)
225
- return segmentation
226
-
227
- def create_rotation_matrix(self, offset=0):
228
- center = (self.center[0] + offset, self.center[1] + offset)
229
- rm = cv2.getRotationMatrix2D(tuple(center), self.angle, 1)
230
- if self.expand:
231
- # Find the coordinates of the center of rotation in the new image
232
- # The only point for which we know the future coordinates is the center of the image
233
- rot_im_center = cv2.transform(self.image_center[None, None, :] + offset, rm)[0, 0, :]
234
- new_center = np.array([self.bound_w / 2, self.bound_h / 2]) + offset - rot_im_center
235
- # shift the rotation center to the new coordinates
236
- rm[:, 2] += new_center
237
- return rm
238
-
239
- def inverse(self):
240
- """
241
- The inverse is to rotate it back with expand, and crop to get the original shape.
242
- """
243
- if not self.expand: # Not possible to inverse if a part of the image is lost
244
- raise NotImplementedError()
245
- rotation = RotationTransform(
246
- self.bound_h, self.bound_w, -self.angle, True, None, self.interp
247
- )
248
- crop = CropTransform(
249
- (rotation.bound_w - self.w) // 2, (rotation.bound_h - self.h) // 2, self.w, self.h
250
- )
251
- return TransformList([rotation, crop])
252
-
253
-
254
- class ColorTransform(Transform):
255
- """
256
- Generic wrapper for any photometric transforms.
257
- These transformations should only affect the color space and
258
- not the coordinate space of the image (e.g. annotation
259
- coordinates such as bounding boxes should not be changed)
260
- """
261
-
262
- def __init__(self, op):
263
- """
264
- Args:
265
- op (Callable): operation to be applied to the image,
266
- which takes in an ndarray and returns an ndarray.
267
- """
268
- if not callable(op):
269
- raise ValueError("op parameter should be callable")
270
- super().__init__()
271
- self._set_attributes(locals())
272
-
273
- def apply_image(self, img):
274
- return self.op(img)
275
-
276
- def apply_coords(self, coords):
277
- return coords
278
-
279
- def inverse(self):
280
- return NoOpTransform()
281
-
282
- def apply_segmentation(self, segmentation):
283
- return segmentation
284
-
285
-
286
- class PILColorTransform(ColorTransform):
287
- """
288
- Generic wrapper for PIL Photometric image transforms,
289
- which affect the color space and not the coordinate
290
- space of the image
291
- """
292
-
293
- def __init__(self, op):
294
- """
295
- Args:
296
- op (Callable): operation to be applied to the image,
297
- which takes in a PIL Image and returns a transformed
298
- PIL Image.
299
- For reference on possible operations see:
300
- - https://pillow.readthedocs.io/en/stable/
301
- """
302
- if not callable(op):
303
- raise ValueError("op parameter should be callable")
304
- super().__init__(op)
305
-
306
- def apply_image(self, img):
307
- img = Image.fromarray(img)
308
- return np.asarray(super().apply_image(img))
309
-
310
-
311
- def HFlip_rotated_box(transform, rotated_boxes):
312
- """
313
- Apply the horizontal flip transform on rotated boxes.
314
-
315
- Args:
316
- rotated_boxes (ndarray): Nx5 floating point array of
317
- (x_center, y_center, width, height, angle_degrees) format
318
- in absolute coordinates.
319
- """
320
- # Transform x_center
321
- rotated_boxes[:, 0] = transform.width - rotated_boxes[:, 0]
322
- # Transform angle
323
- rotated_boxes[:, 4] = -rotated_boxes[:, 4]
324
- return rotated_boxes
325
-
326
-
327
- def Resize_rotated_box(transform, rotated_boxes):
328
- """
329
- Apply the resizing transform on rotated boxes. For details of how these (approximation)
330
- formulas are derived, please refer to :meth:`RotatedBoxes.scale`.
331
-
332
- Args:
333
- rotated_boxes (ndarray): Nx5 floating point array of
334
- (x_center, y_center, width, height, angle_degrees) format
335
- in absolute coordinates.
336
- """
337
- scale_factor_x = transform.new_w * 1.0 / transform.w
338
- scale_factor_y = transform.new_h * 1.0 / transform.h
339
- rotated_boxes[:, 0] *= scale_factor_x
340
- rotated_boxes[:, 1] *= scale_factor_y
341
- theta = rotated_boxes[:, 4] * np.pi / 180.0
342
- c = np.cos(theta)
343
- s = np.sin(theta)
344
- rotated_boxes[:, 2] *= np.sqrt(np.square(scale_factor_x * c) + np.square(scale_factor_y * s))
345
- rotated_boxes[:, 3] *= np.sqrt(np.square(scale_factor_x * s) + np.square(scale_factor_y * c))
346
- rotated_boxes[:, 4] = np.arctan2(scale_factor_x * s, scale_factor_y * c) * 180 / np.pi
347
-
348
- return rotated_boxes
349
-
350
-
351
- HFlipTransform.register_type("rotated_box", HFlip_rotated_box)
352
- ResizeTransform.register_type("rotated_box", Resize_rotated_box)
353
-
354
- # not necessary any more with latest fvcore
355
- NoOpTransform.register_type("rotated_box", lambda t, x: x)
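Review note: the coordinate mapping in `ExtentTransform.apply_coords` is easier to check with concrete numbers. The following is a minimal plain-Python sketch of the same three steps (center on the source rect, scale to the output size, move the origin to the output corner). The function name `extent_apply_coords` and the sample rectangle are illustrative assumptions, not code from the deleted file.

```python
def extent_apply_coords(points, src_rect, output_size):
    """Map (x, y) points from source-image coordinates into the output image.

    src_rect is (x0, y0, x1, y1) and output_size is (h, w), following the
    argument order documented for ExtentTransform.
    """
    h, w = output_size
    x0, y0, x1, y1 = src_rect
    mapped = []
    for x, y in points:
        # 1. Shift so the center of src_rect becomes the origin.
        # 2. Scale by output size over source rect size.
        # 3. Shift the origin to the top-left corner of the output image.
        new_x = (x - 0.5 * (x0 + x1)) * w / (x1 - x0) + 0.5 * w
        new_y = (y - 0.5 * (y0 + y1)) * h / (y1 - y0) + 0.5 * h
        mapped.append((new_x, new_y))
    return mapped


if __name__ == "__main__":
    rect = (10.0, 20.0, 110.0, 70.0)  # 100 wide, 50 tall
    size = (100, 200)                 # output (h, w): scaled 2x on both axes
    print(extent_apply_coords([(10.0, 20.0), (60.0, 45.0), (110.0, 70.0)], rect, size))
```

The corners of the source rect land on the corners of the output image and its center lands on the output center, which is the property the comment at the top of `apply_coords` describes.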
 
cutler/demo/__init__.py DELETED
@@ -1,5 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- from demo import *
3
- from predictor import *
4
-
5
- __all__ = [k for k in globals().keys() if not k.startswith("_")]
 
cutler/demo/demo.py DELETED
@@ -1,197 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/demo/demo.py
3
-
4
- import argparse
5
- import glob
6
- import multiprocessing as mp
7
- import numpy as np
8
- import os
9
- import tempfile
10
- import time
11
- import warnings
12
- import cv2
13
- import tqdm
14
-
15
- from detectron2.config import get_cfg
16
- from detectron2.data.detection_utils import read_image
17
- from detectron2.utils.logger import setup_logger
18
- import sys
19
- sys.path.append('./')
20
- sys.path.append('../')
21
- from config import add_cutler_config
22
-
23
- from predictor import VisualizationDemo
24
-
25
- # constants
26
- WINDOW_NAME = "CutLER detections"
27
-
28
-
29
- def setup_cfg(args):
30
- # load config from file and command-line arguments
31
- cfg = get_cfg()
32
- add_cutler_config(cfg)
33
- cfg.merge_from_file(args.config_file)
34
- cfg.merge_from_list(args.opts)
35
- # Disable the use of SyncBN normalization when running on a CPU
36
- # SyncBN is not supported on CPU and can cause errors, so we switch to BN instead
37
- if cfg.MODEL.DEVICE == 'cpu' and cfg.MODEL.RESNETS.NORM == 'SyncBN':
38
- cfg.MODEL.RESNETS.NORM = "BN"
39
- cfg.MODEL.FPN.NORM = "BN"
40
- # Set score_threshold for builtin models
41
- cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold
42
- cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold
43
- cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold
44
- cfg.freeze()
45
- return cfg
46
-
47
-
48
- def get_parser():
49
- parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
50
- parser.add_argument(
51
- "--config-file",
52
- default="model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml",
53
- metavar="FILE",
54
- help="path to config file",
55
- )
56
- parser.add_argument("--webcam", action="store_true", help="Take inputs from webcam.")
57
- parser.add_argument("--video-input", help="Path to video file.")
58
- parser.add_argument(
59
- "--input",
60
- nargs="+",
61
- help="A list of space separated input images; "
62
- "or a single glob pattern such as 'directory/*.jpg'",
63
- )
64
- parser.add_argument(
65
- "--output",
66
- help="A file or directory to save output visualizations. "
67
- "If not given, will show output in an OpenCV window.",
68
- )
69
-
70
- parser.add_argument(
71
- "--confidence-threshold",
72
- type=float,
73
- default=0.35,
74
- help="Minimum score for instance predictions to be shown",
75
- )
76
- parser.add_argument(
77
- "--opts",
78
- help="Modify config options using the command-line 'KEY VALUE' pairs",
79
- default=[],
80
- nargs=argparse.REMAINDER,
81
- )
82
- return parser
83
-
84
-
85
- def test_opencv_video_format(codec, file_ext):
86
- with tempfile.TemporaryDirectory(prefix="video_format_test") as dir:
87
- filename = os.path.join(dir, "test_file" + file_ext)
88
- writer = cv2.VideoWriter(
89
- filename=filename,
90
- fourcc=cv2.VideoWriter_fourcc(*codec),
91
- fps=float(30),
92
- frameSize=(10, 10),
93
- isColor=True,
94
- )
95
- [writer.write(np.zeros((10, 10, 3), np.uint8)) for _ in range(30)]
96
- writer.release()
97
- if os.path.isfile(filename):
98
- return True
99
- return False
100
-
101
-
102
- if __name__ == "__main__":
103
- mp.set_start_method("spawn", force=True)
104
- args = get_parser().parse_args()
105
- setup_logger(name="fvcore")
106
- logger = setup_logger()
107
- logger.info("Arguments: " + str(args))
108
-
109
- cfg = setup_cfg(args)
110
-
111
- demo = VisualizationDemo(cfg)
112
-
113
- if args.input:
114
- if len(args.input) == 1:
115
- args.input = glob.glob(os.path.expanduser(args.input[0]))
116
- assert args.input, "The input path(s) was not found"
117
- for path in tqdm.tqdm(args.input, disable=not args.output):
118
- # use PIL, to be consistent with evaluation
119
- img = read_image(path, format="BGR")
120
- start_time = time.time()
121
- predictions, visualized_output = demo.run_on_image(img)
122
- logger.info(
123
- "{}: {} in {:.2f}s".format(
124
- path,
125
- "detected {} instances".format(len(predictions["instances"]))
126
- if "instances" in predictions
127
- else "finished",
128
- time.time() - start_time,
129
- )
130
- )
131
-
132
- if args.output:
133
- if os.path.isdir(args.output):
134
- assert os.path.isdir(args.output), args.output
135
- out_filename = os.path.join(args.output, os.path.basename(path))
136
- else:
137
- assert len(args.input) == 1, "Please specify a directory with args.output"
138
- out_filename = args.output
139
- visualized_output.save(out_filename)
140
- else:
141
- cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
142
- cv2.imshow(WINDOW_NAME, visualized_output.get_image()[:, :, ::-1])
143
- if cv2.waitKey(0) == 27:
144
- break # esc to quit
145
- elif args.webcam:
146
- assert args.input is None, "Cannot have both --input and --webcam!"
147
- assert args.output is None, "output not yet supported with --webcam!"
148
- cam = cv2.VideoCapture(0)
149
- for vis in tqdm.tqdm(demo.run_on_video(cam)):
150
- cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
151
- cv2.imshow(WINDOW_NAME, vis)
152
- if cv2.waitKey(1) == 27:
153
- break # esc to quit
154
- cam.release()
155
- cv2.destroyAllWindows()
156
- elif args.video_input:
157
- video = cv2.VideoCapture(args.video_input)
158
- width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
159
- height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
160
- frames_per_second = video.get(cv2.CAP_PROP_FPS)
161
- num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
162
- basename = os.path.basename(args.video_input)
163
- codec, file_ext = (
164
- ("x264", ".mkv") if test_opencv_video_format("x264", ".mkv") else ("mp4v", ".mp4")
165
- )
166
- if codec == "mp4v":
167
- warnings.warn("x264 codec not available, switching to mp4v")
168
- if args.output:
169
- if os.path.isdir(args.output):
170
- output_fname = os.path.join(args.output, basename)
171
- output_fname = os.path.splitext(output_fname)[0] + file_ext
172
- else:
173
- output_fname = args.output
174
- assert not os.path.isfile(output_fname), output_fname
175
- output_file = cv2.VideoWriter(
176
- filename=output_fname,
177
- # some installation of opencv may not support x264 (due to its license),
178
- # you can try other format (e.g. MPEG)
179
- fourcc=cv2.VideoWriter_fourcc(*codec),
180
- fps=float(frames_per_second),
181
- frameSize=(width, height),
182
- isColor=True,
183
- )
184
- assert os.path.isfile(args.video_input)
185
- for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
186
- if args.output:
187
- output_file.write(vis_frame)
188
- else:
189
- cv2.namedWindow(basename, cv2.WINDOW_NORMAL)
190
- cv2.imshow(basename, vis_frame)
191
- if cv2.waitKey(1) == 27:
192
- break # esc to quit
193
- video.release()
194
- if args.output:
195
- output_file.release()
196
- else:
197
- cv2.destroyAllWindows()
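Review note on the deleted video branch: the fallback warning is guarded by `codec == ".mp4v"`, but the tuple assigned just above yields `"mp4v"` with no leading dot, so the warning can never fire. Below is a minimal sketch of what the selection was probably meant to do. `choose_codec` is a hypothetical helper, and its `is_supported` argument stands in for the demo's `test_opencv_video_format` probe so that the sketch runs without OpenCV.

```python
import warnings


def choose_codec(is_supported):
    """Return (codec, file_ext), preferring x264/.mkv and falling back to mp4v/.mp4.

    `is_supported` is a hypothetical callable (codec, file_ext) -> bool that
    stands in for the demo's `test_opencv_video_format` probe.
    """
    if is_supported("x264", ".mkv"):
        return "x264", ".mkv"
    # Warn on the fallback path itself, so no string comparison is needed.
    warnings.warn("x264 codec not available, switching to mp4v")
    return "mp4v", ".mp4"
```

Warning on the fallback path, rather than comparing strings afterwards, keeps the message tied to the branch that actually switches codecs.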
cutler/demo/predictor.py DELETED
@@ -1,219 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- import atexit
3
- import bisect
4
- import multiprocessing as mp
5
- from collections import deque
6
- import cv2
7
- import torch
8
-
9
- from detectron2.data import MetadataCatalog
10
- import sys
11
- sys.path.append('./')
12
- from engine.defaults import DefaultPredictor
13
- from detectron2.utils.video_visualizer import VideoVisualizer
14
- from detectron2.utils.visualizer import ColorMode, Visualizer
15
-
16
-
17
- class VisualizationDemo(object):
18
- def __init__(self, cfg, instance_mode=ColorMode.IMAGE, parallel=False):
19
- """
20
- Args:
21
- cfg (CfgNode):
22
- instance_mode (ColorMode):
23
- parallel (bool): whether to run the model in different processes from visualization.
24
- Useful since the visualization logic can be slow.
25
- """
26
- self.metadata = MetadataCatalog.get(
27
- cfg.DATASETS.TEST[0] if len(cfg.DATASETS.TEST) else "__unused"
28
- )
29
- self.cpu_device = torch.device("cpu")
30
- self.instance_mode = instance_mode
31
-
32
- self.parallel = parallel
33
- if parallel:
34
- num_gpu = torch.cuda.device_count()
35
- self.predictor = AsyncPredictor(cfg, num_gpus=num_gpu)
36
- else:
37
- self.predictor = DefaultPredictor(cfg)
38
-
39
- def run_on_image(self, image):
40
- """
41
- Args:
42
- image (np.ndarray): an image of shape (H, W, C) (in BGR order).
43
- This is the format used by OpenCV.
44
- Returns:
45
- predictions (dict): the output of the model.
46
- vis_output (VisImage): the visualized image output.
47
- """
48
- vis_output = None
49
- predictions = self.predictor(image)
50
- # Convert image from OpenCV BGR format to Matplotlib RGB format.
51
- image = image[:, :, ::-1]
52
- visualizer = Visualizer(image, self.metadata, instance_mode=self.instance_mode)
53
- if "panoptic_seg" in predictions:
54
- panoptic_seg, segments_info = predictions["panoptic_seg"]
55
- vis_output = visualizer.draw_panoptic_seg_predictions(
56
- panoptic_seg.to(self.cpu_device), segments_info
57
- )
58
- else:
59
- if "sem_seg" in predictions:
60
- vis_output = visualizer.draw_sem_seg(
61
- predictions["sem_seg"].argmax(dim=0).to(self.cpu_device)
62
- )
63
- if "instances" in predictions:
64
- instances = predictions["instances"].to(self.cpu_device)
65
- vis_output = visualizer.draw_instance_predictions(predictions=instances)
66
-
67
- return predictions, vis_output
68
-
69
- def _frame_from_video(self, video):
70
- while video.isOpened():
71
- success, frame = video.read()
72
- if success:
73
- yield frame
74
- else:
75
- break
76
-
77
- def run_on_video(self, video):
78
- """
79
- Visualizes predictions on frames of the input video.
80
- Args:
81
- video (cv2.VideoCapture): a :class:`VideoCapture` object, whose source can be
82
- either a webcam or a video file.
83
- Yields:
84
- ndarray: BGR visualizations of each video frame.
85
- """
86
- video_visualizer = VideoVisualizer(self.metadata, self.instance_mode)
87
-
88
- def process_predictions(frame, predictions):
89
- frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
90
- if "panoptic_seg" in predictions:
91
- panoptic_seg, segments_info = predictions["panoptic_seg"]
92
- vis_frame = video_visualizer.draw_panoptic_seg_predictions(
93
- frame, panoptic_seg.to(self.cpu_device), segments_info
94
- )
95
- elif "instances" in predictions:
96
- predictions = predictions["instances"].to(self.cpu_device)
97
- vis_frame = video_visualizer.draw_instance_predictions(frame, predictions)
98
- elif "sem_seg" in predictions:
99
- vis_frame = video_visualizer.draw_sem_seg(
100
- frame, predictions["sem_seg"].argmax(dim=0).to(self.cpu_device)
101
- )
102
-
103
- # Converts Matplotlib RGB format to OpenCV BGR format
104
- vis_frame = cv2.cvtColor(vis_frame.get_image(), cv2.COLOR_RGB2BGR)
105
- return vis_frame
106
-
107
- frame_gen = self._frame_from_video(video)
108
- if self.parallel:
109
- buffer_size = self.predictor.default_buffer_size
110
-
111
- frame_data = deque()
112
-
113
- for cnt, frame in enumerate(frame_gen):
114
- frame_data.append(frame)
115
- self.predictor.put(frame)
116
-
117
- if cnt >= buffer_size:
118
- frame = frame_data.popleft()
119
- predictions = self.predictor.get()
120
- yield process_predictions(frame, predictions)
121
-
122
- while len(frame_data):
123
- frame = frame_data.popleft()
124
- predictions = self.predictor.get()
125
- yield process_predictions(frame, predictions)
126
- else:
127
- for frame in frame_gen:
128
- yield process_predictions(frame, self.predictor(frame))
129
-
130
-
131
- class AsyncPredictor:
132
- """
133
- A predictor that runs the model asynchronously, possibly on >1 GPUs.
134
- Because rendering the visualization takes a considerable amount of time,
135
- this helps improve throughput a little bit when rendering videos.
136
- """
137
-
138
- class _StopToken:
139
- pass
140
-
141
- class _PredictWorker(mp.Process):
142
- def __init__(self, cfg, task_queue, result_queue):
143
- self.cfg = cfg
144
- self.task_queue = task_queue
145
- self.result_queue = result_queue
146
- super().__init__()
147
-
148
- def run(self):
149
- predictor = DefaultPredictor(self.cfg)
150
-
151
- while True:
152
- task = self.task_queue.get()
153
- if isinstance(task, AsyncPredictor._StopToken):
154
- break
155
- idx, data = task
156
- result = predictor(data)
157
- self.result_queue.put((idx, result))
158
-
159
- def __init__(self, cfg, num_gpus: int = 1):
160
- """
161
- Args:
162
- cfg (CfgNode):
163
- num_gpus (int): if 0, will run on CPU
164
- """
165
- num_workers = max(num_gpus, 1)
166
- self.task_queue = mp.Queue(maxsize=num_workers * 3)
167
- self.result_queue = mp.Queue(maxsize=num_workers * 3)
168
- self.procs = []
169
- for gpuid in range(max(num_gpus, 1)):
170
- cfg = cfg.clone()
171
- cfg.defrost()
172
- cfg.MODEL.DEVICE = "cuda:{}".format(gpuid) if num_gpus > 0 else "cpu"
173
- self.procs.append(
174
- AsyncPredictor._PredictWorker(cfg, self.task_queue, self.result_queue)
175
- )
176
-
177
- self.put_idx = 0
178
- self.get_idx = 0
179
- self.result_rank = []
180
- self.result_data = []
181
-
182
- for p in self.procs:
183
- p.start()
184
- atexit.register(self.shutdown)
185
-
186
- def put(self, image):
187
- self.put_idx += 1
188
- self.task_queue.put((self.put_idx, image))
189
-
190
- def get(self):
191
- self.get_idx += 1 # the index needed for this request
192
- if len(self.result_rank) and self.result_rank[0] == self.get_idx:
193
- res = self.result_data[0]
194
- del self.result_data[0], self.result_rank[0]
195
- return res
196
-
197
- while True:
198
- # make sure the results are returned in the correct order
199
- idx, res = self.result_queue.get()
200
- if idx == self.get_idx:
201
- return res
202
- insert = bisect.bisect(self.result_rank, idx)
203
- self.result_rank.insert(insert, idx)
204
- self.result_data.insert(insert, res)
205
-
206
- def __len__(self):
207
- return self.put_idx - self.get_idx
208
-
209
- def __call__(self, image):
210
- self.put(image)
211
- return self.get()
212
-
213
- def shutdown(self):
214
- for _ in self.procs:
215
- self.task_queue.put(AsyncPredictor._StopToken())
216
-
217
- @property
218
- def default_buffer_size(self):
219
- return len(self.procs) * 5
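Review note: `AsyncPredictor.get` in the deleted file returns results in submission order even though workers may finish out of order. Results that arrive early are parked in two parallel lists that `bisect` keeps sorted by index. The sketch below isolates that reordering logic. `OrderedResults` is a hypothetical class, and it reads from a plain iterator instead of a multiprocessing queue.

```python
import bisect


class OrderedResults:
    """Hand back results in index order, buffering early arrivals.

    Hypothetical stand-in for the reordering in AsyncPredictor.get; `source`
    is any iterator of (idx, result) pairs with indices starting at 1.
    """

    def __init__(self, source):
        self.source = iter(source)
        self.next_idx = 0
        self.rank = []  # sorted indices of buffered results
        self.data = []  # results, parallel to self.rank

    def get(self):
        self.next_idx += 1  # the index needed for this request
        if self.rank and self.rank[0] == self.next_idx:
            # Already buffered: pop it from the front of both lists.
            self.rank.pop(0)
            return self.data.pop(0)
        for idx, res in self.source:
            if idx == self.next_idx:
                return res
            # Arrived early: insert while keeping both lists sorted by idx.
            pos = bisect.bisect(self.rank, idx)
            self.rank.insert(pos, idx)
            self.data.insert(pos, res)
        raise LookupError(f"result {self.next_idx} never arrived")


# Results arrive out of order but are consumed in submission order.
arrivals = [(2, "b"), (3, "c"), (1, "a"), (4, "d")]
ordered = OrderedResults(arrivals)
in_order = [ordered.get() for _ in arrivals]
```

Because the buffer stays sorted, the next needed result is always at position 0 if it has already arrived, so each `get` call checks a single slot before blocking on new input.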
cutler/engine/__init__.py DELETED
@@ -1,7 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- from .train_loop import *
4
-
5
- __all__ = [k for k in globals().keys() if not k.startswith("_")]
6
-
7
- from .defaults import *
cutler/engine/defaults.py DELETED
@@ -1,726 +0,0 @@
1
- # -*- coding: utf-8 -*-
2
- # Copyright (c) Meta Platforms, Inc. and affiliates.
3
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/engine/defaults.py
4
-
5
- """
6
- This file contains components with some default boilerplate logic user may need
7
- in training / testing. They will not work for everyone, but many users may find them useful.
8
-
9
- The behavior of functions/classes in this file is subject to change,
10
- since they are meant to represent the "common default behavior" people need in their projects.
11
- """
12
-
13
- import argparse
14
- import logging
15
- import os
16
- import sys
17
- import weakref
18
- from collections import OrderedDict
19
- from typing import Optional
20
- import torch
21
- from fvcore.nn.precise_bn import get_bn_modules
22
- from omegaconf import OmegaConf
23
- from torch.nn.parallel import DistributedDataParallel
24
-
25
- import data.transforms as T
26
- from detectron2.checkpoint import DetectionCheckpointer
27
- from detectron2.config import CfgNode, LazyConfig
28
- from detectron2.data import (
29
- MetadataCatalog,
30
- )
31
- from data import (
32
- build_detection_test_loader,
33
- build_detection_train_loader,
34
- )
35
- from detectron2.evaluation import (
36
- DatasetEvaluator,
37
- inference_on_dataset,
38
- print_csv_format,
39
- verify_results,
40
- )
41
- from modeling import build_model
42
- from solver import build_lr_scheduler, build_optimizer
43
- from detectron2.utils import comm
44
- from detectron2.utils.collect_env import collect_env_info
45
- from detectron2.utils.env import seed_all_rng
46
- from detectron2.utils.events import CommonMetricPrinter, JSONWriter, TensorboardXWriter
47
- from detectron2.utils.file_io import PathManager
48
- from detectron2.utils.logger import setup_logger
49
-
50
- from detectron2.engine import hooks
51
- from detectron2.engine import TrainerBase
52
- from .train_loop import CustomAMPTrainer, CustomSimpleTrainer
53
-
54
- __all__ = [
55
- "create_ddp_model",
56
- "default_argument_parser",
57
- "default_setup",
58
- "default_writers",
59
- "DefaultPredictor",
60
- "DefaultTrainer",
61
- ]
62
-
63
-
64
- def create_ddp_model(model, *, fp16_compression=False, **kwargs):
65
- """
66
- Create a DistributedDataParallel model if there are >1 processes.
67
-
68
- Args:
69
- model: a torch.nn.Module
70
- fp16_compression: add fp16 compression hooks to the ddp object.
71
- See more at https://pytorch.org/docs/stable/ddp_comm_hooks.html#torch.distributed.algorithms.ddp_comm_hooks.default_hooks.fp16_compress_hook
72
- kwargs: other arguments of :module:`torch.nn.parallel.DistributedDataParallel`.
73
- """ # noqa
74
- if comm.get_world_size() == 1:
75
- return model
76
- if "device_ids" not in kwargs:
77
- kwargs["device_ids"] = [comm.get_local_rank()]
78
- ddp = DistributedDataParallel(model, **kwargs)
79
- if fp16_compression:
80
- from torch.distributed.algorithms.ddp_comm_hooks import default as comm_hooks
81
-
82
- ddp.register_comm_hook(state=None, hook=comm_hooks.fp16_compress_hook)
83
- return ddp
84
-
85
-
86
- def default_argument_parser(epilog=None):
87
- """
88
- Create a parser with some common arguments used by detectron2 users.
89
-
90
- Args:
91
- epilog (str): epilog passed to ArgumentParser describing the usage.
92
-
93
- Returns:
94
- argparse.ArgumentParser:
95
- """
96
- parser = argparse.ArgumentParser(
97
- epilog=epilog
98
- or f"""
99
- Examples:
100
-
101
- Run on single machine:
102
- $ {sys.argv[0]} --num-gpus 8 --config-file cfg.yaml
103
-
104
- Change some config options:
105
- $ {sys.argv[0]} --config-file cfg.yaml MODEL.WEIGHTS /path/to/weight.pth SOLVER.BASE_LR 0.001
106
-
107
- Run on multiple machines:
108
- (machine0)$ {sys.argv[0]} --machine-rank 0 --num-machines 2 --dist-url <URL> [--other-flags]
109
- (machine1)$ {sys.argv[0]} --machine-rank 1 --num-machines 2 --dist-url <URL> [--other-flags]
110
- """,
111
- formatter_class=argparse.RawDescriptionHelpFormatter,
112
- )
113
- parser.add_argument("--config-file", default="", metavar="FILE", help="path to config file")
114
- parser.add_argument(
115
- "--resume",
116
- action="store_true",
117
- help="Whether to attempt to resume from the checkpoint directory. "
118
- "See documentation of `DefaultTrainer.resume_or_load()` for what it means.",
119
- )
120
- parser.add_argument("--eval-only", action="store_true", help="perform evaluation only")
121
- parser.add_argument("--num-gpus", type=int, default=1, help="number of gpus *per machine*")
122
- parser.add_argument("--num-machines", type=int, default=1, help="total number of machines")
123
- parser.add_argument(
124
- "--machine-rank", type=int, default=0, help="the rank of this machine (unique per machine)"
125
- )
126
- parser.add_argument(
127
- "--test-dataset", type=str, default="", help="the dataset used for evaluation"
128
- )
129
- parser.add_argument(
130
- "--train-dataset", type=str, default="", help="the dataset used for training"
131
- )
132
- parser.add_argument("--no-segm", action="store_true", help="perform evaluation on detection only")
133
- # PyTorch still may leave orphan processes in multi-gpu training.
134
- # Therefore we use a deterministic way to obtain port,
135
- # so that users are aware of orphan processes by seeing the port occupied.
136
- port = 2**15 + 2**14 + hash(os.getuid() if sys.platform != "win32" else 1) % 2**14
137
- parser.add_argument(
138
- "--dist-url",
139
- default="tcp://127.0.0.1:{}".format(port),
140
- help="initialization URL for pytorch distributed backend. See "
141
- "https://pytorch.org/docs/stable/distributed.html for details.",
142
- )
143
- parser.add_argument(
144
- "opts",
145
- help="""
146
- Modify config options at the end of the command. For Yacs configs, use
147
- space-separated "PATH.KEY VALUE" pairs.
148
- For python-based LazyConfig, use "path.key=value".
149
- """.strip(),
150
- default=None,
151
- nargs=argparse.REMAINDER,
152
- )
153
- return parser
154
-
155
-
156
- def _try_get_key(cfg, *keys, default=None):
157
- """
158
- Try select keys from cfg until the first key that exists. Otherwise return default.
159
- """
160
- if isinstance(cfg, CfgNode):
161
- cfg = OmegaConf.create(cfg.dump())
162
- for k in keys:
163
- none = object()
164
- p = OmegaConf.select(cfg, k, default=none)
165
- if p is not none:
166
- return p
167
- return default
168
-
169
-
170
- def _highlight(code, filename):
171
- try:
172
- import pygments
173
- except ImportError:
174
- return code
175
-
176
- from pygments.lexers import Python3Lexer, YamlLexer
177
- from pygments.formatters import Terminal256Formatter
178
-
179
- lexer = Python3Lexer() if filename.endswith(".py") else YamlLexer()
180
- code = pygments.highlight(code, lexer, Terminal256Formatter(style="monokai"))
181
- return code
182
-
183
-
184
- def default_setup(cfg, args):
185
- """
186
- Perform some basic common setups at the beginning of a job, including:
187
-
188
- 1. Set up the detectron2 logger
189
- 2. Log basic information about environment, cmdline arguments, and config
190
- 3. Backup the config to the output directory
191
-
192
- Args:
193
- cfg (CfgNode or omegaconf.DictConfig): the full config to be used
194
- args (argparse.NameSpace): the command line arguments to be logged
195
- """
196
- output_dir = _try_get_key(cfg, "OUTPUT_DIR", "output_dir", "train.output_dir")
197
- if comm.is_main_process() and output_dir:
198
- PathManager.mkdirs(output_dir)
199
-
200
- rank = comm.get_rank()
201
- setup_logger(output_dir, distributed_rank=rank, name="fvcore")
202
- logger = setup_logger(output_dir, distributed_rank=rank)
203
-
204
- logger.info("Rank of current process: {}. World size: {}".format(rank, comm.get_world_size()))
205
- logger.info("Environment info:\n" + collect_env_info())
206
-
207
- logger.info("Command line arguments: " + str(args))
208
- if hasattr(args, "config_file") and args.config_file != "":
209
- logger.info(
210
- "Contents of args.config_file={}:\n{}".format(
211
- args.config_file,
212
- _highlight(PathManager.open(args.config_file, "r").read(), args.config_file),
213
- )
214
- )
215
-
216
- if comm.is_main_process() and output_dir:
217
- # Note: some of our scripts may expect the existence of
218
- # config.yaml in output directory
219
- path = os.path.join(output_dir, "config.yaml")
220
- if isinstance(cfg, CfgNode):
221
- logger.info("Running with full config:\n{}".format(_highlight(cfg.dump(), ".yaml")))
222
- with PathManager.open(path, "w") as f:
223
- f.write(cfg.dump())
224
- else:
225
- LazyConfig.save(cfg, path)
226
- logger.info("Full config saved to {}".format(path))
227
-
228
- # make sure each worker has a different, yet deterministic seed if specified
229
- seed = _try_get_key(cfg, "SEED", "train.seed", default=-1)
230
- seed_all_rng(None if seed < 0 else seed + rank)
231
-
232
- # cudnn benchmark has large overhead. It shouldn't be used considering the small size of
233
- # typical validation set.
234
- if not (hasattr(args, "eval_only") and args.eval_only):
235
- torch.backends.cudnn.benchmark = _try_get_key(
236
- cfg, "CUDNN_BENCHMARK", "train.cudnn_benchmark", default=False
237
- )
238
-
239
-
240
- def default_writers(output_dir: str, max_iter: Optional[int] = None):
241
- """
242
- Build a list of :class:`EventWriter` to be used.
243
- It now consists of a :class:`CommonMetricPrinter`,
244
- :class:`TensorboardXWriter` and :class:`JSONWriter`.
245
-
246
- Args:
247
- output_dir: directory to store JSON metrics and tensorboard events
248
- max_iter: the total number of iterations
249
-
250
- Returns:
251
- list[EventWriter]: a list of :class:`EventWriter` objects.
252
- """
253
- PathManager.mkdirs(output_dir)
254
- return [
255
- # It may not always print what you want to see, since it prints "common" metrics only.
256
- CommonMetricPrinter(max_iter),
257
- JSONWriter(os.path.join(output_dir, "metrics.json")),
258
- TensorboardXWriter(output_dir),
259
- ]
260
-
261
-
262
- class DefaultPredictor:
263
- """
264
- Create a simple end-to-end predictor with the given config that runs on
265
- single device for a single input image.
266
-
267
- Compared to using the model directly, this class does the following additions:
268
-
269
- 1. Load checkpoint from `cfg.MODEL.WEIGHTS`.
270
- 2. Always take BGR image as the input and apply conversion defined by `cfg.INPUT.FORMAT`.
271
- 3. Apply resizing defined by `cfg.INPUT.{MIN,MAX}_SIZE_TEST`.
272
- 4. Take one input image and produce a single output, instead of a batch.
273
-
274
- This is meant for simple demo purposes, so it does the above steps automatically.
275
- This is not meant for benchmarks or running complicated inference logic.
276
- If you'd like to do anything more complicated, please refer to its source code as
277
- examples to build and use the model manually.
278
-
279
- Attributes:
280
- metadata (Metadata): the metadata of the underlying dataset, obtained from
281
- cfg.DATASETS.TEST.
282
-
283
- Examples:
284
- ::
285
- pred = DefaultPredictor(cfg)
286
- inputs = cv2.imread("input.jpg")
287
- outputs = pred(inputs)
288
- """
289
-
290
- def __init__(self, cfg):
291
- self.cfg = cfg.clone() # cfg can be modified by model
292
- self.model = build_model(self.cfg)
293
- self.model.eval()
294
- if len(cfg.DATASETS.TEST):
295
- self.metadata = MetadataCatalog.get(cfg.DATASETS.TEST[0])
296
-
297
- checkpointer = DetectionCheckpointer(self.model)
298
- checkpointer.load(cfg.MODEL.WEIGHTS)
299
-
300
- self.aug = T.ResizeShortestEdge(
301
- [cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST
302
- )
303
-
304
- self.input_format = cfg.INPUT.FORMAT
305
- assert self.input_format in ["RGB", "BGR"], self.input_format
306
-
307
- def __call__(self, original_image):
308
- """
309
- Args:
310
- original_image (np.ndarray): an image of shape (H, W, C) (in BGR order).
311
-
312
- Returns:
313
- predictions (dict):
314
- the output of the model for one image only.
315
- See :doc:`/tutorials/models` for details about the format.
316
- """
317
- with torch.no_grad(): # https://github.com/sphinx-doc/sphinx/issues/4258
318
- # Apply pre-processing to image.
319
- if self.input_format == "RGB":
320
- # whether the model expects BGR inputs or RGB
321
- original_image = original_image[:, :, ::-1]
322
- height, width = original_image.shape[:2]
323
- image = self.aug.get_transform(original_image).apply_image(original_image)
324
- image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
325
-
326
- inputs = {"image": image, "height": height, "width": width}
327
- predictions = self.model([inputs])[0]
328
- return predictions
329
-
330
-
331
- class DefaultTrainer(TrainerBase):
332
- """
333
- A trainer with default training logic. It does the following:
334
-
335
- 1. Create a :class:`SimpleTrainer` using model, optimizer, dataloader
336
- defined by the given config. Create a LR scheduler defined by the config.
337
- 2. Load the last checkpoint or `cfg.MODEL.WEIGHTS`, if exists, when
338
- `resume_or_load` is called.
339
- 3. Register a few common hooks defined by the config.
340
-
341
- It is created to simplify the **standard model training workflow** and reduce code boilerplate
342
- for users who only need the standard training workflow, with standard features.
343
- It means this class makes *many assumptions* about your training logic that
344
- may easily become invalid in a new research. In fact, any assumptions beyond those made in the
345
- :class:`SimpleTrainer` are too much for research.
346
-
347
- The code of this class has been annotated about restrictive assumptions it makes.
348
- When they do not work for you, you're encouraged to:
349
-
350
- 1. Overwrite methods of this class, OR:
351
- 2. Use :class:`SimpleTrainer`, which only does minimal SGD training and
352
- nothing else. You can then add your own hooks if needed. OR:
353
- 3. Write your own training loop similar to `tools/plain_train_net.py`.
354
-
355
- See the :doc:`/tutorials/training` tutorials for more details.
356
-
357
- Note that the behavior of this class, like other functions/classes in
358
- this file, is not stable, since it is meant to represent the "common default behavior".
359
- It is only guaranteed to work well with the standard models and training workflow in detectron2.
360
- To obtain more stable behavior, write your own training logic with other public APIs.
361
-
362
- Examples:
363
- ::
364
- trainer = DefaultTrainer(cfg)
365
- trainer.resume_or_load() # load last checkpoint or MODEL.WEIGHTS
366
- trainer.train()
367
-
368
- Attributes:
369
- scheduler:
370
- checkpointer (DetectionCheckpointer):
371
- cfg (CfgNode):
372
- """
373
-
374
- def __init__(self, cfg):
375
- """
376
- Args:
377
- cfg (CfgNode):
378
- """
379
- super().__init__()
380
- logger = logging.getLogger("detectron2")
381
- if not logger.isEnabledFor(logging.INFO): # setup_logger is not called for d2
382
- setup_logger()
383
- cfg = DefaultTrainer.auto_scale_workers(cfg, comm.get_world_size())
384
-
385
- # Assume these objects must be constructed in this order.
386
- model = self.build_model(cfg)
387
- optimizer = self.build_optimizer(cfg, model)
388
- data_loader = self.build_train_loader(cfg)
389
-
390
- model = create_ddp_model(model, broadcast_buffers=False)
391
- if cfg.SOLVER.AMP.ENABLED:
392
- self._trainer = CustomAMPTrainer(model, data_loader, optimizer, cfg=cfg)
393
- else:
394
- self._trainer = CustomSimpleTrainer(model, data_loader, optimizer, cfg=cfg)
395
-
396
- self.scheduler = self.build_lr_scheduler(cfg, optimizer)
397
- self.checkpointer = DetectionCheckpointer(
398
- # Assume you want to save checkpoints together with logs/statistics
399
- model,
400
- cfg.OUTPUT_DIR,
401
- trainer=weakref.proxy(self),
402
- )
403
- self.start_iter = 0
404
- self.max_iter = cfg.SOLVER.MAX_ITER
405
- self.cfg = cfg
406
-
407
- self.register_hooks(self.build_hooks())
408
-
409
- def resume_or_load(self, resume=True):
410
- """
411
- If `resume==True` and `cfg.OUTPUT_DIR` contains the last checkpoint (defined by
412
- a `last_checkpoint` file), resume from the file. Resuming means loading all
413
- available states (eg. optimizer and scheduler) and update iteration counter
414
- from the checkpoint. ``cfg.MODEL.WEIGHTS`` will not be used.
415
-
416
- Otherwise, this is considered as an independent training. The method will load model
417
- weights from the file `cfg.MODEL.WEIGHTS` (but will not load other states) and start
418
- from iteration 0.
419
-
420
- Args:
421
- resume (bool): whether to do resume or not
422
- """
423
- self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
424
- if resume and self.checkpointer.has_checkpoint():
425
- # The checkpoint stores the training iteration that just finished, thus we start
426
- # at the next iteration
427
- self.start_iter = self.iter + 1
428
-
429
- def build_hooks(self):
430
- """
431
- Build a list of default hooks, including timing, evaluation,
432
- checkpointing, lr scheduling, precise BN, writing events.
433
-
434
- Returns:
435
- list[HookBase]:
436
- """
437
- cfg = self.cfg.clone()
438
- cfg.defrost()
439
- cfg.DATALOADER.NUM_WORKERS = 0 # save some memory and time for PreciseBN
440
-
441
- ret = [
442
- hooks.IterationTimer(),
443
- hooks.LRScheduler(),
444
- hooks.PreciseBN(
445
- # Run at the same freq as (but before) evaluation.
446
- cfg.TEST.EVAL_PERIOD,
447
- self.model,
448
- # Build a new data loader to not affect training
449
- self.build_train_loader(cfg),
450
- cfg.TEST.PRECISE_BN.NUM_ITER,
451
- )
452
- if cfg.TEST.PRECISE_BN.ENABLED and get_bn_modules(self.model)
453
- else None,
454
- ]
455
-
456
- # Do PreciseBN before checkpointer, because it updates the model and need to
457
- # be saved by checkpointer.
458
- # This is not always the best: if checkpointing has a different frequency,
459
- # some checkpoints may have more precise statistics than others.
460
- if comm.is_main_process():
461
- ret.append(hooks.PeriodicCheckpointer(self.checkpointer, cfg.SOLVER.CHECKPOINT_PERIOD))
462
-
463
- def test_and_save_results():
464
- self._last_eval_results = self.test(self.cfg, self.model)
465
- return self._last_eval_results
466
-
467
- # Do evaluation after checkpointer, because then if it fails,
468
- # we can use the saved checkpoint to debug.
469
- ret.append(hooks.EvalHook(cfg.TEST.EVAL_PERIOD, test_and_save_results))
470
-
471
- if comm.is_main_process():
472
- # Here the default print/log frequency of each writer is used.
473
- # run writers in the end, so that evaluation metrics are written
474
- ret.append(hooks.PeriodicWriter(self.build_writers(), period=20))
475
- return ret
476
-
477
- def build_writers(self):
478
- """
479
- Build a list of writers to be used using :func:`default_writers()`.
480
- If you'd like a different list of writers, you can overwrite it in
481
- your trainer.
482
-
483
- Returns:
484
- list[EventWriter]: a list of :class:`EventWriter` objects.
485
- """
486
- return default_writers(self.cfg.OUTPUT_DIR, self.max_iter)
487
-
488
- def train(self):
489
- """
490
- Run training.
491
-
492
- Returns:
493
- OrderedDict of results, if evaluation is enabled. Otherwise None.
494
- """
495
- super().train(self.start_iter, self.max_iter)
496
- if len(self.cfg.TEST.EXPECTED_RESULTS) and comm.is_main_process():
497
- assert hasattr(
498
- self, "_last_eval_results"
499
- ), "No evaluation results obtained during training!"
500
- verify_results(self.cfg, self._last_eval_results)
501
- return self._last_eval_results
502
-
503
- def run_step(self):
504
- self._trainer.iter = self.iter
505
- self._trainer.run_step()
506
-
507
- def state_dict(self):
508
- ret = super().state_dict()
509
- ret["_trainer"] = self._trainer.state_dict()
510
- return ret
511
-
512
- def load_state_dict(self, state_dict):
513
- super().load_state_dict(state_dict)
514
- self._trainer.load_state_dict(state_dict["_trainer"])
515
-
516
- @classmethod
517
- def build_model(cls, cfg):
518
- """
519
- Returns:
520
- torch.nn.Module:
521
-
522
- It now calls :func:`detectron2.modeling.build_model`.
523
- Overwrite it if you'd like a different model.
524
- """
525
- model = build_model(cfg)
526
- logger = logging.getLogger(__name__)
527
- logger.info("Model:\n{}".format(model))
528
- return model
529
-
530
- @classmethod
531
- def build_optimizer(cls, cfg, model):
532
- """
533
- Returns:
534
- torch.optim.Optimizer:
535
-
536
- It now calls :func:`detectron2.solver.build_optimizer`.
537
- Overwrite it if you'd like a different optimizer.
538
- """
539
- return build_optimizer(cfg, model)
540
-
541
- @classmethod
542
- def build_lr_scheduler(cls, cfg, optimizer):
543
- """
544
- It now calls :func:`detectron2.solver.build_lr_scheduler`.
545
- Overwrite it if you'd like a different scheduler.
546
- """
547
- return build_lr_scheduler(cfg, optimizer)
548
-
549
- @classmethod
550
- def build_train_loader(cls, cfg):
551
- """
552
- Returns:
553
- iterable
554
-
555
- It now calls :func:`detectron2.data.build_detection_train_loader`.
556
- Overwrite it if you'd like a different data loader.
557
- """
558
- return build_detection_train_loader(cfg)
559
-
560
- @classmethod
561
- def build_test_loader(cls, cfg, dataset_name):
562
- """
563
- Returns:
564
- iterable
565
-
566
- It now calls :func:`detectron2.data.build_detection_test_loader`.
567
- Overwrite it if you'd like a different data loader.
568
- """
569
- return build_detection_test_loader(cfg, dataset_name)
570
-
571
- @classmethod
572
- def build_evaluator(cls, cfg, dataset_name):
573
- """
574
- Returns:
575
- DatasetEvaluator or None
576
-
577
- It is not implemented by default.
578
- """
579
- raise NotImplementedError(
580
- """
581
- If you want DefaultTrainer to automatically run evaluation,
582
- please implement `build_evaluator()` in subclasses (see train_net.py for example).
583
- Alternatively, you can call evaluation functions yourself (see Colab balloon tutorial for example).
584
- """
585
- )
586
-
587
- @classmethod
588
- def test(cls, cfg, model, evaluators=None):
589
- """
590
- Evaluate the given model. The given model is expected to already contain
591
- weights to evaluate.
592
-
593
- Args:
594
- cfg (CfgNode):
595
- model (nn.Module):
596
- evaluators (list[DatasetEvaluator] or None): if None, will call
597
- :meth:`build_evaluator`. Otherwise, must have the same length as
598
- ``cfg.DATASETS.TEST``.
599
-
600
- Returns:
601
- dict: a dict of result metrics
602
- """
603
- logger = logging.getLogger(__name__)
604
- if isinstance(evaluators, DatasetEvaluator):
605
- evaluators = [evaluators]
606
- if evaluators is not None:
607
- assert len(cfg.DATASETS.TEST) == len(evaluators), "{} != {}".format(
608
- len(cfg.DATASETS.TEST), len(evaluators)
609
- )
610
-
611
- results = OrderedDict()
612
- for idx, dataset_name in enumerate(cfg.DATASETS.TEST):
613
- data_loader = cls.build_test_loader(cfg, dataset_name)
614
- # When evaluators are passed in as arguments,
615
- # implicitly assume that evaluators can be created before data_loader.
616
- if evaluators is not None:
617
- evaluator = evaluators[idx]
618
- else:
619
- try:
620
- evaluator = cls.build_evaluator(cfg, dataset_name)
621
- except NotImplementedError:
622
- logger.warning(
623
- "No evaluator found. Use `DefaultTrainer.test(evaluators=)`, "
624
- "or implement its `build_evaluator` method."
625
- )
626
- results[dataset_name] = {}
627
- continue
628
- results_i = inference_on_dataset(model, data_loader, evaluator)
629
- results[dataset_name] = results_i
630
- if comm.is_main_process():
631
- assert isinstance(
632
- results_i, dict
633
- ), "Evaluator must return a dict on the main process. Got {} instead.".format(
634
- results_i
635
- )
636
- logger.info("Evaluation results for {} in csv format:".format(dataset_name))
637
- print_csv_format(results_i)
638
-
639
- if len(results) == 1:
640
- results = list(results.values())[0]
641
- return results
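The return shape of `test` can be illustrated without detectron2. This is a minimal sketch, not the library's code: `collect_results` and `fake_eval` are hypothetical names, and `fake_eval` stands in for the whole `build_test_loader` / `build_evaluator` / `inference_on_dataset` chain.

```python
from collections import OrderedDict


def collect_results(dataset_names, evaluate_one):
    """Mimic the return shape of `test`: one entry per dataset, unwrapped if single.

    `evaluate_one` is a hypothetical callable that returns a metrics dict
    for one dataset name.
    """
    results = OrderedDict()
    for name in dataset_names:
        results[name] = evaluate_one(name)
    if len(results) == 1:
        # With a single test dataset, the metrics dict is returned directly.
        return list(results.values())[0]
    return results


def fake_eval(name):
    # Placeholder metrics; a real evaluator would compute these from predictions.
    return {"bbox": {"AP": float(len(name))}}


single = collect_results(["val_a"], fake_eval)
multi = collect_results(["val_a", "val_bb"], fake_eval)
```

Callers that may run with either one or several entries in `cfg.DATASETS.TEST` therefore need to handle both a flat metrics dict and a dict keyed by dataset name.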
642
-
643
- @staticmethod
644
- def auto_scale_workers(cfg, num_workers: int):
645
- """
646
- When the config is defined for a certain number of workers (according to
647
- ``cfg.SOLVER.REFERENCE_WORLD_SIZE``) that's different from the number of
648
- workers currently in use, returns a new cfg where the total batch size
649
- is scaled so that the per-GPU batch size stays the same as the
650
- original ``IMS_PER_BATCH // REFERENCE_WORLD_SIZE``.
651
-
652
- Other config options are also scaled accordingly:
653
- * training steps and warmup steps are scaled in inverse proportion.
654
- * learning rate is scaled proportionally, following :paper:`ImageNet in 1h`.
655
-
656
- For example, with the original config like the following:
657
-
658
- .. code-block:: yaml
659
-
660
- IMS_PER_BATCH: 16
661
- BASE_LR: 0.1
662
- REFERENCE_WORLD_SIZE: 8
663
- MAX_ITER: 5000
664
- STEPS: (4000,)
665
- CHECKPOINT_PERIOD: 1000
666
-
667
- When this config is used on 16 GPUs instead of the reference number 8,
668
- calling this method will return a new config with:
669
-
670
- .. code-block:: yaml
671
-
672
- IMS_PER_BATCH: 32
673
- BASE_LR: 0.2
674
- REFERENCE_WORLD_SIZE: 16
675
- MAX_ITER: 2500
676
- STEPS: (2000,)
677
- CHECKPOINT_PERIOD: 500
678
-
679
- Note that both the original config and this new config can be trained on 16 GPUs.
680
- It's up to the user whether to enable this feature (by setting ``REFERENCE_WORLD_SIZE``).
681
-
682
- Returns:
683
- CfgNode: a new config. Same as original if ``cfg.SOLVER.REFERENCE_WORLD_SIZE==0``.
684
- """
685
- old_world_size = cfg.SOLVER.REFERENCE_WORLD_SIZE
686
- if old_world_size == 0 or old_world_size == num_workers:
687
- return cfg
688
- cfg = cfg.clone()
689
- frozen = cfg.is_frozen()
690
- cfg.defrost()
691
-
692
- assert (
693
- cfg.SOLVER.IMS_PER_BATCH % old_world_size == 0
694
- ), "Invalid REFERENCE_WORLD_SIZE in config!"
695
- scale = num_workers / old_world_size
696
- bs = cfg.SOLVER.IMS_PER_BATCH = int(round(cfg.SOLVER.IMS_PER_BATCH * scale))
697
- lr = cfg.SOLVER.BASE_LR = cfg.SOLVER.BASE_LR * scale
698
- max_iter = cfg.SOLVER.MAX_ITER = int(round(cfg.SOLVER.MAX_ITER / scale))
699
- warmup_iter = cfg.SOLVER.WARMUP_ITERS = int(round(cfg.SOLVER.WARMUP_ITERS / scale))
700
- cfg.SOLVER.STEPS = tuple(int(round(s / scale)) for s in cfg.SOLVER.STEPS)
701
- cfg.TEST.EVAL_PERIOD = int(round(cfg.TEST.EVAL_PERIOD / scale))
702
- cfg.SOLVER.CHECKPOINT_PERIOD = int(round(cfg.SOLVER.CHECKPOINT_PERIOD / scale))
703
- cfg.SOLVER.REFERENCE_WORLD_SIZE = num_workers # maintain invariant
704
- logger = logging.getLogger(__name__)
705
- logger.info(
706
- f"Auto-scaling the config to batch_size={bs}, learning_rate={lr}, "
707
- f"max_iter={max_iter}, warmup={warmup_iter}."
708
- )
709
-
710
- if frozen:
711
- cfg.freeze()
712
- return cfg
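The scaling arithmetic in `auto_scale_workers` can be sketched without detectron2. The following is a minimal illustration under stated assumptions: a plain dict stands in for `cfg.SOLVER`, the name `scale_solver` is hypothetical, and the `WARMUP_ITERS` value is made up for the example. The other numbers come from the docstring above.

```python
def scale_solver(solver: dict, num_workers: int) -> dict:
    """Return a scaled copy of a SOLVER-like dict (hypothetical stand-in for CfgNode)."""
    old_world_size = solver["REFERENCE_WORLD_SIZE"]
    # A reference size of 0 disables scaling; an unchanged size needs none.
    if old_world_size == 0 or old_world_size == num_workers:
        return solver
    if solver["IMS_PER_BATCH"] % old_world_size != 0:
        raise ValueError("IMS_PER_BATCH must be divisible by REFERENCE_WORLD_SIZE")

    scale = num_workers / old_world_size
    scaled = dict(solver)
    # Batch size and learning rate grow with the number of workers ...
    scaled["IMS_PER_BATCH"] = int(round(solver["IMS_PER_BATCH"] * scale))
    scaled["BASE_LR"] = solver["BASE_LR"] * scale
    # ... while every iteration-based setting shrinks by the same factor.
    for key in ("MAX_ITER", "WARMUP_ITERS", "CHECKPOINT_PERIOD"):
        scaled[key] = int(round(solver[key] / scale))
    scaled["STEPS"] = tuple(int(round(s / scale)) for s in solver["STEPS"])
    scaled["REFERENCE_WORLD_SIZE"] = num_workers
    return scaled


reference = {
    "IMS_PER_BATCH": 16,
    "BASE_LR": 0.1,
    "REFERENCE_WORLD_SIZE": 8,
    "MAX_ITER": 5000,
    "WARMUP_ITERS": 1000,  # assumed value, not taken from the docstring
    "STEPS": (4000,),
    "CHECKPOINT_PERIOD": 1000,
}
scaled = scale_solver(reference, num_workers=16)
```

The per-GPU batch size stays at 2 images in both configurations, which is the invariant the method is designed to keep.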
713
-
714
-
715
- # Access basic attributes from the underlying trainer
716
- for _attr in ["model", "data_loader", "optimizer"]:
717
- setattr(
718
- DefaultTrainer,
719
- _attr,
720
- property(
721
- # getter
722
- lambda self, x=_attr: getattr(self._trainer, x),
723
- # setter
724
- lambda self, value, x=_attr: setattr(self._trainer, x, value),
725
- ),
726
- )
cutler/engine/train_loop.py DELETED
@@ -1,360 +0,0 @@
1
- # -*- coding: utf-8 -*-
2
- # Copyright (c) Meta Platforms, Inc. and affiliates.
3
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/engine/train_loop.py and https://github.com/NVlabs/FreeSOLO/tree/main/freesolo/engine/trainer.py
4
-
5
- import copy
- import random
- import time
-
- import numpy as np
- import torch
- import torch.nn.functional as F
- from torch.nn.parallel import DataParallel, DistributedDataParallel
-
- from detectron2.engine import SimpleTrainer
- from detectron2.structures import BitMasks
- from detectron2.structures.instances import Instances
19
-
20
- __all__ = ["CustomSimpleTrainer", "CustomAMPTrainer"]
21
-
22
- class CustomSimpleTrainer(SimpleTrainer):
23
- """
24
- A simple trainer for the most common type of task:
25
- single-cost single-optimizer single-data-source iterative optimization,
26
- optionally using data-parallelism.
27
- It assumes that every step, you:
28
-
29
- 1. Compute the loss with data from the data_loader.
30
- 2. Compute the gradients with the above loss.
31
- 3. Update the model with the optimizer.
32
-
33
- All other tasks during training (checkpointing, logging, evaluation, LR schedule)
34
- are maintained by hooks, which can be registered by :meth:`TrainerBase.register_hooks`.
35
-
36
- If you want to do anything fancier than this,
37
- either subclass TrainerBase and implement your own `run_step`,
38
- or write your own training loop.
39
- """
40
-
41
- def __init__(self, model, data_loader, optimizer, cfg=None, use_copy_paste=False,
42
- copy_paste_rate=-1, copy_paste_random_num=None, copy_paste_min_ratio=-1,
43
- copy_paste_max_ratio=-1, visualize_copy_paste=False):
44
- """
45
- Args:
46
- model: a torch Module. Takes data from data_loader and returns a
47
- dict of losses.
48
- data_loader: an iterable. Contains data to be used to call model.
49
- optimizer: a torch optimizer.
50
- """
51
- super().__init__(model, data_loader, optimizer)
52
-
53
- """
54
- We set the model to training mode in the trainer.
55
- However it's valid to train a model that's in eval mode.
56
- If you want your model (or a submodule of it) to behave
57
- like evaluation during training, you can overwrite its train() method.
58
- """
59
- self.cfg = cfg
60
- # model.train()
61
-
62
- # self.model = model
63
- # self.data_loader = data_loader
64
- # to access the data loader iterator, call `self._data_loader_iter`
65
- # self._data_loader_iter_obj = None
66
- # self.optimizer = optimizer
67
-
68
- self.use_copy_paste = use_copy_paste if self.cfg is None else self.cfg.DATALOADER.COPY_PASTE
69
- self.cfg_COPY_PASTE_RATE = copy_paste_rate if self.cfg is None else self.cfg.DATALOADER.COPY_PASTE_RATE
70
- self.cfg_COPY_PASTE_RANDOM_NUM = copy_paste_random_num if self.cfg is None else self.cfg.DATALOADER.COPY_PASTE_RANDOM_NUM
71
- self.cfg_COPY_PASTE_MIN_RATIO = copy_paste_min_ratio if self.cfg is None else self.cfg.DATALOADER.COPY_PASTE_MIN_RATIO
72
- self.cfg_COPY_PASTE_MAX_RATIO = copy_paste_max_ratio if self.cfg is None else self.cfg.DATALOADER.COPY_PASTE_MAX_RATIO
73
- self.cfg_VISUALIZE_COPY_PASTE = visualize_copy_paste if self.cfg is None else self.cfg.DATALOADER.VISUALIZE_COPY_PASTE
74
-
75
- def IoU(self, mask1, mask2): # only works when the batch size is 1
76
- mask1, mask2 = (mask1>0.5).to(torch.bool), (mask2>0.5).to(torch.bool)
77
- intersection = torch.sum(mask1 * (mask1 == mask2), dim=[-1, -2]).squeeze()
78
- union = torch.sum(mask1 + mask2, dim=[-1, -2]).squeeze()
79
- return (intersection.to(torch.float) / union).mean().view(1, -1)
80
-
81
- def IoY(self, mask1, mask2): # only works when the batch size is 1
82
- # print(mask1.size(), mask2.size())
83
- mask1, mask2 = mask1.squeeze(), mask2.squeeze()
84
- mask1, mask2 = (mask1>0.5).to(torch.bool), (mask2>0.5).to(torch.bool)
85
- intersection = torch.sum(mask1 * (mask1 == mask2), dim=[-1, -2]).squeeze()
86
- union = torch.sum(mask2, dim=[-1, -2]).squeeze()
87
- return (intersection.to(torch.float) / union).mean().view(1, -1)
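The difference between the two overlap measures above is only the denominator: `IoU` divides by the union, while `IoY` divides by the area of the second mask. A minimal framework-free sketch, assuming binary masks given as nested lists (the function name `mask_overlap` and the zero-denominator guard are additions for illustration, not part of the original code):

```python
def mask_overlap(mask1, mask2, mode="iou"):
    """Overlap of two binary masks given as equally sized 2D lists of 0/1.

    mode="iou": intersection / union.
    mode="ioy": intersection / area of mask2.
    """
    pairs = [
        (bool(a), bool(b))
        for row1, row2 in zip(mask1, mask2)
        for a, b in zip(row1, row2)
    ]
    intersection = sum(1 for a, b in pairs if a and b)
    if mode == "ioy":
        denominator = sum(1 for _, b in pairs if b)
    else:
        denominator = sum(1 for a, b in pairs if a or b)
    # Guard against an empty denominator instead of dividing by zero.
    return intersection / denominator if denominator else 0.0


big = [[1, 1, 1, 1],
       [1, 1, 1, 1]]
small = [[1, 1, 0, 0],
         [0, 0, 0, 0]]

iou = mask_overlap(big, small, mode="iou")
ioy_small = mask_overlap(big, small, mode="ioy")
ioy_big = mask_overlap(small, big, mode="ioy")
```

Because IoY is asymmetric, a small mask fully covered by a large one scores 1.0 when it is the second argument, even though the IoU of the pair is low.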
88
-
89
- def copy_and_paste(self, labeled_data, unlabeled_data):
90
- new_unlabeled_data = []
91
- def mask_iou_matrix(x, y, mode='iou'):
92
- x = x.reshape(x.shape[0], -1).float()
93
- y = y.reshape(y.shape[0], -1).float()
94
- inter_matrix = x @ y.transpose(1, 0) # n1xn2
95
- sum_x = x.sum(1)[:, None].expand(x.shape[0], y.shape[0])
96
- sum_y = y.sum(1)[None, :].expand(x.shape[0], y.shape[0])
97
- if mode == 'ioy':
98
- iou_matrix = inter_matrix / (sum_y) # [1, 1]
99
- else:
100
- iou_matrix = inter_matrix / (sum_x + sum_y - inter_matrix) # [1, 1]
101
- return iou_matrix
102
-
103
- def visualize_data(data, save_path = './sample.jpg'):
104
- from data import detection_utils as utils
105
- from detectron2.data import DatasetCatalog, MetadataCatalog
106
- from detectron2.utils.visualizer import Visualizer
107
- data["instances"] = data["instances"].to(device='cpu')
108
- img = data["image"].permute(1, 2, 0).cpu().detach().numpy()
109
- img = utils.convert_image_to_rgb(img, 'RGB')
110
- metadata = MetadataCatalog.get('imagenet_train_tau0.15')
111
- visualizer = Visualizer(img, metadata=metadata, scale=1.0)
112
- target_fields = data["instances"].get_fields()
113
- labels = [metadata.thing_classes[i] for i in target_fields["gt_classes"]]
114
- vis = visualizer.overlay_instances(
115
- labels=labels,
116
- boxes=target_fields.get("gt_boxes"), # ("gt_boxes", None),
117
- masks=target_fields.get("gt_masks"), # ("gt_masks", None),
118
- keypoints=target_fields.get("gt_keypoints", None),
119
- )
120
- print("Saving to {} ...".format(save_path))
121
- vis.save(save_path)
122
-
123
- for cur_labeled_data, cur_unlabeled_data in zip(labeled_data, unlabeled_data):
124
- cur_labeled_instances = cur_labeled_data["instances"]
125
- cur_labeled_image = cur_labeled_data["image"]
126
- cur_unlabeled_instances = cur_unlabeled_data["instances"]
127
- cur_unlabeled_image = cur_unlabeled_data["image"]
128
-
129
- num_labeled_instances = len(cur_labeled_instances)
130
- copy_paste_rate = random.random()
131
-
132
- if self.cfg_COPY_PASTE_RATE >= copy_paste_rate and num_labeled_instances > 0:
133
- if self.cfg_COPY_PASTE_RANDOM_NUM:
134
- num_copy = 1 if num_labeled_instances == 1 else np.random.randint(1, max(1, num_labeled_instances))
135
- else:
136
- num_copy = num_labeled_instances
137
- else:
138
- num_copy = 0
139
- if num_labeled_instances == 0 or num_copy == 0:
140
- new_unlabeled_data.append(cur_unlabeled_data)
141
- else:
142
- # print("num_labeled_instances, num_copy: ", num_labeled_instances, num_copy)
143
- choice = np.random.choice(num_labeled_instances, num_copy, replace=False)
144
- copied_instances = cur_labeled_instances[choice].to(device=cur_unlabeled_instances.gt_boxes.device)
145
- copied_masks = copied_instances.gt_masks
146
- copied_boxes = copied_instances.gt_boxes
147
- _, labeled_h, labeled_w = cur_labeled_image.shape
148
- _, unlabeled_h, unlabeled_w = cur_unlabeled_image.shape
149
-
150
- # rescale the labeled image to align with unlabeled one.
151
- if isinstance(copied_masks, torch.Tensor):
152
- masks_new = copied_masks[None, ...].float()
153
- else:
154
- masks_new = copied_masks.tensor[None, ...].float()
155
- # resize with a random ratio drawn from [COPY_PASTE_MIN_RATIO, COPY_PASTE_MAX_RATIO]
156
- resize_ratio = random.uniform(self.cfg_COPY_PASTE_MIN_RATIO, self.cfg_COPY_PASTE_MAX_RATIO)
157
- w_new = int(resize_ratio * unlabeled_w)
158
- h_new = int(resize_ratio * unlabeled_h)
159
-
160
- w_shift = random.randint(0, unlabeled_w - w_new)
161
- h_shift = random.randint(0, unlabeled_h - h_new)
162
-
163
- cur_labeled_image_new = F.interpolate(cur_labeled_image[None, ...].float(), size=(h_new, w_new), mode="bilinear", align_corners=False).byte().squeeze(0)
164
- if isinstance(copied_masks, torch.Tensor):
165
- masks_new = F.interpolate(copied_masks[None, ...].float(), size=(h_new, w_new), mode="bilinear", align_corners=False).bool().squeeze(0)
166
- else:
167
- masks_new = F.interpolate(copied_masks.tensor[None, ...].float(), size=(h_new, w_new), mode="bilinear", align_corners=False).bool().squeeze(0)
168
- copied_boxes.scale(1. * unlabeled_w / labeled_w * resize_ratio, 1. * unlabeled_h / labeled_h * resize_ratio)
169
-
170
- if isinstance(cur_unlabeled_instances.gt_masks, torch.Tensor):
-     _, mask_h, mask_w = cur_unlabeled_instances.gt_masks.size()  # (N, H, W)
- else:
-     _, mask_h, mask_w = cur_unlabeled_instances.gt_masks.tensor.size()  # (N, H, W)
- masks_new_all = torch.zeros(num_copy, mask_h, mask_w)
175
- image_new_all = torch.zeros_like(cur_unlabeled_image)
176
-
177
- image_new_all[:, h_shift:h_shift+h_new, w_shift:w_shift+w_new] += cur_labeled_image_new
178
- masks_new_all[:, h_shift:h_shift+h_new, w_shift:w_shift+w_new] += masks_new
179
-
180
- cur_labeled_image = image_new_all.byte() #.squeeze(0)
181
- if isinstance(copied_masks, torch.Tensor):
182
- copied_masks = masks_new_all.bool() #.squeeze(0)
183
- else:
184
- copied_masks.tensor = masks_new_all.bool() #.squeeze(0)
185
- # Boxes are stored as (x1, y1, x2, y2): x moves by w_shift, y by h_shift.
- copied_boxes.tensor[:, 0] += w_shift
- copied_boxes.tensor[:, 2] += w_shift
- copied_boxes.tensor[:, 1] += h_shift
- copied_boxes.tensor[:, 3] += h_shift
189
-
190
- copied_instances.gt_masks = copied_masks
191
- copied_instances.gt_boxes = copied_boxes
192
- copied_instances._image_size = (unlabeled_h, unlabeled_w)
193
- if len(cur_unlabeled_instances) == 0:
194
- if isinstance(copied_instances.gt_masks, torch.Tensor):
195
- alpha = copied_instances.gt_masks.sum(0) > 0
196
- else:
197
- alpha = copied_instances.gt_masks.tensor.sum(0) > 0
198
- # merge image
199
- alpha = alpha.cpu()
200
- composited_image = (alpha * cur_labeled_image) + (~alpha * cur_unlabeled_image)
201
- cur_unlabeled_data["image"] = composited_image
202
- cur_unlabeled_data["instances"] = copied_instances
203
- else:
204
- # drop a copied object if it covers at least half of any existing mask (IoY >= 0.5)
205
- if isinstance(copied_masks, torch.Tensor):
206
- iou_matrix = mask_iou_matrix(copied_masks, cur_unlabeled_instances.gt_masks, mode='ioy') # nxN
207
- else:
208
- iou_matrix = mask_iou_matrix(copied_masks.tensor, cur_unlabeled_instances.gt_masks.tensor, mode='ioy') # nxN
209
-
210
- keep = iou_matrix.max(1)[0] < 0.5
211
- if keep.sum() == 0:
212
- new_unlabeled_data.append(cur_unlabeled_data)
213
- continue
214
- copied_instances = copied_instances[keep]
215
- # update existing instances in unlabeled image
216
- if isinstance(copied_instances.gt_masks, torch.Tensor):
217
- alpha = copied_instances.gt_masks.sum(0) > 0
218
- cur_unlabeled_instances.gt_masks = ~alpha * cur_unlabeled_instances.gt_masks
219
- areas_unlabeled = cur_unlabeled_instances.gt_masks.sum((1,2))
220
- else:
221
- alpha = copied_instances.gt_masks.tensor.sum(0) > 0
222
- cur_unlabeled_instances.gt_masks.tensor = ~alpha * cur_unlabeled_instances.gt_masks.tensor
223
- areas_unlabeled = cur_unlabeled_instances.gt_masks.tensor.sum((1,2))
224
- # merge image
225
- alpha = alpha.cpu()
226
- composited_image = (alpha * cur_labeled_image) + (~alpha * cur_unlabeled_image)
227
- # merge instances
228
- merged_instances = Instances.cat([cur_unlabeled_instances[areas_unlabeled > 0], copied_instances])
229
- # update boxes
230
- if isinstance(merged_instances.gt_masks, torch.Tensor):
231
- merged_instances.gt_boxes = BitMasks(merged_instances.gt_masks).get_bounding_boxes()
232
- # merged_instances.gt_boxes = merged_instances.gt_masks.get_bounding_boxes()
233
- else:
234
- merged_instances.gt_boxes = merged_instances.gt_masks.get_bounding_boxes()
235
-
236
- cur_unlabeled_data["image"] = composited_image
237
- cur_unlabeled_data["instances"] = merged_instances
238
- if self.cfg_VISUALIZE_COPY_PASTE:
239
- visualize_data(cur_unlabeled_data, save_path = 'sample_{}.jpg'.format(np.random.randint(5)))
240
- new_unlabeled_data.append(cur_unlabeled_data)
241
- return new_unlabeled_data
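The compositing step in `copy_and_paste` amounts to an alpha blend at a random offset. The sketch below illustrates that idea on nested lists rather than tensors. It assumes boxes in (x1, y1, x2, y2) order, and the helper names `paste_patch` and `shift_box_xyxy` are hypothetical.

```python
def paste_patch(dst, src, alpha, h_shift, w_shift):
    """Paste `src` into a copy of `dst` wherever `alpha` is set.

    dst: 2D list (H x W); src and alpha: 2D lists (h x w) with h <= H, w <= W.
    The patch's top-left corner lands at row h_shift, column w_shift.
    """
    out = [row[:] for row in dst]
    for i, (src_row, alpha_row) in enumerate(zip(src, alpha)):
        for j, (value, keep) in enumerate(zip(src_row, alpha_row)):
            if keep:
                out[h_shift + i][w_shift + j] = value
    return out


def shift_box_xyxy(box, h_shift, w_shift):
    """Translate an (x1, y1, x2, y2) box: x moves by w_shift, y by h_shift."""
    x1, y1, x2, y2 = box
    return (x1 + w_shift, y1 + h_shift, x2 + w_shift, y2 + h_shift)


background = [[0] * 4 for _ in range(3)]
patch = [[7, 8]]
alpha = [[True, False]]  # only the first patch pixel belongs to an object

composited = paste_patch(background, patch, alpha, h_shift=1, w_shift=2)
moved = shift_box_xyxy((0, 0, 2, 1), h_shift=1, w_shift=2)
```

Pixels outside the object mask keep the background value, which corresponds to the `(alpha * src) + (~alpha * dst)` expression in the method.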
242
-
243
- def run_step(self):
244
- """
245
- Implement the standard training logic described above.
246
- """
247
- assert self.model.training, "[SimpleTrainer] model was changed to eval mode!"
248
- start = time.perf_counter()
249
- """
250
- If you want to do something with the data, you can wrap the dataloader.
251
- """
252
- data = next(self._data_loader_iter)
253
- # print(data, len(data))
254
- if self.use_copy_paste:
255
- # print('using copy paste')
256
- data = self.copy_and_paste(copy.deepcopy(data[::-1]), data)
257
- data_time = time.perf_counter() - start
258
-
259
- """
260
- If you want to do something with the losses, you can wrap the model.
261
- """
262
- loss_dict = self.model(data)
263
- if isinstance(loss_dict, torch.Tensor):
264
- losses = loss_dict
265
- loss_dict = {"total_loss": loss_dict}
266
- else:
267
- losses = sum(loss_dict.values())
268
-
269
- """
270
- If you need to accumulate gradients or do something similar, you can
271
- wrap the optimizer with your custom `zero_grad()` method.
272
- """
273
- if not torch.isnan(losses):
274
- self.optimizer.zero_grad()
275
- losses.backward()
276
- else:
277
- print('Nan loss. Skipped.')
278
-
279
- self._write_metrics(loss_dict, data_time)
280
-
281
- """
282
- If you need gradient clipping/scaling or other processing, you can
283
- wrap the optimizer with your custom `step()` method. But it is
284
- suboptimal as explained in https://arxiv.org/abs/2006.15704 Sec 3.2.4
285
- """
286
- self.optimizer.step()
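Two small pieces of `run_step` can be shown in isolation: normalising the model output into a scalar loss plus a dict, and the NaN guard before the backward pass. This is a sketch with plain floats standing in for tensors; the helper names `total_loss` and `should_backprop` are hypothetical.

```python
import math


def total_loss(model_output):
    """Normalise a model output to (scalar_loss, loss_dict).

    A dict of named losses is summed; a bare scalar is wrapped under "total_loss".
    """
    if isinstance(model_output, dict):
        return sum(model_output.values()), model_output
    return model_output, {"total_loss": model_output}


def should_backprop(loss):
    # A NaN loss would poison the gradients, so the backward pass is skipped.
    return not math.isnan(loss)


summed, named = total_loss({"loss_cls": 0.5, "loss_box_reg": 0.25})
scalar, wrapped = total_loss(1.5)
```

Note that in the method above `self.optimizer.step()` is called unconditionally, and `zero_grad()` sits inside the non-NaN branch, so after a skipped backward pass the optimizer would act on whatever gradients are still stored.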
287
-
288
-
289
- class CustomAMPTrainer(CustomSimpleTrainer):
290
- """
291
- Like :class:`SimpleTrainer`, but uses PyTorch's native automatic mixed precision
292
- in the training loop.
293
- """
294
-
295
- def __init__(self, model, data_loader, optimizer, cfg=None, grad_scaler=None, use_copy_paste=False,
296
- copy_paste_rate=-1, copy_paste_random_num=None, copy_paste_min_ratio=-1,
297
- copy_paste_max_ratio=-1, visualize_copy_paste=False):
298
- """
299
- Args:
300
- model, data_loader, optimizer: same as in :class:`SimpleTrainer`.
301
- grad_scaler: torch GradScaler to automatically scale gradients.
302
- """
303
- unsupported = "AMPTrainer does not support single-process multi-device training!"
304
- if isinstance(model, DistributedDataParallel):
305
- assert not (model.device_ids and len(model.device_ids) > 1), unsupported
306
- assert not isinstance(model, DataParallel), unsupported
307
-
308
- super().__init__(model, data_loader, optimizer, cfg=cfg, use_copy_paste=use_copy_paste, \
309
- copy_paste_rate=copy_paste_rate, copy_paste_random_num=copy_paste_random_num, \
310
- copy_paste_min_ratio=copy_paste_min_ratio, copy_paste_max_ratio=copy_paste_max_ratio, \
311
- visualize_copy_paste=visualize_copy_paste)
312
-
313
- if grad_scaler is None:
314
- from torch.cuda.amp import GradScaler
315
-
316
- grad_scaler = GradScaler()
317
- self.grad_scaler = grad_scaler
318
-
319
- def run_step(self):
320
- """
321
- Implement the AMP training logic.
322
- """
323
- assert self.model.training, "[AMPTrainer] model was changed to eval mode!"
324
- assert torch.cuda.is_available(), "[AMPTrainer] CUDA is required for AMP training!"
325
- from torch.cuda.amp import autocast
326
-
327
- start = time.perf_counter()
328
- data = next(self._data_loader_iter)
329
- if self.use_copy_paste:
330
- # print('using copy paste')
331
- data = self.copy_and_paste(copy.deepcopy(data[::-1]), data)
332
- data_time = time.perf_counter() - start
333
-
334
- with autocast():
335
- loss_dict = self.model(data)
336
- if isinstance(loss_dict, torch.Tensor):
337
- losses = loss_dict
338
- loss_dict = {"total_loss": loss_dict}
339
- else:
340
- losses = sum(loss_dict.values())
341
-
342
- if not torch.isnan(losses):
343
- self.optimizer.zero_grad()
344
- self.grad_scaler.scale(losses).backward()
345
- else:
346
- print('Nan loss.')
347
-
348
- self._write_metrics(loss_dict, data_time)
349
-
350
- self.grad_scaler.step(self.optimizer)
351
- self.grad_scaler.update()
352
-
353
- def state_dict(self):
354
- ret = super().state_dict()
355
- ret["grad_scaler"] = self.grad_scaler.state_dict()
356
- return ret
357
-
358
- def load_state_dict(self, state_dict):
359
- super().load_state_dict(state_dict)
360
- self.grad_scaler.load_state_dict(state_dict["grad_scaler"])
cutler/evaluation/__init__.py DELETED
@@ -1,3 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- from .coco_evaluation import COCOEvaluator
cutler/evaluation/coco_evaluation.py DELETED
@@ -1,727 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/evaluation/coco_evaluation.py
3
- # supports evaluation of object detection only, although the prediction contains both segmentation and detection results.
4
-
5
- import contextlib
6
- import copy
7
- import io
8
- import itertools
9
- import json
10
- import logging
11
- import numpy as np
12
- import os
13
- import pickle
14
- from collections import OrderedDict
15
- import pycocotools.mask as mask_util
16
- import torch
17
- from pycocotools.coco import COCO
18
- from pycocotools.cocoeval import COCOeval
19
- from tabulate import tabulate
20
-
21
- import detectron2.utils.comm as comm
22
- from detectron2.config import CfgNode
23
- from detectron2.data import MetadataCatalog
24
- from detectron2.data.datasets.coco import convert_to_coco_json
25
- from detectron2.structures import Boxes, BoxMode, pairwise_iou
26
- from detectron2.utils.file_io import PathManager
27
- from detectron2.utils.logger import create_small_table
28
-
29
- from detectron2.evaluation.evaluator import DatasetEvaluator
30
-
31
- try:
32
- from detectron2.evaluation.fast_eval_api import COCOeval_opt
33
- except ImportError:
34
- COCOeval_opt = COCOeval
35
-
36
-
37
- class COCOEvaluator(DatasetEvaluator):
38
- """
39
- Evaluate AR for object proposals, AP for instance detection/segmentation, AP
40
- for keypoint detection outputs using COCO's metrics.
41
- See http://cocodataset.org/#detection-eval and
42
- http://cocodataset.org/#keypoints-eval to understand its metrics.
43
- The metrics range from 0 to 100 (instead of 0 to 1), where a -1 or NaN means
44
- the metric cannot be computed (e.g. due to no predictions made).
45
-
46
- In addition to COCO, this evaluator is able to support any bounding box detection,
47
- instance segmentation, or keypoint detection dataset.
48
- """
49
-
50
- def __init__(
51
- self,
52
- dataset_name,
53
- tasks=None,
54
- distributed=True,
55
- output_dir=None,
56
- *,
57
- max_dets_per_image=None,
58
- use_fast_impl=True,
59
- kpt_oks_sigmas=(),
60
- allow_cached_coco=True,
61
- no_segm=False,
62
- ):
63
- """
64
- Args:
65
- dataset_name (str): name of the dataset to be evaluated.
66
- It must either have the following metadata:
67
-
68
- "json_file": the path to the COCO format annotation
69
-
70
- Or it must be in detectron2's standard dataset format
71
- so it can be converted to COCO format automatically.
72
- tasks (tuple[str]): tasks that can be evaluated under the given
73
- configuration. A task is one of "bbox", "segm", "keypoints".
74
- By default, will infer this automatically from predictions.
75
- distributed (bool): if True, will collect results from all ranks and run evaluation
76
- in the main process.
77
- Otherwise, will only evaluate the results in the current process.
78
- output_dir (str): optional, an output directory to dump all
79
- results predicted on the dataset. The dump contains two files:
80
-
81
- 1. "instances_predictions.pth" a file that can be loaded with `torch.load` and
82
- contains all the results in the format they are produced by the model.
83
- 2. "coco_instances_results.json" a json file in COCO's result format.
84
- max_dets_per_image (int): limit on the maximum number of detections per image.
85
- By default in COCO, this limit is 100, but this can be customized
86
- to be greater, as is needed in evaluation metrics AP fixed and AP pool
87
- (see https://arxiv.org/pdf/2102.01066.pdf)
88
- This doesn't affect keypoint evaluation.
89
- use_fast_impl (bool): use a fast but **unofficial** implementation to compute AP.
90
- Although the results should be very close to the official implementation in COCO
91
- API, it is still recommended to compute results with the official API for use in
92
- papers. The faster implementation also uses more RAM.
93
- kpt_oks_sigmas (list[float]): The sigmas used to calculate keypoint OKS.
94
- See http://cocodataset.org/#keypoints-eval
95
- When empty, it will use the defaults in COCO.
96
- Otherwise it should be the same length as ROI_KEYPOINT_HEAD.NUM_KEYPOINTS.
97
- allow_cached_coco (bool): Whether to use cached coco json from previous validation
98
- runs. You should set this to False if you need to use different validation data.
99
- Defaults to True.
100
- """
101
- self._logger = logging.getLogger(__name__)
102
- self._distributed = distributed
103
- self._output_dir = output_dir
104
- self.no_segm = no_segm
105
-
106
- if use_fast_impl and (COCOeval_opt is COCOeval):
107
- self._logger.info("Fast COCO eval is not built. Falling back to official COCO eval.")
108
- use_fast_impl = False
109
- self._use_fast_impl = use_fast_impl
110
-
111
- # COCOeval requires the limit on the number of detections per image (maxDets) to be a list
112
- # with at least 3 elements. The default maxDets in COCOeval is [1, 10, 100], in which the
113
- # 3rd element (100) is used as the limit on the number of detections per image when
114
- # evaluating AP. COCOEvaluator expects an integer for max_dets_per_image, so for COCOeval,
115
- # we reformat max_dets_per_image into [1, 10, max_dets_per_image], based on the defaults.
116
- if max_dets_per_image is None:
117
- max_dets_per_image = [1, 10, 100]
118
- else:
119
- max_dets_per_image = [1, 10, max_dets_per_image]
120
- self._max_dets_per_image = max_dets_per_image
121
-
122
- if tasks is not None and isinstance(tasks, CfgNode):
123
- kpt_oks_sigmas = (
124
- tasks.TEST.KEYPOINT_OKS_SIGMAS if not kpt_oks_sigmas else kpt_oks_sigmas
125
- )
126
- self._logger.warning(
127
- "COCO Evaluator instantiated using config, this is deprecated behavior."
128
- " Please pass in explicit arguments instead."
129
- )
130
- self._tasks = None # Inferring it from predictions should be better
131
- else:
132
- self._tasks = tasks
133
-
134
- self._cpu_device = torch.device("cpu")
135
-
136
- self._metadata = MetadataCatalog.get(dataset_name)
137
- if not hasattr(self._metadata, "json_file"):
138
- if output_dir is None:
139
- raise ValueError(
140
- "output_dir must be provided to COCOEvaluator "
141
- "for datasets not in COCO format."
142
- )
143
- self._logger.info(f"Trying to convert '{dataset_name}' to COCO format ...")
144
-
145
- cache_path = os.path.join(output_dir, f"{dataset_name}_coco_format.json")
146
- self._metadata.json_file = cache_path
147
- convert_to_coco_json(dataset_name, cache_path, allow_cached=allow_cached_coco)
148
-
149
- json_file = PathManager.get_local_path(self._metadata.json_file)
150
- with contextlib.redirect_stdout(io.StringIO()):
151
- self._coco_api = COCO(json_file)
152
-
153
- # Test set json files do not contain annotations (evaluation must be
154
- # performed using the COCO evaluation server).
155
- self._do_evaluation = "annotations" in self._coco_api.dataset
156
- if self._do_evaluation:
157
- self._kpt_oks_sigmas = kpt_oks_sigmas
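The `max_dets_per_image` handling in the constructor above reduces to a small normalisation step, sketched here in isolation. The function name `to_max_dets_list` is hypothetical; the `[1, 10, 100]` default is the one quoted in the comments of the constructor.

```python
def to_max_dets_list(max_dets_per_image=None):
    """Expand an optional integer limit into the 3-element list COCOeval expects."""
    if max_dets_per_image is None:
        return [1, 10, 100]  # COCOeval's default maxDets
    # Only the third entry is used as the per-image limit when evaluating AP.
    return [1, 10, max_dets_per_image]


default_limits = to_max_dets_list()
custom_limits = to_max_dets_list(300)
```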
158
-
159
- def reset(self):
160
- self._predictions = []
161
-
162
- def process(self, inputs, outputs):
163
- """
164
- Args:
165
- inputs: the inputs to a COCO model (e.g., GeneralizedRCNN).
166
- It is a list of dict. Each dict corresponds to an image and
167
- contains keys like "height", "width", "file_name", "image_id".
168
- outputs: the outputs of a COCO model. It is a list of dicts with key
169
- "instances" that contains :class:`Instances`.
170
- """
171
- for input, output in zip(inputs, outputs):
172
- prediction = {"image_id": input["image_id"]}
173
-
174
- if "instances" in output:
175
- instances = output["instances"].to(self._cpu_device)
176
- prediction["instances"] = instances_to_coco_json(instances, input["image_id"])
177
- if "proposals" in output:
178
- prediction["proposals"] = output["proposals"].to(self._cpu_device)
179
- if len(prediction) > 1:
180
- self._predictions.append(prediction)
181
-
182
- def evaluate(self, img_ids=None):
183
- """
184
- Args:
185
- img_ids: a list of image IDs to evaluate on. Defaults to None for the whole dataset
186
- """
187
- if self._distributed:
188
- comm.synchronize()
189
- predictions = comm.gather(self._predictions, dst=0)
190
- predictions = list(itertools.chain(*predictions))
191
-
192
- if not comm.is_main_process():
193
- return {}
194
- else:
195
- predictions = self._predictions
196
-
197
- if len(predictions) == 0:
198
- self._logger.warning("[COCOEvaluator] Did not receive valid predictions.")
199
- return {}
200
-
201
- if self._output_dir:
202
- PathManager.mkdirs(self._output_dir)
203
- file_path = os.path.join(self._output_dir, "instances_predictions.pth")
204
- with PathManager.open(file_path, "wb") as f:
205
- torch.save(predictions, f)
206
-
207
- self._results = OrderedDict()
208
- if "proposals" in predictions[0]:
209
- self._eval_box_proposals(predictions)
210
- if "instances" in predictions[0]:
211
- self._eval_predictions(predictions, img_ids=img_ids)
212
- # Copy so the caller can do whatever with results
213
- return copy.deepcopy(self._results)
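In the distributed branch of `evaluate`, the gathered value is a list with one prediction list per rank, which is then flattened with `itertools.chain`. A minimal sketch of that flattening step, using made-up prediction dicts and a hypothetical helper name `merge_rank_predictions`:

```python
import itertools


def merge_rank_predictions(per_rank_predictions):
    """Flatten a list of per-rank prediction lists into one list, keeping rank order."""
    return list(itertools.chain(*per_rank_predictions))


# Three ranks; the second one happened to process no images.
gathered = [
    [{"image_id": 1}],
    [],
    [{"image_id": 2}, {"image_id": 3}],
]
merged = merge_rank_predictions(gathered)
```

An empty merged list corresponds to the "Did not receive valid predictions" warning path in the method.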
214
-
-     def _tasks_from_predictions(self, predictions):
-         """
-         Get COCO API "tasks" (i.e. iou_type) from COCO-format predictions.
-         """
-         tasks = {"bbox"}
-         for pred in predictions:
-             if "segmentation" in pred and not self.no_segm:
-                 tasks.add("segm")
-             if "keypoints" in pred:
-                 tasks.add("keypoints")
-         return sorted(tasks)
-
-     def _eval_predictions(self, predictions, img_ids=None):
-         """
-         Evaluate predictions. Fill self._results with the metrics of the tasks.
-         """
-         self._logger.info("Preparing results for COCO format ...")
-         coco_results = list(itertools.chain(*[x["instances"] for x in predictions]))
-         tasks = self._tasks or self._tasks_from_predictions(coco_results)
-
-         # unmap the category ids for COCO
-         if hasattr(self._metadata, "thing_dataset_id_to_contiguous_id"):
-             dataset_id_to_contiguous_id = self._metadata.thing_dataset_id_to_contiguous_id
-             all_contiguous_ids = list(dataset_id_to_contiguous_id.values())
-             num_classes = len(all_contiguous_ids)
-             assert min(all_contiguous_ids) == 0 and max(all_contiguous_ids) == num_classes - 1
-
-             reverse_id_mapping = {v: k for k, v in dataset_id_to_contiguous_id.items()}
-             for result in coco_results:
-                 category_id = result["category_id"]
-                 assert category_id < num_classes, (
-                     f"A prediction has class={category_id}, "
-                     f"but the dataset only has {num_classes} classes and "
-                     f"predicted class id should be in [0, {num_classes - 1}]."
-                 )
-                 result["category_id"] = reverse_id_mapping[category_id]
-
-         if self._output_dir:
-             file_path = os.path.join(self._output_dir, "coco_instances_results.json")
-             self._logger.info("Saving results to {}".format(file_path))
-             with PathManager.open(file_path, "w") as f:
-                 f.write(json.dumps(coco_results))
-                 f.flush()
-
-         if not self._do_evaluation:
-             self._logger.info("Annotations are not available for evaluation.")
-             return
-
-         self._logger.info(
-             "Evaluating predictions with {} COCO API...".format(
-                 "unofficial" if self._use_fast_impl else "official"
-             )
-         )
-         for task in sorted(tasks):
-             assert task in {"bbox", "segm", "keypoints"}, f"Got unknown task: {task}!"
-             coco_eval = (
-                 _evaluate_predictions_on_coco(
-                     self._coco_api,
-                     coco_results,
-                     task,
-                     kpt_oks_sigmas=self._kpt_oks_sigmas,
-                     use_fast_impl=self._use_fast_impl,
-                     img_ids=img_ids,
-                     max_dets_per_image=self._max_dets_per_image,
-                 )
-                 if len(coco_results) > 0
-                 else None  # cocoapi does not handle empty results very well
-             )
-
-             res = self._derive_coco_results(
-                 coco_eval, task, class_names=self._metadata.get("thing_classes")
-             )
-             self._results[task] = res
-
-     def _eval_box_proposals(self, predictions):
-         """
-         Evaluate the box proposals in predictions.
-         Fill self._results with the metrics for "box_proposals" task.
-         """
-         if self._output_dir:
-             # Saving generated box proposals to file.
-             # Predicted box_proposals are in XYXY_ABS mode.
-             bbox_mode = BoxMode.XYXY_ABS.value
-             ids, boxes, objectness_logits = [], [], []
-             for prediction in predictions:
-                 ids.append(prediction["image_id"])
-                 boxes.append(prediction["proposals"].proposal_boxes.tensor.numpy())
-                 objectness_logits.append(prediction["proposals"].objectness_logits.numpy())
-
-             proposal_data = {
-                 "boxes": boxes,
-                 "objectness_logits": objectness_logits,
-                 "ids": ids,
-                 "bbox_mode": bbox_mode,
-             }
-             with PathManager.open(os.path.join(self._output_dir, "box_proposals.pkl"), "wb") as f:
-                 pickle.dump(proposal_data, f)
-
-         if not self._do_evaluation:
-             self._logger.info("Annotations are not available for evaluation.")
-             return
-
-         self._logger.info("Evaluating bbox proposals ...")
-         res = {}
-         areas = {"all": "", "small": "s", "medium": "m", "large": "l"}
-         for limit in [100, 1000]:
-             for area, suffix in areas.items():
-                 stats = _evaluate_box_proposals(predictions, self._coco_api, area=area, limit=limit)
-                 key = "AR{}@{:d}".format(suffix, limit)
-                 res[key] = float(stats["ar"].item() * 100)
-         self._logger.info("Proposal metrics: \n" + create_small_table(res))
-         self._results["box_proposals"] = res
-
-     def _derive_coco_results(self, coco_eval, iou_type, class_names=None):
-         """
-         Derive the desired score numbers from summarized COCOeval.
-
-         Args:
-             coco_eval (None or COCOEval): None represents no predictions from model.
-             iou_type (str):
-             class_names (None or list[str]): if provided, will use it to compute
-                 per-category AP.
-
-         Returns:
-             a dict of {metric name: score}
-         """
-
-         metrics = {
-             "bbox": ["AP", "AP50", "AP75", "APs", "APm", "APl"],
-             "segm": ["AP", "AP50", "AP75", "APs", "APm", "APl"],
-             "keypoints": ["AP", "AP50", "AP75", "APm", "APl"],
-         }[iou_type]
-
-         if coco_eval is None:
-             self._logger.warn("No predictions from the model!")
-             return {metric: float("nan") for metric in metrics}
-
-         # the standard metrics
-         results = {
-             metric: float(coco_eval.stats[idx] * 100 if coco_eval.stats[idx] >= 0 else "nan")
-             for idx, metric in enumerate(metrics)
-         }
-         self._logger.info(
-             "Evaluation results for {}: \n".format(iou_type) + create_small_table(results)
-         )
-         if not np.isfinite(sum(results.values())):
-             self._logger.info("Some metrics cannot be computed and is shown as NaN.")
-
-         if class_names is None or len(class_names) <= 1:
-             return results
-         # Compute per-category AP
-         # from https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L222-L252 # noqa
-         precisions = coco_eval.eval["precision"]
-         # precision has dims (iou, recall, cls, area range, max dets)
-         assert len(class_names) == precisions.shape[2]
-
-         results_per_category = []
-         for idx, name in enumerate(class_names):
-             # area range index 0: all area ranges
-             # max dets index -1: typically 100 per image
-             precision = precisions[:, :, idx, 0, -1]
-             precision = precision[precision > -1]
-             ap = np.mean(precision) if precision.size else float("nan")
-             results_per_category.append(("{}".format(name), float(ap * 100)))
-
-         # tabulate it
-         N_COLS = min(6, len(results_per_category) * 2)
-         results_flatten = list(itertools.chain(*results_per_category))
-         results_2d = itertools.zip_longest(*[results_flatten[i::N_COLS] for i in range(N_COLS)])
-         table = tabulate(
-             results_2d,
-             tablefmt="pipe",
-             floatfmt=".3f",
-             headers=["category", "AP"] * (N_COLS // 2),
-             numalign="left",
-         )
-         self._logger.info("Per-category {} AP: \n".format(iou_type) + table)
-
-         results.update({"AP-" + name: ap for name, ap in results_per_category})
-         return results
-
-
- def instances_to_coco_json(instances, img_id):
-     """
-     Dump an "Instances" object to a COCO-format json that's used for evaluation.
-
-     Args:
-         instances (Instances):
-         img_id (int): the image id
-
-     Returns:
-         list[dict]: list of json annotations in COCO format.
-     """
-     num_instance = len(instances)
-     if num_instance == 0:
-         return []
-
-     boxes = instances.pred_boxes.tensor.numpy()
-     boxes = BoxMode.convert(boxes, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS)
-     boxes = boxes.tolist()
-     scores = instances.scores.tolist()
-     classes = instances.pred_classes.tolist()
-
-     has_mask = instances.has("pred_masks")
-     if has_mask:
-         # use RLE to encode the masks, because raw masks are large and take too much
-         # memory, since this evaluator stores outputs of the entire dataset
-         rles = [
-             mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0]
-             for mask in instances.pred_masks
-         ]
-         for rle in rles:
-             # "counts" is an array encoded by mask_util as a byte-stream. Python3's
-             # json writer which always produces strings cannot serialize a bytestream
-             # unless you decode it. Thankfully, utf-8 works out (which is also what
-             # the pycocotools/_mask.pyx does).
-             rle["counts"] = rle["counts"].decode("utf-8")
-
-     has_keypoints = instances.has("pred_keypoints")
-     if has_keypoints:
-         keypoints = instances.pred_keypoints
-
-     results = []
-     for k in range(num_instance):
-         result = {
-             "image_id": img_id,
-             "category_id": classes[k],
-             "bbox": boxes[k],
-             "score": scores[k],
-         }
-         if has_mask:
-             result["segmentation"] = rles[k]
-         if has_keypoints:
-             # In COCO annotations,
-             # keypoints coordinates are pixel indices.
-             # However our predictions are floating point coordinates.
-             # Therefore we subtract 0.5 to be consistent with the annotation format.
-             # This is the inverse of data loading logic in `datasets/coco.py`.
-             keypoints[k][:, :2] -= 0.5
-             result["keypoints"] = keypoints[k].flatten().tolist()
-         results.append(result)
-     return results
-
-
- # inspired from Detectron:
- # https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L255 # noqa
- def _evaluate_box_proposals(dataset_predictions, coco_api, thresholds=None, area="all", limit=None):
-     """
-     Evaluate detection proposal recall metrics. This function is a much
-     faster alternative to the official COCO API recall evaluation code. However,
-     it produces slightly different results.
-     """
-     # Record max overlap value for each gt box
-     # Return vector of overlap values
-     areas = {
-         "all": 0,
-         "small": 1,
-         "medium": 2,
-         "large": 3,
-         "96-128": 4,
-         "128-256": 5,
-         "256-512": 6,
-         "512-inf": 7,
-     }
-     area_ranges = [
-         [0**2, 1e5**2],  # all
-         [0**2, 32**2],  # small
-         [32**2, 96**2],  # medium
-         [96**2, 1e5**2],  # large
-         [96**2, 128**2],  # 96-128
-         [128**2, 256**2],  # 128-256
-         [256**2, 512**2],  # 256-512
-         [512**2, 1e5**2],
-     ]  # 512-inf
-     assert area in areas, "Unknown area range: {}".format(area)
-     area_range = area_ranges[areas[area]]
-     gt_overlaps = []
-     num_pos = 0
-
-     for prediction_dict in dataset_predictions:
-         predictions = prediction_dict["proposals"]
-
-         # sort predictions in descending order
-         # TODO maybe remove this and make it explicit in the documentation
-         inds = predictions.objectness_logits.sort(descending=True)[1]
-         predictions = predictions[inds]
-
-         ann_ids = coco_api.getAnnIds(imgIds=prediction_dict["image_id"])
-         anno = coco_api.loadAnns(ann_ids)
-         gt_boxes = [
-             BoxMode.convert(obj["bbox"], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS)
-             for obj in anno
-             if obj["iscrowd"] == 0
-         ]
-         gt_boxes = torch.as_tensor(gt_boxes).reshape(-1, 4)  # guard against no boxes
-         gt_boxes = Boxes(gt_boxes)
-         gt_areas = torch.as_tensor([obj["area"] for obj in anno if obj["iscrowd"] == 0])
-
-         if len(gt_boxes) == 0 or len(predictions) == 0:
-             continue
-
-         valid_gt_inds = (gt_areas >= area_range[0]) & (gt_areas <= area_range[1])
-         gt_boxes = gt_boxes[valid_gt_inds]
-
-         num_pos += len(gt_boxes)
-
-         if len(gt_boxes) == 0:
-             continue
-
-         if limit is not None and len(predictions) > limit:
-             predictions = predictions[:limit]
-
-         overlaps = pairwise_iou(predictions.proposal_boxes, gt_boxes)
-
-         _gt_overlaps = torch.zeros(len(gt_boxes))
-         for j in range(min(len(predictions), len(gt_boxes))):
-             # find which proposal box maximally covers each gt box
-             # and get the iou amount of coverage for each gt box
-             max_overlaps, argmax_overlaps = overlaps.max(dim=0)
-
-             # find which gt box is 'best' covered (i.e. 'best' = most iou)
-             gt_ovr, gt_ind = max_overlaps.max(dim=0)
-             assert gt_ovr >= 0
-             # find the proposal box that covers the best covered gt box
-             box_ind = argmax_overlaps[gt_ind]
-             # record the iou coverage of this gt box
-             _gt_overlaps[j] = overlaps[box_ind, gt_ind]
-             assert _gt_overlaps[j] == gt_ovr
-             # mark the proposal box and the gt box as used
-             overlaps[box_ind, :] = -1
-             overlaps[:, gt_ind] = -1
-
-         # append recorded iou coverage level
-         gt_overlaps.append(_gt_overlaps)
-     gt_overlaps = (
-         torch.cat(gt_overlaps, dim=0) if len(gt_overlaps) else torch.zeros(0, dtype=torch.float32)
-     )
-     gt_overlaps, _ = torch.sort(gt_overlaps)
-
-     if thresholds is None:
-         step = 0.05
-         thresholds = torch.arange(0.5, 0.95 + 1e-5, step, dtype=torch.float32)
-     recalls = torch.zeros_like(thresholds)
-     # compute recall for each iou threshold
-     for i, t in enumerate(thresholds):
-         recalls[i] = (gt_overlaps >= t).float().sum() / float(num_pos)
-     # ar = 2 * np.trapz(recalls, thresholds)
-     ar = recalls.mean()
-     return {
-         "ar": ar,
-         "recalls": recalls,
-         "thresholds": thresholds,
-         "gt_overlaps": gt_overlaps,
-         "num_pos": num_pos,
-     }
-
-
- def _evaluate_predictions_on_coco(
-     coco_gt,
-     coco_results,
-     iou_type,
-     kpt_oks_sigmas=None,
-     use_fast_impl=True,
-     img_ids=None,
-     max_dets_per_image=None,
- ):
-     """
-     Evaluate the coco results using COCOEval API.
-     """
-     assert len(coco_results) > 0
-
-     if iou_type == "segm":
-         coco_results = copy.deepcopy(coco_results)
-         # When evaluating mask AP, if the results contain bbox, cocoapi will
-         # use the box area as the area of the instance, instead of the mask area.
-         # This leads to a different definition of small/medium/large.
-         # We remove the bbox field to let mask AP use mask area.
-         for c in coco_results:
-             c.pop("bbox", None)
-
-     coco_dt = coco_gt.loadRes(coco_results)
-     coco_eval = (COCOeval_opt if use_fast_impl else COCOeval)(coco_gt, coco_dt, iou_type)
-     # For COCO, the default max_dets_per_image is [1, 10, 100].
-     if max_dets_per_image is None:
-         max_dets_per_image = [1, 10, 100]  # Default from COCOEval
-     else:
-         assert (
-             len(max_dets_per_image) >= 3
-         ), "COCOeval requires maxDets (and max_dets_per_image) to have length at least 3"
-         # In the case that user supplies a custom input for max_dets_per_image,
-         # apply COCOevalMaxDets to evaluate AP with the custom input.
-         if max_dets_per_image[2] != 100:
-             coco_eval = COCOevalMaxDets(coco_gt, coco_dt, iou_type)
-     if iou_type != "keypoints":
-         coco_eval.params.maxDets = max_dets_per_image
-
-     if img_ids is not None:
-         coco_eval.params.imgIds = img_ids
-
-     if iou_type == "keypoints":
-         # Use the COCO default keypoint OKS sigmas unless overrides are specified
-         if kpt_oks_sigmas:
-             assert hasattr(coco_eval.params, "kpt_oks_sigmas"), "pycocotools is too old!"
-             coco_eval.params.kpt_oks_sigmas = np.array(kpt_oks_sigmas)
-         # COCOAPI requires every detection and every gt to have keypoints, so
-         # we just take the first entry from both
-         num_keypoints_dt = len(coco_results[0]["keypoints"]) // 3
-         num_keypoints_gt = len(next(iter(coco_gt.anns.values()))["keypoints"]) // 3
-         num_keypoints_oks = len(coco_eval.params.kpt_oks_sigmas)
-         assert num_keypoints_oks == num_keypoints_dt == num_keypoints_gt, (
-             f"[COCOEvaluator] Prediction contain {num_keypoints_dt} keypoints. "
-             f"Ground truth contains {num_keypoints_gt} keypoints. "
-             f"The length of cfg.TEST.KEYPOINT_OKS_SIGMAS is {num_keypoints_oks}. "
-             "They have to agree with each other. For meaning of OKS, please refer to "
-             "http://cocodataset.org/#keypoints-eval."
-         )
-
-     coco_eval.evaluate()
-     coco_eval.accumulate()
-     coco_eval.summarize()
-
-     return coco_eval
-
-
- class COCOevalMaxDets(COCOeval):
-     """
-     Modified version of COCOeval for evaluating AP with a custom
-     maxDets (by default for COCO, maxDets is 100)
-     """
-
-     def summarize(self):
-         """
-         Compute and display summary metrics for evaluation results given
-         a custom value for max_dets_per_image
-         """
-
-         def _summarize(ap=1, iouThr=None, areaRng="all", maxDets=100):
-             p = self.params
-             iStr = " {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}"
-             titleStr = "Average Precision" if ap == 1 else "Average Recall"
-             typeStr = "(AP)" if ap == 1 else "(AR)"
-             iouStr = (
-                 "{:0.2f}:{:0.2f}".format(p.iouThrs[0], p.iouThrs[-1])
-                 if iouThr is None
-                 else "{:0.2f}".format(iouThr)
-             )
-
-             aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
-             mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
-             if ap == 1:
-                 # dimension of precision: [TxRxKxAxM]
-                 s = self.eval["precision"]
-                 # IoU
-                 if iouThr is not None:
-                     t = np.where(iouThr == p.iouThrs)[0]
-                     s = s[t]
-                 s = s[:, :, :, aind, mind]
-             else:
-                 # dimension of recall: [TxKxAxM]
-                 s = self.eval["recall"]
-                 if iouThr is not None:
-                     t = np.where(iouThr == p.iouThrs)[0]
-                     s = s[t]
-                 s = s[:, :, aind, mind]
-             if len(s[s > -1]) == 0:
-                 mean_s = -1
-             else:
-                 mean_s = np.mean(s[s > -1])
-             print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
-             return mean_s
-
-         def _summarizeDets():
-             stats = np.zeros((12,))
-             # Evaluate AP using the custom limit on maximum detections per image
-             stats[0] = _summarize(1, maxDets=self.params.maxDets[2])
-             stats[1] = _summarize(1, iouThr=0.5, maxDets=self.params.maxDets[2])
-             stats[2] = _summarize(1, iouThr=0.75, maxDets=self.params.maxDets[2])
-             stats[3] = _summarize(1, areaRng="small", maxDets=self.params.maxDets[2])
-             stats[4] = _summarize(1, areaRng="medium", maxDets=self.params.maxDets[2])
-             stats[5] = _summarize(1, areaRng="large", maxDets=self.params.maxDets[2])
-             stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
-             stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
-             stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
-             stats[9] = _summarize(0, areaRng="small", maxDets=self.params.maxDets[2])
-             stats[10] = _summarize(0, areaRng="medium", maxDets=self.params.maxDets[2])
-             stats[11] = _summarize(0, areaRng="large", maxDets=self.params.maxDets[2])
-             return stats
-
-         def _summarizeKps():
-             stats = np.zeros((10,))
-             stats[0] = _summarize(1, maxDets=20)
-             stats[1] = _summarize(1, maxDets=20, iouThr=0.5)
-             stats[2] = _summarize(1, maxDets=20, iouThr=0.75)
-             stats[3] = _summarize(1, maxDets=20, areaRng="medium")
-             stats[4] = _summarize(1, maxDets=20, areaRng="large")
-             stats[5] = _summarize(0, maxDets=20)
-             stats[6] = _summarize(0, maxDets=20, iouThr=0.5)
-             stats[7] = _summarize(0, maxDets=20, iouThr=0.75)
-             stats[8] = _summarize(0, maxDets=20, areaRng="medium")
-             stats[9] = _summarize(0, maxDets=20, areaRng="large")
-             return stats
-
-         if not self.eval:
-             raise Exception("Please run accumulate() first")
-         iouType = self.params.iouType
-         if iouType == "segm" or iouType == "bbox":
-             summarize = _summarizeDets
-         elif iouType == "keypoints":
-             summarize = _summarizeKps
-         self.stats = summarize()
-
-     def __str__(self):
-         self.summarize()
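
The removed `_evaluate_box_proposals` reduces its per-box IoU values to a single average recall (AR) by averaging recall over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal pure-Python sketch of that last step follows; the function name `average_recall` and the zero-positives guard are assumptions added for illustration and are not part of the removed file, which works on torch tensors and has no such guard.

```python
def average_recall(gt_overlaps, num_pos, thresholds=None):
    """Mean recall over IoU thresholds.

    gt_overlaps: best IoU recorded for each matched ground-truth box.
    num_pos: total number of ground-truth boxes considered.
    """
    if thresholds is None:
        # 0.50, 0.55, ..., 0.95, the default range used in the removed code
        thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    if num_pos == 0:
        # Hypothetical guard: the removed code divides without checking.
        return float("nan")
    # Recall at threshold t = fraction of ground-truth boxes covered with IoU >= t
    recalls = [
        sum(1 for iou in gt_overlaps if iou >= t) / num_pos for t in thresholds
    ]
    return sum(recalls) / len(recalls)


# Three ground-truth boxes: one tightly covered, one loosely, one missed.
ar = average_recall([0.97, 0.72, 0.40], num_pos=3)
```

The evaluator multiplies this value by 100 and reports it under keys such as `AR@100` or `ARs@1000`, depending on the area range and proposal limit.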
 
cutler/model_zoo/configs/Base-RCNN-FPN.yaml DELETED
@@ -1,42 +0,0 @@
- MODEL:
-   META_ARCHITECTURE: "GeneralizedRCNN"
-   BACKBONE:
-     NAME: "build_resnet_fpn_backbone"
-   RESNETS:
-     OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-   FPN:
-     IN_FEATURES: ["res2", "res3", "res4", "res5"]
-   ANCHOR_GENERATOR:
-     SIZES: [[32], [64], [128], [256], [512]]  # One size for each in feature map
-     ASPECT_RATIOS: [[0.5, 1.0, 2.0]]  # Three aspect ratios (same for all in feature maps)
-   RPN:
-     IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]
-     PRE_NMS_TOPK_TRAIN: 2000  # Per FPN level
-     PRE_NMS_TOPK_TEST: 1000  # Per FPN level
-     # Detectron1 uses 2000 proposals per-batch,
-     # (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
-     # which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
-     POST_NMS_TOPK_TRAIN: 1000
-     POST_NMS_TOPK_TEST: 1000
-   ROI_HEADS:
-     NAME: "StandardROIHeads"
-     IN_FEATURES: ["p2", "p3", "p4", "p5"]
-   ROI_BOX_HEAD:
-     NAME: "FastRCNNConvFCHead"
-     NUM_FC: 2
-     POOLER_RESOLUTION: 7
-   ROI_MASK_HEAD:
-     NAME: "MaskRCNNConvUpsampleHead"
-     NUM_CONV: 4
-     POOLER_RESOLUTION: 14
- DATASETS:
-   TRAIN: ("coco_2017_train",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.02
-   STEPS: (60000, 80000)
-   MAX_ITER: 90000
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
- VERSION: 2
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_100perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_2017_train",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.02
-   STEPS: (60000, 80000)
-   MAX_ITER: 90000
-   BASE_LR_MULTIPLIER: 2
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/100perc"
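
This config inherits from Base-RCNN-FPN.yaml through `_BASE_` and overrides individual keys; for example, MODEL.RPN.POST_NMS_TOPK_TRAIN is 1000 in the base file and 2000 here, while POST_NMS_TOPK_TEST is left at the base value. The sketch below illustrates that overlay on plain dictionaries. It is only an approximation of the idea: `merge_config` is a hypothetical helper, not the loader the project actually uses, and it ignores `_BASE_` path resolution and type checking.

```python
def merge_config(base, override):
    """Recursively overlay `override` on `base` and return a new dict."""
    merged = dict(base)  # shallow copy so `base` itself is not modified
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or new section): the child config wins.
            merged[key] = value
    return merged


# Values copied from the two removed YAML files.
base = {
    "MODEL": {"RPN": {"POST_NMS_TOPK_TRAIN": 1000, "POST_NMS_TOPK_TEST": 1000}},
    "SOLVER": {"BASE_LR": 0.02},
}
child = {"MODEL": {"RPN": {"POST_NMS_TOPK_TRAIN": 2000}, "MASK_ON": True}}

merged = merge_config(base, child)
```

After merging, only the keys named in the child change; every other base setting carries through untouched.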
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_10perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_10perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (6000, 8000)
-   MAX_ITER: 9000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/10perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_1perc.yaml DELETED
@@ -1,42 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_1perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (2400, 3200)
-   MAX_ITER: 3600
-   WARMUP_FACTOR: 0.001
-   WARMUP_ITERS: 1000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/1perc"
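
The SOLVER block of this 1% config combines a warmup (WARMUP_FACTOR 0.001 over WARMUP_ITERS 1000) with step decay at iterations 2400 and 3200. The sketch below shows how such a schedule could be evaluated at a given iteration. Two things are assumptions rather than facts from the config: the warmup is taken to be linear, and the decay factor `gamma` is taken to be 0.1, since neither value appears in the file. `lr_at` is a hypothetical helper, not a function from the project.

```python
from bisect import bisect_right


def lr_at(iteration, base_lr=0.04, steps=(2400, 3200),
          warmup_factor=0.001, warmup_iters=1000, gamma=0.1):
    """Learning rate at `iteration` under linear warmup plus step decay.

    `gamma` and the linear warmup shape are assumptions; the other
    defaults are copied from the 1% config above.
    """
    # Number of decay milestones already reached at this iteration
    decay = gamma ** bisect_right(steps, iteration)
    if iteration < warmup_iters:
        # Interpolate from warmup_factor (at iteration 0) up to 1.0
        alpha = iteration / warmup_iters
        warmup = warmup_factor * (1 - alpha) + alpha
    else:
        warmup = 1.0
    return base_lr * warmup * decay


start_lr = lr_at(0)       # 0.04 * 0.001
peak_lr = lr_at(1000)     # warmup finished, no decay yet
late_lr = lr_at(3200)     # both milestones passed
```

Under these assumptions the rate climbs from 4e-5 to 0.04 during the first 1000 iterations, then drops by a factor of ten at each milestone before training stops at MAX_ITER 3600.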
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_20perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_20perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (12000, 16000)
-   MAX_ITER: 18000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/20perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_2perc.yaml DELETED
@@ -1,42 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_2perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (2400, 3200)
-   MAX_ITER: 3600
-   WARMUP_FACTOR: 0.001
-   WARMUP_ITERS: 1000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/2perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_30perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_30perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (18000, 24000)
-   MAX_ITER: 27000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/30perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_40perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_40perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (24000, 32000)
-   MAX_ITER: 36000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/40perc"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_50perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_50perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.02
-   STEPS: (30000, 40000)
-   MAX_ITER: 45000
-   BASE_LR_MULTIPLIER: 2
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/50perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_5perc.yaml DELETED
@@ -1,42 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_5perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.04
-   STEPS: (3000, 4000)
-   MAX_ITER: 4500
-   WARMUP_FACTOR: 0.001
-   WARMUP_ITERS: 1000
-   BASE_LR_MULTIPLIER: 4
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/5perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_60perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_60perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.02
-   STEPS: (36000, 48000)
-   MAX_ITER: 54000
-   BASE_LR_MULTIPLIER: 2
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/60perc"
 
cutler/model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_80perc.yaml DELETED
@@ -1,40 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: "http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth"
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-   RPN:
-     POST_NMS_TOPK_TRAIN: 2000
- DATASETS:
-   TRAIN: ("coco_semi_80perc",)
-   TEST: ("coco_2017_val",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.02
-   STEPS: (48000, 64000)
-   MAX_ITER: 72000
-   BASE_LR_MULTIPLIER: 2
-   BASE_LR_MULTIPLIER_NAMES: ['roi_heads.mask_head.predictor', 'roi_heads.box_predictor.0.cls_score', 'roi_heads.box_predictor.0.bbox_pred', 'roi_heads.box_predictor.1.cls_score', 'roi_heads.box_predictor.1.bbox_pred', 'roi_heads.box_predictor.2.cls_score', 'roi_heads.box_predictor.2.bbox_pred']
- INPUT:
-   MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-   EVAL_PERIOD: 5000
- OUTPUT_DIR: "output/80perc"
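
The semi-supervised configs above (5, 30, 40, 50, 60 and 80 percent) appear to share one schedule shape: `MAX_ITER` is 900 iterations per percent of labeled data, and the two `STEPS` milestones sit at 2/3 and 8/9 of training. The sketch below only reproduces that pattern from the values in this diff. The helper name `semi_schedule` is hypothetical, and the rule is inferred from the numbers rather than taken from the CutLER code.

```python
def semi_schedule(percent):
    """Return (MAX_ITER, STEPS) for a given labeled-data percentage.

    Assumption: 900 iterations per percent, with learning-rate drops at
    2/3 and 8/9 of training. Inferred from the configs in this diff.
    """
    max_iter = 900 * percent
    steps = (max_iter * 2 // 3, max_iter * 8 // 9)
    return max_iter, steps


# (MAX_ITER, STEPS) values copied from the deleted config files.
observed = {
    5: (4500, (3000, 4000)),
    30: (27000, (18000, 24000)),
    40: (36000, (24000, 32000)),
    50: (45000, (30000, 40000)),
    60: (54000, (36000, 48000)),
    80: (72000, (48000, 64000)),
}

for percent, expected in observed.items():
    assert semi_schedule(percent) == expected
```

The learning-rate settings do not follow the same single rule: the configs shown up to 40 percent use `BASE_LR: 0.04` with `BASE_LR_MULTIPLIER: 4`, while those from 50 percent upward use `0.02` with a multiplier of `2`.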
 
cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml DELETED
@@ -1,61 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- DATALOADER:
-   COPY_PASTE: True
-   COPY_PASTE_RATE: 1.0
-   VISUALIZE_COPY_PASTE: False
-   COPY_PASTE_RANDOM_NUM: True
-   COPY_PASTE_MIN_RATIO: 0.3
-   COPY_PASTE_MAX_RATIO: 1.0
-   NUM_WORKERS: 0
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: 'http://dl.fbaipublicfiles.com/cutler/checkpoints/dino_RN50_pretrain_d2_format.pkl'
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-     NUM_CLASSES: 1
-     SCORE_THRESH_TEST: 0.0
-     POSITIVE_FRACTION: 0.25
-     USE_DROPLOSS: True
-     DROPLOSS_IOU_THRESH: 0.01
-   RPN:
-     POST_NMS_TOPK_TRAIN: 4000
-     NMS_THRESH: 0.65
- DATASETS:
-   TRAIN: ("imagenet_train",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.01
-   WEIGHT_DECAY: 0.00005
-   STEPS: (80000,)
-   MAX_ITER: 160000
-   GAMMA: 0.02
-   CLIP_GRADIENTS:
-     CLIP_TYPE: norm
-     CLIP_VALUE: 1.0
-     ENABLED: true
-     NORM_TYPE: 2.0
-   AMP:
-     ENABLED: True
- INPUT:
-   MIN_SIZE_TRAIN: (240, 320, 480, 640, 672, 704, 736, 768, 800, 1024)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-     NUM_ITER: 200
-   DETECTIONS_PER_IMAGE: 100
- OUTPUT_DIR: "output/"
 
cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_demo.yaml DELETED
@@ -1,62 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- DATALOADER:
-   COPY_PASTE: True
-   COPY_PASTE_RATE: 1.0
-   VISUALIZE_COPY_PASTE: False
-   COPY_PASTE_RANDOM_NUM: True
-   COPY_PASTE_MIN_RATIO: 0.3
-   COPY_PASTE_MAX_RATIO: 1.0
-   NUM_WORKERS: 0
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: 'http://dl.fbaipublicfiles.com/cutler/checkpoints/dino_RN50_pretrain_d2_format.pkl'
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-     NUM_CLASSES: 1
-     SCORE_THRESH_TEST: 0.0
-     POSITIVE_FRACTION: 0.25
-     USE_DROPLOSS: True
-     DROPLOSS_IOU_THRESH: 0.01
-   RPN:
-     POST_NMS_TOPK_TRAIN: 4000
-     NMS_THRESH: 0.65
- DATASETS:
-   TRAIN: ("imagenet_train",)
-   TEST: ("imagenet_train",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.01
-   WEIGHT_DECAY: 0.00005
-   STEPS: (80000,)
-   MAX_ITER: 160000
-   GAMMA: 0.02
-   CLIP_GRADIENTS:
-     CLIP_TYPE: norm
-     CLIP_VALUE: 1.0
-     ENABLED: true
-     NORM_TYPE: 2.0
-   AMP:
-     ENABLED: True
- INPUT:
-   MIN_SIZE_TRAIN: (240, 320, 480, 640, 672, 704, 736, 768, 800, 1024)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-     NUM_ITER: 200
-   DETECTIONS_PER_IMAGE: 100
- OUTPUT_DIR: "output/"
 
cutler/model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml DELETED
@@ -1,60 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- DATALOADER:
-   COPY_PASTE: True
-   COPY_PASTE_RATE: 1.0
-   VISUALIZE_COPY_PASTE: False
-   COPY_PASTE_RANDOM_NUM: True
-   COPY_PASTE_MIN_RATIO: 0.5
-   COPY_PASTE_MAX_RATIO: 1.0
-   NUM_WORKERS: 2
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: 'http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_r1.pth' # round 1
-   # WEIGHTS: 'http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_r2.pth' # round 2
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_BOX_HEAD:
-     CLS_AGNOSTIC_BBOX_REG: True
-   ROI_HEADS:
-     NAME: CustomCascadeROIHeads
-     NUM_CLASSES: 1
-     SCORE_THRESH_TEST: 0.0
-     POSITIVE_FRACTION: 0.25
-     USE_DROPLOSS: False
-     DROPLOSS_IOU_THRESH: 0.01
- DATASETS:
-   TRAIN: ("imagenet_train_r1",) # round 1
-   # TRAIN: ("imagenet_train_r2",) # round 2
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.005
-   STEPS: (79999,)
-   MAX_ITER: 80000
-   GAMMA: 1.0
-   CLIP_GRADIENTS:
-     CLIP_TYPE: norm
-     CLIP_VALUE: 1.0
-     ENABLED: true
-     NORM_TYPE: 2.0
-   AMP:
-     ENABLED: True
- INPUT:
-   MIN_SIZE_TRAIN: (240, 320, 480, 640, 672, 704, 736, 768, 800, 1024)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
-     NUM_ITER: 200
-   DETECTIONS_PER_IMAGE: 100
- OUTPUT_DIR: "output/self-train-r1/" # round 1
- # OUTPUT_DIR: "output/self-train-r2/" # round 2
 
cutler/model_zoo/configs/CutLER-ImageNet/mask_rcnn_R_50_FPN.yaml DELETED
@@ -1,52 +0,0 @@
- _BASE_: "../Base-RCNN-FPN.yaml"
- DATALOADER:
-   COPY_PASTE: True
-   COPY_PASTE_RATE: 1.0
-   VISUALIZE_COPY_PASTE: False
-   COPY_PASTE_RANDOM_NUM: True
-   COPY_PASTE_MIN_RATIO: 0.3
-   COPY_PASTE_MAX_RATIO: 1.0
- MODEL:
-   PIXEL_MEAN: [123.675, 116.280, 103.530]
-   PIXEL_STD: [58.395, 57.120, 57.375]
-   WEIGHTS: 'http://dl.fbaipublicfiles.com/cutler/checkpoints/dino_RN50_pretrain_d2_format.pkl'
-   MASK_ON: True
-   BACKBONE:
-     FREEZE_AT: 0
-   RESNETS:
-     DEPTH: 50
-     NORM: "SyncBN"
-     STRIDE_IN_1X1: False
-   FPN:
-     NORM: "SyncBN"
-   ROI_HEADS:
-     NAME: "CustomStandardROIHeads"
-     NUM_CLASSES: 1
-     SCORE_THRESH_TEST: 0.0
-     USE_DROPLOSS: True
-     DROPLOSS_IOU_THRESH: 0.01
-   RPN:
-     POST_NMS_TOPK_TRAIN: 4000
-     NMS_THRESH: 0.65
- DATASETS:
-   TRAIN: ("imagenet_train",)
- SOLVER:
-   IMS_PER_BATCH: 16
-   BASE_LR: 0.01
-   WEIGHT_DECAY: 0.00005
-   STEPS: (80000,)
-   MAX_ITER: 160000
-   CLIP_GRADIENTS:
-     CLIP_TYPE: norm
-     CLIP_VALUE: 1.0
-     ENABLED: true
-     NORM_TYPE: 2.0
- INPUT:
-   MIN_SIZE_TRAIN: (240, 320, 480, 640, 672, 704, 736, 768, 800, 1024)
-   MAX_SIZE_TRAIN: 1333
-   MASK_FORMAT: "bitmask"
-   FORMAT: "RGB"
- TEST:
-   PRECISE_BN:
-     ENABLED: True
- OUTPUT_DIR: "output/"
 
cutler/modeling/__init__.py DELETED
@@ -1,16 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
-
- from .roi_heads import (
-     ROI_HEADS_REGISTRY,
-     ROIHeads,
-     CustomStandardROIHeads,
-     FastRCNNOutputLayers,
-     build_roi_heads,
- )
- from .roi_heads.custom_cascade_rcnn import CustomCascadeROIHeads
- from .roi_heads.fast_rcnn import FastRCNNOutputLayers
- from .meta_arch.rcnn import GeneralizedRCNN, ProposalNetwork
- from .meta_arch.build import build_model
-
- _EXCLUDE = {"ShapeSpec"}
- __all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]
 
cutler/modeling/meta_arch/__init__.py DELETED
@@ -1,7 +0,0 @@
- # -*- coding: utf-8 -*-
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/meta_arch/__init__.py
-
- from .build import META_ARCH_REGISTRY, build_model # isort:skip
-
- __all__ = list(globals().keys())
 
cutler/modeling/meta_arch/build.py DELETED
@@ -1,27 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/meta_arch/build.py
-
- import torch
-
- from detectron2.utils.logger import _log_api_usage
- from detectron2.utils.registry import Registry
-
- META_ARCH_REGISTRY = Registry("META_ARCH") # noqa F401 isort:skip
- META_ARCH_REGISTRY.__doc__ = """
- Registry for meta-architectures, i.e. the whole model.
-
- The registered object will be called with `obj(cfg)`
- and expected to return a `nn.Module` object.
- """
-
-
- def build_model(cfg):
-     """
-     Build the whole model architecture, defined by ``cfg.MODEL.META_ARCHITECTURE``.
-     Note that it does not load any weights from ``cfg``.
-     """
-     meta_arch = cfg.MODEL.META_ARCHITECTURE
-     model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
-     model.to(torch.device(cfg.MODEL.DEVICE))
-     _log_api_usage("modeling.meta_arch." + meta_arch)
-     return model
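
The lookup in `build_model` follows a name-to-class registry pattern: a class registers itself under its own name, and the config string selects which one to instantiate. The sketch below is a minimal pure-Python stand-in for that idea. `ToyRegistry`, `ToyRCNN` and the `SimpleNamespace` config are hypothetical illustrations and are not the detectron2 `Registry` or the real model classes; device placement and usage logging are left out.

```python
from types import SimpleNamespace


class ToyRegistry:
    """Hypothetical stand-in mapping class names to classes."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self):
        # Used as a decorator: @REGISTRY.register()
        def decorator(obj):
            if obj.__name__ in self._obj_map:
                raise KeyError(f"'{obj.__name__}' is already registered in '{self._name}'")
            self._obj_map[obj.__name__] = obj
            return obj

        return decorator

    def get(self, name):
        if name not in self._obj_map:
            raise KeyError(f"No object named '{name}' found in '{self._name}' registry")
        return self._obj_map[name]


TOY_META_ARCH_REGISTRY = ToyRegistry("META_ARCH")


@TOY_META_ARCH_REGISTRY.register()
class ToyRCNN:
    """Placeholder model that only remembers its config."""

    def __init__(self, cfg):
        self.cfg = cfg


def build_toy_model(cfg):
    # Same shape as build_model above: look up by name, then call with cfg.
    meta_arch = cfg.MODEL.META_ARCHITECTURE
    return TOY_META_ARCH_REGISTRY.get(meta_arch)(cfg)


cfg = SimpleNamespace(MODEL=SimpleNamespace(META_ARCHITECTURE="ToyRCNN"))
model = build_toy_model(cfg)
```

The benefit of this design is that a config file can switch architectures by changing one string, without the builder importing every model class explicitly.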
 
cutler/modeling/meta_arch/rcnn.py DELETED
@@ -1,344 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/meta_arch/rcnn.py
-
- import logging
- import numpy as np
- from typing import Dict, List, Optional, Tuple
- import torch
- from torch import nn
-
- from detectron2.config import configurable
- from detectron2.data.detection_utils import convert_image_to_rgb
- from detectron2.layers import move_device_like
- from detectron2.structures import ImageList, Instances
- from detectron2.utils.events import get_event_storage
- from detectron2.utils.logger import log_first_n
-
- from detectron2.modeling.backbone import Backbone, build_backbone
- from detectron2.modeling.postprocessing import detector_postprocess
- from detectron2.modeling.proposal_generator import build_proposal_generator
- from ..roi_heads import build_roi_heads
- from .build import META_ARCH_REGISTRY
-
- __all__ = ["GeneralizedRCNN", "ProposalNetwork"]
-
-
- @META_ARCH_REGISTRY.register()
- class GeneralizedRCNN(nn.Module):
-     """
-     Generalized R-CNN. Any models that contains the following three components:
-     1. Per-image feature extraction (aka backbone)
-     2. Region proposal generation
-     3. Per-region feature extraction and prediction
-     """
-
-     @configurable
-     def __init__(
-         self,
-         *,
-         backbone: Backbone,
-         proposal_generator: nn.Module,
-         roi_heads: nn.Module,
-         pixel_mean: Tuple[float],
-         pixel_std: Tuple[float],
-         input_format: Optional[str] = None,
-         vis_period: int = 0,
-     ):
-         """
-         Args:
-             backbone: a backbone module, must follow detectron2's backbone interface
-             proposal_generator: a module that generates proposals using backbone features
-             roi_heads: a ROI head that performs per-region computation
-             pixel_mean, pixel_std: list or tuple with #channels element, representing
-                 the per-channel mean and std to be used to normalize the input image
-             input_format: describe the meaning of channels of input. Needed by visualization
-             vis_period: the period to run visualization. Set to 0 to disable.
-         """
-         super().__init__()
-         self.backbone = backbone
-         self.proposal_generator = proposal_generator
-         self.roi_heads = roi_heads
-
-         self.input_format = input_format
-         self.vis_period = vis_period
-         if vis_period > 0:
-             assert input_format is not None, "input_format is required for visualization!"
-
-         self.register_buffer("pixel_mean", torch.tensor(pixel_mean).view(-1, 1, 1), False)
-         self.register_buffer("pixel_std", torch.tensor(pixel_std).view(-1, 1, 1), False)
-         assert (
-             self.pixel_mean.shape == self.pixel_std.shape
-         ), f"{self.pixel_mean} and {self.pixel_std} have different shapes!"
-
-     @classmethod
-     def from_config(cls, cfg):
-         backbone = build_backbone(cfg)
-         return {
-             "backbone": backbone,
-             "proposal_generator": build_proposal_generator(cfg, backbone.output_shape()),
-             "roi_heads": build_roi_heads(cfg, backbone.output_shape()),
-             "input_format": cfg.INPUT.FORMAT,
-             "vis_period": cfg.VIS_PERIOD,
-             "pixel_mean": cfg.MODEL.PIXEL_MEAN,
-             "pixel_std": cfg.MODEL.PIXEL_STD,
-         }
-
-     @property
-     def device(self):
-         return self.pixel_mean.device
-
-     def _move_to_current_device(self, x):
-         return move_device_like(x, self.pixel_mean)
-
-     def visualize_training(self, batched_inputs, proposals):
-         """
-         A function used to visualize images and proposals. It shows ground truth
-         bounding boxes on the original image and up to 20 top-scoring predicted
-         object proposals on the original image. Users can implement different
-         visualization functions for different models.
-
-         Args:
-             batched_inputs (list): a list that contains input to the model.
-             proposals (list): a list that contains predicted proposals. Both
-                 batched_inputs and proposals should have the same length.
-         """
-         from detectron2.utils.visualizer import Visualizer
-
-         storage = get_event_storage()
-         max_vis_prop = 20
-
-         for input, prop in zip(batched_inputs, proposals):
-             img = input["image"]
-             img = convert_image_to_rgb(img.permute(1, 2, 0), self.input_format)
-             v_gt = Visualizer(img, None)
-             v_gt = v_gt.overlay_instances(boxes=input["instances"].gt_boxes)
-             anno_img = v_gt.get_image()
-             box_size = min(len(prop.proposal_boxes), max_vis_prop)
-             v_pred = Visualizer(img, None)
-             v_pred = v_pred.overlay_instances(
-                 boxes=prop.proposal_boxes[0:box_size].tensor.cpu().numpy()
-             )
-             prop_img = v_pred.get_image()
-             vis_img = np.concatenate((anno_img, prop_img), axis=1)
-             vis_img = vis_img.transpose(2, 0, 1)
-             vis_name = "Left: GT bounding boxes; Right: Predicted proposals"
-             storage.put_image(vis_name, vis_img)
-             break # only visualize one image in a batch
-
-     def forward(self, batched_inputs: List[Dict[str, torch.Tensor]]):
-         """
-         Args:
-             batched_inputs: a list, batched outputs of :class:`DatasetMapper` .
-                 Each item in the list contains the inputs for one image.
-                 For now, each item in the list is a dict that contains:
-
-                 * image: Tensor, image in (C, H, W) format.
-                 * instances (optional): groundtruth :class:`Instances`
-                 * proposals (optional): :class:`Instances`, precomputed proposals.
-
-                 Other information that's included in the original dicts, such as:
-
-                 * "height", "width" (int): the output resolution of the model, used in inference.
-                   See :meth:`postprocess` for details.
-
-         Returns:
-             list[dict]:
-                 Each dict is the output for one input image.
-                 The dict contains one key "instances" whose value is a :class:`Instances`.
-                 The :class:`Instances` object has the following keys:
-                 "pred_boxes", "pred_classes", "scores", "pred_masks", "pred_keypoints"
-         """
-         if not self.training:
-             return self.inference(batched_inputs)
-
-         images = self.preprocess_image(batched_inputs)
-         if "instances" in batched_inputs[0]:
-             gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
-         else:
-             gt_instances = None
-
-         features = self.backbone(images.tensor)
-
-         if self.proposal_generator is not None:
-             proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
-         else:
-             assert "proposals" in batched_inputs[0]
-             proposals = [x["proposals"].to(self.device) for x in batched_inputs]
-             proposal_losses = {}
-
-         _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
-         if self.vis_period > 0:
-             storage = get_event_storage()
-             if storage.iter % self.vis_period == 0:
-                 self.visualize_training(batched_inputs, proposals)
-
-         losses = {}
-         losses.update(detector_losses)
-         losses.update(proposal_losses)
-         return losses
-
-     def inference(
-         self,
-         batched_inputs: List[Dict[str, torch.Tensor]],
-         detected_instances: Optional[List[Instances]] = None,
-         do_postprocess: bool = True,
-     ):
-         """
-         Run inference on the given inputs.
-
-         Args:
-             batched_inputs (list[dict]): same as in :meth:`forward`
-             detected_instances (None or list[Instances]): if not None, it
-                 contains an `Instances` object per image. The `Instances`
-                 object contains "pred_boxes" and "pred_classes" which are
-                 known boxes in the image.
-                 The inference will then skip the detection of bounding boxes,
-                 and only predict other per-ROI outputs.
-             do_postprocess (bool): whether to apply post-processing on the outputs.
-
-         Returns:
-             When do_postprocess=True, same as in :meth:`forward`.
-             Otherwise, a list[Instances] containing raw network outputs.
-         """
-         assert not self.training
-
-         images = self.preprocess_image(batched_inputs)
-         features = self.backbone(images.tensor)
-
-         if detected_instances is None:
-             if self.proposal_generator is not None:
-                 proposals, _ = self.proposal_generator(images, features, None)
-             else:
-                 assert "proposals" in batched_inputs[0]
-                 proposals = [x["proposals"].to(self.device) for x in batched_inputs]
-
-             results, _ = self.roi_heads(images, features, proposals, None)
-         else:
-             detected_instances = [x.to(self.device) for x in detected_instances]
-             results = self.roi_heads.forward_with_given_boxes(features, detected_instances)
-
-         if do_postprocess:
-             assert not torch.jit.is_scripting(), "Scripting is not supported for postprocess."
-             return GeneralizedRCNN._postprocess(results, batched_inputs, images.image_sizes)
-         else:
-             return results
-
-     def preprocess_image(self, batched_inputs: List[Dict[str, torch.Tensor]]):
-         """
-         Normalize, pad and batch the input images.
-         """
-         images = [self._move_to_current_device(x["image"]) for x in batched_inputs]
-         images = [(x - self.pixel_mean) / self.pixel_std for x in images]
-         images = ImageList.from_tensors(
-             images,
-             self.backbone.size_divisibility,
-             padding_constraints=self.backbone.padding_constraints,
-         )
-         return images
-
-     @staticmethod
-     def _postprocess(instances, batched_inputs: List[Dict[str, torch.Tensor]], image_sizes):
-         """
-         Rescale the output instances to the target size.
-         """
-         # note: private function; subject to changes
-         processed_results = []
-         for results_per_image, input_per_image, image_size in zip(
-             instances, batched_inputs, image_sizes
-         ):
-             height = input_per_image.get("height", image_size[0])
-             width = input_per_image.get("width", image_size[1])
-             r = detector_postprocess(results_per_image, height, width)
-             processed_results.append({"instances": r})
-         return processed_results
-
-
- @META_ARCH_REGISTRY.register()
- class ProposalNetwork(nn.Module):
-     """
-     A meta architecture that only predicts object proposals.
-     """
-
-     @configurable
-     def __init__(
-         self,
-         *,
-         backbone: Backbone,
-         proposal_generator: nn.Module,
-         pixel_mean: Tuple[float],
-         pixel_std: Tuple[float],
-     ):
-         """
-         Args:
-             backbone: a backbone module, must follow detectron2's backbone interface
-             proposal_generator: a module that generates proposals using backbone features
-             pixel_mean, pixel_std: list or tuple with #channels element, representing
-                 the per-channel mean and std to be used to normalize the input image
-         """
-         super().__init__()
-         self.backbone = backbone
-         self.proposal_generator = proposal_generator
-         self.register_buffer("pixel_mean", torch.tensor(pixel_mean).view(-1, 1, 1), False)
-         self.register_buffer("pixel_std", torch.tensor(pixel_std).view(-1, 1, 1), False)
-
-     @classmethod
-     def from_config(cls, cfg):
-         backbone = build_backbone(cfg)
-         return {
-             "backbone": backbone,
-             "proposal_generator": build_proposal_generator(cfg, backbone.output_shape()),
-             "pixel_mean": cfg.MODEL.PIXEL_MEAN,
-             "pixel_std": cfg.MODEL.PIXEL_STD,
-         }
-
-     @property
-     def device(self):
-         return self.pixel_mean.device
-
-     def _move_to_current_device(self, x):
-         return move_device_like(x, self.pixel_mean)
-
-     def forward(self, batched_inputs):
-         """
-         Args:
-             Same as in :class:`GeneralizedRCNN.forward`
-
-         Returns:
-             list[dict]:
-                 Each dict is the output for one input image.
-                 The dict contains one key "proposals" whose value is a
-                 :class:`Instances` with keys "proposal_boxes" and "objectness_logits".
-         """
-         images = [self._move_to_current_device(x["image"]) for x in batched_inputs]
-         images = [(x - self.pixel_mean) / self.pixel_std for x in images]
-         images = ImageList.from_tensors(
-             images,
-             self.backbone.size_divisibility,
-             padding_constraints=self.backbone.padding_constraints,
-         )
-         features = self.backbone(images.tensor)
-
-         if "instances" in batched_inputs[0]:
-             gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
-         elif "targets" in batched_inputs[0]:
-             log_first_n(
-                 logging.WARN, "'targets' in the model inputs is now renamed to 'instances'!", n=10
-             )
-             gt_instances = [x["targets"].to(self.device) for x in batched_inputs]
-         else:
-             gt_instances = None
-         proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
-         # In training, the proposals are not useful at all but we generate them anyway.
-         # This makes RPN-only models about 5% slower.
-         if self.training:
-             return proposal_losses
-
-         processed_results = []
-         for results_per_image, input_per_image, image_size in zip(
-             proposals, batched_inputs, images.image_sizes
-         ):
-             height = input_per_image.get("height", image_size[0])
-             width = input_per_image.get("width", image_size[1])
-             r = detector_postprocess(results_per_image, height, width)
-             processed_results.append({"proposals": r})
-         return processed_results
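
Two small pieces of the file above can be illustrated without PyTorch: the per-channel normalization in `preprocess_image` and the output-size fallback used in the post-processing loops. The sketch below works on a single RGB pixel and plain dicts. The constants are the `PIXEL_MEAN` and `PIXEL_STD` values from the configs in this diff, while `normalize_pixel` and `output_size` are hypothetical helper names, not functions from detectron2 or CutLER.

```python
# Values taken from MODEL.PIXEL_MEAN / MODEL.PIXEL_STD in the configs (RGB order).
PIXEL_MEAN = [123.675, 116.280, 103.530]
PIXEL_STD = [58.395, 57.120, 57.375]


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel, as preprocess_image does per image."""
    return [(v - m) / s for v, m, s in zip(rgb, PIXEL_MEAN, PIXEL_STD)]


def output_size(input_per_image, image_size):
    """Pick the requested output resolution, else fall back to the network input size."""
    height = input_per_image.get("height", image_size[0])
    width = input_per_image.get("width", image_size[1])
    return height, width


# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel(PIXEL_MEAN))

# With explicit "height"/"width", results are rescaled to the original image size.
print(output_size({"height": 480, "width": 640}, (800, 1066)))

# Without them, the resized input size is used unchanged.
print(output_size({}, (800, 1066)))
```

The fallback matters at inference time: the model sees a resized image, and the `"height"` and `"width"` entries are what allow predictions to be mapped back to the original resolution.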
 
cutler/modeling/roi_heads/__init__.py DELETED
@@ -1,16 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
-
- from .roi_heads import (
-     ROI_HEADS_REGISTRY,
-     ROIHeads,
-     Res5ROIHeads,
-     CustomStandardROIHeads,
-     build_roi_heads,
-     select_foreground_proposals,
- )
- from .custom_cascade_rcnn import CustomCascadeROIHeads
- from .fast_rcnn import FastRCNNOutputLayers
-
- from . import custom_cascade_rcnn # isort:skip
-
- __all__ = list(globals().keys())
 
cutler/modeling/roi_heads/custom_cascade_rcnn.py DELETED
@@ -1,338 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/roi_heads/cascade_rcnn.py
-
- from typing import List
- import torch
- from torch import nn
- from torch.autograd.function import Function
-
- from detectron2.config import configurable
- from detectron2.layers import ShapeSpec
- from detectron2.structures import Boxes, pairwise_iou
- from structures import pairwise_iou_max_scores
- from detectron2.structures import Instances
- from detectron2.utils.events import get_event_storage
-
- from detectron2.modeling.box_regression import Box2BoxTransform
- from detectron2.modeling.matcher import Matcher
- from detectron2.modeling.poolers import ROIPooler
- from detectron2.modeling.roi_heads.box_head import build_box_head
- from .fast_rcnn import FastRCNNOutputLayers, fast_rcnn_inference
- from .roi_heads import ROI_HEADS_REGISTRY, CustomStandardROIHeads
-
- import torch.nn.functional as F
-
- class _ScaleGradient(Function):
-     @staticmethod
-     def forward(ctx, input, scale):
-         ctx.scale = scale
-         return input
-
-     @staticmethod
-     def backward(ctx, grad_output):
-         return grad_output * ctx.scale, None
-
-
- @ROI_HEADS_REGISTRY.register()
- class CustomCascadeROIHeads(CustomStandardROIHeads):
-     """
-     The ROI heads that implement :paper:`Cascade R-CNN`.
-     """
-
-     @configurable
-     def __init__(
-         self,
-         *,
-         box_in_features: List[str],
-         box_pooler: ROIPooler,
-         box_heads: List[nn.Module],
-         box_predictors: List[nn.Module],
-         proposal_matchers: List[Matcher],
-         **kwargs,
-     ):
-         """
-         NOTE: this interface is experimental.
-
-         Args:
-             box_pooler (ROIPooler): pooler that extracts region features from given boxes
-             box_heads (list[nn.Module]): box head for each cascade stage
-             box_predictors (list[nn.Module]): box predictor for each cascade stage
-             proposal_matchers (list[Matcher]): matcher with different IoU thresholds to
-                 match boxes with ground truth for each stage. The first matcher matches
-                 RPN proposals with ground truth, the other matchers use boxes predicted
-                 by the previous stage as proposals and match them with ground truth.
-         """
-         assert "proposal_matcher" not in kwargs, (
-             "CustomCascadeROIHeads takes 'proposal_matchers=' for each stage instead "
-             "of one 'proposal_matcher='."
-         )
-         # The first matcher matches RPN proposals with ground truth, done in the base class
-         kwargs["proposal_matcher"] = proposal_matchers[0]
-         num_stages = self.num_cascade_stages = len(box_heads)
-         box_heads = nn.ModuleList(box_heads)
-         box_predictors = nn.ModuleList(box_predictors)
-         assert len(box_predictors) == num_stages, f"{len(box_predictors)} != {num_stages}!"
-         assert len(proposal_matchers) == num_stages, f"{len(proposal_matchers)} != {num_stages}!"
-         super().__init__(
-             box_in_features=box_in_features,
-             box_pooler=box_pooler,
-             box_head=box_heads,
-             box_predictor=box_predictors,
-             **kwargs,
-         )
-         self.proposal_matchers = proposal_matchers
-
-     @classmethod
-     def from_config(cls, cfg, input_shape):
87
- ret = super().from_config(cfg, input_shape)
88
- ret.pop("proposal_matcher")
89
- return ret
90
-
91
- @classmethod
92
- def _init_box_head(cls, cfg, input_shape):
93
- # fmt: off
94
- in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
95
- pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
96
- pooler_scales = tuple(1.0 / input_shape[k].stride for k in in_features)
97
- sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
98
- pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE
99
- cascade_bbox_reg_weights = cfg.MODEL.ROI_BOX_CASCADE_HEAD.BBOX_REG_WEIGHTS
100
- cascade_ious = cfg.MODEL.ROI_BOX_CASCADE_HEAD.IOUS
101
- assert len(cascade_bbox_reg_weights) == len(cascade_ious)
102
- assert cfg.MODEL.ROI_BOX_HEAD.CLS_AGNOSTIC_BBOX_REG, \
103
- "CustomCascadeROIHeads only support class-agnostic regression now!"
104
- assert cascade_ious[0] == cfg.MODEL.ROI_HEADS.IOU_THRESHOLDS[0]
105
- # fmt: on
106
-
107
- in_channels = [input_shape[f].channels for f in in_features]
108
- # Check all channel counts are equal
109
- assert len(set(in_channels)) == 1, in_channels
110
- in_channels = in_channels[0]
111
-
112
- box_pooler = ROIPooler(
113
- output_size=pooler_resolution,
114
- scales=pooler_scales,
115
- sampling_ratio=sampling_ratio,
116
- pooler_type=pooler_type,
117
- )
118
- pooled_shape = ShapeSpec(
119
- channels=in_channels, width=pooler_resolution, height=pooler_resolution
120
- )
121
-
122
- box_heads, box_predictors, proposal_matchers = [], [], []
123
- for match_iou, bbox_reg_weights in zip(cascade_ious, cascade_bbox_reg_weights):
124
- box_head = build_box_head(cfg, pooled_shape)
125
- box_heads.append(box_head)
126
- box_predictors.append(
127
- FastRCNNOutputLayers(
128
- cfg,
129
- box_head.output_shape,
130
- box2box_transform=Box2BoxTransform(weights=bbox_reg_weights),
131
- )
132
- )
133
- proposal_matchers.append(Matcher([match_iou], [0, 1], allow_low_quality_matches=False))
134
- return {
135
- "box_in_features": in_features,
136
- "box_pooler": box_pooler,
137
- "box_heads": box_heads,
138
- "box_predictors": box_predictors,
139
- "proposal_matchers": proposal_matchers,
140
- }
141
-
142
- def forward(self, images, features, proposals, targets=None):
143
- del images
144
- if self.training:
145
- proposals = self.label_and_sample_proposals(proposals, targets)
146
-
147
- if self.training:
148
- # The box head needs the targets to compute losses
149
- losses = self._forward_box(features, proposals, targets)
150
- losses.update(self._forward_mask(features, proposals))
151
- losses.update(self._forward_keypoint(features, proposals))
152
- return proposals, losses
153
- else:
154
- pred_instances = self._forward_box(features, proposals)
155
- pred_instances = self.forward_with_given_boxes(features, pred_instances)
156
- return pred_instances, {}
157
-
158
- def _forward_box(self, features, proposals, targets=None):
159
- """
160
- Args:
161
- features, targets: the same as in
162
- :meth:`ROIHeads.forward`.
163
- proposals (list[Instances]): the per-image object proposals with
164
- their matching ground truth.
165
- Each has fields "proposal_boxes", and "objectness_logits",
166
- "gt_classes", "gt_boxes".
167
- """
168
- features = [features[f] for f in self.box_in_features]
169
- head_outputs = [] # (predictor, predictions, proposals)
170
- prev_pred_boxes = None
171
- image_sizes = [x.image_size for x in proposals]
172
- for k in range(self.num_cascade_stages):
173
- if k > 0:
174
- # The output boxes of the previous stage are used to create the input
175
- # proposals of the next stage.
176
- proposals = self._create_proposals_from_boxes(prev_pred_boxes, image_sizes)
177
- if self.training:
178
- proposals = self._match_and_label_boxes(proposals, k, targets)
179
- predictions = self._run_stage(features, proposals, k)
180
- prev_pred_boxes = self.box_predictor[k].predict_boxes(predictions, proposals)
181
- head_outputs.append((self.box_predictor[k], predictions, proposals))
182
-
183
- no_gt_found = False
184
- if self.training:
185
- losses = {}
186
- storage = get_event_storage()
187
- for stage, (predictor, predictions, proposals) in enumerate(head_outputs):
188
- no_gt_found = False
189
- with storage.name_scope("stage{}".format(stage)):
190
- if self.use_droploss:
191
- try:
192
- box_num_list = [len(x.gt_boxes) for x in proposals]
193
- gt_num_list = [torch.unique(x.gt_boxes.tensor[:100], dim=0).size()[0] for x in proposals]
194
- except Exception:
195
- box_num_list = [0 for x in proposals]
196
- gt_num_list = [0 for x in proposals]
197
- no_gt_found = True
198
-
199
- if not no_gt_found:
200
- # NOTE: confidence score
201
- prediction_score, predictions_delta = predictions[0], predictions[1]
202
- prediction_score = F.softmax(prediction_score, dim=1)[:,0]
203
-
204
- # NOTE: maximum overlapping with GT (IoU)
205
- proposal_boxes = Boxes.cat([x.proposal_boxes for x in proposals])
206
- predictions_bbox = predictor.box2box_transform.apply_deltas(predictions_delta, proposal_boxes.tensor)
207
- idx_start = 0
208
- iou_max_list = []
209
- for idx, x in enumerate(proposals):
210
- idx_end = idx_start + box_num_list[idx]
211
- iou_max_list.append(pairwise_iou_max_scores(predictions_bbox[idx_start:idx_end], x.gt_boxes[:gt_num_list[idx]].tensor))
212
- idx_start = idx_end
213
- iou_max = torch.cat(iou_max_list, dim=0)
214
-
215
- # NOTE: get the weight of each proposal
216
- weights = iou_max.le(self.droploss_iou_thresh).float()
217
- weights = 1 - weights.ge(1.0).float()
218
- stage_losses = predictor.losses(predictions, proposals, weights=weights.detach())
219
- else:
220
- stage_losses = predictor.losses(predictions, proposals)
221
- else:
222
- stage_losses = predictor.losses(predictions, proposals)
223
- losses.update({k + "_stage{}".format(stage): v for k, v in stage_losses.items()})
224
- return losses
225
- else:
226
- # Each is a list[Tensor] of length #image. Each tensor is Ri x (K+1)
227
- scores_per_stage = [h[0].predict_probs(h[1], h[2]) for h in head_outputs]
228
-
229
- # Average the scores across heads
230
- scores = [
231
- sum(list(scores_per_image)) * (1.0 / self.num_cascade_stages)
232
- for scores_per_image in zip(*scores_per_stage)
233
- ]
234
- # Use the boxes of the last head
235
- predictor, predictions, proposals = head_outputs[-1]
236
- boxes = predictor.predict_boxes(predictions, proposals)
237
- pred_instances, _ = fast_rcnn_inference(
238
- boxes,
239
- scores,
240
- image_sizes,
241
- predictor.test_score_thresh,
242
- predictor.test_nms_thresh,
243
- predictor.test_topk_per_image,
244
- )
245
- return pred_instances
246
-
247
- @torch.no_grad()
248
- def _match_and_label_boxes(self, proposals, stage, targets):
249
- """
250
- Match proposals with groundtruth using the matcher at the given stage.
251
- Label the proposals as foreground or background based on the match.
252
-
253
- Args:
254
- proposals (list[Instances]): One Instances for each image, with
255
- the field "proposal_boxes".
256
- stage (int): the current stage
257
- targets (list[Instances]): the ground truth instances
258
-
259
- Returns:
260
- list[Instances]: the same proposals, but with fields "gt_classes" and "gt_boxes"
261
- """
262
- num_fg_samples, num_bg_samples = [], []
263
- for proposals_per_image, targets_per_image in zip(proposals, targets):
264
- match_quality_matrix = pairwise_iou(
265
- targets_per_image.gt_boxes, proposals_per_image.proposal_boxes
266
- )
267
- # proposal_labels are 0 or 1
268
- matched_idxs, proposal_labels = self.proposal_matchers[stage](match_quality_matrix)
269
- if len(targets_per_image) > 0:
270
- gt_classes = targets_per_image.gt_classes[matched_idxs]
271
- # Label unmatched proposals (0 label from matcher) as background (label=num_classes)
272
- gt_classes[proposal_labels == 0] = self.num_classes
273
- gt_boxes = targets_per_image.gt_boxes[matched_idxs]
274
- else:
275
- gt_classes = torch.zeros_like(matched_idxs) + self.num_classes
276
- gt_boxes = Boxes(
277
- targets_per_image.gt_boxes.tensor.new_zeros((len(proposals_per_image), 4))
278
- )
279
- proposals_per_image.gt_classes = gt_classes
280
- proposals_per_image.gt_boxes = gt_boxes
281
-
282
- num_fg_samples.append((proposal_labels == 1).sum().item())
283
- num_bg_samples.append(proposal_labels.numel() - num_fg_samples[-1])
284
-
285
- # Log the number of fg/bg samples in each stage
286
- storage = get_event_storage()
287
- storage.put_scalar(
288
- "stage{}/roi_head/num_fg_samples".format(stage),
289
- sum(num_fg_samples) / len(num_fg_samples),
290
- )
291
- storage.put_scalar(
292
- "stage{}/roi_head/num_bg_samples".format(stage),
293
- sum(num_bg_samples) / len(num_bg_samples),
294
- )
295
- return proposals
296
-
297
- def _run_stage(self, features, proposals, stage):
298
- """
299
- Args:
300
- features (list[Tensor]): #lvl input features to ROIHeads
301
- proposals (list[Instances]): #image Instances, with the field "proposal_boxes"
302
- stage (int): the current stage
303
-
304
- Returns:
305
- Same output as `FastRCNNOutputLayers.forward()`.
306
- """
307
- box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
308
- # The original implementation averages the losses among heads,
309
- # but scales up the parameter gradients of the heads.
310
- # This is equivalent to adding the losses among heads,
311
- # but scaling down the gradients on features.
312
- if self.training:
313
- box_features = _ScaleGradient.apply(box_features, 1.0 / self.num_cascade_stages)
314
- box_features = self.box_head[stage](box_features)
315
- return self.box_predictor[stage](box_features)
316
-
317
- def _create_proposals_from_boxes(self, boxes, image_sizes):
318
- """
319
- Args:
320
- boxes (list[Tensor]): per-image predicted boxes, each of shape Ri x 4
321
- image_sizes (list[tuple]): list of image shapes in (h, w)
322
-
323
- Returns:
324
- list[Instances]: per-image proposals with the given boxes.
325
- """
326
- # Just like RPN, the proposals should not have gradients
327
- boxes = [Boxes(b.detach()) for b in boxes]
328
- proposals = []
329
- for boxes_per_image, image_size in zip(boxes, image_sizes):
330
- boxes_per_image.clip(image_size)
331
- if self.training:
332
- # do not filter empty boxes at inference time,
333
- # because the scores from each stage need to be aligned and added later
334
- boxes_per_image = boxes_per_image[boxes_per_image.nonempty()]
335
- prop = Instances(image_size)
336
- prop.proposal_boxes = boxes_per_image
337
- proposals.append(prop)
338
- return proposals
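
The DropLoss branch of `_forward_box` in `custom_cascade_rcnn.py` gives weight 0 to any predicted box whose best IoU with the ground-truth boxes is at or below `droploss_iou_thresh`, then takes the mean of the weighted classification loss over all proposals. A minimal pure-Python sketch of that weighting follows. It is not the repository's tensor code: the helper names (`box_iou`, `droploss_weights`, `weighted_mean_loss`), the `(x1, y1, x2, y2)` tuple layout, and the threshold value in the example are assumptions made for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # hypothetical layout: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def droploss_weights(
    pred_boxes: Sequence[Box], gt_boxes: Sequence[Box], iou_thresh: float
) -> List[float]:
    """Return 0.0 for boxes whose best IoU is <= iou_thresh, else 1.0."""
    weights = []
    for pred in pred_boxes:
        # Assumption of this sketch: with no ground truth, the best IoU is 0.
        max_iou = max((box_iou(pred, gt) for gt in gt_boxes), default=0.0)
        weights.append(0.0 if max_iou <= iou_thresh else 1.0)
    return weights


def weighted_mean_loss(per_box_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Mirror `(weights * loss).mean()`: divide by all boxes, dropped ones included."""
    total = sum(w * loss for w, loss in zip(weights, per_box_losses))
    return total / len(per_box_losses)


# One prediction overlaps the ground truth exactly, the other not at all.
preds = [(0.0, 0.0, 10.0, 10.0), (100.0, 100.0, 110.0, 110.0)]
gts = [(0.0, 0.0, 10.0, 10.0)]
weights = droploss_weights(preds, gts, iou_thresh=0.5)  # illustrative threshold
loss = weighted_mean_loss([2.0, 4.0], weights)
```

Because the mean is taken over every proposal rather than only the kept ones, dropping a box lowers the total loss instead of renormalizing it.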
 
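At inference time, `_forward_box` averages the class scores of all cascade stages for each image and takes the boxes from the last stage only. The averaging step can be sketched with nested lists in place of tensors. The function name `average_stage_scores` and the stage → image → box → class nesting are hypothetical choices for this illustration, not part of the repository.

```python
from typing import List

# Hypothetical nesting: scores_per_stage[stage][image][box][class]
Scores = List[List[List[List[float]]]]


def average_stage_scores(scores_per_stage: Scores) -> List[List[List[float]]]:
    """Average class scores element-wise across cascade stages."""
    num_stages = len(scores_per_stage)
    averaged = []
    # zip(*...) regroups the data so each item holds one image's scores from every stage.
    for stages_for_image in zip(*scores_per_stage):
        rows = []
        # Each item now holds the score rows of the same box at every stage.
        for rows_for_box in zip(*stages_for_image):
            rows.append([sum(vals) / num_stages for vals in zip(*rows_for_box)])
        averaged.append(rows)
    return averaged


# Two stages, one image, one box, two score columns.
stage0 = [[[0.75, 0.25]]]
stage1 = [[[0.25, 0.75]]]
avg = average_stage_scores([stage0, stage1])
```

Row `r` has to describe the same box at every stage for this to be meaningful, which is why `_create_proposals_from_boxes` filters empty boxes only during training.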
 
cutler/modeling/roi_heads/fast_rcnn.py DELETED
@@ -1,587 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/roi_heads/fast_rcnn.py
3
-
4
- import logging
5
- from typing import Callable, Dict, List, Optional, Tuple, Union
6
- import torch
7
- from torch import nn
8
- from torch.nn import functional as F
9
-
10
- from detectron2.config import configurable
11
- from detectron2.data.detection_utils import get_fed_loss_cls_weights
12
- from detectron2.layers import ShapeSpec, batched_nms, cat, cross_entropy, nonzero_tuple
13
- from detectron2.modeling.box_regression import Box2BoxTransform, _dense_box_regression_loss
14
- from detectron2.structures import Instances, Boxes
15
- from detectron2.utils.events import get_event_storage
16
- from torch.nn import Parameter
17
- import torch.nn.functional as F
18
-
19
- __all__ = ["fast_rcnn_inference", "FastRCNNOutputLayers"]
20
-
21
-
22
- logger = logging.getLogger(__name__)
23
-
24
- """
25
- Shape shorthand in this module:
26
-
27
- N: number of images in the minibatch
28
- R: number of ROIs, combined over all images, in the minibatch
29
- Ri: number of ROIs in image i
30
- K: number of foreground classes. E.g., there are 80 foreground classes in COCO.
31
-
32
- Naming convention:
33
-
34
- deltas: refers to the 4-d (dx, dy, dw, dh) deltas that parameterize the box2box
35
- transform (see :class:`box_regression.Box2BoxTransform`).
36
-
37
- pred_class_logits: predicted class scores in [-inf, +inf]; use
38
- softmax(pred_class_logits) to estimate P(class).
39
-
40
- gt_classes: ground-truth classification labels in [0, K], where [0, K) represent
41
- foreground object classes and K represents the background class.
42
-
43
- pred_proposal_deltas: predicted box2box transform deltas for transforming proposals
44
- to detection box predictions.
45
-
46
- gt_proposal_deltas: ground-truth box2box transform deltas
47
- """
48
-
49
-
50
- def fast_rcnn_inference(
51
- boxes: List[torch.Tensor],
52
- scores: List[torch.Tensor],
53
- image_shapes: List[Tuple[int, int]],
54
- score_thresh: float,
55
- nms_thresh: float,
56
- topk_per_image: int,
57
- ):
58
- """
59
- Call `fast_rcnn_inference_single_image` for all images.
60
-
61
- Args:
62
- boxes (list[Tensor]): A list of Tensors of predicted class-specific or class-agnostic
63
- boxes for each image. Element i has shape (Ri, K * 4) if doing
64
- class-specific regression, or (Ri, 4) if doing class-agnostic
65
- regression, where Ri is the number of predicted objects for image i.
66
- This is compatible with the output of :meth:`FastRCNNOutputLayers.predict_boxes`.
67
- scores (list[Tensor]): A list of Tensors of predicted class scores for each image.
68
- Element i has shape (Ri, K + 1), where Ri is the number of predicted objects
69
- for image i. Compatible with the output of :meth:`FastRCNNOutputLayers.predict_probs`.
70
- image_shapes (list[tuple]): A list of (height, width) tuples for each image in the batch.
71
- score_thresh (float): Only return detections with a confidence score exceeding this
72
- threshold.
73
- nms_thresh (float): The threshold to use for box non-maximum suppression. Value in [0, 1].
74
- topk_per_image (int): The number of top scoring detections to return. Set < 0 to return
75
- all detections.
76
-
77
- Returns:
78
- instances: (list[Instances]): A list of N instances, one for each image in the batch,
79
- that stores the topk most confident detections.
80
- kept_indices: (list[Tensor]): A list of N 1D tensors; element i holds, for image i,
81
- the indices in [0, Ri) of the input boxes/scores that were kept.
82
- """
83
- result_per_image = [
84
- fast_rcnn_inference_single_image(
85
- boxes_per_image, scores_per_image, image_shape, score_thresh, nms_thresh, topk_per_image
86
- )
87
- for scores_per_image, boxes_per_image, image_shape in zip(scores, boxes, image_shapes)
88
- ]
89
- return [x[0] for x in result_per_image], [x[1] for x in result_per_image]
90
-
91
-
92
- def _log_classification_stats(pred_logits, gt_classes, prefix="fast_rcnn"):
93
- """
94
- Log the classification metrics to EventStorage.
95
-
96
- Args:
97
- pred_logits: Rx(K+1) logits. The last column is for background class.
98
- gt_classes: R labels
99
- """
100
- num_instances = gt_classes.numel()
101
- if num_instances == 0:
102
- return
103
- pred_classes = pred_logits.argmax(dim=1)
104
- bg_class_ind = pred_logits.shape[1] - 1
105
-
106
- fg_inds = (gt_classes >= 0) & (gt_classes < bg_class_ind)
107
- num_fg = fg_inds.nonzero().numel()
108
- fg_gt_classes = gt_classes[fg_inds]
109
- fg_pred_classes = pred_classes[fg_inds]
110
-
111
- num_false_negative = (fg_pred_classes == bg_class_ind).nonzero().numel()
112
- num_accurate = (pred_classes == gt_classes).nonzero().numel()
113
- fg_num_accurate = (fg_pred_classes == fg_gt_classes).nonzero().numel()
114
-
115
- storage = get_event_storage()
116
- storage.put_scalar(f"{prefix}/cls_accuracy", num_accurate / num_instances)
117
- if num_fg > 0:
118
- storage.put_scalar(f"{prefix}/fg_cls_accuracy", fg_num_accurate / num_fg)
119
- storage.put_scalar(f"{prefix}/false_negative", num_false_negative / num_fg)
120
-
121
-
122
- def fast_rcnn_inference_single_image(
123
- boxes,
124
- scores,
125
- image_shape: Tuple[int, int],
126
- score_thresh: float,
127
- nms_thresh: float,
128
- topk_per_image: int,
129
- ):
130
- """
131
- Single-image inference. Return bounding-box detection results by thresholding
132
- on scores and applying non-maximum suppression (NMS).
133
-
134
- Args:
135
- Same as `fast_rcnn_inference`, but with boxes, scores, and image shapes
136
- per image.
137
-
138
- Returns:
139
- Same as `fast_rcnn_inference`, but for only one image.
140
- """
141
- valid_mask = torch.isfinite(boxes).all(dim=1) & torch.isfinite(scores).all(dim=1)
142
- if not valid_mask.all():
143
- boxes = boxes[valid_mask]
144
- scores = scores[valid_mask]
145
-
146
- scores = scores[:, :-1]
147
- num_bbox_reg_classes = boxes.shape[1] // 4
148
- # Convert to Boxes to use the `clip` function ...
149
- boxes = Boxes(boxes.reshape(-1, 4))
150
- boxes.clip(image_shape)
151
- boxes = boxes.tensor.view(-1, num_bbox_reg_classes, 4) # R x C x 4
152
-
153
- # 1. Filter results based on detection scores. It can make NMS more efficient
154
- # by filtering out low-confidence detections.
155
- filter_mask = scores > score_thresh # R x K
156
- # R' x 2. First column contains indices of the R predictions;
157
- # Second column contains indices of classes.
158
- filter_inds = filter_mask.nonzero()
159
- if num_bbox_reg_classes == 1:
160
- boxes = boxes[filter_inds[:, 0], 0]
161
- else:
162
- boxes = boxes[filter_mask]
163
- scores = scores[filter_mask]
164
-
165
- # 2. Apply NMS for each class independently.
166
- keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
167
- if topk_per_image >= 0:
168
- keep = keep[:topk_per_image]
169
- boxes, scores, filter_inds = boxes[keep], scores[keep], filter_inds[keep]
170
-
171
- result = Instances(image_shape)
172
- result.pred_boxes = Boxes(boxes)
173
- result.scores = scores
174
- result.pred_classes = filter_inds[:, 1]
175
- return result, filter_inds[:, 0]
176
-
177
- class NormedLinear(nn.Module):
178
- def __init__(self, in_features, out_features):
179
- super(NormedLinear, self).__init__()
180
- self.weight = Parameter(torch.Tensor(in_features, out_features))
181
- self.weight.data.uniform_(-1, 1).renorm_(2, 1, 1e-5).mul_(1e5)
182
-
183
- def forward(self, x):
184
- out = F.normalize(x, dim=1).mm(F.normalize(self.weight, dim=0))
185
- return out
186
-
187
- class FastRCNNOutputLayers(nn.Module):
188
- """
189
- Two linear layers for predicting Fast R-CNN outputs:
190
-
191
- 1. proposal-to-detection box regression deltas
192
- 2. classification scores
193
- """
194
-
195
- @configurable
196
- def __init__(
197
- self,
198
- input_shape: ShapeSpec,
199
- *,
200
- box2box_transform,
201
- num_classes: int,
202
- test_score_thresh: float = 0.0,
203
- test_nms_thresh: float = 0.5,
204
- test_topk_per_image: int = 100,
205
- cls_agnostic_bbox_reg: bool = False,
206
- smooth_l1_beta: float = 0.0,
207
- box_reg_loss_type: str = "smooth_l1",
208
- loss_weight: Union[float, Dict[str, float]] = 1.0,
209
- use_fed_loss: bool = False,
210
- use_sigmoid_ce: bool = False,
211
- get_fed_loss_cls_weights: Optional[Callable] = None,
212
- fed_loss_num_classes: int = 50,
213
- ):
214
- """
215
- NOTE: this interface is experimental.
216
-
217
- Args:
218
- input_shape (ShapeSpec): shape of the input feature to this module
219
- box2box_transform (Box2BoxTransform or Box2BoxTransformRotated):
220
- num_classes (int): number of foreground classes
221
- test_score_thresh (float): threshold to filter predictions results.
222
- test_nms_thresh (float): NMS threshold for prediction results.
223
- test_topk_per_image (int): number of top predictions to produce per image.
224
- cls_agnostic_bbox_reg (bool): whether to use class-agnostic bbox regression
225
- smooth_l1_beta (float): transition point from L1 to L2 loss. Only used if
226
- `box_reg_loss_type` is "smooth_l1"
227
- box_reg_loss_type (str): Box regression loss type. One of: "smooth_l1", "giou",
228
- "diou", "ciou"
229
- loss_weight (float|dict): weights to use for losses. Can be single float for weighting
230
- all losses, or a dict of individual weightings. Valid dict keys are:
231
- * "loss_cls": applied to classification loss
232
- * "loss_box_reg": applied to box regression loss
233
- use_fed_loss (bool): whether to use federated loss which samples additional negative
234
- classes to calculate the loss
235
- use_sigmoid_ce (bool): whether to calculate the loss using weighted average of binary
236
- cross entropy with logits. This could be used together with federated loss
237
- get_fed_loss_cls_weights (Callable): a callable which takes dataset name and frequency
238
- weight power, and returns the probabilities to sample negative classes for
239
- federated loss. The implementation can be found in
240
- detectron2/data/detection_utils.py
241
- fed_loss_num_classes (int): number of federated classes to keep in total
242
- """
243
- super().__init__()
244
- if isinstance(input_shape, int): # some backward compatibility
245
- input_shape = ShapeSpec(channels=input_shape)
246
- self.num_classes = num_classes
247
- input_size = input_shape.channels * (input_shape.width or 1) * (input_shape.height or 1)
248
- # prediction layer for num_classes foreground classes and one background class (hence + 1)
249
- self.cls_score = nn.Linear(input_size, num_classes + 1)
250
- nn.init.normal_(self.cls_score.weight, std=0.01)
251
- num_bbox_reg_classes = 1 if cls_agnostic_bbox_reg else num_classes
252
- box_dim = len(box2box_transform.weights)
253
- self.bbox_pred = nn.Linear(input_size, num_bbox_reg_classes * box_dim)
254
-
255
- nn.init.normal_(self.bbox_pred.weight, std=0.001)
256
- for l in [self.cls_score, self.bbox_pred]:
257
- nn.init.constant_(l.bias, 0)
258
-
259
- self.box2box_transform = box2box_transform
260
- self.smooth_l1_beta = smooth_l1_beta
261
- self.test_score_thresh = test_score_thresh
262
- self.test_nms_thresh = test_nms_thresh
263
- self.test_topk_per_image = test_topk_per_image
264
- self.box_reg_loss_type = box_reg_loss_type
265
- if isinstance(loss_weight, float):
266
- loss_weight = {"loss_cls": loss_weight, "loss_box_reg": loss_weight}
267
- self.loss_weight = loss_weight
268
- self.use_fed_loss = use_fed_loss
269
- self.use_sigmoid_ce = use_sigmoid_ce
270
- self.fed_loss_num_classes = fed_loss_num_classes
271
-
272
- if self.use_fed_loss:
273
- assert self.use_sigmoid_ce, "Please use sigmoid cross entropy loss with federated loss"
274
- fed_loss_cls_weights = get_fed_loss_cls_weights()
275
- assert (
276
- len(fed_loss_cls_weights) == self.num_classes
277
- ), "Please check the provided fed_loss_cls_weights. Their size should match num_classes"
278
- self.register_buffer("fed_loss_cls_weights", fed_loss_cls_weights)
279
-
280
-
281
- @classmethod
282
- def from_config(cls, cfg, input_shape):
283
- return {
284
- "input_shape": input_shape,
285
- "box2box_transform": Box2BoxTransform(weights=cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_WEIGHTS),
286
- # fmt: off
287
- "num_classes" : cfg.MODEL.ROI_HEADS.NUM_CLASSES,
288
- "cls_agnostic_bbox_reg" : cfg.MODEL.ROI_BOX_HEAD.CLS_AGNOSTIC_BBOX_REG,
289
- "smooth_l1_beta" : cfg.MODEL.ROI_BOX_HEAD.SMOOTH_L1_BETA,
290
- "test_score_thresh" : cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST,
291
- "test_nms_thresh" : cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST,
292
- "test_topk_per_image" : cfg.TEST.DETECTIONS_PER_IMAGE,
293
- "box_reg_loss_type" : cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_LOSS_TYPE,
294
- "loss_weight" : {"loss_box_reg": cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_LOSS_WEIGHT}, # noqa
295
- "use_fed_loss" : cfg.MODEL.ROI_BOX_HEAD.USE_FED_LOSS,
296
- "use_sigmoid_ce" : cfg.MODEL.ROI_BOX_HEAD.USE_SIGMOID_CE,
297
- "get_fed_loss_cls_weights" : lambda: get_fed_loss_cls_weights(dataset_names=cfg.DATASETS.TRAIN, freq_weight_power=cfg.MODEL.ROI_BOX_HEAD.FED_LOSS_FREQ_WEIGHT_POWER), # noqa
298
- "fed_loss_num_classes" : cfg.MODEL.ROI_BOX_HEAD.FED_LOSS_NUM_CLASSES,
299
- # fmt: on
300
- }
301
-
302
- def forward(self, x):
303
- """
304
- Args:
305
- x: per-region features of shape (N, ...) for N bounding boxes to predict.
306
-
307
- Returns:
308
- (Tensor, Tensor):
309
- First tensor: shape (N,K+1), scores for each of the N boxes. Each row contains the
310
- scores for K object categories and 1 background class.
311
-
312
- Second tensor: bounding box regression deltas for each box. Shape is (N,Kx4),
313
- or (N,4) for class-agnostic regression.
314
- """
315
- if x.dim() > 2:
316
- x = torch.flatten(x, start_dim=1)
317
- scores = self.cls_score(x)
318
- proposal_deltas = self.bbox_pred(x)
319
- return scores, proposal_deltas
320
-
321
- def losses(self, predictions, proposals, weights=None):
322
- """
323
- Args:
324
- predictions: return values of :meth:`forward()`.
325
- proposals (list[Instances]): proposals that match the features that were used
326
- to compute predictions. The fields ``proposal_boxes``, ``gt_boxes``,
327
- ``gt_classes`` are expected.
328
- weights: weights for reweighting the loss of each instance based on IoU
329
-
330
- Returns:
331
- Dict[str, Tensor]: dict of losses
332
- """
333
- scores, proposal_deltas = predictions
334
-
335
- # parse classification outputs
336
- gt_classes = (
337
- cat([p.gt_classes for p in proposals], dim=0) if len(proposals) else torch.empty(0)
338
- )
339
- _log_classification_stats(scores, gt_classes)
340
-
341
- # parse box regression outputs
342
- if len(proposals):
343
- proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0) # Nx4
344
- assert not proposal_boxes.requires_grad, "Proposals should not require gradients!"
345
- # If "gt_boxes" does not exist, the proposals must be all negative and
346
- # should not be included in regression loss computation.
347
- # Here we just use proposal_boxes as an arbitrary placeholder because its
348
- # value won't be used in self.box_reg_loss().
349
- gt_boxes = cat(
350
- [(p.gt_boxes if p.has("gt_boxes") else p.proposal_boxes).tensor for p in proposals],
351
- dim=0,
352
- )
353
- else:
354
- proposal_boxes = gt_boxes = torch.empty((0, 4), device=proposal_deltas.device)
355
-
356
- if self.use_sigmoid_ce:
357
- loss_cls = self.sigmoid_cross_entropy_loss(scores, gt_classes)
358
- else:
359
- if weights is not None:
360
- loss_cls = (weights * cross_entropy(scores, gt_classes, reduction='none')).mean()
361
- else:
362
- loss_cls = cross_entropy(scores, gt_classes, reduction="mean")
363
-
364
- losses = {
365
- "loss_cls": loss_cls,
366
- "loss_box_reg": self.box_reg_loss(
367
- proposal_boxes, gt_boxes, proposal_deltas, gt_classes
368
- ),
369
- }
370
- return {k: v * self.loss_weight.get(k, 1.0) for k, v in losses.items()}
371
-
372
- # Implementation from https://github.com/xingyizhou/CenterNet2/blob/master/projects/CenterNet2/centernet/modeling/roi_heads/fed_loss.py # noqa
373
- # with slight modifications
374
- def get_fed_loss_classes(self, gt_classes, num_fed_loss_classes, num_classes, weight):
375
- """
376
- Args:
377
- gt_classes: a long tensor of shape R that contains the gt class label of each proposal.
378
- num_fed_loss_classes: minimum number of classes to keep when calculating federated loss.
379
- Will sample negative classes if number of unique gt_classes is smaller than this value.
380
- num_classes: number of foreground classes
381
- weight: probabilities used to sample negative classes
382
-
383
- Returns:
384
- Tensor:
385
- classes to keep when calculating the federated loss, including both unique gt
386
- classes and sampled negative classes.
387
- """
388
- unique_gt_classes = torch.unique(gt_classes)
389
- prob = unique_gt_classes.new_ones(num_classes + 1).float()
390
- prob[-1] = 0
391
- if len(unique_gt_classes) < num_fed_loss_classes:
392
- prob[:num_classes] = weight.float().clone()
393
- prob[unique_gt_classes] = 0
394
- sampled_negative_classes = torch.multinomial(
395
- prob, num_fed_loss_classes - len(unique_gt_classes), replacement=False
396
- )
397
- fed_loss_classes = torch.cat([unique_gt_classes, sampled_negative_classes])
398
- else:
399
- fed_loss_classes = unique_gt_classes
400
- return fed_loss_classes
401
-
402
- # Implementation from https://github.com/xingyizhou/CenterNet2/blob/master/projects/CenterNet2/centernet/modeling/roi_heads/custom_fast_rcnn.py#L113 # noqa
403
- # with slight modifications
404
- def sigmoid_cross_entropy_loss(self, pred_class_logits, gt_classes):
405
- """
406
- Args:
407
- pred_class_logits: shape (N, K+1), scores for each of the N boxes. Each row contains the
408
- scores for K object categories and 1 background class
409
- gt_classes: a long tensor of shape R that contains the gt class label of each proposal.
410
- """
411
- if pred_class_logits.numel() == 0:
412
- return pred_class_logits.new_zeros([1])[0]
413
-
414
- N = pred_class_logits.shape[0]
415
- K = pred_class_logits.shape[1] - 1
416
-
417
- target = pred_class_logits.new_zeros(N, K + 1)
418
- target[range(len(gt_classes)), gt_classes] = 1
419
- target = target[:, :K]
420
-
421
- cls_loss = F.binary_cross_entropy_with_logits(
422
- pred_class_logits[:, :-1], target, reduction="none"
423
- )
424
-
425
- if self.use_fed_loss:
426
- fed_loss_classes = self.get_fed_loss_classes(
427
- gt_classes,
428
- num_fed_loss_classes=self.fed_loss_num_classes,
429
- num_classes=K,
430
- weight=self.fed_loss_cls_weights,
431
- )
432
- fed_loss_classes_mask = fed_loss_classes.new_zeros(K + 1)
433
- fed_loss_classes_mask[fed_loss_classes] = 1
434
- fed_loss_classes_mask = fed_loss_classes_mask[:K]
435
- weight = fed_loss_classes_mask.view(1, K).expand(N, K).float()
436
- else:
437
- weight = 1
438
-
439
- loss = torch.sum(cls_loss * weight) / N
440
- return loss
441
-
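A minimal sketch of what `sigmoid_cross_entropy_loss` above computes, written with plain Python lists so it runs without PyTorch. The names `bce_with_logits` and `sigmoid_ce_loss` are hypothetical. The optional `fed_loss_classes` argument plays the role of the 0/1 federated-loss weight mask.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sigmoid_ce_loss(logits, gt_classes, fed_loss_classes=None):
    """logits: N rows of K+1 scores, last column is background.
    gt_classes: N labels in [0, K], where K means background.
    fed_loss_classes: optional collection of class ids kept in the loss.
    """
    n = len(logits)
    if n == 0:
        return 0.0
    k = len(logits[0]) - 1
    total = 0.0
    for row, gt in zip(logits, gt_classes):
        for c in range(k):  # the background column is dropped
            if fed_loss_classes is not None and c not in fed_loss_classes:
                continue  # weight 0: class masked out by the federated loss
            target = 1.0 if gt == c else 0.0  # background rows are all zeros
            total += bce_with_logits(row[c], target)
    return total / n  # summed over classes, averaged over proposals
```

Note that a background proposal contributes an all-zero target row, so every foreground logit is pushed down for it.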
442
- def box_reg_loss(self, proposal_boxes, gt_boxes, pred_deltas, gt_classes):
443
- """
444
- Args:
445
- proposal_boxes/gt_boxes are tensors with the same shape (R, 4 or 5).
446
- pred_deltas has shape (R, 4 or 5), or (R, num_classes * (4 or 5)).
447
- gt_classes is a long tensor of shape R, the gt class label of each proposal.
448
- R shall be the number of proposals.
449
- """
450
- box_dim = proposal_boxes.shape[1] # 4 or 5
451
- # Regression loss is only computed for foreground proposals (those matched to a GT)
452
- fg_inds = nonzero_tuple((gt_classes >= 0) & (gt_classes < self.num_classes))[0]
453
- if pred_deltas.shape[1] == box_dim: # cls-agnostic regression
454
- fg_pred_deltas = pred_deltas[fg_inds]
455
- else:
456
- fg_pred_deltas = pred_deltas.view(-1, self.num_classes, box_dim)[
457
- fg_inds, gt_classes[fg_inds]
458
- ]
459
-
460
- loss_box_reg = _dense_box_regression_loss(
461
- [proposal_boxes[fg_inds]],
462
- self.box2box_transform,
463
- [fg_pred_deltas.unsqueeze(0)],
464
- [gt_boxes[fg_inds]],
465
- ...,
466
- self.box_reg_loss_type,
467
- self.smooth_l1_beta,
468
- )
469
-
470
- # The reg loss is normalized using the total number of regions (R), not the number
471
- # of foreground regions even though the box regression loss is only defined on
472
- # foreground regions. Why? Because doing so gives equal training influence to
473
- # each foreground example. To see how, consider two different minibatches:
474
- # (1) Contains a single foreground region
475
- # (2) Contains 100 foreground regions
476
- # If we normalize by the number of foreground regions, the single example in
477
- # minibatch (1) will be given 100 times as much influence as each foreground
478
- # example in minibatch (2). Normalizing by the total number of regions, R,
479
- # means that the single example in minibatch (1) and each of the 100 examples
480
- # in minibatch (2) are given equal influence.
481
- return loss_box_reg / max(gt_classes.numel(), 1.0) # return 0 if empty
482
-
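The normalization argument in the comment above can be checked with a little arithmetic. This is an illustrative sketch only: `per_example_weight` and the value `R = 512` are assumptions chosen for the example, not values taken from the file.

```python
def per_example_weight(num_fg, num_regions, normalize_by):
    """Weight each foreground example's loss receives after normalization."""
    denom = num_fg if normalize_by == "fg" else num_regions
    return 1.0 / max(denom, 1)


R = 512  # hypothetical number of sampled regions per minibatch

# Minibatch (1) has 1 foreground region, minibatch (2) has 100.
ratio_fg = per_example_weight(1, R, "fg") / per_example_weight(100, R, "fg")
ratio_total = per_example_weight(1, R, "total") / per_example_weight(100, R, "total")

# ratio_fg is about 100: the lone example dominates when dividing by #foreground.
# ratio_total is 1: dividing by R gives every foreground example equal influence.
print(ratio_fg, ratio_total)
```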
483
- def inference(self, predictions: Tuple[torch.Tensor, torch.Tensor], proposals: List[Instances]):
484
- """
485
- Args:
486
- predictions: return values of :meth:`forward()`.
487
- proposals (list[Instances]): proposals that match the features that were
488
- used to compute predictions. The ``proposal_boxes`` field is expected.
489
-
490
- Returns:
491
- list[Instances]: same as `fast_rcnn_inference`.
492
- list[Tensor]: same as `fast_rcnn_inference`.
493
- """
494
- boxes = self.predict_boxes(predictions, proposals)
495
- scores = self.predict_probs(predictions, proposals)
496
- image_shapes = [x.image_size for x in proposals]
497
- return fast_rcnn_inference(
498
- boxes,
499
- scores,
500
- image_shapes,
501
- self.test_score_thresh,
502
- self.test_nms_thresh,
503
- self.test_topk_per_image,
504
- )
505
-
506
- def predict_boxes_for_gt_classes(self, predictions, proposals):
507
- """
508
- Args:
509
- predictions: return values of :meth:`forward()`.
510
- proposals (list[Instances]): proposals that match the features that were used
511
- to compute predictions. The fields ``proposal_boxes``, ``gt_classes`` are expected.
512
-
513
- Returns:
514
- list[Tensor]:
515
- A list of Tensors of predicted boxes for GT classes in case of
516
- class-specific box head. Element i of the list has shape (Ri, B), where Ri is
517
- the number of proposals for image i and B is the box dimension (4 or 5)
518
- """
519
- if not len(proposals):
520
- return []
521
- scores, proposal_deltas = predictions
522
- proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0)
523
- N, B = proposal_boxes.shape
524
- predict_boxes = self.box2box_transform.apply_deltas(
525
- proposal_deltas, proposal_boxes
526
- ) # Nx(KxB)
527
-
528
- K = predict_boxes.shape[1] // B
529
- if K > 1:
530
- gt_classes = torch.cat([p.gt_classes for p in proposals], dim=0)
531
- # Some proposals are ignored or have a background class. Their gt_classes
532
- # cannot be used as index.
533
- gt_classes = gt_classes.clamp_(0, K - 1)
534
-
535
- predict_boxes = predict_boxes.view(N, K, B)[
536
- torch.arange(N, dtype=torch.long, device=predict_boxes.device), gt_classes
537
- ]
538
- num_prop_per_image = [len(p) for p in proposals]
539
- return predict_boxes.split(num_prop_per_image)
540
-
541
- def predict_boxes(
542
- self, predictions: Tuple[torch.Tensor, torch.Tensor], proposals: List[Instances]
543
- ):
544
- """
545
- Args:
546
- predictions: return values of :meth:`forward()`.
547
- proposals (list[Instances]): proposals that match the features that were
548
- used to compute predictions. The ``proposal_boxes`` field is expected.
549
-
550
- Returns:
551
- list[Tensor]:
552
- A list of Tensors of predicted class-specific or class-agnostic boxes
553
- for each image. Element i has shape (Ri, K * B) or (Ri, B), where Ri is
554
- the number of proposals for image i and B is the box dimension (4 or 5)
555
- """
556
- if not len(proposals):
557
- return []
558
- _, proposal_deltas = predictions
559
- num_prop_per_image = [len(p) for p in proposals]
560
- proposal_boxes = cat([p.proposal_boxes.tensor for p in proposals], dim=0)
561
- predict_boxes = self.box2box_transform.apply_deltas(
562
- proposal_deltas,
563
- proposal_boxes,
564
- ) # Nx(KxB)
565
- return predict_boxes.split(num_prop_per_image)
566
-
567
- def predict_probs(
568
- self, predictions: Tuple[torch.Tensor, torch.Tensor], proposals: List[Instances]
569
- ):
570
- """
571
- Args:
572
- predictions: return values of :meth:`forward()`.
573
- proposals (list[Instances]): proposals that match the features that were
574
- used to compute predictions.
575
-
576
- Returns:
577
- list[Tensor]:
578
- A list of Tensors of predicted class probabilities for each image.
579
- Element i has shape (Ri, K + 1), where Ri is the number of proposals for image i.
580
- """
581
- scores, _ = predictions
582
- num_inst_per_image = [len(p) for p in proposals]
583
- if self.use_sigmoid_ce:
584
- probs = scores.sigmoid()
585
- else:
586
- probs = F.softmax(scores, dim=-1)
587
- return probs.split(num_inst_per_image, dim=0)
cutler/modeling/roi_heads/roi_heads.py DELETED
@@ -1,926 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
- # Modified by XuDong Wang from https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/roi_heads/roi_heads.py
3
-
4
- import inspect
5
- import logging
6
- import numpy as np
7
- from typing import Dict, List, Optional, Tuple
8
- import torch
9
- from torch import nn
10
-
11
- from detectron2.config import configurable
12
- from detectron2.layers import ShapeSpec, nonzero_tuple
13
- from detectron2.structures import Boxes, pairwise_iou
14
- from structures import pairwise_iou_max_scores
15
- from detectron2.structures import ImageList, Instances
16
- from detectron2.utils.events import get_event_storage
17
- from detectron2.utils.registry import Registry
18
-
19
- from detectron2.modeling.backbone.resnet import BottleneckBlock, ResNet
20
- from detectron2.modeling.matcher import Matcher
21
- from detectron2.modeling.poolers import ROIPooler
22
- from detectron2.modeling.proposal_generator.proposal_utils import add_ground_truth_to_proposals
23
- from detectron2.modeling.sampling import subsample_labels
24
- from detectron2.modeling.roi_heads.box_head import build_box_head
25
- from .fast_rcnn import FastRCNNOutputLayers
26
- from detectron2.modeling.roi_heads.keypoint_head import build_keypoint_head
27
- from detectron2.modeling.roi_heads.mask_head import build_mask_head
28
-
29
- from detectron2.modeling.box_regression import Box2BoxTransform
30
- import torch.nn.functional as F
31
- from colored import fg
32
-
33
- blue, red = fg('blue'), fg('red')
34
-
35
- ROI_HEADS_REGISTRY = Registry("ROI_HEADS")
36
- ROI_HEADS_REGISTRY.__doc__ = """
37
- Registry for ROI heads in a generalized R-CNN model.
38
- ROIHeads take feature maps and region proposals, and
39
- perform per-region computation.
40
-
41
- The registered object will be called with `obj(cfg, input_shape)`.
42
- The call is expected to return an :class:`ROIHeads`.
43
- """
44
-
45
- logger = logging.getLogger(__name__)
46
-
47
-
48
- def build_roi_heads(cfg, input_shape):
49
- """
50
- Build ROIHeads defined by `cfg.MODEL.ROI_HEADS.NAME`.
51
- """
52
- name = cfg.MODEL.ROI_HEADS.NAME
53
- return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape)
54
-
55
-
56
- def select_foreground_proposals(
57
- proposals: List[Instances], bg_label: int
58
- ) -> Tuple[List[Instances], List[torch.Tensor]]:
59
- """
60
- Given a list of N Instances (for N images), each containing a `gt_classes` field,
61
- return a list of Instances that contain only instances with `gt_classes != -1 &&
62
- gt_classes != bg_label`.
63
-
64
- Args:
65
- proposals (list[Instances]): A list of N Instances, where N is the number of
66
- images in the batch.
67
- bg_label: label index of background class.
68
-
69
- Returns:
70
- list[Instances]: N Instances, each contains only the selected foreground instances.
71
- list[Tensor]: N boolean vectors, corresponding to the selection mask of
72
- each Instances object. True for selected instances.
73
- """
74
- assert isinstance(proposals, (list, tuple))
75
- assert isinstance(proposals[0], Instances)
76
- assert proposals[0].has("gt_classes")
77
- fg_proposals = []
78
- fg_selection_masks = []
79
- for proposals_per_image in proposals:
80
- gt_classes = proposals_per_image.gt_classes
81
- fg_selection_mask = (gt_classes != -1) & (gt_classes != bg_label)
82
- fg_idxs = fg_selection_mask.nonzero().squeeze(1)
83
- fg_proposals.append(proposals_per_image[fg_idxs])
84
- fg_selection_masks.append(fg_selection_mask)
85
- return fg_proposals, fg_selection_masks
86
-
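A small list-based sketch of the selection rule in `select_foreground_proposals` above. The helper name `foreground_mask` and the sample labels are hypothetical. A proposal is kept only if its label is neither the ignore value `-1` nor the background label.

```python
def foreground_mask(gt_classes, bg_label):
    """True for proposals that are neither ignored (-1) nor background."""
    return [c != -1 and c != bg_label for c in gt_classes]


# Hypothetical labels for five proposals, with 80 used as the background label.
gt = [3, 80, -1, 0, 80]
mask = foreground_mask(gt, bg_label=80)
fg_idxs = [i for i, keep in enumerate(mask) if keep]
print(mask, fg_idxs)
```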
87
-
88
- def select_proposals_with_visible_keypoints(proposals: List[Instances]) -> List[Instances]:
89
- """
90
- Args:
91
- proposals (list[Instances]): a list of N Instances, where N is the
92
- number of images.
93
-
94
- Returns:
95
- proposals: only contains proposals with at least one visible keypoint.
96
-
97
- Note that this is still slightly different from Detectron.
98
- In Detectron, proposals for training keypoint head are re-sampled from
99
- all the proposals with IOU>threshold & >=1 visible keypoint.
100
-
101
- Here, the proposals are first sampled from all proposals with
102
- IOU>threshold, then proposals with no visible keypoint are filtered out.
103
- This strategy seems to make no difference on Detectron and is easier to implement.
104
- """
105
- ret = []
106
- all_num_fg = []
107
- for proposals_per_image in proposals:
108
- # If empty/unannotated image (hard negatives), skip filtering for train
109
- if len(proposals_per_image) == 0:
110
- ret.append(proposals_per_image)
111
- continue
112
- gt_keypoints = proposals_per_image.gt_keypoints.tensor
113
- # #fg x K x 3
114
- vis_mask = gt_keypoints[:, :, 2] >= 1
115
- xs, ys = gt_keypoints[:, :, 0], gt_keypoints[:, :, 1]
116
- proposal_boxes = proposals_per_image.proposal_boxes.tensor.unsqueeze(dim=1) # #fg x 1 x 4
117
- kp_in_box = (
118
- (xs >= proposal_boxes[:, :, 0])
119
- & (xs <= proposal_boxes[:, :, 2])
120
- & (ys >= proposal_boxes[:, :, 1])
121
- & (ys <= proposal_boxes[:, :, 3])
122
- )
123
- selection = (kp_in_box & vis_mask).any(dim=1)
124
- selection_idxs = nonzero_tuple(selection)[0]
125
- all_num_fg.append(selection_idxs.numel())
126
- ret.append(proposals_per_image[selection_idxs])
127
-
128
- storage = get_event_storage()
129
- storage.put_scalar("keypoint_head/num_fg_samples", np.mean(all_num_fg))
130
- return ret
131
-
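The filter in `select_proposals_with_visible_keypoints` above keeps a proposal when at least one keypoint is both visible and inside the proposal box. A plain-Python sketch of that test follows. The function names and the tuple layout `(x, y, visibility)` are assumptions made for the example.

```python
def keep_proposal(keypoints, box):
    """keypoints: list of (x, y, visibility); box: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return any(v >= 1 and x0 <= x <= x1 and y0 <= y <= y1 for x, y, v in keypoints)


def filter_proposals(boxes, keypoints_per_box):
    """Return indices of proposals with a visible keypoint inside their box."""
    return [
        i
        for i, (box, kps) in enumerate(zip(boxes, keypoints_per_box))
        if keep_proposal(kps, box)
    ]
```

For instance, a keypoint at `(5, 5)` with visibility `0` does not qualify, and neither does a visible keypoint at `(20, 5)` for a box spanning `(0, 0, 10, 10)`.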
132
-
133
- class ROIHeads(torch.nn.Module):
134
- """
135
- ROIHeads perform all per-region computation in an R-CNN.
136
-
137
- It typically contains logic to
138
-
139
- 1. (in training only) match proposals with ground truth and sample them
140
- 2. crop the regions and extract per-region features using proposals
141
- 3. make per-region predictions with different heads
142
-
143
- It can have many variants, implemented as subclasses of this class.
144
- This base class contains the logic to match/sample proposals.
145
- But it is not necessary to inherit this class if the sampling logic is not needed.
146
- """
147
-
148
- @configurable
149
- def __init__(
150
- self,
151
- *,
152
- num_classes,
153
- batch_size_per_image,
154
- positive_fraction,
155
- proposal_matcher,
156
- proposal_append_gt=True,
157
- ):
158
- """
159
- NOTE: this interface is experimental.
160
-
161
- Args:
162
- num_classes (int): number of foreground classes (i.e. background is not included)
163
- batch_size_per_image (int): number of proposals to sample for training
164
- positive_fraction (float): fraction of positive (foreground) proposals
165
- to sample for training.
166
- proposal_matcher (Matcher): matcher that matches proposals and ground truth
167
- proposal_append_gt (bool): whether to include ground truth as proposals as well
168
- """
169
- super().__init__()
170
- self.batch_size_per_image = batch_size_per_image
171
- self.positive_fraction = positive_fraction
172
- self.num_classes = num_classes
173
- self.proposal_matcher = proposal_matcher
174
- self.proposal_append_gt = proposal_append_gt
175
-
176
- @classmethod
177
- def from_config(cls, cfg):
178
- return {
179
- "batch_size_per_image": cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE,
180
- "positive_fraction": cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION,
181
- "num_classes": cfg.MODEL.ROI_HEADS.NUM_CLASSES,
182
- "proposal_append_gt": cfg.MODEL.ROI_HEADS.PROPOSAL_APPEND_GT,
183
- # Matcher to assign box proposals to gt boxes
184
- "proposal_matcher": Matcher(
185
- cfg.MODEL.ROI_HEADS.IOU_THRESHOLDS,
186
- cfg.MODEL.ROI_HEADS.IOU_LABELS,
187
- allow_low_quality_matches=False,
188
- ),
189
- }
190
-
191
- def _sample_proposals(
192
- self, matched_idxs: torch.Tensor, matched_labels: torch.Tensor, gt_classes: torch.Tensor
193
- ) -> Tuple[torch.Tensor, torch.Tensor]:
194
- """
195
- Based on the matching between N proposals and M groundtruth,
196
- sample the proposals and set their classification labels.
197
-
198
- Args:
199
- matched_idxs (Tensor): a vector of length N, each is the best-matched
200
- gt index in [0, M) for each proposal.
201
- matched_labels (Tensor): a vector of length N, the matcher's label
202
- (one of cfg.MODEL.ROI_HEADS.IOU_LABELS) for each proposal.
203
- gt_classes (Tensor): a vector of length M.
204
-
205
- Returns:
206
- Tensor: a vector of indices of sampled proposals. Each is in [0, N).
207
- Tensor: a vector of the same length, the classification label for
208
- each sampled proposal. Each sample is labeled as either a category in
209
- [0, num_classes) or the background (num_classes).
210
- """
211
- has_gt = gt_classes.numel() > 0
212
- # Get the corresponding GT for each proposal
213
- if has_gt:
214
- gt_classes = gt_classes[matched_idxs]
215
- # Label unmatched proposals (0 label from matcher) as background (label=num_classes)
216
- gt_classes[matched_labels == 0] = self.num_classes
217
- # Label ignore proposals (-1 label)
218
- gt_classes[matched_labels == -1] = -1
219
- else:
220
- gt_classes = torch.zeros_like(matched_idxs) + self.num_classes
221
-
222
- sampled_fg_idxs, sampled_bg_idxs = subsample_labels(
223
- gt_classes, self.batch_size_per_image, self.positive_fraction, self.num_classes
224
- )
225
-
226
- sampled_idxs = torch.cat([sampled_fg_idxs, sampled_bg_idxs], dim=0)
227
- return sampled_idxs, gt_classes[sampled_idxs]
228
-
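The labeling step of `_sample_proposals` above (before `subsample_labels` is called) can be sketched with plain lists. The helper name `assign_labels` is hypothetical; the fg/bg subsampling itself is not reproduced here.

```python
def assign_labels(matched_idxs, matched_labels, gt_classes, num_classes):
    """Map each proposal to a class id, background (num_classes), or ignore (-1)."""
    if len(gt_classes) == 0:
        # No ground truth in the image: every proposal is background.
        return [num_classes] * len(matched_idxs)
    labels = []
    for idx, matcher_label in zip(matched_idxs, matched_labels):
        if matcher_label == 0:
            labels.append(num_classes)  # unmatched -> background
        elif matcher_label == -1:
            labels.append(-1)  # ignored by the matcher
        else:
            labels.append(gt_classes[idx])  # class of the best-matched GT box
    return labels
```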
229
- @torch.no_grad()
230
- def label_and_sample_proposals(
231
- self, proposals: List[Instances], targets: List[Instances]
232
- ) -> List[Instances]:
233
- """
234
- Prepare some proposals to be used to train the ROI heads.
235
- It performs box matching between `proposals` and `targets`, and assigns
236
- training labels to the proposals.
237
- It returns ``self.batch_size_per_image`` random samples from proposals and groundtruth
238
- boxes, with a fraction of positives that is no larger than
239
- ``self.positive_fraction``.
240
-
241
- Args:
242
- See :meth:`ROIHeads.forward`
243
-
244
- Returns:
245
- list[Instances]:
246
- length `N` list of `Instances`s containing the proposals
247
- sampled for training. Each `Instances` has the following fields:
248
-
249
- - proposal_boxes: the proposal boxes
250
- - gt_boxes: the ground-truth box that the proposal is assigned to
251
- (this is only meaningful if the proposal has a label > 0; if label = 0
252
- then the ground-truth box is random)
253
-
254
- Other fields such as "gt_classes", "gt_masks", that are included in `targets`.
255
- """
256
- # Augment proposals with ground-truth boxes.
257
- # In the case of learned proposals (e.g., RPN), when training starts
258
- # the proposals will be low quality due to random initialization.
259
- # It's possible that none of these initial
260
- # proposals have high enough overlap with the gt objects to be used
261
- # as positive examples for the second stage components (box head,
262
- # cls head, mask head). Adding the gt boxes to the set of proposals
263
- # ensures that the second stage components will have some positive
264
- # examples from the start of training. For RPN, this augmentation improves
265
- # convergence and empirically improves box AP on COCO by about 0.5
266
- # points (under one tested configuration).
267
- if self.proposal_append_gt:
268
- proposals = add_ground_truth_to_proposals(targets, proposals)
269
-
270
- proposals_with_gt = []
271
-
272
- num_fg_samples = []
273
- num_bg_samples = []
274
- for proposals_per_image, targets_per_image in zip(proposals, targets):
275
- has_gt = len(targets_per_image) > 0
276
- match_quality_matrix = pairwise_iou(
277
- targets_per_image.gt_boxes, proposals_per_image.proposal_boxes
278
- )
279
- matched_idxs, matched_labels = self.proposal_matcher(match_quality_matrix)
280
- sampled_idxs, gt_classes = self._sample_proposals(
281
- matched_idxs, matched_labels, targets_per_image.gt_classes
282
- )
283
-
284
- # Set target attributes of the sampled proposals:
285
- proposals_per_image = proposals_per_image[sampled_idxs]
286
- proposals_per_image.gt_classes = gt_classes
287
-
288
- if has_gt:
289
- sampled_targets = matched_idxs[sampled_idxs]
290
- # We index all the attributes of targets that start with "gt_"
291
- # and have not been added to proposals yet (="gt_classes").
292
- # NOTE: here the indexing wastes some compute, because heads
293
- # like masks, keypoints, etc, will filter the proposals again,
294
- # (by foreground/background, or number of keypoints in the image, etc)
295
- # so we essentially index the data twice.
296
- for (trg_name, trg_value) in targets_per_image.get_fields().items():
297
- if trg_name.startswith("gt_") and not proposals_per_image.has(trg_name):
298
- proposals_per_image.set(trg_name, trg_value[sampled_targets])
299
- # If no GT is given in the image, we don't know what a dummy gt value can be.
300
- # Therefore the returned proposals won't have any gt_* fields, except for a
301
- # gt_classes full of background label.
302
-
303
- num_bg_samples.append((gt_classes == self.num_classes).sum().item())
304
- num_fg_samples.append(gt_classes.numel() - num_bg_samples[-1])
305
- proposals_with_gt.append(proposals_per_image)
306
-
307
- # Log the number of fg/bg samples that are selected for training ROI heads
308
- storage = get_event_storage()
309
- storage.put_scalar("roi_head/num_fg_samples", np.mean(num_fg_samples))
310
- storage.put_scalar("roi_head/num_bg_samples", np.mean(num_bg_samples))
311
-
312
- return proposals_with_gt
313
-
314
- def forward(
315
- self,
316
- images: ImageList,
317
- features: Dict[str, torch.Tensor],
318
- proposals: List[Instances],
319
- targets: Optional[List[Instances]] = None,
320
- ) -> Tuple[List[Instances], Dict[str, torch.Tensor]]:
321
- """
322
- Args:
323
- images (ImageList):
324
- features (dict[str,Tensor]): input data as a mapping from feature
325
- map name to tensor. Axis 0 represents the number of images `N` in
326
- the input data; axes 1-3 are channels, height, and width, which may
327
- vary between feature maps (e.g., if a feature pyramid is used).
328
- proposals (list[Instances]): length `N` list of `Instances`. The i-th
329
- `Instances` contains object proposals for the i-th input image,
330
- with fields "proposal_boxes" and "objectness_logits".
331
- targets (list[Instances], optional): length `N` list of `Instances`. The i-th
332
- `Instances` contains the ground-truth per-instance annotations
333
- for the i-th input image. Specify `targets` during training only.
334
- It may have the following fields:
335
-
336
- - gt_boxes: the bounding box of each instance.
337
- - gt_classes: the label for each instance with a category ranging in [0, #class].
338
- - gt_masks: PolygonMasks or BitMasks, the ground-truth masks of each instance.
339
- - gt_keypoints: NxKx3, the ground-truth keypoints for each instance.
340
-
341
- Returns:
342
- list[Instances]: length `N` list of `Instances` containing the
343
- detected instances. Returned during inference only; may be [] during training.
344
-
345
- dict[str->Tensor]:
346
- mapping from a named loss to a tensor storing the loss. Used during training only.
347
- """
348
- raise NotImplementedError()
349
-
350
-
351
- @ROI_HEADS_REGISTRY.register()
352
- class Res5ROIHeads(ROIHeads):
353
- """
354
- The ROIHeads in a typical "C4" R-CNN model, where
355
- the box and mask head share the cropping and
356
- the per-region feature computation by a Res5 block.
357
- See :paper:`ResNet` Appendix A.
358
- """
359
-
360
- @configurable
361
- def __init__(
362
- self,
363
- *,
364
- in_features: List[str],
365
- pooler: ROIPooler,
366
- res5: nn.Module,
367
- box_predictor: nn.Module,
368
- mask_head: Optional[nn.Module] = None,
369
- **kwargs,
370
- ):
371
- """
372
- NOTE: this interface is experimental.
373
-
374
- Args:
375
- in_features (list[str]): list of backbone feature map names to use for
376
- feature extraction
377
- pooler (ROIPooler): pooler to extract region features from backbone
378
- res5 (nn.Sequential): a CNN to compute per-region features, to be used by
379
- ``box_predictor`` and ``mask_head``. Typically this is a "res5"
380
- block from a ResNet.
381
- box_predictor (nn.Module): make box predictions from the feature.
382
- Should have the same interface as :class:`FastRCNNOutputLayers`.
383
- mask_head (nn.Module): transform features to make mask predictions
384
- """
385
- super().__init__(**kwargs)
386
- self.in_features = in_features
387
- self.pooler = pooler
388
- if isinstance(res5, (list, tuple)):
389
- res5 = nn.Sequential(*res5)
390
- self.res5 = res5
391
- self.box_predictor = box_predictor
392
- self.mask_on = mask_head is not None
393
- if self.mask_on:
394
- self.mask_head = mask_head
395
-
396
- @classmethod
397
- def from_config(cls, cfg, input_shape):
398
- # fmt: off
399
- ret = super().from_config(cfg)
400
- in_features = ret["in_features"] = cfg.MODEL.ROI_HEADS.IN_FEATURES
401
- pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
402
- pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE
403
- pooler_scales = (1.0 / input_shape[in_features[0]].stride, )
404
- sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
405
- mask_on = cfg.MODEL.MASK_ON
406
- # fmt: on
407
- assert not cfg.MODEL.KEYPOINT_ON
408
- assert len(in_features) == 1
409
-
410
- ret["pooler"] = ROIPooler(
411
- output_size=pooler_resolution,
412
- scales=pooler_scales,
413
- sampling_ratio=sampling_ratio,
414
- pooler_type=pooler_type,
415
- )
416
-
417
- # Compatibility with old moco code. Might be useful.
418
- # See notes in StandardROIHeads.from_config
419
- if not inspect.ismethod(cls._build_res5_block):
420
- logger.warning(
421
- "The behavior of _build_res5_block may change. "
422
- "Please do not depend on private methods."
423
- )
424
- cls._build_res5_block = classmethod(cls._build_res5_block)
425
-
426
- ret["res5"], out_channels = cls._build_res5_block(cfg)
427
- ret["box_predictor"] = FastRCNNOutputLayers(
428
- cfg, ShapeSpec(channels=out_channels, height=1, width=1)
429
- )
430
-
431
- if mask_on:
432
- ret["mask_head"] = build_mask_head(
433
- cfg,
434
- ShapeSpec(channels=out_channels, width=pooler_resolution, height=pooler_resolution),
435
- )
436
- return ret
437
-
438
- @classmethod
439
- def _build_res5_block(cls, cfg):
440
- # fmt: off
441
- stage_channel_factor = 2 ** 3 # res5 is 8x res2
442
- num_groups = cfg.MODEL.RESNETS.NUM_GROUPS
443
- width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP
444
- bottleneck_channels = num_groups * width_per_group * stage_channel_factor
445
- out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS * stage_channel_factor
446
- stride_in_1x1 = cfg.MODEL.RESNETS.STRIDE_IN_1X1
447
- norm = cfg.MODEL.RESNETS.NORM
448
- assert not cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE[-1], \
449
- "Deformable conv is not yet supported in res5 head."
450
- # fmt: on
451
-
452
- blocks = ResNet.make_stage(
453
- BottleneckBlock,
454
- 3,
455
- stride_per_block=[2, 1, 1],
456
- in_channels=out_channels // 2,
457
- bottleneck_channels=bottleneck_channels,
458
- out_channels=out_channels,
459
- num_groups=num_groups,
460
- norm=norm,
461
- stride_in_1x1=stride_in_1x1,
462
- )
463
- return nn.Sequential(*blocks), out_channels
464
-
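The channel arithmetic in `_build_res5_block` above is easy to verify by hand. The sketch below repeats it in a standalone function. The name `res5_channels` is hypothetical, and the sample values (1 group, width 64, 256 res2 output channels) are assumed to be typical ResNet-50 settings rather than values read from this repository's configs.

```python
def res5_channels(num_groups, width_per_group, res2_out_channels):
    """Channel sizes of the res5 stage, derived the same way as in the diff."""
    stage_channel_factor = 2 ** 3  # res5 is 8x res2
    bottleneck_channels = num_groups * width_per_group * stage_channel_factor
    out_channels = res2_out_channels * stage_channel_factor
    return {
        "in_channels": out_channels // 2,  # res4 output feeds res5
        "bottleneck_channels": bottleneck_channels,
        "out_channels": out_channels,
    }


# Assumed ResNet-50 style settings, for illustration only.
channels = res5_channels(num_groups=1, width_per_group=64, res2_out_channels=256)
print(channels)
```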
465
- def _shared_roi_transform(self, features: List[torch.Tensor], boxes: List[Boxes]):
466
- x = self.pooler(features, boxes)
467
- return self.res5(x)
468
-
469
- def forward(
470
- self,
471
- images: ImageList,
472
- features: Dict[str, torch.Tensor],
473
- proposals: List[Instances],
474
- targets: Optional[List[Instances]] = None,
475
- ):
476
- """
477
- See :meth:`ROIHeads.forward`.
478
- """
479
- del images
480
-
481
- if self.training:
482
- assert targets
483
- proposals = self.label_and_sample_proposals(proposals, targets)
484
- del targets
485
-
486
- proposal_boxes = [x.proposal_boxes for x in proposals]
487
- box_features = self._shared_roi_transform(
488
- [features[f] for f in self.in_features], proposal_boxes
489
- )
490
- predictions = self.box_predictor(box_features.mean(dim=[2, 3]))
491
-
492
- if self.training:
493
- del features
494
- losses = self.box_predictor.losses(predictions, proposals)
495
- if self.mask_on:
496
- proposals, fg_selection_masks = select_foreground_proposals(
497
- proposals, self.num_classes
498
- )
499
- # Since the ROI feature transform is shared between boxes and masks,
500
- # we don't need to recompute features. The mask loss is only defined
501
- # on foreground proposals, so we need to select out the foreground
502
- # features.
503
- mask_features = box_features[torch.cat(fg_selection_masks, dim=0)]
504
- del box_features
505
- losses.update(self.mask_head(mask_features, proposals))
506
- return [], losses
507
- else:
508
- pred_instances, _ = self.box_predictor.inference(predictions, proposals)
509
- pred_instances = self.forward_with_given_boxes(features, pred_instances)
510
- return pred_instances, {}
511
-
512
- def forward_with_given_boxes(
513
- self, features: Dict[str, torch.Tensor], instances: List[Instances]
514
- ) -> List[Instances]:
515
- """
516
- Use the given boxes in `instances` to produce other (non-box) per-ROI outputs.
517
-
518
- Args:
519
- features: same as in `forward()`
520
- instances (list[Instances]): instances to predict other outputs. Expect the keys
521
- "pred_boxes" and "pred_classes" to exist.
522
-
523
- Returns:
524
- instances (Instances):
525
- the same `Instances` object, with extra
526
- fields such as `pred_masks` or `pred_keypoints`.
527
- """
528
- assert not self.training
529
- assert instances[0].has("pred_boxes") and instances[0].has("pred_classes")
530
-
531
- if self.mask_on:
532
- feature_list = [features[f] for f in self.in_features]
533
- x = self._shared_roi_transform(feature_list, [x.pred_boxes for x in instances])
534
- return self.mask_head(x, instances)
535
- else:
536
- return instances
537
-
538
-
539
- @ROI_HEADS_REGISTRY.register()
540
- class CustomStandardROIHeads(ROIHeads):
541
- """
542
- It's "standard" in a sense that there is no ROI transform sharing
543
- or feature sharing between tasks.
544
- Each head independently processes the input features by each head's
545
- own pooler and head.
546
-
547
- This class is used by most models, such as FPN and C5.
548
- To implement more models, you can subclass it and implement a different
549
- :meth:`forward()` or a head.
550
- """
551
-
552
- @configurable
553
- def __init__(
554
- self,
555
- *,
556
- box_in_features: List[str],
557
- box_pooler: ROIPooler,
558
- box_head: nn.Module,
559
- box_predictor: nn.Module,
560
- mask_in_features: Optional[List[str]] = None,
561
- mask_pooler: Optional[ROIPooler] = None,
562
- mask_head: Optional[nn.Module] = None,
563
- keypoint_in_features: Optional[List[str]] = None,
564
- keypoint_pooler: Optional[ROIPooler] = None,
565
- keypoint_head: Optional[nn.Module] = None,
566
- train_on_pred_boxes: bool = False,
567
- box2box_transform = Box2BoxTransform,
568
- use_droploss: bool = False,
569
- droploss_iou_thresh: float = 1.0,
570
- **kwargs,
571
- ):
572
- """
573
- NOTE: this interface is experimental.
574
-
575
- Args:
576
- box_in_features (list[str]): list of feature names to use for the box head.
577
- box_pooler (ROIPooler): pooler to extract region features for box head
578
- box_head (nn.Module): transform features to make box predictions
579
- box_predictor (nn.Module): make box predictions from the feature.
580
- Should have the same interface as :class:`FastRCNNOutputLayers`.
581
- mask_in_features (list[str]): list of feature names to use for the mask
582
- pooler or mask head. None if not using mask head.
583
- mask_pooler (ROIPooler): pooler to extract region features from image features.
584
- The mask head will then take region features to make predictions.
585
- If None, the mask head will directly take the dict of image features
586
- defined by `mask_in_features`
587
- mask_head (nn.Module): transform features to make mask predictions
588
- keypoint_in_features, keypoint_pooler, keypoint_head: similar to ``mask_*``.
589
- train_on_pred_boxes (bool): whether to use proposal boxes or
590
- predicted boxes from the box head to train other heads.
591
- """
592
- super().__init__(**kwargs)
593
- # keep self.in_features for backward compatibility
594
- self.in_features = self.box_in_features = box_in_features
595
- self.box_pooler = box_pooler
596
- self.box_head = box_head
597
- self.box_predictor = box_predictor
598
-
599
- self.mask_on = mask_in_features is not None
600
- if self.mask_on:
601
- self.mask_in_features = mask_in_features
602
- self.mask_pooler = mask_pooler
603
- self.mask_head = mask_head
604
-
605
- self.keypoint_on = keypoint_in_features is not None
606
- if self.keypoint_on:
607
- self.keypoint_in_features = keypoint_in_features
608
- self.keypoint_pooler = keypoint_pooler
609
- self.keypoint_head = keypoint_head
610
-
611
- self.train_on_pred_boxes = train_on_pred_boxes
612
- self.use_droploss = use_droploss
613
- self.box2box_transform = box2box_transform
614
- self.droploss_iou_thresh = droploss_iou_thresh
615
-
616
- @classmethod
617
- def from_config(cls, cfg, input_shape):
618
- ret = super().from_config(cfg)
619
- ret["train_on_pred_boxes"] = cfg.MODEL.ROI_BOX_HEAD.TRAIN_ON_PRED_BOXES
620
- # Subclasses that have not been updated to use from_config style construction
621
- # may have overridden _init_*_head methods. In this case, those overridden methods
622
- # will not be classmethods and we need to avoid trying to call them here.
623
- # We test for this with ismethod which only returns True for bound methods of cls.
624
- # Such subclasses will need to handle calling their overridden _init_*_head methods.
625
- if cfg.MODEL.ROI_HEADS.USE_DROPLOSS:
626
- ret['use_droploss'] = True
627
- ret['droploss_iou_thresh'] = cfg.MODEL.ROI_HEADS.DROPLOSS_IOU_THRESH
628
- ret['box2box_transform'] = Box2BoxTransform(weights=cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_WEIGHTS)
629
- if inspect.ismethod(cls._init_box_head):
630
- ret.update(cls._init_box_head(cfg, input_shape))
631
- if inspect.ismethod(cls._init_mask_head):
632
- ret.update(cls._init_mask_head(cfg, input_shape))
633
- if inspect.ismethod(cls._init_keypoint_head):
634
- ret.update(cls._init_keypoint_head(cfg, input_shape))
635
- return ret
636
-
637
- @classmethod
638
- def _init_box_head(cls, cfg, input_shape):
639
- # fmt: off
640
- in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
641
- pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
642
- pooler_scales = tuple(1.0 / input_shape[k].stride for k in in_features)
643
- sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
644
- pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE
645
- # fmt: on
646
-
647
- # If CustomStandardROIHeads is applied on multiple feature maps (as in FPN),
648
- # then we share the same predictors and therefore the channel counts must be the same
649
- in_channels = [input_shape[f].channels for f in in_features]
650
- # Check all channel counts are equal
651
- assert len(set(in_channels)) == 1, in_channels
652
- in_channels = in_channels[0]
653
-
654
- box_pooler = ROIPooler(
655
- output_size=pooler_resolution,
656
- scales=pooler_scales,
657
- sampling_ratio=sampling_ratio,
658
- pooler_type=pooler_type,
659
- )
660
- # Here we split "box head" and "box predictor", which is mainly due to historical reasons.
661
- # They are used together so the "box predictor" layers should be part of the "box head".
662
- # New subclasses of ROIHeads do not need "box predictor"s.
663
- box_head = build_box_head(
664
- cfg, ShapeSpec(channels=in_channels, height=pooler_resolution, width=pooler_resolution)
665
- )
666
- box_predictor = FastRCNNOutputLayers(cfg, box_head.output_shape)
667
- return {
668
- "box_in_features": in_features,
669
- "box_pooler": box_pooler,
670
- "box_head": box_head,
671
- "box_predictor": box_predictor,
672
- }
673
-
674
- @classmethod
675
- def _init_mask_head(cls, cfg, input_shape):
676
- if not cfg.MODEL.MASK_ON:
677
- return {}
678
- # fmt: off
679
- in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
680
- pooler_resolution = cfg.MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION
681
- pooler_scales = tuple(1.0 / input_shape[k].stride for k in in_features)
682
- sampling_ratio = cfg.MODEL.ROI_MASK_HEAD.POOLER_SAMPLING_RATIO
683
- pooler_type = cfg.MODEL.ROI_MASK_HEAD.POOLER_TYPE
684
- # fmt: on
685
-
686
- in_channels = [input_shape[f].channels for f in in_features][0]
687
-
688
- ret = {"mask_in_features": in_features}
689
- ret["mask_pooler"] = (
690
- ROIPooler(
691
- output_size=pooler_resolution,
692
- scales=pooler_scales,
693
- sampling_ratio=sampling_ratio,
694
- pooler_type=pooler_type,
695
- )
696
- if pooler_type
697
- else None
698
- )
699
- if pooler_type:
700
- shape = ShapeSpec(
701
- channels=in_channels, width=pooler_resolution, height=pooler_resolution
702
- )
703
- else:
704
- shape = {f: input_shape[f] for f in in_features}
705
- ret["mask_head"] = build_mask_head(cfg, shape)
706
- return ret
707
-
708
- @classmethod
709
- def _init_keypoint_head(cls, cfg, input_shape):
710
- if not cfg.MODEL.KEYPOINT_ON:
711
- return {}
712
- # fmt: off
713
- in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
714
- pooler_resolution = cfg.MODEL.ROI_KEYPOINT_HEAD.POOLER_RESOLUTION
715
- pooler_scales = tuple(1.0 / input_shape[k].stride for k in in_features) # noqa
716
- sampling_ratio = cfg.MODEL.ROI_KEYPOINT_HEAD.POOLER_SAMPLING_RATIO
717
- pooler_type = cfg.MODEL.ROI_KEYPOINT_HEAD.POOLER_TYPE
718
- # fmt: on
719
-
720
- in_channels = [input_shape[f].channels for f in in_features][0]
721
-
722
- ret = {"keypoint_in_features": in_features}
723
- ret["keypoint_pooler"] = (
724
- ROIPooler(
725
- output_size=pooler_resolution,
726
- scales=pooler_scales,
727
- sampling_ratio=sampling_ratio,
728
- pooler_type=pooler_type,
729
- )
730
- if pooler_type
731
- else None
732
- )
733
- if pooler_type:
734
- shape = ShapeSpec(
735
- channels=in_channels, width=pooler_resolution, height=pooler_resolution
736
- )
737
- else:
738
- shape = {f: input_shape[f] for f in in_features}
739
- ret["keypoint_head"] = build_keypoint_head(cfg, shape)
740
- return ret
741
-
742
- def forward(
743
- self,
744
- images: ImageList,
745
- features: Dict[str, torch.Tensor],
746
- proposals: List[Instances],
747
- targets: Optional[List[Instances]] = None,
748
- ) -> Tuple[List[Instances], Dict[str, torch.Tensor]]:
749
- """
750
- See :class:`ROIHeads.forward`.
751
- """
752
- del images
753
- if self.training:
754
- assert targets, "'targets' argument is required during training"
755
- proposals = self.label_and_sample_proposals(proposals, targets)
756
- del targets
757
-
758
- if self.training:
759
- losses = self._forward_box(features, proposals)
760
- # Usually the original proposals used by the box head are used by the mask, keypoint
761
- # heads. But when `self.train_on_pred_boxes is True`, proposals will contain boxes
762
- # predicted by the box head.
763
- losses.update(self._forward_mask(features, proposals))
764
- losses.update(self._forward_keypoint(features, proposals))
765
- return proposals, losses
766
- else:
767
- pred_instances = self._forward_box(features, proposals)
768
- # During inference cascaded prediction is used: the mask and keypoints heads are only
769
- # applied to the top scoring box detections.
770
- pred_instances = self.forward_with_given_boxes(features, pred_instances)
771
- return pred_instances, {}
772
-
773
- def forward_with_given_boxes(
774
- self, features: Dict[str, torch.Tensor], instances: List[Instances]
775
- ) -> List[Instances]:
776
- """
777
- Use the given boxes in `instances` to produce other (non-box) per-ROI outputs.
778
-
779
- This is useful for downstream tasks where a box is known but other attributes
779
- (outputs of other heads) still need to be obtained.
781
- Test-time augmentation also uses this.
782
-
783
- Args:
784
- features: same as in `forward()`
785
- instances (list[Instances]): instances to predict other outputs. Expect the keys
786
- "pred_boxes" and "pred_classes" to exist.
787
-
788
- Returns:
789
- list[Instances]:
790
- the same `Instances` objects, with extra
791
- fields such as `pred_masks` or `pred_keypoints`.
792
- """
793
- assert not self.training
794
- assert instances[0].has("pred_boxes") and instances[0].has("pred_classes")
795
-
796
- instances = self._forward_mask(features, instances)
797
- instances = self._forward_keypoint(features, instances)
798
- return instances
799
-
800
- def _forward_box(self, features: Dict[str, torch.Tensor], proposals: List[Instances]):
801
- """
802
- Forward logic of the box prediction branch. If `self.train_on_pred_boxes is True`,
803
- the function puts predicted boxes in the `proposal_boxes` field of `proposals` argument.
804
-
805
- Args:
806
- features (dict[str, Tensor]): mapping from feature map names to tensor.
807
- Same as in :meth:`ROIHeads.forward`.
808
- proposals (list[Instances]): the per-image object proposals with
809
- their matching ground truth.
810
- Each has fields "proposal_boxes", and "objectness_logits",
811
- "gt_classes", "gt_boxes".
812
-
813
- Returns:
814
- In training, a dict of losses.
815
- In inference, a list of `Instances`, the predicted instances.
816
- """
817
- features = [features[f] for f in self.box_in_features]
818
- box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals]) # torch.Size([512 * batch_size, 256, 7, 7])
819
- box_features = self.box_head(box_features) # torch.Size([512 * batch_size, 1024])
820
- predictions = self.box_predictor(box_features) # [torch.Size([512 * batch_size, 2]), torch.Size([512 * batch_size, 4])]
821
-
822
- no_gt_found = False
823
- if self.use_droploss and self.training:
824
- # the first K proposals are GT proposals
825
- try:
826
- box_num_list = [len(x.gt_boxes) for x in proposals]
827
- gt_num_list = [torch.unique(x.gt_boxes.tensor[:100], dim=0).size()[0] for x in proposals]
828
- except Exception:
829
- box_num_list = [0 for _ in proposals]
830
- gt_num_list = [0 for _ in proposals]
831
- no_gt_found = True
832
-
833
- if self.use_droploss and self.training and not no_gt_found:
834
- # NOTE: maximum overlapping with GT (IoU)
835
- predictions_delta = predictions[1]
836
- proposal_boxes = Boxes.cat([x.proposal_boxes for x in proposals])
837
- predictions_bbox = self.box2box_transform.apply_deltas(predictions_delta, proposal_boxes.tensor)
838
- idx_start = 0
839
- iou_max_list = []
840
- for idx, x in enumerate(proposals):
841
- idx_end = idx_start + box_num_list[idx]
842
- iou_max_list.append(pairwise_iou_max_scores(predictions_bbox[idx_start:idx_end], x.gt_boxes[:gt_num_list[idx]].tensor))
843
- idx_start = idx_end
844
- iou_max = torch.cat(iou_max_list, dim=0)
845
-
846
- del box_features
847
-
848
- if self.training:
849
- if self.use_droploss and not no_gt_found:
850
- weights = iou_max.le(self.droploss_iou_thresh).float()
851
- weights = 1 - weights.ge(1.0).float()
852
- losses = self.box_predictor.losses(predictions, proposals, weights=weights.detach())
853
- else:
854
- losses = self.box_predictor.losses(predictions, proposals)
855
- if self.train_on_pred_boxes: # default is false
856
- with torch.no_grad():
857
- pred_boxes = self.box_predictor.predict_boxes_for_gt_classes(
858
- predictions, proposals
859
- )
860
- for proposals_per_image, pred_boxes_per_image in zip(proposals, pred_boxes):
861
- proposals_per_image.proposal_boxes = Boxes(pred_boxes_per_image)
862
- return losses
863
- else:
864
- pred_instances, _ = self.box_predictor.inference(predictions, proposals)
865
- return pred_instances
866
-
867
- def _forward_mask(self, features: Dict[str, torch.Tensor], instances: List[Instances]):
868
- """
869
- Forward logic of the mask prediction branch.
870
-
871
- Args:
872
- features (dict[str, Tensor]): mapping from feature map names to tensor.
873
- Same as in :meth:`ROIHeads.forward`.
874
- instances (list[Instances]): the per-image instances to train/predict masks.
875
- In training, they can be the proposals.
876
- In inference, they can be the boxes predicted by R-CNN box head.
877
-
878
- Returns:
879
- In training, a dict of losses.
880
- In inference, update `instances` with new fields "pred_masks" and return it.
881
- """
882
- if not self.mask_on:
883
- return {} if self.training else instances
884
-
885
- if self.training:
886
- # head is only trained on positive proposals.
887
- instances, _ = select_foreground_proposals(instances, self.num_classes)
888
-
889
- if self.mask_pooler is not None:
890
- features = [features[f] for f in self.mask_in_features]
891
- boxes = [x.proposal_boxes if self.training else x.pred_boxes for x in instances]
892
- features = self.mask_pooler(features, boxes)
893
- else:
894
- features = {f: features[f] for f in self.mask_in_features}
895
- return self.mask_head(features, instances)
896
-
897
- def _forward_keypoint(self, features: Dict[str, torch.Tensor], instances: List[Instances]):
898
- """
899
- Forward logic of the keypoint prediction branch.
900
-
901
- Args:
902
- features (dict[str, Tensor]): mapping from feature map names to tensor.
903
- Same as in :meth:`ROIHeads.forward`.
904
- instances (list[Instances]): the per-image instances to train/predict keypoints.
905
- In training, they can be the proposals.
906
- In inference, they can be the boxes predicted by R-CNN box head.
907
-
908
- Returns:
909
- In training, a dict of losses.
910
- In inference, update `instances` with new fields "pred_keypoints" and return it.
911
- """
912
- if not self.keypoint_on:
913
- return {} if self.training else instances
914
-
915
- if self.training:
916
- # head is only trained on positive proposals with >=1 visible keypoints.
917
- instances, _ = select_foreground_proposals(instances, self.num_classes)
918
- instances = select_proposals_with_visible_keypoints(instances)
919
-
920
- if self.keypoint_pooler is not None:
921
- features = [features[f] for f in self.keypoint_in_features]
922
- boxes = [x.proposal_boxes if self.training else x.pred_boxes for x in instances]
923
- features = self.keypoint_pooler(features, boxes)
924
- else:
925
- features = {f: features[f] for f in self.keypoint_in_features}
926
- return self.keypoint_head(features, instances)
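The DropLoss branch in `_forward_box` above reduces to a per-prediction 0/1 weight: a predicted box keeps its loss only if its best IoU with any ground-truth box exceeds `droploss_iou_thresh`. The following is a minimal pure-Python sketch of that masking, not the code from this commit. It assumes `pairwise_iou_max_scores` returns the maximum IoU of each predicted box over the ground-truth boxes (its definition is not shown in this diff), and the helper names `box_iou` and `droploss_weights` are hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def droploss_weights(
    pred_boxes: Sequence[Box], gt_boxes: Sequence[Box], iou_thresh: float
) -> List[float]:
    """Hypothetical helper: 1.0 keeps a prediction's loss, 0.0 drops it.

    Equivalent to the two-step expression in the diff:
        weights = iou_max.le(thresh).float()
        weights = 1 - weights.ge(1.0).float()
    which is 0 where iou_max <= thresh and 1 elsewhere.
    """
    weights = []
    for pred in pred_boxes:
        # Best overlap of this prediction with any ground-truth box.
        iou_max = max((box_iou(pred, gt) for gt in gt_boxes), default=0.0)
        weights.append(1.0 if iou_max > iou_thresh else 0.0)
    return weights


preds = [(0, 0, 10, 10), (100, 100, 110, 110)]
gts = [(0, 0, 10, 10)]
weights = droploss_weights(preds, gts, iou_thresh=0.01)
```

Here the first prediction coincides with the ground-truth box and keeps its loss, while the second does not overlap any ground-truth box and is dropped. In the diff the resulting tensor is passed as `weights=` to `self.box_predictor.losses`.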
cutler/solver/__init__.py DELETED
@@ -1,5 +0,0 @@
1
- # Copyright (c) Meta Platforms, Inc. and affiliates.
2
-
3
- from .build import build_lr_scheduler, build_optimizer, get_default_optimizer_params
4
-
5
- __all__ = [k for k in globals().keys() if not k.startswith("_")]