jayyap committed on
Commit
9d51df0
·
1 Parent(s): c69f654

Upload 28 files

.gitignore ADDED
@@ -0,0 +1,143 @@
1
+ .idea/
2
+ .cog
3
+ oryx-build-commands.txt
4
+ tmp/*.png
5
+ *.png
6
+
7
+ training/
8
+
9
+ *.pth
10
+ *.pt
11
+ *.ckpt
12
+
13
+ my_fix.py
14
+
15
+ # Byte-compiled / optimized / DLL files
16
+ __pycache__/
17
+ *.py[cod]
18
+ *$py.class
19
+
20
+ # C extensions
21
+ *.so
22
+
23
+ # Distribution / packaging
24
+ .Python
25
+ build/
26
+ develop-eggs/
27
+ dist/
28
+ downloads/
29
+ eggs/
30
+ .eggs/
31
+ lib/
32
+ lib64/
33
+ parts/
34
+ sdist/
35
+ var/
36
+ wheels/
37
+ pip-wheel-metadata/
38
+ share/python-wheels/
39
+ *.egg-info/
40
+ .installed.cfg
41
+ *.egg
42
+ MANIFEST
43
+
44
+ # PyInstaller
45
+ # Usually these files are written by a python script from a template
46
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
47
+ *.manifest
48
+ *.spec
49
+
50
+ # Installer logs
51
+ pip-log.txt
52
+ pip-delete-this-directory.txt
53
+
54
+ # Unit test / coverage reports
55
+ htmlcov/
56
+ .tox/
57
+ .nox/
58
+ .coverage
59
+ .coverage.*
60
+ .cache
61
+ nosetests.xml
62
+ coverage.xml
63
+ *.cover
64
+ *.py,cover
65
+ .hypothesis/
66
+ .pytest_cache/
67
+
68
+ # Translations
69
+ *.mo
70
+ *.pot
71
+
72
+ # Django stuff:
73
+ *.log
74
+ local_settings.py
75
+ db.sqlite3
76
+ db.sqlite3-journal
77
+
78
+ # Flask stuff:
79
+ instance/
80
+ .webassets-cache
81
+
82
+ # Scrapy stuff:
83
+ .scrapy
84
+
85
+ # Sphinx documentation
86
+ docs/_build/
87
+
88
+ # PyBuilder
89
+ target/
90
+
91
+ # Jupyter Notebook
92
+ .ipynb_checkpoints
93
+
94
+ # IPython
95
+ profile_default/
96
+ ipython_config.py
97
+
98
+ # pyenv
99
+ .python-version
100
+
101
+ # pipenv
102
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
103
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
104
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
105
+ # install all needed dependencies.
106
+ #Pipfile.lock
107
+
108
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow
109
+ __pypackages__/
110
+
111
+ # Celery stuff
112
+ celerybeat-schedule
113
+ celerybeat.pid
114
+
115
+ # SageMath parsed files
116
+ *.sage.py
117
+
118
+ # Environments
119
+ .env
120
+ .venv
121
+ env/
122
+ venv/
123
+ ENV/
124
+ env.bak/
125
+ venv.bak/
126
+
127
+ # Spyder project settings
128
+ .spyderproject
129
+ .spyproject
130
+
131
+ # Rope project settings
132
+ .ropeproject
133
+
134
+ # mkdocs documentation
135
+ /site
136
+
137
+ # mypy
138
+ .mypy_cache/
139
+ .dmypy.json
140
+ dmypy.json
141
+
142
+ # Pyre type checker
143
+ .pyre/
FAQ.md ADDED
@@ -0,0 +1,21 @@
1
+ # FAQs
2
+
3
+ **Q:** If the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does "zero convolution" work?
4
+
5
+ **A:** This is wrong. Let us consider a very simple case,
6
+
7
+ $$y=wx+b$$
8
+
9
+ and we have
10
+
11
+ $$\partial y/\partial w=x, \partial y/\partial x=w, \partial y/\partial b=1$$
12
+
13
+ and if $w=0$ and $x \neq 0$, then
14
+
15
+ $$\partial y/\partial w \neq 0, \partial y/\partial x=0, \partial y/\partial b\neq 0$$
16
+
17
+ which means as long as $x \neq 0$, one gradient descent iteration will make $w$ non-zero. Then
18
+
19
+ $$\partial y/\partial x\neq 0$$
20
+
21
+ so that the zero convolutions progressively become ordinary conv layers with non-zero weights.
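The argument above can be checked numerically. The following is a minimal sketch, not code from this repository: it takes the scalar model $y=wx+b$ with $w=b=0$ and performs one hand-written gradient descent step. The squared-error loss, the target value, and the learning rate are illustrative assumptions.

```python
# Minimal sketch of the FAQ argument for y = w*x + b starting from w = b = 0.
# The loss, target and learning rate are illustrative assumptions.

def grads(w, b, x, target):
    """Gradients of L = 0.5 * (y - target)**2 for y = w*x + b."""
    y = w * x + b
    dL_dy = y - target
    return {
        "w": dL_dy * x,    # dy/dw = x
        "x": dL_dy * w,    # dy/dx = w
        "b": dL_dy * 1.0,  # dy/db = 1
    }


def sgd_step(w, b, x, target, lr=0.1):
    """One plain gradient descent update of w and b."""
    g = grads(w, b, x, target)
    return w - lr * g["w"], b - lr * g["b"]


w, b, x, target = 0.0, 0.0, 2.0, 1.0
before = grads(w, b, x, target)      # gradient w.r.t. x is zero, w.r.t. w is not
w1, b1 = sgd_step(w, b, x, target)   # w becomes non-zero after one step
after = grads(w1, b1, x, target)     # now the gradient w.r.t. x is non-zero too
```

Note that the step only moves $w$ when the upstream gradient (here `y - target`) is itself non-zero, which is the usual situation during training.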
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,3 +1,234 @@
1
- ---
2
- license: unknown
3
- ---
1
+ # ControlNet
2
+
3
+ Cog implementation of [Adding Conditional Control to Text-to-Image Diffusion Models](https://github.com/lllyasviel/ControlNet/raw/main/github_page/control.pdf).
4
+
5
+ To run this Cog model:
6
+ 1. clone this repo
7
+ 1. run `cog run python download_weights.py --model_type='desired-model-type-goes-here'`
8
+ 1. run `cog predict -i image='@your_img.png' -i prompt='your prompt'`
9
+ 1. push to Replicate with `cog push`, if you like
10
+
11
+ # About ControlNet
12
+
13
+ ControlNet is a neural network structure to control diffusion models by adding extra conditions.
14
+
15
+ ![img](github_page/he.png)
16
+
17
+ It copies the weights of neural network blocks into a "locked" copy and a "trainable" copy.
18
+
19
+ The "trainable" one learns your condition. The "locked" one preserves your model.
20
+
21
+ Thanks to this, training with a small dataset of image pairs will not destroy the production-ready diffusion models.
22
+
23
+ The "zero convolution" is a 1×1 convolution with both weight and bias initialized to zero.
24
+
25
+ Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion.
26
+
27
+ No layer is trained from scratch. You are still fine-tuning. Your original model is safe.
28
+
29
+ This allows training on small-scale or even personal devices.
30
+
31
+ This is also friendly to merging, replacing, or offsetting models, weights, blocks, or layers.
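To make the "no distortion before training" claim concrete, here is a small sketch in plain Python. It is an illustration under assumptions, not the repository's implementation: feature maps are nested lists indexed `[channel][row][col]`, and the helper names `conv1x1`, `zero_conv_params`, and `add_maps` are hypothetical.

```python
# Sketch: a zero-initialized 1x1 convolution applied to the trainable copy's
# output and added to the locked copy's output. All names are hypothetical.

def conv1x1(features, weight, bias):
    """out[o][i][j] = bias[o] + sum over c of weight[o][c] * features[c][i][j]."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][c] * features[c][i][j] for c in range(channels))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def zero_conv_params(in_channels, out_channels):
    """Weight and bias of a zero convolution: everything starts at zero."""
    weight = [[0.0] * in_channels for _ in range(out_channels)]
    bias = [0.0] * out_channels
    return weight, bias


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


locked_output = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
trainable_output = [[[0.5, -1.0], [2.0, 0.0]], [[1.5, 1.5], [-3.0, 4.0]]]

weight, bias = zero_conv_params(in_channels=2, out_channels=2)
control = conv1x1(trainable_output, weight, bias)  # all zeros before training
combined = add_maps(locked_output, control)        # identical to locked_output
```

Whatever the trainable copy produces, the zero convolution maps it to zeros at initialization, so the sum equals the locked model's output.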
32
+
33
+ ### FAQ
34
+
35
+ **Q:** But wait, if the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does "zero convolution" work?
36
+
37
+ **A:** This is not true. [See an explanation here](FAQ.md).
38
+
39
+ # Stable Diffusion + ControlNet
40
+
41
+ By repeating the above simple structure 14 times, we can control Stable Diffusion in this way:
42
+
43
+ ![img](github_page/sd.png)
44
+
45
+ Note that the way we connect layers is computationally efficient. The original SD encoder (the locked SD Encoder Blocks 1, 2, 3, 4 and Middle) does not need to store gradients. The required GPU memory is not much larger than that of the original SD, although many layers are added. Great!
46
+
47
+ # Production-Ready Pretrained Models
48
+
49
+ First create a new conda environment:
50
+
51
+     conda env create -f environment.yaml
+     conda activate control
53
+
54
+ All models and detectors can be downloaded from [our Hugging Face page](https://huggingface.co/lllyasviel/ControlNet). Make sure that SD models are put in "ControlNet/models" and detectors are put in "ControlNet/annotator/ckpts". Make sure that you download all necessary pretrained weights and detector models from that Hugging Face page, including the HED edge detection model, the Midas depth estimation model, Openpose, and so on.
55
+
56
+ We provide 9 Gradio apps with these models.
57
+
58
+ All test images can be found in the folder "test_imgs".
59
+
60
+ ## ControlNet with Canny Edge
61
+
62
+ Stable Diffusion 1.5 + ControlNet (using simple Canny edge detection)
63
+
64
+     python gradio_canny2image.py
65
+
66
+ The Gradio app also allows you to change the Canny edge thresholds. Just try it for more details.
67
+
68
+ Prompt: "bird"
69
+ ![p](github_page/p1.png)
70
+
71
+ Prompt: "cute dog"
72
+ ![p](github_page/p2.png)
73
+
74
+ ## ControlNet with M-LSD Lines
75
+
76
+ Stable Diffusion 1.5 + ControlNet (using simple M-LSD straight line detection)
77
+
78
+     python gradio_hough2image.py
79
+
80
+ The Gradio app also allows you to change the M-LSD thresholds. Just try it for more details.
81
+
82
+ Prompt: "room"
83
+ ![p](github_page/p3.png)
84
+
85
+ Prompt: "building"
86
+ ![p](github_page/p4.png)
87
+
88
+ ## ControlNet with HED Boundary
89
+
90
+ Stable Diffusion 1.5 + ControlNet (using soft HED Boundary)
91
+
92
+     python gradio_hed2image.py
93
+
94
+ The soft HED Boundary will preserve many details in input images, making this app suitable for recoloring and stylizing. Just try it for more details.
95
+
96
+ Prompt: "oil painting of handsome old man, masterpiece"
97
+ ![p](github_page/p5.png)
98
+
99
+ Prompt: "Cyberpunk robot"
100
+ ![p](github_page/p6.png)
101
+
102
+ ## ControlNet with User Scribbles
103
+
104
+ Stable Diffusion 1.5 + ControlNet (using Scribbles)
105
+
106
+     python gradio_scribble2image.py
107
+
108
+ Note that the UI is based on Gradio, and Gradio is somewhat difficult to customize. Right now you need to draw scribbles outside the UI (using your favorite drawing software, for example, MS Paint) and then import the scribble image to Gradio.
109
+
110
+ Prompt: "turtle"
111
+ ![p](github_page/p7.png)
112
+
113
+ Prompt: "hot air balloon"
114
+ ![p](github_page/p8.png)
115
+
116
+ ### Interactive Interface
117
+
118
+ We actually provide an interactive interface
119
+
120
+     python gradio_scribble2image_interactive.py
121
+
122
+ However, because gradio is very [buggy](https://github.com/gradio-app/gradio/issues/3166) and difficult to customize, right now users need to first set the canvas width and height and then click "Open drawing canvas" to get a drawing area. Please do not upload an image to that drawing canvas. Also, the drawing area is very small; it should be bigger, but I failed to find out how to make it larger. Again, gradio is really buggy.
123
+
124
+ The dog sketch below was drawn by me. Perhaps we should draw a better dog for the showcase.
125
+
126
+ Prompt: "dog in a room"
127
+ ![p](github_page/p20.png)
128
+
129
+ ## ControlNet with Fake Scribbles
130
+
131
+ Stable Diffusion 1.5 + ControlNet (using fake scribbles)
132
+
133
+     python gradio_fake_scribble2image.py
134
+
135
+ Sometimes we are lazy, and we do not want to draw scribbles. This script uses exactly the same scribble-based model but uses a simple algorithm to synthesize scribbles from input images.
136
+
137
+ Prompt: "bag"
138
+ ![p](github_page/p9.png)
139
+
140
+ Prompt: "shose" (Note that "shose" is a typo; it should be "shoes". But it still seems to work.)
141
+ ![p](github_page/p10.png)
142
+
143
+ ## ControlNet with Human Pose
144
+
145
+ Stable Diffusion 1.5 + ControlNet (using human pose)
146
+
147
+     python gradio_pose2image.py
148
+
149
+ Apparently, this model deserves a better UI to directly manipulate the pose skeleton. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image, and then Openpose will detect the pose for you.
150
+
151
+ Prompt: "Chief in the kitchen"
152
+ ![p](github_page/p11.png)
153
+
154
+ Prompt: "An astronaut on the moon"
155
+ ![p](github_page/p12.png)
156
+
157
+ ## ControlNet with Semantic Segmentation
158
+
159
+ Stable Diffusion 1.5 + ControlNet (using semantic segmentation)
160
+
161
+     python gradio_seg2image.py
162
+
163
+ This model uses ADE20K's segmentation protocol. Again, this model deserves a better UI to directly draw the segmentations. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image, and then a model called Uniformer will detect the segmentations for you. Just try it for more details.
164
+
165
+ Prompt: "House"
166
+ ![p](github_page/p13.png)
167
+
168
+ Prompt: "River"
169
+ ![p](github_page/p14.png)
170
+
171
+ ## ControlNet with Depth
172
+
173
+ Stable Diffusion 1.5 + ControlNet (using depth map)
174
+
175
+     python gradio_depth2image.py
176
+
177
+ Great! Now SD 1.5 also has depth control. FINALLY. So many possibilities (considering SD 1.5 has many more community models than SD2).
178
+
179
+ Note that, unlike Stability's model, the ControlNet receives the full 512×512 depth map rather than a 64×64 depth map (Stability's SD2 depth model uses 64×64 depth maps). This means that the ControlNet will preserve more details in the depth map.
180
+
181
+ This is always a strength because if users do not want to preserve more details, they can simply use another SD to post-process with i2i. But if they want to preserve more details, ControlNet becomes their only choice. Again, SD2 uses 64×64 depth; we use 512×512.
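As a quick worked check of what the two resolutions mean in terms of raw depth samples (simple arithmetic on the sizes quoted above; the variable names are only for illustration):

```python
# Number of depth values available to each model, from the sizes quoted above.
controlnet_side = 512
sd2_side = 64

controlnet_values = controlnet_side ** 2              # 262144 depth values
sd2_values = sd2_side ** 2                            # 4096 depth values
values_ratio = controlnet_values // sd2_values        # 64 times as many values
linear_ratio = controlnet_side // sd2_side            # 8 times finer along each axis
```

In other words, one value in a 64×64 map has to summarize an 8×8 patch of the 512×512 map.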
182
+
183
+ Prompt: "Stormtrooper's lecture"
184
+ ![p](github_page/p15.png)
185
+
186
+ ## ControlNet with Normal Map
187
+
188
+ Stable Diffusion 1.5 + ControlNet (using normal map)
189
+
190
+     python gradio_normal2image.py
191
+
192
+ This model uses normal maps. Right now in the app, the normal map is computed from the Midas depth map and a user threshold (which determines how much of the area is treated as background, with an identity normal facing the viewer; tune the "Normal background threshold" in the gradio app to get a feeling for it).
193
+
194
+ Prompt: "Cute toy"
195
+ ![p](github_page/p17.png)
196
+
197
+ Prompt: "Plaster statue of Abraham Lincoln"
198
+ ![p](github_page/p18.png)
199
+
200
+ Compared to the depth model, this model seems to be a bit better at preserving the geometry. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with the same inputs. You can see that the hairstyle of the man in the input image is modified by the depth model, but preserved by the normal model.
201
+
202
+ Prompt: "Plaster statue of Abraham Lincoln"
203
+ ![p](github_page/p19.png)
204
+
205
+ ## ControlNet with Anime Line Drawing
206
+
207
+ We also trained a relatively simple ControlNet for anime line drawings. This tool may be useful for artistic creations. (Although the image details in the results are slightly modified, since it still diffuses latent images.)
208
+
209
+ This model is not available right now. We need to evaluate the potential risks before releasing this model.
210
+
211
+ ![p](github_page/p21.png)
212
+
213
+ # Annotate Your Own Data
214
+
215
+ We provide simple python scripts to process images.
216
+
217
+ [See a gradio example here](annotator.md).
218
+
219
+ # Train with Your Own Data
220
+
221
+ Training a ControlNet is as easy as (or even easier than) training a simple pix2pix.
222
+
223
+ [See the steps here](train.md).
224
+
225
+ # Citation
226
+
227
+     @misc{control2023,
+         author = "Lvmin Zhang and Maneesh Agrawala",
+         title = "Adding Conditional Control to Text-to-Image Diffusion Models",
+         month = "Feb",
+         year = "2023"
+     }
233
+
234
+ [Download the paper here](https://github.com/lllyasviel/ControlNet/raw/main/github_page/control.pdf).
annotator.md ADDED
@@ -0,0 +1,49 @@
1
+ # Automatic Annotations
2
+
3
+ We provide gradio examples to obtain annotations that are aligned to our pretrained production-ready models.
4
+
5
+ Just run
6
+
7
+     python gradio_annotator.py
8
+
9
+ Since everyone organizes their datasets differently, we do not hard-code any scripts for batch processing. But "gradio_annotator.py" is written in a very readable way, and modifying it to annotate your images should be easy.
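One possible shape for such a batch script is sketched below. It is only an outline under assumptions: `plan_batch` and `run_batch` are hypothetical helpers that do not exist in the repository, and the image loading, annotating, and saving steps are passed in as callables so the sketch does not depend on any particular library.

```python
from pathlib import Path

# Hypothetical set of file extensions treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def plan_batch(input_dir, output_dir, suffixes=IMAGE_SUFFIXES):
    """Return sorted (source, destination) pairs for every image in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for source in sorted(input_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in suffixes:
            # Every annotation is written as a PNG named after its source image.
            pairs.append((source, output_dir / (source.stem + ".png")))
    return pairs


def run_batch(pairs, load, annotate, save):
    """Call save(destination, annotate(load(source))) for each pair."""
    written = []
    for source, destination in pairs:
        destination.parent.mkdir(parents=True, exist_ok=True)
        save(destination, annotate(load(source)))
        written.append(destination)
    return written
```

In this repository, `annotate` could plausibly wrap one of the functions that "gradio_annotator.py" imports (for example `apply_canny` after `resize_image(HWC3(img), res)`), with `load` and `save` built on an image library such as OpenCV; those wiring details are assumptions and are left to the reader.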
10
+
11
+ In the gradio UI of "gradio_annotator.py" we have the following interfaces:
12
+
13
+ ### Canny Edge
14
+
15
+ Be careful about "black edge and white background" or "white edge and black background".
16
+
17
+ ![p](github_page/a1.png)
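If your own edge maps use the opposite polarity from the one a model was trained on, flipping them is a one-line operation. The sketch below is a plain-Python illustration with hypothetical helper names; it assumes an 8-bit single-channel edge map stored as rows of integers, and the brightness heuristic is only a rough guess based on edges being sparse.

```python
def invert_edge_map(edge_map):
    """Swap polarity of an 8-bit edge map given as rows of ints in 0..255."""
    return [[255 - value for value in row] for row in edge_map]


def looks_white_on_black(edge_map):
    """Heuristic: edges are sparse, so a mostly dark map has white edges."""
    values = [value for row in edge_map for value in row]
    return sum(values) / len(values) < 127.5


sample = [[0, 255], [0, 0]]        # one white edge pixel on a black background
flipped = invert_edge_map(sample)  # one black edge pixel on a white background
```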
18
+
19
+ ### HED Edge
20
+
21
+ Be careful about "black edge and white background" or "white edge and black background".
22
+
23
+ ![p](github_page/a2.png)
24
+
25
+ ### MLSD Edge
26
+
27
+ Be careful about "black edge and white background" or "white edge and black background".
28
+
29
+ ![p](github_page/a3.png)
30
+
31
+ ### MIDAS Depth and Normal
32
+
33
+ Be careful about RGB or BGR in normal maps.
34
+
35
+ ![p](github_page/a4.png)
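The RGB/BGR warning comes down to channel order: a map saved or loaded with the channels reversed looks similar but encodes different directions or classes. As a hedged illustration (the helper name is hypothetical and the image is a plain nested list of `[c0, c1, c2]` pixels rather than an array), swapping the order looks like this:

```python
def swap_rgb_bgr(image):
    """Reverse the channel order of an image given as rows of [c0, c1, c2] pixels."""
    return [[pixel[::-1] for pixel in row] for row in image]


rgb = [[[255, 0, 0], [0, 128, 64]]]  # first pixel is pure red in RGB order
bgr = swap_rgb_bgr(rgb)              # the same pixels written in BGR order
```

With NumPy arrays the equivalent slice is `image[:, :, ::-1]`. OpenCV's `cv2.imread` returns BGR, so it is worth checking the order whenever OpenCV and another image library are mixed.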
36
+
37
+ ### Openpose
38
+
39
+ Be careful about RGB or BGR in pose maps.
40
+
41
+ For our production-ready model, the hand pose option is turned off.
42
+
43
+ ![p](github_page/a5.png)
44
+
45
+ ### Uniformer Segmentation
46
+
47
+ Be careful about RGB or BGR in segmentation maps.
48
+
49
+ ![p](github_page/a6.png)
cog.yaml ADDED
@@ -0,0 +1,51 @@
+ # Configuration for Cog ⚙️
+ # Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md
+
+ build:
+   # set to true if your model requires a GPU
+   gpu: true
+
+   # a list of ubuntu apt packages to install
+   system_packages:
+     - "python3-opencv"
+     # - "libgl1-mesa-glx"
+     # - "libglib2.0-0"
+
+   # python version in the form '3.8' or '3.8.12'
+   python_version: "3.8"
+
+   # a list of packages in the format <package-name>==<version>
+   # packages required: torch torchvision numpy gradio albumentations opencv-contrib-python imageio imageio-ffmpeg pytorch-lightning omegaconf test-tube streamlit einops transformers webdataset kornia open_clip_torch invisible-watermark streamlit-drawable-canvas torchmetrics timm addict yapf prettytable
+   python_packages:
+     - "torch==1.13.0"
+     - "torchvision==0.14.0"
+     - "numpy==1.21.6"
+     - "gradio==3.18.0"
+     - "albumentations==1.2.1"
+     - "opencv-contrib-python==4.6.0.66"
+     - "imageio==2.9.0"
+     - "imageio-ffmpeg==0.4.8"
+     - "pytorch-lightning==1.9.1"
+     - "omegaconf==2.3.0"
+     - "test-tube==0.7.5"
+     - "streamlit==1.18.1"
+     - "einops==0.6.0"
+     - "transformers==4.26.1"
+     - "webdataset==0.2.33"
+     - "kornia==0.6.9"
+     - "open_clip_torch==2.11.1"
+     - "invisible-watermark==0.1.5"
+     - "streamlit-drawable-canvas==0.9.2"
+     - "torchmetrics==0.11.1"
+     - "timm==0.6.12"
+     - "addict==2.4.0"
+     - "yapf==0.32.0"
+     - "prettytable==3.6.0"
+
+   # commands run after the environment is setup
+   # run:
+     # - "echo env is ready!"
+     # - "echo another command if needed"
+
+ # predict.py defines how predictions are run on your model
+ predict: "predict.py:Predictor"
download_weights.py ADDED
@@ -0,0 +1,17 @@
+ import argparse
+
+ # add command line arg for model type
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--model_type", type=str, default="canny", help="Model type to download")
+ # add a binary flag to wipe the weights folder
+ parser.add_argument("--wipe", action="store_true", help="Wipe the weights folder")
+ args = parser.parse_args()
+
+ MODEL_TYPE = args.model_type
+
+ from utils import model_dl_urls, annotator_dl_urls, download_model
+
+ for model_name in annotator_dl_urls.keys():
+     download_model(model_name, annotator_dl_urls)
+
+ download_model(MODEL_TYPE, model_dl_urls)
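In the script as shown, `--wipe` is parsed but not referenced afterwards. The sketch below rebuilds the same parser and illustrates one way such a flag could be honored. `wipe_folder` is a hypothetical helper, not part of the repository, and `"hed"` is only an example value; the valid model types are whatever keys `utils.model_dl_urls` defines.

```python
import argparse
import shutil
from pathlib import Path


def build_parser():
    """Same two arguments as download_weights.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_type", type=str, default="canny", help="Model type to download")
    parser.add_argument("--wipe", action="store_true", help="Wipe the weights folder")
    return parser


def wipe_folder(folder):
    """Hypothetical helper: delete `folder` with its contents, then recreate it empty."""
    folder = Path(folder)
    if folder.exists():
        shutil.rmtree(folder)
    folder.mkdir(parents=True)


# Parsing explicit argument lists instead of sys.argv keeps the example self-contained.
args = build_parser().parse_args(["--model_type", "hed", "--wipe"])
defaults = build_parser().parse_args([])
```

A caller could then run `wipe_folder(...)` on its weights directory when `args.wipe` is set, before any download starts; which directory that is depends on `utils.download_model`, which is not shown here.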
environment.yaml ADDED
@@ -0,0 +1,33 @@
+ name: control
+ channels:
+   - pytorch
+   - defaults
+ dependencies:
+   - python=3.8.5
+   - pip=20.3
+   - cudatoolkit=11.3
+   - pytorch=1.12.1
+   - torchvision=0.13.1
+   - numpy=1.23.1
+   - pip:
+       - gradio==3.16.2
+       - albumentations==1.3.0
+       - opencv-contrib-python==4.3.0.36
+       - imageio==2.9.0
+       - imageio-ffmpeg==0.4.2
+       - pytorch-lightning==1.5.0
+       - omegaconf==2.1.1
+       - test-tube>=0.7.5
+       - streamlit==1.12.1
+       - einops==0.3.0
+       - transformers==4.19.2
+       - webdataset==0.2.5
+       - kornia==0.6
+       - open_clip_torch==2.0.2
+       - invisible-watermark>=0.1.5
+       - streamlit-drawable-canvas==0.8.0
+       - torchmetrics==0.6.0
+       - timm==0.6.12
+       - addict==2.4.0
+       - yapf==0.32.0
+       - prettytable==3.6.0
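Note that "cog.yaml" and "environment.yaml" pin different versions of several shared packages (for example gradio 3.18.0 versus 3.16.2). A small helper can list such differences. The sketch below is illustrative only: `parse_pins` and `differing_pins` are hypothetical names, the parser handles just the `name==version`, `name>=version`, and `name=version` forms that appear in these two files, and the sample entries are a subset copied from them.

```python
def parse_pins(lines):
    """Map package name -> version for entries like '- "name==1.0"' or '- name=1.0'."""
    pins = {}
    for line in lines:
        entry = line.strip().lstrip("-").strip().strip('"')
        # Try the longer operators first so ">=" is not split on its "=".
        for separator in (">=", "==", "="):
            if separator in entry:
                name, version = entry.split(separator, 1)
                pins[name.strip()] = version.strip()
                break
    return pins


def differing_pins(first, second):
    """Packages present in both mappings whose versions differ."""
    shared = sorted(first.keys() & second.keys())
    return {name: (first[name], second[name]) for name in shared if first[name] != second[name]}


cog_pins = parse_pins(['- "gradio==3.18.0"', '- "timm==0.6.12"', '- "einops==0.6.0"'])
conda_pins = parse_pins(["- gradio==3.16.2", "- timm==0.6.12", "- einops==0.3.0"])
mismatches = differing_pins(cog_pins, conda_pins)
```

Whether the differences matter depends on which environment is actually used to run the model; the Cog build uses "cog.yaml", while the conda instructions in the README use "environment.yaml".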
gradio_annotator.py ADDED
@@ -0,0 +1,160 @@
+ import gradio as gr
+
+ from annotator.util import resize_image, HWC3
+
+
+ model_canny = None
+
+
+ def canny(img, res, l, h):
+     img = resize_image(HWC3(img), res)
+     global model_canny
+     if model_canny is None:
+         from annotator.canny import apply_canny
+         model_canny = apply_canny
+     result = model_canny(img, l, h)
+     return [result]
+
+
+ model_hed = None
+
+
+ def hed(img, res):
+     img = resize_image(HWC3(img), res)
+     global model_hed
+     if model_hed is None:
+         from annotator.hed import apply_hed
+         model_hed = apply_hed
+     result = model_hed(img)
+     return [result]
+
+
+ model_mlsd = None
+
+
+ def mlsd(img, res, thr_v, thr_d):
+     img = resize_image(HWC3(img), res)
+     global model_mlsd
+     if model_mlsd is None:
+         from annotator.mlsd import apply_mlsd
+         model_mlsd = apply_mlsd
+     result = model_mlsd(img, thr_v, thr_d)
+     return [result]
+
+
+ model_midas = None
+
+
+ def midas(img, res, a):
+     img = resize_image(HWC3(img), res)
+     global model_midas
+     if model_midas is None:
+         from annotator.midas import apply_midas
+         model_midas = apply_midas
+     results = model_midas(img, a)
+     return results
+
+
+ model_openpose = None
+
+
+ def openpose(img, res, has_hand):
+     img = resize_image(HWC3(img), res)
+     global model_openpose
+     if model_openpose is None:
+         from annotator.openpose import apply_openpose
+         model_openpose = apply_openpose
+     result, _ = model_openpose(img, has_hand)
+     return [result]
+
+
+ model_uniformer = None
+
+
+ def uniformer(img, res):
+     img = resize_image(HWC3(img), res)
+     global model_uniformer
+     if model_uniformer is None:
+         from annotator.uniformer import apply_uniformer
+         model_uniformer = apply_uniformer
+     result = model_uniformer(img)
+     return [result]
+
+
+ block = gr.Blocks().queue()
+ with block:
+     with gr.Row():
+         gr.Markdown("## Canny Edge")
+     with gr.Row():
+         with gr.Column():
+             input_image = gr.Image(source='upload', type="numpy")
+             low_threshold = gr.Slider(label="low_threshold", minimum=1, maximum=255, value=100, step=1)
+             high_threshold = gr.Slider(label="high_threshold", minimum=1, maximum=255, value=200, step=1)
93
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=512, step=64)
94
+ run_button = gr.Button(label="Run")
95
+ with gr.Column():
96
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
97
+ run_button.click(fn=canny, inputs=[input_image, resolution, low_threshold, high_threshold], outputs=[gallery])
98
+
99
+ with gr.Row():
100
+ gr.Markdown("## HED Edge")
101
+ with gr.Row():
102
+ with gr.Column():
103
+ input_image = gr.Image(source='upload', type="numpy")
104
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=512, step=64)
105
+ run_button = gr.Button(label="Run")
106
+ with gr.Column():
107
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
108
+ run_button.click(fn=hed, inputs=[input_image, resolution], outputs=[gallery])
109
+
110
+ with gr.Row():
111
+ gr.Markdown("## MLSD Edge")
112
+ with gr.Row():
113
+ with gr.Column():
114
+ input_image = gr.Image(source='upload', type="numpy")
115
+ value_threshold = gr.Slider(label="value_threshold", minimum=0.01, maximum=2.0, value=0.1, step=0.01)
116
+ distance_threshold = gr.Slider(label="distance_threshold", minimum=0.01, maximum=20.0, value=0.1, step=0.01)
117
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=384, step=64)
118
+ run_button = gr.Button(label="Run")
119
+ with gr.Column():
120
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
121
+ run_button.click(fn=mlsd, inputs=[input_image, resolution, value_threshold, distance_threshold], outputs=[gallery])
122
+
123
+ with gr.Row():
124
+ gr.Markdown("## MIDAS Depth and Normal")
125
+ with gr.Row():
126
+ with gr.Column():
127
+ input_image = gr.Image(source='upload', type="numpy")
128
+ alpha = gr.Slider(label="alpha", minimum=0.1, maximum=20.0, value=6.2, step=0.01)
129
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=384, step=64)
130
+ run_button = gr.Button(label="Run")
131
+ with gr.Column():
132
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
133
+ run_button.click(fn=midas, inputs=[input_image, resolution, alpha], outputs=[gallery])
134
+
135
+ with gr.Row():
136
+ gr.Markdown("## Openpose")
137
+ with gr.Row():
138
+ with gr.Column():
139
+ input_image = gr.Image(source='upload', type="numpy")
140
+ hand = gr.Checkbox(label='detect hand', value=False)
141
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=512, step=64)
142
+ run_button = gr.Button(label="Run")
143
+ with gr.Column():
144
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
145
+ run_button.click(fn=openpose, inputs=[input_image, resolution, hand], outputs=[gallery])
146
+
147
+
148
+ with gr.Row():
149
+ gr.Markdown("## Uniformer Segmentation")
150
+ with gr.Row():
151
+ with gr.Column():
152
+ input_image = gr.Image(source='upload', type="numpy")
153
+ resolution = gr.Slider(label="resolution", minimum=256, maximum=1024, value=512, step=64)
154
+ run_button = gr.Button(label="Run")
155
+ with gr.Column():
156
+ gallery = gr.Gallery(label="Generated images", show_label=False).style(height="auto")
157
+ run_button.click(fn=uniformer, inputs=[input_image, resolution], outputs=[gallery])
158
+
159
+
160
+ block.launch(server_name='0.0.0.0')
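Each annotator function in gradio_annotator.py repeats the same lazy-loading pattern: a module-level `model_*` variable starts as `None`, and the heavy import happens on the first call only. The sketch below shows the same idea with a single cache; `get_annotator` and the `loader` callback are hypothetical names used for illustration, not something this commit defines:

```python
from typing import Callable, Dict

# One shared cache instead of one global variable per annotator.
_annotator_cache: Dict[str, Callable] = {}


def get_annotator(name: str, loader: Callable[[], Callable]) -> Callable:
    """Return the annotator for `name`, calling `loader` only on first use."""
    if name not in _annotator_cache:
        # The loader is where an expensive import or weight load would happen.
        _annotator_cache[name] = loader()
    return _annotator_cache[name]
```

A caller would pass a small function that performs the deferred import, so a demo page that only uses one annotator does not pay the start-up cost of the others.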
gradio_canny2image.py ADDED
@@ -0,0 +1,42 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.canny import apply_canny
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_canny(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, scale, seed, eta, low_threshold, high_threshold, model, ddim_sampler):
+     with torch.no_grad():
+         img = resize_image(HWC3(input_image), image_resolution)
+         H, W, C = img.shape
+
+         detected_map = apply_canny(img, low_threshold, high_threshold)
+         detected_map = HWC3(detected_map)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [255 - detected_map] + results
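Two pieces of arithmetic recur in every `process_*` function: the sampler works on a latent of shape `(4, H // 8, W // 8)`, and the decoded samples are mapped back to 8-bit pixels with `x * 127.5 + 127.5` followed by a clip to 0..255. The sketch below restates both in plain Python for a single value; the helper names `latent_shape` and `to_uint8` are illustrative only, and the claim that the decoder output lies roughly in [-1, 1] is an assumption inferred from that scaling:

```python
from typing import Tuple


def latent_shape(height: int, width: int) -> Tuple[int, int, int]:
    """Latent shape passed to the sampler: 4 channels, 8x spatial downsampling."""
    return (4, height // 8, width // 8)


def to_uint8(value: float) -> int:
    """Map one decoded value (assumed to be about -1..1) to a 0..255 pixel."""
    scaled = value * 127.5 + 127.5
    # Clip first, then truncate, mirroring .clip(0, 255).astype(np.uint8).
    return int(min(255.0, max(0.0, scaled)))


print(latent_shape(512, 768))
print(to_uint8(-1.0), to_uint8(0.0), to_uint8(1.0))
```

Because the latent is 8 times smaller per side, image sizes that are multiples of 8 avoid any loss from the floor division.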
gradio_depth2image.py ADDED
@@ -0,0 +1,44 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.midas import apply_midas
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_depth(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map, _ = apply_midas(resize_image(input_image, detect_resolution))
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [detected_map] + results
gradio_fake_scribble2image.py ADDED
@@ -0,0 +1,48 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.hed import apply_hed, nms
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_scribble(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map = apply_hed(resize_image(input_image, detect_resolution))
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)
+         detected_map = nms(detected_map, 127, 3.0)
+         detected_map = cv2.GaussianBlur(detected_map, (0, 0), 3.0)
+         detected_map[detected_map > 4] = 255
+         detected_map[detected_map < 255] = 0
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [255 - detected_map] + results
gradio_hed2image.py ADDED
@@ -0,0 +1,44 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.hed import apply_hed
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_hed(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map = apply_hed(resize_image(input_image, detect_resolution))
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [detected_map] + results
gradio_hough2image.py ADDED
@@ -0,0 +1,44 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.mlsd import apply_mlsd
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_mlsd(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, value_threshold, distance_threshold, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map = apply_mlsd(resize_image(input_image, detect_resolution), value_threshold, distance_threshold)
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_NEAREST)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [255 - cv2.dilate(detected_map, np.ones(shape=(3, 3), dtype=np.uint8), iterations=1)] + results
gradio_normal2image.py ADDED
@@ -0,0 +1,44 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.midas import apply_midas
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_normal(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, bg_threshold, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         _, detected_map = apply_midas(resize_image(input_image, detect_resolution), bg_th=bg_threshold)
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)
+
+         control = torch.from_numpy(detected_map[:, :, ::-1].copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [detected_map] + results
gradio_pose2image.py ADDED
@@ -0,0 +1,44 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.openpose import apply_openpose
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_pose(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map, _ = apply_openpose(resize_image(input_image, detect_resolution))
+         detected_map = HWC3(detected_map)
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_NEAREST)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [detected_map] + results
gradio_scribble2image.py ADDED
@@ -0,0 +1,41 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_scribble(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         img = resize_image(HWC3(input_image), image_resolution)
+         H, W, C = img.shape
+
+         detected_map = np.zeros_like(img, dtype=np.uint8)
+         detected_map[np.min(img, axis=2) < 127] = 255
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [255 - detected_map] + results
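The scribble pipeline needs no learned detector: a pixel becomes a white control line (255) when its darkest channel is below 127, and stays black (0) otherwise, so dark strokes on a light background turn into white lines on black. A dependency-free sketch of that rule on nested lists follows; the function name `scribble_to_control` is hypothetical, and the real code does the same thing vectorised with NumPy:

```python
from typing import List

Image = List[List[List[int]]]  # H x W x C, values 0..255


def scribble_to_control(img: Image, threshold: int = 127) -> Image:
    """Return a control map that is 255 where min(channels) < threshold, else 0."""
    out: Image = []
    for row in img:
        out_row = []
        for pixel in row:
            # Equivalent to np.min(img, axis=2) < threshold for this pixel.
            value = 255 if min(pixel) < threshold else 0
            out_row.append([value] * len(pixel))
        out.append(out_row)
    return out
```

The interactive variant of the script flips the comparison to `> 127` because it reads the drawn mask, where strokes are bright rather than dark.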
gradio_scribble2image_interactive.py ADDED
@@ -0,0 +1,45 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         img = resize_image(HWC3(input_image['mask'][:, :, 0]), image_resolution)
+         H, W, C = img.shape
+
+         detected_map = np.zeros_like(img, dtype=np.uint8)
+         detected_map[np.min(img, axis=2) > 127] = 255
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [255 - detected_map] + results
+
+
+ def create_canvas(w, h):
+     return np.zeros(shape=(h, w, 3), dtype=np.uint8) + 255
gradio_seg2image.py ADDED
@@ -0,0 +1,43 @@
+ import cv2
+ import einops
+ import gradio as gr
+ import numpy as np
+ import torch
+
+ from cldm.hack import disable_verbosity
+ disable_verbosity()
+
+ from pytorch_lightning import seed_everything
+ from annotator.util import resize_image, HWC3
+ from annotator.uniformer import apply_uniformer
+ from cldm.model import create_model, load_state_dict
+ from ldm.models.diffusion.ddim import DDIMSampler
+
+ def process_seg(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, detect_resolution, ddim_steps, scale, seed, eta, model, ddim_sampler):
+     with torch.no_grad():
+         input_image = HWC3(input_image)
+         detected_map = apply_uniformer(resize_image(input_image, detect_resolution))
+         img = resize_image(input_image, image_resolution)
+         H, W, C = img.shape
+
+         detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_NEAREST)
+
+         control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
+         control = torch.stack([control for _ in range(num_samples)], dim=0)
+         control = einops.rearrange(control, 'b h w c -> b c h w').clone()
+
+         seed_everything(seed)
+
+         cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
+         un_cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
+         shape = (4, H // 8, W // 8)
+
+         samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
+                                                      shape, cond, verbose=False, eta=eta,
+                                                      unconditional_guidance_scale=scale,
+                                                      unconditional_conditioning=un_cond)
+         x_samples = model.decode_first_stage(samples)
+         x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
+
+         results = [x_samples[i] for i in range(num_samples)]
+     return [detected_map] + results
output.0.png ADDED
output.1.png ADDED
predict.py ADDED
@@ -0,0 +1,216 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Prediction interface for Cog ⚙️
2
+ # https://github.com/replicate/cog/resolve/main/docs/python.md
3
+
4
+ from cog import BasePredictor, Input, Path
5
+ import os
6
+ from subprocess import call
7
+ from cldm.model import create_model, load_state_dict
8
+ from ldm.models.diffusion.ddim import DDIMSampler
9
+ from PIL import Image
10
+ import numpy as np
11
+ from typing import List
12
+ from utils import get_state_dict_path, download_model, model_dl_urls, annotator_dl_urls
13
+
14
+ MODEL_TYPE = "openpose"
15
+
16
+ if MODEL_TYPE == "canny":
17
+ from gradio_canny2image import process_canny
18
+ elif MODEL_TYPE == "depth":
19
+ from gradio_depth2image import process_depth
20
+ elif MODEL_TYPE == "hed":
21
+ from gradio_hed2image import process_hed
22
+ elif MODEL_TYPE == "normal":
23
+ from gradio_normal2image import process_normal
24
+ elif MODEL_TYPE == "mlsd":
25
+ from gradio_hough2image import process_mlsd
26
+ elif MODEL_TYPE == "scribble":
27
+ from gradio_scribble2image import process_scribble
28
+ elif MODEL_TYPE == "seg":
29
+ from gradio_seg2image import process_seg
30
+ elif MODEL_TYPE == "openpose":
31
+ from gradio_pose2image import process_pose
32
+
33
+ class Predictor(BasePredictor):
34
+ def setup(self):
35
+ """Load the model into memory to make running multiple predictions efficient"""
36
+ self.model = create_model('./models/cldm_v15.yaml').cuda()
37
+ self.model.load_state_dict(load_state_dict(get_state_dict_path(MODEL_TYPE), location='cuda'))
38
+ self.ddim_sampler = DDIMSampler(self.model)
39
+
40
+ def predict(
41
+ self,
42
+ image: Path = Input(description="Input image"),
43
+ prompt: str = Input(description="Prompt for the model"),
44
+ num_samples: str = Input(
45
+ description="Number of samples (higher values may OOM)",
46
+ choices=['1', '4'],
47
+ default='1'
48
+ ),
49
+ image_resolution: str = Input(
50
+ description="Image resolution to be generated",
51
+ choices = ['256', '512', '768'],
52
+ default='512'
53
+ ),
54
+ low_threshold: int = Input(description="Canny line detection low threshold", default=100, ge=1, le=255), # only applicable when model type is 'canny'
55
+ high_threshold: int = Input(description="Canny line detection high threshold", default=200, ge=1, le=255), # only applicable when model type is 'canny'
56
+ ddim_steps: int = Input(description="Steps", default=20),
57
+ scale: float = Input(description="Scale for classifier-free guidance", default=9.0, ge=0.1, le=30.0),
58
+ seed: int = Input(description="Seed", default=None),
59
+ eta: float = Input(description="Controls the amount of noise that is added to the input data during the denoising diffusion process. Higher value -> more noise", default=0.0),
60
+ a_prompt: str = Input(description="Additional text to be appended to prompt", default="best quality, extremely detailed"),
61
+ n_prompt: str = Input(description="Negative Prompt", default="longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"),
62
+ detect_resolution: int = Input(description="Resolution at which detection method will be applied)", default=512, ge=128, le=1024), # only applicable when model type is 'HED', 'seg', or 'MLSD'
63
+ # bg_threshold: float = Input(description="Background Threshold (only applicable when model type is 'normal')", default=0.0, ge=0.0, le=1.0), # only applicable when model type is 'normal'
64
+ # value_threshold: float = Input(description="Value Threshold (only applicable when model type is 'MLSD')", default=0.1, ge=0.01, le=2.0), # only applicable when model type is 'MLSD'
65
+ # distance_threshold: float = Input(description="Distance Threshold (only applicable when model type is 'MLSD')", default=0.1, ge=0.01, le=20.0), # only applicable when model type is 'MLSD'
66
+ ) -> List[Path]:
67
+ """Run a single prediction on the model"""
68
+ num_samples = int(num_samples)
69
+ image_resolution = int(image_resolution)
70
+ if not seed:
71
+ seed = np.random.randint(1000000)
72
+ else:
73
+ seed = int(seed)
74
+
75
+ # load input_image
76
+ input_image = Image.open(image)
77
+ # convert to numpy
78
+ input_image = np.array(input_image)
79
+
80
+ if MODEL_TYPE == "canny":
81
+ outputs = process_canny(
82
+ input_image,
83
+ prompt,
84
+ a_prompt,
85
+ n_prompt,
86
+ num_samples,
87
+ image_resolution,
88
+ ddim_steps,
89
+ scale,
90
+ seed,
91
+ eta,
92
+ low_threshold,
93
+ high_threshold,
94
+ self.model,
95
+ self.ddim_sampler,
96
+ )
97
+ elif MODEL_TYPE == "depth":
98
+ outputs = process_depth(
99
+ input_image,
100
+ prompt,
101
+ a_prompt,
102
+ n_prompt,
103
+ num_samples,
104
+ image_resolution,
105
+ detect_resolution,
106
+ ddim_steps,
107
+ scale,
108
+ seed,
109
+ eta,
110
+ self.model,
111
+ self.ddim_sampler,
112
+ )
113
+ elif MODEL_TYPE == "hed":
114
+ outputs = process_hed(
115
+ input_image,
116
+ prompt,
117
+ a_prompt,
118
+ n_prompt,
119
+ num_samples,
120
+ image_resolution,
121
+ detect_resolution,
122
+ ddim_steps,
123
+ scale,
124
+ seed,
125
+ eta,
126
+ self.model,
127
+ self.ddim_sampler,
128
+ )
129
+ elif MODEL_TYPE == "normal":
130
+ outputs = process_normal(
131
+ input_image,
132
+ prompt,
133
+ a_prompt,
134
+ n_prompt,
135
+ num_samples,
136
+ image_resolution,
137
+ ddim_steps,
138
+ scale,
139
+ seed,
140
+ eta,
141
+ bg_threshold,
142
+ self.model,
143
+ self.ddim_sampler,
144
+ )
145
+ elif MODEL_TYPE == "mlsd":
146
+ outputs = process_mlsd(
147
+ input_image,
148
+ prompt,
149
+ a_prompt,
150
+ n_prompt,
151
+ num_samples,
152
+ image_resolution,
153
+ detect_resolution,
154
+ ddim_steps,
155
+ scale,
156
+ seed,
157
+ eta,
158
+ value_threshold,
159
+ distance_threshold,
160
+ self.model,
161
+ self.ddim_sampler,
162
+ )
163
+ elif MODEL_TYPE == "scribble":
164
+ outputs = process_scribble(
165
+ input_image,
166
+ prompt,
167
+ a_prompt,
168
+ n_prompt,
169
+ num_samples,
170
+ image_resolution,
171
+ ddim_steps,
172
+ scale,
173
+ seed,
174
+ eta,
175
+ self.model,
176
+ self.ddim_sampler,
177
+ )
178
+ elif MODEL_TYPE == "seg":
179
+ outputs = process_seg(
180
+ input_image,
181
+ prompt,
182
+ a_prompt,
183
+ n_prompt,
184
+ num_samples,
185
+ image_resolution,
186
+ detect_resolution,
187
+ ddim_steps,
188
+ scale,
189
+ seed,
190
+ eta,
191
+ self.model,
192
+ self.ddim_sampler,
193
+ )
194
+ elif MODEL_TYPE == "openpose":
195
+ outputs = process_pose(
196
+ input_image,
197
+ prompt,
198
+ a_prompt,
199
+ n_prompt,
200
+ num_samples,
201
+ image_resolution,
202
+ detect_resolution,
203
+ ddim_steps,
204
+ scale,
205
+ seed,
206
+ eta,
207
+ self.model,
208
+ self.ddim_sampler,
209
+ )
210
+
211
+ # outputs from list to PIL
212
+ outputs = [Image.fromarray(output) for output in outputs]
213
+ # save outputs to file (Image.save() returns None, so keep the paths in their own list)
214
+ output_paths = [Path(f"tmp/output_{i}.png") for i in range(len(outputs))]
215
+ for output, output_path in zip(outputs, output_paths): output.save(output_path)
216
+ return output_paths
tool_add_control.py ADDED
@@ -0,0 +1,49 @@
1
+ import sys
2
+ import os
3
+
4
+ assert len(sys.argv) == 3, 'Args are wrong.'
5
+
6
+ input_path = sys.argv[1]
7
+ output_path = sys.argv[2]
8
+
9
+ assert os.path.exists(input_path), 'Input model does not exist.'
10
+ assert not os.path.exists(output_path), 'Output filename already exists.'
11
+ assert os.path.exists(os.path.dirname(os.path.abspath(output_path))), 'Output path is not valid.'
12
+
13
+ import torch
14
+ from cldm.model import create_model
15
+
16
+
17
+ def get_node_name(name, parent_name):
18
+ if len(name) <= len(parent_name):
19
+ return False, ''
20
+ p = name[:len(parent_name)]
21
+ if p != parent_name:
22
+ return False, ''
23
+ return True, name[len(parent_name):]
24
+
25
+
26
+ model = create_model(config_path='./models/cldm_v15.yaml')
27
+
28
+ pretrained_weights = torch.load(input_path)
29
+ if 'state_dict' in pretrained_weights:
30
+ pretrained_weights = pretrained_weights['state_dict']
31
+
32
+ scratch_dict = model.state_dict()
33
+
34
+ target_dict = {}
35
+ for k in scratch_dict.keys():
36
+ is_control, name = get_node_name(k, 'control_')
37
+ if is_control:
38
+ copy_k = 'model.diffusion_' + name
39
+ else:
40
+ copy_k = k
41
+ if copy_k in pretrained_weights:
42
+ target_dict[k] = pretrained_weights[copy_k].clone()
43
+ else:
44
+ target_dict[k] = scratch_dict[k].clone()
45
+ print(f'These weights are newly added: {k}')
46
+
47
+ model.load_state_dict(target_dict, strict=True)
48
+ torch.save(model.state_dict(), output_path)
49
+ print('Done.')
train.md ADDED
@@ -0,0 +1,251 @@
1
+ # Train a ControlNet to Control SD
2
+
3
+ You are here because you want to control SD in your own way. Maybe you have an idea for your perfect research project, and you will annotate some data or have already annotated your own dataset, automatically or manually. Here, the control can be anything that can be converted to images, such as edges, keypoints, segments, etc.
4
+
5
+ Before moving on to your own dataset, we highly recommend first trying the toy dataset, Fill50K, as a sanity check. This will help you get a "feeling" for the training: how long the model takes to converge, whether your device can complete the training in an acceptable amount of time, and what it "feels" like when the model converges.
6
+
7
+ We hope that after you read this page, you will find that training a ControlNet is as easy as (or easier than) training a pix2pix.
8
+
9
+ ## Step 0 - Design your control
10
+
11
+ Let us take a look at a very simple task: controlling SD to fill circles with color.
12
+
13
+ ![p](github_page/t1.png)
14
+
15
+ This is simple: we want to control SD to fill a circle with colors, and the prompt contains some description of our target.
16
+
17
+ Stable Diffusion is trained on billions of images, and it already knows what "cyan" is, what a "circle" is, what "pink" is, and what a "background" is.
18
+
19
+ But it does not know the meaning of that "Control Image (Source Image)". Our target is to let it know.
20
+
21
+ ## Step 1 - Get a dataset
22
+
23
+ Just download the Fill50K dataset from [our huggingface page](https://huggingface.co/lllyasviel/ControlNet) (training/fill50k.zip, the file is only 200M!). Make sure that the data is decompressed as
24
+
25
+ ControlNet/training/fill50k/prompt.json
26
+ ControlNet/training/fill50k/source/X.png
27
+ ControlNet/training/fill50k/target/X.png
28
+
29
+ In the folder "fill50k/source", you will have 50k images of circle lines.
30
+
31
+ ![p](github_page/t2.png)
32
+
33
+ In the folder "fill50k/target", you will have 50k images of filled circles.
34
+
35
+ ![p](github_page/t3.png)
36
+
37
+ In "fill50k/prompt.json", you will have their filenames and prompts, one JSON record per line. Each prompt is like "a circle of some color with a background of some other color."
38
+
39
+ ![p](github_page/t4.png)
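As a quick illustration of the file layout described above, here is a minimal sketch of parsing a file that stores one JSON object per line. The two sample records are made up for illustration; only the keys `source`, `target`, and `prompt` are taken from the tutorial's dataset loader, and the helper name `read_prompt_records` is hypothetical.

```python
import io
import json

# Hypothetical sample in the same one-JSON-object-per-line layout.
# The real prompt.json has 50k such lines; these two are invented.
SAMPLE = (
    '{"source": "source/0.png", "target": "target/0.png", '
    '"prompt": "red circle with blue background"}\n'
    '{"source": "source/1.png", "target": "target/1.png", '
    '"prompt": "green circle with white background"}\n'
)


def read_prompt_records(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    records = []
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            records.append(json.loads(line))
    return records


records = read_prompt_records(io.StringIO(SAMPLE))
print(len(records))
print(records[0]["prompt"])
```

Reading line by line, rather than calling `json.load` on the whole file, is what makes this layout work: the file as a whole is not a single JSON document.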
40
+
41
+ ## Step 2 - Load the dataset
42
+
43
+ Then you need to write a simple script to read this dataset for PyTorch. (In fact, we have written it for you in "tutorial_dataset.py".)
44
+
45
+ ```python
46
+ import json
47
+ import cv2
48
+ import numpy as np
49
+
50
+ from torch.utils.data import Dataset
51
+
52
+
53
+ class MyDataset(Dataset):
54
+ def __init__(self):
55
+ self.data = []
56
+ with open('./training/fill50k/prompt.json', 'rt') as f:
57
+ for line in f:
58
+ self.data.append(json.loads(line))
59
+
60
+ def __len__(self):
61
+ return len(self.data)
62
+
63
+ def __getitem__(self, idx):
64
+ item = self.data[idx]
65
+
66
+ source_filename = item['source']
67
+ target_filename = item['target']
68
+ prompt = item['prompt']
69
+
70
+ source = cv2.imread('./training/fill50k/' + source_filename)
71
+ target = cv2.imread('./training/fill50k/' + target_filename)
72
+
73
+ # Do not forget that OpenCV reads images in BGR order.
74
+ source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
75
+ target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)
76
+
77
+ # Normalize images to [-1, 1].
78
+ source = (source.astype(np.float32) / 127.5) - 1.0
79
+ target = (target.astype(np.float32) / 127.5) - 1.0
80
+
81
+ return dict(jpg=target, txt=prompt, hint=source)
82
+
83
+ ```
84
+
85
+ This will make your dataset into an array-like object in Python. You can test this dataset simply by indexing into it, like this:
86
+
87
+ ```python
88
+ from tutorial_dataset import MyDataset
89
+
90
+ dataset = MyDataset()
91
+ print(len(dataset))
92
+
93
+ item = dataset[1234]
94
+ jpg = item['jpg']
95
+ txt = item['txt']
96
+ hint = item['hint']
97
+ print(txt)
98
+ print(jpg.shape)
99
+ print(hint.shape)
100
+
101
+ ```
102
+
103
+ The outputs of this simple test on my machine are
104
+
105
+ 50000
106
+ burly wood circle with orange background
107
+ (512, 512, 3)
108
+ (512, 512, 3)
109
+
110
+ And this code is in "tutorial_dataset_test.py".
111
+
112
+ In this way, the dataset is an array-like object with 50000 items. Each item is a dict with three entries: "jpg", "txt", and "hint". The "jpg" is the target image, the "hint" is the control image, and the "txt" is the prompt.
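The image entries are normalized to [-1, 1] by the loader. The following is a minimal sketch of that mapping and its inverse on single pixel values, written in plain Python so it runs without NumPy; the tutorial applies the same arithmetic to whole arrays. The function names `normalize` and `denormalize` are hypothetical and not part of the repository.

```python
def normalize(pixel):
    """Map an 8-bit value in [0, 255] to a float in [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0


def denormalize(value):
    """Inverse mapping back to an 8-bit integer, clamped to [0, 255]."""
    return max(0, min(255, round((value + 1.0) * 127.5)))


# Black, mid-gray, and white: normalized value and round trip.
for pixel in (0, 128, 255):
    print(pixel, normalize(pixel), denormalize(normalize(pixel)))
```

The inverse is handy when you want to save a sample from the dataset back to an image file to inspect it.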
113
+
114
+ Do not ask us why we use these three names - this is related to the dark history of a library called LDM.
115
+
116
+ ## Step 3 - What SD model do you want to control?
117
+
118
+ Then you need to decide which Stable Diffusion model you want to control. In this example, we will just use standard SD1.5. You can download it from its [Hugging Face page](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main). You want the file "v1-5-pruned.ckpt".
119
+
120
+ Then you need to attach a control net to the SD model. The architecture is
121
+
122
+ ![img](github_page/sd.png)
123
+
124
+ Note that all weights inside the ControlNet are also copied from SD so that no layer is trained from scratch, and you are still finetuning the entire model.
125
+
126
+ We provide a simple script for you to achieve this easily. If your SD filename is "./models/v1-5-pruned.ckpt" and you want the script to save the processed model (SD+ControlNet) at location "./models/control_sd15_ini.ckpt", you can just run:
127
+
128
+ python tool_add_control.py ./models/v1-5-pruned.ckpt ./models/control_sd15_ini.ckpt
129
+
130
+ You may also use other filenames as long as the command is "python tool_add_control.py input_path output_path".
131
+
132
+ This is the correct output from my machine:
133
+
134
+ ![img](github_page/t5.png)
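To make the weight copying concrete, here is a minimal sketch of the key remapping that "tool_add_control.py" performs, using plain dicts with string values in place of tensors. The key names and values below are toy examples chosen for illustration, and `build_target_dict` is a hypothetical helper name; the actual script works on real state dicts and clones tensors.

```python
def get_node_name(name, parent_name):
    """Return (True, suffix) if name starts with parent_name and is longer."""
    if len(name) <= len(parent_name) or not name.startswith(parent_name):
        return False, ''
    return True, name[len(parent_name):]


def build_target_dict(scratch, pretrained):
    """Initialize each key from the pretrained weights where a source exists."""
    target, newly_added = {}, []
    for key in scratch:
        is_control, suffix = get_node_name(key, 'control_')
        # 'control_model.X' is initialized from 'model.diffusion_model.X'.
        copy_key = 'model.diffusion_' + suffix if is_control else key
        if copy_key in pretrained:
            target[key] = pretrained[copy_key]
        else:
            target[key] = scratch[key]  # no SD counterpart: keep fresh init
            newly_added.append(key)
    return target, newly_added


# Toy state dicts (illustrative keys; strings stand in for tensors).
pretrained = {
    'model.diffusion_model.input_blocks.0.weight': 'sd-input-0',
    'first_stage_model.encoder.weight': 'sd-vae',
}
scratch = {
    'model.diffusion_model.input_blocks.0.weight': 'init',
    'first_stage_model.encoder.weight': 'init',
    'control_model.input_blocks.0.weight': 'init',
    'control_model.zero_convs.0.weight': 'init',
}

target, newly_added = build_target_dict(scratch, pretrained)
print(target['control_model.input_blocks.0.weight'])
print(newly_added)
```

In this toy run, the control copy of the input block receives the SD weight, while the key with no SD counterpart keeps its fresh initialization and is reported as newly added, which mirrors the messages the script prints.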
135
+
136
+ ## Step 4 - Train!
137
+
138
+ Happy! We finally come to the most exciting part: training!
139
+
140
+ The training code in "tutorial_train.py" is actually surprisingly simple:
141
+
142
+ ```python
143
+ import pytorch_lightning as pl
144
+ from torch.utils.data import DataLoader
145
+ from tutorial_dataset import MyDataset
146
+ from cldm.logger import ImageLogger
147
+ from cldm.model import create_model, load_state_dict
148
+
149
+
150
+ # Configs
151
+ resume_path = './models/control_sd15_ini.ckpt'
152
+ batch_size = 4
153
+ logger_freq = 300
154
+ learning_rate = 1e-5
155
+ sd_locked = True
156
+ only_mid_control = False
157
+
158
+
159
+ # First use cpu to load models. Pytorch Lightning will automatically move it to GPUs.
160
+ model = create_model('./models/cldm_v15.yaml').cpu()
161
+ model.load_state_dict(load_state_dict(resume_path, location='cpu'))
162
+ model.learning_rate = learning_rate
163
+ model.sd_locked = sd_locked
164
+ model.only_mid_control = only_mid_control
165
+
166
+
167
+ # Misc
168
+ dataset = MyDataset()
169
+ dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
170
+ logger = ImageLogger(batch_frequency=logger_freq)
171
+ trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])
172
+
173
+
174
+ # Train!
175
+ trainer.fit(model, dataloader)
176
+
177
+ ```
178
+
179
+ Thanks to our organized dataset pytorch object and the power of pytorch_lightning, the entire code is just super short.
180
+
181
+ Now, you may take a look at [Pytorch Lightning Official DOC](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.Trainer.html?highlight=trainer) to find out how to enable many useful features like gradient accumulation, multiple GPU training, accelerated dataset loading, flexible checkpoint saving, etc. All these only need about one line of code. Great!
182
+
183
+ Note that if you run out of GPU memory (OOM), you may need a smaller batch size together with gradient accumulation. You may also want to use some “advanced” tricks like sliced attention or xformers. For example:
184
+
185
+ ```python
186
+ # Configs
187
+ batch_size = 1
188
+
189
+ # Misc
190
+ trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=4) # But this will be 4x slower
191
+ ```
192
+
193
+ Note that training with an 8 GB laptop GPU is challenging. We will need some GPU memory optimization at least as good as automatic1111’s UI. This may require expert modifications to the code.
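Why batch size 1 with 4 accumulated batches can stand in for batch size 4 is easiest to see on a toy problem. The sketch below is not ControlNet code: it uses a hypothetical one-parameter model `y = w * x` with a mean-squared-error loss and made-up data, and it assumes the accumulated gradients are averaged over the accumulation steps.

```python
import math


def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


# Made-up (x, y) pairs and an arbitrary starting weight.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2)]
w = 0.5

# One step with batch size 4.
full_batch_grad = grad_mse(w, data)

# Four micro-batches of size 1; each contribution is scaled by
# 1 / accum_steps (assumption: accumulated gradients are averaged).
accum_steps = 4
accumulated_grad = 0.0
for i in range(len(data)):
    micro_batch = data[i:i + 1]
    accumulated_grad += grad_mse(w, micro_batch) / accum_steps

print(full_batch_grad, accumulated_grad)
```

The two gradients agree up to floating-point rounding, so the optimizer sees the same update; the cost is that each update now takes four forward and backward passes instead of one.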
194
+
195
+ ### Screenshots
196
+
197
+ The training is fast. After 4000 steps (batch size 4, learning rate 1e-5, about 50 minutes on PCIE 40G), the results on my machine (in an output folder "image_log") are:
198
+
199
+ Control:
200
+
201
+ ![img](github_page/t/ip.png)
202
+
203
+ Prompt:
204
+
205
+ ![img](github_page/t/t.png)
206
+
207
+ Prediction:
208
+
209
+ ![img](github_page/t/op.png)
210
+
211
+ Ground Truth:
212
+
213
+ ![img](github_page/t/gt.png)
214
+
215
+ Note that SD's capability is preserved. Even when trained on this super-aligned dataset, it still draws some random textures and those snow decorations. (Besides, note that the ground truth looks a bit modified because it is converted from SD's latent image.)
216
+
217
+ Larger batch size and longer training will further improve this. Adequate training will make the filling perfect.
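To put the run above in perspective, the figures quoted for it (4000 steps, batch size 4, about 50 minutes, 50000 items) can be turned into a rough budget. This is only back-of-the-envelope arithmetic on those numbers; the variable names are hypothetical.

```python
# Figures quoted in this tutorial for the Fill50K run.
steps = 4000
batch_size = 4
dataset_size = 50_000
minutes = 50

samples_seen = steps * batch_size          # images processed in total
epochs = samples_seen / dataset_size       # fraction of one pass over the data
seconds_per_step = minutes * 60 / steps    # average wall-clock time per step

print(samples_seen, epochs, seconds_per_step)
```

In other words, the screenshots come from a model that has seen roughly a third of the dataset once, which is why longer training still has room to improve the filling.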
218
+
219
+ Of course, training SD to fill circles is meaningless, but this is a successful beginning of your story.
220
+
221
+ Let us work together to control large models more and more.
222
+
223
+ ## Other options
224
+
225
+ Beyond standard things, we also provide two important parameters "sd_locked" and "only_mid_control" that you need to know.
226
+
227
+ ### only_mid_control
228
+
229
+ By default, only_mid_control is False. When it is True, you will train the architecture below.
230
+
231
+ ![img](github_page/t6.png)
232
+
233
+ This can be helpful when your computation power is limited and you want to speed up the training, or when you want to facilitate the "global" context learning. Note that you can pause training, change this setting, and resume training, and you may repeat this as many times as you like.
234
+
235
+ If your computation device is good, perhaps you do not need this. But I also know some artists are willing to train a model on their laptop for a month - in that case, perhaps this option can be useful.
236
+
237
+ ### sd_locked
238
+
239
+ By default, sd_locked is True. When it is False, you will train the architecture below.
240
+
241
+ ![img](github_page/t7.png)
242
+
243
+ This will unlock some layers in SD and you will train them as a whole.
244
+
245
+ This option is DANGEROUS! If your dataset is not good enough, this may downgrade the capability of your SD model.
246
+
247
+ However, this option is also very useful when you are training on images with some specific style, or when you are training with special datasets (like medical datasets with X-ray images or geographic datasets with lots of Google Maps). You can understand this as simultaneously training the ControlNet and something like a DreamBooth.
248
+
249
+ Also, if your dataset is large, you may want to end the training with a few thousand steps with those layers unlocked. This usually improves the "problem-specific" solutions a little. You may try it yourself to feel the difference.
250
+
251
+ Also, if you unlock some original layers, you may want a lower learning rate, like 2e-6.
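One way to keep the learning rate in step with `sd_locked` is to derive it from the flag rather than set both by hand. The sketch below assumes a hypothetical `TrainConfig` helper that is not part of the repository; the two rates are the ones mentioned in this tutorial (1e-5 by default, 2e-6 when original layers are unlocked).

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical container for the two options discussed above."""
    sd_locked: bool = True
    only_mid_control: bool = False

    @property
    def learning_rate(self):
        # Lower rate when original SD layers are unlocked, as suggested above.
        return 1e-5 if self.sd_locked else 2e-6


default_config = TrainConfig()
unlocked_config = TrainConfig(sd_locked=False)
print(default_config.learning_rate, unlocked_config.learning_rate)
```

The resulting values would then be assigned to `model.learning_rate`, `model.sd_locked`, and `model.only_mid_control` as in "tutorial_train.py".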
tutorial_dataset.py ADDED
@@ -0,0 +1,37 @@
1
+ import json
2
+ import cv2
3
+ import numpy as np
4
+
5
+ from torch.utils.data import Dataset
6
+
7
+
8
+ class MyDataset(Dataset):
9
+ def __init__(self):
10
+ self.data = []
11
+ with open('./training/fill50k/prompt.json', 'rt') as f:
12
+ for line in f:
13
+ self.data.append(json.loads(line))
14
+
15
+ def __len__(self):
16
+ return len(self.data)
17
+
18
+ def __getitem__(self, idx):
19
+ item = self.data[idx]
20
+
21
+ source_filename = item['source']
22
+ target_filename = item['target']
23
+ prompt = item['prompt']
24
+
25
+ source = cv2.imread('./training/fill50k/' + source_filename)
26
+ target = cv2.imread('./training/fill50k/' + target_filename)
27
+
28
+ # Do not forget that OpenCV reads images in BGR order.
29
+ source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
30
+ target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)
31
+
32
+ # Normalize images to [-1, 1].
33
+ source = (source.astype(np.float32) / 127.5) - 1.0
34
+ target = (target.astype(np.float32) / 127.5) - 1.0
35
+
36
+ return dict(jpg=target, txt=prompt, hint=source)
37
+
tutorial_dataset_test.py ADDED
@@ -0,0 +1,12 @@
1
+ from tutorial_dataset import MyDataset
2
+
3
+ dataset = MyDataset()
4
+ print(len(dataset))
5
+
6
+ item = dataset[1234]
7
+ jpg = item['jpg']
8
+ txt = item['txt']
9
+ hint = item['hint']
10
+ print(txt)
11
+ print(jpg.shape)
12
+ print(hint.shape)
tutorial_train.py ADDED
@@ -0,0 +1,36 @@
1
+ from cldm.hack import disable_verbosity
2
+ disable_verbosity()
3
+
4
+ import pytorch_lightning as pl
5
+ from torch.utils.data import DataLoader
6
+ from tutorial_dataset import MyDataset
7
+ from cldm.logger import ImageLogger
8
+ from cldm.model import create_model, load_state_dict
9
+
10
+
11
+ # Configs
12
+ resume_path = './models/control_sd15_ini.ckpt'
13
+ batch_size = 4
14
+ logger_freq = 300
15
+ learning_rate = 1e-5
16
+ sd_locked = True
17
+ only_mid_control = False
18
+
19
+
20
+ # First use cpu to load models. Pytorch Lightning will automatically move it to GPUs.
21
+ model = create_model('./models/cldm_v15.yaml').cpu()
22
+ model.load_state_dict(load_state_dict(resume_path, location='cpu'))
23
+ model.learning_rate = learning_rate
24
+ model.sd_locked = sd_locked
25
+ model.only_mid_control = only_mid_control
26
+
27
+
28
+ # Misc
29
+ dataset = MyDataset()
30
+ dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
31
+ logger = ImageLogger(batch_frequency=logger_freq)
32
+ trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])
33
+
34
+
35
+ # Train!
36
+ trainer.fit(model, dataloader)
utils.py ADDED
@@ -0,0 +1,39 @@
1
+ import os
2
+ from subprocess import call
3
+
4
+ model_dl_urls = {
5
+ "canny": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_canny.pth",
6
+ "depth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_depth.pth",
7
+ "hed": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_hed.pth",
8
+ "normal": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_normal.pth",
9
+ "mlsd": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_mlsd.pth",
10
+ "openpose": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_openpose.pth",
11
+ "scribble": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_scribble.pth",
12
+ "seg": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_seg.pth",
13
+ }
14
+
15
+ annotator_dl_urls = {
16
+ "body_pose_model.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/body_pose_model.pth",
17
+ "dpt_hybrid-midas-501f0c75.pt": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/dpt_hybrid-midas-501f0c75.pt",
18
+ "hand_pose_model.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/hand_pose_model.pth",
19
+ "mlsd_large_512_fp32.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/mlsd_large_512_fp32.pth",
20
+ "mlsd_tiny_512_fp32.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/mlsd_tiny_512_fp32.pth",
21
+ "network-bsds500.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/network-bsds500.pth",
22
+ "upernet_global_small.pth": "https://huggingface.co/lllyasviel/ControlNet/resolve/main/annotator/ckpts/upernet_global_small.pth",
23
+ }
24
+
25
+ def download_model(model_name, urls_map):
26
+ """
27
+ Download model from huggingface with wget and save to models directory
28
+ """
29
+ model_url = urls_map[model_name]
30
+ relative_path_to_model = model_url.replace("https://huggingface.co/lllyasviel/ControlNet/resolve/main/", "")
31
+ if not os.path.exists(relative_path_to_model):
32
+ print(f"Downloading {model_name}...")
33
+ call(["wget", "-O", relative_path_to_model, model_url])
34
+
35
+ def get_state_dict_path(model_name):
36
+ """
37
+ Get path to model state dict
38
+ """
39
+ return f"./models/control_sd15_{model_name}.pth"
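The local path used by `download_model` is just the URL with the repository prefix removed. Here is a minimal sketch of that derivation, assuming the same base URL as the dictionaries above; `local_path_for` is a hypothetical helper name, and unlike the original `str.replace` call it rejects URLs that do not start with the prefix.

```python
import posixpath

BASE_URL = "https://huggingface.co/lllyasviel/ControlNet/resolve/main/"


def local_path_for(url, base_url=BASE_URL):
    """Strip the repository prefix so the file lands at the same relative path."""
    if not url.startswith(base_url):
        raise ValueError(f"unexpected URL: {url}")
    return url[len(base_url):]


model_path = local_path_for(BASE_URL + "models/control_sd15_canny.pth")
annotator_path = local_path_for(BASE_URL + "annotator/ckpts/body_pose_model.pth")
print(model_path)
print(annotator_path)
```

For the model files, this is the same location that `get_state_dict_path` returns once the leading "./" is normalized away, which is why the predictor can find what was downloaded.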