BarrenWardo committed on
Commit 4f1a205 · Parent: 34dd491

Delete README.md

Files changed (1): README.md (+0 -324)

README.md DELETED
---
tags:
- text-to-image
- stable-diffusion
language:
- en
library_name: diffusers
---

# IP-Adapter-FaceID Model Card

<div align="center">

[**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
</div>

---

## Introduction

An experimental version of IP-Adapter-FaceID: it uses face ID embeddings from a face recognition model instead of CLIP image embeddings, and it additionally applies a LoRA to improve ID consistency. Conditioned on a face, IP-Adapter-FaceID can generate images in various styles from text prompts alone.

![results](./ip-adapter-faceid.jpg)

**Update 2023/12/27**:

IP-Adapter-FaceID-Plus: face ID embedding (for face ID) + CLIP image embedding (for face structure)

<div align="center">

![results](./faceid-plus.jpg)
</div>

**Update 2023/12/28**:

IP-Adapter-FaceID-PlusV2: face ID embedding (for face ID) + controllable CLIP image embedding (for face structure)

You can adjust the weight of the face structure to get different generations; see the `s_scale` sketch after the IP-Adapter-FaceID-Plus example below.

<div align="center">

![results](./faceid_plusv2.jpg)
</div>

**Update 2024/01/04**:

IP-Adapter-FaceID-SDXL: an experimental SDXL version of IP-Adapter-FaceID

<div align="center">

![results](./sdxl_faceid.jpg)
</div>

## Usage

### IP-Adapter-FaceID

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding:

```python
import cv2
from insightface.app import FaceAnalysis
import torch

# detect faces with the buffalo_l model pack
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

# normalized ID embedding of the first detected face, with a batch dimension
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```
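
`app.get` returns one entry per detected face, and the snippet above simply takes the first. If the photo may contain several people, one simple heuristic (a sketch, not part of the original card) is to keep the face with the largest bounding box:

```python
# hypothetical selection: pick the largest detected face by bounding-box area
face = max(faces, key=lambda f: (f.bbox[2] - f.bbox[0]) * (f.bbox[3] - f.bbox[1]))
faceid_embeds = torch.from_numpy(face.normed_embedding).unsqueeze(0)
```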

Then, you can generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load ip-adapter
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
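
`generate` returns the samples as a list of PIL images (four here, from `num_samples=4`), assuming the upstream IP-Adapter code's default return type; a minimal way to save them:

```python
# save each generated sample to disk
for i, img in enumerate(images):
    img.save(f"faceid_result_{i}.png")
```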

You can also load the model as a regular IP-Adapter together with a regular LoRA:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid_separate import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
lora_ckpt = "ip-adapter-faceid_sd15_lora.safetensors"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load lora and fuse it into the base model weights
pipe.load_lora_weights(lora_ckpt)
pipe.fuse_lora()

# load ip-adapter (note the import from ip_adapter_faceid_separate above)
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
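
Note that `fuse_lora` bakes the LoRA weights into the base model. If you later need the pipeline without the FaceID LoRA, recent diffusers versions also provide `pipe.unfuse_lora()` to revert the fusion.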

### IP-Adapter-FaceID-SDXL

First, extract the face ID embedding with [insightface](https://github.com/deepinsight/insightface), exactly as above:

```python
import cv2
from insightface.app import FaceAnalysis
import torch

app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```

Then, you can generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDXL

base_model_path = "SG161222/RealVisXL_V3.0"
ip_ckpt = "ip-adapter-faceid_sdxl.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    add_watermarker=False,
)

# load ip-adapter
ip_model = IPAdapterFaceIDXL(pipe, ip_ckpt, device)

# generate image
prompt = "A closeup shot of a beautiful Asian teenage girl in a white dress wearing small silver earrings in the garden, under the soft morning light"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
    num_samples=2, width=1024, height=1024,
    num_inference_steps=30, guidance_scale=7.5, seed=2023
)
```

### IP-Adapter-FaceID-Plus

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding and an aligned face image:

```python
import cv2
from insightface.app import FaceAnalysis
from insightface.utils import face_align
import torch

app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)  # you can also segment the face
```
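
The two outputs map onto the description in the 2023/12/27 update: `normed_embedding` supplies the face ID, while the 224×224 aligned crop from `face_align.norm_crop` is what the CLIP image encoder consumes to capture face structure.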

Then, you can generate images conditioned on the face embedding and face image:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

v2 = False  # set to True to use IP-Adapter-FaceID-PlusV2
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plus_sd15.bin" if not v2 else "ip-adapter-faceid-plusv2_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load ip-adapter
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, face_image=face_image, faceid_embeds=faceid_embeds,
    shortcut=v2, s_scale=1.0,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
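
For PlusV2, set `v2 = True` above so that the v2 checkpoint is loaded and `shortcut=True` is passed; `s_scale` then weights the face-structure (CLIP) conditioning mentioned in the 2023/12/28 update. A sketch of sweeping it (the specific values are illustrative, not from the original card):

```python
# hypothetical sweep: smaller s_scale relies more on the text prompt,
# larger s_scale follows the reference face structure more closely
for s_scale in (0.5, 1.0, 1.5):
    images = ip_model.generate(
        prompt=prompt, negative_prompt=negative_prompt,
        face_image=face_image, faceid_embeds=faceid_embeds,
        shortcut=True, s_scale=s_scale,
        num_samples=1, width=512, height=768, num_inference_steps=30, seed=2023
    )
    images[0].save(f"plusv2_s{s_scale}.png")
```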

## Limitations and Bias

- The model does not achieve perfect photorealism or ID consistency.
- The model's generalization is limited by its training data, the base model, and the face recognition model.

## Non-commercial use

**This model is released exclusively for research purposes and is not intended for commercial use.**