	# AnimateDiff ControlNet SDXL Example

	This document is a step-by-step guide to setting up and running the `animatediff_controlnet_sdxl.py` script from the Hugging Face repository. The script uses the `diffusers-sdxl-controlnet` library to generate pose-guided animations with ControlNet and SDXL models.

	## Prerequisites

	Before running the script, ensure you have the necessary dependencies installed. You can install them using the following commands:

	### System Dependencies

	```bash
	sudo apt-get update && sudo apt-get install git-lfs cbm ffmpeg
	```

	### Python Dependencies

	```bash
	pip install git+https://huggingface.co/svjack/diffusers-sdxl-controlnet
	pip install transformers peft sentencepiece moviepy==1.0.3 controlnet_aux
	```

	### Clone the Repository

	```bash
	git clone https://huggingface.co/svjack/diffusers-sdxl-controlnet
	cp diffusers-sdxl-controlnet/girl-pose.gif .
	cp diffusers-sdxl-controlnet/girl_beach.mp4 .
	```
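Before moving on, it can help to confirm that the setup above actually took effect. The following is a minimal sketch, not part of the repository: `check_setup` is a hypothetical helper that reports which command-line tools are missing from `PATH` and which of the copied sample files are missing from the working directory. It assumes the Git LFS executable is named `git-lfs`.

```python
import shutil
from pathlib import Path


def check_setup(tools=("git-lfs", "ffmpeg"),
                files=("girl-pose.gif", "girl_beach.mp4")):
    """Return (missing_tools, missing_files) for the current environment.

    Hypothetical helper for illustration; the tool and file names are the
    ones used in the setup commands of this README.
    """
    # shutil.which returns None when an executable is not found on PATH
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    # Paths are resolved relative to the current working directory
    missing_files = [name for name in files if not Path(name).is_file()]
    return missing_tools, missing_files


missing_tools, missing_files = check_setup()
print("Missing tools:", missing_tools)
print("Missing files:", missing_files)
```

Two empty lists mean the tools and sample files are in place.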

	## Script Modifications

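	The script requires a small modification to work correctly. In `diffusers-sdxl-controlnet/examples/community/animatediff_controlnet_sdxl.py`, comment out the lines that reference the two LoRA attention processors:

	```python
	# Comment out these two names wherever the script lists them:
	# LoRAAttnProcessor2_0,
	# LoRAXFormersAttnProcessor,
	```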
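If you prefer to apply this change programmatically, the sketch below shows one possible approach. `comment_out_names` is a hypothetical helper, and the `sample` import block is only an illustrative stand-in for the real script, whose exact contents are not reproduced here. The helper comments out lines that consist solely of one of the given names (with an optional trailing comma) and leaves every other line untouched.

```python
def comment_out_names(source, names):
    """Comment out lines that contain only one of `names` (plus an optional comma)."""
    targets = set(names)
    patched = []
    for line in source.splitlines():
        # Compare the bare identifier, ignoring indentation and a trailing comma
        bare = line.strip().rstrip(",").strip()
        if bare in targets:
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(f"{indent}#{line.lstrip()}")
        else:
            patched.append(line)
    return "\n".join(patched)


# Illustrative input only; the real script's import list may differ.
sample = """from diffusers.models.attention_processor import (
    AttnProcessor2_0,
    LoRAAttnProcessor2_0,
    LoRAXFormersAttnProcessor,
    XFormersAttnProcessor,
)"""

patched = comment_out_names(sample, ["LoRAAttnProcessor2_0", "LoRAXFormersAttnProcessor"])
print(patched)
```

Lines that are already commented out no longer match, so running the helper twice does not add a second `#`. Review the patched source before overwriting the script.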

	## GIF to Frames Conversion

	The script includes a function to convert a GIF into individual frames. This is useful for preparing input data for the animation pipeline.

	```python
	from PIL import Image, ImageSequence
	import os

	def gif_to_frames(gif_path, output_folder):
	    # Open the GIF file
	    gif = Image.open(gif_path)

	    # Ensure the output folder exists
	    if not os.path.exists(output_folder):
	        os.makedirs(output_folder)

	    # Iterate through each frame of the GIF
	    for i, frame in enumerate(ImageSequence.Iterator(gif)):
	        # Copy the frame
	        frame_copy = frame.copy()

	        # Save the frame to the specified folder
	        frame_path = os.path.join(output_folder, f"frame_{i:04d}.png")
	        frame_copy.save(frame_path)

	    print(f"Successfully extracted {i + 1} frames to {output_folder}")

	# Example call
	gif_to_frames("girl-pose.gif", "girl_pose_frames")
	```
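The zero-padded `frame_{i:04d}.png` naming matters because later steps list the folder and sort the filenames as plain strings. The sketch below compares padded and unpadded names; `frame_name` is a hypothetical helper introduced only for this illustration.

```python
def frame_name(index, width=4):
    """Build a zero-padded frame filename such as frame_0007.png."""
    return f"frame_{index:0{width}d}.png"


# Zero-padded names sort in the same order as their frame indices
padded = sorted(frame_name(i) for i in range(12))

# Unpadded names sort as strings, so frame_10 lands before frame_2
unpadded = sorted(f"frame_{i}.png" for i in range(12))

print(padded[:3])
print(unpadded[:4])
```

With four digits of padding, string order matches frame order for up to 10,000 frames.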

	### Use this girl pose GIF as the pose source

	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/6oTdxQtI0nLGq2YB4KYTh.gif)

	## Running the Script

	To run the script, follow these steps:

	1. Add the Script Path to System Path:

	```python
	import sys
	sys.path.insert(0, "diffusers-sdxl-controlnet/examples/community/")
	from animatediff_controlnet_sdxl import *
	from controlnet_aux.processor import Processor
	```

	2. Load Necessary Libraries and Models:

	```python
	import torch
	from diffusers.models import MotionAdapter
	from diffusers import DDIMScheduler
	from diffusers.utils import export_to_gif
	from diffusers import AutoPipelineForText2Image, ControlNetModel
	from diffusers.utils import load_image
	from PIL import Image
	```

	3. Load the MotionAdapter Model:

	```python
	adapter = MotionAdapter.from_pretrained(
	    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta",
	    torch_dtype=torch.float16
	)
	```

	4. Configure the Scheduler and ControlNet:

	```python
	model_id = "svjack/GenshinImpact_XL_Base"
	scheduler = DDIMScheduler.from_pretrained(
	    model_id,
	    subfolder="scheduler",
	    clip_sample=False,
	    timestep_spacing="linspace",
	    beta_schedule="linear",
	    steps_offset=1,
	)

	controlnet = ControlNetModel.from_pretrained(
	    "thibaud/controlnet-openpose-sdxl-1.0",
	    torch_dtype=torch.float16,
	).to("cuda")
	```

	5. Load the AnimateDiffSDXLControlnetPipeline:

	```python
	pipe = AnimateDiffSDXLControlnetPipeline.from_pretrained(
	    model_id,
	    controlnet=controlnet,
	    motion_adapter=adapter,
	    scheduler=scheduler,
	    torch_dtype=torch.float16,
	).to("cuda")
	```

	6. Enable Memory Saving Features:

	```python
	pipe.enable_vae_slicing()
	pipe.enable_vae_tiling()
	```

	7. Load Conditioning Frames:

	```python
	import os
	folder_path = "girl_pose_frames/"
	# Collect the extracted PNG frames in filename order
	frames = sorted(f for f in os.listdir(folder_path) if f.endswith(".png"))
	# Keep the first 16 frames and resize each one to 1024x1024
	conditioning_frames = [
	    Image.open(os.path.join(folder_path, f)).resize((1024, 1024))
	    for f in frames[:16]
	]
	```

	8. Process Conditioning Frames:

	```python
	p2 = Processor("openpose")
	cn2 = [p2(frame) for frame in conditioning_frames]
	```

	9. Define Prompts:

	```python
	# A longer prompt variant; the assignment that follows overrides it.
	# Raw strings keep the backslashes without invalid-escape warnings.
	prompt = r'''
	solo,Xiangling\(genshin impact\),1girl,
	full body professional photograph of a stunning detailed, sharp focus, dramatic
	cinematic lighting, octane render unreal engine (film grain, blurry background
	'''
	prompt = r"solo,Xiangling\(genshin impact\),1girl,full body professional photograph of a stunning detailed"
	negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"
	```

	10. Generate Output (using the Genshin Impact character Xiangling):

	```python
	# `prompt` and `negative_prompt` come from step 9.
	# A shorter alternative prompt:
	# prompt = r"solo,Xiangling\(genshin impact\),1girl"

	generator = torch.Generator(device="cpu").manual_seed(0)
	output = pipe(
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    num_inference_steps=50,
	    guidance_scale=20,
	    controlnet_conditioning_scale=1.0,
	    width=512,
	    height=768,
	    num_frames=16,
	    conditioning_frames=cn2,
	    generator=generator,
	)
	```

	11. Export Frames to GIF:

	```python
	frames = output.frames[0]
	export_to_gif(frames, "xiangling_animation.gif")
	```

	12. Display the Result:

	```python
	from IPython import display
	display.Image("xiangling_animation.gif")
	```
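Step 7 keeps only the first 16 frames of the pose GIF, so a longer source clip contributes only its opening motion. As an alternative, the sketch below picks frames spread evenly across the whole sequence. `pick_evenly_spaced` is a hypothetical helper, not part of the script, and the 40-frame list is made-up sample data.

```python
def pick_evenly_spaced(items, count):
    """Return `count` items spread evenly across `items`, preserving order."""
    if count <= 0:
        raise ValueError("count must be positive")
    if len(items) <= count:
        # Nothing to drop: return everything that is available
        return list(items)
    step = len(items) / count
    # int(i * step) never reaches len(items) because i < count
    return [items[int(i * step)] for i in range(count)]


# Made-up example: 40 extracted frames reduced to 16 conditioning frames
frame_files = [f"frame_{i:04d}.png" for i in range(40)]
selected = pick_evenly_spaced(frame_files, 16)
print(selected[0], selected[-1], len(selected))
```

Applied to the sorted `frames` list from step 7, this would replace the `[:16]` slice. Whether evenly spaced poses give a better animation than consecutive ones depends on the source clip.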

	### Target GIF

	<div style="display: flex; justify-content: center; flex-wrap: nowrap;">
	<div style="margin-right: 10px;">
	<img src="xiangling_animation.gif" alt="Image 1" style="width: 512px; height: 768px;">
	</div>
	</div>

	### Upscaled with Anime Upscale from https://github.com/svjack/APISR

	<div style="display: flex; justify-content: center; flex-wrap: nowrap;">
	<div style="margin-left: 10px;">
	<img src="xiangling_animation_frames_4x.gif" alt="Image 2" style="width: 512px; height: 768px;">
	</div>
	</div>

	### Run from the Command Line
	- Save the following as `animatediff_controlnet_sdxl_run_script.py`:
	```python
	import sys
	sys.path.insert(0, "diffusers-sdxl-controlnet/examples/community/")
	from animatediff_controlnet_sdxl import *

	import argparse
	from moviepy.editor import VideoFileClip, ImageSequenceClip
	import os
	import torch
	from diffusers.models import MotionAdapter
	from diffusers import DDIMScheduler, AutoPipelineForText2Image, ControlNetModel
	from diffusers.utils import export_to_gif
	from PIL import Image
	from controlnet_aux.processor import Processor

	# Initialize the MotionAdapter and ControlNetModel
	adapter = MotionAdapter.from_pretrained("a-r-r-o-w/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

	def initialize_pipeline(model_id):
	    scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", beta_schedule="linear", steps_offset=1)
	    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16).to("cuda")

	    # Initialize the AnimateDiffSDXLControlnetPipeline
	    pipe = AnimateDiffSDXLControlnetPipeline.from_pretrained(
	        model_id,
	        controlnet=controlnet,
	        motion_adapter=adapter,
	        scheduler=scheduler,
	        torch_dtype=torch.float16,
	    ).to("cuda")
	    pipe.enable_vae_slicing()
	    pipe.enable_vae_tiling()
	    return pipe

	def split_video_into_frames(input_video_path, num_frames, temp_folder='temp_frames'):
	    """
	    Resample the video to the given number of frames, keeping the original frame rate.

	    :param input_video_path: path to the input video file
	    :param num_frames: target number of frames
	    :param temp_folder: path to the temporary folder
	    """
	    clip = VideoFileClip(input_video_path)
	    original_duration = clip.duration
	    segment_duration = original_duration / num_frames

	    if not os.path.exists(temp_folder):
	        os.makedirs(temp_folder)

	    # Save one frame from the start of each equal-length segment
	    for i in range(num_frames):
	        frame_time = i * segment_duration
	        frame_path = os.path.join(temp_folder, f'frame_{i:04d}.png')
	        clip.save_frame(frame_path, t=frame_time)

	    frame_paths = [os.path.join(temp_folder, f'frame_{i:04d}.png') for i in range(num_frames)]
	    final_clip = ImageSequenceClip(frame_paths, fps=clip.fps)
	    final_clip.write_videofile("resampled_video.mp4", codec='libx264')

	    print(f"The new video has been saved to resampled_video.mp4 with {num_frames} frames at the original frame rate.")

	def generate_video_with_prompt(
	    input_video_path, prompt, model_id, gif_output_path, seed=0, num_frames=16,
	    keep_imgs=False, temp_folder='temp_frames', num_inference_steps=50,
	    guidance_scale=20, controlnet_conditioning_scale=1.0, width=512, height=768,
	):
	    """
	    Generate a video guided by a text prompt.

	    :param input_video_path: path to the input video file
	    :param prompt: text prompt
	    :param model_id: model ID
	    :param gif_output_path: path of the output GIF file
	    :param seed: random seed
	    :param num_frames: target number of frames
	    :param keep_imgs: whether to keep the temporary images
	    :param temp_folder: path to the temporary folder
	    :param num_inference_steps: number of inference steps
	    :param guidance_scale: guidance scale
	    :param controlnet_conditioning_scale: ControlNet conditioning scale
	    :param width: output width
	    :param height: output height
	    """
	    split_video_into_frames(input_video_path, num_frames, temp_folder)

	    folder_path = temp_folder
	    frames = os.listdir(folder_path)
	    frames = list(filter(lambda x: x.endswith(".png"), frames))
	    frames.sort()
	    conditioning_frames = list(map(lambda x: Image.open(os.path.join(folder_path, x)).resize((1024, 1024)), frames))[:num_frames]

	    # Extract OpenPose pose maps from the sampled frames
	    p2 = Processor("openpose")
	    cn2 = [p2(frame) for frame in conditioning_frames]

	    negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"
	    generator = torch.Generator(device="cuda").manual_seed(seed)

	    pipe = initialize_pipeline(model_id)

	    output = pipe(
	        prompt=prompt,
	        negative_prompt=negative_prompt,
	        num_inference_steps=num_inference_steps,
	        guidance_scale=guidance_scale,
	        controlnet_conditioning_scale=controlnet_conditioning_scale,
	        width=width,
	        height=height,
	        num_frames=num_frames,
	        conditioning_frames=cn2,
	        generator=generator
	    )

	    frames = output.frames[0]
	    export_to_gif(frames, gif_output_path)

	    print(f"The generated GIF has been saved to {gif_output_path}")

	    if not keep_imgs:
	        # Delete the temporary folder
	        import shutil
	        shutil.rmtree(temp_folder)

	if __name__ == "__main__":
	    parser = argparse.ArgumentParser(description="Generate a video guided by a text prompt")
	    parser.add_argument("input_video", help="path to the input video file")
	    parser.add_argument("prompt", help="text prompt")
	    parser.add_argument("model_id", help="model ID")
	    parser.add_argument("gif_output_path", help="path of the output GIF file")
	    parser.add_argument("--seed", type=int, default=0, help="random seed")
	    parser.add_argument("--num_frames", type=int, default=16, help="target number of frames")
	    parser.add_argument("--keep_imgs", action="store_true", help="keep the temporary images")
	    parser.add_argument("--temp_folder", default='temp_frames', help="path to the temporary folder")
	    parser.add_argument("--num_inference_steps", type=int, default=50, help="number of inference steps")
	    parser.add_argument("--guidance_scale", type=float, default=20.0, help="guidance scale")
	    parser.add_argument("--controlnet_conditioning_scale", type=float, default=1.0, help="ControlNet conditioning scale")
	    parser.add_argument("--width", type=int, default=512, help="output width")
	    parser.add_argument("--height", type=int, default=768, help="output height")

	    args = parser.parse_args()

	    generate_video_with_prompt(
	        args.input_video, args.prompt, args.model_id, args.gif_output_path, args.seed, args.num_frames,
	        args.keep_imgs, args.temp_folder, args.num_inference_steps, args.guidance_scale,
	        args.controlnet_conditioning_scale, args.width, args.height,
	    )
	```
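The frame sampling in `split_video_into_frames` divides the clip into `num_frames` equal segments and saves the frame at the start of each one. The sketch below reproduces only that timestamp arithmetic, without moviepy. `sample_times` is a hypothetical helper, and the 8-second duration is an assumed example value.

```python
def sample_times(duration, num_frames):
    """Return the start time in seconds of each of `num_frames` equal segments."""
    segment_duration = duration / num_frames
    return [i * segment_duration for i in range(num_frames)]


# Assumed example: an 8-second clip sampled into 16 frames
times = sample_times(8.0, 16)
print(times[:3], times[-1])
```

The last sample falls one segment before the end of the clip, so the final instant of the video is never requested.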

	```bash
	python animatediff_controlnet_sdxl_run_script.py girl_beach.mp4 \
	"solo,Xiangling\(genshin impact\),1girl,full body professional photograph of a stunning detailed, drink tea use chinese cup" \
	"svjack/GenshinImpact_XL_Base" \
	xiangling_tea_animation.gif --num_frames 16 --temp_folder temp_frames
	```
	- Pose: girl_beach.mp4
	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/pYx23VyLNkLk3YxAAqu5i.mp4"></video>
	- Output: xiangling_tea_animation.gif
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/qUZOvGs5rzxN8zaZ4Xp3s.gif)
	- Upscaled:
	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/uwUDYOPiZbHuq5v6jWADr.mp4"></video>
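The command above sets only two optional flags, so the remaining options fall back to the script's defaults. The sketch below rebuilds the same argument parser in isolation (help strings omitted) and parses an equivalent argument list to show the values that result. The prompt is abbreviated here for brevity, and `build_parser` is a name introduced for this illustration.

```python
import argparse


def build_parser():
    """Mirror the argument definitions of animatediff_controlnet_sdxl_run_script.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("input_video")
    parser.add_argument("prompt")
    parser.add_argument("model_id")
    parser.add_argument("gif_output_path")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--num_frames", type=int, default=16)
    parser.add_argument("--keep_imgs", action="store_true")
    parser.add_argument("--temp_folder", default="temp_frames")
    parser.add_argument("--num_inference_steps", type=int, default=50)
    parser.add_argument("--guidance_scale", type=float, default=20.0)
    parser.add_argument("--controlnet_conditioning_scale", type=float, default=1.0)
    parser.add_argument("--width", type=int, default=512)
    parser.add_argument("--height", type=int, default=768)
    return parser


# Same shape as the shell command, with an abbreviated prompt
argv = [
    "girl_beach.mp4",
    r"solo,Xiangling\(genshin impact\),1girl",
    "svjack/GenshinImpact_XL_Base",
    "xiangling_tea_animation.gif",
    "--num_frames", "16",
    "--temp_folder", "temp_frames",
]
args = build_parser().parse_args(argv)
for name, value in sorted(vars(args).items()):
    print(f"{name} = {value!r}")
```

Because `--keep_imgs` is a `store_true` flag, the temporary frame folder is deleted after generation unless the flag is passed.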

	### Some Other Samples

	#### Makise Kurisu in Steins;Gate
	```bash
	python animatediff_controlnet_sdxl_run_script.py girl_beach.mp4 \
	"1girl, Makise Kurisu, masterpiece, white lab coat, red tie, laboratory" \
	"cagliostrolab/animagine-xl-3.1" \
	Makise_Kurisu_animation_short.gif --num_frames 16 --temp_folder temp_frames --guidance_scale 20 --controlnet_conditioning_scale 0.3
	```
	- Output: Makise_Kurisu_animation_short.gif
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/WZu1raXfuaHlmrzTTOBbz.gif)
	- Upscaled:
	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/v69NuN5UsAokrfBNW_c9P.mp4"></video>

	#### Souryuu Asuka Langley in EVA
	```bash
	python animatediff_controlnet_sdxl_run_script.py girl_beach.mp4 \
	"1girl, souryuu asuka langley, masterpiece" \
	"cagliostrolab/animagine-xl-3.1" \
	asuka_langley_animation_short.gif --num_frames 16 --temp_folder temp_frames --guidance_scale 20 --controlnet_conditioning_scale 0.3 --num_inference_steps 50
	```
	- Output: asuka_langley_animation_short.gif
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/wZVvYaYqpigXENEVJVGaM.gif)
	- Upscaled:
	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/uusv36dl0NT80fpUeo5pA.mp4"></video>

	```bash
	python animatediff_controlnet_sdxl_run_script.py girl_beach.mp4 \
	"1girl, souryuu asuka langley, masterpiece, neon genesis evangelion, solo, upper body, v, smile, looking at viewer, outdoors, night" \
	"cagliostrolab/animagine-xl-3.1" \
	asuka_langley_animation_long.gif --num_frames 16 --temp_folder temp_frames --guidance_scale 20 --controlnet_conditioning_scale 0.3
	```
	- Output: asuka_langley_animation_long.gif
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/qoLf2rKuGLnIW5liQg8tq.gif)
	- Upscaled:
	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/T2iREkPkWXWCjzOHmq82-.mp4"></video>

	#### XiangLing in Genshin Impact
	- produce_gif_script.py
	```bash
	python produce_gif_script.py xiangling_video_seed.csv "svjack/GenshinImpact_XL_Base" xiangling_gif_dir \
	--num_frames 16 --temp_folder temp_frames --seed 0 --controlnet_conditioning_scale 0.3
	```
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/R2SpiNASjQj8k_wrZDJA5.gif)
	![image/gif](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ssJZD1SXLLu4EdpSZKcP2.gif)


	## Conclusion

	This script demonstrates how to use the `diffusers-sdxl-controlnet` library to generate animated images with ControlNet and SDXL models. By following the steps outlined above, you can create and visualize your own animated sequences.