PencilFolder / log /Wan2.1-1.3b-mc-lora.out

	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `6`
	More than one GPU was found, enabling multi-GPU training.
	If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
	To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
	import pynvml # type: ignore[import]
	wandb: Currently logged in as: 850587960 (850587960-tsinghua-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
	wandb: setting up run 8hhtbq9v
	wandb: Tracking run with wandb version 0.23.1
	wandb: Run data is saved locally in /data/rczhang/PencilFolder/DiffSynth-Studio/wandb/run-20251218_060313-8hhtbq9v
	wandb: Run `wandb offline` to turn off syncing.
	wandb: Syncing run Wan2.1-1.3b-mc-lora
	wandb: ⭐️ View project at https://wandb.ai/850587960-tsinghua-university/WanLoRA-Diffsyn
	wandb: 🚀 View run at https://wandb.ai/850587960-tsinghua-university/WanLoRA-Diffsyn/runs/8hhtbq9v
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
	Loaded model: {
	"model_name": "wan_video_dit",
	"model_class": "diffsynth.models.wan_video_dit.WanModel",
	"extra_kwargs": {
	"has_image_input": false,
	"patch_size": [
	1,
	2,
	2
	],
	"in_dim": 16,
	"dim": 1536,
	"ffn_dim": 8960,
	"freq_dim": 256,
	"text_dim": 4096,
	"out_dim": 16,
	"num_heads": 12,
	"num_layers": 30,
	"eps": 1e-06
	}
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
	Loaded model: {
	"model_name": "wan_video_text_encoder",
	"model_class": "diffsynth.models.wan_video_text_encoder.WanTextEncoder",
	"extra_kwargs": null
	}
	Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth"
	[rank1]: Traceback (most recent call last):
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 215, in <module>
	[rank1]:     model = WanTrainingModule(
	[rank1]:             ^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 50, in __init__
	[rank1]:     self.pipe = WanVideoPipeline.from_pretrained(
	[rank1]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/pipelines/wan_video.py", line 130, in from_pretrained
	[rank1]:     model_pool = pipe.download_and_load_models(model_configs, vram_limit)
	[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 287, in download_and_load_models
	[rank1]:     model_pool.auto_load_model(
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/models/model_loader.py", line 66, in auto_load_model
	[rank1]:     model_hash = hash_model_file(path)
	[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 118, in hash_model_file
	[rank1]:     keys_dict = load_keys_dict(path)
	[rank1]:                 ^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 74, in load_keys_dict
	[rank1]:     return load_keys_dict_from_bin(file_path)
	[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 96, in load_keys_dict_from_bin
	[rank1]:     state_dict = load_state_dict_from_bin(file_path)
	[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 28, in load_state_dict_from_bin
	[rank1]:     state_dict = torch.load(file_path, map_location=device, weights_only=True)
	[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 1484, in load
	[rank1]:     with _open_file_like(f, "rb") as opened_file:
	[rank1]:          ^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 759, in _open_file_like
	[rank1]:     return _open_file(name_or_buffer, mode)
	[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 740, in __init__
	[rank1]:     super().__init__(open(name, mode))
	[rank1]:                      ^^^^^^^^^^^^^^^^
	[rank1]: FileNotFoundError: [Errno 2] No such file or directory: '/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth'
	W1218 06:03:36.164000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434589 closing signal SIGTERM
	W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434607 closing signal SIGTERM
	W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434608 closing signal SIGTERM
	W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434609 closing signal SIGTERM
	W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434610 closing signal SIGTERM
	E1218 06:03:37.036000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 1 (pid: 2434590) of binary: /home/rczhang/miniconda3/envs/diffsyn/bin/python3.12
	Traceback (most recent call last):
	  File "/home/rczhang/miniconda3/envs/diffsyn/bin/accelerate", line 7, in <module>
	    sys.exit(main())
	             ^^^^^^
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
	    args.func(args)
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1272, in launch_command
	    multi_gpu_launcher(args)
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 899, in multi_gpu_launcher
	    distrib_run.run(args)
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/run.py", line 927, in run
	    elastic_launch(
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
	    return launch_agent(self._config, self._entrypoint, list(args))
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
	    raise ChildFailedError(
	torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
	============================================================
	examples/wanvideo/model_training/train_mc_lora.py FAILED
	------------------------------------------------------------
	Failures:
	  <NO_OTHER_FAILURES>
	------------------------------------------------------------
	Root Cause (first observed failure):
	[0]:
	  time      : 2025-12-18_06:03:36
	  host      : bm-9103581
	  rank      : 1 (local_rank: 1)
	  exitcode  : 1 (pid: 2434590)
	  error_file: <N/A>
	  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
	============================================================