/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `6`
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in `--num_processes=1`.
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
wandb: Currently logged in as: 850587960 (850587960-tsinghua-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: setting up run 8hhtbq9v
wandb: Tracking run with wandb version 0.23.1
wandb: Run data is saved locally in /data/rczhang/PencilFolder/DiffSynth-Studio/wandb/run-20251218_060313-8hhtbq9v
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run Wan2.1-1.3b-mc-lora
wandb: ⭐️ View project at https://wandb.ai/850587960-tsinghua-university/WanLoRA-Diffsyn
wandb: 🚀 View run at https://wandb.ai/850587960-tsinghua-university/WanLoRA-Diffsyn/runs/8hhtbq9v
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors"
Loaded model: { "model_name": "wan_video_dit", "model_class": "diffsynth.models.wan_video_dit.WanModel", "extra_kwargs": { "has_image_input": false, "patch_size": [ 1, 2, 2 ], "in_dim": 16, "dim": 1536, "ffn_dim": 8960, "freq_dim": 256, "text_dim": 4096, "out_dim": 16, "num_heads": 12, "num_layers": 30, "eps": 1e-06 } }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth"
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file.
Loaded model: { "model_name": "wan_video_text_encoder", "model_class": "diffsynth.models.wan_video_text_encoder.WanTextEncoder", "extra_kwargs": null }
Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth"
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 215, in <module>
[rank1]:     model = WanTrainingModule(
[rank1]:             ^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 50, in __init__
[rank1]:     self.pipe = WanVideoPipeline.from_pretrained(
[rank1]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/pipelines/wan_video.py", line 130, in from_pretrained
[rank1]:     model_pool = pipe.download_and_load_models(model_configs, vram_limit)
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 287, in download_and_load_models
[rank1]:     model_pool.auto_load_model(
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/models/model_loader.py", line 66, in auto_load_model
[rank1]:     model_hash = hash_model_file(path)
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 118, in hash_model_file
[rank1]:     keys_dict = load_keys_dict(path)
[rank1]:                 ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 74, in load_keys_dict
[rank1]:     return load_keys_dict_from_bin(file_path)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 96, in load_keys_dict_from_bin
[rank1]:     state_dict = load_state_dict_from_bin(file_path)
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 28, in load_state_dict_from_bin
[rank1]:     state_dict = torch.load(file_path, map_location=device, weights_only=True)
[rank1]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 1484, in load
[rank1]:     with _open_file_like(f, "rb") as opened_file:
[rank1]:          ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 759, in _open_file_like
[rank1]:     return _open_file(name_or_buffer, mode)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 740, in __init__
[rank1]:     super().__init__(open(name, mode))
[rank1]:                      ^^^^^^^^^^^^^^^^
[rank1]: FileNotFoundError: [Errno 2] No such file or directory: '/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth'
W1218 06:03:36.164000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434589 closing signal SIGTERM
W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434607 closing signal SIGTERM
W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434608 closing signal SIGTERM
W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434609 closing signal SIGTERM
W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434610 closing signal SIGTERM
E1218 06:03:37.036000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 1 (pid: 2434590) of binary: /home/rczhang/miniconda3/envs/diffsyn/bin/python3.12
Traceback (most recent call last):
  File "/home/rczhang/miniconda3/envs/diffsyn/bin/accelerate", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1272, in launch_command
    multi_gpu_launcher(args)
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 899, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
examples/wanvideo/model_training/train_mc_lora.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-12-18_06:03:36
  host      : bm-9103581
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2434590)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================