| /home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. |
| import pynvml # type: ignore[import] |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| `--num_processes` was set to a value of `6` |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in `--num_processes=1`. |
| `--num_machines` was set to a value of `1` |
| `--mixed_precision` was set to a value of `'no'` |
| `--dynamo_backend` was set to a value of `'no'` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| wandb: Currently logged in as: 850587960 (850587960-tsinghua-university) to https: |
| wandb: setting up run 8hhtbq9v |
| wandb: Tracking run with wandb version 0.23.1 |
| wandb: Run data is saved locally in /data/rczhang/PencilFolder/DiffSynth-Studio/wandb/run-20251218_060313-8hhtbq9v |
| wandb: Run `wandb offline` to turn off syncing. |
| wandb: Syncing run Wan2.1-1.3b-mc-lora |
| wandb: ⭐️ View project at https: |
| wandb: 🚀 View run at https: |
| Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors" |
| Loaded model: { |
| "model_name": "wan_video_dit", |
| "model_class": "diffsynth.models.wan_video_dit.WanModel", |
| "extra_kwargs": { |
| "has_image_input": false, |
| "patch_size": [ |
| 1, |
| 2, |
| 2 |
| ], |
| "in_dim": 16, |
| "dim": 1536, |
| "ffn_dim": 8960, |
| "freq_dim": 256, |
| "text_dim": 4096, |
| "out_dim": 16, |
| "num_heads": 12, |
| "num_layers": 30, |
| "eps": 1e-06 |
| } |
| } |
| Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth" |
| Detected non-safetensors files, which may cause slower loading. It's recommended to convert it to a safetensors file. |
| Loaded model: { |
| "model_name": "wan_video_text_encoder", |
| "model_class": "diffsynth.models.wan_video_text_encoder.WanTextEncoder", |
| "extra_kwargs": null |
| } |
| Loading models from: "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth" |
| [rank1]: Traceback (most recent call last): |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 215, in <module> |
| [rank1]: model = WanTrainingModule( |
| [rank1]: ^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/examples/wanvideo/model_training/train_mc_lora.py", line 50, in __init__ |
| [rank1]: self.pipe = WanVideoPipeline.from_pretrained( |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/pipelines/wan_video.py", line 130, in from_pretrained |
| [rank1]: model_pool = pipe.download_and_load_models(model_configs, vram_limit) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 287, in download_and_load_models |
| [rank1]: model_pool.auto_load_model( |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/models/model_loader.py", line 66, in auto_load_model |
| [rank1]: model_hash = hash_model_file(path) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 118, in hash_model_file |
| [rank1]: keys_dict = load_keys_dict(path) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 74, in load_keys_dict |
| [rank1]: return load_keys_dict_from_bin(file_path) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 96, in load_keys_dict_from_bin |
| [rank1]: state_dict = load_state_dict_from_bin(file_path) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/data/rczhang/PencilFolder/DiffSynth-Studio/diffsynth/core/loader/file.py", line 28, in load_state_dict_from_bin |
| [rank1]: state_dict = torch.load(file_path, map_location=device, weights_only=True) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 1484, in load |
| [rank1]: with _open_file_like(f, "rb") as opened_file: |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 759, in _open_file_like |
| [rank1]: return _open_file(name_or_buffer, mode) |
| [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| [rank1]: File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/serialization.py", line 740, in __init__ |
| [rank1]: super().__init__(open(name, mode)) |
| [rank1]: ^^^^^^^^^^^^^^^^ |
| [rank1]: FileNotFoundError: [Errno 2] No such file or directory: '/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth' |
| W1218 06:03:36.164000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434589 closing signal SIGTERM |
| W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434607 closing signal SIGTERM |
| W1218 06:03:36.165000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434608 closing signal SIGTERM |
| W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434609 closing signal SIGTERM |
| W1218 06:03:36.166000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:908] Sending process 2434610 closing signal SIGTERM |
| E1218 06:03:37.036000 2434465 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 1 (pid: 2434590) of binary: /home/rczhang/miniconda3/envs/diffsyn/bin/python3.12 |
| Traceback (most recent call last): |
| File "/home/rczhang/miniconda3/envs/diffsyn/bin/accelerate", line 7, in <module> |
| sys.exit(main()) |
| ^^^^^^ |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main |
| args.func(args) |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1272, in launch_command |
| multi_gpu_launcher(args) |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/accelerate/commands/launch.py", line 899, in multi_gpu_launcher |
| distrib_run.run(args) |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/run.py", line 927, in run |
| elastic_launch( |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 156, in __call__ |
| return launch_agent(self._config, self._entrypoint, list(args)) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/home/rczhang/miniconda3/envs/diffsyn/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent |
| raise ChildFailedError( |
| torch.distributed.elastic.multiprocessing.errors.ChildFailedError: |
| ============================================================ |
| examples/wanvideo/model_training/train_mc_lora.py FAILED |
| ------------------------------------------------------------ |
| Failures: |
| <NO_OTHER_FAILURES> |
| ------------------------------------------------------------ |
| Root Cause (first observed failure): |
| [0]: |
| time : 2025-12-18_06:03:36 |
| host : bm-9103581 |
| rank : 1 (local_rank: 1) |
| exitcode : 1 (pid: 2434590) |
| error_file: <N/A> |
| traceback : To enable traceback see: https: |
| ============================================================ |
|
|
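The root cause reported above is a `FileNotFoundError` for `Wan2.1_VAE.pth` on rank 1, raised only after the DiT and text-encoder weights had already been loaded. A minimal pre-flight sketch that checks for the weight files before calling `accelerate launch` might look like the following. The directory and the three file names are copied from the log; `find_missing` is a hypothetical helper, not part of DiffSynth-Studio, and whether these three files are everything the pipeline needs is an assumption.

```python
from pathlib import Path

# Directory and file names as they appear in the log above.
# Assumption: these three files are all the pipeline expects to find here.
MODEL_DIR = Path(
    "/data/rczhang/PencilFolder/DiffSynth-Studio/models/Wan-AI/Wan2.1-T2V-1.3B"
)
EXPECTED_FILES = [
    "diffusion_pytorch_model.safetensors",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
]


def find_missing(model_dir, names):
    """Return the names that are not regular files under model_dir.

    Hypothetical helper for illustration; the order of `names` is preserved.
    """
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]
```

Calling `find_missing(MODEL_DIR, EXPECTED_FILES)` in a single process before the multi-GPU launch would return a list containing `"Wan2.1_VAE.pth"` if that file is absent, which surfaces the problem without starting six workers and a wandb run first.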