log.txt
python quantize.py
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
2026-04-23 22:05:23 INFO autoround.py L178: using MLLM mode for multimodal model.
2026-04-23 22:05:23 WARNING modeling_qwen3_5.py L411: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1184/1184 [00:00<00:00, 16102.02it/s]
2026-04-23 22:05:24 INFO base.py L517: using torch.bfloat16 for quantization tuning
2026-04-23 22:05:24 WARNING formats.py L166: some layers are skipped quantization (shape not divisible by 32): model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_a, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_b, model.visual.blocks.[0-26].mlp.linear_fc1, model.visual.blocks.[0-26].mlp.linear_fc2
2026-04-23 22:05:24 WARNING modeling_utils.py L4435: `loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
2026-04-23 22:05:24 WARNING utils.py L444: Layer name or regex 'embed_tokens' in layer_config does not match any supported layers. Please check for typos or update the regex pattern, ignore it for now
2026-04-23 22:05:24 INFO base.py L1818: start to cache block inputs
cache block inputs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<00:00, 301.65it/s]
2026-04-23 22:05:25 INFO base.py L1835: caching done
Quantizing model.language_model.layers.0: 0%|          | 0/64 [00:00<?, ?it/s]2026-04-23 22:13:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000000 -> iter 970: 0.000000,'peak_ram': 6.17GB, 'peak_vram': {'0': 17.11GB, '1': 2.78GB}
Quantizing model.language_model.layers.1: 2%|██ | 1/64 [08:25<8:50:44, 505.47s/it]2026-04-23 22:22:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000000 -> iter 927: 0.000000,'peak_ram': 7.35GB, 'peak_vram': {'0': 17.11GB, '1': 2.79GB}
Quantizing model.language_model.layers.2: 3%|████ | 2/64 [16:48<8:40:35, 503.80s/it]2026-04-23 22:30:30 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000001 -> iter 683: 0.000001,'peak_ram': 8.53GB, 'peak_vram': {'0': 17.11GB, '1': 2.79GB}
Quantizing model.language_model.layers.3: 5%|█████ | 3/64 [25:10<8:31:46, 503.38s/it]/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/torch/autograd/graph.py:869: UserWarning: Flash Attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:124.)
  return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000002 -> iter 62: 0.000001,'peak_ram': 8.53GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.4: 6%|███████ | 4/64 [35:20<9:05:15, 545.26s/it]2026-04-23 22:49:03 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000002 -> iter 502: 0.000001,'peak_ram': 9.08GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.5: 8%|████████ | 5/64 [43:43<8:41:09, 529.99s/it]2026-04-23 22:57:26 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000002 -> iter 985: 0.000002,'peak_ram': 10.26GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.6: 9%|██████████ | 6/64 [52:06<8:23:30, 520.87s/it]2026-04-23 23:05:49 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000003 -> iter 625: 0.000002,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.7: 11%|███████████ | 7/64 [1:00:30<8:09:36, 515.38s/it]W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000006 -> iter 843: 0.000004,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.8: 12%|█████████████ | 8/64 [1:10:40<8:29:00, 545.36s/it]2026-04-23 23:24:22 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000006 -> iter 790: 0.000004,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.9: 14%|███████████████ | 9/64 [1:19:02<8:07:40, 532.01s/it]2026-04-23 23:32:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000007 -> iter 997: 0.000005,'peak_ram': 11.88GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.10: 16%|████████████████ | 10/64 [1:27:25<7:50:38, 522.93s/it]2026-04-23 23:41:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000008 -> iter 872: 0.000006,'peak_ram': 13.05GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.11: 17%|█████████████████ | 11/64 [1:35:47<7:36:19, 516.59s/it]W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000012 -> iter 958: 0.000009,'peak_ram': 14.17GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.12: 19%|███████████████████ | 12/64 [1:45:56<7:52:04, 544.70s/it]2026-04-23 23:59:38 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000013 -> iter 870: 0.000010,'peak_ram': 15.31GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.13: 20%|████████████████████ | 13/64 [1:54:18<7:31:59, 531.75s/it]2026-04-24 00:08:00 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000013 -> iter 578: 0.000011,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.14: 22%|██████████████████████ | 14/64 [2:02:41<7:15:50, 523.01s/it]2026-04-24 00:16:23 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000016 -> iter 389: 0.000013,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.15: 23%|████████████████████████ | 15/64 [2:11:03<7:01:56, 516.66s/it]W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000021 -> iter 770: 0.000016,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.16: 25%|█████████████████████████ | 16/64 [2:21:12<7:15:33, 544.44s/it]2026-04-24 00:34:53 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000021 -> iter 368: 0.000017,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.17: 27%|███████████████████████████ | 17/64 [2:29:33<6:56:23, 531.56s/it]2026-04-24 00:43:15 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000024 -> iter 899: 0.000018,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.18: 28%|████████████████████████████ | 18/64 [2:37:56<6:40:47, 522.76s/it]2026-04-24 00:51:38 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000031 -> iter 840: 0.000023,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.19: 30%|██████████████████████████████ | 19/64 [2:46:18<6:27:32, 516.72s/it]W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000041 -> iter 999: 0.000030,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.20: 31%|███████████████████████████████ | 20/64 [2:56:27<6:39:14, 544.43s/it]2026-04-24 01:10:09 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000040 -> iter 576: 0.000032,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.21: 33%|█████████████████████████████████ | 21/64 [3:04:50<6:21:13, 531.94s/it]2026-04-24 01:18:32 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000050 -> iter 804: 0.000035,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.22: 34%|██████████████████████████████████ | 22/64 [3:13:13<6:06:10, 523.11s/it]2026-04-24 01:26:55 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000048 -> iter 501: 0.000040,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.23: 36%|████████████████████████████████████ | 23/64 [3:21:35<5:53:11, 516.86s/it]W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000067 -> iter 363: 0.000048,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.24: 38%|██████████████████████████████████████ | 24/64 [3:31:44<6:03:05, 544.63s/it]2026-04-24 01:45:27 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000064 -> iter 966: 0.000050,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.25: 39%|███████████████████████████████████████ | 25/64 [3:40:07<5:45:52, 532.12s/it]2026-04-24 01:53:50 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000069 -> iter 957: 0.000055,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.26: 41%|█████████████████████████████████████████ | 26/64 [3:48:30<5:31:24, 523.27s/it]2026-04-24 02:02:12 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000078 -> iter 248: 0.000062,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.27: 42%|██████████████████████████████████████████ | 27/64 [3:56:52<5:18:46, 516.94s/it]W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000095 -> iter 484: 0.000070,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
Quantizing model.language_model.layers.28: 44%|████████████████████████████████████████████ | 28/64 [4:07:03<5:27:00, 545.01s/it]2026-04-24 02:20:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000113 -> iter 991: 0.000078,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
Quantizing model.language_model.layers.29: 45%|█████████████████████████████████████████████ | 29/64 [4:15:25<5:10:28, 532.26s/it]2026-04-24 02:29:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000117 -> iter 446: 0.000084,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
Quantizing model.language_model.layers.30: 47%|███████████████████████████████████████████████ | 30/64 [4:23:47<4:56:30, 523.26s/it]2026-04-24 02:37:30 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000120 -> iter 975: 0.000094,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
Quantizing model.language_model.layers.31: 48%|████████████████████████████████████████████████ | 31/64 [4:32:10<4:44:21, 517.00s/it]W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000152 -> iter 706: 0.000109,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
| 111 |
+
Quantizing model.language_model.layers.32: 50%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 32/64 [4:42:19<4:50:32, 544.75s/it]2026-04-24 02:56:01 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 112 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000153 -> iter 621: 0.000115,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 113 |
+
Quantizing model.language_model.layers.33: 52%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 33/64 [4:50:41<4:34:47, 531.87s/it]2026-04-24 03:04:23 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 114 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000135 -> iter 391: 0.000116,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 115 |
+
Quantizing model.language_model.layers.34: 53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 34/64 [4:59:05<4:21:43, 523.46s/it]2026-04-24 03:12:47 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 116 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000155 -> iter 943: 0.000126,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 117 |
+
Quantizing model.language_model.layers.35: 55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 35/64 [5:07:27<4:09:52, 516.99s/it]W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
|
| 118 |
+
W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
|
| 119 |
+
W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
|
| 120 |
+
W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
|
| 121 |
+
W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
|
| 122 |
+
quantized 7/7 layers in the block, loss iter 0: 0.000257 -> iter 761: 0.000163,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 123 |
+
Quantizing model.language_model.layers.36: 56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 36/64 [5:17:36<4:14:07, 544.56s/it]2026-04-24 03:31:17 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 124 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000231 -> iter 785: 0.000165,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 125 |
+
Quantizing model.language_model.layers.37: 58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 37/64 [5:25:58<3:59:18, 531.78s/it]2026-04-24 03:39:39 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 126 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000292 -> iter 135: 0.000173,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 127 |
+
Quantizing model.language_model.layers.38: 59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 38/64 [5:34:20<3:46:33, 522.82s/it]2026-04-24 03:48:01 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 128 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000250 -> iter 786: 0.000179,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 129 |
+
Quantizing model.language_model.layers.39: 61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 39/64 [5:42:41<3:35:11, 516.44s/it]W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
|
| 130 |
+
W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
|
| 131 |
+
W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
|
| 132 |
+
W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
|
| 133 |
+
W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
|
| 134 |
+
quantized 7/7 layers in the block, loss iter 0: 0.000275 -> iter 762: 0.000189,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 135 |
+
Quantizing model.language_model.layers.40: 62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 40/64 [5:52:50<3:37:40, 544.20s/it]2026-04-24 04:06:32 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
|
| 136 |
+
quantized 3/8 layers in the block, loss iter 0: 0.000418 -> iter 563: 0.000180,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
|
| 137 |
+
Quantizing model.language_model.layers.41: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 41/64 [6:01:13<3:23:50, 531.77s/it]2026-04-24 04:14:55 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000331 -> iter 979: 0.000189,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
Quantizing model.language_model.layers.42: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 42/64 [6:09:35<3:11:42, 522.83s/it]2026-04-24 04:23:16 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000269 -> iter 976: 0.000193,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
Quantizing model.language_model.layers.43: 67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 43/64 [6:17:57<3:00:47, 516.53s/it]W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000655 -> iter 877: 0.000232,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
Quantizing model.language_model.layers.44: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 44/64 [6:28:06<3:01:28, 544.41s/it]2026-04-24 04:41:48 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000602 -> iter 279: 0.000233,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
Quantizing model.language_model.layers.45: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 45/64 [6:36:28<2:48:20, 531.59s/it]2026-04-24 04:50:10 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000658 -> iter 295: 0.000253,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.46: 72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 46/64 [6:44:50<2:36:49, 522.73s/it]2026-04-24 04:58:31 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000329 -> iter 307: 0.000277,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.47: 73%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 47/64 [6:53:12<2:26:19, 516.42s/it]W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.000436 -> iter 456: 0.000267,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.48: 75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 48/64 [7:03:22<2:25:14, 544.65s/it]2026-04-24 05:17:04 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.000644 -> iter 952: 0.000323,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.49: 77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 49/64 [7:11:44<2:12:57, 531.84s/it]2026-04-24 05:25:26 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.001545 -> iter 448: 0.000352,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.50: 78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 50/64 [7:20:06<2:02:00, 522.89s/it]2026-04-24 05:33:48 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.001232 -> iter 939: 0.000404,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.51: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 51/64 [7:28:28<1:51:56, 516.65s/it]W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.001256 -> iter 400: 0.000540,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.52: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 52/64 [7:38:37<1:48:51, 544.31s/it]2026-04-24 05:52:19 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.001144 -> iter 864: 0.000658,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.53: 83%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 53/64 [7:46:59<1:37:27, 531.61s/it]2026-04-24 06:00:40 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.001151 -> iter 691: 0.000810,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.54: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 54/64 [7:55:20<1:27:05, 522.53s/it]2026-04-24 06:09:02 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.007836 -> iter 8: 0.001167,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.55: 86%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 55/64 [8:03:43<1:17:29, 516.63s/it]W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.003367 -> iter 941: 0.001292,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.56: 88%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 56/64 [8:13:53<1:12:35, 544.48s/it]2026-04-24 06:27:34 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.001866 -> iter 561: 0.001428,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.57: 89%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 57/64 [8:22:14<1:02:01, 531.59s/it]2026-04-24 06:35:56 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.002944 -> iter 28: 0.001646,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.58: 91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 58/64 [8:30:37<52:17, 522.86s/it]2026-04-24 06:44:19 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.029795 -> iter 986: 0.002081,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.59: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 59/64 [8:38:59<43:03, 516.67s/it]W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.004233 -> iter 560: 0.002663,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.60: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 60/64 [8:49:08<36:17, 544.49s/it]2026-04-24 07:02:50 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.005730 -> iter 702: 0.003024,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.61: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 61/64 [8:57:30<26:35, 531.79s/it]2026-04-24 07:11:12 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.007457 -> iter 697: 0.003657,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.62: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 62/64 [9:05:54<17:26, 523.34s/it]2026-04-24 07:19:37 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
quantized 3/8 layers in the block, loss iter 0: 0.013277 -> iter 66: 0.005018,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing model.language_model.layers.63: 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 63/64 [9:14:17<08:37, 517.29s/it]W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
quantized 7/7 layers in the block, loss iter 0: 0.021613 -> iter 737: 0.007835,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
Quantizing done: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 64/64 [9:24:27<00:00, 529.18s/it]
2026-04-24 07:29:53 INFO device.py L1692: 'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
2026-04-24 07:30:01 INFO shard_writer.py L250: model has been saved to ./Qwen3.6-27B-INT8-autoround/
2026-04-24 07:30:01 INFO base.py L1893: quantization tuning time 33875.81219172478
2026-04-24 07:30:01 INFO base.py L1912: Summary: quantized 256/607 in the model, unquantized layers: lm_head, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_a, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_b, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_qkv, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_z, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.out_proj, model.visual.blocks.[0-26].attn.proj, model.visual.blocks.[0-26].attn.qkv, model.visual.blocks.[0-26].mlp.linear_fc1, model.visual.blocks.[0-26].mlp.linear_fc2, model.visual.merger.linear_fc1, model.visual.merger.linear_fc2
2026-04-24 07:30:01 INFO missing_tensors.py L236: Found 15 tensor(s) in the source checkpoint that are absent from the saved output (e.g., MTP parameters): mtp.fc, mtp.layers.0.input_layernorm, mtp.layers.0.mlp.down_proj, mtp.layers.0.mlp.gate_proj, mtp.layers.0.mlp.up_proj, mtp.layers.0.post_attention_layernorm, mtp.layers.0.self_attn.k_norm, mtp.layers.0.self_attn.k_proj, mtp.layers.0.self_attn.o_proj, mtp.layers.0.self_attn.q_norm, mtp.layers.0.self_attn.q_proj, mtp.layers.0.self_attn.v_proj, mtp.norm, mtp.pre_fc_norm_embedding, mtp.pre_fc_norm_hidden. Copying them now...
Loading missing tensors: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 2/2 [00:00<00:00, 447.89shard/s]
2026-04-24 07:30:01 INFO missing_tensors.py L644: Processing config.json to update quantization_config for missing tensors...
2026-04-24 07:30:01 INFO missing_tensors.py L614: Updated extra_config for 8 ignored layer(s): mtp.fc, mtp.layers.0.mlp.down_proj, mtp.layers.0.mlp.gate_proj, mtp.layers.0.mlp.up_proj, mtp.layers.0.self_attn.k_proj, mtp.layers.0.self_attn.o_proj, mtp.layers.0.self_attn.q_proj, mtp.layers.0.self_attn.v_proj
2026-04-24 07:30:02 INFO missing_tensors.py L370: Successfully wrote 15 missing tensor(s) to 'model_extra_tensors.safetensors' in ./Qwen3.6-27B-INT8-autoround.
2026-04-24 07:30:02 INFO device.py L1692: 'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}