Minachist committed on
Commit a31d4d8 · verified · 1 Parent(s): 55b36ba

Create log.txt

Files changed (1)
  1. log.txt +218 -0
log.txt ADDED
@@ -0,0 +1,218 @@
1
+ python quantize.py
2
+ Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
3
+ 2026-04-23 22:05:23 INFO autoround.py L178: using MLLM mode for multimodal model.
4
+ 2026-04-23 22:05:23 WARNING modeling_qwen3_5.py L411: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
5
+ Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1184/1184 [00:00<00:00, 16102.02it/s]
6
+ 2026-04-23 22:05:24 INFO base.py L517: using torch.bfloat16 for quantization tuning
7
+ 2026-04-23 22:05:24 WARNING formats.py L166: some layers are skipped quantization (shape not divisible by 32): model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_a, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_b, model.visual.blocks.[0-26].mlp.linear_fc1, model.visual.blocks.[0-26].mlp.linear_fc2
8
+ 2026-04-23 22:05:24 WARNING modeling_utils.py L4435: `loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
9
+ 2026-04-23 22:05:24 WARNING utils.py L444: Layer name or regex 'embed_tokens' in layer_config does not match any supported layers. Please check for typos or update the regex pattern, ignore it for now
10
+ 2026-04-23 22:05:24 INFO base.py L1818: start to cache block inputs
11
+ cache block inputs: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:00<00:00, 301.65it/s]
12
+ 2026-04-23 22:05:25 INFO base.py L1835: caching done
13
+ Quantizing model.language_model.layers.0: 0%| | 0/64 [00:00<?, ?it/s]2026-04-23 22:13:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
14
+ quantized 3/8 layers in the block, loss iter 0: 0.000000 -> iter 970: 0.000000,'peak_ram': 6.17GB, 'peak_vram': {'0': 17.11GB, '1': 2.78GB}
15
+ Quantizing model.language_model.layers.1: 2%|β–ˆβ–Œ | 1/64 [08:25<8:50:44, 505.47s/it]2026-04-23 22:22:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
16
+ quantized 3/8 layers in the block, loss iter 0: 0.000000 -> iter 927: 0.000000,'peak_ram': 7.35GB, 'peak_vram': {'0': 17.11GB, '1': 2.79GB}
17
+ Quantizing model.language_model.layers.2: 3%|β–ˆβ–ˆβ–ˆβ– | 2/64 [16:48<8:40:35, 503.80s/it]2026-04-23 22:30:30 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
18
+ quantized 3/8 layers in the block, loss iter 0: 0.000001 -> iter 683: 0.000001,'peak_ram': 8.53GB, 'peak_vram': {'0': 17.11GB, '1': 2.79GB}
19
+ Quantizing model.language_model.layers.3: 5%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 3/64 [25:10<8:31:46, 503.38s/it]/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/torch/autograd/graph.py:869: UserWarning: Flash Attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:124.)
20
+ return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
21
+ W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
22
+ W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
23
+ W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
24
+ W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
25
+ W0423 22:40:41.557000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
26
+ quantized 7/7 layers in the block, loss iter 0: 0.000002 -> iter 62: 0.000001,'peak_ram': 8.53GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
27
+ Quantizing model.language_model.layers.4: 6%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 4/64 [35:20<9:05:15, 545.26s/it]2026-04-23 22:49:03 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
28
+ quantized 3/8 layers in the block, loss iter 0: 0.000002 -> iter 502: 0.000001,'peak_ram': 9.08GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
29
+ Quantizing model.language_model.layers.5: 8%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 5/64 [43:43<8:41:09, 529.99s/it]2026-04-23 22:57:26 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
30
+ quantized 3/8 layers in the block, loss iter 0: 0.000002 -> iter 985: 0.000002,'peak_ram': 10.26GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
31
+ Quantizing model.language_model.layers.6: 9%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 6/64 [52:06<8:23:30, 520.87s/it]2026-04-23 23:05:49 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
32
+ quantized 3/8 layers in the block, loss iter 0: 0.000003 -> iter 625: 0.000002,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
33
+ Quantizing model.language_model.layers.7: 11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 7/64 [1:00:30<8:09:36, 515.38s/it]W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
34
+ W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
35
+ W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
36
+ W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
37
+ W0423 23:16:01.317000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
38
+ quantized 7/7 layers in the block, loss iter 0: 0.000006 -> iter 843: 0.000004,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
39
+ Quantizing model.language_model.layers.8: 12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 8/64 [1:10:40<8:29:00, 545.36s/it]2026-04-23 23:24:22 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
40
+ quantized 3/8 layers in the block, loss iter 0: 0.000006 -> iter 790: 0.000004,'peak_ram': 11.44GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
41
+ Quantizing model.language_model.layers.9: 14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9/64 [1:19:02<8:07:40, 532.01s/it]2026-04-23 23:32:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
42
+ quantized 3/8 layers in the block, loss iter 0: 0.000007 -> iter 997: 0.000005,'peak_ram': 11.88GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
43
+ Quantizing model.language_model.layers.10: 16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 10/64 [1:27:25<7:50:38, 522.93s/it]2026-04-23 23:41:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
44
+ quantized 3/8 layers in the block, loss iter 0: 0.000008 -> iter 872: 0.000006,'peak_ram': 13.05GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
45
+ Quantizing model.language_model.layers.11: 17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 11/64 [1:35:47<7:36:19, 516.59s/it]W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
46
+ W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
47
+ W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
48
+ W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
49
+ W0423 23:51:17.791000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
50
+ quantized 7/7 layers in the block, loss iter 0: 0.000012 -> iter 958: 0.000009,'peak_ram': 14.17GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
51
+ Quantizing model.language_model.layers.12: 19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 12/64 [1:45:56<7:52:04, 544.70s/it]2026-04-23 23:59:38 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
52
+ quantized 3/8 layers in the block, loss iter 0: 0.000013 -> iter 870: 0.000010,'peak_ram': 15.31GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
53
+ Quantizing model.language_model.layers.13: 20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 13/64 [1:54:18<7:31:59, 531.75s/it]2026-04-24 00:08:00 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
54
+ quantized 3/8 layers in the block, loss iter 0: 0.000013 -> iter 578: 0.000011,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
55
+ Quantizing model.language_model.layers.14: 22%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 14/64 [2:02:41<7:15:50, 523.01s/it]2026-04-24 00:16:23 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
56
+ quantized 3/8 layers in the block, loss iter 0: 0.000016 -> iter 389: 0.000013,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
57
+ Quantizing model.language_model.layers.15: 23%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 15/64 [2:11:03<7:01:56, 516.66s/it]W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
58
+ W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
59
+ W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
60
+ W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
61
+ W0424 00:26:33.444000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
62
+ quantized 7/7 layers in the block, loss iter 0: 0.000021 -> iter 770: 0.000016,'peak_ram': 16.43GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
63
+ Quantizing model.language_model.layers.16: 25%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 16/64 [2:21:12<7:15:33, 544.44s/it]2026-04-24 00:34:53 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
64
+ quantized 3/8 layers in the block, loss iter 0: 0.000021 -> iter 368: 0.000017,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
65
+ Quantizing model.language_model.layers.17: 27%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 17/64 [2:29:33<6:56:23, 531.56s/it]2026-04-24 00:43:15 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
66
+ quantized 3/8 layers in the block, loss iter 0: 0.000024 -> iter 899: 0.000018,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
67
+ Quantizing model.language_model.layers.18: 28%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 18/64 [2:37:56<6:40:47, 522.76s/it]2026-04-24 00:51:38 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
68
+ quantized 3/8 layers in the block, loss iter 0: 0.000031 -> iter 840: 0.000023,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
69
+ Quantizing model.language_model.layers.19: 30%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 19/64 [2:46:18<6:27:32, 516.72s/it]W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
70
+ W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
71
+ W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
72
+ W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
73
+ W0424 01:01:48.975000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
74
+ quantized 7/7 layers in the block, loss iter 0: 0.000041 -> iter 999: 0.000030,'peak_ram': 16.94GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
75
+ Quantizing model.language_model.layers.20: 31%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 20/64 [2:56:27<6:39:14, 544.43s/it]2026-04-24 01:10:09 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
76
+ quantized 3/8 layers in the block, loss iter 0: 0.000040 -> iter 576: 0.000032,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
77
+ Quantizing model.language_model.layers.21: 33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 21/64 [3:04:50<6:21:13, 531.94s/it]2026-04-24 01:18:32 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
78
+ quantized 3/8 layers in the block, loss iter 0: 0.000050 -> iter 804: 0.000035,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
79
+ Quantizing model.language_model.layers.22: 34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 22/64 [3:13:13<6:06:10, 523.11s/it]2026-04-24 01:26:55 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
80
+ quantized 3/8 layers in the block, loss iter 0: 0.000048 -> iter 501: 0.000040,'peak_ram': 17.87GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
81
+ Quantizing model.language_model.layers.23: 36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 23/64 [3:21:35<5:53:11, 516.86s/it]W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
82
+ W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
83
+ W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
84
+ W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
85
+ W0424 01:37:05.967000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
86
+ quantized 7/7 layers in the block, loss iter 0: 0.000067 -> iter 363: 0.000048,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
87
+ Quantizing model.language_model.layers.24: 38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 24/64 [3:31:44<6:03:05, 544.63s/it]2026-04-24 01:45:27 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
88
+ quantized 3/8 layers in the block, loss iter 0: 0.000064 -> iter 966: 0.000050,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
89
+ Quantizing model.language_model.layers.25: 39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 25/64 [3:40:07<5:45:52, 532.12s/it]2026-04-24 01:53:50 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
90
+ quantized 3/8 layers in the block, loss iter 0: 0.000069 -> iter 957: 0.000055,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
91
+ Quantizing model.language_model.layers.26: 41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 26/64 [3:48:30<5:31:24, 523.27s/it]2026-04-24 02:02:12 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
92
+ quantized 3/8 layers in the block, loss iter 0: 0.000078 -> iter 248: 0.000062,'peak_ram': 18.34GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
93
+ Quantizing model.language_model.layers.27: 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 27/64 [3:56:52<5:18:46, 516.94s/it]W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
94
+ W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
95
+ W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
96
+ W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
97
+ W0424 02:12:23.058000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
98
+ quantized 7/7 layers in the block, loss iter 0: 0.000095 -> iter 484: 0.000070,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.11GB, '1': 4.05GB}
99
+ Quantizing model.language_model.layers.28: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 28/64 [4:07:03<5:27:00, 545.01s/it]2026-04-24 02:20:45 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
100
+ quantized 3/8 layers in the block, loss iter 0: 0.000113 -> iter 991: 0.000078,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
101
+ Quantizing model.language_model.layers.29: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 29/64 [4:15:25<5:10:28, 532.26s/it]2026-04-24 02:29:07 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
102
+ quantized 3/8 layers in the block, loss iter 0: 0.000117 -> iter 446: 0.000084,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
103
+ Quantizing model.language_model.layers.30: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 30/64 [4:23:47<4:56:30, 523.26s/it]2026-04-24 02:37:30 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
104
+ quantized 3/8 layers in the block, loss iter 0: 0.000120 -> iter 975: 0.000094,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
105
+ Quantizing model.language_model.layers.31: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 31/64 [4:32:10<4:44:21, 517.00s/it]W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
106
+ W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
107
+ W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
108
+ W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
109
+ W0424 02:47:40.894000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
110
+ quantized 7/7 layers in the block, loss iter 0: 0.000152 -> iter 706: 0.000109,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
111
+ Quantizing model.language_model.layers.32: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 32/64 [4:42:19<4:50:32, 544.75s/it]2026-04-24 02:56:01 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
112
+ quantized 3/8 layers in the block, loss iter 0: 0.000153 -> iter 621: 0.000115,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
113
+ Quantizing model.language_model.layers.33: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 33/64 [4:50:41<4:34:47, 531.87s/it]2026-04-24 03:04:23 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
114
+ quantized 3/8 layers in the block, loss iter 0: 0.000135 -> iter 391: 0.000116,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
115
+ Quantizing model.language_model.layers.34: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 34/64 [4:59:05<4:21:43, 523.46s/it]2026-04-24 03:12:47 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
116
+ quantized 3/8 layers in the block, loss iter 0: 0.000155 -> iter 943: 0.000126,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
117
+ Quantizing model.language_model.layers.35: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 35/64 [5:07:27<4:09:52, 516.99s/it]W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
118
+ W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
119
+ W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
120
+ W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
121
+ W0424 03:22:57.328000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
122
+ quantized 7/7 layers in the block, loss iter 0: 0.000257 -> iter 761: 0.000163,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
123
+ Quantizing model.language_model.layers.36: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 36/64 [5:17:36<4:14:07, 544.56s/it]2026-04-24 03:31:17 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
124
+ quantized 3/8 layers in the block, loss iter 0: 0.000231 -> iter 785: 0.000165,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
125
+ Quantizing model.language_model.layers.37: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 37/64 [5:25:58<3:59:18, 531.78s/it]2026-04-24 03:39:39 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
126
+ quantized 3/8 layers in the block, loss iter 0: 0.000292 -> iter 135: 0.000173,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
127
+ Quantizing model.language_model.layers.38: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 38/64 [5:34:20<3:46:33, 522.82s/it]2026-04-24 03:48:01 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
128
+ quantized 3/8 layers in the block, loss iter 0: 0.000250 -> iter 786: 0.000179,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
129
+ Quantizing model.language_model.layers.39: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 39/64 [5:42:41<3:35:11, 516.44s/it]W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
130
+ W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
131
+ W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
132
+ W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
133
+ W0424 03:58:11.749000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
134
+ quantized 7/7 layers in the block, loss iter 0: 0.000275 -> iter 762: 0.000189,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
135
+ Quantizing model.language_model.layers.40: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 40/64 [5:52:50<3:37:40, 544.20s/it]2026-04-24 04:06:32 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
136
+ quantized 3/8 layers in the block, loss iter 0: 0.000418 -> iter 563: 0.000180,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.2GB, '1': 4.05GB}
137
+ Quantizing model.language_model.layers.41: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 41/64 [6:01:13<3:23:50, 531.77s/it]2026-04-24 04:14:55 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
138
+ quantized 3/8 layers in the block, loss iter 0: 0.000331 -> iter 979: 0.000189,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
139
+ Quantizing model.language_model.layers.42: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 42/64 [6:09:35<3:11:42, 522.83s/it]2026-04-24 04:23:16 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
140
+ quantized 3/8 layers in the block, loss iter 0: 0.000269 -> iter 976: 0.000193,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
141
+ Quantizing model.language_model.layers.43: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 43/64 [6:17:57<3:00:47, 516.53s/it]W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
142
+ W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
143
+ W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
144
+ W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
145
+ W0424 04:33:27.778000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
146
+ quantized 7/7 layers in the block, loss iter 0: 0.000655 -> iter 877: 0.000232,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
147
+ Quantizing model.language_model.layers.44: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 44/64 [6:28:06<3:01:28, 544.41s/it]2026-04-24 04:41:48 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
148
+ quantized 3/8 layers in the block, loss iter 0: 0.000602 -> iter 279: 0.000233,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.28GB, '1': 4.05GB}
149
+ Quantizing model.language_model.layers.45: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 45/64 [6:36:28<2:48:20, 531.59s/it]2026-04-24 04:50:10 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
150
+ quantized 3/8 layers in the block, loss iter 0: 0.000658 -> iter 295: 0.000253,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
151
+ Quantizing model.language_model.layers.46: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 46/64 [6:44:50<2:36:49, 522.73s/it]2026-04-24 04:58:31 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
152
+ quantized 3/8 layers in the block, loss iter 0: 0.000329 -> iter 307: 0.000277,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
153
+ Quantizing model.language_model.layers.47: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 47/64 [6:53:12<2:26:19, 516.42s/it]W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
154
+ W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
155
+ W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
156
+ W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
157
+ W0424 05:08:42.448000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
158
+ quantized 7/7 layers in the block, loss iter 0: 0.000436 -> iter 456: 0.000267,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
159
+ Quantizing model.language_model.layers.48: 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 48/64 [7:03:22<2:25:14, 544.65s/it]2026-04-24 05:17:04 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
160
+ quantized 3/8 layers in the block, loss iter 0: 0.000644 -> iter 952: 0.000323,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
161
+ Quantizing model.language_model.layers.49: 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 49/64 [7:11:44<2:12:57, 531.84s/it]2026-04-24 05:25:26 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
162
+ quantized 3/8 layers in the block, loss iter 0: 0.001545 -> iter 448: 0.000352,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
163
+ Quantizing model.language_model.layers.50: 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 50/64 [7:20:06<2:02:00, 522.89s/it]2026-04-24 05:33:48 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
164
+ quantized 3/8 layers in the block, loss iter 0: 0.001232 -> iter 939: 0.000404,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
165
+ Quantizing model.language_model.layers.51: 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 51/64 [7:28:28<1:51:56, 516.65s/it]W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
166
+ W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
167
+ W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
168
+ W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
169
+ W0424 05:43:58.587000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
170
+ quantized 7/7 layers in the block, loss iter 0: 0.001256 -> iter 400: 0.000540,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
171
+ Quantizing model.language_model.layers.52: 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 52/64 [7:38:37<1:48:51, 544.31s/it]2026-04-24 05:52:19 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
172
+ quantized 3/8 layers in the block, loss iter 0: 0.001144 -> iter 864: 0.000658,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
173
+ Quantizing model.language_model.layers.53: 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 53/64 [7:46:59<1:37:27, 531.61s/it]2026-04-24 06:00:40 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
174
+ quantized 3/8 layers in the block, loss iter 0: 0.001151 -> iter 691: 0.000810,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
175
+ Quantizing model.language_model.layers.54: 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 54/64 [7:55:20<1:27:05, 522.53s/it]2026-04-24 06:09:02 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
176
+ quantized 3/8 layers in the block, loss iter 0: 0.007836 -> iter 8: 0.001167,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
177
+ Quantizing model.language_model.layers.55: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 55/64 [8:03:43<1:17:29, 516.63s/it]W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
178
+ W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
179
+ W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
180
+ W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
181
+ W0424 06:19:14.204000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
182
+ quantized 7/7 layers in the block, loss iter 0: 0.003367 -> iter 941: 0.001292,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
183
+ Quantizing model.language_model.layers.56: 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 56/64 [8:13:53<1:12:35, 544.48s/it]2026-04-24 06:27:34 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
184
+ quantized 3/8 layers in the block, loss iter 0: 0.001866 -> iter 561: 0.001428,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
185
+ Quantizing model.language_model.layers.57: 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 57/64 [8:22:14<1:02:01, 531.59s/it]2026-04-24 06:35:56 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
186
+ quantized 3/8 layers in the block, loss iter 0: 0.002944 -> iter 28: 0.001646,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
187
+ Quantizing model.language_model.layers.58: 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 58/64 [8:30:37<52:17, 522.86s/it]2026-04-24 06:44:19 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
188
+ quantized 3/8 layers in the block, loss iter 0: 0.029795 -> iter 986: 0.002081,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
189
+ Quantizing model.language_model.layers.59: 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 59/64 [8:38:59<43:03, 516.67s/it]W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
190
+ W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
191
+ W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
192
+ W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
193
+ W0424 06:54:29.850000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
194
+ quantized 7/7 layers in the block, loss iter 0: 0.004233 -> iter 560: 0.002663,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
195
+ Quantizing model.language_model.layers.60: 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 60/64 [8:49:08<36:17, 544.49s/it]2026-04-24 07:02:50 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
196
+ quantized 3/8 layers in the block, loss iter 0: 0.005730 -> iter 702: 0.003024,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
197
+ Quantizing model.language_model.layers.61: 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 61/64 [8:57:30<26:35, 531.79s/it]2026-04-24 07:11:12 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
198
+ quantized 3/8 layers in the block, loss iter 0: 0.007457 -> iter 697: 0.003657,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
199
+ Quantizing model.language_model.layers.62: 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 62/64 [9:05:54<17:26, 523.34s/it]2026-04-24 07:19:37 INFO base.py L3187: Unquantized layers: ['linear_attn.out_proj', 'linear_attn.in_proj_qkv', 'linear_attn.in_proj_z', 'linear_attn.in_proj_b', 'linear_attn.in_proj_a']
200
+ quantized 3/8 layers in the block, loss iter 0: 0.013277 -> iter 66: 0.005018,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
201
+ Quantizing model.language_model.layers.63: 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 63/64 [9:14:17<08:37, 517.29s/it]W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] torch._dynamo hit config.recompile_limit (8)
202
+ W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] function: 'quant_tensor_sym' (/mnt/[redacted]/Qwen3.6-27B/venv/lib/python3.14/site-packages/auto_round/data_type/int.py:118)
203
+ W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] last reason: 0/7: tensor 'v' Tensor device index mismatch. Expected device index to be , actual
204
+ W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
205
+ W0424 07:29:48.607000 2005701 venv/lib/python3.14/site-packages/torch/_dynamo/convert_frame.py:1743] [0/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
206
+ quantized 7/7 layers in the block, loss iter 0: 0.021613 -> iter 737: 0.007835,'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
207
+ Quantizing done: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 64/64 [9:24:27<00:00, 529.18s/it]
208
+ 2026-04-24 07:29:53 INFO device.py L1692: 'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
209
+ 2026-04-24 07:30:01 INFO shard_writer.py L250: model has been saved to ./Qwen3.6-27B-INT8-autoround/
210
+ 2026-04-24 07:30:01 INFO base.py L1893: quantization tuning time 33875.81219172478
211
+ 2026-04-24 07:30:01 INFO base.py L1912: Summary: quantized 256/607 in the model, unquantized layers: lm_head, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_a, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_b, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_qkv, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.in_proj_z, model.language_model.layers.[0-2,4-6,8-10,12-14,16-18,20-22,24-26,28-30,32-34,36-38,40-42,44-46,48-50,52-54,56-58,60-62].linear_attn.out_proj, model.visual.blocks.[0-26].attn.proj, model.visual.blocks.[0-26].attn.qkv, model.visual.blocks.[0-26].mlp.linear_fc1, model.visual.blocks.[0-26].mlp.linear_fc2, model.visual.merger.linear_fc1, model.visual.merger.linear_fc2
212
+ 2026-04-24 07:30:01 INFO missing_tensors.py L236: Found 15 tensor(s) in the source checkpoint that are absent from the saved output (e.g., MTP parameters): mtp.fc, mtp.layers.0.input_layernorm, mtp.layers.0.mlp.down_proj, mtp.layers.0.mlp.gate_proj, mtp.layers.0.mlp.up_proj, mtp.layers.0.post_attention_layernorm, mtp.layers.0.self_attn.k_norm, mtp.layers.0.self_attn.k_proj, mtp.layers.0.self_attn.o_proj, mtp.layers.0.self_attn.q_norm, mtp.layers.0.self_attn.q_proj, mtp.layers.0.self_attn.v_proj, mtp.norm, mtp.pre_fc_norm_embedding, mtp.pre_fc_norm_hidden. Copying them now...
213
+
214
+ Loading missing tensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00, 447.89shard/s]
215
+ 2026-04-24 07:30:01 INFO missing_tensors.py L644: Processing config.json to update quantization_config for missing tensors...
216
+ 2026-04-24 07:30:01 INFO missing_tensors.py L614: Updated extra_config for 8 ignored layer(s): mtp.fc, mtp.layers.0.mlp.down_proj, mtp.layers.0.mlp.gate_proj, mtp.layers.0.mlp.up_proj, mtp.layers.0.self_attn.k_proj, mtp.layers.0.self_attn.o_proj, mtp.layers.0.self_attn.q_proj, mtp.layers.0.self_attn.v_proj
217
+ 2026-04-24 07:30:02 INFO missing_tensors.py L370: Successfully wrote 15 missing tensor(s) to 'model_extra_tensors.safetensors' in ./Qwen3.6-27B-INT8-autoround.
218
+ 2026-04-24 07:30:02 INFO device.py L1692: 'peak_ram': 19.06GB, 'peak_vram': {'0': 17.3GB, '1': 4.05GB}
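
For reference, below is a minimal sketch of what a quantize.py driving a run like the one logged above typically looks like, assuming the standard auto-round Python API. The actual script is not part of this log: the checkpoint path, calibration settings, and multimodal handling shown here are illustrative guesses based on the log output (INT8, 128 cached calibration samples, tuning iterations running up to ~1000 per block, auto-round save format), and the run itself used auto-round's MLLM mode for the vision parts, which the plain AutoRound class below does not capture. The unauthenticated-request warning at the top of the log can be avoided by exporting an HF_TOKEN environment variable before launching the run.

# sketch.py -- illustrative only; names and values below are assumptions, not the logged script
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Placeholder path: the real checkpoint location is redacted in the log.
model_name = "./Qwen3.6-27B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-bit symmetric weight quantization with AutoRound's block-wise tuning.
# iters/nsamples mirror what the log suggests (loss iterations up to ~1000
# per block, 128 cached block inputs); they are not confirmed settings.
autoround = AutoRound(
    model,
    tokenizer,
    bits=8,
    group_size=128,
    sym=True,
    iters=1000,
)
autoround.quantize()
# Matches the output directory reported by shard_writer.py in the log.
autoround.save_quantized("./Qwen3.6-27B-INT8-autoround", format="auto_round")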