Alex W. committed
Commit 0105df7 · 1 Parent(s): e1ce951

feat: add global debug switch and unified debug logging system

- Global configuration module (core/config.py)
  - DEBUG = False (default: silent)
  - A single line toggles all debug output across the entire codebase

- Unified debug output utilities (core/debug.py)
  - dlog(lines, msg): appends "[DEBUG] msg" to the UI log list (for metrics/analyze)
  - dprint(msg): prints to stdout (for fetcher, which has no access to lines)
  - Both functions emit nothing when DEBUG=False; the message arguments are still evaluated at the call site, so the overhead is small rather than strictly zero

- core/fetcher.py: replaced all print() with dprint()
  - Debug info covered:
    - tensor_name, shape, dtype
    - data_offsets (raw and absolute)
    - expected_bytes vs actual_bytes check (✅/❌)
    - first 8 bytes as hex (for cross-validation with a local file reader)
    - result[0,:5] (first-row sanity check)
  - All controlled by the DEBUG switch; no debug output in production

- core/metrics.py: replaced all lines.append("[DEBUG]...") with dlog(lines, ...)
  - Debug info covered:
    - key_q / key_k / key_v (full key names)
    - W_q / W_k / W_v shapes
    - n_q / n_kv / group / d_head / head_dim_source
    - W_k[0,:10] / W_q[0,:10] raw weights (for cross-validation)
    - Per KV head: k_t shape, s_k[:5], k_t[0,:10]
    - Per Q head: q_t shape, s_q[:5], q_t[0,:10]
    - Per Q head: pearson, alpha_QK, s_q[0], s_k[0]

- ui/tab_analyze.py: added dlog() for shard/key/offset info before tensor loading
  - Debug info covered:
    - q/k/v shard filename
    - q/k/v full key name
    - k_header_size
    - k_offsets (raw data_offsets)
    - k_abs_start (= 8 + header_size + offset, the actual HTTP Range start)

During cross-validation of gemma-4-31b-it against a reference implementation:

Reference code result: K head0 sigma_max = 393.07 (wrong)
Our result: K head0 sigma_max = 4.40 (correct)

Root cause found via debug output:
Reference code bug in load_tensor_from_file():
  f.seek(start)                   # ❌ offset relative to data section
  f.seek(8 + header_len + start)  # ✅ correct absolute file offset

gemma-4-31b-it header_size ≈ 136KB
→ seek error = 136KB = ~13 rows of BF16 data
→ KV head0 first row completely wrong
→ sigma_max inflated from 4.40 to 393.07

Smaller models (gemma-4-e2b, Qwen2.5, LLaMA-3) not affected because
their early tensor offsets start near 0, masking the seek error.
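
The offset rule behind this bug can be illustrated with a minimal sketch. It assumes the standard safetensors layout (an 8-byte little-endian header length, then the JSON header, then the data section) and uses hypothetical helper names, `read_tensor_bytes` and `build_demo_file`, that are not taken from this repository or from the reference code.

```python
import io
import json
import struct


def read_tensor_bytes(f, name):
    """Return the raw bytes of tensor `name` from an open safetensors file.

    Hypothetical helper for illustration; not taken from the repository.
    """
    f.seek(0)
    (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
    header = json.loads(f.read(header_len))         # JSON header
    start, end = header[name]["data_offsets"]       # relative to the data section
    f.seek(8 + header_len + start)                  # absolute file offset
    return f.read(end - start)


def build_demo_file():
    """Build a tiny in-memory file with two 4-byte tensors (demo data only)."""
    header = {
        "a": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]},
        "b": {"dtype": "U8", "shape": [4], "data_offsets": [4, 8]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    data = bytes([1, 2, 3, 4, 5, 6, 7, 8])
    return io.BytesIO(struct.pack("<Q", len(header_bytes)) + header_bytes + data)


demo = build_demo_file()
print(read_tensor_bytes(demo, "b"))  # b'\x05\x06\x07\x08'
```

Seeking to `start` alone would land inside the length prefix or the JSON header whenever `start` is smaller than `8 + header_len`, which is why the error is easy to miss on small demo files and on tensors near the start of the data section.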

Our HTTP Range Request implementation was correct throughout:
  abs_start = 8 + header_size + offsets[0] ✅
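
As a rough sketch of how that absolute start maps onto an HTTP Range request: the helper name `range_header` is hypothetical, and the only facts it relies on are the `8 + header_size + offset` rule above and the inclusive end position of HTTP byte ranges.

```python
def range_header(header_size: int, data_offsets: tuple[int, int]) -> dict[str, str]:
    """Build a Range header for one tensor (hypothetical helper, for illustration).

    data_offsets are [start, end) relative to the data section, as stored in
    the safetensors header; HTTP byte ranges use an inclusive end position.
    """
    start, end = data_offsets
    abs_start = 8 + header_size + start
    abs_end = 8 + header_size + end - 1  # inclusive last byte
    return {"Range": f"bytes={abs_start}-{abs_end}"}


print(range_header(100, (0, 16)))  # {'Range': 'bytes=108-123'}
```

The example requests bytes 108 through 123, which is exactly 16 bytes, matching `end - start`.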

# Enable debug (cross-validation, new model investigation):
# core/config.py
DEBUG = True

# Disable debug (production):
DEBUG = False

A one-line change toggles all debug output across fetcher/metrics/analyze
together.
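
One caveat worth illustrating: `from core.config import DEBUG` copies the value at import time, so changing `core.config.DEBUG` inside a running process would not reach a module that imported the name this way. Editing the file and restarting works as described. The sketch below uses a stand-in `config` object (an assumption, in place of the real `core.config` module) and hypothetical function names to contrast the two behaviours.

```python
from types import SimpleNamespace

# Stand-in for the core.config module (assumption, keeps the demo self-contained).
config = SimpleNamespace(DEBUG=False)

# What `from core.config import DEBUG` effectively captures at import time.
DEBUG_SNAPSHOT = config.DEBUG


def dlog_snapshot(lines, msg):
    """Checks the value captured at import time."""
    if DEBUG_SNAPSHOT:
        lines.append(f"[DEBUG] {msg}\n")


def dlog_live(lines, msg):
    """Looks the flag up on the config object at call time."""
    if config.DEBUG:
        lines.append(f"[DEBUG] {msg}\n")


log = []
config.DEBUG = True              # flipped after "import"
dlog_snapshot(log, "ignored")    # still sees the old False
dlog_live(log, "recorded")       # sees the new True
print(log)  # ['[DEBUG] recorded\n']
```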

Files changed (5)
  1. core/config.py +21 -0
  2. core/debug.py +26 -0
  3. core/fetcher.py +5 -3
  4. core/metrics.py +28 -23
  5. ui/tab_analyze.py +11 -0
core/config.py ADDED
@@ -0,0 +1,21 @@
+ # core/config.py
+ """
+ Global configuration switches
+ """
+
+ # ─────────────────────────────────────────────
+ # Debug switch
+ #   True  → print detailed debug info to the log
+ #   False → run silently, output results only
+ # ─────────────────────────────────────────────
+ DEBUG = False  # core/config.py
+ """
+ Global configuration switches
+ """
+
+ # ─────────────────────────────────────────────
+ # Debug switch
+ #   True  → print detailed debug info to the log
+ #   False → run silently, output results only
+ # ─────────────────────────────────────────────
+ DEBUG = False
core/debug.py ADDED
@@ -0,0 +1,26 @@
+ # core/debug.py
+ """
+ Debug output utilities.
+ All debug output goes through here and is controlled by the DEBUG switch.
+ """
+
+ from core.config import DEBUG
+
+
+ def dlog(lines: list[str], msg: str):
+     """
+     Append a debug message to lines (only when DEBUG=True).
+     lines: list of log lines (passed by reference, appended in place)
+     msg:   debug message string
+     """
+     if DEBUG:
+         lines.append(f"[DEBUG] {msg}\n")
+
+
+ def dprint(msg: str):
+     """
+     Print to stdout (only when DEBUG=True).
+     For places such as fetcher.py that have no access to lines.
+     """
+     if DEBUG:
+         print(f"[DEBUG] {msg}")
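
Because `dlog(lines, f"...")` builds its f-string before the call, expressions such as `.tolist()` still run when DEBUG is False. If that cost ever matters, one possible variant (a sketch only, with the hypothetical name `dlog_lazy` and a stand-in `DEBUG` constant) accepts a callable and invokes it only when debugging is enabled.

```python
DEBUG = False  # stand-in for core.config.DEBUG (assumption for the demo)


def dlog_lazy(lines, make_msg):
    """Append a debug line, building the message only when DEBUG is on."""
    if DEBUG:
        lines.append(f"[DEBUG] {make_msg()}\n")


calls = []


def expensive_message():
    """Records that it ran, standing in for a costly .tolist() conversion."""
    calls.append(1)
    return "W_k[0,:10] = ..."


log = []
dlog_lazy(log, expensive_message)
print(len(calls), log)  # 0 []
```

At the call sites this would look like `dlog_lazy(lines, lambda: f"W_k[0,:10] = {W_k[0, :10].tolist()}")`, at the price of slightly noisier code.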
core/fetcher.py CHANGED
@@ -9,6 +9,8 @@ import json
  import requests
  import torch
  from huggingface_hub import list_repo_files
+ from core.debug import dprint
+

  # ─────────────────────────────────────────────
  # dtype mapping
@@ -102,7 +104,7 @@ def load_tensor_remote(
      expected_elems = 1
      for d in shape:
          expected_elems *= d
-     print(
+     dprint(
          f"[FETCH] {tensor_name}\n"
          f" shape={shape} dtype={dtype_str}\n"
          f" data_offsets={offsets}\n"
@@ -123,7 +125,7 @@ def load_tensor_remote(

      # ── Debug: print the number of bytes actually received ────────
      actual_bytes = len(r.content)
-     print(
+     dprint(
          f" actual_bytes={actual_bytes} "
          f"{'✅' if actual_bytes == expected_bytes else '❌ byte count mismatch!'}\n"
          f" first 8 bytes (hex)={r.content[:8].hex()}\n"
@@ -139,7 +141,7 @@ def load_tensor_remote(
      result = tensor.reshape(shape).float()

      # ── Debug: print the first row of the result ──────────────────
-     print(f" result[0,:5]={result[0,:5].tolist()}\n")
+     dprint(f" result[0,:5]={result[0,:5].tolist()}\n")

      return result
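
The expected_bytes vs actual_bytes check can also be reproduced offline from a header entry alone. The following is a minimal sketch, not the repository's code: `DTYPE_SIZES`, `expected_nbytes`, and `entry_is_consistent` are hypothetical names, and the table lists the element sizes of the common safetensors dtype strings.

```python
from math import prod

# Bytes per element for common safetensors dtype strings.
DTYPE_SIZES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
}


def expected_nbytes(shape, dtype_str):
    """Number of bytes a tensor of this shape and dtype should occupy."""
    return prod(shape) * DTYPE_SIZES[dtype_str]


def entry_is_consistent(entry):
    """Check that data_offsets span exactly the expected number of bytes."""
    start, end = entry["data_offsets"]
    return end - start == expected_nbytes(entry["shape"], entry["dtype"])


entry = {"dtype": "BF16", "shape": [4, 8], "data_offsets": [128, 192]}
print(expected_nbytes(entry["shape"], entry["dtype"]))  # 64
print(entry_is_consistent(entry))                       # True
```

A mismatch here points at a wrong shape, dtype, or offset pair before any bytes are downloaded; a mismatch only in the received byte count points at the Range request instead.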
 
core/metrics.py CHANGED
@@ -3,6 +3,7 @@ import torch
  import numpy as np
  from scipy.stats import spearmanr
  from core.layer_profile import LayerProfile
+ from core.debug import dlog


  def pearson(a: torch.Tensor, b: torch.Tensor) -> float:
@@ -83,17 +84,16 @@ def analyze_layer(
      lines: list[str] = []

      # ── Debug: print overall info + first row of raw weights ──────
-     lines.append(
-         f"\n[DEBUG] ═══════════════════════════════\n"
-         f"[DEBUG] key_q = {profile.q.key}\n"
-         f"[DEBUG] key_k = {profile.k.key}\n"
-         f"[DEBUG] key_v = {profile.v.key if profile.v else 'K=V shared'}\n"
-         f"[DEBUG] W_q={list(W_q.shape)} W_k={list(W_k.shape)} W_v={list(W_v.shape)}\n"
-         f"[DEBUG] n_q={n_q} n_kv={n_kv} group={group} d_head={d_head}\n"
-         f"[DEBUG] W_k[0, :10] = {W_k[0, :10].tolist()}\n"
-         f"[DEBUG] W_q[0, :10] = {W_q[0, :10].tolist()}\n"
-         f"[DEBUG] ═══════════════════════════════\n"
-     )
+     # ── Debug: overall info ───────────────────────
+     dlog(lines, f"═══════════════════════════════")
+     dlog(lines, f"key_q = {profile.q.key}")
+     dlog(lines, f"key_k = {profile.k.key}")
+     dlog(lines, f"key_v = {profile.v.key if profile.v else 'K=V shared'}")
+     dlog(lines, f"W_q={list(W_q.shape)} W_k={list(W_k.shape)} W_v={list(W_v.shape)}")
+     dlog(lines, f"n_q={n_q} n_kv={n_kv} group={group} d_head={d_head} source={profile.head_dim_source}")
+     dlog(lines, f"W_k[0,:10] = {W_k[0, :10].tolist()}")
+     dlog(lines, f"W_q[0,:10] = {W_q[0, :10].tolist()}")
+     dlog(lines, f"═══════════════════════════════")

      kv_tag = " [K=V shared]" if kv_shared else ""
      lines.append(
@@ -121,13 +121,12 @@ def analyze_layer(
          smxv, smnv, cond_v = sigma_stats(s_v)

          # ── Debug: first row of raw weights of the KV head slice ──
-         lines.append(
-             f"[DEBUG] KV head {kv_h}: "
-             f"k_t={list(k_t.shape)} "
-             f"s_k[:5]={[round(x,4) for x in s_k[:5].tolist()]}\n"
-             f"[DEBUG] KV head {kv_h}: "
-             f"k_t[0,:10]={k_t[0, :10].tolist()}\n"
+         # ── Debug: KV head ──────────────────────
+         dlog(lines,
+             f"KV head {kv_h}: k_t={list(k_t.shape)} "
+             f"s_k[:5]={[round(x,4) for x in s_k[:5].tolist()]}"
          )
+         dlog(lines, f"KV head {kv_h}: k_t[0,:10]={k_t[0, :10].tolist()}")

          # KV metrics
          if kv_shared:
@@ -153,13 +152,12 @@ def analyze_layer(
              smxq, smnq, cond_q = sigma_stats(s_q)

              # ── Debug: first row of raw weights of the Q head slice ──
-             lines.append(
-                 f"[DEBUG] Q head {h}: "
-                 f"q_t={list(q_t.shape)} "
-                 f"s_q[:5]={[round(x,4) for x in s_q[:5].tolist()]}\n"
-                 f"[DEBUG] Q head {h}: "
-                 f"q_t[0,:10]={q_t[0, :10].tolist()}\n"
+             # ── Debug: Q head ───────────────────
+             dlog(lines,
+                 f" Q head {h}: q_t={list(q_t.shape)} "
+                 f"s_q[:5]={[round(x,4) for x in s_q[:5].tolist()]}"
              )
+             dlog(lines, f" Q head {h}: q_t[0,:10]={q_t[0, :10].tolist()}")

              nqk = min(len(s_q), len(s_k))
              nqv = min(len(s_q), len(s_v))
@@ -177,6 +175,13 @@ def analyze_layer(
              cU_QV = cos_U(U_q, U_v)
              cV_QV = cos_V(Vt_q, Vt_v)

+             # ── Debug: key metrics ───────────────
+             dlog(lines,
+                 f" Q head {h}: pearson={pqk:+.4f} "
+                 f"alpha_QK={a_qk:.4f} "
+                 f"s_q[0]={s_q[0]:.4f} s_k[0]={s_k[0]:.4f}"
+             )
+
              records.append({
                  "prefix": profile.prefix,
                  "layer": profile.layer_idx,
ui/tab_analyze.py CHANGED
@@ -11,6 +11,7 @@ import gradio as gr
  import requests
  import pandas as pd
  import numpy as np
+ from core.debug import dlog

  from core.fetcher import (
      load_all_shard_headers,
@@ -175,6 +176,16 @@ def run_analysis(
          q_hdr, q_hs = all_headers[prof.q.shard]
          k_hdr, k_hs = all_headers[prof.k.shard]

+         dlog(log,
+             f"Layer {idx}:\n"
+             f" q: {prof.q.shard} → {prof.q.key}\n"
+             f" k: {prof.k.shard} → {prof.k.key}\n"
+             f" v: {prof.v.shard + ' → ' + prof.v.key if prof.v else 'K=V shared'}\n"
+             f" k_header_size={k_hs}\n"
+             f" k_offsets={k_hdr[prof.k.key]['data_offsets']}\n"
+             f" k_abs_start={8 + k_hs + k_hdr[prof.k.key]['data_offsets'][0]}"
+         )
+
          W_q = load_tensor_remote(q_url, prof.q.key, q_hdr, q_hs, token)
          W_k = load_tensor_remote(k_url, prof.k.key, k_hdr, k_hs, token)