Ashira Pitchayapakayakul committed on
Commit · e3077e1
Parent(s): 9a0b1b3
v18-safe-defaults: flip SUR_LORA_INIT=loftq + DISABLE_AL=1 as defaults
V#7 burned 9 of 9.1 wall-clock hours on AL pre-filter scoring 20K samples
at 1.6s/sample BEFORE training even began, then crashed with PiSSA + 4-bit
quantization incompatibility ('Please initialize PiSSA under fp32/fp16/bf16').
Both failure modes are now avoided WITHOUT requiring the user to remember
to set env vars:
- SUR_LORA_INIT default: pissa_niter_4 → loftq (4-bit-safe)
- DISABLE_AL default: 0 → 1 (skip 9h pre-filter)
Override only if the base loads in fp16/bf16 (e.g. SUR_LORA_INIT=pissa_niter_4
when not running 4-bit), or once a stable adapter exists and we want the AL
teachable-zone filter back on for a refinement run (DISABLE_AL=0).
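
The pre-filter cost quoted above can be checked with a short calculation. This is a minimal sketch: the helper name al_prefilter_hours is hypothetical, and the inputs are only the figures stated in this commit (20K samples, 1.6 s/sample observed, 30-60 ms/sample expected by the older code comment, 30 h/week Kaggle budget).

```python
def al_prefilter_hours(n_samples: int, seconds_per_sample: float) -> float:
    """Wall-clock hours needed to score n_samples one after another."""
    return n_samples * seconds_per_sample / 3600.0


# Figures taken from the commit message and the code comments in the diff.
AL_SAMPLE_CAP = 20_000
WEEKLY_BUDGET_HOURS = 30.0

observed = al_prefilter_hours(AL_SAMPLE_CAP, 1.6)        # rate seen in V#7
expected_low = al_prefilter_hours(AL_SAMPLE_CAP, 0.030)  # 30 ms/sample estimate
expected_high = al_prefilter_hours(AL_SAMPLE_CAP, 0.060) # 60 ms/sample estimate

print(f"observed: {observed:.1f} h "
      f"({observed / WEEKLY_BUDGET_HOURS:.0%} of the weekly budget)")
print(f"expected: {expected_low * 60:.0f}-{expected_high * 60:.0f} min")
```

At the observed rate the scoring pass alone takes roughly 8.9 hours, which matches the "9 of 9.1 hours" figure, while the 30-60 ms estimate would have given the 10-20 minute budget mentioned in the existing comment.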
- bin/kaggle-trainer.sh +10 -2
bin/kaggle-trainer.sh
@@ -1026,7 +1026,11 @@ if USE_LIGER:
 # Cost: 1 fwd pass per scored sample, ~30-60 ms each on T4 7B 4-bit.
 # AL_SAMPLE_CAP=20000 → ~10-20 min budget. Skip with DISABLE_AL=1 or if
 # raw is below the floor (5000 rows - not enough signal to bother).
-
+# V18 default flipped to disabled - V#7 spent 9 of 9.1 wall-clock hours
+# scoring 20K samples at 1.6s/sample BEFORE training even started. With
+# Kaggle's 30h/week budget, AL filter is an unsustainable upfront cost.
+# Re-enable explicitly with DISABLE_AL=0 once a stable adapter exists.
+DISABLE_AL = os.environ.get("DISABLE_AL", "1") == "1"
 AL_SAMPLE_CAP = int(os.environ.get("AL_SAMPLE_CAP", "20000"))
 
 if DISABLE_AL or len(raw) < 5000:
@@ -1111,7 +1115,11 @@ lora_kwargs = dict(
 # task-aware SVD with quant-awareness in one shot. peft
 # ≥0.13 with CordaConfig. Falls back to pissa if missing.
 # gaussian - Kaiming default (ablation baseline)
-
+# V18 default flipped to "loftq" - PiSSA + 4-bit BitsAndBytes crashed V#7 at
+# 9.1h with `Please initialize PiSSA under float32, float16, or bfloat16`.
+# LoftQ is the safe 4-bit-aware path. Override with SUR_LORA_INIT=pissa_niter_4
+# only if the base model is loaded in fp16/bf16 (no 4-bit quant).
+LORA_INIT = os.environ.get("SUR_LORA_INIT", "loftq")
 try:
     from peft import LoraConfig as _Probe
     import inspect
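
The two new defaults can be exercised in isolation with the sketch below. read_flags mirrors the two os.environ.get lines added in this commit. resolve_lora_init is a hypothetical extra guard that is NOT part of the commit: it assumes the caller knows whether the base model is loaded in 4-bit, and it downgrades any PiSSA variant to loftq in that case instead of letting the run crash late.

```python
import os
from typing import Mapping, Tuple


def read_flags(env: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Read the two settings with the defaults introduced in this commit."""
    disable_al = env.get("DISABLE_AL", "1") == "1"   # AL pre-filter off by default
    lora_init = env.get("SUR_LORA_INIT", "loftq")    # 4-bit-safe init by default
    return disable_al, lora_init


def resolve_lora_init(requested: str, load_in_4bit: bool) -> str:
    """Hypothetical guard (assumption, not in the commit).

    PiSSA initialisation was reported to fail on a 4-bit base model,
    so fall back to loftq whenever both conditions hold.
    """
    if load_in_4bit and requested.startswith("pissa"):
        return "loftq"
    return requested


if __name__ == "__main__":
    disable_al, lora_init = read_flags()
    print("DISABLE_AL:", disable_al)
    print("LORA_INIT:", resolve_lora_init(lora_init, load_in_4bit=True))
```

Note that the comparison is against the literal string "1": any other value of DISABLE_AL, including "true" or "yes", re-enables the AL pre-filter.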