Ashira Pitchayapakayakul committed on
Commit e3077e1 · 1 Parent(s): 9a0b1b3

v18-safe-defaults: flip SUR_LORA_INIT=loftq + DISABLE_AL=1 as defaults

V#7 burned 9 of 9.1 wall-clock hours on the AL pre-filter, scoring 20K samples
at 1.6 s/sample BEFORE training even began, then crashed on the PiSSA + 4-bit
quantization incompatibility ('Please initialize PiSSA under float32, float16, or bfloat16').
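
As a rough check on those numbers, the pre-filter cost is simply sample count times per-sample latency. The sketch below is illustrative only: `prefilter_hours` is a hypothetical helper, not a function in bin/kaggle-trainer.sh. It compares the 30-60 ms/sample estimate in the script's comment with the 1.6 s/sample reported for V#7.

```python
def prefilter_hours(n_samples: int, seconds_per_sample: float) -> float:
    """Wall-clock hours for one forward pass per scored sample.

    Hypothetical helper for illustration; not part of the trainer script.
    """
    return n_samples * seconds_per_sample / 3600.0


if __name__ == "__main__":
    cap = 20_000  # AL_SAMPLE_CAP default from the script
    scenarios = [
        ("comment estimate, low", 0.030),   # 30 ms/sample
        ("comment estimate, high", 0.060),  # 60 ms/sample
        ("reported for V#7", 1.6),          # 1.6 s/sample
    ]
    for label, latency in scenarios:
        print(f"{label:>24}: {prefilter_hours(cap, latency):.2f} h")
```

At 1.6 s/sample the 20K cap works out to roughly 8.9 hours, consistent with the 9 of 9.1 hours reported above, versus 10 to 20 minutes under the original estimate.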

Both failure modes are now avoided WITHOUT requiring the user to remember
to set env vars:

- SUR_LORA_INIT default: pissa_niter_4 → loftq (4-bit-safe)
- DISABLE_AL default: 0 → 1 (skip 9h pre-filter)

Override only if the base loads in fp16/bf16 (e.g. SUR_LORA_INIT=pissa_niter_4
when not running 4-bit), or once a stable adapter exists and we want the AL
teachable-zone filter back on for a refinement run (DISABLE_AL=0).
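
The resulting behaviour can be sketched as follows. The two `get` expressions mirror the changed lines in the script; `resolve_defaults` and its `env` argument are hypothetical names used only for illustration. One consequence of the `== "1"` comparison is worth noting: any DISABLE_AL value other than the literal string "1" (for example "true") leaves the AL filter enabled.

```python
import os
from typing import Mapping, Tuple


def resolve_defaults(env: Mapping[str, str] = os.environ) -> Tuple[str, bool]:
    """Return (lora_init, disable_al) using the V18 defaults.

    Hypothetical wrapper; the script reads os.environ inline instead.
    """
    lora_init = env.get("SUR_LORA_INIT", "loftq")   # was "pissa_niter_4"
    disable_al = env.get("DISABLE_AL", "1") == "1"  # was "0"
    return lora_init, disable_al


if __name__ == "__main__":
    # No env vars set: safe defaults apply.
    print(resolve_defaults({}))
    # Explicit opt-in for an fp16/bf16 base with the AL filter back on.
    print(resolve_defaults({"SUR_LORA_INIT": "pissa_niter_4", "DISABLE_AL": "0"}))
```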

Files changed (1)
  1. bin/kaggle-trainer.sh +10 -2
bin/kaggle-trainer.sh CHANGED
@@ -1026,7 +1026,11 @@ if USE_LIGER:
  # Cost: 1 fwd pass per scored sample, ~30-60 ms each on T4 7B 4-bit.
  # AL_SAMPLE_CAP=20000 → ~10-20 min budget. Skip with DISABLE_AL=1 or if
  # raw is below the floor (5000 rows — not enough signal to bother).
- DISABLE_AL = os.environ.get("DISABLE_AL", "0") == "1"
+ # V18 default flipped to disabled — V#7 spent 9 of 9.1 wall-clock hours
+ # scoring 20K samples at 1.6s/sample BEFORE training even started. With
+ # Kaggle's 30h/week budget, AL filter is an unsustainable upfront cost.
+ # Re-enable explicitly with DISABLE_AL=0 once a stable adapter exists.
+ DISABLE_AL = os.environ.get("DISABLE_AL", "1") == "1"
  AL_SAMPLE_CAP = int(os.environ.get("AL_SAMPLE_CAP", "20000"))

  if DISABLE_AL or len(raw) < 5000:
@@ -1111,7 +1115,11 @@ lora_kwargs = dict(
  # task-aware SVD with quant-awareness in one shot. peft
  # ≥0.13 with CordaConfig. Falls back to pissa if missing.
  # gaussian — Kaiming default (ablation baseline)
- LORA_INIT = os.environ.get("SUR_LORA_INIT", "pissa_niter_4")
+ # V18 default flipped to "loftq" — PiSSA + 4-bit BitsAndBytes crashed V#7 at
+ # 9.1h with `Please initialize PiSSA under float32, float16, or bfloat16`.
+ # LoftQ is the safe 4-bit-aware path. Override with SUR_LORA_INIT=pissa_niter_4
+ # only if the base model is loaded in fp16/bf16 (no 4-bit quant).
+ LORA_INIT = os.environ.get("SUR_LORA_INIT", "loftq")
  try:
  from peft import LoraConfig as _Probe
  import inspect
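
Because the PiSSA failure only surfaced about nine hours into V#7, one option is a pre-flight check that rejects the incompatible combination before any expensive work starts. This is a sketch under assumptions, not part of this commit: `validate_lora_init` and the `load_in_4bit` flag are hypothetical names, and the check treats any init name starting with `pissa` as PiSSA.

```python
def validate_lora_init(lora_init: str, load_in_4bit: bool) -> str:
    """Fail fast if a PiSSA init is requested for a 4-bit base model.

    Hypothetical guard for illustration; returns lora_init unchanged
    when the combination is acceptable.
    """
    if load_in_4bit and lora_init.startswith("pissa"):
        raise ValueError(
            f"SUR_LORA_INIT={lora_init!r} requires an fp32/fp16/bf16 base; "
            "use 'loftq' with 4-bit quantization or load the base unquantized."
        )
    return lora_init


if __name__ == "__main__":
    print(validate_lora_init("loftq", load_in_4bit=True))
    print(validate_lora_init("pissa_niter_4", load_in_4bit=False))
    try:
        validate_lora_init("pissa_niter_4", load_in_4bit=True)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Raising rather than silently falling back keeps an explicit override visible to whoever set it; a fallback to `loftq` would be an equally reasonable design if unattended runs matter more.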