Ashira Pitchayapakayakul committed on
Commit dddf626 · 1 Parent(s): 1e1e228

feat(v7): Spectrum-lite + Magpie + active-learning + full swap-and-bench chain


kaggle-trainer.sh V7 adds three SFT-feasible techniques on top of V6:
• Spectrum-lite freezing — LoRA applied only to the top 70% of transformer
  layers via layers_to_transform=[n-N..n], skipping the bottom 30% (a proxy
  for SNR-based Spectrum; on the 28-layer Qwen2.5-Coder-7B default this
  trains layers 9-27 and leaves 0-8 frozen). Saves activation memory and
  sometimes lifts quality.
• Magpie self-instruct merge — pulls up to MAGPIE_TAKE=10000 pairs from
  axentx/surrogate-1-synth-magpie and appends them to the training rows.
  Skips cleanly with a printed warning if the repo isn't published yet.
• Active-learning teachable filter — scores up to AL_SAMPLE_CAP=20000
  rows with 4-bit base-model perplexity AFTER model load (correct flow
  order: Magpie merge → Dataset.from_list → tokenizer → model load →
  AL filter → LoRA wrap), keeps the middle 50% by perplexity. DISABLE_AL=1
  to skip; auto-skipped if the dataset has <5K rows.

Knobs: SPECTRUM_TOP_FRACTION (default 0.70), MAGPIE_TAKE (10000),
DISABLE_AL (0/1), AL_SAMPLE_CAP (20000). Defaults are conservative to
fit the existing T4×2 ~8 hr Kaggle budget.
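
All four knobs are plain environment variables, so a full V7 run with the
defaults spelled out looks roughly like this (illustrative invocation only;
the variable names come from the script, the surrounding setup is assumed):

  SPECTRUM_TOP_FRACTION=0.70 MAGPIE_TAKE=10000 \
  DISABLE_AL=0 AL_SAMPLE_CAP=20000 \
  bash bin/kaggle-trainer.sh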

swap-zerogpu-lora.sh (new) — swaps LORA_REPO env on the two PRO ZeroGPU
Spaces (ashirato + surrogate1) via /api/spaces/{repo}/variables, then
factory_reboot. Default mode = A/B split: ashirato keeps OLD_LORA so the
3-way bench has both v1 and v1.1-extended endpoints live at the same
time. SWAP_BOTH=1 or ONLY=ashirato/surrogate1 for other modes.
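
After the default A/B swap, a quick sanity check is to confirm both endpoints
are live (a sketch using the same Spaces API the scripts poll; space names
are the script defaults):

  for space in ashirato/surrogate-1-zero-gpu surrogate1/surrogate-1-zero-gpu; do
    echo "$space: $(curl -s https://huggingface.co/api/spaces/$space | jq -r '.runtime.stage')"
  done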

auto-swap-and-bench.sh (new) — supersedes auto-bench-watcher.sh with the
full post-training chain: poll HF Hub → swap-zerogpu-lora.sh → wait for
Space stage=RUNNING (≤12 min) → smoke-test endpoint → bench-v1-vs-v15.sh
→ post-bench-decide.sh. Idempotent via marker file. Now running as a
background daemon — old watcher killed.
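
Because the daemon logs everything to one file and is idempotent via its
marker, progress can be checked without touching the chain itself (paths are
the script defaults; commands are a sketch, not part of this commit):

  tail -f "$HOME/.surrogate/logs/auto-swap-and-bench.log"
  pgrep -af auto-swap-and-bench.sh                           # daemon still alive?
  ls "$HOME/.surrogate/state/" | grep auto-swap-and-bench    # already fired?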

bin/kaggle-trainer.sh CHANGED
@@ -319,7 +319,30 @@ while iterators and len(rows) < MAX_SAMPLES:
 print(f" → kept {len(rows):,} samples (target {MAX_SAMPLES:,}, "
       f"seen={n_seen:,}, drop={n_drop:,})")
 print(f" per-source counts: {n_per_source}")
+
+# ── EXTENDED++ V7: Magpie self-instruct pair inclusion ──────────────────────
+# Mix in synth_batch outputs from ZeroGPU pipeline if a public Magpie repo
+# exists. ~84K pairs/mo are produced by synth-puller cron + dual ZeroGPU
+# endpoints. These are higher-quality than raw harvest (model self-curated).
+try:
+    magpie_ds = load_dataset("axentx/surrogate-1-synth-magpie",
+                             split="train", streaming=True)
+    n_magpie = 0
+    for ex in magpie_ds:
+        if n_magpie >= int(os.environ.get("MAGPIE_TAKE", "10000")): break
+        pair = extract_pair(ex)
+        if pair:
+            p, r = pair
+            rows.append({"prompt": p, "response": r})
+            n_magpie += 1
+    print(f" + Magpie pairs merged: {n_magpie:,}")
+except Exception as e:
+    print(f" ✗ Magpie skip (repo not yet published): {type(e).__name__}: {str(e)[:80]}")
+
 raw = Dataset.from_list(rows)
+# (Active-learning teachable filter applied AFTER model load — see below.
+#  Filtering needs the 4-bit base model to score perplexity, which doesn't
+#  exist until BitsAndBytesConfig + AutoModelForCausalLM run further down.)
 
 # ── Tokenizer ───────────────────────────────────────────────────────────────
 tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
@@ -345,42 +368,102 @@ model = prepare_model_for_kbit_training(
     gradient_checkpointing_kwargs={"use_reentrant": False},
 )
 
-# ── R1+R2 + EXTENDED LoRA stack ─────────────────────────────────────────────
-# v1.1-extended techniques (vs v1's minimal LoRA r=16):
-#   r=64 (was 32 in V5) — more capacity, fits 7B easily
-#   alpha=128 (2× r convention)
-#   DoRA decomposition — Round 2
-#   RSLoRA (rank-stabilized) — fixes magnitude scaling issue at r>8
-#   LoftQ initialization — start LoRA near optimal manifold (vs gaussian)
-#     +1-2% over default init in published runs
+# ── EXTENDED++ V7: Active-learning teachable filter ─────────────────────────
+# Score sampled rows with 4-bit base-model perplexity, keep middle 50%
+# ("teachable zone" — too easy = no signal, too hard = noise). Inspired by
+# R7 teachable-prompt-filter (30-70% baseline accuracy band).
+#
+# Cost: 1 fwd pass per scored sample, ~30-60 ms each on T4 7B 4-bit.
+# AL_SAMPLE_CAP=20000 → ~10-20 min budget. Skip with DISABLE_AL=1 or if
+# raw is below the floor (5000 rows — not enough signal to bother).
+DISABLE_AL = os.environ.get("DISABLE_AL", "0") == "1"
+AL_SAMPLE_CAP = int(os.environ.get("AL_SAMPLE_CAP", "20000"))
+
+if DISABLE_AL or len(raw) < 5000:
+    print(f" AL filter SKIPPED ({'flag' if DISABLE_AL else 'small dataset'})")
+else:
+    import math, random
+    print(f" AL: scoring up to {min(len(raw), AL_SAMPLE_CAP):,} of {len(raw):,} rows...")
+    if len(raw) > AL_SAMPLE_CAP:
+        score_idx = sorted(random.sample(range(len(raw)), AL_SAMPLE_CAP))
+    else:
+        score_idx = list(range(len(raw)))
+
+    model.eval()
+    scored = []
+    for n, i in enumerate(score_idx):
+        ex = raw[i]
+        text = (ex["prompt"][:500] + " " + ex["response"][:500])
+        try:
+            inp = tok(text, return_tensors="pt", truncation=True,
+                      max_length=512).to(model.device)
+            with torch.no_grad():
+                out = model(**inp, labels=inp["input_ids"])
+            loss_val = out.loss.item()
+            ppl = math.exp(loss_val) if loss_val < 100 else 1e9
+        except Exception:
+            ppl = 1e9
+        scored.append((ppl, i))
+        if (n + 1) % 1000 == 0:
+            print(f" AL scored {n+1:,}/{len(score_idx):,}")
+
+    scored.sort()
+    lo, hi = len(scored) // 4, len(scored) * 3 // 4
+    keep_scored = {i for _, i in scored[lo:hi]}
+    scored_set = {i for _, i in scored}
+    # Keep: (a) the middle-band of scored rows; (b) all unscored rows (they
+    # were never sampled, so we can't reject them — assume neutral).
+    keep_mask = [(i in keep_scored) or (i not in scored_set) for i in range(len(raw))]
+    raw = raw.select([i for i, k in enumerate(keep_mask) if k])
+    print(f" AL filter: kept {len(raw):,} teachable rows")
+
+# ── R1+R2 + EXTENDED++ LoRA stack ───────────────────────────────────────────
+# v1.1-extended++ V7 additions over V6:
+#   ✓ Spectrum freezing — LoRA only on top 70% layers (skip bottom 30%)
+#     — proxy for SNR-based Spectrum (Hartford et al.)
+#     — saves memory + sometimes quality lift
 LORA_R = int(os.environ.get("LORA_R", "64"))
+
+# Detect transformer layer count from the loaded model
+try:
+    n_layers = model.config.num_hidden_layers
+except AttributeError:
+    n_layers = 28  # Qwen2.5-Coder-7B default
+
+# Spectrum-lite: keep top 70% of layers, skip bottom 30%
+SPECTRUM_TOP = float(os.environ.get("SPECTRUM_TOP_FRACTION", "0.70"))
+n_train_layers = int(n_layers * SPECTRUM_TOP)
+layers_to_transform = list(range(n_layers - n_train_layers, n_layers))
+print(f" Spectrum-lite: training top {n_train_layers}/{n_layers} layers "
+      f"(skip bottom {n_layers - n_train_layers})")
+
 lora_kwargs = dict(
     r=LORA_R, lora_alpha=LORA_R * 2, lora_dropout=0.05,
     target_modules=["q_proj","k_proj","v_proj","o_proj",
                     "gate_proj","up_proj","down_proj"],
+    layers_to_transform=layers_to_transform,  # NEW: Spectrum-lite
     use_dora=True,                            # R2: DoRA
     task_type="CAUSAL_LM",
 )
-# RSLoRA + LoftQ require recent peft versions — fall back gracefully if
-# the installed peft is older than 0.13.
+# RSLoRA + LoftQ require recent peft versions — fall back gracefully
 try:
     from peft import LoraConfig as _Probe
     import inspect
     _sig = inspect.signature(_Probe).parameters
     if "use_rslora" in _sig: lora_kwargs["use_rslora"] = True
     if "init_lora_weights" in _sig:
-        # LoftQ requires loftq_config; only enable when peft + bnb support it
         try:
             from peft import LoftQConfig
             lora_kwargs["init_lora_weights"] = "loftq"
             lora_kwargs["loftq_config"] = LoftQConfig(loftq_bits=4, loftq_iter=5)
         except Exception:
-            pass  # peft too old — keep gaussian init
+            pass
 except Exception:
     pass
 print(f" LoRA config: r={LORA_R}, DoRA={lora_kwargs.get('use_dora')}, "
       f"RSLoRA={lora_kwargs.get('use_rslora', False)}, "
-      f"init={lora_kwargs.get('init_lora_weights', 'gaussian')}")
+      f"init={lora_kwargs.get('init_lora_weights', 'gaussian')}, "
+      f"layers={n_train_layers}/{n_layers}")
 
 lora = LoraConfig(**lora_kwargs)
 model = get_peft_model(model, lora)
bin/v2/auto-swap-and-bench.sh ADDED
@@ -0,0 +1,170 @@
+#!/usr/bin/env bash
+# Surrogate-1 — auto-swap-and-bench: end-to-end post-training pipeline.
+#
+# Watches HF Hub for v1.1-extended adapter to appear. When detected:
+#   1. swap-zerogpu-lora.sh   → loads new LoRA into surrogate1 ZeroGPU
+#                               Space (ashirato Space keeps OLD_LORA so the
+#                               bench has both A/B endpoints live in parallel)
+#   2. wait for stage=RUNNING → poll Space runtime until container is hot
+#   3. smoke-test endpoint    → 1 cheap completion to confirm LoRA is loaded
+#   4. bench-v1-vs-v15.sh     → 3-way: v1 vs base7B vs v1.1-extended
+#   5. post-bench-decide.sh   → routes to branch A/B/C automatically
+#
+# Replaces auto-bench-watcher.sh (which only did 1 + 4) with the full chain
+# so user doesn't have to manually trigger the LoRA swap.
+#
+# Usage (long-lived daemon):
+#   nohup bash bin/v2/auto-swap-and-bench.sh \
+#     > $HOME/.surrogate/logs/auto-swap-and-bench.log 2>&1 &
+#
+# Override target / interval:
+#   TARGET=axentx/some-other-adapter \
+#   CHECK_INTERVAL_SEC=120 \
+#   bash auto-swap-and-bench.sh
+#
+# Idempotent: if marker exists from a prior firing, exits immediately.
+set -uo pipefail
+[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }
+
+TARGET="${TARGET:-axentx/surrogate-1-7B-v1.1-extended}"
+CHECK_INTERVAL_SEC="${CHECK_INTERVAL_SEC:-300}"      # 5 min
+MAX_HOURS="${MAX_HOURS:-24}"
+SWAP_SPACE="${SWAP_SPACE:-surrogate1/surrogate-1-zero-gpu}"
+SWAP_TOKEN_VAR="${SWAP_TOKEN_VAR:-HF_TOKEN_PRO}"     # token env var name
+SPACE_BUILD_WAIT_MIN="${SPACE_BUILD_WAIT_MIN:-12}"   # max minutes to wait for RUNNING
+HFB="$HOME/.surrogate/hf-space/bin/v2"
+MARKER="$HOME/.surrogate/state/auto-swap-and-bench.${TARGET//\//_}"
+LOG="$HOME/.surrogate/logs/auto-swap-and-bench.log"
+mkdir -p "$(dirname "$MARKER")" "$(dirname "$LOG")"
+
+log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
+notify() {
+  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
+  curl -s -X POST -H "Content-Type: application/json" \
+    -d "{\"content\":\"🔄 auto-swap-and-bench: $1\"}" \
+    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
+}
+
+if [[ -f "$MARKER" ]]; then
+  log "marker exists ($MARKER) — pipeline already fired, exiting"
+  exit 0
+fi
+
+log "═══ auto-swap-and-bench starting ═══"
+log "target: $TARGET"
+log "swap space: $SWAP_SPACE"
+log "interval: ${CHECK_INTERVAL_SEC}s, max ${MAX_HOURS}h"
+notify "watching $TARGET → will swap $SWAP_SPACE then bench"
+
+START=$(date +%s)
+DEADLINE=$(( START + MAX_HOURS * 3600 ))
+n_polls=0
+HF_AUTH="${HF_TOKEN:-}"
+
+# ── Step 1: poll for adapter ───────────────────────────────────────────────
+while [[ $(date +%s) -lt $DEADLINE ]]; do
+  n_polls=$((n_polls + 1))
+  api=$(curl -fsS --max-time 20 \
+    ${HF_AUTH:+-H "Authorization: Bearer $HF_AUTH"} \
+    "https://huggingface.co/api/models/${TARGET}" 2>/dev/null || echo "")
+  has_adapter=0
+  if [[ -n "$api" ]]; then
+    has_adapter=$(echo "$api" | python3 -c "
+import json, sys
+try: d = json.load(sys.stdin)
+except Exception: print(0); sys.exit(0)
+sib = [s.get('rfilename','') for s in d.get('siblings', [])]
+print(1 if any('adapter' in s for s in sib) else 0)
+" 2>/dev/null | head -1 | tr -d ' \n')
+    has_adapter=${has_adapter:-0}
+  fi
+
+  if [[ "$has_adapter" == "1" ]]; then
+    log "✓ adapter detected on $TARGET after $n_polls polls"
+    break
+  fi
+
+  if (( n_polls % 12 == 0 )); then
+    elapsed_min=$(( ($(date +%s) - START) / 60 ))
+    log "poll $n_polls: no adapter yet (elapsed ${elapsed_min}m)"
+  fi
+  sleep "$CHECK_INTERVAL_SEC"
+done
+
+if [[ "${has_adapter:-0}" != "1" ]]; then
+  log "deadline reached without adapter — exiting"
+  notify "deadline ${MAX_HOURS}h hit, no adapter detected"
+  exit 1
+fi
+
+touch "$MARKER"
+notify "checkpoint detected → starting swap + bench chain"
+
+# ── Step 2: swap LoRA on ZeroGPU ────────────────────────────────────────────
+log ""
+log "── Step 2: swap-zerogpu-lora.sh $TARGET ──"
+bash "$HFB/swap-zerogpu-lora.sh" "$TARGET" 2>&1 | tee -a "$LOG"
+
+# ── Step 3: wait for Space stage=RUNNING ────────────────────────────────────
+log ""
+log "── Step 3: waiting up to ${SPACE_BUILD_WAIT_MIN}m for $SWAP_SPACE → RUNNING ──"
+build_start=$(date +%s)
+build_deadline=$(( build_start + SPACE_BUILD_WAIT_MIN * 60 ))
+build_ok=0
+while [[ $(date +%s) -lt $build_deadline ]]; do
+  stage=$(curl -fsS --max-time 15 \
+    ${HF_AUTH:+-H "Authorization: Bearer $HF_AUTH"} \
+    "https://huggingface.co/api/spaces/${SWAP_SPACE}" 2>/dev/null \
+    | python3 -c "
+import json, sys
+try: d = json.load(sys.stdin)
+except Exception: print('UNKNOWN'); sys.exit(0)
+print(d.get('runtime', {}).get('stage', 'UNKNOWN'))
+" 2>/dev/null | tr -d ' \n')
+  elapsed=$(( ($(date +%s) - build_start) / 60 ))
+  log " [${elapsed}m] stage=$stage"
+  if [[ "$stage" == "RUNNING" ]]; then
+    build_ok=1
+    break
+  fi
+  sleep 30
+done
+
+if [[ "$build_ok" != "1" ]]; then
+  log "⚠ Space did not reach RUNNING within ${SPACE_BUILD_WAIT_MIN}m — proceeding anyway"
+  notify "⚠ $SWAP_SPACE slow to rebuild, bench may fail on B endpoint"
+else
+  log "✓ Space is RUNNING — LoRA swap complete"
+  notify "ZeroGPU swap complete — running bench"
+fi
+
+# ── Step 4: smoke-test new endpoint (best-effort) ───────────────────────────
+log ""
+log "── Step 4: smoke test ──"
+SMOKE_URL="https://${SWAP_SPACE//\//-}.hf.space/api/predict"
+smoke=$(curl -fsS --max-time 30 -X POST -H "Content-Type: application/json" \
+  -d '{"data":["ping","hello world",16,0.1]}' "$SMOKE_URL" 2>&1 | head -c 200 || echo "smoke_fail")
+log " smoke response: $smoke"
+
+# ── Step 5: bench ───────────────────────────────────────────────────────────
+log ""
+log "── Step 5: bench-v1-vs-v15.sh ──"
+notify "firing bench-v1-vs-v15 (~6-8 hr)"
+bash "$HFB/bench-v1-vs-v15.sh" 2>&1 | tee -a "$HOME/.surrogate/logs/bench-v1-vs-v15.log"
+
+# ── Step 6: route via post-bench-decide ─────────────────────────────────────
+log ""
+log "── Step 6: post-bench-decide ──"
+LATEST_SUM=$(ls -t "$HOME/.surrogate/eval/bench-v1-vs-v15-"*"/summary.json" 2>/dev/null | head -1)
+if [[ -n "$LATEST_SUM" ]]; then
+  log " using summary: $LATEST_SUM"
+  bash "$HFB/post-bench-decide.sh" "$LATEST_SUM" 2>&1 \
+    | tee -a "$HOME/.surrogate/logs/post-bench-decide.log"
+else
+  log " ⚠ no summary.json found — skipping decide step"
+  notify "⚠ bench finished but summary.json missing, manual decide needed"
+fi
+
+log ""
+log "═══ auto-swap-and-bench done ═══"
+notify "pipeline complete — see $HOME/.surrogate/logs/post-bench-decide.log"
bin/v2/swap-zerogpu-lora.sh ADDED
@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+# Surrogate-1 — Swap LORA_REPO env on 2 PRO ZeroGPU Spaces + factory_reboot.
+#
+# Each ZeroGPU Space's app.py reads LORA_REPO from os.environ. To deploy a
+# new LoRA adapter (e.g., v1.1-extended) we just need to:
+#   1. POST the new value to the Space's variables endpoint
+#   2. factory_reboot the Space — new container reads the new env var
+#
+# Strategy: swap ONE Space to the new LoRA, keep the OTHER on the previous
+# LoRA. That way bench can hit both endpoints in parallel and get per-model
+# scores in the same wall-clock time.
+#
+# Usage:
+#   bash bin/v2/swap-zerogpu-lora.sh axentx/surrogate-1-7B-v1.1-extended
+#
+#   # custom mode — both spaces same LoRA:
+#   SWAP_BOTH=1 bash bin/v2/swap-zerogpu-lora.sh axentx/...
+#
+#   # custom — only surrogate1 swap, ashirato keeps v1:
+#   ONLY=surrogate1 bash bin/v2/swap-zerogpu-lora.sh axentx/...
+set -uo pipefail
+[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }
+
+NEW_LORA="${1:?need LORA_REPO arg, e.g. axentx/surrogate-1-7B-v1.1-extended}"
+OLD_LORA="${OLD_LORA:-axentx/surrogate-1-coder-7b-v1}"
+SWAP_BOTH="${SWAP_BOTH:-0}"
+ONLY="${ONLY:-}"
+LOG="$HOME/.surrogate/logs/swap-zerogpu-lora.log"
+mkdir -p "$(dirname "$LOG")"
+log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
+notify() {
+  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
+  curl -s -X POST -H "Content-Type: application/json" \
+    -d "{\"content\":\"🔁 swap-zerogpu-lora: $1\"}" \
+    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
+}
+
+# Map: account → (space, hf_token)
+declare -a SPACES
+if [[ "$ONLY" == "ashirato" ]]; then
+  SPACES=("ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$NEW_LORA")
+elif [[ "$ONLY" == "surrogate1" ]]; then
+  SPACES=("surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA")
+elif [[ "$SWAP_BOTH" == "1" ]]; then
+  SPACES=(
+    "ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$NEW_LORA"
+    "surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA"
+  )
+else
+  # Default: swap ONE (surrogate1), keep ashirato on $OLD_LORA so bench
+  # can hit both as A/B endpoints in parallel.
+  SPACES=(
+    "ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$OLD_LORA"
+    "surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA"
+  )
+fi
+
+log "═══ swap-zerogpu-lora ═══"
+log "new lora: $NEW_LORA"
+log "old lora: $OLD_LORA"
+log "mode: $([[ "$SWAP_BOTH" == "1" ]] && echo "both→new" || ([[ -n "$ONLY" ]] && echo "only=$ONLY" || echo "ashirato=old, surrogate1=new (A/B)"))"
+
+set_var() {
+  local space="$1" tok="$2" key="$3" val="$4"
+  # POST to the variables endpoint creates the variable or updates it in place
+  curl -s -X POST -H "Authorization: Bearer $tok" \
+    -H "Content-Type: application/json" \
+    -d "{\"key\":\"$key\",\"value\":\"$val\",\"description\":\"adapter swap $(date -u +%Y%m%dT%H%MZ)\"}" \
+    "https://huggingface.co/api/spaces/$space/variables" 2>&1 | head -c 200
+  echo ""
+}
+
+reboot_space() {
+  local space="$1" tok="$2"
+  curl -s -X POST -H "Authorization: Bearer $tok" \
+    "https://huggingface.co/api/spaces/$space/restart?factory=true" 2>&1 \
+    | python3 -c "import json,sys; d=json.load(sys.stdin); print(' stage=' + str(d.get('stage')))" 2>/dev/null \
+    || echo " reboot triggered"
+}
+
+for entry in "${SPACES[@]}"; do
+  IFS='|' read -r space tok lora <<< "$entry"
+  log ""
+  log "── $space → LORA_REPO=$lora ──"
+  log " setting env var..."
+  set_var "$space" "$tok" "LORA_REPO" "$lora" | tee -a "$LOG"
+  log " triggering factory_reboot..."
+  reboot_space "$space" "$tok" | tee -a "$LOG"
+done
+
+log ""
+log "═══ swap done — Spaces rebuilding ═══"
+log "ETA: ~3-5 min for build + model reload"
+log "verify with: curl -s https://huggingface.co/api/spaces/<space> | jq .runtime.stage"
+
+notify "swap fired: ${NEW_LORA##*/} on ${#SPACES[@]} Space(s) (~3-5min ETA)"