Ashira Pitchayapakayakul committed on
Commit dddf626 · 1 Parent(s): 1e1e228

feat(v7): Spectrum-lite + Magpie + active-learning + full swap-and-bench chain


kaggle-trainer.sh V7 adds three SFT-feasible techniques on top of V6:
• Spectrum-lite freezing — LoRA applied only to the top 70% of transformer
  layers via layers_to_transform=[n-N..n], skipping the bottom 30% (a proxy
  for SNR-based Spectrum; on the 28-layer Qwen2.5-Coder-7B default this
  trains layers 9-27 and leaves 0-8 frozen). Saves activation memory and
  sometimes lifts quality.
• Magpie self-instruct merge — pulls up to MAGPIE_TAKE=10000 pairs from
  axentx/surrogate-1-synth-magpie and appends them to the training rows.
  Skips cleanly with a printed warning if the repo isn't published yet.
• Active-learning teachable filter — scores up to AL_SAMPLE_CAP=20000
  rows with 4-bit base-model perplexity AFTER model load (correct flow
  order: Magpie merge → Dataset.from_list → tokenizer → model load →
  AL filter → LoRA wrap), keeps the middle 50% by perplexity. DISABLE_AL=1
  to skip; auto-skipped if the dataset has <5K rows.

Knobs: SPECTRUM_TOP_FRACTION (default 0.70), MAGPIE_TAKE (10000),
DISABLE_AL (0/1), AL_SAMPLE_CAP (20000). Defaults are conservative to
fit the existing T4×2 ~8 hr Kaggle budget.
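
All four knobs are plain environment variables, so a full V7 run with the
defaults spelled out looks roughly like this (illustrative invocation only;
the variable names come from the script, the surrounding setup is assumed):

  SPECTRUM_TOP_FRACTION=0.70 MAGPIE_TAKE=10000 \
  DISABLE_AL=0 AL_SAMPLE_CAP=20000 \
  bash bin/kaggle-trainer.sh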

swap-zerogpu-lora.sh (new) — swaps LORA_REPO env on the two PRO ZeroGPU
Spaces (ashirato + surrogate1) via /api/spaces/{repo}/variables, then
factory_reboot. Default mode = A/B split: ashirato keeps OLD_LORA so the
3-way bench has both v1 and v1.1-extended endpoints live at the same
time. SWAP_BOTH=1 or ONLY=ashirato/surrogate1 for other modes.
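
After the default A/B swap, a quick sanity check is to confirm both endpoints
are live (a sketch using the same Spaces API the scripts poll; space names
are the script defaults):

  for space in ashirato/surrogate-1-zero-gpu surrogate1/surrogate-1-zero-gpu; do
    echo "$space: $(curl -s https://huggingface.co/api/spaces/$space | jq -r '.runtime.stage')"
  done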

auto-swap-and-bench.sh (new) — supersedes auto-bench-watcher.sh with the
full post-training chain: poll HF Hub → swap-zerogpu-lora.sh → wait for
Space stage=RUNNING (≤12 min) → smoke-test endpoint → bench-v1-vs-v15.sh
→ post-bench-decide.sh. Idempotent via marker file. Now running as a
background daemon — old watcher killed.
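
Because the daemon logs everything to one file and is idempotent via its
marker, progress can be checked without touching the chain itself (paths are
the script defaults; commands are a sketch, not part of this commit):

  tail -f "$HOME/.surrogate/logs/auto-swap-and-bench.log"
  pgrep -af auto-swap-and-bench.sh                           # daemon still alive?
  ls "$HOME/.surrogate/state/" | grep auto-swap-and-bench    # already fired?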

bin/kaggle-trainer.sh CHANGED
@@ -319,7 +319,30 @@ while iterators and len(rows) < MAX_SAMPLES:
 print(f" → kept {len(rows):,} samples (target {MAX_SAMPLES:,}, "
       f"seen={n_seen:,}, drop={n_drop:,})")
 print(f" per-source counts: {n_per_source}")
+
+# ── EXTENDED++ V7: Magpie self-instruct pair inclusion ──────────────────────
+# Mix in synth_batch outputs from ZeroGPU pipeline if a public Magpie repo
+# exists. ~84K pairs/mo are produced by synth-puller cron + dual ZeroGPU
+# endpoints. These are higher-quality than raw harvest (model self-curated).
+try:
+    magpie_ds = load_dataset("axentx/surrogate-1-synth-magpie",
+                             split="train", streaming=True)
+    n_magpie = 0
+    for ex in magpie_ds:
+        if n_magpie >= int(os.environ.get("MAGPIE_TAKE", "10000")): break
+        pair = extract_pair(ex)
+        if pair:
+            p, r = pair
+            rows.append({"prompt": p, "response": r})
+            n_magpie += 1
+    print(f" + Magpie pairs merged: {n_magpie:,}")
+except Exception as e:
+    print(f" ✗ Magpie skip (repo not yet published): {type(e).__name__}: {str(e)[:80]}")
+
 raw = Dataset.from_list(rows)
+# (Active-learning teachable filter applied AFTER model load — see below.
+#  Filtering needs the 4-bit base model to score perplexity, which doesn't
+#  exist until BitsAndBytesConfig + AutoModelForCausalLM run further down.)
 
 # ── Tokenizer ───────────────────────────────────────────────────────────────
 tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
@@ -345,42 +368,102 @@ model = prepare_model_for_kbit_training(
     gradient_checkpointing_kwargs={"use_reentrant": False},
 )
 
-# ── R1+R2 + EXTENDED LoRA stack ─────────────────────────────────────────────
-# v1.1-extended techniques (vs v1's minimal LoRA r=16):
-#   r=64 (was 32 in V5) — more capacity, fits 7B easily
-#   alpha=128 (2× r convention)
-#   DoRA decomposition — Round 2
-#   RSLoRA (rank-stabilized) — fixes magnitude scaling issue at r>8
-#   LoftQ initialization — start LoRA near optimal manifold (vs gaussian)
-#     +1-2% over default init in published runs
+# ── EXTENDED++ V7: Active-learning teachable filter ─────────────────────────
+# Score sampled rows with 4-bit base-model perplexity, keep middle 50%
+# ("teachable zone" — too easy = no signal, too hard = noise). Inspired by
+# R7 teachable-prompt-filter (30-70% baseline accuracy band).
+#
+# Cost: 1 fwd pass per scored sample, ~30-60 ms each on T4 7B 4-bit.
+# AL_SAMPLE_CAP=20000 → ~10-20 min budget. Skip with DISABLE_AL=1 or if
+# raw is below the floor (5000 rows — not enough signal to bother).
+DISABLE_AL = os.environ.get("DISABLE_AL", "0") == "1"
+AL_SAMPLE_CAP = int(os.environ.get("AL_SAMPLE_CAP", "20000"))
+
+if DISABLE_AL or len(raw) < 5000:
+    print(f" AL filter SKIPPED ({'flag' if DISABLE_AL else 'small dataset'})")
+else:
+    import math, random
+    print(f" AL: scoring up to {min(len(raw), AL_SAMPLE_CAP):,} of {len(raw):,} rows...")
+    if len(raw) > AL_SAMPLE_CAP:
+        score_idx = sorted(random.sample(range(len(raw)), AL_SAMPLE_CAP))
+    else:
+        score_idx = list(range(len(raw)))
+
+    model.eval()
+    scored = []
+    for n, i in enumerate(score_idx):
+        ex = raw[i]
+        text = (ex["prompt"][:500] + " " + ex["response"][:500])
+        try:
+            inp = tok(text, return_tensors="pt", truncation=True,
+                      max_length=512).to(model.device)
+            with torch.no_grad():
+                out = model(**inp, labels=inp["input_ids"])
+            loss_val = out.loss.item()
+            ppl = math.exp(loss_val) if loss_val < 100 else 1e9
+        except Exception:
+            ppl = 1e9
+        scored.append((ppl, i))
+        if (n + 1) % 1000 == 0:
+            print(f" AL scored {n+1:,}/{len(score_idx):,}")
+
+    scored.sort()
+    lo, hi = len(scored) // 4, len(scored) * 3 // 4
+    keep_scored = {i for _, i in scored[lo:hi]}
+    scored_set = {i for _, i in scored}
+    # Keep: (a) the middle-band of scored rows; (b) all unscored rows (they
+    # were never sampled, so we can't reject them — assume neutral).
+    keep_mask = [(i in keep_scored) or (i not in scored_set) for i in range(len(raw))]
+    raw = raw.select([i for i, k in enumerate(keep_mask) if k])
+    print(f" AL filter: kept {len(raw):,} teachable rows")
+
+# ── R1+R2 + EXTENDED++ LoRA stack ───────────────────────────────────────────
+# v1.1-extended++ V7 additions over V6:
+#   ✓ Spectrum freezing — LoRA only on top 70% layers (skip bottom 30%)
+#     — proxy for SNR-based Spectrum (Hartford et al.)
+#     — saves memory + sometimes quality lift
 LORA_R = int(os.environ.get("LORA_R", "64"))
+
+# Detect transformer layer count from the loaded model
+try:
+    n_layers = model.config.num_hidden_layers
+except AttributeError:
+    n_layers = 28  # Qwen2.5-Coder-7B default
+
+# Spectrum-lite: keep top 70% of layers, skip bottom 30%
+SPECTRUM_TOP = float(os.environ.get("SPECTRUM_TOP_FRACTION", "0.70"))
+n_train_layers = int(n_layers * SPECTRUM_TOP)
+layers_to_transform = list(range(n_layers - n_train_layers, n_layers))
+print(f" Spectrum-lite: training top {n_train_layers}/{n_layers} layers "
+      f"(skip bottom {n_layers - n_train_layers})")
+
 lora_kwargs = dict(
     r=LORA_R, lora_alpha=LORA_R * 2, lora_dropout=0.05,
     target_modules=["q_proj","k_proj","v_proj","o_proj",
                     "gate_proj","up_proj","down_proj"],
+    layers_to_transform=layers_to_transform,  # NEW: Spectrum-lite
     use_dora=True,                            # R2: DoRA
     task_type="CAUSAL_LM",
 )
-# RSLoRA + LoftQ require recent peft versions — fall back gracefully if
-# the installed peft is older than 0.13.
+# RSLoRA + LoftQ require recent peft versions — fall back gracefully
 try:
     from peft import LoraConfig as _Probe
     import inspect
     _sig = inspect.signature(_Probe).parameters
     if "use_rslora" in _sig: lora_kwargs["use_rslora"] = True
     if "init_lora_weights" in _sig:
-        # LoftQ requires loftq_config; only enable when peft + bnb support it
         try:
             from peft import LoftQConfig
             lora_kwargs["init_lora_weights"] = "loftq"
             lora_kwargs["loftq_config"] = LoftQConfig(loftq_bits=4, loftq_iter=5)
         except Exception:
-            pass  # peft too old — keep gaussian init
+            pass
 except Exception:
     pass
 print(f" LoRA config: r={LORA_R}, DoRA={lora_kwargs.get('use_dora')}, "
       f"RSLoRA={lora_kwargs.get('use_rslora', False)}, "
-      f"init={lora_kwargs.get('init_lora_weights', 'gaussian')}")
+      f"init={lora_kwargs.get('init_lora_weights', 'gaussian')}, "
+      f"layers={n_train_layers}/{n_layers}")
 
 lora = LoraConfig(**lora_kwargs)
 model = get_peft_model(model, lora)
bin/v2/auto-swap-and-bench.sh ADDED
@@ -0,0 +1,170 @@
+#!/usr/bin/env bash
+# Surrogate-1 — auto-swap-and-bench: end-to-end post-training pipeline.
+#
+# Watches HF Hub for v1.1-extended adapter to appear. When detected:
+#   1. swap-zerogpu-lora.sh   → loads new LoRA into surrogate1 ZeroGPU
+#                               Space (ashirato Space keeps OLD_LORA so the
+#                               bench has both A/B endpoints live in parallel)
+#   2. wait for stage=RUNNING → poll Space runtime until container is hot
+#   3. smoke-test endpoint    → 1 cheap completion to confirm LoRA is loaded
+#   4. bench-v1-vs-v15.sh     → 3-way: v1 vs base7B vs v1.1-extended
+#   5. post-bench-decide.sh   → routes to branch A/B/C automatically
+#
+# Replaces auto-bench-watcher.sh (which only did 1 + 4) with the full chain
+# so user doesn't have to manually trigger the LoRA swap.
+#
+# Usage (long-lived daemon):
+#   nohup bash bin/v2/auto-swap-and-bench.sh \
+#     > $HOME/.surrogate/logs/auto-swap-and-bench.log 2>&1 &
+#
+# Override target / interval:
+#   TARGET=axentx/some-other-adapter \
+#   CHECK_INTERVAL_SEC=120 \
+#   bash auto-swap-and-bench.sh
+#
+# Idempotent: if marker exists from a prior firing, exits immediately.
+set -uo pipefail
+[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }
+
+TARGET="${TARGET:-axentx/surrogate-1-7B-v1.1-extended}"
+CHECK_INTERVAL_SEC="${CHECK_INTERVAL_SEC:-300}"      # 5 min
+MAX_HOURS="${MAX_HOURS:-24}"
+SWAP_SPACE="${SWAP_SPACE:-surrogate1/surrogate-1-zero-gpu}"
+SWAP_TOKEN_VAR="${SWAP_TOKEN_VAR:-HF_TOKEN_PRO}"     # token env var name
+SPACE_BUILD_WAIT_MIN="${SPACE_BUILD_WAIT_MIN:-12}"   # max minutes to wait for RUNNING
+HFB="$HOME/.surrogate/hf-space/bin/v2"
+MARKER="$HOME/.surrogate/state/auto-swap-and-bench.${TARGET//\//_}"
+LOG="$HOME/.surrogate/logs/auto-swap-and-bench.log"
+mkdir -p "$(dirname "$MARKER")" "$(dirname "$LOG")"
+
+log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
+notify() {
+  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
+  curl -s -X POST -H "Content-Type: application/json" \
+    -d "{\"content\":\"🔄 auto-swap-and-bench: $1\"}" \
+    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
+}
+
+if [[ -f "$MARKER" ]]; then
+  log "marker exists ($MARKER) — pipeline already fired, exiting"
+  exit 0
+fi
+
+log "═══ auto-swap-and-bench starting ═══"
+log "target: $TARGET"
+log "swap space: $SWAP_SPACE"
+log "interval: ${CHECK_INTERVAL_SEC}s, max ${MAX_HOURS}h"
+notify "watching $TARGET → will swap $SWAP_SPACE then bench"
+
+START=$(date +%s)
+DEADLINE=$(( START + MAX_HOURS * 3600 ))
+n_polls=0
+HF_AUTH="${HF_TOKEN:-}"
+
+# ── Step 1: poll for adapter ───────────────────────────────────────────────
+while [[ $(date +%s) -lt $DEADLINE ]]; do
+  n_polls=$((n_polls + 1))
+  api=$(curl -fsS --max-time 20 \
+    ${HF_AUTH:+-H "Authorization: Bearer $HF_AUTH"} \
+    "https://huggingface.co/api/models/${TARGET}" 2>/dev/null || echo "")
+  has_adapter=0
+  if [[ -n "$api" ]]; then
+    has_adapter=$(echo "$api" | python3 -c "
+import json, sys
+try: d = json.load(sys.stdin)
+except Exception: print(0); sys.exit(0)
+sib = [s.get('rfilename','') for s in d.get('siblings', [])]
+print(1 if any('adapter' in s for s in sib) else 0)
+" 2>/dev/null | head -1 | tr -d ' \n')
+    has_adapter=${has_adapter:-0}
+  fi
+
+  if [[ "$has_adapter" == "1" ]]; then
+    log "✓ adapter detected on $TARGET after $n_polls polls"
+    break
+  fi
+
+  if (( n_polls % 12 == 0 )); then
+    elapsed_min=$(( ($(date +%s) - START) / 60 ))
+    log "poll $n_polls: no adapter yet (elapsed ${elapsed_min}m)"
+  fi
+  sleep "$CHECK_INTERVAL_SEC"
+done
+
+if [[ "${has_adapter:-0}" != "1" ]]; then
+  log "deadline reached without adapter — exiting"
+  notify "deadline ${MAX_HOURS}h hit, no adapter detected"
+  exit 1
+fi
+
+touch "$MARKER"
+notify "checkpoint detected → starting swap + bench chain"
+
+# ── Step 2: swap LoRA on ZeroGPU ────────────────────────────────────────────
+log ""
+log "── Step 2: swap-zerogpu-lora.sh $TARGET ──"
+bash "$HFB/swap-zerogpu-lora.sh" "$TARGET" 2>&1 | tee -a "$LOG"
+
+# ── Step 3: wait for Space stage=RUNNING ────────────────────────────────────
+log ""
+log "── Step 3: waiting up to ${SPACE_BUILD_WAIT_MIN}m for $SWAP_SPACE → RUNNING ──"
+build_start=$(date +%s)
+build_deadline=$(( build_start + SPACE_BUILD_WAIT_MIN * 60 ))
+build_ok=0
+while [[ $(date +%s) -lt $build_deadline ]]; do
+  stage=$(curl -fsS --max-time 15 \
+    ${HF_AUTH:+-H "Authorization: Bearer $HF_AUTH"} \
+    "https://huggingface.co/api/spaces/${SWAP_SPACE}" 2>/dev/null \
+    | python3 -c "
+import json, sys
+try: d = json.load(sys.stdin)
+except Exception: print('UNKNOWN'); sys.exit(0)
+print(d.get('runtime', {}).get('stage', 'UNKNOWN'))
+" 2>/dev/null | tr -d ' \n')
+  elapsed=$(( ($(date +%s) - build_start) / 60 ))
+  log " [${elapsed}m] stage=$stage"
+  if [[ "$stage" == "RUNNING" ]]; then
+    build_ok=1
+    break
+  fi
+  sleep 30
+done
+
+if [[ "$build_ok" != "1" ]]; then
+  log "⚠ Space did not reach RUNNING within ${SPACE_BUILD_WAIT_MIN}m — proceeding anyway"
+  notify "⚠ $SWAP_SPACE slow to rebuild, bench may fail on B endpoint"
+else
+  log "✓ Space is RUNNING — LoRA swap complete"
+  notify "ZeroGPU swap complete — running bench"
+fi
+
+# ── Step 4: smoke-test new endpoint (best-effort) ───────────────────────────
+log ""
+log "── Step 4: smoke test ──"
+SMOKE_URL="https://${SWAP_SPACE//\//-}.hf.space/api/predict"
+smoke=$(curl -fsS --max-time 30 -X POST -H "Content-Type: application/json" \
+  -d '{"data":["ping","hello world",16,0.1]}' "$SMOKE_URL" 2>&1 | head -c 200 || echo "smoke_fail")
+log " smoke response: $smoke"
+
+# ── Step 5: bench ───────────────────────────────────────────────────────────
+log ""
+log "── Step 5: bench-v1-vs-v15.sh ──"
+notify "firing bench-v1-vs-v15 (~6-8 hr)"
+bash "$HFB/bench-v1-vs-v15.sh" 2>&1 | tee -a "$HOME/.surrogate/logs/bench-v1-vs-v15.log"
+
+# ── Step 6: route via post-bench-decide ─────────────────────────────────────
+log ""
+log "── Step 6: post-bench-decide ──"
+LATEST_SUM=$(ls -t "$HOME/.surrogate/eval/bench-v1-vs-v15-"*"/summary.json" 2>/dev/null | head -1)
+if [[ -n "$LATEST_SUM" ]]; then
+  log " using summary: $LATEST_SUM"
+  bash "$HFB/post-bench-decide.sh" "$LATEST_SUM" 2>&1 \
+    | tee -a "$HOME/.surrogate/logs/post-bench-decide.log"
+else
+  log " ⚠ no summary.json found — skipping decide step"
+  notify "⚠ bench finished but summary.json missing, manual decide needed"
+fi
+
+log ""
+log "═══ auto-swap-and-bench done ═══"
+notify "pipeline complete — see $HOME/.surrogate/logs/post-bench-decide.log"
bin/v2/swap-zerogpu-lora.sh ADDED
@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+# Surrogate-1 — Swap LORA_REPO env on 2 PRO ZeroGPU Spaces + factory_reboot.
+#
+# Each ZeroGPU Space's app.py reads LORA_REPO from os.environ. To deploy a
+# new LoRA adapter (e.g., v1.1-extended) we just need to:
+#   1. POST the new value to the Space's variables endpoint
+#   2. factory_reboot the Space — new container reads the new env var
+#
+# Strategy: swap ONE Space to the new LoRA, keep the OTHER on the previous
+# LoRA. That way bench can hit both endpoints in parallel and get per-model
+# scores in the same wall-clock time.
+#
+# Usage:
+#   bash bin/v2/swap-zerogpu-lora.sh axentx/surrogate-1-7B-v1.1-extended
+#
+#   # custom mode — both spaces same LoRA:
+#   SWAP_BOTH=1 bash bin/v2/swap-zerogpu-lora.sh axentx/...
+#
+#   # custom — only surrogate1 swap, ashirato keeps v1:
+#   ONLY=surrogate1 bash bin/v2/swap-zerogpu-lora.sh axentx/...
+set -uo pipefail
+[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }
+
+NEW_LORA="${1:?need LORA_REPO arg, e.g. axentx/surrogate-1-7B-v1.1-extended}"
+OLD_LORA="${OLD_LORA:-axentx/surrogate-1-coder-7b-v1}"
+SWAP_BOTH="${SWAP_BOTH:-0}"
+ONLY="${ONLY:-}"
+LOG="$HOME/.surrogate/logs/swap-zerogpu-lora.log"
+mkdir -p "$(dirname "$LOG")"
+log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
+notify() {
+  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
+  curl -s -X POST -H "Content-Type: application/json" \
+    -d "{\"content\":\"🔁 swap-zerogpu-lora: $1\"}" \
+    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
+}
+
+# Map: account → (space, hf_token)
+declare -a SPACES
+if [[ "$ONLY" == "ashirato" ]]; then
+  SPACES=("ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$NEW_LORA")
+elif [[ "$ONLY" == "surrogate1" ]]; then
+  SPACES=("surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA")
+elif [[ "$SWAP_BOTH" == "1" ]]; then
+  SPACES=(
+    "ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$NEW_LORA"
+    "surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA"
+  )
+else
+  # Default: swap ONE (surrogate1), keep ashirato on $OLD_LORA so bench
+  # can hit both as A/B endpoints in parallel.
+  SPACES=(
+    "ashirato/surrogate-1-zero-gpu|$HF_TOKEN_PRO_WRITE|$OLD_LORA"
+    "surrogate1/surrogate-1-zero-gpu|$HF_TOKEN_PRO|$NEW_LORA"
+  )
+fi
+
+log "═══ swap-zerogpu-lora ═══"
+log "new lora: $NEW_LORA"
+log "old lora: $OLD_LORA"
+log "mode: $([[ "$SWAP_BOTH" == "1" ]] && echo "both→new" || ([[ -n "$ONLY" ]] && echo "only=$ONLY" || echo "ashirato=old, surrogate1=new (A/B)"))"
+
+set_var() {
+  local space="$1" tok="$2" key="$3" val="$4"
+  # POST to the variables endpoint creates the variable or updates it in place
+  curl -s -X POST -H "Authorization: Bearer $tok" \
+    -H "Content-Type: application/json" \
+    -d "{\"key\":\"$key\",\"value\":\"$val\",\"description\":\"adapter swap $(date -u +%Y%m%dT%H%MZ)\"}" \
+    "https://huggingface.co/api/spaces/$space/variables" 2>&1 | head -c 200
+  echo ""
+}
+
+reboot_space() {
+  local space="$1" tok="$2"
+  curl -s -X POST -H "Authorization: Bearer $tok" \
+    "https://huggingface.co/api/spaces/$space/restart?factory=true" 2>&1 \
+    | python3 -c "import json,sys; d=json.load(sys.stdin); print(' stage=' + str(d.get('stage')))" 2>/dev/null \
+    || echo " reboot triggered"
+}
+
+for entry in "${SPACES[@]}"; do
+  IFS='|' read -r space tok lora <<< "$entry"
+  log ""
+  log "── $space → LORA_REPO=$lora ──"
+  log " setting env var..."
+  set_var "$space" "$tok" "LORA_REPO" "$lora" | tee -a "$LOG"
+  log " triggering factory_reboot..."
+  reboot_space "$space" "$tok" | tee -a "$LOG"
+done
+
+log ""
+log "═══ swap done — Spaces rebuilding ═══"
+log "ETA: ~3-5 min for build + model reload"
+log "verify with: curl -s https://huggingface.co/api/spaces/<space> | jq .runtime.stage"
+
+notify "swap fired: ${NEW_LORA##*/} on ${#SPACES[@]} Space(s) (~3-5min ETA)"