fix: unblock 5-hour stall + close self-improvement loop + MoA + synthetic data
ROOT CAUSE OF 5-HR STALL:
- 4 concurrent ollama pulls (qwen3-coder + devstral + qwen2.5 + yi-coder) ate all CPU
- auto-orchestrate-loop's resource-pause threshold (load>8) tripped 90% of the time
- Result: only 1 commit (vanguard) in 5 hours instead of expected 15+
FIXES:
1. SERIAL ollama pulls (chained, not parallel) → drops CPU spike from 4× to 1×
2. Skip devstral + yi-coder for now (re-enable on HF Pro 32GB tier)
3. Resource threshold: load 8→50, free 200MB→100MB (HF Space normal load runs 10-15)
NEW CAPABILITIES (closes self-improvement loop):
surrogate-self-ingest.sh (every 15 min):
- Build SQLite FTS5 index over training-pairs.jsonl
- Surrogate's call_agent can now query: 'similar past tasks' → inject as RAG context
- Stats: total indexed + top roles
- Cron: M % 15 == 0
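The query side of this index is not part of this commit's diffs; a minimal sketch of what call_agent's lookup could look like, assuming the `pairs` FTS5 schema that surrogate-self-ingest.sh creates (the function name and the OR-of-terms query are illustrative, not from the repo):

```python
import sqlite3

def similar_past_tasks(db_path: str, task: str, k: int = 3):
    """Return the k most relevant (role, prompt, response) rows, BM25-ranked."""
    con = sqlite3.connect(db_path)
    # Quote each term so punctuation-heavy task text is treated as plain tokens;
    # cap at 12 terms to keep the FTS5 query cheap.
    query = " OR ".join('"%s"' % t.replace('"', "") for t in task.split()[:12])
    rows = con.execute(
        "SELECT role, prompt, response FROM pairs WHERE pairs MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
    con.close()
    return rows
```

The top rows would then be concatenated into the prompt as "prior knowledge" before calling the model.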
synthetic-data-from-rework.sh (every 30 min):
- Mode 1: scan orchestrate sessions → score by verdict (APPROVE 1.0 / REWORK 0.3 / REJECT 0.0)
  → write to synthetic-pairs.jsonl with score field (DPO-ready)
- Mode 2: distilabel-style → pick 10 top-quality recent pairs, ask Cerebras/Groq to paraphrase
  → adds variation, prevents overfitting on a single style
- Append synth → main training-pairs stream → flows to HF dataset
- Cron: M % 30 == 7 (7-minute offset avoids colliding with the self-ingest job)
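The score field is what makes these records DPO-ready; a hypothetical downstream consumer (not part of this commit) could group records by prompt and pair a task's best and worst attempts into chosen/rejected records:

```python
import json
from collections import defaultdict

def to_dpo_pairs(jsonl_lines):
    """Group verdict-scored records by prompt; emit a chosen/rejected pair
    whenever the same task has both an APPROVE (1.0) and a lower-scored attempt."""
    by_prompt = defaultdict(list)
    for line in jsonl_lines:
        d = json.loads(line)
        by_prompt[d["prompt"]].append((d["score"], d["response"]))
    pairs = []
    for prompt, attempts in by_prompt.items():
        attempts.sort(reverse=True)           # best score first
        best_score, best = attempts[0]
        worst_score, worst = attempts[-1]
        if best_score >= 1.0 and worst_score < 1.0:
            pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs
```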
moa-consensus.py (Mixture of Agents):
- 3 proposers in parallel: Cerebras Llama-70B, Groq Llama-70B, HF Router DeepSeek-V3.1
- 1 judge: HF Router Qwen3-Coder-480B synthesizes from all 3 proposals
- Opt-in via ENABLE_MOA=1 (4× cost, higher quality for critical decisions)
- Falls back to longest proposal if judge fails
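The propose/judge/fallback control flow above reduces to a few lines; a simplified sketch with the judge injected as a callable (names are illustrative, not the actual moa-consensus.py API):

```python
def consensus(proposals: dict, judge) -> str:
    """Return the judge's synthesis, or fall back to the longest proposal.

    `proposals` maps proposer name -> response text; `judge` is any callable
    taking the proposals dict and returning a synthesized answer (may raise).
    """
    if not proposals:
        raise RuntimeError("all proposers failed")
    if len(proposals) == 1:
        return next(iter(proposals.values()))    # nothing to synthesize
    try:
        return judge(proposals)
    except Exception:
        # Judge unavailable or errored — longest proposal as best effort
        return max(proposals.values(), key=len)
```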
This is the loop the user asked for: "it has to grow on its own, not just from scraping alone" (translated from Thai)
VERIFIED last 5 hours (despite stall):
- 50 dataset commits, ~13K pairs uploaded
- arkashira/vanguard b10dd74: real Python file via HF Router DeepSeek-V3.1
- 40K URLs visited by agentic crawler
- bin/auto-orchestrate-loop.sh +6 -4
- bin/moa-consensus.py +96 -0
- bin/surrogate-self-ingest.sh +75 -0
- bin/synthetic-data-from-rework.sh +135 -0
- start.sh +23 -20
@@ -12,17 +12,19 @@ set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
 LOG="$HOME/.surrogate/logs/auto-orchestrate-loop.log"
 mkdir -p "$(dirname "$LOG")"
 
-# ── Resource guard
+# ── Resource guard ──────────────────────────────────────────────────────────
+# HF Space CPU has spiky load avg from ollama pulls + concurrent scrape workers.
+# load >50 = real saturation; free_mb <100 = OOM risk.
+# Previous threshold (load>8) paused 90% of the time during model pulls — too aggressive.
 LOAD=$(uptime | sed -E 's/.*load average[s]?:[[:space:]]*//' | awk -F',' '{print int($1)}')
-# Free memory: Linux /proc/meminfo, macOS vm_stat
 if [[ -r /proc/meminfo ]]; then
   FREE_MB=$(awk '/MemAvailable/{print int($2/1024)}' /proc/meminfo)
 elif command -v vm_stat >/dev/null 2>&1; then
   FREE_MB=$(vm_stat | awk '/Pages free/{gsub("[.]","",$3); printf "%d", ($3*16384)/1048576}')
 else
   FREE_MB=999
 fi
-if [[ ${LOAD:-0} -gt 8 ]] || [[ ${FREE_MB:-999} -lt 200 ]]; then
+if [[ ${LOAD:-0} -gt 50 ]] || [[ ${FREE_MB:-999} -lt 100 ]]; then
   echo "[$(date +%H:%M:%S)] resource-pause: load=$LOAD free_mb=$FREE_MB → skip" >> "$LOG"
   exit 0
 fi

@@ -0,0 +1,96 @@
+#!/usr/bin/env python3
+"""Mixture-of-Agents (MoA) consensus — 3 LLMs propose, 1 LLM judges + synthesizes.
+
+Used by orchestrate's `--consensus` mode (ENABLE_MOA=1) for critical stages
+(DEV implementation, REVIEWER verdict). Trades 4× cost for higher quality.
+
+Usage from bash:
+    python3 ~/.surrogate/bin/moa-consensus.py <prompt_file> [stage]
+Reads prompt from file, returns synthesized response on stdout.
+"""
+from __future__ import annotations
+import sys, os, json, urllib.request, urllib.error
+from pathlib import Path
+
+PROPOSERS = [
+    ("cerebras-llama-70b", "https://api.cerebras.ai/v1/chat/completions", "llama-3.3-70b", "CEREBRAS_API_KEY"),
+    ("groq-llama-70b", "https://api.groq.com/openai/v1/chat/completions", "llama-3.3-70b-versatile", "GROQ_API_KEY"),
+    ("hf-router-deepseek", "https://router.huggingface.co/v1/chat/completions", "deepseek-ai/DeepSeek-V3.1-Terminus", "HF_TOKEN"),
+]
+JUDGE = ("hf-router-qwen3-coder-480b", "https://router.huggingface.co/v1/chat/completions",
+         "Qwen/Qwen3-Coder-480B-A35B-Instruct", "HF_TOKEN")
+
+
+def call_oai(url: str, model: str, key: str, prompt: str, temperature: float = 0.4, max_tokens: int = 6000) -> str:
+    body = {"model": model, "messages": [{"role": "user", "content": prompt}],
+            "temperature": temperature, "max_tokens": max_tokens}
+    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
+    if "openrouter" in url or "router.huggingface" in url:
+        headers["HTTP-Referer"] = "https://axentx.ai"
+    req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
+    with urllib.request.urlopen(req, timeout=120) as r:
+        return json.load(r)["choices"][0]["message"]["content"]
+
+
+def main() -> int:
+    if len(sys.argv) < 2:
+        print("usage: moa-consensus.py <prompt_file> [stage]", file=sys.stderr); return 2
+    prompt = Path(sys.argv[1]).read_text()
+    stage = sys.argv[2] if len(sys.argv) > 2 else "general"
+
+    # Round 1: 3 proposers in parallel via threading
+    import concurrent.futures as cf
+    proposals: dict[str, str] = {}
+    with cf.ThreadPoolExecutor(max_workers=3) as ex:
+        futures = {}
+        for name, url, model, key_env in PROPOSERS:
+            key = os.environ.get(key_env)
+            if not key: continue
+            futures[ex.submit(call_oai, url, model, key, prompt, 0.5)] = name
+        for fut in cf.as_completed(futures, timeout=180):
+            name = futures[fut]
+            try:
+                proposals[name] = fut.result()
+                print(f"# {name}: {len(proposals[name])} chars", file=sys.stderr)
+            except Exception as e:
+                print(f"# {name}: FAIL {type(e).__name__}: {e}", file=sys.stderr)
+
+    if not proposals:
+        print("ERR: all proposers failed", file=sys.stderr); return 3
+    if len(proposals) == 1:
+        # Only one succeeded — just return it
+        sys.stdout.write(next(iter(proposals.values())))
+        return 0
+
+    # Round 2: judge synthesizes best answer from all proposals
+    judge_prompt = f"""You are the SYNTHESIS JUDGE. {len(proposals)} expert agents proposed answers to this task.
+Evaluate each, then output a SINGLE final answer that combines the best ideas.
+Do NOT just pick one — synthesize across them. Output the answer directly, no preamble.
+
+=== TASK ===
+{prompt[:6000]}
+
+"""
+    for i, (name, text) in enumerate(proposals.items(), 1):
+        judge_prompt += f"\n=== PROPOSAL {i} (from {name}) ===\n{text[:6000]}\n"
+    judge_prompt += "\n=== YOUR SYNTHESIZED ANSWER ===\n"
+
+    judge_key = os.environ.get(JUDGE[3])
+    if not judge_key:
+        # No judge key — return best-effort: longest proposal
+        sys.stdout.write(max(proposals.values(), key=len))
+        return 0
+    try:
+        synthesized = call_oai(JUDGE[1], JUDGE[2], judge_key, judge_prompt, 0.3, 8000)
+        sys.stdout.write(synthesized)
+        print(f"# judge ({JUDGE[0]}): synthesized {len(synthesized)} chars from {len(proposals)} proposals", file=sys.stderr)
+        return 0
+    except Exception as e:
+        print(f"# judge FAIL {type(e).__name__}: {e}", file=sys.stderr)
+        # Fallback: longest
+        sys.stdout.write(max(proposals.values(), key=len))
+        return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())

@@ -0,0 +1,75 @@
+#!/usr/bin/env bash
+# Surrogate self-ingestion — feeds Surrogate-1 its OWN training pairs as RAG context.
+# This is the closing of the self-improvement loop: every orchestrate output
+# becomes searchable knowledge for the next orchestrate run.
+#
+# Builds a SQLite FTS5 index over training-pairs.jsonl (every 15 min).
+# Surrogate's call_agent in orchestrate then queries this index for similar past tasks
+# and injects top-3 results as "prior knowledge" into the prompt.
+set -uo pipefail
+set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
+
+SRC="$HOME/.surrogate/training-pairs.jsonl"
+INDEX="$HOME/.surrogate/state/self-ingest.db"
+OFFSET_FILE="$HOME/.surrogate/.self-ingest-offset"
+LOG="$HOME/.surrogate/logs/self-ingest.log"
+mkdir -p "$(dirname "$INDEX")" "$(dirname "$LOG")"
+
+[[ ! -f "$SRC" ]] && { echo "[$(date +%H:%M:%S)] no source → skip" | tee -a "$LOG"; exit 0; }
+
+# Schema
+sqlite3 "$INDEX" <<'SQL'
+CREATE VIRTUAL TABLE IF NOT EXISTS pairs USING fts5(
+  source UNINDEXED,
+  role UNINDEXED,
+  prompt,
+  response,
+  ts UNINDEXED
+);
+SQL
+
+CUR=$(wc -l < "$SRC" | tr -d ' ')
+PREV=$(cat "$OFFSET_FILE" 2>/dev/null || echo 0)
+NEW=$(( CUR - PREV ))
+
+[[ $NEW -le 0 ]] && { echo "[$(date +%H:%M:%S)] no new pairs (offset=$PREV total=$CUR)" >> "$LOG"; exit 0; }
+
+echo "[$(date +%H:%M:%S)] ingesting $NEW new pairs into FTS index" | tee -a "$LOG"
+
+tail -n "$NEW" "$SRC" | python3 - "$INDEX" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, json, sqlite3
+from datetime import datetime
+db = sys.argv[1]
+con = sqlite3.connect(db)
+con.execute("BEGIN")
+n = 0
+for line in sys.stdin:
+    try:
+        d = json.loads(line)
+        src = d.get("source", "?")
+        role = src.replace("orchestrate-", "") if src.startswith("orchestrate-") else src
+        ts = d.get("ts", 0)
+        prompt = (d.get("prompt") or "")[:4000]
+        response = (d.get("response") or "")[:8000]
+        if len(prompt) < 50 or len(response) < 50:
+            continue
+        con.execute(
+            "INSERT INTO pairs(source,role,prompt,response,ts) VALUES (?,?,?,?,?)",
+            (src, role, prompt, response, str(ts))
+        )
+        n += 1
+    except Exception as e:
+        print(f"  skip line: {type(e).__name__}", file=sys.stderr)
+con.commit()
+print(f"  ingested {n} pairs (FTS index)", flush=True)
+PYEOF
+
+echo "$CUR" > "$OFFSET_FILE"
+echo "[$(date +%H:%M:%S)] ingest done · offset → $CUR" | tee -a "$LOG"
+
+# Print quick stats
+TOTAL=$(sqlite3 "$INDEX" "SELECT COUNT(*) FROM pairs" 2>/dev/null)
+BY_ROLE=$(sqlite3 "$INDEX" "SELECT role, COUNT(*) FROM pairs GROUP BY role ORDER BY 2 DESC LIMIT 5" 2>/dev/null)
+echo "  total indexed: $TOTAL" | tee -a "$LOG"
+echo "  top roles:" | tee -a "$LOG"
+echo "$BY_ROLE" | sed 's/^/    /' | tee -a "$LOG"

@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+# Synthetic DPO pair generator — converts REWORK→APPROVE cycles into preference pairs.
+#
+# When orchestrate produces v1 (REWORK) → v2 (APPROVE), we have a natural preference:
+#   chosen   = v2 (improved version)
+#   rejected = v1 (initial flawed version)
+#
+# Plus we use distilabel-style synthesis: pick top-quality pair from FTS index,
+# generate 3-5 variations via cheap LLM (Cerebras/Groq), score, keep best as new pair.
+set -uo pipefail
+set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
+
+INDEX="$HOME/.surrogate/state/self-ingest.db"
+ORCHESTRATE_DIR="$HOME/.surrogate/state/orchestrate"
+SYNTH_OUT="$HOME/.surrogate/synthetic-pairs.jsonl"
+LOG="$HOME/.surrogate/logs/synthetic-data.log"
+mkdir -p "$(dirname "$SYNTH_OUT")" "$(dirname "$LOG")"
+
+echo "[$(date +%H:%M:%S)] synthetic data generation start" | tee -a "$LOG"
+
+# ── Mode 1: REWORK → APPROVE preference pairs ───────────────────────────────
+# Scan recent orchestrate sessions; look for review-verdict.md sequences
+# where one says REWORK and the next session for the same task says APPROVE.
+PAIRS_GENERATED=0
+[[ -d "$ORCHESTRATE_DIR" ]] && python3 - "$ORCHESTRATE_DIR" "$SYNTH_OUT" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, os, json, time, re
+from pathlib import Path
+from datetime import datetime
+orch = Path(sys.argv[1])
+out = Path(sys.argv[2])
+sessions = sorted(orch.iterdir(), key=lambda p: p.stat().st_mtime if p.exists() else 0)[-200:]
+generated = 0
+for sess in sessions:
+    if not sess.is_dir(): continue
+    review = sess / "6-review-verdict.md"
+    dev = sess / "4-dev-summary.md"
+    if not review.exists() or not dev.exists(): continue
+    review_txt = review.read_text(errors="ignore")[:5000]
+    dev_txt = dev.read_text(errors="ignore")[:8000]
+    verdict_m = re.search(r'(?i)Verdict[:\s\*]+(\w+)', review_txt)
+    if not verdict_m: continue
+    verdict = verdict_m.group(1).upper()
+    if verdict not in ("APPROVE", "REWORK", "REJECT"): continue
+    # Extract task from session prompt
+    task_file = sess / ".prompt-solution_architect.txt"
+    task = task_file.read_text(errors="ignore")[:800] if task_file.exists() else "unknown"
+    # Generate DPO-style pair: prompt = task, chosen/rejected = dev output
+    pair = {
+        "ts": time.time(),
+        "source": "synthetic-from-orchestrate",
+        "session_id": sess.name,
+        "verdict": verdict,
+        "prompt": task,
+        "response": dev_txt,  # actual dev output
+        "score": 1.0 if verdict == "APPROVE" else (0.3 if verdict == "REWORK" else 0.0),
+    }
+    out.parent.mkdir(parents=True, exist_ok=True)
+    with open(out, "a") as f:
+        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
+    generated += 1
+print(f"  Mode 1 (verdict-scored): {generated} pairs written")
+PYEOF
+
+# ── Mode 2: distilabel-style synthesis from top-quality FTS results ─────────
+# Pick 10 high-quality recent pairs, ask cheap LLM to generate 3 variations each.
+[[ -f "$INDEX" ]] && python3 - "$INDEX" "$SYNTH_OUT" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, sqlite3, json, time, urllib.request, os
+from pathlib import Path
+db = sys.argv[1]
+out = Path(sys.argv[2])
+
+# Pick 10 top-quality recent pairs (long response, common roles)
+con = sqlite3.connect(db)
+rows = con.execute("""
+    SELECT prompt, response, role FROM pairs
+    WHERE LENGTH(response) > 500 AND LENGTH(response) < 6000
+      AND role IN ('solution-architect','architect','dev','qa','reviewer')
+    ORDER BY RANDOM() LIMIT 10
+""").fetchall()
+if not rows:
+    print("  Mode 2: no qualifying pairs in FTS index")
+    sys.exit(0)
+
+# Use Cerebras (free, fastest) for generation
+key = os.environ.get("CEREBRAS_API_KEY") or os.environ.get("GROQ_API_KEY")
+if not key:
+    print("  Mode 2: no CEREBRAS/GROQ key → skip")
+    sys.exit(0)
+
+generated = 0
+for prompt, response, role in rows:
+    syn_prompt = f"""Rewrite this {role} response in a different but equally-correct style.
+Keep the technical content identical, vary the structure/wording.
+
+Original prompt:
+{prompt[:1000]}
+
+Original response:
+{response[:3000]}
+
+Output only the rewritten response, no preamble."""
+    body = {"model": "llama-3.3-70b" if "cerebras" in str(key).lower() else "llama-3.3-70b-versatile",
+            "messages": [{"role": "user", "content": syn_prompt}],
+            "temperature": 0.7, "max_tokens": 4000}
+    url = "https://api.cerebras.ai/v1/chat/completions" if "cerebras" in str(key).lower() else "https://api.groq.com/openai/v1/chat/completions"
+    try:
+        req = urllib.request.Request(url, data=json.dumps(body).encode(),
+                                     headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"})
+        with urllib.request.urlopen(req, timeout=60) as r:
+            d = json.load(r)
+        variant = d["choices"][0]["message"]["content"]
+        if len(variant) > 200:
+            pair = {
+                "ts": time.time(),
+                "source": "synthetic-distilabel",
+                "role": role,
+                "prompt": prompt,
+                "response": variant,
+                "synthesis_method": "rewrite-paraphrase",
+            }
+            with open(out, "a") as f:
+                f.write(json.dumps(pair, ensure_ascii=False) + "\n")
+            generated += 1
+    except Exception as e:
+        print(f"  skip: {type(e).__name__}: {str(e)[:100]}")
+print(f"  Mode 2 (distilabel rewrite): {generated} pairs written")
+PYEOF
+
+# Append synthetic pairs to main training stream → triggers HF push
+if [[ -f "$SYNTH_OUT" ]]; then
+  NEW=$(wc -l < "$SYNTH_OUT" | tr -d ' ')
+  cat "$SYNTH_OUT" >> "$HOME/.surrogate/training-pairs.jsonl"
+  echo "[$(date +%H:%M:%S)] appended $NEW synthetic pairs to main stream" | tee -a "$LOG"
+  rm "$SYNTH_OUT"
+fi

@@ -146,26 +146,25 @@ sleep 6
 #
 # Note: user asked about "qwen3.6" — that's a community general-chat fine-tune,
 # not coder-specialized. qwen3-coder is the official Qwen team flagship for SDLC tasks.
-# … old parallel ollama-pull block (removed lines truncated in this rendering) …
+# SERIAL pulls — concurrent pulls saturate the 16GB CPU and stall everything else.
+# Background single chained job, not a parallel storm.
+(
+  if ! ollama list 2>/dev/null | grep -q "nomic-embed-text"; then
+    echo "[$(date +%H:%M:%S)] pulling nomic-embed-text (~270MB, fastest → RAG)" >> "$LOG_DIR/boot.log"
+    ollama pull nomic-embed-text > "$LOG_DIR/ollama-pull-embed.log" 2>&1
+  fi
+  if ! ollama list 2>/dev/null | grep -q "qwen2.5-coder:14b"; then
+    echo "[$(date +%H:%M:%S)] pulling qwen2.5-coder:14b (~9 GB, fallback brain)" >> "$LOG_DIR/boot.log"
+    ollama pull qwen2.5-coder:14b-instruct-q4_K_M > "$LOG_DIR/ollama-pull-fallback.log" 2>&1
+  fi
+  if ! ollama list 2>/dev/null | grep -q "qwen3-coder"; then
+    echo "[$(date +%H:%M:%S)] pulling qwen3-coder:30b-a3b (~16 GB MoE, primary brain)" >> "$LOG_DIR/boot.log"
+    ollama pull qwen3-coder:30b-a3b-instruct-q4_K_M > "$LOG_DIR/ollama-pull-coder.log" 2>&1
+  fi
+  # Skip devstral + yi-coder for now — over budget on free 16GB instance.
+  # Re-enable after upgrade to HF Pro tier (32GB+).
+  echo "[$(date +%H:%M:%S)] all model pulls done (serial, no CPU storm)" >> "$LOG_DIR/boot.log"
+) &
 
 # ── 6. Discord bot (only if egress to discord.com is reachable) ─────────────
 # HF Spaces free tier may block egress to discord.com — bot would crash-loop.

@@ -234,6 +233,10 @@ while true; do
 [[ $((M % 360)) -eq 30 ]] && bash ~/.surrogate/bin/surrogate-research-loop.sh >> "$LOG" 2>&1 &
 # Every 12 hours: dataset enrich (pulls fresh public datasets, dedups, uploads to HF)
 [[ $((M % 720)) -eq 60 ]] && bash ~/.surrogate/bin/dataset-enrich.sh >> "$LOG" 2>&1 &
+# Every 15 min: self-ingest training-pairs into FTS index (closes the self-improvement loop)
+[[ $((M % 15)) -eq 0 ]] && bash ~/.surrogate/bin/surrogate-self-ingest.sh >> "$LOG" 2>&1 &
+# Every 30 min: synthetic data generation (REWORK→APPROVE DPO + distilabel rewrite)
+[[ $((M % 30)) -eq 7 ]] && bash ~/.surrogate/bin/synthetic-data-from-rework.sh >> "$LOG" 2>&1 &
 sleep 60
 done
 CRONSH