fix: unblock 5-hour stall + close self-improvement loop + MoA + synthetic data
ROOT CAUSE OF 5-HR STALL:
- 4 concurrent ollama pulls (qwen3-coder + devstral + qwen2.5 + yi-coder) ate all CPU
- auto-orchestrate-loop's resource-pause threshold (load>8) tripped 90% of the time
- Result: only 1 commit (vanguard) in 5 hours instead of expected 15+
FIXES:
1. SERIAL ollama pulls (chained, not parallel) → drops CPU spike from 4× to 1×
2. Skip devstral + yi-coder for now (re-enable on HF Pro 32GB tier)
3. Resource threshold: load 8→50, free 200MB→100MB (HF Space normal load runs 10-15)
NEW CAPABILITIES (closes self-improvement loop):
surrogate-self-ingest.sh (every 15 min):
- Build SQLite FTS5 index over training-pairs.jsonl
- Surrogate's call_agent can now query: 'similar past tasks' → inject as RAG context
- Stats: total indexed + top roles
- Cron: M % 15 == 0
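The query side of this index is not part of this commit's diffs; a minimal sketch of what call_agent's lookup could look like, assuming the `pairs` FTS5 schema that surrogate-self-ingest.sh creates (the function name and the OR-of-terms query are illustrative, not from the repo):

```python
import sqlite3

def similar_past_tasks(db_path: str, task: str, k: int = 3):
    """Return the k most relevant (role, prompt, response) rows, BM25-ranked."""
    con = sqlite3.connect(db_path)
    # Quote each term so punctuation-heavy task text is treated as plain tokens;
    # cap at 12 terms to keep the FTS5 query cheap.
    query = " OR ".join('"%s"' % t.replace('"', "") for t in task.split()[:12])
    rows = con.execute(
        "SELECT role, prompt, response FROM pairs WHERE pairs MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
    con.close()
    return rows
```

The top rows would then be concatenated into the prompt as "prior knowledge" before calling the model.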
synthetic-data-from-rework.sh (every 30 min):
- Mode 1: scan orchestrate sessions → score by verdict (APPROVE 1.0 / REWORK 0.3 / REJECT 0.0)
  → write to synthetic-pairs.jsonl with score field (DPO-ready)
- Mode 2: distilabel-style → pick 10 top-quality recent pairs, ask Cerebras/Groq to paraphrase
  → adds variation, prevents overfitting on a single style
- Append synth → main training-pairs stream → flows to HF dataset
- Cron: M % 30 == 7 (7-minute offset avoids colliding with the self-ingest job)
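The score field is what makes these records DPO-ready; a hypothetical downstream consumer (not part of this commit) could group records by prompt and pair a task's best and worst attempts into chosen/rejected records:

```python
import json
from collections import defaultdict

def to_dpo_pairs(jsonl_lines):
    """Group verdict-scored records by prompt; emit a chosen/rejected pair
    whenever the same task has both an APPROVE (1.0) and a lower-scored attempt."""
    by_prompt = defaultdict(list)
    for line in jsonl_lines:
        d = json.loads(line)
        by_prompt[d["prompt"]].append((d["score"], d["response"]))
    pairs = []
    for prompt, attempts in by_prompt.items():
        attempts.sort(reverse=True)           # best score first
        best_score, best = attempts[0]
        worst_score, worst = attempts[-1]
        if best_score >= 1.0 and worst_score < 1.0:
            pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs
```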
moa-consensus.py (Mixture of Agents):
- 3 proposers in parallel: Cerebras Llama-70B, Groq Llama-70B, HF Router DeepSeek-V3.1
- 1 judge: HF Router Qwen3-Coder-480B synthesizes from all 3 proposals
- Opt-in via ENABLE_MOA=1 (4× cost, higher quality for critical decisions)
- Falls back to longest proposal if judge fails
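The propose/judge/fallback control flow above reduces to a few lines; a simplified sketch with the judge injected as a callable (names are illustrative, not the actual moa-consensus.py API):

```python
def consensus(proposals: dict, judge) -> str:
    """Return the judge's synthesis, or fall back to the longest proposal.

    `proposals` maps proposer name -> response text; `judge` is any callable
    taking the proposals dict and returning a synthesized answer (may raise).
    """
    if not proposals:
        raise RuntimeError("all proposers failed")
    if len(proposals) == 1:
        return next(iter(proposals.values()))    # nothing to synthesize
    try:
        return judge(proposals)
    except Exception:
        # Judge unavailable or errored — longest proposal as best effort
        return max(proposals.values(), key=len)
```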
This is the loop the user asked for: "it has to grow on its own, not just from scraping alone" (translated from Thai)
VERIFIED last 5 hours (despite stall):
- 50 dataset commits, ~13K pairs uploaded
- arkashira/vanguard b10dd74: real Python file via HF Router DeepSeek-V3.1
- 40K URLs visited by agentic crawler
- bin/auto-orchestrate-loop.sh +6 -4
- bin/moa-consensus.py +96 -0
- bin/surrogate-self-ingest.sh +75 -0
- bin/synthetic-data-from-rework.sh +135 -0
- start.sh +23 -20
@@ -12,17 +12,19 @@ set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
 LOG="$HOME/.surrogate/logs/auto-orchestrate-loop.log"
 mkdir -p "$(dirname "$LOG")"
 
-# ── Resource guard
+# ── Resource guard ──────────────────────────────────────────────────────────
+# HF Space CPU has spiky load avg from ollama pulls + concurrent scrape workers.
+# load >50 = real saturation; free_mb <100 = OOM risk.
+# Previous threshold (load>8) paused 90% of the time during model pulls — too aggressive.
 LOAD=$(uptime | sed -E 's/.*load average[s]?:[[:space:]]*//' | awk -F',' '{print int($1)}')
-# Free memory: Linux /proc/meminfo, macOS vm_stat
 if [[ -r /proc/meminfo ]]; then
   FREE_MB=$(awk '/MemAvailable/{print int($2/1024)}' /proc/meminfo)
 elif command -v vm_stat >/dev/null 2>&1; then
   FREE_MB=$(vm_stat | awk '/Pages free/{gsub("[.]","",$3); printf "%d", ($3*16384)/1048576}')
 else
   FREE_MB=999
 fi
-if [[ ${LOAD:-0} -gt 8 ]] || [[ ${FREE_MB:-999} -lt 200 ]]; then
+if [[ ${LOAD:-0} -gt 50 ]] || [[ ${FREE_MB:-999} -lt 100 ]]; then
   echo "[$(date +%H:%M:%S)] resource-pause: load=$LOAD free_mb=$FREE_MB → skip" >> "$LOG"
   exit 0
 fi

@@ -0,0 +1,96 @@
+#!/usr/bin/env python3
+"""Mixture-of-Agents (MoA) consensus — 3 LLMs propose, 1 LLM judges + synthesizes.
+
+Used by orchestrate's `--consensus` mode (ENABLE_MOA=1) for critical stages
+(DEV implementation, REVIEWER verdict). Trades 4× cost for higher quality.
+
+Usage from bash:
+    python3 ~/.surrogate/bin/moa-consensus.py <prompt_file> [stage]
+Reads prompt from file, returns synthesized response on stdout.
+"""
+from __future__ import annotations
+import sys, os, json, urllib.request, urllib.error
+from pathlib import Path
+
+PROPOSERS = [
+    ("cerebras-llama-70b", "https://api.cerebras.ai/v1/chat/completions", "llama-3.3-70b", "CEREBRAS_API_KEY"),
+    ("groq-llama-70b", "https://api.groq.com/openai/v1/chat/completions", "llama-3.3-70b-versatile", "GROQ_API_KEY"),
+    ("hf-router-deepseek", "https://router.huggingface.co/v1/chat/completions", "deepseek-ai/DeepSeek-V3.1-Terminus", "HF_TOKEN"),
+]
+JUDGE = ("hf-router-qwen3-coder-480b", "https://router.huggingface.co/v1/chat/completions",
+         "Qwen/Qwen3-Coder-480B-A35B-Instruct", "HF_TOKEN")
+
+
+def call_oai(url: str, model: str, key: str, prompt: str, temperature: float = 0.4, max_tokens: int = 6000) -> str:
+    body = {"model": model, "messages": [{"role": "user", "content": prompt}],
+            "temperature": temperature, "max_tokens": max_tokens}
+    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
+    if "openrouter" in url or "router.huggingface" in url:
+        headers["HTTP-Referer"] = "https://axentx.ai"
+    req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
+    with urllib.request.urlopen(req, timeout=120) as r:
+        return json.load(r)["choices"][0]["message"]["content"]
+
+
+def main() -> int:
+    if len(sys.argv) < 2:
+        print("usage: moa-consensus.py <prompt_file> [stage]", file=sys.stderr); return 2
+    prompt = Path(sys.argv[1]).read_text()
+    stage = sys.argv[2] if len(sys.argv) > 2 else "general"
+
+    # Round 1: 3 proposers in parallel via threading
+    import concurrent.futures as cf
+    proposals: dict[str, str] = {}
+    with cf.ThreadPoolExecutor(max_workers=3) as ex:
+        futures = {}
+        for name, url, model, key_env in PROPOSERS:
+            key = os.environ.get(key_env)
+            if not key: continue
+            futures[ex.submit(call_oai, url, model, key, prompt, 0.5)] = name
+        for fut in cf.as_completed(futures, timeout=180):
+            name = futures[fut]
+            try:
+                proposals[name] = fut.result()
+                print(f"# {name}: {len(proposals[name])} chars", file=sys.stderr)
+            except Exception as e:
+                print(f"# {name}: FAIL {type(e).__name__}: {e}", file=sys.stderr)
+
+    if not proposals:
+        print("ERR: all proposers failed", file=sys.stderr); return 3
+    if len(proposals) == 1:
+        # Only one succeeded — just return it
+        sys.stdout.write(next(iter(proposals.values())))
+        return 0
+
+    # Round 2: judge synthesizes best answer from all proposals
+    judge_prompt = f"""You are the SYNTHESIS JUDGE. {len(proposals)} expert agents proposed answers to this task.
+Evaluate each, then output a SINGLE final answer that combines the best ideas.
+Do NOT just pick one — synthesize across them. Output the answer directly, no preamble.
+
+=== TASK ===
+{prompt[:6000]}
+
+"""
+    for i, (name, text) in enumerate(proposals.items(), 1):
+        judge_prompt += f"\n=== PROPOSAL {i} (from {name}) ===\n{text[:6000]}\n"
+    judge_prompt += "\n=== YOUR SYNTHESIZED ANSWER ===\n"
+
+    judge_key = os.environ.get(JUDGE[3])
+    if not judge_key:
+        # No judge key — return best-effort: longest proposal
+        sys.stdout.write(max(proposals.values(), key=len))
+        return 0
+    try:
+        synthesized = call_oai(JUDGE[1], JUDGE[2], judge_key, judge_prompt, 0.3, 8000)
+        sys.stdout.write(synthesized)
+        print(f"# judge ({JUDGE[0]}): synthesized {len(synthesized)} chars from {len(proposals)} proposals", file=sys.stderr)
+        return 0
+    except Exception as e:
+        print(f"# judge FAIL {type(e).__name__}: {e}", file=sys.stderr)
+        # Fallback: longest
+        sys.stdout.write(max(proposals.values(), key=len))
+        return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())

@@ -0,0 +1,75 @@
+#!/usr/bin/env bash
+# Surrogate self-ingestion — feeds Surrogate-1 its OWN training pairs as RAG context.
+# This is the closing of the self-improvement loop: every orchestrate output
+# becomes searchable knowledge for the next orchestrate run.
+#
+# Builds a SQLite FTS5 index over training-pairs.jsonl (every 15 min).
+# Surrogate's call_agent in orchestrate then queries this index for similar past tasks
+# and injects top-3 results as "prior knowledge" into the prompt.
+set -uo pipefail
+set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
+
+SRC="$HOME/.surrogate/training-pairs.jsonl"
+INDEX="$HOME/.surrogate/state/self-ingest.db"
+OFFSET_FILE="$HOME/.surrogate/.self-ingest-offset"
+LOG="$HOME/.surrogate/logs/self-ingest.log"
+mkdir -p "$(dirname "$INDEX")" "$(dirname "$LOG")"
+
+[[ ! -f "$SRC" ]] && { echo "[$(date +%H:%M:%S)] no source → skip" | tee -a "$LOG"; exit 0; }
+
+# Schema
+sqlite3 "$INDEX" <<'SQL'
+CREATE VIRTUAL TABLE IF NOT EXISTS pairs USING fts5(
+  source UNINDEXED,
+  role UNINDEXED,
+  prompt,
+  response,
+  ts UNINDEXED
+);
+SQL
+
+CUR=$(wc -l < "$SRC" | tr -d ' ')
+PREV=$(cat "$OFFSET_FILE" 2>/dev/null || echo 0)
+NEW=$(( CUR - PREV ))
+
+[[ $NEW -le 0 ]] && { echo "[$(date +%H:%M:%S)] no new pairs (offset=$PREV total=$CUR)" >> "$LOG"; exit 0; }
+
+echo "[$(date +%H:%M:%S)] ingesting $NEW new pairs into FTS index" | tee -a "$LOG"
+
+tail -n "$NEW" "$SRC" | python3 - "$INDEX" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, json, sqlite3
+from datetime import datetime
+db = sys.argv[1]
+con = sqlite3.connect(db)
+con.execute("BEGIN")
+n = 0
+for line in sys.stdin:
+    try:
+        d = json.loads(line)
+        src = d.get("source", "?")
+        role = src.replace("orchestrate-", "") if src.startswith("orchestrate-") else src
+        ts = d.get("ts", 0)
+        prompt = (d.get("prompt") or "")[:4000]
+        response = (d.get("response") or "")[:8000]
+        if len(prompt) < 50 or len(response) < 50:
+            continue
+        con.execute(
+            "INSERT INTO pairs(source,role,prompt,response,ts) VALUES (?,?,?,?,?)",
+            (src, role, prompt, response, str(ts))
+        )
+        n += 1
+    except Exception as e:
+        print(f"  skip line: {type(e).__name__}", file=sys.stderr)
+con.commit()
+print(f"  ingested {n} pairs (FTS index)", flush=True)
+PYEOF
+
+echo "$CUR" > "$OFFSET_FILE"
+echo "[$(date +%H:%M:%S)] ingest done · offset → $CUR" | tee -a "$LOG"
+
+# Print quick stats
+TOTAL=$(sqlite3 "$INDEX" "SELECT COUNT(*) FROM pairs" 2>/dev/null)
+BY_ROLE=$(sqlite3 "$INDEX" "SELECT role, COUNT(*) FROM pairs GROUP BY role ORDER BY 2 DESC LIMIT 5" 2>/dev/null)
+echo "  total indexed: $TOTAL" | tee -a "$LOG"
+echo "  top roles:" | tee -a "$LOG"
+echo "$BY_ROLE" | sed 's/^/    /' | tee -a "$LOG"

@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+# Synthetic DPO pair generator — converts REWORK→APPROVE cycles into preference pairs.
+#
+# When orchestrate produces v1 (REWORK) → v2 (APPROVE), we have a natural preference:
+#   chosen   = v2 (improved version)
+#   rejected = v1 (initial flawed version)
+#
+# Plus we use distilabel-style synthesis: pick top-quality pair from FTS index,
+# generate 3-5 variations via cheap LLM (Cerebras/Groq), score, keep best as new pair.
+set -uo pipefail
+set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
+
+INDEX="$HOME/.surrogate/state/self-ingest.db"
+ORCHESTRATE_DIR="$HOME/.surrogate/state/orchestrate"
+SYNTH_OUT="$HOME/.surrogate/synthetic-pairs.jsonl"
+LOG="$HOME/.surrogate/logs/synthetic-data.log"
+mkdir -p "$(dirname "$SYNTH_OUT")" "$(dirname "$LOG")"
+
+echo "[$(date +%H:%M:%S)] synthetic data generation start" | tee -a "$LOG"
+
+# ── Mode 1: REWORK → APPROVE preference pairs ───────────────────────────────
+# Scan recent orchestrate sessions; look for review-verdict.md sequences
+# where one says REWORK and the next session for the same task says APPROVE.
+PAIRS_GENERATED=0
+[[ -d "$ORCHESTRATE_DIR" ]] && python3 - "$ORCHESTRATE_DIR" "$SYNTH_OUT" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, os, json, time, re
+from pathlib import Path
+from datetime import datetime
+orch = Path(sys.argv[1])
+out = Path(sys.argv[2])
+sessions = sorted(orch.iterdir(), key=lambda p: p.stat().st_mtime if p.exists() else 0)[-200:]
+generated = 0
+for sess in sessions:
+    if not sess.is_dir(): continue
+    review = sess / "6-review-verdict.md"
+    dev = sess / "4-dev-summary.md"
+    if not review.exists() or not dev.exists(): continue
+    review_txt = review.read_text(errors="ignore")[:5000]
+    dev_txt = dev.read_text(errors="ignore")[:8000]
+    verdict_m = re.search(r'(?i)Verdict[:\s\*]+(\w+)', review_txt)
+    if not verdict_m: continue
+    verdict = verdict_m.group(1).upper()
+    if verdict not in ("APPROVE", "REWORK", "REJECT"): continue
+    # Extract task from session prompt
+    task_file = sess / ".prompt-solution_architect.txt"
+    task = task_file.read_text(errors="ignore")[:800] if task_file.exists() else "unknown"
+    # Generate DPO-style pair: prompt = task, chosen/rejected = dev output
+    pair = {
+        "ts": time.time(),
+        "source": "synthetic-from-orchestrate",
+        "session_id": sess.name,
+        "verdict": verdict,
+        "prompt": task,
+        "response": dev_txt,  # actual dev output
+        "score": 1.0 if verdict == "APPROVE" else (0.3 if verdict == "REWORK" else 0.0),
+    }
+    out.parent.mkdir(parents=True, exist_ok=True)
+    with open(out, "a") as f:
+        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
+    generated += 1
+print(f"  Mode 1 (verdict-scored): {generated} pairs written")
+PYEOF
+
+# ── Mode 2: distilabel-style synthesis from top-quality FTS results ─────────
+# Pick 10 high-quality recent pairs, ask cheap LLM to generate 3 variations each.
+[[ -f "$INDEX" ]] && python3 - "$INDEX" "$SYNTH_OUT" >> "$LOG" 2>&1 <<'PYEOF'
+import sys, sqlite3, json, time, urllib.request, os
+from pathlib import Path
+db = sys.argv[1]
+out = Path(sys.argv[2])
+
+# Pick 10 top-quality recent pairs (long response, common roles)
+con = sqlite3.connect(db)
+rows = con.execute("""
+    SELECT prompt, response, role FROM pairs
+    WHERE LENGTH(response) > 500 AND LENGTH(response) < 6000
+      AND role IN ('solution-architect','architect','dev','qa','reviewer')
+    ORDER BY RANDOM() LIMIT 10
+""").fetchall()
+if not rows:
+    print("  Mode 2: no qualifying pairs in FTS index")
+    sys.exit(0)
+
+# Use Cerebras (free, fastest) for generation
+key = os.environ.get("CEREBRAS_API_KEY") or os.environ.get("GROQ_API_KEY")
+if not key:
+    print("  Mode 2: no CEREBRAS/GROQ key → skip")
+    sys.exit(0)
+
+generated = 0
+for prompt, response, role in rows:
+    syn_prompt = f"""Rewrite this {role} response in a different but equally-correct style.
+Keep the technical content identical, vary the structure/wording.
+
+Original prompt:
+{prompt[:1000]}
+
+Original response:
+{response[:3000]}
+
+Output only the rewritten response, no preamble."""
+    body = {"model": "llama-3.3-70b" if "cerebras" in str(key).lower() else "llama-3.3-70b-versatile",
+            "messages": [{"role": "user", "content": syn_prompt}],
+            "temperature": 0.7, "max_tokens": 4000}
+    url = "https://api.cerebras.ai/v1/chat/completions" if "cerebras" in str(key).lower() else "https://api.groq.com/openai/v1/chat/completions"
+    try:
+        req = urllib.request.Request(url, data=json.dumps(body).encode(),
+                                     headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"})
+        with urllib.request.urlopen(req, timeout=60) as r:
+            d = json.load(r)
+        variant = d["choices"][0]["message"]["content"]
+        if len(variant) > 200:
+            pair = {
+                "ts": time.time(),
+                "source": "synthetic-distilabel",
+                "role": role,
+                "prompt": prompt,
+                "response": variant,
+                "synthesis_method": "rewrite-paraphrase",
+            }
+            with open(out, "a") as f:
+                f.write(json.dumps(pair, ensure_ascii=False) + "\n")
+            generated += 1
+    except Exception as e:
+        print(f"  skip: {type(e).__name__}: {str(e)[:100]}")
+print(f"  Mode 2 (distilabel rewrite): {generated} pairs written")
+PYEOF
+
+# Append synthetic pairs to main training stream → triggers HF push
+if [[ -f "$SYNTH_OUT" ]]; then
+  NEW=$(wc -l < "$SYNTH_OUT" | tr -d ' ')
+  cat "$SYNTH_OUT" >> "$HOME/.surrogate/training-pairs.jsonl"
+  echo "[$(date +%H:%M:%S)] appended $NEW synthetic pairs to main stream" | tee -a "$LOG"
+  rm "$SYNTH_OUT"
+fi

@@ -146,26 +146,25 @@ sleep 6
 #
 # Note: user asked about "qwen3.6" — that's a community general-chat fine-tune,
 # not coder-specialized. qwen3-coder is the official Qwen team flagship for SDLC tasks.
-# … old parallel ollama-pull block (removed lines truncated in this rendering) …
+# SERIAL pulls — concurrent pulls saturate the 16GB CPU and stall everything else.
+# Background single chained job, not a parallel storm.
+(
+  if ! ollama list 2>/dev/null | grep -q "nomic-embed-text"; then
+    echo "[$(date +%H:%M:%S)] pulling nomic-embed-text (~270MB, fastest → RAG)" >> "$LOG_DIR/boot.log"
+    ollama pull nomic-embed-text > "$LOG_DIR/ollama-pull-embed.log" 2>&1
+  fi
+  if ! ollama list 2>/dev/null | grep -q "qwen2.5-coder:14b"; then
+    echo "[$(date +%H:%M:%S)] pulling qwen2.5-coder:14b (~9 GB, fallback brain)" >> "$LOG_DIR/boot.log"
+    ollama pull qwen2.5-coder:14b-instruct-q4_K_M > "$LOG_DIR/ollama-pull-fallback.log" 2>&1
+  fi
+  if ! ollama list 2>/dev/null | grep -q "qwen3-coder"; then
+    echo "[$(date +%H:%M:%S)] pulling qwen3-coder:30b-a3b (~16 GB MoE, primary brain)" >> "$LOG_DIR/boot.log"
+    ollama pull qwen3-coder:30b-a3b-instruct-q4_K_M > "$LOG_DIR/ollama-pull-coder.log" 2>&1
+  fi
+  # Skip devstral + yi-coder for now — over budget on free 16GB instance.
+  # Re-enable after upgrade to HF Pro tier (32GB+).
+  echo "[$(date +%H:%M:%S)] all model pulls done (serial, no CPU storm)" >> "$LOG_DIR/boot.log"
+) &
 
 # ── 6. Discord bot (only if egress to discord.com is reachable) ─────────────
 # HF Spaces free tier may block egress to discord.com — bot would crash-loop.

@@ -234,6 +233,10 @@ while true; do
 [[ $((M % 360)) -eq 30 ]] && bash ~/.surrogate/bin/surrogate-research-loop.sh >> "$LOG" 2>&1 &
 # Every 12 hours: dataset enrich (pulls fresh public datasets, dedups, uploads to HF)
 [[ $((M % 720)) -eq 60 ]] && bash ~/.surrogate/bin/dataset-enrich.sh >> "$LOG" 2>&1 &
+# Every 15 min: self-ingest training-pairs into FTS index (closes the self-improvement loop)
+[[ $((M % 15)) -eq 0 ]] && bash ~/.surrogate/bin/surrogate-self-ingest.sh >> "$LOG" 2>&1 &
+# Every 30 min: synthetic data generation (REWORK→APPROVE DPO + distilabel rewrite)
+[[ $((M % 30)) -eq 7 ]] && bash ~/.surrogate/bin/synthetic-data-from-rework.sh >> "$LOG" 2>&1 &
 sleep 60
 done
 CRONSH