perf(harvest): bump worker fleet 1→16 + push cadence 2× — use the headroom
Audit showed each cpu-basic 16GB Space was running only:
BULK_WORKERS=0 + STREAM_WORKERS=1 = 1 worker total
under LOW_MEM=1 (left ~8 GB unused — overly conservative after the
Round 11-12 OOM scare).
Bump:
BULK_WORKERS=0 → 1 (~600 MB)
STREAM_WORKERS=1 → 3 (~500 MB × 3 = 1.5 GB)
total worker fleet/Space: 1 → 4
× 4 harvest Spaces (axentx + shard1 + shard2 + shard3) = **16 workers**
in parallel (was 4). Memory budget per Space:
base ≈6 GB (OS + redis 256mb + continuous-discoverer + dataset-enrich
+ auto-startup-loop + push bursts)
workers ≈2.1 GB (3 stream + 1 bulk)
headroom ≈8 GB → memory-guard.sh trigger threshold (3 GB free) safe
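The new counts are LOW_MEM-gated defaults in start.sh (full diff below); a minimal standalone sketch of the pattern, for quick reference (explicit env vars still win, so a single Space can be tuned without editing the script):

```bash
#!/usr/bin/env bash
# Minimal sketch of the LOW_MEM-gated worker defaults from start.sh.
# Explicit env vars take precedence, e.g.: STREAM_WORKERS=2 bash start.sh
LOW_MEM="${LOW_MEM:-1}"   # cpu-basic Spaces run with LOW_MEM=1
BULK_WORKERS="${BULK_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 1 || echo 4)}"
STREAM_WORKERS="${STREAM_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 3 || echo 6)}"
echo "fleet per Space: $BULK_WORKERS bulk + $STREAM_WORKERS stream"
```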
Push cadence: dataset-enrich cron M%60==5 → M%30==5 (every 30 min, was
60). With 4× worker fleet producing chunks faster, the old 60-min push
was leaving harvested data sitting on Space disk too long; 30-min cadence
drains it more eagerly without exceeding HF rate limits.
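For context, the cadence gates are minute-modulo checks inside the supervisor loops; a minimal sketch of the idiom, assuming M is a minutes-since-epoch counter (the M%360 six-hour gate implies it cannot be plain minute-of-hour; the exact derivation of M in cron-loop.sh is not shown in this diff):

```bash
#!/usr/bin/env bash
# Sketch of the minute-modulo scheduling idiom used in cron-loop.sh.
# M is assumed to be minutes since epoch so gates like M%360 (6 h) work.
while true; do
  M=$(( $(date +%s) / 60 ))
  # fires twice per hour, 5 minutes into each 30-minute window
  [[ $((M % 30)) -eq 5 ]] && echo "dataset-enrich tick at $(date)" &
  sleep 60
done
```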
Throughput projection (current 70-100 commits/hr aggregate):
workers 4× + push cadence 2× = ~3-5× commit rate (HF soft-cap kicks
in past ~120/hr/repo, so realistic ceiling ≈ 250-400 commits/hr aggregate).
Reverts the Round 11-12 over-correction now that:
(a) 4 Spaces fan-out instead of 1 monolith
(b) memory-guard.sh + auto-scaler kill workers if MemAvailable drops (gate sketched below)
(c) anchor never came up so Space pool is the only harvest tier
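memory-guard.sh itself is not part of this diff; a hypothetical sketch of the gate it is described as providing (the 3 GB MemAvailable threshold comes from the budget above; the script's real internals may differ):

```bash
#!/usr/bin/env bash
# Hypothetical memory-guard-style gate: exit non-zero when MemAvailable
# drops below 3 GB, so `memory-guard.sh && heavy-job.sh` skips the
# heavy job instead of risking another OOM.
THRESHOLD_KB=$((3 * 1024 * 1024))  # 3 GB, per the budget above
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if (( avail_kb < THRESHOLD_KB )); then
  echo "memory-guard: ${avail_kb} kB available, below threshold" >&2
  exit 1
fi
```

The start.sh diff below chains it in exactly this shape: `memory-guard.sh \ && dataset-enrich.sh`.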
LOW_MEM=0 path (paid cpu-upgrade tier) also bumped: 4 stream → 6 stream
(1.5× if user ever upgrades a Space to 32GB).
- bin/anchor/cron-loop.sh +1 -1
- start.sh +25 -9
--- a/bin/anchor/cron-loop.sh
+++ b/bin/anchor/cron-loop.sh
@@ -108,7 +108,7 @@ while true; do
 }
 [[ $((M % 30)) -eq 15 ]] && bash "${REPO}/bin/surrogate-research-apply.sh" >>"$LOG" 2>&1 &
 [[ $((M % 360)) -eq 30 ]] && bash "${REPO}/bin/surrogate-research-loop.sh" >>"$LOG" 2>&1 &
-[[ $((M % 60)) -eq 5 ]] && bash "${REPO}/bin/dataset-enrich.sh" >>"$LOG" 2>&1 &
+[[ $((M % 30)) -eq 5 ]] && bash "${REPO}/bin/dataset-enrich.sh" >>"$LOG" 2>&1 &
 [[ $((M % 15)) -eq 0 ]] && bash "${REPO}/bin/surrogate-self-ingest.sh" >>"$LOG" 2>&1 &
 [[ $((M % 30)) -eq 12 ]] && bash "${REPO}/bin/rag-vector-builder.sh" >>"$LOG" 2>&1 &
 [[ $((M % 30)) -eq 7 ]] && bash "${REPO}/bin/synthetic-data-from-rework.sh" >>"$LOG" 2>&1 &
--- a/start.sh
+++ b/start.sh
@@ -362,13 +362,26 @@ python3 ~/.surrogate/bin/v2/bulk-mirror-coordinator.py seed >> "$LOG_DIR/bulk-mi
 # Two worker types share the same coordinator queue:
 # bulk-mirror-worker.sh — full-download, suits small/medium datasets
 # streaming-mirror-worker.sh — HF datasets streaming, suits trillion-token
-# LOW_MEM
-# 0 bulk
-#
-#
-#
-BULK_WORKERS="${BULK_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 0 || echo 4)}"
-STREAM_WORKERS="${STREAM_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 1 || echo 4)}"
+# LOW_MEM tuning for cpu-basic 16GB Space (history):
+#   v1: 0 bulk + 2 stream (Round 9-10 OOM tightened to 0+2)
+#   v2: 0 bulk + 1 stream (Round 11-12 OOM further tightened)
+#   v3 NOW: 1 bulk + 3 stream (post Civo-pivot + 4-Space fan-out;
+#           anchor never came up so we can't rely on
+#           it for bulk, and 16GB has ~8 GB unused
+#           under the v2 setting → reclaim it)
+#
+# Memory budget per Space (16 GB cpu-basic):
+#   ~6 GB reserved: OS + redis 256mb + continuous-discoverer +
+#                   dataset-enrich + auto-startup-loop + push bursts
+#   ~10 GB available for harvest workers
+#   3 stream × 500 MB + 1 bulk × 600 MB = 2.1 GB used
+#   ~8 GB headroom → memory-guard.sh kicks in at <3 GB free, safe
+#
+# Throughput delta: 4 workers/Space × 4 Spaces = 16 workers total
+# (vs previous 1×4 = 4, a 4× fleet). Combined with enrich cron M%30==5
+# (was M%60), expect 3-5× commit rate before HF soft-cap kicks in.
+BULK_WORKERS="${BULK_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 1 || echo 4)}"
+STREAM_WORKERS="${STREAM_WORKERS:-$([[ "$LOW_MEM" == "1" ]] && echo 3 || echo 6)}"
 
 for i in $(seq 1 "$BULK_WORKERS"); do
   nohup bash ~/.surrogate/bin/v2/bulk-mirror-worker.sh "bulk-w$i" \
@@ -434,8 +447,11 @@ while true; do
 [[ $((M % 60)) -eq 4 ]] && bash ~/.surrogate/bin/scrape-keyword-tuner.sh >> "$LOG" 2>&1 &
 # Every 6 hours: research-loop (discover new features from competitors/papers)
 [[ $((M % 360)) -eq 30 ]] && bash ~/.surrogate/bin/surrogate-research-loop.sh >> "$LOG" 2>&1 &
-# Every 60 min: dataset enrich (memory-guarded — full HF Hub iter is heavy)
-[[ $((M % 60)) -eq 5 ]] && bash ~/.surrogate/bin/v2/memory-guard.sh \
+# Every 30 min: dataset enrich (was 60 min — bumped 2× now that we have
+# 4 Spaces × (3 stream + 1 bulk) = 16 workers harvesting in parallel,
+# producing more chunks per hour than the old 60-min push could drain).
+# Memory-guarded — full HF Hub iter is heavy.
+[[ $((M % 30)) -eq 5 ]] && bash ~/.surrogate/bin/v2/memory-guard.sh \
   && bash ~/.surrogate/bin/dataset-enrich.sh >> "$LOG" 2>&1 &
 # Every 15 min: self-ingest training-pairs into FTS index (closes self-improvement)
 [[ $((M % 15)) -eq 3 ]] && bash ~/.surrogate/bin/surrogate-self-ingest.sh >> "$LOG" 2>&1 &