fix: 8 shards (was 4) + round-robin axentx projects + 3min cooldown
USER: 'Run everything at once, it all has to coordinate, it has to grow as fast as possible.'
DAEMONS RUNNING IN PARALLEL (31 total, coordinated via central state):
- 8 dataset/source: bulk-ingest-parallel (4→8 shards now), discoverer,
GitHub crawler, agentic crawler, scrape-continuous, domain-scrape,
refresh-cve, scrape-sre-postmortems
- 7 dev/orchestrate: auto-orchestrate-continuous (4 workers), dev-loop,
research-apply, research-loop, daemon, work-queue, moa-consensus
- 3 index/RAG: self-ingest FTS5, rag-vector-builder, skill-synthesis
- 3 output/sync: push-training-to-hf, synthetic-data, daily-summary
- 4 infra: status-server, discord-bot, redis, ollama
- 4 helpers (called per orchestrate stage)
COORDINATION VIA CENTRAL STATE (see the sketch after this list):
- dedup.db (every writer flows through it)
- hf-dataset-frontier.db (discoverer)
- github-frontier.db (GH crawler)
- agentic-frontier.db (web crawler)
- self-ingest.db (FTS5 RAG)
- rag-vectors.db (vector RAG)
- training-pairs.jsonl (single sink)
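Every writer is gated through dedup.db before anything reaches the training-pairs.jsonl sink. A minimal sketch of that gate, assuming a SQLite table of content hashes and assuming both files live under ~/.surrogate/state/ (the real schema and paths are not shown in this push):

import hashlib, json, sqlite3
from pathlib import Path

STATE = Path.home() / '.surrogate/state'
STATE.mkdir(parents=True, exist_ok=True)

def write_pair(pair: dict) -> bool:
    """Append pair to training-pairs.jsonl only if its hash is unseen."""
    con = sqlite3.connect(STATE / 'dedup.db')  # assumed location
    con.execute("CREATE TABLE IF NOT EXISTS seen (hash TEXT PRIMARY KEY)")
    digest = hashlib.sha256(json.dumps(pair, sort_keys=True).encode()).hexdigest()
    # INSERT OR IGNORE leaves rowcount at 0 when the hash already exists
    is_new = con.execute("INSERT OR IGNORE INTO seen VALUES (?)", (digest,)).rowcount == 1
    if is_new:
        with (STATE / 'training-pairs.jsonl').open('a') as f:
            f.write(json.dumps(pair) + '\n')
    con.commit(); con.close()
    return is_new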
CHANGES THIS PUSH:
1. bulk-ingest-parallel: 4 → 8 shards (2× throughput)
   Cooldown 5min → 3min (faster cycles)
   Stagger startup 30s → 15s (faster spinup)
2. auto-orchestrate-loop: round-robin projects (was random)
   PROBLEM: all 6 commits today landed in arkashira/surrogate, 0 in the other 4 repos;
   random.shuffle stayed biased toward the same repo
   FIX: persistent cursor at ~/.surrogate/state/orchestrate-project-cursor
   Each call rotates through (Costinel → vanguard → arkship → surrogate →
   workio → hermes-toolbelt → cycle)
   Result: 4 parallel workers × round-robin = even coverage of the 6 axentx repos
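The rotation is easy to sanity-check in isolation. A hypothetical standalone run of the cursor logic (repo names taken from the cycle above) shows six runs producing six distinct first picks, which random.shuffle never guaranteed:

repos = ['Costinel', 'vanguard', 'arkship', 'surrogate', 'workio', 'hermes-toolbelt']
cursor = 0
for run in range(6):
    order = repos[cursor:] + repos[:cursor]  # same rotation as the diff below
    print(f"run {run}: first pick = {order[0]}")
    cursor = (cursor + 1) % len(repos)
# Prints Costinel, vanguard, arkship, surrogate, workio, hermes-toolbelt in turn:
# every repo leads exactly once per 6 runs.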
EXPECTED IMPACT:
- Day-1 ingest: 400K/h → 800K/h (2× shards)
- axentx commits: spread evenly across 6 repos
- Same hardware, better parallelism + balance
- bin/auto-orchestrate-loop.sh +15 -1
- bin/bulk-ingest-parallel.sh +4 -4
bin/auto-orchestrate-loop.sh
@@ -47,7 +47,21 @@ PROJECTS = [p for p in PROJECTS if (p/'.git').exists()]
 if not PROJECTS:
     print("{}"); exit()

-random.shuffle(PROJECTS)
+# ROUND-ROBIN across projects (instead of random.shuffle, which kept hitting the same repo).
+# Persistent counter at ~/.surrogate/state/orchestrate-project-cursor; increments each run.
+# Result: every 6 runs covers all 6 axentx repos evenly.
+cursor_file = Path.home() / '.surrogate/state/orchestrate-project-cursor'
+cursor_file.parent.mkdir(parents=True, exist_ok=True)
+try:
+    cursor = int(cursor_file.read_text().strip())
+except Exception:
+    cursor = 0
+PROJECTS = sorted(PROJECTS, key=lambda p: p.name)  # stable order
+cursor = cursor % len(PROJECTS)
+# Rotate so the current cursor's project is first
+PROJECTS = PROJECTS[cursor:] + PROJECTS[:cursor]
+cursor_file.write_text(str((cursor + 1) % len(PROJECTS)))
+
 for proj in PROJECTS:
     cmd = ['rg', '--no-heading', '-n', '-m', '5',
            '--type', 'py', '--type', 'ts', '--type', 'go', '--type', 'sh',
bin/bulk-ingest-parallel.sh
@@ -10,8 +10,8 @@ set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a
 LOG="$HOME/.surrogate/logs/bulk-ingest-parallel.log"
 mkdir -p "$(dirname "$LOG")"

-NUM_SHARDS="${INGEST_SHARDS:-4}"
-SHARD_COOLDOWN="${SHARD_COOLDOWN:-300}"
+NUM_SHARDS="${INGEST_SHARDS:-8}"
+SHARD_COOLDOWN="${SHARD_COOLDOWN:-180}"  # 3 min between shard cycles (was 5)

 echo "[$(date +%H:%M:%S)] bulk-ingest-parallel start (shards=$NUM_SHARDS)" | tee -a "$LOG"

@@ -28,9 +28,9 @@ shard_loop() {
 done
 }

-# Stagger startup
+# Stagger startup 15s apart (was 30s) to spin up faster
 for i in $(seq 0 $((NUM_SHARDS - 1))); do
     shard_loop "$i" "$NUM_SHARDS" &
-    sleep 30
+    sleep 15
 done
 wait
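The diff shows the shard fan-out but not how each shard claims its slice of the work; index-modulo partitioning is one common scheme (an assumption here, since shard_loop's internals are not part of this push):

# Hypothetical sketch: shard i takes every num_shards-th item, offset by i,
# so 8 shards cover the whole list with no overlap and no coordination.
def items_for_shard(items, shard, num_shards):
    return [item for idx, item in enumerate(items) if idx % num_shards == shard]

sources = [f'dataset-{n}' for n in range(20)]
covered = sum((items_for_shard(sources, i, 8) for i in range(8)), [])
assert sorted(covered) == sorted(sources)  # disjoint shards, full coverage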