ashirato committed on
Commit ce077ec · 1 Parent(s): 154f078

start.sh: LOW_MEM=1 short-circuits to status server only (kill all bg daemons)

The earlier patch (a5c37dd) gated only the 5 boot-time harvest launchers
behind LOW_MEM. Watchdog history showed Spaces stayed green for ~25 min
after that fix (#12-#16 all 6/6) but went HTTP-hung again at sweep #17.
Investigation found 15+ MORE nohup'd daemons below the gated section that
collectively still walk the Space into the 16 GB CPU-Basic cap within an
hour even with the harvest launchers off:

scrape-daemon, agentic-crawler, github-agentic-crawler, self-heal-watchdog,
gh-actions-ticker, llm-burst-generator, bulk-ingest-parallel,
parquet-direct-ingest, skill-synthesis-daemon, bulk-mirror-worker (×N),
streaming-mirror-worker (×N), continuous-discoverer, plus the hermes-cron.sh
while-loop itself, which spawns regression-test, abstract-cot-compressor, etc.
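
The commit does not say how these daemons were identified. One way to see which process names dominate memory is to total resident set size per command. This is a minimal sketch, assuming a Linux procps `ps`; the helper name `sum_rss_by_command` is hypothetical and not part of start.sh.

```shell
# Hypothetical helper: reads "RSS_KiB COMMAND" lines on stdin and prints
# "TOTAL_MiB COMMAND" per command name, largest first.
# Only the first word of the command name is used as the grouping key.
sum_rss_by_command() {
  awk '{ kib[$2] += $1 }
       END { for (c in kib) printf "%d %s\n", kib[c] / 1024, c }' | sort -rn
}

# Example usage on a live system (commented out so the sketch has no side effects):
#   ps -eo rss=,comm= | sum_rss_by_command | head -n 20
```

Grouping by command name rather than PID makes the ×N worker pools show up as a single line, which is closer to how the list above is phrased.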

This patch adds an early-return right after the .env write: when LOW_MEM=1
(default on CPU-Basic), exec the status server immediately and skip every
background process below. The Space's only responsibility on the free tier is
to serve harvest cursor advances (/cursor/*) to harvest workers; everything that USED to be
launched here is now scheduled on GCP via hermes-jobs.json (171 jobs as of
this commit).
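
The short-circuit works because `exec` replaces the shell process, so nothing after it in the script can run. A minimal sketch of that behaviour, with `echo` standing in for the status server; the function name `exec_demo` is illustrative and not part of start.sh.

```shell
exec_demo() {
  # Run in a child shell so the demo does not replace the caller's shell.
  sh -c '
    echo "before exec"
    exec echo "status server stand-in"
    echo "never reached"
  '
}

exec_demo
# prints:
#   before exec
#   status server stand-in
```

Because the process is replaced rather than forked, the server also inherits the script's PID, so there is no leftover parent shell holding memory.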

Re-enable in-Space mode by setting LOW_MEM=0 once the Space is on a paid
tier (cpu-upgrade ≥ 32 GB) or migrated to a larger anchor.
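
The diff tests `"$LOW_MEM" == "1"` but does not show where the default is set. If the variable could be unset when the guard runs, a parameter-expansion default keeps the check safe under `set -u`. This is a sketch under the assumption that unset should mean low-memory mode; `low_mem_enabled` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when LOW_MEM is "1", or when it is unset or
# empty (assumed default, matching "default on CPU-Basic" in the message).
low_mem_enabled() {
  [ "${LOW_MEM:-1}" = "1" ]
}

# Example usage (commented out so the sketch has no side effects):
#   if low_mem_enabled; then echo "fast-path"; else echo "full boot"; fi
```

With this shape, `LOW_MEM=0` is the only value that re-enables the in-Space launchers; any other non-"1" value also disables the fast path, which may or may not be the desired behaviour.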

Files changed (1)
  1. start.sh +28 -0
start.sh CHANGED
@@ -151,6 +151,34 @@ chmod 600 ~/.hermes/.env
  echo "[$(date +%H:%M:%S)] .env written ($(wc -l < ~/.hermes/.env) keys, perms 600)"
  # Trace OFF for the rest of boot — we already have line numbers above and won't need them post-secrets.

+ # ── LOW_MEM short-circuit — skip ALL background daemons, exec status server ──
+ # CPU-Basic Space cap is 16 GB. Even after gating the 5 boot-time harvest
+ # launchers, the Space kept hitting 16 GB cap and going hung at HTTP layer
+ # every ~30-40 min. Investigation found 15+ MORE nohup'd background daemons
+ # below this point (scrape, agentic-crawler, github-crawler, self-heal, cron
+ # loop, bulk-mirror workers, streaming-mirror workers, parquet-ingest, etc.)
+ # that collectively grow into the cap within an hour.
+ #
+ # In LOW_MEM=1 mode the Space's only job is the FastAPI status server on
+ # :7860 that serves harvest cursor advance to remote workers. Everything
+ # else (harvest, mirroring, agent pipeline, training pushes, dataset enrich)
+ # now runs on the GCP daemon fleet — see hermes-jobs.json (171 jobs scheduled
+ # via hermes-scheduler-daemon as of 2026-05-02).
+ #
+ # Set LOW_MEM=0 to re-enable in-Space launchers when on a paid tier (≥32GB).
+ if [[ "$LOW_MEM" == "1" ]]; then
+   echo "[$(date +%H:%M:%S)] LOW_MEM=1 → skipping all bg daemons + cron, going straight to :7860 status server" | tee -a "$LOG_DIR/boot.log"
+   set +x # silence trace
+   # Verify deps before exec — print what's missing rather than silent crash
+   if python3 -c "import fastapi, uvicorn" 2>/dev/null; then
+     echo "[$(date +%H:%M:%S)] starting uvicorn :7860 (LOW_MEM fast-path)" | tee -a "$LOG_DIR/boot.log"
+     exec python3 ~/.surrogate/bin/hermes-status-server.py
+   else
+     echo "❌ fastapi/uvicorn not importable — falling back to plain http.server"
+     exec python3 -m http.server 7860 --bind 0.0.0.0
+   fi
+ fi
+
  # ── 3. Git config + clone axentx repos for auto-orchestrate auto-commit ────
  # Disable interactive prompts globally so failed-auth git ops fail fast.
  export GIT_TERMINAL_PROMPT=0
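
The dependency check in the hunk imports `fastapi` and `uvicorn` in one call, so the fallback message cannot say which of the two is missing. A per-module probe could name it. This is a sketch only; `missing_py_modules` is a hypothetical helper, not something start.sh defines.

```shell
# Hypothetical helper: prints the name of each Python module that fails to
# import, one per line. Prints nothing when every module imports cleanly.
missing_py_modules() {
  for mod in "$@"; do
    python3 -c "import $mod" 2>/dev/null || printf '%s\n' "$mod"
  done
}

# Example usage (commented out so the sketch has no side effects):
#   missing=$(missing_py_modules fastapi uvicorn)
#   [ -z "$missing" ] || echo "not importable: $missing"
```

Each probe starts a separate interpreter, which costs a little boot time; for two modules that trade-off seems acceptable in exchange for a more specific log line.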