Somrat Sorkar committed
Commit 7e83c86 · unverified · 2 Parent(s): cf780c9, d567052

Merge pull request #14 from anurag162008/main

feat: self-healing gateway, zero-loss config sync, and backup-safe startup

Files changed (6)
  1. .env.example +10 -1
  2. CHANGELOG.md +38 -0
  3. README.md +5 -3
  4. multi-provider-key-rotator.cjs +21 -6
  5. openclaw-sync.py +185 -21
  6. start.sh +127 -55
.env.example CHANGED
@@ -132,7 +132,10 @@ LLM_MODEL=anthropic/claude-sonnet-4-5
  # have multiple accounts or want to spread rate-limit quota.
  #
  # Pattern: <PROVIDER>_API_KEYS=key1,key2,key3
- # Fallback order: plural pool → singular key → LLM_API_KEY
+ # Fallback order: plural pool → singular key → LLM_API_KEY (optional)
+ # Set false only if you want to disable global LLM_API_KEY fallback
+ # across providers.
+ LLM_API_KEY_FALLBACK_ENABLED=true
  #
  # Uncomment and fill in only the providers you use:
  #
@@ -285,6 +288,12 @@ KEEP_ALIVE_INTERVAL=300
  # Workspace auto-sync interval (seconds). Default: 180.
  SYNC_INTERVAL=180

+ # Check openclaw.json for settings changes this often (seconds). Default: 1.
+ OPENCLAW_CONFIG_WATCH_INTERVAL=1
+
+ # Wait for openclaw.json to stay valid and unchanged before syncing. Default: 3.
+ OPENCLAW_CONFIG_SETTLE_SECONDS=3
+
  # Webhooks: Standard POST notifications for lifecycle events
  # WEBHOOK_URL=https://your-webhook-endpoint.com/log

CHANGELOG.md CHANGED
@@ -2,6 +2,44 @@

  All notable changes to this project will be documented in this file.

+ ## [1.5.0] - 2026-05-13
+
+ ### Added
+
+ - **NVIDIA key-rotation support** — added `nvidia-key-rotator.cjs` wiring and startup integration so deployments can rotate NVIDIA credentials similarly to other provider key-rotation flows.
+ - **Cloudflare keep-alive automation** — added/expanded `cloudflare-keepalive-setup.py` flow and startup wiring to provision keep-alive through Cloudflare Worker automation instead of the older UptimeRobot-first approach.
+ - **Sync metadata marker model** — introduced a structured workspace marker `(file_count, total_size, newest_mtime, metadata_hash)` to support stronger change introspection in sync code.
+
+ ### Changed
+
+ - **Workspace sync script rename finalized** — `workspace-sync.py` flow was migrated to `openclaw-sync.py` in Docker/startup/docs so restore/sync behavior is centralized under one script.
+ - **Sync trigger behavior hardened for config churn** — OpenClaw config sync now debounces until JSON settles before immediate sync, reducing false/partial syncs during rapid config writes.
+ - **Gateway restart flow now saves state first** — restart path was updated to run a pre-restart one-shot state sync so gateway reloads are less likely to drop recent state.
+ - **Shutdown backup now uses a two-step pass** — graceful shutdown now attempts `sync-once-settled` then a final `sync-once` pass to better capture last-second writes.
+ - **Telegram allowlist simplified** — consolidated Telegram allowlist into `TELEGRAM_ALLOWED_USERS` and aligned docs/examples.
+ - **Plugin startup behavior aligned** — startup-installed plugins are synced into `plugins.allow` before gateway launch so runtime-installed plugins are recognized cleanly.
+ - **Cloudflare proxy path matured** — multiple iterations improved fetch/proxy behavior (header handling, endpoint scoping, API root routing, URL parsing, and logging noise reduction), then simplified unstable undici patching paths.
+ - **Health dashboard polish** — sync timestamps now show local time, footer credits were corrected, and status rendering/docs were updated for the Cloudflare keep-alive model.
+ - **CI workflow churn documented** — GitHub workflow files for HF sync were added/renamed/cleaned multiple times as space/repo naming stabilized.
+
+ ### Fixed
+
+ - **Missed rapid backup updates** — sync logic now relies on content fingerprint checks for no-op decisions so same-second or quick successive changes are less likely to be skipped.
+ - **Non-deterministic metadata hashing** — metadata hashing now iterates paths deterministically to avoid hash jitter from traversal order.
+ - **Transient file race sync failures** — sync fingerprinting/snapshot copy paths now tolerate transient `OSError` (file rotated/deleted mid-scan) instead of aborting the whole sync pass.
+ - **State restore migration edge cases** — restore flow includes migration/cleanup behavior for legacy hidden state paths and stale backup entries.
+ - **Startup/env robustness** — fixed shell export formatting/syntax issues (e.g., NVIDIA/XAI lines) and unbound-variable pitfalls in startup scripts.
+ - **Proxy runtime errors and noise** — fixed specific proxy runtime issues (including `UND_ERR_INVALID_ARG`, fetch duplex handling, and upstream error visibility) and reduced noisy stdout logs that interfered with clean process output.
+ - **HF workflow/repo reference mismatches** — corrected and later cleaned workflow repository references during repo migration/restructure.
+
+ ### Docs
+
+ - README/.env/security docs were refreshed across multiple commits to reflect:
+   - Cloudflare keep-alive replacing UptimeRobot setup path,
+   - updated secrets and startup environment behavior,
+   - provider/key-rotation options,
+   - backup/sync behavior and troubleshooting guidance.
+
  ## [1.4.0] - 2026-04-25

  ### Added
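The "debounces until JSON settles" entry above can be illustrated with a small standalone sketch. It is not the code from this commit: `wait_until_settled`, its `timeout` parameter, and the two-field marker are assumptions made for illustration. The script in this commit polls a stop event instead of a timeout, and it treats a missing config file as valid, which this sketch does not.

```python
import json
import tempfile
import time
from pathlib import Path


def file_marker(path: Path) -> tuple[int, int]:
    """Return (size, mtime_ns), or (0, 0) if the file cannot be stat'ed."""
    try:
        st = path.stat()
    except OSError:
        return (0, 0)
    return (st.st_size, st.st_mtime_ns)


def is_valid_json(path: Path) -> bool:
    """True only if the file can be read and parsed as JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):
        return False


def wait_until_settled(path: Path, settle: float, poll: float, timeout: float) -> bool:
    """Hypothetical helper: True once `path` has been unchanged for `settle`
    seconds AND parses as JSON; False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    marker = file_marker(path)
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        latest = file_marker(path)
        if latest != marker:
            marker = latest
            stable_since = time.monotonic()  # any change restarts the quiet period
        if time.monotonic() - stable_since >= settle and is_valid_json(path):
            return True
        time.sleep(poll)
    return False


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "openclaw.json"
    cfg.write_text('{"ok": true}', encoding="utf-8")
    settled = wait_until_settled(cfg, settle=0.05, poll=0.01, timeout=2.0)
    cfg.write_text('{"ok": ', encoding="utf-8")  # simulate a half-written file
    partial = wait_until_settled(cfg, settle=0.05, poll=0.01, timeout=0.3)

print(settled, partial)  # True False
```

Requiring both conditions is the point of the design: a quiet file that is still invalid JSON is most likely a write in progress, so syncing it would upload a broken config.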
README.md CHANGED
@@ -162,7 +162,9 @@ HuggingClaw automatically syncs your workspace (chats, settings, sessions) to a
  | Variable | Default | Description |
  | :--- | :--- | :--- |
  | `HF_TOKEN` | — | HF token with **Write** access |
- | `SYNC_INTERVAL` | `180` | Backup frequency in seconds |
+ | `SYNC_INTERVAL` | `180` | Full backup frequency in seconds |
+ | `OPENCLAW_CONFIG_WATCH_INTERVAL` | `1` | How often to check `openclaw.json` for immediate settings sync |
+ | `OPENCLAW_CONFIG_SETTLE_SECONDS` | `3` | How long `openclaw.json` must stay valid and unchanged before syncing |

  ## 📦 Ephemeral Package Re-install *(Optional)*

@@ -248,10 +250,10 @@ GEMINI_API_KEYS=AIza-key1,AIza-key2
  **Fallback chain** (per provider):
  1. `{PROVIDER}_API_KEYS` — comma-separated pool *(preferred)*
  2. `{PROVIDER}_API_KEY` — single dedicated key
- 3. `LLM_API_KEY` — universal fallback *(default for all providers)*
+ 3. `LLM_API_KEY` — universal fallback *(enabled by default; disable with `LLM_API_KEY_FALLBACK_ENABLED=false`)*

  > [!TIP]
- > If you only set `LLM_API_KEY`, all providers use it as a fallback automatically — no extra config needed. Add per-provider pools only when you need multi-key rotation.
+ > By default, `LLM_API_KEY` fallback is enabled for compatibility. Set `LLM_API_KEY_FALLBACK_ENABLED=false` if you want strict provider-only activation.

  Supported per-provider variables: `ANTHROPIC_API_KEYS`, `OPENAI_API_KEYS`, `GEMINI_API_KEYS`, `DEEPSEEK_API_KEYS`, `GROQ_API_KEYS`, `MISTRAL_API_KEYS`, `OPENROUTER_API_KEYS`, `XAI_API_KEYS`, `NVIDIA_API_KEYS`, `COHERE_API_KEYS`, `TOGETHER_API_KEYS`, `CEREBRAS_API_KEYS`, and more — see `.env.example` for the full list.

multi-provider-key-rotator.cjs CHANGED
@@ -8,7 +8,7 @@
   *
   * For each provider you can supply a comma-separated pool:
   *   ANTHROPIC_API_KEYS=key1,key2,key3
-  * Falls back to the singular env var, then to LLM_API_KEY.
+  * Falls back to the singular env var, and optionally to LLM_API_KEY.
   *
   * Keys are rotated round-robin per provider independently.
   *
@@ -19,13 +19,20 @@
  const http = require('node:http');
  const https = require('node:https');

+ // This file is preloaded through NODE_OPTIONS, so it also runs inside npm and
+ // OpenClaw helper subprocesses that may emit machine-readable JSON on stdout.
+ // Keep rotator diagnostics on stderr to avoid corrupting those stdout streams.
+ const log = (...args) => console.error(...args);
+
  // ─── Provider definitions ────────────────────────────────────────────────────
  //
  // hostname – regex tested against the request hostname (case-insensitive)
  // envPlural – env var that holds a comma-separated key pool (preferred)
  // envSingular – env var that holds a single key (fallback)
  //
- // LLM_API_KEY is the final fallback for every provider.
+ // LLM_API_KEY fallback can be controlled via:
+ //   LLM_API_KEY_FALLBACK_ENABLED=true|false
+ // Default is enabled for backwards compatibility.
  //
  const PROVIDERS = [
    {
@@ -175,6 +182,8 @@ function normalizeKeys(...inputs) {

  // Build per-provider key pools + rotation indices
  const providerState = PROVIDERS.map(p => {
+   const llmFallbackRaw = String(process.env.LLM_API_KEY_FALLBACK_ENABLED || '').trim().toLowerCase();
+   const llmFallbackEnabled = !/^(0|false|no|off)$/.test(llmFallbackRaw);
    const dedicatedKeys = normalizeKeys(
      process.env[p.envPlural] || '',
      process.env[p.envSingular] || '',
@@ -182,10 +191,14 @@ const providerState = PROVIDERS.map(p => {
    const hasDedicated = dedicatedKeys.length > 0;
    const keys = hasDedicated
      ? dedicatedKeys
-     : normalizeKeys(process.env.LLM_API_KEY || '');
+     : (
+       llmFallbackEnabled
+         ? normalizeKeys(process.env.LLM_API_KEY || '')
+         : []
+     );

    if (hasDedicated) {
-     console.log(`[key-rotator] ${p.name}: ${keys.length} key${keys.length === 1 ? '' : 's'}`);
+     log(`[key-rotator] ${p.name}: ${keys.length} key${keys.length === 1 ? '' : 's'}`);
    } else if (!keys.length) {
      console.warn(`[key-rotator] No keys for provider "${p.name}"`);
    }
@@ -202,7 +215,9 @@ const fallbackCount = providerState.filter(p => {
    return dedicated.length === 0 && p.keys.length > 0;
  }).length;
  if (fallbackCount > 0) {
-   console.log(`[key-rotator] ${fallbackCount} provider(s) using LLM_API_KEY fallback`);
+   log(`[key-rotator] ${fallbackCount} provider(s) using LLM_API_KEY fallback`);
+ } else if (process.env.LLM_API_KEY && /^(0|false|no|off)$/i.test(String(process.env.LLM_API_KEY_FALLBACK_ENABLED || ''))) {
+   log('[key-rotator] LLM_API_KEY fallback disabled (set LLM_API_KEY_FALLBACK_ENABLED=true to re-enable)');
  }

  // ─── Runtime helpers ─────────────────────────────────────────────────────────
@@ -332,4 +347,4 @@ patchFetch();
  patchHttpModule(http);
  patchHttpModule(https);

- console.log('[key-rotator] loaded — all providers active');
+ log('[key-rotator] loaded — all providers active');
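The reason for moving rotator diagnostics from stdout to stderr can be shown with a short Python sketch. It is an analogy, not part of this commit: the child script below stands in for any preloaded helper, and the message text and JSON payload are made up for illustration. The parent parses stdout as JSON, which only works if diagnostics stay off that stream.

```python
import json
import subprocess
import sys

# Hypothetical child process: diagnostics go to stderr, the machine-readable
# result goes to stdout. This mirrors the `log = console.error` change above.
CHILD = r'''
import json, sys
print("[key-rotator] loaded", file=sys.stderr)
print(json.dumps({"status": "ok"}))
'''

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
    check=True,
)

# stdout contains only the JSON document, so parsing succeeds.
payload = json.loads(result.stdout)
print(payload["status"])  # ok
```

If the child printed its diagnostic line to stdout instead, `json.loads` would raise a `JSONDecodeError` on the leading `[key-rotator] ...` text.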
openclaw-sync.py CHANGED
@@ -7,6 +7,7 @@ credentials inside a private HF dataset without embedding HF tokens in git
  remotes or requiring a manual HF_USERNAME secret.
  """

+ import fcntl
  import hashlib
  import json
  import logging
@@ -17,6 +18,7 @@ import sys
  import tempfile
  import threading
  import time
+ from typing import TypeAlias
  from pathlib import Path

  os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
@@ -34,10 +36,20 @@ from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError
  logging.getLogger("huggingface_hub").setLevel(logging.ERROR)

  OPENCLAW_HOME = Path("/home/node/.openclaw")
+ OPENCLAW_CONFIG_FILE = OPENCLAW_HOME / "openclaw.json"
  WORKSPACE = OPENCLAW_HOME / "workspace"
  STATUS_FILE = Path("/tmp/sync-status.json")
+ SYNC_LOCK_FILE = Path("/tmp/huggingclaw-sync.lock")
  INTERVAL = int(os.environ.get("SYNC_INTERVAL", "180"))
  INITIAL_DELAY = int(os.environ.get("SYNC_START_DELAY", "10"))
+ CONFIG_WATCH_INTERVAL = max(
+     0.5,
+     float(os.environ.get("OPENCLAW_CONFIG_WATCH_INTERVAL", "1")),
+ )
+ CONFIG_SETTLE_SECONDS = max(
+     0.0,
+     float(os.environ.get("OPENCLAW_CONFIG_SETTLE_SECONDS", "3")),
+ )
  HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
  HF_USERNAME = os.environ.get("HF_USERNAME", "").strip()
  SPACE_AUTHOR_NAME = os.environ.get("SPACE_AUTHOR_NAME", "").strip()
@@ -58,6 +70,7 @@ EXCLUDED_STATE_NAMES = {
      "openclaw-app",
      "gateway.log",
      "browser",
+     "npm",
  }
  WHATSAPP_CREDS_DIR = OPENCLAW_HOME / "credentials" / "whatsapp" / "default"
  WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
@@ -65,6 +78,7 @@ RESET_MARKER = WORKSPACE / ".reset_credentials"
  HF_API = HfApi(token=HF_TOKEN) if HF_TOKEN else None
  STOP_EVENT = threading.Event()
  _REPO_ID_CACHE: str | None = None
+ WorkspaceMarker: TypeAlias = tuple[int, int, int, str]


  def write_status(status: str, message: str) -> None:
@@ -78,6 +92,13 @@ def write_status(status: str, message: str) -> None:
      tmp_path.replace(STATUS_FILE)


+ def read_status() -> dict[str, str]:
+     try:
+         return json.loads(STATUS_FILE.read_text(encoding="utf-8"))
+     except Exception:
+         return {}
+
+
  def count_files(path: Path) -> int:
      if not path.exists():
          return 0
@@ -250,14 +271,27 @@ def _should_exclude(rel_posix: str, path: Path) -> bool:
      return False


- def metadata_marker(root: Path) -> tuple[int, int, int]:
-     if not root.exists():
+ def file_marker(path: Path) -> tuple[int, int, int]:
+     try:
+         stat = path.stat()
+     except OSError:
+         return (0, 0, 0)
+
+     if not path.is_file():
          return (0, 0, 0)

+     return (1, int(stat.st_size), int(stat.st_mtime_ns))
+
+
+ def metadata_marker(root: Path) -> WorkspaceMarker:
+     if not root.exists():
+         return (0, 0, 0, "")
+
      file_count = 0
      total_size = 0
      newest_mtime = 0
-     for path in root.rglob("*"):
+     metadata_hasher = hashlib.sha256()
+     for path in sorted(root.rglob("*")):
          if not path.is_file():
              continue
          rel = path.relative_to(root).as_posix()
@@ -268,9 +302,17 @@ def metadata_marker(root: Path) -> tuple[int, int, int]:
          except OSError:
              continue
          file_count += 1
-         total_size += int(stat.st_size)
-         newest_mtime = max(newest_mtime, int(stat.st_mtime_ns))
-     return (file_count, total_size, newest_mtime)
+         size = int(stat.st_size)
+         mtime_ns = int(stat.st_mtime_ns)
+         total_size += size
+         newest_mtime = max(newest_mtime, mtime_ns)
+         metadata_hasher.update(rel.encode("utf-8"))
+         metadata_hasher.update(b"\0")
+         metadata_hasher.update(str(size).encode("ascii"))
+         metadata_hasher.update(b"\0")
+         metadata_hasher.update(str(mtime_ns).encode("ascii"))
+         metadata_hasher.update(b"\0")
+     return (file_count, total_size, newest_mtime, metadata_hasher.hexdigest())


  def fingerprint_dir(root: Path) -> str:
@@ -283,9 +325,16 @@ def fingerprint_dir(root: Path) -> str:
          if _should_exclude(rel, path):
              continue
          hasher.update(rel.encode("utf-8"))
-         with path.open("rb") as handle:
-             for chunk in iter(lambda: handle.read(1024 * 1024), b""):
-                 hasher.update(chunk)
+         try:
+             with path.open("rb") as handle:
+                 for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+                     hasher.update(chunk)
+         except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
+             # Fingerprint must represent a complete view of the workspace.
+             # Retry next sync pass instead of silently hashing a partial tree.
+             raise RuntimeError(
+                 f"Workspace changed while hashing {rel}; retrying next sync pass."
+             )
      return hasher.hexdigest()


@@ -301,7 +350,13 @@ def create_snapshot_dir(source_root: Path) -> Path:
              target.mkdir(parents=True, exist_ok=True)
              continue
          target.parent.mkdir(parents=True, exist_ok=True)
-         shutil.copy2(path, target)
+         try:
+             shutil.copy2(path, target)
+         except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
+             # Do not upload a partial snapshot; let caller retry on next loop.
+             raise RuntimeError(
+                 f"Snapshot changed while copying {rel_posix}; retrying next sync pass."
+             )
      return staging_root


@@ -364,18 +419,17 @@ def restore_workspace() -> bool:
      return False


- def sync_once(
+ def _sync_once_unlocked(
      last_fingerprint: str | None = None,
-     last_marker: tuple[int, int, int] | None = None,
- ) -> tuple[str, tuple[int, int, int]]:
+     last_marker: WorkspaceMarker | None = None,
+ ) -> tuple[str, WorkspaceMarker]:
      if not HF_TOKEN:
          write_status("disabled", "HF_TOKEN is not configured.")
-         return (last_fingerprint or "", last_marker or (0, 0, 0))
+         return (last_fingerprint or "", last_marker or (0, 0, 0, ""))

      snapshot_state_into_workspace()
      repo_id = ensure_repo_exists()
      current_marker = metadata_marker(WORKSPACE)
-
      if last_marker is not None and current_marker == last_marker:
          write_status("synced", "No workspace changes detected.")
          return (last_fingerprint or "", current_marker)
@@ -412,14 +466,81 @@ def sync_once(
      return (current_fingerprint, current_marker)


+ def sync_once(
+     last_fingerprint: str | None = None,
+     last_marker: WorkspaceMarker | None = None,
+ ) -> tuple[str, WorkspaceMarker]:
+     SYNC_LOCK_FILE.parent.mkdir(parents=True, exist_ok=True)
+     with SYNC_LOCK_FILE.open("w", encoding="utf-8") as lock_handle:
+         fcntl.flock(lock_handle, fcntl.LOCK_EX)
+         try:
+             return _sync_once_unlocked(last_fingerprint, last_marker)
+         finally:
+             fcntl.flock(lock_handle, fcntl.LOCK_UN)
+
+
  def handle_signal(_sig, _frame) -> None:
      STOP_EVENT.set()


+ def is_valid_json_file(path: Path) -> bool:
+     if not path.exists():
+         return True
+
+     try:
+         json.loads(path.read_text(encoding="utf-8"))
+         return True
+     except Exception:
+         return False
+
+
+ def wait_for_config_settle(config_marker: tuple[int, int, int]) -> tuple[str, tuple[int, int, int]]:
+     stable_since = time.monotonic()
+     current_marker = config_marker
+
+     while not STOP_EVENT.is_set():
+         latest_marker = file_marker(OPENCLAW_CONFIG_FILE)
+         if latest_marker != current_marker:
+             current_marker = latest_marker
+             stable_since = time.monotonic()
+
+         if (
+             time.monotonic() - stable_since >= CONFIG_SETTLE_SECONDS
+             and is_valid_json_file(OPENCLAW_CONFIG_FILE)
+         ):
+             return ("settled", current_marker)
+
+         if STOP_EVENT.wait(CONFIG_WATCH_INTERVAL):
+             return ("stopped", current_marker)
+
+     return ("stopped", current_marker)
+
+
+ def wait_for_sync_trigger(config_marker: tuple[int, int, int]) -> tuple[str, tuple[int, int, int]]:
+     deadline = time.monotonic() + max(0, INTERVAL)
+
+     while not STOP_EVENT.is_set():
+         current_config_marker = file_marker(OPENCLAW_CONFIG_FILE)
+         if current_config_marker != config_marker:
+             return wait_for_config_settle(current_config_marker)
+
+         remaining = deadline - time.monotonic()
+         if remaining <= 0:
+             return ("interval", current_config_marker)
+
+         wait_seconds = min(CONFIG_WATCH_INTERVAL, remaining)
+         if STOP_EVENT.wait(wait_seconds):
+             return ("stopped", current_config_marker)
+
+     return ("stopped", config_marker)
+
+
  def loop() -> int:
      signal.signal(signal.SIGTERM, handle_signal)
      signal.signal(signal.SIGINT, handle_signal)

+     previous_status = read_status().get("status", "")
+
      try:
          repo_id = resolve_backup_namespace()
          write_status("configured", f"Backup loop active for {repo_id} with {INTERVAL}s interval.")
@@ -431,24 +552,56 @@ def loop() -> int:
      time.sleep(INITIAL_DELAY)
      print(f"Workspace sync started: every {INTERVAL}s -> {repo_id}")

-     # Take a fingerprint of the workspace AS RESTORED (after snapshotting state)
-     # so the first loop iteration only uploads if something genuinely changed.
-     # Previously this was None, which forced an unconditional upload every restart
-     # — even when restore had failed silently and the workspace was empty.
+     # Capture the restored dataset state before refreshing the embedded
+     # /home/node/.openclaw backup. Startup may have patched openclaw.json
+     # after restore (token/model/logging/channel toggles), and that patch only
+     # becomes part of the dataset once snapshot_state_into_workspace() copies it
+     # into workspace/huggingclaw-state/openclaw/. If the snapshot changes the
+     # workspace, seed the first sync with the pre-snapshot fingerprint so the
+     # updated openclaw.json is uploaded instead of being treated as the baseline.
+     pre_snapshot_fingerprint = fingerprint_dir(WORKSPACE)
+     pre_snapshot_marker = metadata_marker(WORKSPACE)
      snapshot_state_into_workspace()
      last_fingerprint = fingerprint_dir(WORKSPACE)
      last_marker = metadata_marker(WORKSPACE)
-     print("Initial workspace fingerprint captured.")
+
+     if last_fingerprint != pre_snapshot_fingerprint:
+         if previous_status == "error":
+             print(
+                 "Initial state snapshot changed, but restore previously failed; "
+                 "keeping current state as baseline to avoid overwriting the remote backup."
+             )
+         else:
+             last_fingerprint = pre_snapshot_fingerprint
+             last_marker = pre_snapshot_marker
+             print("Initial state snapshot changed; first sync will upload refreshed OpenClaw state.")
+     else:
+         print("Initial workspace fingerprint captured.")
+
+     config_marker = file_marker(OPENCLAW_CONFIG_FILE)

      while not STOP_EVENT.is_set():
          try:
+             sync_started_config_marker = file_marker(OPENCLAW_CONFIG_FILE)
              last_fingerprint, last_marker = sync_once(last_fingerprint, last_marker)
+             config_marker = file_marker(OPENCLAW_CONFIG_FILE)
+
+             if config_marker != sync_started_config_marker:
+                 trigger, config_marker = wait_for_config_settle(config_marker)
+                 if trigger == "stopped":
+                     break
+                 print("OpenClaw config changed during sync; syncing again after it settled.")
+                 continue
          except Exception as exc:
              write_status("error", f"Sync failed: {exc}")
              print(f"Workspace sync failed: {exc}")
+             config_marker = file_marker(OPENCLAW_CONFIG_FILE)

-         if STOP_EVENT.wait(INTERVAL):
+         trigger, config_marker = wait_for_sync_trigger(config_marker)
+         if trigger == "stopped":
              break
+         if trigger == "settled":
+             print("OpenClaw config changed and settled; syncing immediately.")

      return 0

@@ -469,6 +622,17 @@ def main() -> int:
              write_status("error", f"Shutdown sync failed: {exc}")
              print(f"Workspace sync: shutdown sync failed: {exc}")
              return 1
+     if command == "sync-once-settled":
+         try:
+             trigger, _ = wait_for_config_settle(file_marker(OPENCLAW_CONFIG_FILE))
+             if trigger == "stopped":
+                 return 1
+             sync_once()
+             return 0
+         except Exception as exc:
+             write_status("error", f"Settled sync failed: {exc}")
+             print(f"Workspace sync: settled sync failed: {exc}")
+             return 1
      if command == "loop":
          return loop()

start.sh CHANGED
@@ -11,6 +11,14 @@ umask 0077
11
  OPENCLAW_VERSION="${OPENCLAW_VERSION:-latest}"
12
  OPENCLAW_APP_DIR="/home/node/.openclaw/openclaw-app"
13
  OPENCLAW_RUNTIME_VERSION=""
 
 
 
 
 
 
 
 
14
  WHATSAPP_ENABLED="${WHATSAPP_ENABLED:-false}"
15
  WHATSAPP_ENABLED_NORMALIZED=$(printf '%s' "$WHATSAPP_ENABLED" | tr '[:upper:]' '[:lower:]')
16
  SYNC_INTERVAL="${SYNC_INTERVAL:-180}"
@@ -445,22 +453,32 @@ if [ -f "$EXISTING_CONFIG" ]; then
445
  --arg consoleLevel "$OPENCLAW_CONSOLE_LOG_LEVEL" \
446
  --arg consoleStyle "$OPENCLAW_CONSOLE_LOG_STYLE" \
447
  --argjson desired "$CONFIG_JSON" \
 
 
 
 
448
  --argjson whatsappEnabled "$WHATSAPP_CONFIG_ENABLED" \
449
- '.gateway.auth.token = $token
 
450
  | .agents.defaults.model = $model
451
- | .logging.level = $fileLevel
452
- | .logging.consoleLevel = $consoleLevel
453
- | .logging.consoleStyle = $consoleStyle
454
  | .channels = ((.channels // {}) * ($desired.channels // {}))
455
  | .plugins.allow = (((.plugins.allow // []) + ($desired.plugins.allow // [])) | unique)
456
  | .plugins.deny = (((.plugins.deny // []) + ($desired.plugins.deny // [])) | unique)
457
- | .plugins.entries = ((.plugins.entries // {}) * ($desired.plugins.entries // {}))
458
  | if $whatsappEnabled then
459
- .plugins.entries.whatsapp.enabled = true
460
- | .channels.whatsapp = ($desired.channels.whatsapp // {"dmPolicy": "pairing"})
461
- else
 
 
 
462
  .plugins.entries.whatsapp.enabled = false
463
  | del(.channels.whatsapp)
 
 
464
  end' \
465
  "$EXISTING_CONFIG" 2>/dev/null)
466
 
@@ -522,10 +540,15 @@ fi
522
  # ── Trap SIGTERM for graceful shutdown ──
523
  graceful_shutdown() {
524
  echo "Shutting down..."
525
- if [ -f "/home/node/app/openclaw-sync.py" ]; then
526
  echo "Saving state before exit..."
527
  python3 /home/node/app/openclaw-sync.py sync-once || \
528
- echo "Warning: could not complete shutdown sync"
529
  fi
530
  kill $(jobs -p) 2>/dev/null
531
  exit 0
@@ -1000,59 +1023,108 @@ hc_finish_startup_commands
1000
  sync_installed_plugins_into_allow
1001
 
1002
  # ── Launch gateway ──
1003
- echo "Launching OpenClaw gateway on port 7860..."
1004
 
1005
- GATEWAY_ARGS=(gateway run --port 7860 --bind lan)
1006
- if [ "${GATEWAY_VERBOSE:-0}" = "1" ]; then
1007
- GATEWAY_ARGS+=(--verbose)
1008
- echo "Gateway verbose logging enabled (GATEWAY_VERBOSE=1)"
1009
- fi
1010
 
1011
- # Use stdbuf -oL -eL to ensure logs are not buffered and appear immediately
1012
- # in the console. NOTE: $! captures the LAST pipeline element (tee), not
1013
- # openclaw - fine for passing to `wait` (waits for the whole pipeline to
1014
- # finish), but kill -0 on it is uninformative. We probe TCP instead.
1015
- stdbuf -oL -eL openclaw "${GATEWAY_ARGS[@]}" 2>&1 | tee -a /home/node/.openclaw/gateway.log &
1016
- GATEWAY_PID=$!
1017
-
1018
- # Poll for the gateway to start listening on 7860. OpenClaw can take 20-30s
1019
- # on cold start (plugin install + auto-restore). Bail out early if the
1020
- # pipeline died.
1021
- GATEWAY_READY_TIMEOUT="${GATEWAY_READY_TIMEOUT:-90}"
1022
- ready=false
1023
- for ((i=0; i<GATEWAY_READY_TIMEOUT; i++)); do
1024
- if (echo > /dev/tcp/127.0.0.1/7860) 2>/dev/null; then
1025
- ready=true
1026
- break
1027
- fi
1028
- if ! kill -0 "$GATEWAY_PID" 2>/dev/null; then
1029
- break
1030
  fi
1031
- sleep 1
1032
- done
1033
 
1034
- if [ "$ready" != "true" ]; then
1035
- echo ""
1036
- echo "Gateway failed to start. Last 30 lines of log:"
1037
- echo "────────────────────────────────────────────"
1038
- tail -30 /home/node/.openclaw/gateway.log
1039
- exit 1
1040
- fi
1041
 
1042
- # 11. Start WhatsApp Guardian after the gateway is accepting connections
1043
- if [ "$WHATSAPP_ENABLED_NORMALIZED" = "true" ]; then
1044
  node /home/node/app/wa-guardian.js &
1045
  GUARDIAN_PID=$!
1046
  echo "WhatsApp Guardian started (PID: $GUARDIAN_PID)"
1047
- fi
1048
 
1049
- # 11.5 Warm up the managed browser so first browser actions have a live tab
1050
- warmup_browser
1051
 
1052
- # 12. Start Workspace Sync after startup settles
1053
- if [ -n "${HF_TOKEN:-}" ]; then
1054
- python3 -u /home/node/app/openclaw-sync.py loop &
1055
- fi
1056
 
1057
- # Wait for gateway (allows trap to fire)
1058
- wait $GATEWAY_PID
 
11
  OPENCLAW_VERSION="${OPENCLAW_VERSION:-latest}"
12
  OPENCLAW_APP_DIR="/home/node/.openclaw/openclaw-app"
13
  OPENCLAW_RUNTIME_VERSION=""
14
+ OPENCLAW_FILE_LOG_LEVEL_CONFIGURED=false
15
+ OPENCLAW_CONSOLE_LOG_LEVEL_CONFIGURED=false
16
+ OPENCLAW_CONSOLE_LOG_STYLE_CONFIGURED=false
17
+ WHATSAPP_ENABLED_CONFIGURED=false
18
+ [ "${OPENCLAW_FILE_LOG_LEVEL+x}" = "x" ] && OPENCLAW_FILE_LOG_LEVEL_CONFIGURED=true
19
+ [ "${OPENCLAW_CONSOLE_LOG_LEVEL+x}" = "x" ] && OPENCLAW_CONSOLE_LOG_LEVEL_CONFIGURED=true
20
+ [ "${OPENCLAW_CONSOLE_LOG_STYLE+x}" = "x" ] && OPENCLAW_CONSOLE_LOG_STYLE_CONFIGURED=true
21
+ [ "${WHATSAPP_ENABLED+x}" = "x" ] && WHATSAPP_ENABLED_CONFIGURED=true
22
  WHATSAPP_ENABLED="${WHATSAPP_ENABLED:-false}"
23
  WHATSAPP_ENABLED_NORMALIZED=$(printf '%s' "$WHATSAPP_ENABLED" | tr '[:upper:]' '[:lower:]')
24
  SYNC_INTERVAL="${SYNC_INTERVAL:-180}"
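The `*_CONFIGURED` flags above rely on the `${VAR+x}` expansion, which yields `x` whenever the variable is set, even to an empty string, and nothing when it is unset. The sketch below shows that distinction in isolation. `DEMO_LOG_LEVEL` and `describe_log_level` are hypothetical names used only for this illustration; they do not appear in start.sh.

```shell
# Sketch: tell "unset" apart from "set but empty" with ${VAR+x}.
# DEMO_LOG_LEVEL is a made-up variable for illustration only.

describe_log_level() {
  # ${DEMO_LOG_LEVEL+x} expands to "x" if the variable is set (even to ""),
  # and to the empty string if it is unset.
  if [ "${DEMO_LOG_LEVEL+x}" = "x" ]; then
    echo "configured"
  else
    echo "not-configured"
  fi
}

unset DEMO_LOG_LEVEL
describe_log_level          # prints "not-configured"

DEMO_LOG_LEVEL=""
describe_log_level          # prints "configured" (set, but empty)

DEMO_LOG_LEVEL="debug"
describe_log_level          # prints "configured"
```

Recording this before the `${VAR:-default}` assignments matters because, once a default has been applied, an explicitly configured value can no longer be told apart from a fallback.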
 
453
  --arg consoleLevel "$OPENCLAW_CONSOLE_LOG_LEVEL" \
454
  --arg consoleStyle "$OPENCLAW_CONSOLE_LOG_STYLE" \
455
  --argjson desired "$CONFIG_JSON" \
456
+ --argjson fileLogConfigured "$OPENCLAW_FILE_LOG_LEVEL_CONFIGURED" \
457
+ --argjson consoleLogConfigured "$OPENCLAW_CONSOLE_LOG_LEVEL_CONFIGURED" \
458
+ --argjson consoleStyleConfigured "$OPENCLAW_CONSOLE_LOG_STYLE_CONFIGURED" \
459
+ --argjson whatsappConfigured "$WHATSAPP_ENABLED_CONFIGURED" \
460
  --argjson whatsappEnabled "$WHATSAPP_CONFIG_ENABLED" \
461
+ '(.channels.whatsapp // {}) as $existingWhatsapp
462
+ | .gateway.auth.token = $token
463
  | .agents.defaults.model = $model
464
+ | if $fileLogConfigured then .logging.level = $fileLevel else . end
465
+ | if $consoleLogConfigured then .logging.consoleLevel = $consoleLevel else . end
466
+ | if $consoleStyleConfigured then .logging.consoleStyle = $consoleStyle else . end
467
  | .channels = ((.channels // {}) * ($desired.channels // {}))
468
  | .plugins.allow = (((.plugins.allow // []) + ($desired.plugins.allow // [])) | unique)
469
  | .plugins.deny = (((.plugins.deny // []) + ($desired.plugins.deny // [])) | unique)
470
+ | .plugins.entries = (($desired.plugins.entries // {}) * (.plugins.entries // {}))
471
  | if $whatsappEnabled then
472
+ ($desired.channels.whatsapp // {"dmPolicy": "pairing"}) as $desiredWhatsapp
473
+ | .plugins.entries.whatsapp.enabled = true
474
+ | .channels.whatsapp = (($existingWhatsapp * $desiredWhatsapp)
475
+ | if ($existingWhatsapp | has("dmPolicy")) then .dmPolicy = $existingWhatsapp.dmPolicy else . end
476
+ | if ($existingWhatsapp | has("allowFrom")) then .allowFrom = $existingWhatsapp.allowFrom else . end)
477
+ elif $whatsappConfigured then
478
  .plugins.entries.whatsapp.enabled = false
479
  | del(.channels.whatsapp)
480
+ else
481
+ .
482
  end' \
483
  "$EXISTING_CONFIG" 2>/dev/null)
484
 
 
540
  # ── Trap SIGTERM for graceful shutdown ──
541
  graceful_shutdown() {
542
  echo "Shutting down..."
543
+ if [ -f "/home/node/app/openclaw-sync.py" ] && [ -n "${HF_TOKEN:-}" ]; then
544
  echo "Saving state before exit..."
545
+ timeout 8s python3 /home/node/app/openclaw-sync.py sync-once-settled || \
546
+ echo "Warning: could not complete settled shutdown sync"
547
+ sleep 1
548
  python3 /home/node/app/openclaw-sync.py sync-once || \
549
+ echo "Warning: could not complete final shutdown sync"
550
+ elif [ -f "/home/node/app/openclaw-sync.py" ]; then
551
+ echo "HF_TOKEN not set; skipping shutdown backup sync."
552
  fi
553
  kill $(jobs -p) 2>/dev/null
554
  exit 0
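The new shutdown path bounds the first sync with `timeout` and only warns on failure, so a stalled upload should not block the exit. Below is a minimal sketch of that best-effort pattern, assuming a `timeout` command (GNU coreutils or compatible) is on the PATH. `best_effort` is a hypothetical helper, and `sleep` stands in for the real sync command.

```shell
# Sketch: run a command with a time limit and warn instead of aborting.
# best_effort is a hypothetical helper; it is not part of start.sh.

best_effort() {
  # $1 = time limit in seconds, remaining arguments = command to run
  limit=$1
  shift
  if timeout "$limit" "$@"; then
    echo "ok"
  else
    # Reached on a timeout (exit status 124 from timeout) or any other failure.
    echo "warning: step failed or exceeded ${limit}s"
  fi
}

best_effort 1 sleep 0    # finishes within the limit
best_effort 1 sleep 3    # killed by timeout after about one second
```

Because the helper always returns success, a caller running under `set -e` keeps going to the next shutdown step either way.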
 
1023
  sync_installed_plugins_into_allow
1024
 
1025
  # ── Launch gateway ──
1026
+ GATEWAY_RESTART_DELAY="${GATEWAY_RESTART_DELAY:-2}"
1027
+ GATEWAY_MAX_RESTARTS="${GATEWAY_MAX_RESTARTS:-0}"
1028
+ GATEWAY_RESTART_COUNT=0
1029
+ SYNC_LOOP_PID=""
1030
+ GUARDIAN_PID=""
1031
+
1032
+ sync_before_gateway_restart() {
1033
+ [ -n "${HF_TOKEN:-}" ] || return 0
1034
+ [ -f "/home/node/app/openclaw-sync.py" ] || return 0
1035
+
1036
+ echo "Gateway stopped; saving latest OpenClaw state before restart..."
1037
+ python3 /home/node/app/openclaw-sync.py sync-once-settled || \
1038
+ echo "Warning: could not sync settled state before gateway restart"
1039
+ }
1040
 
1041
+ start_background_sync_once() {
1042
+ [ -n "${HF_TOKEN:-}" ] || return 0
1043
 
1044
+ if [ -n "$SYNC_LOOP_PID" ] && kill -0 "$SYNC_LOOP_PID" 2>/dev/null; then
1045
+ return 0
1046
  fi
1047
 
1048
+ python3 -u /home/node/app/openclaw-sync.py loop &
1049
+ SYNC_LOOP_PID=$!
1050
+ }
1051
+
1052
+ start_guardian_once() {
1053
+ [ "$WHATSAPP_ENABLED_NORMALIZED" = "true" ] || return 0
1054
+
1055
+ if [ -n "$GUARDIAN_PID" ] && kill -0 "$GUARDIAN_PID" 2>/dev/null; then
1056
+ return 0
1057
+ fi
1058
 
1059
  node /home/node/app/wa-guardian.js &
1060
  GUARDIAN_PID=$!
1061
  echo "WhatsApp Guardian started (PID: $GUARDIAN_PID)"
1062
+ }
1063
 
1064
+ while true; do
1065
+ echo "Launching OpenClaw gateway on port 7860..."
1066
 
1067
+ GATEWAY_ARGS=(gateway run --port 7860 --bind lan)
1068
+ if [ "${GATEWAY_VERBOSE:-0}" = "1" ]; then
1069
+ GATEWAY_ARGS+=(--verbose)
1070
+ echo "Gateway verbose logging enabled (GATEWAY_VERBOSE=1)"
1071
+ fi
1072
+
1073
+ # Use stdbuf -oL -eL to ensure logs are not buffered and appear immediately
1074
+ # in the console. NOTE: $! captures the LAST pipeline element (tee), not
1075
+ # openclaw - fine for passing to `wait` (waits for the whole pipeline to
1076
+ # finish), but kill -0 on it is uninformative. We probe TCP instead.
1077
+ stdbuf -oL -eL openclaw "${GATEWAY_ARGS[@]}" 2>&1 | tee -a /home/node/.openclaw/gateway.log &
1078
+ GATEWAY_PID=$!
1079
+
1080
+ # Poll for the gateway to start listening on 7860. OpenClaw can take 20-30s
1081
+ # on cold start (plugin install + auto-restore). Bail out early if the
1082
+ # pipeline died.
1083
+ GATEWAY_READY_TIMEOUT="${GATEWAY_READY_TIMEOUT:-90}"
1084
+ ready=false
1085
+ for ((i=0; i<GATEWAY_READY_TIMEOUT; i++)); do
1086
+ if (echo > /dev/tcp/127.0.0.1/7860) 2>/dev/null; then
1087
+ ready=true
1088
+ break
1089
+ fi
1090
+ if ! kill -0 "$GATEWAY_PID" 2>/dev/null; then
1091
+ break
1092
+ fi
1093
+ sleep 1
1094
+ done
1095
+
1096
+ if [ "$ready" != "true" ]; then
1097
+ echo ""
1098
+ echo "Gateway failed to start. Last 30 lines of log:"
1099
+ echo "────────────────────────────────────────────"
1100
+ tail -30 /home/node/.openclaw/gateway.log
1101
+ exit 1
1102
+ fi
1103
 
1104
+ # 11. Start WhatsApp Guardian after the gateway is accepting connections
1105
+ start_guardian_once
1106
+
1107
+ # 11.5 Warm up the managed browser so first browser actions have a live tab
1108
+ warmup_browser
1109
+
1110
+ # 12. Start Workspace Sync after startup settles. Keep only one loop active;
1111
+ # config edits can make OpenClaw exit/reload, and the gateway loop below will
1112
+ # relaunch it without rerunning all startup code.
1113
+ start_background_sync_once
1114
+
1115
+ set +e
1116
+ wait "$GATEWAY_PID"
1117
+ GATEWAY_EXIT_CODE=$?
1118
+ set -e
1119
+
1120
+ sync_before_gateway_restart
1121
+
1122
+ GATEWAY_RESTART_COUNT=$((GATEWAY_RESTART_COUNT + 1))
1123
+ if [ "$GATEWAY_MAX_RESTARTS" != "0" ] && [ "$GATEWAY_RESTART_COUNT" -ge "$GATEWAY_MAX_RESTARTS" ]; then
1124
+ echo "Gateway exited with code ${GATEWAY_EXIT_CODE}; restart limit (${GATEWAY_MAX_RESTARTS}) reached."
1125
+ exit "$GATEWAY_EXIT_CODE"
1126
+ fi
1127
+
1128
+ echo "Gateway exited with code ${GATEWAY_EXIT_CODE}; restarting in ${GATEWAY_RESTART_DELAY}s..."
1129
+ sleep "$GATEWAY_RESTART_DELAY"
1130
+ done
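The loop above relaunches the gateway after every exit and stops only when a non-zero `GATEWAY_MAX_RESTARTS` is reached. The following is a minimal sketch of that control flow under simplified assumptions: `run_service` and `supervise` are hypothetical stand-ins (here the service simply fails each time), and the delay is set to 0 so the demo finishes immediately.

```shell
# Sketch of a bounded restart loop. run_service is a made-up placeholder
# for the real gateway command; it always exits with status 7.

MAX_RESTARTS=3     # 0 would mean "no limit", mirroring the diff's default
RESTART_DELAY=0    # seconds to wait between attempts (0 for the demo)
restart_count=0

run_service() {
  return 7
}

supervise() {
  while true; do
    exit_code=0
    run_service || exit_code=$?

    # The counter is bumped before the comparison, so a limit of N
    # allows N launches in total.
    restart_count=$((restart_count + 1))
    if [ "$MAX_RESTARTS" != "0" ] && [ "$restart_count" -ge "$MAX_RESTARTS" ]; then
      echo "exit code ${exit_code}; restart limit (${MAX_RESTARTS}) reached after ${restart_count} runs"
      return "$exit_code"
    fi

    sleep "$RESTART_DELAY"
  done
}

status=0
supervise || status=$?
echo "supervise returned ${status}"
```

With the counter incremented before the check, as in the diff, a limit of 1 would stop after the first exit without any relaunch. Whether that matches the intended meaning of "max restarts" is worth confirming with the author.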