Commit ac1c48d by anurag008w
1 parent: 33980b3

Set LLM_API_KEY fallback default to enabled

Files changed (6)
  1. .env.example +4 -1
  2. CHANGELOG.md +38 -0
  3. README.md +2 -2
  4. multi-provider-key-rotator.cjs +13 -3
  5. openclaw-sync.py +40 -16
  6. start.sh +7 -2
.env.example CHANGED
@@ -132,7 +132,10 @@ LLM_MODEL=anthropic/claude-sonnet-4-5
 # have multiple accounts or want to spread rate-limit quota.
 #
 # Pattern: <PROVIDER>_API_KEYS=key1,key2,key3
-# Fallback order: plural pool → singular key → LLM_API_KEY
+# Fallback order: plural pool → singular key → LLM_API_KEY (optional)
+# Set false only if you want to disable global LLM_API_KEY fallback
+# across providers.
+LLM_API_KEY_FALLBACK_ENABLED=true
 #
 # Uncomment and fill in only the providers you use:
 #
CHANGELOG.md CHANGED
@@ -2,6 +2,44 @@
 
 All notable changes to this project will be documented in this file.
 
+## [1.5.0] - 2026-05-13
+
+### Added
+
+- **NVIDIA key-rotation support** — added `nvidia-key-rotator.cjs` wiring and startup integration so deployments can rotate NVIDIA credentials similarly to other provider key-rotation flows.
+- **Cloudflare keep-alive automation** — added/expanded `cloudflare-keepalive-setup.py` flow and startup wiring to provision keep-alive through Cloudflare Worker automation instead of the older UptimeRobot-first approach.
+- **Sync metadata marker model** — introduced a structured workspace marker `(file_count, total_size, newest_mtime, metadata_hash)` to support stronger change introspection in sync code.
+
+### Changed
+
+- **Workspace sync script rename finalized** — `workspace-sync.py` flow was migrated to `openclaw-sync.py` in Docker/startup/docs so restore/sync behavior is centralized under one script.
+- **Sync trigger behavior hardened for config churn** — OpenClaw config sync now debounces until JSON settles before immediate sync, reducing false/partial syncs during rapid config writes.
+- **Gateway restart flow now saves state first** — restart path was updated to run a pre-restart one-shot state sync so gateway reloads are less likely to drop recent state.
+- **Shutdown backup now uses a two-step pass** — graceful shutdown now attempts `sync-once-settled` then a final `sync-once` pass to better capture last-second writes.
+- **Telegram allowlist simplified** — consolidated Telegram allowlist into `TELEGRAM_ALLOWED_USERS` and aligned docs/examples.
+- **Plugin startup behavior aligned** — startup-installed plugins are synced into `plugins.allow` before gateway launch so runtime-installed plugins are recognized cleanly.
+- **Cloudflare proxy path matured** — multiple iterations improved fetch/proxy behavior (header handling, endpoint scoping, API root routing, URL parsing, and logging noise reduction), then simplified unstable undici patching paths.
+- **Health dashboard polish** — sync timestamps now show local time, footer credits were corrected, and status rendering/docs were updated for the Cloudflare keep-alive model.
+- **CI workflow churn documented** — GitHub workflow files for HF sync were added/renamed/cleaned multiple times as space/repo naming stabilized.
+
+### Fixed
+
+- **Missed rapid backup updates** — sync logic now relies on content fingerprint checks for no-op decisions so same-second or quick successive changes are less likely to be skipped.
+- **Non-deterministic metadata hashing** — metadata hashing now iterates paths deterministically to avoid hash jitter from traversal order.
+- **Transient file race sync failures** — sync fingerprinting/snapshot copy paths now tolerate transient `OSError` (file rotated/deleted mid-scan) instead of aborting the whole sync pass.
+- **State restore migration edge cases** — restore flow includes migration/cleanup behavior for legacy hidden state paths and stale backup entries.
+- **Startup/env robustness** — fixed shell export formatting/syntax issues (e.g., NVIDIA/XAI lines) and unbound-variable pitfalls in startup scripts.
+- **Proxy runtime errors and noise** — fixed specific proxy runtime issues (including `UND_ERR_INVALID_ARG`, fetch duplex handling, and upstream error visibility) and reduced noisy stdout logs that interfered with clean process output.
+- **HF workflow/repo reference mismatches** — corrected and later cleaned workflow repository references during repo migration/restructure.
+
+### Docs
+
+- README/.env/security docs were refreshed across multiple commits to reflect:
+  - Cloudflare keep-alive replacing UptimeRobot setup path,
+  - updated secrets and startup environment behavior,
+  - provider/key-rotation options,
+  - backup/sync behavior and troubleshooting guidance.
+
 ## [1.4.0] - 2026-04-25
 
 ### Added
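The changelog entry about debouncing "until JSON settles" can be pictured as a small polling loop. The following is a minimal sketch, not the code in `openclaw-sync.py`: the helper name `wait_until_json_settles` and its `quiet_period`, `poll_interval`, and `timeout` parameters are hypothetical, and it assumes "settled" means the file parses as JSON and its bytes have not changed for one quiet period.

```python
import json
import time
from pathlib import Path


def wait_until_json_settles(
    path: Path,
    quiet_period: float = 0.5,
    poll_interval: float = 0.1,
    timeout: float = 10.0,
) -> bool:
    """Return True once `path` holds valid JSON that stayed unchanged for
    `quiet_period` seconds, or False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    last_bytes: bytes | None = None
    stable_since = time.monotonic()

    while time.monotonic() < deadline:
        try:
            data = path.read_bytes()
            json.loads(data)  # a half-written file fails to parse
        except (OSError, ValueError):
            # Missing, unreadable, or partial file: restart the quiet window.
            last_bytes = None
            time.sleep(poll_interval)
            continue

        now = time.monotonic()
        if data != last_bytes:
            # Content changed since the last poll: restart the quiet window.
            last_bytes = data
            stable_since = now
        elif now - stable_since >= quiet_period:
            return True
        time.sleep(poll_interval)
    return False
```

A caller would trigger the immediate sync only when this returns `True`, and otherwise leave the change to the next regular sync pass.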
README.md CHANGED
@@ -250,10 +250,10 @@ GEMINI_API_KEYS=AIza-key1,AIza-key2
 **Fallback chain** (per provider):
 1. `{PROVIDER}_API_KEYS` — comma-separated pool *(preferred)*
 2. `{PROVIDER}_API_KEY` — single dedicated key
-3. `LLM_API_KEY` — universal fallback *(default for all providers)*
+3. `LLM_API_KEY` — universal fallback *(enabled by default; disable with `LLM_API_KEY_FALLBACK_ENABLED=false`)*
 
 > [!TIP]
-> If you only set `LLM_API_KEY`, all providers use it as a fallback automatically no extra config needed. Add per-provider pools only when you need multi-key rotation.
+> By default, `LLM_API_KEY` fallback is enabled for compatibility. Set `LLM_API_KEY_FALLBACK_ENABLED=false` if you want strict provider-only activation.
 
 Supported per-provider variables: `ANTHROPIC_API_KEYS`, `OPENAI_API_KEYS`, `GEMINI_API_KEYS`, `DEEPSEEK_API_KEYS`, `GROQ_API_KEYS`, `MISTRAL_API_KEYS`, `OPENROUTER_API_KEYS`, `XAI_API_KEYS`, `NVIDIA_API_KEYS`, `COHERE_API_KEYS`, `TOGETHER_API_KEYS`, `CEREBRAS_API_KEYS`, and more — see `.env.example` for the full list.
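The fallback chain and the new toggle can be restated as a short Python sketch. It is an illustration only: `fallback_enabled`, `split_keys`, and `resolve_keys` are hypothetical names, and `split_keys` assumes keys are split on commas, trimmed, and de-duplicated, which this commit does not show. The sketch follows the rotator change in this commit in two respects: the plural and singular variables are read together into one dedicated pool, and only `0`, `false`, `no`, or `off` (case-insensitive) disable the fallback.

```python
import re
from collections.abc import Mapping

# Values that switch the global fallback off; anything else leaves it on.
_DISABLED = re.compile(r"^(0|false|no|off)$")


def fallback_enabled(env: Mapping[str, str]) -> bool:
    """Unset or unrecognised values leave the LLM_API_KEY fallback enabled."""
    raw = env.get("LLM_API_KEY_FALLBACK_ENABLED", "").strip().lower()
    return _DISABLED.match(raw) is None


def split_keys(*values: str) -> list[str]:
    """Split comma-separated values, trim, and drop empties and duplicates."""
    seen: dict[str, None] = {}
    for value in values:
        for key in value.split(","):
            key = key.strip()
            if key:
                seen.setdefault(key, None)
    return list(seen)


def resolve_keys(provider: str, env: Mapping[str, str]) -> list[str]:
    """Dedicated keys win; LLM_API_KEY is used only when allowed."""
    dedicated = split_keys(
        env.get(f"{provider}_API_KEYS", ""),
        env.get(f"{provider}_API_KEY", ""),
    )
    if dedicated:
        return dedicated
    if fallback_enabled(env):
        return split_keys(env.get("LLM_API_KEY", ""))
    return []
```

With `LLM_API_KEY_FALLBACK_ENABLED=false`, a provider without dedicated keys resolves to an empty list, which is the "strict provider-only activation" the tip describes.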
multi-provider-key-rotator.cjs CHANGED
@@ -8,7 +8,7 @@
  *
  * For each provider you can supply a comma-separated pool:
  * ANTHROPIC_API_KEYS=key1,key2,key3
- * Falls back to the singular env var, then to LLM_API_KEY.
+ * Falls back to the singular env var, and optionally to LLM_API_KEY.
  *
  * Keys are rotated round-robin per provider independently.
  *
@@ -30,7 +30,9 @@ const log = (...args) => console.error(...args);
 // envPlural – env var that holds a comma-separated key pool (preferred)
 // envSingular – env var that holds a single key (fallback)
 //
-// LLM_API_KEY is the final fallback for every provider.
+// LLM_API_KEY fallback can be controlled via:
+// LLM_API_KEY_FALLBACK_ENABLED=true|false
+// Default is enabled for backwards compatibility.
 //
 const PROVIDERS = [
   {
@@ -180,6 +182,8 @@ function normalizeKeys(...inputs) {
 
 // Build per-provider key pools + rotation indices
 const providerState = PROVIDERS.map(p => {
+  const llmFallbackRaw = String(process.env.LLM_API_KEY_FALLBACK_ENABLED || '').trim().toLowerCase();
+  const llmFallbackEnabled = !/^(0|false|no|off)$/.test(llmFallbackRaw);
   const dedicatedKeys = normalizeKeys(
     process.env[p.envPlural] || '',
     process.env[p.envSingular] || '',
@@ -187,7 +191,11 @@ const providerState = PROVIDERS.map(p => {
   const hasDedicated = dedicatedKeys.length > 0;
   const keys = hasDedicated
     ? dedicatedKeys
-    : normalizeKeys(process.env.LLM_API_KEY || '');
+    : (
+      llmFallbackEnabled
+        ? normalizeKeys(process.env.LLM_API_KEY || '')
+        : []
+    );
 
   if (hasDedicated) {
     log(`[key-rotator] ${p.name}: ${keys.length} key${keys.length === 1 ? '' : 's'}`);
@@ -208,6 +216,8 @@ const fallbackCount = providerState.filter(p => {
 }).length;
 if (fallbackCount > 0) {
   log(`[key-rotator] ${fallbackCount} provider(s) using LLM_API_KEY fallback`);
+} else if (process.env.LLM_API_KEY && /^(0|false|no|off)$/i.test(String(process.env.LLM_API_KEY_FALLBACK_ENABLED || ''))) {
+  log('[key-rotator] LLM_API_KEY fallback disabled (set LLM_API_KEY_FALLBACK_ENABLED=true to re-enable)');
 }
 
 // ─── Runtime helpers ─────────────────────────────────────────────────────────
openclaw-sync.py CHANGED
@@ -18,6 +18,7 @@ import sys
 import tempfile
 import threading
 import time
+from typing import TypeAlias
 from pathlib import Path
 
 os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
@@ -69,6 +70,7 @@ EXCLUDED_STATE_NAMES = {
     "openclaw-app",
     "gateway.log",
     "browser",
+    "npm",
 }
 WHATSAPP_CREDS_DIR = OPENCLAW_HOME / "credentials" / "whatsapp" / "default"
 WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
@@ -76,6 +78,7 @@ RESET_MARKER = WORKSPACE / ".reset_credentials"
 HF_API = HfApi(token=HF_TOKEN) if HF_TOKEN else None
 STOP_EVENT = threading.Event()
 _REPO_ID_CACHE: str | None = None
+WorkspaceMarker: TypeAlias = tuple[int, int, int, str]
 
 
 def write_status(status: str, message: str) -> None:
@@ -280,14 +283,15 @@ def file_marker(path: Path) -> tuple[int, int, int]:
     return (1, int(stat.st_size), int(stat.st_mtime_ns))
 
 
-def metadata_marker(root: Path) -> tuple[int, int, int]:
+def metadata_marker(root: Path) -> WorkspaceMarker:
     if not root.exists():
-        return (0, 0, 0)
+        return (0, 0, 0, "")
 
     file_count = 0
     total_size = 0
     newest_mtime = 0
-    for path in root.rglob("*"):
+    metadata_hasher = hashlib.sha256()
+    for path in sorted(root.rglob("*")):
         if not path.is_file():
             continue
         rel = path.relative_to(root).as_posix()
@@ -298,9 +302,17 @@ def metadata_marker(root: Path) -> tuple[int, int, int]:
         except OSError:
             continue
         file_count += 1
-        total_size += int(stat.st_size)
-        newest_mtime = max(newest_mtime, int(stat.st_mtime_ns))
-    return (file_count, total_size, newest_mtime)
+        size = int(stat.st_size)
+        mtime_ns = int(stat.st_mtime_ns)
+        total_size += size
+        newest_mtime = max(newest_mtime, mtime_ns)
+        metadata_hasher.update(rel.encode("utf-8"))
+        metadata_hasher.update(b"\0")
+        metadata_hasher.update(str(size).encode("ascii"))
+        metadata_hasher.update(b"\0")
+        metadata_hasher.update(str(mtime_ns).encode("ascii"))
+        metadata_hasher.update(b"\0")
+    return (file_count, total_size, newest_mtime, metadata_hasher.hexdigest())
 
 
 def fingerprint_dir(root: Path) -> str:
@@ -313,9 +325,16 @@ def fingerprint_dir(root: Path) -> str:
         if _should_exclude(rel, path):
             continue
         hasher.update(rel.encode("utf-8"))
-        with path.open("rb") as handle:
-            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
-                hasher.update(chunk)
+        try:
+            with path.open("rb") as handle:
+                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
+                    hasher.update(chunk)
+        except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
+            # Fingerprint must represent a complete view of the workspace.
+            # Retry next sync pass instead of silently hashing a partial tree.
+            raise RuntimeError(
+                f"Workspace changed while hashing {rel}; retrying next sync pass."
+            )
     return hasher.hexdigest()
 
 
@@ -331,7 +350,13 @@ def create_snapshot_dir(source_root: Path) -> Path:
             target.mkdir(parents=True, exist_ok=True)
             continue
         target.parent.mkdir(parents=True, exist_ok=True)
-        shutil.copy2(path, target)
+        try:
+            shutil.copy2(path, target)
+        except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
+            # Do not upload a partial snapshot; let caller retry on next loop.
+            raise RuntimeError(
+                f"Snapshot changed while copying {rel_posix}; retrying next sync pass."
+            )
     return staging_root
 
 
@@ -396,16 +421,15 @@ def restore_workspace() -> bool:
 
 def _sync_once_unlocked(
     last_fingerprint: str | None = None,
-    last_marker: tuple[int, int, int] | None = None,
-) -> tuple[str, tuple[int, int, int]]:
+    last_marker: WorkspaceMarker | None = None,
+) -> tuple[str, WorkspaceMarker]:
     if not HF_TOKEN:
         write_status("disabled", "HF_TOKEN is not configured.")
-        return (last_fingerprint or "", last_marker or (0, 0, 0))
+        return (last_fingerprint or "", last_marker or (0, 0, 0, ""))
 
     snapshot_state_into_workspace()
     repo_id = ensure_repo_exists()
     current_marker = metadata_marker(WORKSPACE)
-
     if last_marker is not None and current_marker == last_marker:
         write_status("synced", "No workspace changes detected.")
         return (last_fingerprint or "", current_marker)
@@ -444,8 +468,8 @@ def _sync_once_unlocked(
 
 def sync_once(
     last_fingerprint: str | None = None,
-    last_marker: tuple[int, int, int] | None = None,
-) -> tuple[str, tuple[int, int, int]]:
+    last_marker: WorkspaceMarker | None = None,
+) -> tuple[str, WorkspaceMarker]:
     SYNC_LOCK_FILE.parent.mkdir(parents=True, exist_ok=True)
     with SYNC_LOCK_FILE.open("w", encoding="utf-8") as lock_handle:
         fcntl.flock(lock_handle, fcntl.LOCK_EX)
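One way to see why the marker gains a fourth element: two different workspace states can share the same `(file_count, total_size, newest_mtime)` triple, for instance when an older file is rewritten without becoming the newest one. The standalone sketch below demonstrates this. `simple_marker` is a hypothetical, simplified stand-in for `metadata_marker`: it omits the exclusion filtering and the `OSError` handling, but feeds the hash the same `rel\0size\0mtime_ns\0` byte sequence in sorted path order.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def simple_marker(root: Path) -> tuple[int, int, int, str]:
    file_count = total_size = newest_mtime = 0
    hasher = hashlib.sha256()
    for path in sorted(root.rglob("*")):  # sorted gives a stable hash input order
        if not path.is_file():
            continue
        stat = path.stat()
        size, mtime_ns = int(stat.st_size), int(stat.st_mtime_ns)
        file_count += 1
        total_size += size
        newest_mtime = max(newest_mtime, mtime_ns)
        rel = path.relative_to(root).as_posix()
        hasher.update(f"{rel}\0{size}\0{mtime_ns}\0".encode("utf-8"))
    return (file_count, total_size, newest_mtime, hasher.hexdigest())


SECOND_NS = 1_000_000_000

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old.txt").write_text("aaaa")
    (root / "new.txt").write_text("bbbb")
    os.utime(root / "old.txt", ns=(10 * SECOND_NS, 10 * SECOND_NS))
    os.utime(root / "new.txt", ns=(30 * SECOND_NS, 30 * SECOND_NS))
    before = simple_marker(root)

    # Touch the older file without making it the newest one.
    os.utime(root / "old.txt", ns=(20 * SECOND_NS, 20 * SECOND_NS))
    after = simple_marker(root)

    same_aggregates = before[:3] == after[:3]
    hash_changed = before[3] != after[3]
```

Here the three aggregate fields are identical before and after the touch, so only the hash distinguishes the two states when the markers are compared.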
start.sh CHANGED
@@ -540,10 +540,15 @@ fi
 # ── Trap SIGTERM for graceful shutdown ──
 graceful_shutdown() {
   echo "Shutting down..."
-  if [ -f "/home/node/app/openclaw-sync.py" ]; then
+  if [ -f "/home/node/app/openclaw-sync.py" ] && [ -n "${HF_TOKEN:-}" ]; then
     echo "Saving state before exit..."
+    timeout 8s python3 /home/node/app/openclaw-sync.py sync-once-settled || \
+      echo "Warning: could not complete settled shutdown sync"
+    sleep 1
     python3 /home/node/app/openclaw-sync.py sync-once || \
-      echo "Warning: could not complete shutdown sync"
+      echo "Warning: could not complete final shutdown sync"
+  elif [ -f "/home/node/app/openclaw-sync.py" ]; then
+    echo "HF_TOKEN not set; skipping shutdown backup sync."
   fi
   kill $(jobs -p) 2>/dev/null
   exit 0