Commit 0f21014 by somratpro · 1 Parent(s): c591c73

refactor: overhaul workspace sync to use robust huggingface_hub operations and local fingerprinting

Files changed (5)
  1. CHANGELOG.md +19 -0
  2. README.md +11 -10
  3. health-server.js +95 -16
  4. start.sh +17 -95
  5. workspace-sync.py +280 -226
CHANGELOG.md CHANGED
@@ -2,6 +2,25 @@
2
 
3
  All notable changes to this project will be documented in this file.
4
 
5
  ## [1.3.0] - 2026-04-04
6
 
7
  ### Added
 
2
 
3
  All notable changes to this project will be documented in this file.
4
 
5
+ ## [1.4.0] - 2026-04-25
6
+
7
+ ### Added
8
+
9
+ - **Custom OpenAI-compatible provider registration** - HuggingClaw can now register a custom provider at startup with `CUSTOM_PROVIDER_NAME`, `CUSTOM_BASE_URL`, and `CUSTOM_MODEL_ID`, so you can point `LLM_MODEL` at your own OpenAI-compatible endpoint without modifying the OpenClaw CLI
10
+
11
+ ### Changed
12
+
13
+ - **HF backup flow simplified** - HuggingClaw now uses `huggingface_hub` directly for restore and sync, matching the safer dataset-based pattern used in Hugging8n
14
+ - **HF username no longer required in most cases** - backup namespace resolution now works from `HF_USERNAME`, `SPACE_AUTHOR_NAME`, or the authenticated HF token, so `HF_TOKEN` is usually enough on its own
15
+ - **Startup restore path modernized** - startup now restores workspace and hidden state through `workspace-sync.py restore` instead of configuring a token-bearing git remote
16
+ - **README refreshed for the new backup model** - documentation now describes token-only backup setup, the removed git sync assumptions, and the hardened dashboard helper behavior
17
+
18
+ ### Fixed
19
+
20
+ - **HF token exposure risk in git remotes** - removed the old authenticated remote URL pattern that could leave `HF_TOKEN` embedded in workspace git configuration
21
+ - **Backup status detection mismatch** - dashboard and startup summary now treat backup as enabled when `HF_TOKEN` is present, which matches the new auto-namespace flow
22
+ - **UptimeRobot setup hardening gap** - dashboard setup now supports explicit enable/disable control, request rate limiting, origin validation, and earlier API-key validation
23
+
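The request rate limiting named in the last Fixed entry can be illustrated with a sliding-window counter. The sketch below is a Python rendering of that idea, not the project's code (the commit implements it in JavaScript in `health-server.js`). The class name `SlidingWindowLimiter`, the injectable `clock`, and the defaults of 5 requests per 60 seconds are assumptions chosen to match the values visible in this diff.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60.0  # assumed to mirror UPTIMEROBOT_RATE_WINDOW_MS (60 * 1000 ms)
MAX_REQUESTS = 5       # assumed to mirror the default UPTIMEROBOT_RATE_LIMIT_PER_MINUTE


class SlidingWindowLimiter:
    """Hypothetical per-client sliding-window limiter (illustration only)."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(list)

    def is_limited(self, client_ip: str) -> bool:
        now = self.clock()
        # Keep only timestamps that are still inside the window.
        recent = [ts for ts in self._hits[client_ip] if now - ts < self.window]
        # Record this request before deciding, as the diff's isRateLimited does.
        recent.append(now)
        self._hits[client_ip] = recent
        return len(recent) > self.max_requests
```

Because every request is recorded before the check, a client that keeps retrying while limited stays limited until it has been quiet for a full window.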
24
  ## [1.3.0] - 2026-04-04
25
 
26
  ### Added
README.md CHANGED
@@ -50,7 +50,7 @@ secrets:
50
  - ⚡ **Zero Config:** Duplicate this Space and set **just three** secrets (LLM_API_KEY, LLM_MODEL, GATEWAY_TOKEN) – no other setup needed.
51
  - 🐳 **Fast Builds:** Uses a pre-built OpenClaw Docker image to deploy in minutes.
52
  - 🌐 **Built-In Browser:** Headless Chromium is included in the Space, so browser actions work from the start.
53
- - 💾 **Workspace Backup:** Chats, settings, and WhatsApp session state sync to a private HF Dataset via the `huggingface_hub` (Git fallback), preserving data automatically.
54
  - ⏰ **External Keep-Alive:** Set up a one-time UptimeRobot monitor from the dashboard to help keep free HF Spaces awake.
55
  - 👥 **Multi-User Messaging:** Support for Telegram (multi-user) and WhatsApp (pairing).
56
  - 📊 **Visual Dashboard:** Beautiful Web UI to monitor uptime, sync status, and active models.
@@ -120,16 +120,16 @@ To use WhatsApp, enable the channel and scan the QR code from the Control UI (**
120
 
121
  ## 💾 Workspace Backup *(Optional)*
122
 
123
- For persistent chat history and configuration, HuggingClaw can sync your workspace to a private HuggingFace Dataset. On first run, it will automatically create (or use) the Dataset repo `HF_USERNAME/SPACE-backup`, restore your workspace on startup, and sync changes periodically.
 
 
124
 
125
  | Variable | Default | Description |
126
  | :--- | :--- | :--- |
127
- | `HF_USERNAME` | - | Your HuggingFace username |
128
  | `HF_TOKEN` | - | HF token with write access |
129
  | `BACKUP_DATASET_NAME` | `huggingclaw-backup` | Dataset name for backup repo |
130
  | `SYNC_INTERVAL` | `180` | Sync interval in seconds |
131
- | `WORKSPACE_GIT_USER` | `openclaw@example.com` | Git commit email for syncs |
132
- | `WORKSPACE_GIT_NAME` | `OpenClaw Bot` | Git commit name for syncs |
133
 
134
  > [!TIP]
135
  > This backup also stores a hidden copy of your WhatsApp session credentials, allowing paired logins to survive Space restarts automatically.
@@ -155,6 +155,7 @@ What happens next:
155
  - You only need to do this once
156
 
157
  You do **not** need to add this key to Hugging Face Space Secrets.
 
158
 
159
  Note:
160
 
@@ -290,7 +291,7 @@ openclaw channels login --gateway https://YOUR_SPACE_NAME.hf.space
290
  HuggingClaw/
291
  ├── Dockerfile # Multi-stage build using pre-built OpenClaw image
292
  ├── start.sh # Config generator, validator, and orchestrator
293
- ├── workspace-sync.py # Syncs workspace to HF Datasets (with Git fallback)
294
  ├── health-server.js # /health endpoint for uptime checks
295
  ├── dns-fix.js # DNS-over-HTTPS fallback (for blocked domains)
296
  ├── .env.example # Environment variable reference
@@ -299,8 +300,8 @@ HuggingClaw/
299
  **Startup sequence:**
300
  1. Validate required secrets (fail fast with clear error).
301
  2. Check HF token (warn if expired or missing).
302
- 3. Auto-create backup dataset if missing.
303
- 4. Restore workspace from HF Dataset.
304
  5. Generate `openclaw.json` from environment variables.
305
  6. Print startup summary.
306
  7. Launch background tasks (auto-sync and optional channel helpers).
@@ -312,11 +313,11 @@ HuggingClaw/
312
 
313
  - **Missing secrets:** Ensure `LLM_API_KEY`, `LLM_MODEL`, and `GATEWAY_TOKEN` are set in your Space **Settings → Secrets**.
314
  - **Telegram bot issues:** Verify your `TELEGRAM_BOT_TOKEN`. Check Space logs for lines like `📱 Enabling Telegram`.
315
- - **Backup restore failing:** Make sure `HF_USERNAME` and `HF_TOKEN` are correct (token needs write access to your Dataset).
316
  - **Space keeps sleeping:** Open `/` and use `Keep Space Awake` to create the external monitor.
317
  - **Auth errors / proxy:** If you see reverse-proxy auth errors, add the logged IPs under `TRUSTED_PROXIES` (from logs `remote=x.x.x.x`).
318
  - **Control UI says too many failed authentication attempts:** Wait for the retry window to expire, then open the Space in an incognito window or clear site storage for your Space before logging in again with `GATEWAY_TOKEN`.
319
- - **WhatsApp lost its session after restart:** Make sure `HF_USERNAME` and `HF_TOKEN` are configured so the hidden session backup can be restored on boot.
320
  - **UI blocked (CORS):** Set `ALLOWED_ORIGINS=https://your-space-name.hf.space`.
321
  - **Version mismatches:** Pin a specific OpenClaw build with the `OPENCLAW_VERSION` Variable in HF Spaces, or `--build-arg OPENCLAW_VERSION=...` locally.
322
 
 
50
  - ⚡ **Zero Config:** Duplicate this Space and set **just three** secrets (LLM_API_KEY, LLM_MODEL, GATEWAY_TOKEN) – no other setup needed.
51
  - 🐳 **Fast Builds:** Uses a pre-built OpenClaw Docker image to deploy in minutes.
52
  - 🌐 **Built-In Browser:** Headless Chromium is included in the Space, so browser actions work from the start.
53
+ - 💾 **Workspace Backup:** Chats, settings, and WhatsApp session state sync to a private HF Dataset via the `huggingface_hub`, preserving data automatically without storing your HF token in a git remote.
54
  - ⏰ **External Keep-Alive:** Set up a one-time UptimeRobot monitor from the dashboard to help keep free HF Spaces awake.
55
  - 👥 **Multi-User Messaging:** Support for Telegram (multi-user) and WhatsApp (pairing).
56
  - 📊 **Visual Dashboard:** Beautiful Web UI to monitor uptime, sync status, and active models.
 
120
 
121
  ## 💾 Workspace Backup *(Optional)*
122
 
123
+ As of **v1.4.0**, HuggingClaw uses the safer API-based backup flow by default: no token-bearing git remote is configured, and `HF_USERNAME` is usually optional.
124
+
125
+ For persistent chat history and configuration, HuggingClaw can sync your workspace to a private HuggingFace Dataset. On first run, it will automatically create (or use) the Dataset repo `<your-account>/huggingclaw-backup`, restore your workspace on startup, and sync changes periodically. In most cases, `HF_USERNAME` is no longer required because HuggingClaw can derive the namespace from your token automatically.
126
 
127
  | Variable | Default | Description |
128
  | :--- | :--- | :--- |
129
+ | `HF_USERNAME` | - | Optional override for the HuggingFace username/namespace |
130
  | `HF_TOKEN` | - | HF token with write access |
131
  | `BACKUP_DATASET_NAME` | `huggingclaw-backup` | Dataset name for backup repo |
132
  | `SYNC_INTERVAL` | `180` | Sync interval in seconds |
 
 
133
 
134
  > [!TIP]
135
  > This backup also stores a hidden copy of your WhatsApp session credentials, allowing paired logins to survive Space restarts automatically.
 
155
  - You only need to do this once
156
 
157
  You do **not** need to add this key to Hugging Face Space Secrets.
158
+ The dashboard helper also rate-limits setup requests and rejects cross-origin submissions.
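Rejecting cross-origin submissions amounts to comparing the request's `Host` header with the host named in `Origin` and `Referer`. The sketch below is one way to express that check in Python, assuming an exact match on the parsed host rather than a substring match. The function names are hypothetical and this is not the project's implementation, which lives in `health-server.js`.

```python
from urllib.parse import urlsplit


def header_host(value: str) -> str:
    """Return the lower-cased host[:port] of an Origin/Referer URL, or ''."""
    return urlsplit(value.strip()).netloc.lower()


def is_same_origin(host: str, origin: str = "", referer: str = "") -> bool:
    """Hypothetical check: every header that is present must name our host."""
    host = host.strip().lower()
    if not host:
        return False
    for value in (origin, referer):
        # Absent headers are tolerated; present ones must match exactly.
        if value and header_host(value) != host:
            return False
    return True
```

Comparing parsed hosts means a look-alike such as `https://my-space.hf.space.evil.example` does not pass merely because it contains the Space's hostname, and an opaque `Origin: null` is rejected because it parses to an empty host.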
159
 
160
  Note:
161
 
 
291
  HuggingClaw/
292
  ├── Dockerfile # Multi-stage build using pre-built OpenClaw image
293
  ├── start.sh # Config generator, validator, and orchestrator
294
+ ├── workspace-sync.py # Syncs workspace/state to HF Datasets via huggingface_hub
295
  ├── health-server.js # /health endpoint for uptime checks
296
  ├── dns-fix.js # DNS-over-HTTPS fallback (for blocked domains)
297
  ├── .env.example # Environment variable reference
 
300
  **Startup sequence:**
301
  1. Validate required secrets (fail fast with clear error).
302
  2. Check HF token (warn if expired or missing).
303
+ 3. Resolve the backup namespace from `HF_USERNAME`, `SPACE_AUTHOR_NAME`, or the HF token.
304
+ 4. Auto-create backup dataset if missing and restore workspace/state from it.
305
  5. Generate `openclaw.json` from environment variables.
306
  6. Print startup summary.
307
  7. Launch background tasks (auto-sync and optional channel helpers).
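
Step 3 of the startup sequence, resolving the backup namespace, can be sketched as a small fallback chain. This is an illustration under assumptions, not the code in `workspace-sync.py`: the function names are hypothetical, the precedence order (`HF_USERNAME`, then `SPACE_AUTHOR_NAME`, then the token) is inferred from the order in which the sources are listed, and `whoami` is an injected callable that in practice could wrap `huggingface_hub`'s `HfApi().whoami(token=...)`, which returns a dict containing a `name` field.

```python
from typing import Callable, Mapping, Optional


def resolve_namespace(env: Mapping[str, str],
                      whoami: Optional[Callable[[str], dict]] = None) -> Optional[str]:
    """Hypothetical resolver: explicit variables first, then the token's owner."""
    for key in ("HF_USERNAME", "SPACE_AUTHOR_NAME"):
        value = (env.get(key) or "").strip()
        if value:
            return value
    token = (env.get("HF_TOKEN") or "").strip()
    if token and whoami is not None:
        try:
            name = str(whoami(token).get("name") or "").strip()
        except Exception:
            # A failed lookup means "no namespace", not a crash at startup.
            return None
        return name or None
    return None


def backup_repo_id(env: Mapping[str, str],
                   whoami: Optional[Callable[[str], dict]] = None) -> Optional[str]:
    """Combine the namespace with BACKUP_DATASET_NAME (default from the README)."""
    namespace = resolve_namespace(env, whoami)
    if not namespace:
        return None
    dataset = (env.get("BACKUP_DATASET_NAME") or "").strip() or "huggingclaw-backup"
    return f"{namespace}/{dataset}"
```

Passing the lookup in as a parameter keeps the fallback order testable without network access or a real token.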
 
313
 
314
  - **Missing secrets:** Ensure `LLM_API_KEY`, `LLM_MODEL`, and `GATEWAY_TOKEN` are set in your Space **Settings → Secrets**.
315
  - **Telegram bot issues:** Verify your `TELEGRAM_BOT_TOKEN`. Check Space logs for lines like `📱 Enabling Telegram`.
316
+ - **Backup restore failing:** Make sure `HF_TOKEN` is valid and has write access to your HF account dataset. Set `HF_USERNAME` only if auto-detection is not available in your environment.
317
  - **Space keeps sleeping:** Open `/` and use `Keep Space Awake` to create the external monitor.
318
  - **Auth errors / proxy:** If you see reverse-proxy auth errors, add the logged IPs under `TRUSTED_PROXIES` (from logs `remote=x.x.x.x`).
319
  - **Control UI says too many failed authentication attempts:** Wait for the retry window to expire, then open the Space in an incognito window or clear site storage for your Space before logging in again with `GATEWAY_TOKEN`.
320
+ - **WhatsApp lost its session after restart:** Make sure `HF_TOKEN` is configured so the hidden session backup can be restored on boot.
321
  - **UI blocked (CORS):** Set `ALLOWED_ORIGINS=https://your-space-name.hf.space`.
322
  - **Version mismatches:** Pin a specific OpenClaw build with the `OPENCLAW_VERSION` Variable in HF Spaces, or `--build-arg OPENCLAW_VERSION=...` locally.
323
 
health-server.js CHANGED
@@ -12,7 +12,7 @@ const LLM_MODEL = process.env.LLM_MODEL || "Not Set";
12
  const TELEGRAM_ENABLED = !!process.env.TELEGRAM_BOT_TOKEN;
13
  const WHATSAPP_ENABLED = /^true$/i.test(process.env.WHATSAPP_ENABLED || "");
14
  const WHATSAPP_STATUS_FILE = "/tmp/huggingclaw-wa-status.json";
15
- const HF_BACKUP_ENABLED = !!(process.env.HF_USERNAME && process.env.HF_TOKEN);
16
  const SYNC_INTERVAL = process.env.SYNC_INTERVAL || "600";
17
  const DASHBOARD_BASE = "/dashboard";
18
  const DASHBOARD_STATUS_PATH = `${DASHBOARD_BASE}/status`;
@@ -20,8 +20,16 @@ const DASHBOARD_HEALTH_PATH = `${DASHBOARD_BASE}/health`;
20
  const DASHBOARD_UPTIMEROBOT_PATH = `${DASHBOARD_BASE}/uptimerobot/setup`;
21
  const DASHBOARD_APP_BASE = `${DASHBOARD_BASE}/app`;
22
  const APP_BASE = "/app";
23
  const SPACE_VISIBILITY_TTL_MS = 10 * 60 * 1000;
24
  const spaceVisibilityCache = new Map();
 
25
 
26
  function parseRequestUrl(url) {
27
  try {
@@ -80,19 +88,59 @@ function appendForwarded(existingValue, nextValue) {
80
  return `${existingValue}, ${cleanNext}`;
81
  }
82
 
83
  function buildProxyHeaders(headers, remoteAddress) {
84
  return {
85
  ...headers,
86
- host: headers.host || `${GATEWAY_HOST}:${GATEWAY_PORT}`,
87
- "x-forwarded-for": appendForwarded(
88
- headers["x-forwarded-for"],
89
- remoteAddress,
90
- ),
91
- "x-forwarded-host": headers["x-forwarded-host"] || headers.host || "",
92
  "x-forwarded-proto": headers["x-forwarded-proto"] || "https",
93
  };
94
  }
95
 
96
  function readSyncStatus() {
97
  try {
98
  if (fs.existsSync("/tmp/sync-status.json")) {
@@ -231,7 +279,13 @@ function renderSyncBadge(syncData) {
231
 
232
  function renderDashboard(initialData) {
233
  const controlUiHref = `${APP_BASE}/`;
234
- const keepAwakeHtml = initialData.spacePrivate
 
 
 
 
 
 
235
  ? `
236
  <div id="uptimerobot-private-note" class="helper-summary">
237
  <strong>This Space is private.</strong> External monitors cannot reliably access private HF health URLs, so keep-awake setup is only available on public Spaces.
@@ -739,6 +793,7 @@ function renderDashboard(initialData) {
739
 
740
  const monitorStateKey = 'huggingclaw_uptimerobot_setup_v1';
741
  const KEEP_AWAKE_PRIVATE = ${initialData.spacePrivate ? "true" : "false"};
 
742
 
743
  function setMonitorUiState(isConfigured) {
744
  const summary = document.getElementById('uptimerobot-summary');
@@ -824,7 +879,7 @@ function renderDashboard(initialData) {
824
  updateStats();
825
  setInterval(updateStats, 10000);
826
  document.getElementById('control-ui-link').setAttribute('href', getDashboardBase() + '/app/' + getCurrentSearch());
827
- if (!KEEP_AWAKE_PRIVATE) {
828
  restoreMonitorUiState();
829
  document.getElementById('uptimerobot-btn').addEventListener('click', setupUptimeRobot);
830
  document.getElementById('uptimerobot-toggle').addEventListener('click', toggleMonitorSetup);
@@ -942,13 +997,14 @@ async function createUptimeRobotMonitor(apiKey, host) {
942
  }
943
 
944
  function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
 
945
  const proxyReq = http.request(
946
  {
947
  hostname: GATEWAY_HOST,
948
  port: proxyPort,
949
  method: req.method,
950
  path: proxyPath,
951
- headers: buildProxyHeaders(req.headers, req.socket.remoteAddress),
952
  },
953
  (proxyRes) => {
954
  res.writeHead(proxyRes.statusCode || 502, proxyRes.headers);
@@ -970,7 +1026,7 @@ function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
970
  req.pipe(proxyReq);
971
  }
972
 
973
- function serializeUpgradeHeaders(req, remoteAddress) {
974
  const forwardedHeaders = [];
975
 
976
  for (let i = 0; i < req.rawHeaders.length; i += 2) {
@@ -979,6 +1035,7 @@ function serializeUpgradeHeaders(req, remoteAddress) {
979
  const lower = name.toLowerCase();
980
 
981
  if (
 
982
  lower === "x-forwarded-for" ||
983
  lower === "x-forwarded-host" ||
984
  lower === "x-forwarded-proto"
@@ -990,10 +1047,13 @@ function serializeUpgradeHeaders(req, remoteAddress) {
990
  }
991
 
992
  forwardedHeaders.push(
993
- `X-Forwarded-For: ${appendForwarded(req.headers["x-forwarded-for"], remoteAddress)}`,
994
  );
995
  forwardedHeaders.push(
996
- `X-Forwarded-Host: ${req.headers["x-forwarded-host"] || req.headers.host || ""}`,
 
 
 
997
  );
998
  forwardedHeaders.push(
999
  `X-Forwarded-Proto: ${req.headers["x-forwarded-proto"] || "https"}`,
@@ -1010,11 +1070,12 @@ function proxyUpgrade(
1010
  proxyPort = GATEWAY_PORT,
1011
  ) {
1012
  const proxySocket = net.connect(proxyPort, GATEWAY_HOST);
 
1013
 
1014
  proxySocket.on("connect", () => {
1015
  const requestLines = [
1016
  `${req.method} ${proxyPath} HTTP/${req.httpVersion}`,
1017
- ...serializeUpgradeHeaders(req, req.socket.remoteAddress),
1018
  "",
1019
  "",
1020
  ];
@@ -1092,15 +1153,33 @@ const server = http.createServer((req, res) => {
1092
 
1093
  void (async () => {
1094
  try {
1095
  const body = await readRequestBody(req);
1096
  const parsed = JSON.parse(body || "{}");
1097
  const apiKey = String(parsed.apiKey || "").trim();
1098
 
1099
- if (!apiKey) {
1100
  res.writeHead(400, { "Content-Type": "application/json" });
1101
  res.end(
1102
  JSON.stringify({
1103
- message: "Paste your UptimeRobot Main API key first.",
1104
  }),
1105
  );
1106
  return;
 
12
  const TELEGRAM_ENABLED = !!process.env.TELEGRAM_BOT_TOKEN;
13
  const WHATSAPP_ENABLED = /^true$/i.test(process.env.WHATSAPP_ENABLED || "");
14
  const WHATSAPP_STATUS_FILE = "/tmp/huggingclaw-wa-status.json";
15
+ const HF_BACKUP_ENABLED = !!process.env.HF_TOKEN;
16
  const SYNC_INTERVAL = process.env.SYNC_INTERVAL || "600";
17
  const DASHBOARD_BASE = "/dashboard";
18
  const DASHBOARD_STATUS_PATH = `${DASHBOARD_BASE}/status`;
 
20
  const DASHBOARD_UPTIMEROBOT_PATH = `${DASHBOARD_BASE}/uptimerobot/setup`;
21
  const DASHBOARD_APP_BASE = `${DASHBOARD_BASE}/app`;
22
  const APP_BASE = "/app";
23
+ const UPTIMEROBOT_SETUP_ENABLED =
24
+ String(process.env.UPTIMEROBOT_SETUP_ENABLED || "true").toLowerCase() ===
25
+ "true";
26
+ const UPTIMEROBOT_RATE_WINDOW_MS = 60 * 1000;
27
+ const UPTIMEROBOT_RATE_MAX = Number(
28
+ process.env.UPTIMEROBOT_RATE_LIMIT_PER_MINUTE || 5,
29
+ );
30
  const SPACE_VISIBILITY_TTL_MS = 10 * 60 * 1000;
31
  const spaceVisibilityCache = new Map();
32
+ const uptimerobotRateMap = new Map();
33
 
34
  function parseRequestUrl(url) {
35
  try {
 
88
  return `${existingValue}, ${cleanNext}`;
89
  }
90
 
91
+ function getForwardedClientIp(req) {
92
+ const forwardedFor = req.headers["x-forwarded-for"];
93
+ if (Array.isArray(forwardedFor) && forwardedFor.length > 0) {
94
+ return String(forwardedFor[0]).split(",")[0].trim();
95
+ }
96
+ if (typeof forwardedFor === "string" && forwardedFor.trim()) {
97
+ return forwardedFor.split(",")[0].trim();
98
+ }
99
+ return req.socket.remoteAddress || "";
100
+ }
101
+
102
  function buildProxyHeaders(headers, remoteAddress) {
103
  return {
104
  ...headers,
105
+ host: `${GATEWAY_HOST}:${GATEWAY_PORT}`,
106
+ "x-forwarded-for": remoteAddress || "",
107
+ "x-forwarded-host": headers.host || "",
 
 
 
108
  "x-forwarded-proto": headers["x-forwarded-proto"] || "https",
109
  };
110
  }
111
 
112
+ function getRequesterIp(req) {
113
+ return (
114
+ getForwardedClientIp(req) ||
115
+ req.socket.remoteAddress ||
116
+ "unknown"
117
+ );
118
+ }
119
+
120
+ function isRateLimited(req) {
121
+ const now = Date.now();
122
+ const ip = getRequesterIp(req);
123
+ const bucket = uptimerobotRateMap.get(ip) || [];
124
+ const recent = bucket.filter((ts) => now - ts < UPTIMEROBOT_RATE_WINDOW_MS);
125
+ recent.push(now);
126
+ uptimerobotRateMap.set(ip, recent);
127
+ return recent.length > UPTIMEROBOT_RATE_MAX;
128
+ }
129
+
130
+ function isAllowedUptimeSetupOrigin(req) {
131
+ const host = String(req.headers.host || "").toLowerCase();
132
+ const origin = String(req.headers.origin || "").toLowerCase();
133
+ const referer = String(req.headers.referer || "").toLowerCase();
134
+ if (!host) return false;
135
+ if (origin && !origin.includes(host)) return false;
136
+ if (referer && !referer.includes(host)) return false;
137
+ return true;
138
+ }
139
+
140
+ function isValidUptimeApiKey(key) {
141
+ return /^[A-Za-z0-9_-]{20,128}$/.test(String(key || ""));
142
+ }
143
+
144
  function readSyncStatus() {
145
  try {
146
  if (fs.existsSync("/tmp/sync-status.json")) {
 
279
 
280
  function renderDashboard(initialData) {
281
  const controlUiHref = `${APP_BASE}/`;
282
+ const keepAwakeHtml = !UPTIMEROBOT_SETUP_ENABLED
283
+ ? `
284
+ <div id="uptimerobot-private-note" class="helper-summary">
285
+ UptimeRobot setup is disabled for this Space.
286
+ </div>
287
+ `
288
+ : initialData.spacePrivate
289
  ? `
290
  <div id="uptimerobot-private-note" class="helper-summary">
291
  <strong>This Space is private.</strong> External monitors cannot reliably access private HF health URLs, so keep-awake setup is only available on public Spaces.
 
793
 
794
  const monitorStateKey = 'huggingclaw_uptimerobot_setup_v1';
795
  const KEEP_AWAKE_PRIVATE = ${initialData.spacePrivate ? "true" : "false"};
796
+ const KEEP_AWAKE_SETUP_ENABLED = ${UPTIMEROBOT_SETUP_ENABLED ? "true" : "false"};
797
 
798
  function setMonitorUiState(isConfigured) {
799
  const summary = document.getElementById('uptimerobot-summary');
 
879
  updateStats();
880
  setInterval(updateStats, 10000);
881
  document.getElementById('control-ui-link').setAttribute('href', getDashboardBase() + '/app/' + getCurrentSearch());
882
+ if (KEEP_AWAKE_SETUP_ENABLED && !KEEP_AWAKE_PRIVATE) {
883
  restoreMonitorUiState();
884
  document.getElementById('uptimerobot-btn').addEventListener('click', setupUptimeRobot);
885
  document.getElementById('uptimerobot-toggle').addEventListener('click', toggleMonitorSetup);
 
997
  }
998
 
999
  function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
1000
+ const clientIp = getForwardedClientIp(req);
1001
  const proxyReq = http.request(
1002
  {
1003
  hostname: GATEWAY_HOST,
1004
  port: proxyPort,
1005
  method: req.method,
1006
  path: proxyPath,
1007
+ headers: buildProxyHeaders(req.headers, clientIp),
1008
  },
1009
  (proxyRes) => {
1010
  res.writeHead(proxyRes.statusCode || 502, proxyRes.headers);
 
1026
  req.pipe(proxyReq);
1027
  }
1028
 
1029
+ function serializeUpgradeHeaders(req, clientIp, proxyPort) {
1030
  const forwardedHeaders = [];
1031
 
1032
  for (let i = 0; i < req.rawHeaders.length; i += 2) {
 
1035
  const lower = name.toLowerCase();
1036
 
1037
  if (
1038
+ lower === "host" ||
1039
  lower === "x-forwarded-for" ||
1040
  lower === "x-forwarded-host" ||
1041
  lower === "x-forwarded-proto"
 
1047
  }
1048
 
1049
  forwardedHeaders.push(
1050
+ `Host: ${GATEWAY_HOST}:${proxyPort}`,
1051
  );
1052
  forwardedHeaders.push(
1053
+ `X-Forwarded-For: ${clientIp || ""}`,
1054
+ );
1055
+ forwardedHeaders.push(
1056
+ `X-Forwarded-Host: ${req.headers.host || ""}`,
1057
  );
1058
  forwardedHeaders.push(
1059
  `X-Forwarded-Proto: ${req.headers["x-forwarded-proto"] || "https"}`,
 
1070
  proxyPort = GATEWAY_PORT,
1071
  ) {
1072
  const proxySocket = net.connect(proxyPort, GATEWAY_HOST);
1073
+ const clientIp = getForwardedClientIp(req);
1074
 
1075
  proxySocket.on("connect", () => {
1076
  const requestLines = [
1077
  `${req.method} ${proxyPath} HTTP/${req.httpVersion}`,
1078
+ ...serializeUpgradeHeaders(req, clientIp, proxyPort),
1079
  "",
1080
  "",
1081
  ];
 
1153
 
1154
  void (async () => {
1155
  try {
1156
+ if (!UPTIMEROBOT_SETUP_ENABLED) {
1157
+ res.writeHead(403, { "Content-Type": "application/json" });
1158
+ res.end(JSON.stringify({ message: "Uptime setup is disabled." }));
1159
+ return;
1160
+ }
1161
+
1162
+ if (isRateLimited(req)) {
1163
+ res.writeHead(429, { "Content-Type": "application/json" });
1164
+ res.end(JSON.stringify({ message: "Too many requests." }));
1165
+ return;
1166
+ }
1167
+
1168
+ if (!isAllowedUptimeSetupOrigin(req)) {
1169
+ res.writeHead(403, { "Content-Type": "application/json" });
1170
+ res.end(JSON.stringify({ message: "Invalid request origin." }));
1171
+ return;
1172
+ }
1173
+
1174
  const body = await readRequestBody(req);
1175
  const parsed = JSON.parse(body || "{}");
1176
  const apiKey = String(parsed.apiKey || "").trim();
1177
 
1178
+ if (!isValidUptimeApiKey(apiKey)) {
1179
  res.writeHead(400, { "Content-Type": "application/json" });
1180
  res.end(
1181
  JSON.stringify({
1182
+ message: "A valid API key is required.",
1183
  }),
1184
  );
1185
  return;
start.sh CHANGED
@@ -1,5 +1,7 @@
1
  #!/bin/bash
2
- set -e
 
 
3
 
4
  # ════════════════════════════════════════════════════════════════
5
  # HuggingClaw - OpenClaw Gateway for HF Spaces
@@ -127,95 +129,13 @@ if [ -n "$HF_TOKEN" ]; then
127
  fi
128
  fi
129
 
130
- # ── Auto-create + Restore workspace from HF Dataset ──
131
- if [ -n "$HF_USERNAME" ] && [ -n "$HF_TOKEN" ]; then
132
- BACKUP_DATASET="${BACKUP_DATASET_NAME:-huggingclaw-backup}"
133
- BACKUP_URL="https://${HF_USERNAME}:${HF_TOKEN}@huggingface.co/datasets/${HF_USERNAME}/${BACKUP_DATASET}"
134
-
135
- # Auto-create the dataset if it doesn't exist
136
- echo "📦 Checking HF Dataset: ${HF_USERNAME}/${BACKUP_DATASET}..."
137
- DATASET_CHECK=$(curl -s -o /dev/null -w "%{http_code}" \
138
- -H "Authorization: Bearer $HF_TOKEN" \
139
- "https://huggingface.co/api/datasets/${HF_USERNAME}/${BACKUP_DATASET}" \
140
- --max-time 10 2>/dev/null || echo "000")
141
-
142
- if [ "$DATASET_CHECK" = "404" ]; then
143
- echo " 📝 Dataset not found, creating ${HF_USERNAME}/${BACKUP_DATASET}..."
144
- CREATE_RESULT=$(curl -s -w "\n%{http_code}" \
145
- -X POST "https://huggingface.co/api/repos/create" \
146
- -H "Authorization: Bearer $HF_TOKEN" \
147
- -H "Content-Type: application/json" \
148
- -d "{\"type\":\"dataset\",\"name\":\"${BACKUP_DATASET}\",\"private\":true}" \
149
- --max-time 15 2>/dev/null || echo "error")
150
- CREATE_STATUS=$(echo "$CREATE_RESULT" | tail -1)
151
- if [ "$CREATE_STATUS" = "200" ] || [ "$CREATE_STATUS" = "201" ]; then
152
- echo " ✅ Dataset created: ${HF_USERNAME}/${BACKUP_DATASET} (private)"
153
- else
154
- echo " ⚠️ Could not create dataset (HTTP $CREATE_STATUS). Create it manually:"
155
- echo " https://huggingface.co/datasets/create"
156
- fi
157
- elif [ "$DATASET_CHECK" = "200" ]; then
158
- echo " ✅ Dataset exists"
159
- else
160
- echo " ⚠️ Could not check dataset (HTTP $DATASET_CHECK)"
161
- fi
162
-
163
- # Restore workspace
164
- echo "📦 Restoring workspace..."
165
- WORKSPACE="/home/node/.openclaw/workspace"
166
- GIT_USER_EMAIL="${WORKSPACE_GIT_USER:-openclaw@example.com}"
167
- GIT_USER_NAME="${WORKSPACE_GIT_NAME:-OpenClaw Bot}"
168
-
169
- cd "$WORKSPACE"
170
- if [ ! -d ".git" ]; then
171
- git init -q
172
- git remote add origin "$BACKUP_URL"
173
- else
174
- git remote set-url origin "$BACKUP_URL"
175
- fi
176
-
177
- git config user.email "$GIT_USER_EMAIL"
178
- git config user.name "$GIT_USER_NAME"
179
-
180
- if git fetch origin main 2>/dev/null; then
181
- git reset --hard origin/main 2>/dev/null && echo " ✅ Workspace restored!"
182
- else
183
- echo " ⚠️ No remote data yet, starting fresh."
184
- fi
185
- cd /
186
- fi
187
-
188
- # ── Restore persisted OpenClaw state (if present) ──
189
- STATE_BACKUP_ROOT="/home/node/.openclaw/workspace/.huggingclaw-state/openclaw"
190
- if [ -d "$STATE_BACKUP_ROOT" ]; then
191
- echo "🧠 Restoring OpenClaw state..."
192
- for source_path in "$STATE_BACKUP_ROOT"/*; do
193
- [ -e "$source_path" ] || continue
194
- name="$(basename "$source_path")"
195
- target_path="/home/node/.openclaw/${name}"
196
-
197
- rm -rf "$target_path"
198
- mkdir -p "$(dirname "$target_path")"
199
- cp -R "$source_path" "$target_path"
200
- done
201
- echo " ✅ OpenClaw state restored"
202
- fi
203
-
204
- # ── Restore persisted WhatsApp credentials (if present) ──
205
- WA_BACKUP_DIR="/home/node/.openclaw/workspace/.huggingclaw-state/credentials/whatsapp/default"
206
- WA_CREDS_DIR="/home/node/.openclaw/credentials/whatsapp/default"
207
- if [ "$WHATSAPP_ENABLED_NORMALIZED" = "true" ] && [ -d "$WA_BACKUP_DIR" ]; then
208
- WA_FILE_COUNT=$(find "$WA_BACKUP_DIR" -type f | wc -l | tr -d ' ')
209
- if [ "$WA_FILE_COUNT" -ge 2 ]; then
210
- echo "📱 Restoring WhatsApp credentials..."
211
- rm -rf "$WA_CREDS_DIR"
212
- mkdir -p "$(dirname "$WA_CREDS_DIR")"
213
- cp -R "$WA_BACKUP_DIR" "$WA_CREDS_DIR"
214
- chmod -R go-rwx /home/node/.openclaw/credentials/whatsapp 2>/dev/null || true
215
- echo " ✅ WhatsApp credentials restored"
216
- else
217
- echo " ⚠️ Saved WhatsApp credentials look incomplete (${WA_FILE_COUNT} files), skipping restore."
218
- fi
219
  fi
220
 
221
  # ── Build config ──
@@ -419,8 +339,8 @@ printf " │ %-40s │\n" "Browser: ✅ ${BROWSER_EXECUTABLE_PATH}"
419
  else
420
  printf " │ %-40s │\n" "Browser: ❌ unavailable"
421
  fi
422
- if [ -n "$HF_USERNAME" ] && [ -n "$HF_TOKEN" ]; then
423
- printf " │ %-40s │\n" "Backup: ✅ ${HF_USERNAME}/${BACKUP_DATASET:-huggingclaw-backup}"
424
  else
425
  printf " │ %-40s │\n" "Backup: ❌ not configured"
426
  fi
@@ -434,7 +354,7 @@ printf " │ %-40s │\n" "Control UI: https://${SPACE_HOST}/app"
434
  printf " │ %-40s │\n" "Dashboard: https://${SPACE_HOST}"
435
  fi
436
  SYNC_STATUS="❌ disabled"
437
- if [ -n "$HF_USERNAME" ] && [ -n "$HF_TOKEN" ]; then
438
  SYNC_STATUS="✅ every ${SYNC_INTERVAL:-180}s"
439
  fi
440
  printf " │ %-40s │\n" "Auto-sync: $SYNC_STATUS"
@@ -459,7 +379,7 @@ graceful_shutdown() {
459
 
460
  if [ -f "/home/node/app/workspace-sync.py" ]; then
461
  echo "💾 Saving OpenClaw state before exit..."
462
- python3 /home/node/app/workspace-sync.py --sync-once || \
463
  echo " ⚠️ Could not complete shutdown sync"
464
  fi
465
 
@@ -530,7 +450,9 @@ fi
530
  warmup_browser
531
 
532
  # 12. Start Workspace Sync after startup settles
533
- python3 -u /home/node/app/workspace-sync.py &
 
 
534
 
535
  # Wait for gateway (allows trap to fire)
536
  wait $GATEWAY_PID
 
1
  #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ umask 0077
5
 
6
  # ════════════════════════════════════════════════════════════════
7
  # HuggingClaw - OpenClaw Gateway for HF Spaces
 
129
  fi
130
  fi
131
 
132
+ # ── Restore workspace/state from HF Dataset ──
133
+ BACKUP_DATASET="${BACKUP_DATASET_NAME:-huggingclaw-backup}"
134
+ if [ -n "${HF_TOKEN:-}" ]; then
135
+ echo "📦 Restoring workspace and state from HF Dataset..."
136
+ python3 /home/node/app/workspace-sync.py restore || true
137
+ else
138
+ echo "HF_TOKEN is not set. Running without dataset persistence."
139
  fi
140
 
141
  # ── Build config ──

339
  else
340
  printf " │ %-40s │\n" "Browser: ❌ unavailable"
341
  fi
342
+ if [ -n "${HF_TOKEN:-}" ]; then
343
+ printf " │ %-40s │\n" "Backup: ✅ ${BACKUP_DATASET:-huggingclaw-backup} (auto namespace)"
344
  else
345
  printf " │ %-40s │\n" "Backup: ❌ not configured"
346
  fi
 
354
  printf " │ %-40s │\n" "Dashboard: https://${SPACE_HOST}"
355
  fi
356
  SYNC_STATUS="❌ disabled"
357
+ if [ -n "${HF_TOKEN:-}" ]; then
358
  SYNC_STATUS="✅ every ${SYNC_INTERVAL:-180}s"
359
  fi
360
  printf " │ %-40s │\n" "Auto-sync: $SYNC_STATUS"
 
379
 
380
  if [ -f "/home/node/app/workspace-sync.py" ]; then
381
  echo "💾 Saving OpenClaw state before exit..."
382
+ python3 /home/node/app/workspace-sync.py sync-once || \
383
  echo " ⚠️ Could not complete shutdown sync"
384
  fi
385
 
 
450
  warmup_browser
451
 
452
  # 12. Start Workspace Sync after startup settles
453
+ if [ -n "${HF_TOKEN:-}" ]; then
454
+ python3 -u /home/node/app/workspace-sync.py loop &
455
+ fi
456
 
457
  # Wait for gateway (allows trap to fire)
458
  wait $GATEWAY_PID
workspace-sync.py CHANGED
@@ -1,24 +1,39 @@
1
  #!/usr/bin/env python3
2
  """
3
- HuggingClaw Workspace Sync - HuggingFace Hub based backup
4
- Uses huggingface_hub Python library instead of git for more reliable
5
- HF Dataset operations (handles auth, LFS, retries automatically).
6
 
7
- Falls back to git-based sync if HF_USERNAME or HF_TOKEN are not set.
 
 
8
  """
9
 
 
 
10
  import os
 
 
11
  import sys
 
 
12
  import time
13
- import signal
14
- import shutil
15
- import subprocess
16
  from pathlib import Path
17
 
18
  os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
19
 
 
 
 
20
  OPENCLAW_HOME = Path("/home/node/.openclaw")
21
  WORKSPACE = OPENCLAW_HOME / "workspace"
 
 
 
 
 
 
 
 
 
22
  STATE_DIR = WORKSPACE / ".huggingclaw-state"
23
  OPENCLAW_STATE_BACKUP_DIR = STATE_DIR / "openclaw"
24
  EXCLUDED_STATE_NAMES = {
@@ -27,41 +42,32 @@ EXCLUDED_STATE_NAMES = {
27
  "gateway.log",
28
  "browser",
29
  }
30
- WHATSAPP_CREDS_DIR = Path("/home/node/.openclaw/credentials/whatsapp/default")
31
  WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
32
  RESET_MARKER = WORKSPACE / ".reset_credentials"
33
- INTERVAL = int(os.environ.get("SYNC_INTERVAL", "180"))
34
- INITIAL_DELAY = int(os.environ.get("SYNC_START_DELAY", "10"))
35
- HF_TOKEN = os.environ.get("HF_TOKEN", "")
36
- HF_USERNAME = os.environ.get("HF_USERNAME", "")
37
- BACKUP_DATASET = os.environ.get("BACKUP_DATASET_NAME", "huggingclaw-backup")
38
- WEBHOOK_URL = os.environ.get("WEBHOOK_URL", "")
39
- WHATSAPP_ENABLED = os.environ.get("WHATSAPP_ENABLED", "").strip().lower() == "true"
40
-
41
- running = True
42
 
43
- def signal_handler(sig, frame):
44
- global running
45
- running = False
46
 
47
- signal.signal(signal.SIGTERM, signal_handler)
48
- signal.signal(signal.SIGINT, signal_handler)
 
 
 
 
 
 
 
49
 
50
 
51
  def count_files(path: Path) -> int:
52
- """Count regular files recursively under a path."""
53
  if not path.exists():
54
  return 0
55
  return sum(1 for child in path.rglob("*") if child.is_file())
56
 
57
 
58
  def snapshot_state_into_workspace() -> None:
59
- """
60
- Mirror persistent state into the workspace-backed dataset repo.
61
-
62
- This keeps WhatsApp credentials in a hidden folder that is synced together
63
- with the workspace, without changing the live credentials location.
64
- """
65
  try:
66
  STATE_DIR.mkdir(parents=True, exist_ok=True)
67
  if OPENCLAW_STATE_BACKUP_DIR.exists():
@@ -77,8 +83,8 @@ def snapshot_state_into_workspace() -> None:
77
  shutil.copytree(source_path, backup_path)
78
  elif source_path.is_file():
79
  shutil.copy2(source_path, backup_path)
80
- except Exception as e:
81
- print(f" ⚠️ Could not snapshot OpenClaw state: {e}")
82
 
83
  try:
84
  if not WHATSAPP_ENABLED:
@@ -106,233 +112,281 @@ def snapshot_state_into_workspace() -> None:
106
  if WHATSAPP_BACKUP_DIR.exists():
107
  shutil.rmtree(WHATSAPP_BACKUP_DIR, ignore_errors=True)
108
  shutil.copytree(WHATSAPP_CREDS_DIR, WHATSAPP_BACKUP_DIR)
109
- except Exception as e:
110
- print(f" ⚠️ Could not snapshot WhatsApp state: {e}")
111
-
112
-
113
- def has_changes():
114
- """Check if workspace has uncommitted changes (git-based check)."""
115
- try:
116
- subprocess.run(["git", "add", "-A"], cwd=WORKSPACE, capture_output=True)
117
- result = subprocess.run(
118
- ["git", "diff", "--cached", "--quiet"],
119
- cwd=WORKSPACE, capture_output=True
120
  )
121
- return result.returncode != 0
122
- except Exception:
123
- return False
124
 
125
- def write_sync_status(status, message=""):
126
- """Write sync status to file for the health server dashboard."""
127
- try:
128
- import json
129
- data = {
130
- "status": status,
131
- "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
132
- "message": message
133
- }
134
- with open("/tmp/sync-status.json", "w") as f:
135
- json.dump(data, f)
136
- except Exception as e:
137
- print(f" ⚠️ Could not write sync status: {e}")
138
-
139
- def trigger_webhook(event, status, message):
140
- """Trigger webhook notification."""
141
- if not WEBHOOK_URL:
142
- return
143
- try:
144
- import urllib.request
145
- import json
146
- data = json.dumps({
147
- "event": event,
148
- "status": status,
149
- "message": message,
150
- "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
151
- }).encode('utf-8')
152
- req = urllib.request.Request(WEBHOOK_URL, data=data, headers={'Content-Type': 'application/json'})
153
- urllib.request.urlopen(req, timeout=10)
154
- except Exception as e:
155
- print(f" ⚠️ Webhook delivery failed: {e}")
156
-
157
- def sync_with_hf_hub():
158
- """Sync workspace using huggingface_hub library."""
159
- try:
160
- from huggingface_hub import HfApi, upload_folder
161
 
162
- api = HfApi(token=HF_TOKEN)
163
- repo_id = f"{HF_USERNAME}/{BACKUP_DATASET}"
164
 
165
- # Ensure dataset exists
166
  try:
167
- api.repo_info(repo_id=repo_id, repo_type="dataset")
168
- except Exception:
169
- print(f" πŸ“ Creating dataset {repo_id}...")
170
- try:
171
- api.create_repo(repo_id=repo_id, repo_type="dataset", private=True)
172
- print(f" ✅ Dataset created: {repo_id}")
173
- except Exception as e:
174
- print(f" ⚠️ Could not create dataset: {e}")
175
- return False
176
-
177
- # Upload workspace
178
- upload_folder(
179
- folder_path=str(WORKSPACE),
180
- repo_id=repo_id,
181
- repo_type="dataset",
182
- token=HF_TOKEN,
183
- commit_message=f"Auto-sync {time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}",
184
- ignore_patterns=[".git/*", ".git"],
185
- )
186
- return True
187
-
188
- except ImportError:
189
- print(" ⚠️ huggingface_hub not installed, falling back to git")
190
- return False
191
- except Exception as e:
192
- print(f" ⚠️ HF Hub sync failed: {e}")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
193
  return False
194
 
 
 
195
 
196
- def sync_with_git():
197
- """Fallback: sync workspace using git."""
198
  try:
199
- ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
200
- subprocess.run(["git", "add", "-A"], cwd=WORKSPACE, capture_output=True)
201
- subprocess.run(
202
- ["git", "commit", "-m", f"Auto-sync {ts}"],
203
- cwd=WORKSPACE, capture_output=True
204
- )
205
- result = subprocess.run(
206
- ["git", "push", "origin", "main"],
207
- cwd=WORKSPACE, capture_output=True
208
- )
209
- return result.returncode == 0
210
- except Exception:
211
  return False
212
 
213
 
214
- def run_sync_pass(use_hf_hub: bool) -> None:
215
- """Snapshot state and push it if anything changed."""
216
- snapshot_state_into_workspace()
217
-
218
- if not has_changes():
219
- return
220
-
221
- ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
222
- write_sync_status("syncing", f"Starting sync at {ts}")
223
-
224
- if use_hf_hub:
225
- if sync_with_hf_hub():
226
- print(f"🔄 Workspace sync (hf_hub): pushed changes ({ts})")
227
- write_sync_status("success", "Successfully pushed to HF Hub")
228
- return
229
-
230
- if sync_with_git():
231
- print(f"🔄 Workspace sync (git fallback): pushed changes ({ts})")
232
- write_sync_status("success", "Successfully pushed via git fallback")
233
- return
234
-
235
- msg = f"Workspace sync: failed ({ts}), will retry"
236
- print(f"🔄 {msg}")
237
- write_sync_status("error", msg)
238
- trigger_webhook("sync", "error", msg)
239
- return
240
-
241
- if sync_with_git():
242
- print(f"🔄 Workspace sync (git): pushed changes ({ts})")
243
- write_sync_status("success", "Successfully pushed via git")
244
- return
245
 
246
- msg = f"Workspace sync: push failed ({ts}), will retry"
247
- print(f"🔄 {msg}")
248
- write_sync_status("error", msg)
249
- trigger_webhook("sync", "error", msg)
250
 
 
 
 
251
 
252
- def main():
253
- if "--snapshot-once" in sys.argv:
254
- snapshot_state_into_workspace()
255
- write_sync_status("configured", "State snapshot refreshed during shutdown.")
256
- return
257
 
258
- if "--sync-once" in sys.argv:
259
- if not WORKSPACE.exists():
260
- print("πŸ“ Workspace sync: workspace not found, exiting.")
261
- return
 
 
 
 
 
 
 
 
 
262
 
263
- use_hf_hub = bool(HF_TOKEN and HF_USERNAME)
264
- git_dir = WORKSPACE / ".git"
265
 
266
- if not use_hf_hub and not git_dir.exists():
267
- print("πŸ“ Workspace sync: no git repo and no HF credentials, skipping.")
268
- return
269
 
270
- snapshot_state_into_workspace()
 
271
 
272
- if not has_changes():
273
- print("πŸ“ Workspace sync: no changes to persist.")
274
- write_sync_status("configured", "No new state changes to sync.")
275
- return
276
 
277
- ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
278
- write_sync_status("syncing", f"Shutdown sync started at {ts}")
279
-
280
- if use_hf_hub:
281
- if sync_with_hf_hub():
282
- print(f"🔄 Workspace sync (hf_hub): pushed changes ({ts})")
283
- write_sync_status("success", "Shutdown sync pushed to HF Hub")
284
- return
285
- if sync_with_git():
286
- print(f"🔄 Workspace sync (git fallback): pushed changes ({ts})")
287
- write_sync_status("success", "Shutdown sync pushed via git fallback")
288
- return
289
- write_sync_status("error", "Shutdown sync failed")
290
- print("πŸ“ Workspace sync: shutdown sync failed.")
291
- return
292
 
293
- if sync_with_git():
294
- print(f"πŸ”„ Workspace sync (git): pushed changes ({ts})")
295
- write_sync_status("success", "Shutdown sync pushed via git")
296
- return
 
 
 
297
 
298
- write_sync_status("error", "Shutdown sync failed")
299
- print("πŸ“ Workspace sync: shutdown sync failed.")
300
- return
301
 
302
- if not WORKSPACE.exists():
303
- print("πŸ“ Workspace sync: workspace not found, exiting.")
304
- return
305
 
306
- use_hf_hub = bool(HF_TOKEN and HF_USERNAME)
307
- git_dir = WORKSPACE / ".git"
 
 
 
 
308
 
309
- if not use_hf_hub and not git_dir.exists():
310
- print("πŸ“ Workspace sync: no git repo and no HF credentials, skipping.")
311
- return
312
 
313
- # Give the gateway a short head start before the first sync probe.
314
- if use_hf_hub:
315
- write_sync_status("configured", f"Backup enabled. Waiting for next sync in {INTERVAL}s.")
316
- else:
317
- write_sync_status("configured", f"Git sync enabled. Waiting for next sync in {INTERVAL}s.")
318
 
319
- # Give the gateway a short head start before the first sync probe.
320
- time.sleep(INITIAL_DELAY)
321
 
322
- if use_hf_hub:
323
- print(f"🔄 Workspace sync started (huggingface_hub): every {INTERVAL}s → {HF_USERNAME}/{BACKUP_DATASET}")
324
- else:
325
- print(f"🔄 Workspace sync started (git): every {INTERVAL}s")
326
 
327
- run_sync_pass(use_hf_hub)
 
328
 
329
- while running:
330
- time.sleep(INTERVAL)
331
- if not running:
332
- break
 
 
 
 
 
 
 
 
 
333
 
334
- run_sync_pass(use_hf_hub)
 
335
 
336
 
337
  if __name__ == "__main__":
338
- main()
 
1
  #!/usr/bin/env python3
2
  """
3
+ HuggingClaw workspace/state backup via huggingface_hub.
 
 
4
 
5
+ This keeps OpenClaw workspace data, app state, and optional WhatsApp
6
+ credentials inside a private HF dataset without embedding HF tokens in git
7
+ remotes or requiring a manual HF_USERNAME secret.
8
  """
9
 
10
+ import hashlib
11
+ import json
12
  import os
13
+ import shutil
14
+ import signal
15
  import sys
16
+ import tempfile
17
+ import threading
18
  import time
 
 
 
19
  from pathlib import Path
20
 
21
  os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
22
 
23
+ from huggingface_hub import HfApi, snapshot_download, upload_folder
24
+ from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError
25
+
26
  OPENCLAW_HOME = Path("/home/node/.openclaw")
27
  WORKSPACE = OPENCLAW_HOME / "workspace"
28
+ STATUS_FILE = Path("/tmp/sync-status.json")
29
+ INTERVAL = int(os.environ.get("SYNC_INTERVAL", "180"))
30
+ INITIAL_DELAY = int(os.environ.get("SYNC_START_DELAY", "10"))
31
+ HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
32
+ HF_USERNAME = os.environ.get("HF_USERNAME", "").strip()
33
+ SPACE_AUTHOR_NAME = os.environ.get("SPACE_AUTHOR_NAME", "").strip()
34
+ BACKUP_DATASET_NAME = os.environ.get("BACKUP_DATASET_NAME", "huggingclaw-backup").strip()
35
+ WHATSAPP_ENABLED = os.environ.get("WHATSAPP_ENABLED", "").strip().lower() == "true"
36
+
37
  STATE_DIR = WORKSPACE / ".huggingclaw-state"
38
  OPENCLAW_STATE_BACKUP_DIR = STATE_DIR / "openclaw"
39
  EXCLUDED_STATE_NAMES = {
 
42
  "gateway.log",
43
  "browser",
44
  }
45
+ WHATSAPP_CREDS_DIR = OPENCLAW_HOME / "credentials" / "whatsapp" / "default"
46
  WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
47
  RESET_MARKER = WORKSPACE / ".reset_credentials"
48
+ HF_API = HfApi(token=HF_TOKEN) if HF_TOKEN else None
49
+ STOP_EVENT = threading.Event()
50
+ _REPO_ID_CACHE: str | None = None
 
 
 
 
 
 
51
 
 
 
 
52
 
53
+ def write_status(status: str, message: str) -> None:
54
+ payload = {
55
+ "status": status,
56
+ "message": message,
57
+ "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
58
+ }
59
+ tmp_path = STATUS_FILE.with_suffix(".tmp")
60
+ tmp_path.write_text(json.dumps(payload), encoding="utf-8")
61
+ tmp_path.replace(STATUS_FILE)
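
The new `write_status` writes to a temporary sibling file and then renames it over the target, so the health server should not read a half-written JSON document. A minimal standalone sketch of that pattern follows. The function name `write_status_atomic` and the demo path are illustrative assumptions and not part of the commit, which writes to `/tmp/sync-status.json`.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def write_status_atomic(status_file: Path, status: str, message: str) -> dict:
    """Write a small JSON status document via a temp file plus rename."""
    payload = {
        "status": status,
        "message": message,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # Write the full document to a sibling file first.
    tmp_path = status_file.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(payload), encoding="utf-8")
    # os.replace swaps the file in one step when both paths share a filesystem.
    os.replace(tmp_path, status_file)
    return payload


# Hypothetical demo location, used only for this sketch.
demo_dir = Path(tempfile.mkdtemp(prefix="status-demo-"))
demo_file = demo_dir / "sync-status.json"
written = write_status_atomic(demo_file, "synced", "No workspace changes detected.")
loaded = json.loads(demo_file.read_text(encoding="utf-8"))
```

A reader that opens the status file at any moment sees either the previous document or the new one, as long as the temporary file and the target live on the same filesystem.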
62
 
63
 
64
  def count_files(path: Path) -> int:
 
65
  if not path.exists():
66
  return 0
67
  return sum(1 for child in path.rglob("*") if child.is_file())
68
 
69
 
70
  def snapshot_state_into_workspace() -> None:
 
 
 
 
 
 
71
  try:
72
  STATE_DIR.mkdir(parents=True, exist_ok=True)
73
  if OPENCLAW_STATE_BACKUP_DIR.exists():
 
83
  shutil.copytree(source_path, backup_path)
84
  elif source_path.is_file():
85
  shutil.copy2(source_path, backup_path)
86
+ except Exception as exc:
87
+ print(f" ⚠️ Could not snapshot OpenClaw state: {exc}")
88
 
89
  try:
90
  if not WHATSAPP_ENABLED:
 
112
  if WHATSAPP_BACKUP_DIR.exists():
113
  shutil.rmtree(WHATSAPP_BACKUP_DIR, ignore_errors=True)
114
  shutil.copytree(WHATSAPP_CREDS_DIR, WHATSAPP_BACKUP_DIR)
115
+ except Exception as exc:
116
+ print(f" ⚠️ Could not snapshot WhatsApp state: {exc}")
117
+
118
+
119
+ def restore_embedded_state() -> None:
120
+ state_backup_root = STATE_DIR / "openclaw"
121
+ if state_backup_root.is_dir():
122
+ print("🧠 Restoring OpenClaw state...")
123
+ for source_path in state_backup_root.iterdir():
124
+ name = source_path.name
125
+ target_path = OPENCLAW_HOME / name
126
+ shutil.rmtree(target_path, ignore_errors=True)
127
+ if target_path.is_file():
128
+ target_path.unlink(missing_ok=True)
129
+ target_path.parent.mkdir(parents=True, exist_ok=True)
130
+ if source_path.is_dir():
131
+ shutil.copytree(source_path, target_path)
132
+ else:
133
+ shutil.copy2(source_path, target_path)
134
+ print(" ✅ OpenClaw state restored")
135
+
136
+ if WHATSAPP_ENABLED and WHATSAPP_BACKUP_DIR.is_dir():
137
+ file_count = count_files(WHATSAPP_BACKUP_DIR)
138
+ if file_count >= 2:
139
+ print("📱 Restoring WhatsApp credentials...")
140
+ shutil.rmtree(WHATSAPP_CREDS_DIR, ignore_errors=True)
141
+ WHATSAPP_CREDS_DIR.parent.mkdir(parents=True, exist_ok=True)
142
+ shutil.copytree(WHATSAPP_BACKUP_DIR, WHATSAPP_CREDS_DIR)
143
+ os.chmod(OPENCLAW_HOME / "credentials", 0o700)
144
+ print(" ✅ WhatsApp credentials restored")
145
+ else:
146
+ print(f" ⚠️ Saved WhatsApp credentials look incomplete ({file_count} files), skipping restore.")
147
+
148
+
149
+ def resolve_backup_namespace() -> str:
150
+ global _REPO_ID_CACHE
151
+ if _REPO_ID_CACHE:
152
+ return _REPO_ID_CACHE
153
+
154
+ namespace = HF_USERNAME or SPACE_AUTHOR_NAME
155
+ if not namespace and HF_API is not None:
156
+ whoami = HF_API.whoami()
157
+ namespace = whoami.get("name") or whoami.get("user") or ""
158
+
159
+ namespace = str(namespace).strip()
160
+ if not namespace:
161
+ raise RuntimeError(
162
+ "Could not determine the Hugging Face username for backups. "
163
+ "Set HF_USERNAME or use a token tied to your account."
164
  )
 
 
 
165
 
166
+ _REPO_ID_CACHE = f"{namespace}/{BACKUP_DATASET_NAME}"
167
+ return _REPO_ID_CACHE
168
 
 
 
169
 
170
+ def ensure_repo_exists() -> str:
171
+ repo_id = resolve_backup_namespace()
172
+ try:
173
+ HF_API.repo_info(repo_id=repo_id, repo_type="dataset")
174
+ except RepositoryNotFoundError:
175
+ HF_API.create_repo(repo_id=repo_id, repo_type="dataset", private=True)
176
+ return repo_id
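
The namespace lookup above prefers `HF_USERNAME`, then `SPACE_AUTHOR_NAME`, and only then asks the Hub who owns the token. The sketch below isolates that precedence as a pure function so it can be exercised without network access. `resolve_repo_id` and `fake_whoami` are hypothetical names introduced for illustration; the injected callable stands in for `HfApi.whoami()`.

```python
from typing import Callable, Optional


def resolve_repo_id(
    hf_username: str,
    space_author: str,
    dataset_name: str,
    whoami: Optional[Callable[[], dict]] = None,
) -> str:
    """Pick the backup namespace: explicit username, Space author, then token owner."""
    namespace = hf_username.strip() or space_author.strip()
    if not namespace and whoami is not None:
        info = whoami()
        namespace = str(info.get("name") or info.get("user") or "").strip()
    if not namespace:
        raise RuntimeError("Could not determine the Hugging Face username for backups.")
    return f"{namespace}/{dataset_name}"


def fake_whoami() -> dict:
    # Stand-in for the authenticated lookup; the value is made up for the demo.
    return {"name": "token-owner"}


explicit = resolve_repo_id("alice", "bob", "huggingclaw-backup", fake_whoami)
from_space = resolve_repo_id("", "bob", "huggingclaw-backup", fake_whoami)
from_token = resolve_repo_id("", "", "huggingclaw-backup", fake_whoami)
```

Keeping the lookup injectable is a design choice of this sketch only. The commit instead caches the resolved repository id in a module-level variable so the Hub is queried at most once per process.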
177
+
178
+
179
+ def metadata_marker(root: Path) -> tuple[int, int, int]:
180
+ if not root.exists():
181
+ return (0, 0, 0)
182
+
183
+ file_count = 0
184
+ total_size = 0
185
+ newest_mtime = 0
186
+ for path in root.rglob("*"):
187
+ if not path.is_file():
188
+ continue
189
+ rel = path.relative_to(root).as_posix()
190
+ if rel.startswith(".git/"):
191
+ continue
192
  try:
193
+ stat = path.stat()
194
+ except OSError:
195
+ continue
196
+ file_count += 1
197
+ total_size += int(stat.st_size)
198
+ newest_mtime = max(newest_mtime, int(stat.st_mtime_ns))
199
+ return (file_count, total_size, newest_mtime)
200
+
201
+
202
+ def fingerprint_dir(root: Path) -> str:
203
+ hasher = hashlib.sha256()
204
+ if not root.exists():
205
+ return hasher.hexdigest()
206
+
207
+ for path in sorted(p for p in root.rglob("*") if p.is_file()):
208
+ rel = path.relative_to(root).as_posix()
209
+ if rel.startswith(".git/"):
210
+ continue
211
+ hasher.update(rel.encode("utf-8"))
212
+ with path.open("rb") as handle:
213
+ for chunk in iter(lambda: handle.read(1024 * 1024), b""):
214
+ hasher.update(chunk)
215
+ return hasher.hexdigest()
216
+
217
+
218
+ def create_snapshot_dir(source_root: Path) -> Path:
219
+ staging_root = Path(tempfile.mkdtemp(prefix="huggingclaw-sync-"))
220
+ for path in sorted(source_root.rglob("*")):
221
+ rel = path.relative_to(source_root)
222
+ rel_posix = rel.as_posix()
223
+ if rel_posix.startswith(".git/") or rel_posix == ".git":
224
+ continue
225
+ target = staging_root / rel
226
+ if path.is_dir():
227
+ target.mkdir(parents=True, exist_ok=True)
228
+ continue
229
+ target.parent.mkdir(parents=True, exist_ok=True)
230
+ shutil.copy2(path, target)
231
+ return staging_root
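
`create_snapshot_dir` copies the workspace into a temporary staging directory so the upload reads from a stable copy rather than from files that may still be changing. A shorter variant of the same idea can be sketched with `shutil.copytree`. This is an alternative technique, not what the commit does: `ignore_patterns(".git")` skips entries named `.git` at any depth, whereas the commit only skips the top-level `.git`. The name `stage_copy` is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def stage_copy(source_root: Path) -> Path:
    """Copy a tree into a fresh temp dir, leaving out any entry named .git."""
    staging = Path(tempfile.mkdtemp(prefix="stage-demo-"))
    shutil.copytree(
        source_root,
        staging,
        ignore=shutil.ignore_patterns(".git"),
        dirs_exist_ok=True,  # the temp dir already exists (Python 3.8+)
    )
    return staging


# Build a small throwaway workspace to demonstrate the copy.
source = Path(tempfile.mkdtemp(prefix="workspace-demo-"))
(source / "a.txt").write_text("hello", encoding="utf-8")
(source / "sub").mkdir()
(source / "sub" / "b.txt").write_text("world", encoding="utf-8")
(source / ".git").mkdir()
(source / ".git" / "config").write_text("[core]", encoding="utf-8")

staged = stage_copy(source)
staged_files = sorted(p.relative_to(staged).as_posix() for p in staged.rglob("*") if p.is_file())
shutil.rmtree(source, ignore_errors=True)
```

As in the commit, the caller is expected to remove the staging directory once the upload finishes, ideally in a `finally` block.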
232
+
233
+
234
+ def restore_workspace() -> bool:
235
+ if not HF_TOKEN:
236
+ write_status("disabled", "HF_TOKEN is not configured.")
237
  return False
238
 
239
+ repo_id = resolve_backup_namespace()
240
+ write_status("restoring", f"Restoring workspace from {repo_id}")
241
 
 
 
242
  try:
243
+ with tempfile.TemporaryDirectory() as tmpdir:
244
+ snapshot_download(
245
+ repo_id=repo_id,
246
+ repo_type="dataset",
247
+ token=HF_TOKEN,
248
+ local_dir=tmpdir,
249
+ )
250
+
251
+ tmp_path = Path(tmpdir)
252
+ if not any(tmp_path.iterdir()):
253
+ write_status("fresh", "Backup dataset is empty. Starting fresh.")
254
+ return True
255
+
256
+ WORKSPACE.mkdir(parents=True, exist_ok=True)
257
+ for child in list(WORKSPACE.iterdir()):
258
+ if child.name == ".git":
259
+ continue
260
+ if child.is_dir():
261
+ shutil.rmtree(child, ignore_errors=True)
262
+ else:
263
+ child.unlink(missing_ok=True)
264
+
265
+ for child in tmp_path.iterdir():
266
+ if child.name == ".git":
267
+ continue
268
+ destination = WORKSPACE / child.name
269
+ if child.is_dir():
270
+ shutil.copytree(child, destination)
271
+ else:
272
+ shutil.copy2(child, destination)
273
+
274
+ restore_embedded_state()
275
+ write_status("restored", f"Restored workspace from {repo_id}")
276
+ return True
277
+ except RepositoryNotFoundError:
278
+ write_status("fresh", f"Backup dataset {repo_id} does not exist yet.")
279
+ return True
280
+ except HfHubHTTPError as exc:
281
+ if exc.response is not None and exc.response.status_code == 404:
282
+ write_status("fresh", f"Backup dataset {repo_id} does not exist yet.")
283
+ return True
284
+ write_status("error", f"Restore failed: {exc}")
285
+ print(f"Restore failed: {exc}", file=sys.stderr)
286
+ return False
287
+ except Exception as exc:
288
+ write_status("error", f"Restore failed: {exc}")
289
+ print(f"Restore failed: {exc}", file=sys.stderr)
290
  return False
291
 
292
 
293
+ def sync_once(
294
+ last_fingerprint: str | None = None,
295
+ last_marker: tuple[int, int, int] | None = None,
296
+ ) -> tuple[str, tuple[int, int, int]]:
297
+ if not HF_TOKEN:
298
+ write_status("disabled", "HF_TOKEN is not configured.")
299
+ return (last_fingerprint or "", last_marker or (0, 0, 0))
300
 
301
+ snapshot_state_into_workspace()
302
+ repo_id = ensure_repo_exists()
303
+ current_marker = metadata_marker(WORKSPACE)
 
304
 
305
+ if last_marker is not None and current_marker == last_marker:
306
+ write_status("synced", "No workspace changes detected.")
307
+ return (last_fingerprint or "", current_marker)
308
 
309
+ current_fingerprint = fingerprint_dir(WORKSPACE)
310
+ if last_fingerprint is not None and current_fingerprint == last_fingerprint:
311
+ write_status("synced", "No workspace changes detected.")
312
+ return (last_fingerprint, current_marker)
 
313
 
314
+ write_status("syncing", f"Uploading workspace to {repo_id}")
315
+ snapshot_dir = create_snapshot_dir(WORKSPACE)
316
+ try:
317
+ upload_folder(
318
+ folder_path=str(snapshot_dir),
319
+ repo_id=repo_id,
320
+ repo_type="dataset",
321
+ token=HF_TOKEN,
322
+ commit_message=f"HuggingClaw sync {time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}",
323
+ ignore_patterns=[".git/*", ".git"],
324
+ )
325
+ finally:
326
+ shutil.rmtree(snapshot_dir, ignore_errors=True)
327
 
328
+ write_status("success", f"Uploaded workspace to {repo_id}")
329
+ return (current_fingerprint, current_marker)
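
`sync_once` decides whether to upload in two steps: it compares a cheap metadata marker first and only hashes file contents when that marker has moved. The sketch below reproduces the two-step check on a throwaway directory. It is a simplified illustration, assuming no `.git` filtering and no error handling, and `needs_upload` is a hypothetical helper rather than a function from the commit.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def metadata_marker(root: Path) -> tuple:
    """Cheap check: (file count, total bytes, newest mtime in nanoseconds)."""
    count = size = newest = 0
    for path in root.rglob("*"):
        if path.is_file():
            stat = path.stat()
            count += 1
            size += stat.st_size
            newest = max(newest, stat.st_mtime_ns)
    return (count, size, newest)


def fingerprint_dir(root: Path) -> str:
    """Expensive check: SHA-256 over relative paths and file contents."""
    hasher = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        hasher.update(path.relative_to(root).as_posix().encode("utf-8"))
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def needs_upload(root: Path, last_marker: tuple, last_fingerprint: str):
    marker = metadata_marker(root)
    if marker == last_marker:
        return False, marker, last_fingerprint  # nothing moved, skip hashing
    fingerprint = fingerprint_dir(root)
    return fingerprint != last_fingerprint, marker, fingerprint


root = Path(tempfile.mkdtemp(prefix="fingerprint-demo-"))
note = root / "note.txt"
note.write_text("one", encoding="utf-8")
marker, fingerprint = metadata_marker(root), fingerprint_dir(root)

unchanged, marker, fingerprint = needs_upload(root, marker, fingerprint)

# Bump only the mtime: the marker changes, the content hash does not.
stat = note.stat()
os.utime(note, ns=(stat.st_atime_ns, stat.st_mtime_ns + 2_000_000_000))
touched, marker, fingerprint = needs_upload(root, marker, fingerprint)

note.write_text("two!", encoding="utf-8")
edited, marker, fingerprint = needs_upload(root, marker, fingerprint)
```

The marker keeps idle passes cheap, while the hash avoids an upload when only timestamps changed.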
330
 
 
 
 
331
 
332
+ def handle_signal(_sig, _frame) -> None:
333
+ STOP_EVENT.set()
334
 
 
 
 
 
335
 
336
+ def loop() -> int:
337
+ signal.signal(signal.SIGTERM, handle_signal)
338
+ signal.signal(signal.SIGINT, handle_signal)
 
 
 
 
 
 
 
 
 
 
 
 
339
 
340
+ try:
341
+ repo_id = resolve_backup_namespace()
342
+ write_status("configured", f"Backup loop active for {repo_id} with {INTERVAL}s interval.")
343
+ except Exception as exc:
344
+ write_status("error", str(exc))
345
+ print(f"πŸ“ Workspace sync: {exc}")
346
+ return 1
347
 
348
+ last_fingerprint = fingerprint_dir(WORKSPACE)
349
+ last_marker = metadata_marker(WORKSPACE)
 
350
 
351
+ time.sleep(INITIAL_DELAY)
352
+ print(f"🔄 Workspace sync started: every {INTERVAL}s → {repo_id}")
 
353
 
354
+ while not STOP_EVENT.is_set():
355
+ try:
356
+ last_fingerprint, last_marker = sync_once(last_fingerprint, last_marker)
357
+ except Exception as exc:
358
+ write_status("error", f"Sync failed: {exc}")
359
+ print(f"πŸ“ Workspace sync failed: {exc}")
360
 
361
+ if STOP_EVENT.wait(INTERVAL):
362
+ break
 
363
 
364
+ return 0
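
The rewritten loop replaces the old `running` flag and `time.sleep(INTERVAL)` with a `threading.Event`, so a SIGTERM can end the wait immediately instead of after the full interval. A minimal sketch of that behaviour follows. A `threading.Timer` stands in for the signal handler here, and `run_loop` is an illustrative name, not the commit's `loop()`.

```python
import threading
import time

stop_event = threading.Event()


def run_loop(interval: float, work) -> int:
    """Run work() repeatedly until stop_event is set; return the pass count."""
    passes = 0
    while not stop_event.is_set():
        work()
        passes += 1
        # wait() returns True as soon as the event is set, False on timeout.
        if stop_event.wait(interval):
            break
    return passes


calls = []
# Simulate a shutdown signal arriving shortly after the first pass.
timer = threading.Timer(0.2, stop_event.set)
timer.start()
started = time.monotonic()
passes = run_loop(30.0, lambda: calls.append(time.monotonic()))
elapsed = time.monotonic() - started
timer.join()
```

With a 30 second interval the loop still exits a fraction of a second after the event is set, which is what lets the shutdown trap in `start.sh` run its final `sync-once` promptly.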
 
 
 
 
365
 
 
 
366
 
367
+ def main() -> int:
368
+ WORKSPACE.mkdir(parents=True, exist_ok=True)
 
 
369
 
370
+ if len(sys.argv) < 2:
371
+ return loop()
372
 
373
+ command = sys.argv[1]
374
+ if command == "restore":
375
+ return 0 if restore_workspace() else 1
376
+ if command == "sync-once":
377
+ try:
378
+ sync_once()
379
+ return 0
380
+ except Exception as exc:
381
+ write_status("error", f"Shutdown sync failed: {exc}")
382
+ print(f"πŸ“ Workspace sync: shutdown sync failed: {exc}")
383
+ return 1
384
+ if command == "loop":
385
+ return loop()
386
 
387
+ print(f"Unknown command: {command}", file=sys.stderr)
388
+ return 1
389
 
390
 
391
  if __name__ == "__main__":
392
+ raise SystemExit(main())