refactor: overhaul workspace sync to use robust huggingface_hub operations and local fingerprinting
- CHANGELOG.md +19 -0
- README.md +11 -10
- health-server.js +95 -16
- start.sh +17 -95
- workspace-sync.py +280 -226
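The commit title mentions local fingerprinting. The real logic lives in `workspace-sync.py` (Python) and is not reproduced here; the following is only a rough sketch of the general idea, written in JavaScript to match `health-server.js`. The function names `listFiles` and `fingerprint` are hypothetical, and hashing file contents with SHA-256 is an assumption, not necessarily what the commit does.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect regular files under `root` as sorted, slash-separated
// relative paths, so the result does not depend on directory listing order.
function listFiles(root, dir = root) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      out.push(...listFiles(root, full));
    } else if (entry.isFile()) {
      out.push(path.relative(root, full).split(path.sep).join("/"));
    }
  }
  return out.sort();
}

// Fold every relative path and its contents into one SHA-256 digest.
// A sync loop could skip an upload when the digest matches the previous run.
function fingerprint(root) {
  const hash = crypto.createHash("sha256");
  for (const rel of listFiles(root)) {
    hash.update(rel);
    hash.update("\0"); // separator between path and contents
    hash.update(fs.readFileSync(path.join(root, rel)));
    hash.update("\0"); // separator between files
  }
  return hash.digest("hex");
}
```

Comparing `fingerprint(dir)` before and after a sync interval gives a cheap "has anything changed?" test without talking to the Hub.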
CHANGELOG.md
CHANGED
|
@@ -2,6 +2,25 @@
|
|
| 2 |
|
| 3 |
All notable changes to this project will be documented in this file.
|
| 4 |
|
| 5 |
## [1.3.0] - 2026-04-04
|
| 6 |
|
| 7 |
### Added
|
|
|
|
| 2 |
|
| 3 |
All notable changes to this project will be documented in this file.
|
| 4 |
|
| 5 |
+
## [1.4.0] - 2026-04-25
|
| 6 |
+
|
| 7 |
+
### Added
|
| 8 |
+
|
| 9 |
+
- **Custom OpenAI-compatible provider registration**: HuggingClaw can now register a custom provider at startup with `CUSTOM_PROVIDER_NAME`, `CUSTOM_BASE_URL`, and `CUSTOM_MODEL_ID`, so you can point `LLM_MODEL` at your own OpenAI-compatible endpoint without modifying the OpenClaw CLI
|
| 10 |
+
|
| 11 |
+
### Changed
|
| 12 |
+
|
| 13 |
+
- **HF backup flow simplified**: HuggingClaw now uses `huggingface_hub` directly for restore and sync, matching the safer dataset-based pattern used in Hugging8n
|
| 14 |
+
- **HF username no longer required in most cases**: backup namespace resolution now works from `HF_USERNAME`, `SPACE_AUTHOR_NAME`, or the authenticated HF token, so `HF_TOKEN` is usually enough on its own
|
| 15 |
+
- **Startup restore path modernized**: startup now restores workspace and hidden state through `workspace-sync.py restore` instead of configuring a token-bearing git remote
|
| 16 |
+
- **README refreshed for the new backup model**: documentation now describes token-only backup setup, the removed git sync assumptions, and the hardened dashboard helper behavior
|
| 17 |
+
|
| 18 |
+
### Fixed
|
| 19 |
+
|
| 20 |
+
- **HF token exposure risk in git remotes**: removed the old authenticated remote URL pattern that could leave `HF_TOKEN` embedded in workspace git configuration
|
| 21 |
+
- **Backup status detection mismatch**: dashboard and startup summary now treat backup as enabled when `HF_TOKEN` is present, which matches the new auto-namespace flow
|
| 22 |
+
- **UptimeRobot setup hardening gap**: dashboard setup now supports explicit enable/disable control, request rate limiting, origin validation, and earlier API-key validation
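For the origin validation mentioned in the last entry, one possible shape is to parse the `Origin` and `Referer` headers and compare their host to the request's `Host` header exactly. This is a hedged sketch, not the code from this commit: `hostOf` and `isSameOriginRequest` are hypothetical names, and exact host comparison is an assumption chosen because it rejects look-alike hosts that merely contain the expected name.

```javascript
// Return the lowercased host[:port] of a URL string, or "" if it does not parse.
function hostOf(value) {
  try {
    return new URL(value).host.toLowerCase();
  } catch {
    return "";
  }
}

// Accept the request only when every Origin/Referer header that is present
// points at exactly the same host as the Host header.
function isSameOriginRequest(headers) {
  const host = String(headers.host || "").toLowerCase();
  if (!host) return false;
  for (const name of ["origin", "referer"]) {
    const value = headers[name];
    if (value && hostOf(value) !== host) return false;
  }
  return true;
}
```

With this variant, a request carrying `Origin: https://demo.hf.space.evil.example` against `Host: demo.hf.space` is rejected, whereas a plain substring test would let it through.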
|
| 23 |
+
|
| 24 |
## [1.3.0] - 2026-04-04
|
| 25 |
|
| 26 |
### Added
|
README.md
CHANGED
|
@@ -50,7 +50,7 @@ secrets:
|
|
| 50 |
- ⚡ **Zero Config:** Duplicate this Space and set **just three** secrets (LLM_API_KEY, LLM_MODEL, GATEWAY_TOKEN); no other setup needed.
|
| 51 |
- 🐳 **Fast Builds:** Uses a pre-built OpenClaw Docker image to deploy in minutes.
|
| 52 |
- **Built-In Browser:** Headless Chromium is included in the Space, so browser actions work from the start.
|
| 53 |
-
- 💾 **Workspace Backup:** Chats, settings, and WhatsApp session state sync to a private HF Dataset via the `huggingface_hub`
|
| 54 |
- ⏰ **External Keep-Alive:** Set up a one-time UptimeRobot monitor from the dashboard to help keep free HF Spaces awake.
|
| 55 |
- 👥 **Multi-User Messaging:** Support for Telegram (multi-user) and WhatsApp (pairing).
|
| 56 |
- **Visual Dashboard:** Beautiful Web UI to monitor uptime, sync status, and active models.
|
|
@@ -120,16 +120,16 @@ To use WhatsApp, enable the channel and scan the QR code from the Control UI (**
|
|
| 120 |
|
| 121 |
## 💾 Workspace Backup *(Optional)*
|
| 122 |
|
| 123 |
-
|
|
|
|
|
|
|
| 124 |
|
| 125 |
| Variable | Default | Description |
|
| 126 |
| :--- | :--- | :--- |
|
| 127 |
-
| `HF_USERNAME` | - |
|
| 128 |
| `HF_TOKEN` | - | HF token with write access |
|
| 129 |
| `BACKUP_DATASET_NAME` | `huggingclaw-backup` | Dataset name for backup repo |
|
| 130 |
| `SYNC_INTERVAL` | `180` | Sync interval in seconds |
|
| 131 |
-
| `WORKSPACE_GIT_USER` | `openclaw@example.com` | Git commit email for syncs |
|
| 132 |
-
| `WORKSPACE_GIT_NAME` | `OpenClaw Bot` | Git commit name for syncs |
|
| 133 |
|
| 134 |
> [!TIP]
|
| 135 |
> This backup also stores a hidden copy of your WhatsApp session credentials, allowing paired logins to survive Space restarts automatically.
|
|
@@ -155,6 +155,7 @@ What happens next:
|
|
| 155 |
- You only need to do this once
|
| 156 |
|
| 157 |
You do **not** need to add this key to Hugging Face Space Secrets.
|
|
|
|
| 158 |
|
| 159 |
Note:
|
| 160 |
|
|
@@ -290,7 +291,7 @@ openclaw channels login --gateway https://YOUR_SPACE_NAME.hf.space
|
|
| 290 |
HuggingClaw/
|
| 291 |
├── Dockerfile # Multi-stage build using pre-built OpenClaw image
|
| 292 |
├── start.sh # Config generator, validator, and orchestrator
|
| 293 |
-
├── workspace-sync.py # Syncs workspace to HF Datasets
|
| 294 |
├── health-server.js # /health endpoint for uptime checks
|
| 295 |
├── dns-fix.js # DNS-over-HTTPS fallback (for blocked domains)
|
| 296 |
├── .env.example # Environment variable reference
|
|
@@ -299,8 +300,8 @@ HuggingClaw/
|
|
| 299 |
**Startup sequence:**
|
| 300 |
1. Validate required secrets (fail fast with clear error).
|
| 301 |
2. Check HF token (warn if expired or missing).
|
| 302 |
-
3.
|
| 303 |
-
4.
|
| 304 |
5. Generate `openclaw.json` from environment variables.
|
| 305 |
6. Print startup summary.
|
| 306 |
7. Launch background tasks (auto-sync and optional channel helpers).
|
|
@@ -312,11 +313,11 @@ HuggingClaw/
|
|
| 312 |
|
| 313 |
- **Missing secrets:** Ensure `LLM_API_KEY`, `LLM_MODEL`, and `GATEWAY_TOKEN` are set in your Space **Settings → Secrets**.
|
| 314 |
- **Telegram bot issues:** Verify your `TELEGRAM_BOT_TOKEN`. Check Space logs for lines like `📱 Enabling Telegram`.
|
| 315 |
-
- **Backup restore failing:** Make sure `
|
| 316 |
- **Space keeps sleeping:** Open `/` and use `Keep Space Awake` to create the external monitor.
|
| 317 |
- **Auth errors / proxy:** If you see reverse-proxy auth errors, add the logged IPs under `TRUSTED_PROXIES` (from logs `remote=x.x.x.x`).
|
| 318 |
- **Control UI says too many failed authentication attempts:** Wait for the retry window to expire, then open the Space in an incognito window or clear site storage for your Space before logging in again with `GATEWAY_TOKEN`.
|
| 319 |
-
- **WhatsApp lost its session after restart:** Make sure `
|
| 320 |
- **UI blocked (CORS):** Set `ALLOWED_ORIGINS=https://your-space-name.hf.space`.
|
| 321 |
- **Version mismatches:** Pin a specific OpenClaw build with the `OPENCLAW_VERSION` Variable in HF Spaces, or `--build-arg OPENCLAW_VERSION=...` locally.
|
| 322 |
|
|
|
|
| 50 |
- ⚡ **Zero Config:** Duplicate this Space and set **just three** secrets (LLM_API_KEY, LLM_MODEL, GATEWAY_TOKEN); no other setup needed.
|
| 51 |
- 🐳 **Fast Builds:** Uses a pre-built OpenClaw Docker image to deploy in minutes.
|
| 52 |
- **Built-In Browser:** Headless Chromium is included in the Space, so browser actions work from the start.
|
| 53 |
+
- 💾 **Workspace Backup:** Chats, settings, and WhatsApp session state sync to a private HF Dataset via the `huggingface_hub`, preserving data automatically without storing your HF token in a git remote.
|
| 54 |
- ⏰ **External Keep-Alive:** Set up a one-time UptimeRobot monitor from the dashboard to help keep free HF Spaces awake.
|
| 55 |
- 👥 **Multi-User Messaging:** Support for Telegram (multi-user) and WhatsApp (pairing).
|
| 56 |
- **Visual Dashboard:** Beautiful Web UI to monitor uptime, sync status, and active models.
|
|
|
|
| 120 |
|
| 121 |
## 💾 Workspace Backup *(Optional)*
|
| 122 |
|
| 123 |
+
As of **v1.4.0**, HuggingClaw uses the safer API-based backup flow by default: no token-bearing git remote is configured, and `HF_USERNAME` is usually optional.
|
| 124 |
+
|
| 125 |
+
For persistent chat history and configuration, HuggingClaw can sync your workspace to a private HuggingFace Dataset. On first run, it will automatically create (or use) the Dataset repo `<your-account>/huggingclaw-backup`, restore your workspace on startup, and sync changes periodically. In most cases, `HF_USERNAME` is no longer required because HuggingClaw can derive the namespace from your token automatically.
|
| 126 |
|
| 127 |
| Variable | Default | Description |
|
| 128 |
| :--- | :--- | :--- |
|
| 129 |
+
| `HF_USERNAME` | - | Optional override for the HuggingFace username/namespace |
|
| 130 |
| `HF_TOKEN` | - | HF token with write access |
|
| 131 |
| `BACKUP_DATASET_NAME` | `huggingclaw-backup` | Dataset name for backup repo |
|
| 132 |
| `SYNC_INTERVAL` | `180` | Sync interval in seconds |
|
|
|
|
|
|
|
| 133 |
|
| 134 |
> [!TIP]
|
| 135 |
> This backup also stores a hidden copy of your WhatsApp session credentials, allowing paired logins to survive Space restarts automatically.
|
|
|
|
| 155 |
- You only need to do this once
|
| 156 |
|
| 157 |
You do **not** need to add this key to Hugging Face Space Secrets.
|
| 158 |
+
The dashboard helper also rate-limits setup requests and rejects cross-origin submissions.
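The rate limiting described above can be pictured as a per-client sliding window. The sketch below is illustrative only: `createRateLimiter` is a hypothetical helper, and the injectable `now` argument is an assumption added so the behavior can be checked without waiting in real time.

```javascript
// Build a limiter that allows at most `maxPerWindow` calls per `windowMs`
// milliseconds for each key (for example, a client IP address).
function createRateLimiter(maxPerWindow, windowMs) {
  const hits = new Map(); // key -> timestamps of recent calls

  return function isLimited(key, now = Date.now()) {
    // Drop timestamps that have aged out of the window, then record this call.
    const recent = (hits.get(key) || []).filter((ts) => now - ts < windowMs);
    recent.push(now);
    hits.set(key, recent);
    return recent.length > maxPerWindow;
  };
}
```

A handler would call `isLimited(clientIp)` first and answer with HTTP 429 when it returns `true`. Rejected calls are still recorded, so a client that keeps retrying stays limited until it backs off for a full window.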
|
| 159 |
|
| 160 |
Note:
|
| 161 |
|
|
|
|
| 291 |
HuggingClaw/
|
| 292 |
├── Dockerfile # Multi-stage build using pre-built OpenClaw image
|
| 293 |
├── start.sh # Config generator, validator, and orchestrator
|
| 294 |
+
├── workspace-sync.py # Syncs workspace/state to HF Datasets via huggingface_hub
|
| 295 |
├── health-server.js # /health endpoint for uptime checks
|
| 296 |
├── dns-fix.js # DNS-over-HTTPS fallback (for blocked domains)
|
| 297 |
├── .env.example # Environment variable reference
|
|
|
|
| 300 |
**Startup sequence:**
|
| 301 |
1. Validate required secrets (fail fast with clear error).
|
| 302 |
2. Check HF token (warn if expired or missing).
|
| 303 |
+
3. Resolve the backup namespace from `HF_USERNAME`, `SPACE_AUTHOR_NAME`, or the HF token.
|
| 304 |
+
4. Auto-create backup dataset if missing and restore workspace/state from it.
|
| 305 |
5. Generate `openclaw.json` from environment variables.
|
| 306 |
6. Print startup summary.
|
| 307 |
7. Launch background tasks (auto-sync and optional channel helpers).
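Step 3 of the startup sequence can be sketched as a simple fallback chain. The actual resolution happens in `workspace-sync.py`; the JavaScript below only illustrates the idea. `resolveNamespace` and the caller-supplied `whoami` function are hypothetical, and the precedence (explicit `HF_USERNAME`, then `SPACE_AUTHOR_NAME`, then a token lookup) is an assumption based on the order in which the sources are listed.

```javascript
// Resolve the backup namespace from the environment, falling back to a
// token lookup. `whoami` is a caller-supplied async function that takes a
// token and resolves to an object with a `name` field (stand-in for an
// authenticated "who am I" API call).
async function resolveNamespace(env, whoami) {
  for (const name of ["HF_USERNAME", "SPACE_AUTHOR_NAME"]) {
    const value = String(env[name] || "").trim();
    if (value) return value;
  }
  if (env.HF_TOKEN) {
    const info = await whoami(env.HF_TOKEN);
    if (info && info.name) return info.name;
  }
  return null; // nothing usable: backup stays disabled
}
```

Under these assumptions, the backup repo id would then be `${namespace}/${BACKUP_DATASET_NAME}`, and a `null` result means the script runs without dataset persistence.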
|
|
|
|
| 313 |
|
| 314 |
- **Missing secrets:** Ensure `LLM_API_KEY`, `LLM_MODEL`, and `GATEWAY_TOKEN` are set in your Space **Settings → Secrets**.
|
| 315 |
- **Telegram bot issues:** Verify your `TELEGRAM_BOT_TOKEN`. Check Space logs for lines like `📱 Enabling Telegram`.
|
| 316 |
+
- **Backup restore failing:** Make sure `HF_TOKEN` is valid and has write access to your HF account dataset. Set `HF_USERNAME` only if auto-detection is not available in your environment.
|
| 317 |
- **Space keeps sleeping:** Open `/` and use `Keep Space Awake` to create the external monitor.
|
| 318 |
- **Auth errors / proxy:** If you see reverse-proxy auth errors, add the logged IPs under `TRUSTED_PROXIES` (from logs `remote=x.x.x.x`).
|
| 319 |
- **Control UI says too many failed authentication attempts:** Wait for the retry window to expire, then open the Space in an incognito window or clear site storage for your Space before logging in again with `GATEWAY_TOKEN`.
|
| 320 |
+
- **WhatsApp lost its session after restart:** Make sure `HF_TOKEN` is configured so the hidden session backup can be restored on boot.
|
| 321 |
- **UI blocked (CORS):** Set `ALLOWED_ORIGINS=https://your-space-name.hf.space`.
|
| 322 |
- **Version mismatches:** Pin a specific OpenClaw build with the `OPENCLAW_VERSION` Variable in HF Spaces, or `--build-arg OPENCLAW_VERSION=...` locally.
|
| 323 |
|
health-server.js
CHANGED
|
@@ -12,7 +12,7 @@ const LLM_MODEL = process.env.LLM_MODEL || "Not Set";
|
|
| 12 |
const TELEGRAM_ENABLED = !!process.env.TELEGRAM_BOT_TOKEN;
|
| 13 |
const WHATSAPP_ENABLED = /^true$/i.test(process.env.WHATSAPP_ENABLED || "");
|
| 14 |
const WHATSAPP_STATUS_FILE = "/tmp/huggingclaw-wa-status.json";
|
| 15 |
-
const HF_BACKUP_ENABLED = !!
|
| 16 |
const SYNC_INTERVAL = process.env.SYNC_INTERVAL || "600";
|
| 17 |
const DASHBOARD_BASE = "/dashboard";
|
| 18 |
const DASHBOARD_STATUS_PATH = `${DASHBOARD_BASE}/status`;
|
|
@@ -20,8 +20,16 @@ const DASHBOARD_HEALTH_PATH = `${DASHBOARD_BASE}/health`;
|
|
| 20 |
const DASHBOARD_UPTIMEROBOT_PATH = `${DASHBOARD_BASE}/uptimerobot/setup`;
|
| 21 |
const DASHBOARD_APP_BASE = `${DASHBOARD_BASE}/app`;
|
| 22 |
const APP_BASE = "/app";
|
|
| 23 |
const SPACE_VISIBILITY_TTL_MS = 10 * 60 * 1000;
|
| 24 |
const spaceVisibilityCache = new Map();
|
|
|
|
| 25 |
|
| 26 |
function parseRequestUrl(url) {
|
| 27 |
try {
|
|
@@ -80,19 +88,59 @@ function appendForwarded(existingValue, nextValue) {
|
|
| 80 |
return `${existingValue}, ${cleanNext}`;
|
| 81 |
}
|
| 82 |
|
| 83 |
function buildProxyHeaders(headers, remoteAddress) {
|
| 84 |
return {
|
| 85 |
...headers,
|
| 86 |
-
host:
|
| 87 |
-
"x-forwarded-for":
|
| 88 |
-
|
| 89 |
-
remoteAddress,
|
| 90 |
-
),
|
| 91 |
-
"x-forwarded-host": headers["x-forwarded-host"] || headers.host || "",
|
| 92 |
"x-forwarded-proto": headers["x-forwarded-proto"] || "https",
|
| 93 |
};
|
| 94 |
}
|
| 95 |
|
| 96 |
function readSyncStatus() {
|
| 97 |
try {
|
| 98 |
if (fs.existsSync("/tmp/sync-status.json")) {
|
|
@@ -231,7 +279,13 @@ function renderSyncBadge(syncData) {
|
|
| 231 |
|
| 232 |
function renderDashboard(initialData) {
|
| 233 |
const controlUiHref = `${APP_BASE}/`;
|
| 234 |
-
const keepAwakeHtml =
|
| 235 |
? `
|
| 236 |
<div id="uptimerobot-private-note" class="helper-summary">
|
| 237 |
<strong>This Space is private.</strong> External monitors cannot reliably access private HF health URLs, so keep-awake setup is only available on public Spaces.
|
|
@@ -739,6 +793,7 @@ function renderDashboard(initialData) {
|
|
| 739 |
|
| 740 |
const monitorStateKey = 'huggingclaw_uptimerobot_setup_v1';
|
| 741 |
const KEEP_AWAKE_PRIVATE = ${initialData.spacePrivate ? "true" : "false"};
|
|
|
|
| 742 |
|
| 743 |
function setMonitorUiState(isConfigured) {
|
| 744 |
const summary = document.getElementById('uptimerobot-summary');
|
|
@@ -824,7 +879,7 @@ function renderDashboard(initialData) {
|
|
| 824 |
updateStats();
|
| 825 |
setInterval(updateStats, 10000);
|
| 826 |
document.getElementById('control-ui-link').setAttribute('href', getDashboardBase() + '/app/' + getCurrentSearch());
|
| 827 |
-
if (!KEEP_AWAKE_PRIVATE) {
|
| 828 |
restoreMonitorUiState();
|
| 829 |
document.getElementById('uptimerobot-btn').addEventListener('click', setupUptimeRobot);
|
| 830 |
document.getElementById('uptimerobot-toggle').addEventListener('click', toggleMonitorSetup);
|
|
@@ -942,13 +997,14 @@ async function createUptimeRobotMonitor(apiKey, host) {
|
|
| 942 |
}
|
| 943 |
|
| 944 |
function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
|
|
|
|
| 945 |
const proxyReq = http.request(
|
| 946 |
{
|
| 947 |
hostname: GATEWAY_HOST,
|
| 948 |
port: proxyPort,
|
| 949 |
method: req.method,
|
| 950 |
path: proxyPath,
|
| 951 |
-
headers: buildProxyHeaders(req.headers,
|
| 952 |
},
|
| 953 |
(proxyRes) => {
|
| 954 |
res.writeHead(proxyRes.statusCode || 502, proxyRes.headers);
|
|
@@ -970,7 +1026,7 @@ function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
|
|
| 970 |
req.pipe(proxyReq);
|
| 971 |
}
|
| 972 |
|
| 973 |
-
function serializeUpgradeHeaders(req,
|
| 974 |
const forwardedHeaders = [];
|
| 975 |
|
| 976 |
for (let i = 0; i < req.rawHeaders.length; i += 2) {
|
|
@@ -979,6 +1035,7 @@ function serializeUpgradeHeaders(req, remoteAddress) {
|
|
| 979 |
const lower = name.toLowerCase();
|
| 980 |
|
| 981 |
if (
|
|
|
|
| 982 |
lower === "x-forwarded-for" ||
|
| 983 |
lower === "x-forwarded-host" ||
|
| 984 |
lower === "x-forwarded-proto"
|
|
@@ -990,10 +1047,13 @@ function serializeUpgradeHeaders(req, remoteAddress) {
|
|
| 990 |
}
|
| 991 |
|
| 992 |
forwardedHeaders.push(
|
| 993 |
-
`
|
| 994 |
);
|
| 995 |
forwardedHeaders.push(
|
| 996 |
-
`X-Forwarded-
|
|
|
|
|
|
|
|
|
|
| 997 |
);
|
| 998 |
forwardedHeaders.push(
|
| 999 |
`X-Forwarded-Proto: ${req.headers["x-forwarded-proto"] || "https"}`,
|
|
@@ -1010,11 +1070,12 @@ function proxyUpgrade(
|
|
| 1010 |
proxyPort = GATEWAY_PORT,
|
| 1011 |
) {
|
| 1012 |
const proxySocket = net.connect(proxyPort, GATEWAY_HOST);
|
|
|
|
| 1013 |
|
| 1014 |
proxySocket.on("connect", () => {
|
| 1015 |
const requestLines = [
|
| 1016 |
`${req.method} ${proxyPath} HTTP/${req.httpVersion}`,
|
| 1017 |
-
...serializeUpgradeHeaders(req,
|
| 1018 |
"",
|
| 1019 |
"",
|
| 1020 |
];
|
|
@@ -1092,15 +1153,33 @@ const server = http.createServer((req, res) => {
|
|
| 1092 |
|
| 1093 |
void (async () => {
|
| 1094 |
try {
|
| 1095 |
const body = await readRequestBody(req);
|
| 1096 |
const parsed = JSON.parse(body || "{}");
|
| 1097 |
const apiKey = String(parsed.apiKey || "").trim();
|
| 1098 |
|
| 1099 |
-
if (!apiKey) {
|
| 1100 |
res.writeHead(400, { "Content-Type": "application/json" });
|
| 1101 |
res.end(
|
| 1102 |
JSON.stringify({
|
| 1103 |
-
message: "
|
| 1104 |
}),
|
| 1105 |
);
|
| 1106 |
return;
|
|
|
|
| 12 |
const TELEGRAM_ENABLED = !!process.env.TELEGRAM_BOT_TOKEN;
|
| 13 |
const WHATSAPP_ENABLED = /^true$/i.test(process.env.WHATSAPP_ENABLED || "");
|
| 14 |
const WHATSAPP_STATUS_FILE = "/tmp/huggingclaw-wa-status.json";
|
| 15 |
+
const HF_BACKUP_ENABLED = !!process.env.HF_TOKEN;
|
| 16 |
const SYNC_INTERVAL = process.env.SYNC_INTERVAL || "600";
|
| 17 |
const DASHBOARD_BASE = "/dashboard";
|
| 18 |
const DASHBOARD_STATUS_PATH = `${DASHBOARD_BASE}/status`;
|
|
|
|
| 20 |
const DASHBOARD_UPTIMEROBOT_PATH = `${DASHBOARD_BASE}/uptimerobot/setup`;
|
| 21 |
const DASHBOARD_APP_BASE = `${DASHBOARD_BASE}/app`;
|
| 22 |
const APP_BASE = "/app";
|
| 23 |
+
const UPTIMEROBOT_SETUP_ENABLED =
|
| 24 |
+
String(process.env.UPTIMEROBOT_SETUP_ENABLED || "true").toLowerCase() ===
|
| 25 |
+
"true";
|
| 26 |
+
const UPTIMEROBOT_RATE_WINDOW_MS = 60 * 1000;
|
| 27 |
+
const UPTIMEROBOT_RATE_MAX = Number(
|
| 28 |
+
process.env.UPTIMEROBOT_RATE_LIMIT_PER_MINUTE || 5,
|
| 29 |
+
);
|
| 30 |
const SPACE_VISIBILITY_TTL_MS = 10 * 60 * 1000;
|
| 31 |
const spaceVisibilityCache = new Map();
|
| 32 |
+
const uptimerobotRateMap = new Map();
|
| 33 |
|
| 34 |
function parseRequestUrl(url) {
|
| 35 |
try {
|
|
|
|
| 88 |
return `${existingValue}, ${cleanNext}`;
|
| 89 |
}
|
| 90 |
|
| 91 |
+
function getForwardedClientIp(req) {
|
| 92 |
+
const forwardedFor = req.headers["x-forwarded-for"];
|
| 93 |
+
if (Array.isArray(forwardedFor) && forwardedFor.length > 0) {
|
| 94 |
+
return String(forwardedFor[0]).split(",")[0].trim();
|
| 95 |
+
}
|
| 96 |
+
if (typeof forwardedFor === "string" && forwardedFor.trim()) {
|
| 97 |
+
return forwardedFor.split(",")[0].trim();
|
| 98 |
+
}
|
| 99 |
+
return req.socket.remoteAddress || "";
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
function buildProxyHeaders(headers, remoteAddress) {
|
| 103 |
return {
|
| 104 |
...headers,
|
| 105 |
+
host: `${GATEWAY_HOST}:${GATEWAY_PORT}`,
|
| 106 |
+
"x-forwarded-for": remoteAddress || "",
|
| 107 |
+
"x-forwarded-host": headers.host || "",
|
|
|
|
|
|
|
|
|
|
| 108 |
"x-forwarded-proto": headers["x-forwarded-proto"] || "https",
|
| 109 |
};
|
| 110 |
}
|
| 111 |
|
| 112 |
+
function getRequesterIp(req) {
|
| 113 |
+
return (
|
| 114 |
+
getForwardedClientIp(req) ||
|
| 115 |
+
req.socket.remoteAddress ||
|
| 116 |
+
"unknown"
|
| 117 |
+
);
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
function isRateLimited(req) {
|
| 121 |
+
const now = Date.now();
|
| 122 |
+
const ip = getRequesterIp(req);
|
| 123 |
+
const bucket = uptimerobotRateMap.get(ip) || [];
|
| 124 |
+
const recent = bucket.filter((ts) => now - ts < UPTIMEROBOT_RATE_WINDOW_MS);
|
| 125 |
+
recent.push(now);
|
| 126 |
+
uptimerobotRateMap.set(ip, recent);
|
| 127 |
+
return recent.length > UPTIMEROBOT_RATE_MAX;
|
| 128 |
+
}
|
| 129 |
+
|
| 130 |
+
function isAllowedUptimeSetupOrigin(req) {
|
| 131 |
+
const host = String(req.headers.host || "").toLowerCase();
|
| 132 |
+
const origin = String(req.headers.origin || "").toLowerCase();
|
| 133 |
+
const referer = String(req.headers.referer || "").toLowerCase();
|
| 134 |
+
if (!host) return false;
|
| 135 |
+
if (origin && !origin.includes(host)) return false;
|
| 136 |
+
if (referer && !referer.includes(host)) return false;
|
| 137 |
+
return true;
|
| 138 |
+
}
|
| 139 |
+
|
| 140 |
+
function isValidUptimeApiKey(key) {
|
| 141 |
+
return /^[A-Za-z0-9_-]{20,128}$/.test(String(key || ""));
|
| 142 |
+
}
|
| 143 |
+
|
| 144 |
function readSyncStatus() {
|
| 145 |
try {
|
| 146 |
if (fs.existsSync("/tmp/sync-status.json")) {
|
|
|
|
| 279 |
|
| 280 |
function renderDashboard(initialData) {
|
| 281 |
const controlUiHref = `${APP_BASE}/`;
|
| 282 |
+
const keepAwakeHtml = !UPTIMEROBOT_SETUP_ENABLED
|
| 283 |
+
? `
|
| 284 |
+
<div id="uptimerobot-private-note" class="helper-summary">
|
| 285 |
+
UptimeRobot setup is disabled for this Space.
|
| 286 |
+
</div>
|
| 287 |
+
`
|
| 288 |
+
: initialData.spacePrivate
|
| 289 |
? `
|
| 290 |
<div id="uptimerobot-private-note" class="helper-summary">
|
| 291 |
<strong>This Space is private.</strong> External monitors cannot reliably access private HF health URLs, so keep-awake setup is only available on public Spaces.
|
|
|
|
| 793 |
|
| 794 |
const monitorStateKey = 'huggingclaw_uptimerobot_setup_v1';
|
| 795 |
const KEEP_AWAKE_PRIVATE = ${initialData.spacePrivate ? "true" : "false"};
|
| 796 |
+
const KEEP_AWAKE_SETUP_ENABLED = ${UPTIMEROBOT_SETUP_ENABLED ? "true" : "false"};
|
| 797 |
|
| 798 |
function setMonitorUiState(isConfigured) {
|
| 799 |
const summary = document.getElementById('uptimerobot-summary');
|
|
|
|
| 879 |
updateStats();
|
| 880 |
setInterval(updateStats, 10000);
|
| 881 |
document.getElementById('control-ui-link').setAttribute('href', getDashboardBase() + '/app/' + getCurrentSearch());
|
| 882 |
+
if (KEEP_AWAKE_SETUP_ENABLED && !KEEP_AWAKE_PRIVATE) {
|
| 883 |
restoreMonitorUiState();
|
| 884 |
document.getElementById('uptimerobot-btn').addEventListener('click', setupUptimeRobot);
|
| 885 |
document.getElementById('uptimerobot-toggle').addEventListener('click', toggleMonitorSetup);
|
|
|
|
| 997 |
}
|
| 998 |
|
| 999 |
function proxyHttp(req, res, proxyPath = req.url, proxyPort = GATEWAY_PORT) {
|
| 1000 |
+
const clientIp = getForwardedClientIp(req);
|
| 1001 |
const proxyReq = http.request(
|
| 1002 |
{
|
| 1003 |
hostname: GATEWAY_HOST,
|
| 1004 |
port: proxyPort,
|
| 1005 |
method: req.method,
|
| 1006 |
path: proxyPath,
|
| 1007 |
+
headers: buildProxyHeaders(req.headers, clientIp),
|
| 1008 |
},
|
| 1009 |
(proxyRes) => {
|
| 1010 |
res.writeHead(proxyRes.statusCode || 502, proxyRes.headers);
|
|
|
|
| 1026 |
req.pipe(proxyReq);
|
| 1027 |
}
|
| 1028 |
|
| 1029 |
+
function serializeUpgradeHeaders(req, clientIp, proxyPort) {
|
| 1030 |
const forwardedHeaders = [];
|
| 1031 |
|
| 1032 |
for (let i = 0; i < req.rawHeaders.length; i += 2) {
|
|
|
|
| 1035 |
const lower = name.toLowerCase();
|
| 1036 |
|
| 1037 |
if (
|
| 1038 |
+
lower === "host" ||
|
| 1039 |
lower === "x-forwarded-for" ||
|
| 1040 |
lower === "x-forwarded-host" ||
|
| 1041 |
lower === "x-forwarded-proto"
|
|
|
|
| 1047 |
}
|
| 1048 |
|
| 1049 |
forwardedHeaders.push(
|
| 1050 |
+
`Host: ${GATEWAY_HOST}:${proxyPort}`,
|
| 1051 |
);
|
| 1052 |
forwardedHeaders.push(
|
| 1053 |
+
`X-Forwarded-For: ${clientIp || ""}`,
|
| 1054 |
+
);
|
| 1055 |
+
forwardedHeaders.push(
|
| 1056 |
+
`X-Forwarded-Host: ${req.headers.host || ""}`,
|
| 1057 |
);
|
| 1058 |
forwardedHeaders.push(
|
| 1059 |
`X-Forwarded-Proto: ${req.headers["x-forwarded-proto"] || "https"}`,
|
|
|
|
| 1070 |
proxyPort = GATEWAY_PORT,
|
| 1071 |
) {
|
| 1072 |
const proxySocket = net.connect(proxyPort, GATEWAY_HOST);
|
| 1073 |
+
const clientIp = getForwardedClientIp(req);
|
| 1074 |
|
| 1075 |
proxySocket.on("connect", () => {
|
| 1076 |
const requestLines = [
|
| 1077 |
`${req.method} ${proxyPath} HTTP/${req.httpVersion}`,
|
| 1078 |
+
...serializeUpgradeHeaders(req, clientIp, proxyPort),
|
| 1079 |
"",
|
| 1080 |
"",
|
| 1081 |
];
|
|
|
|
| 1153 |
|
| 1154 |
void (async () => {
|
| 1155 |
try {
|
| 1156 |
+
if (!UPTIMEROBOT_SETUP_ENABLED) {
|
| 1157 |
+
res.writeHead(403, { "Content-Type": "application/json" });
|
| 1158 |
+
res.end(JSON.stringify({ message: "Uptime setup is disabled." }));
|
| 1159 |
+
return;
|
| 1160 |
+
}
|
| 1161 |
+
|
| 1162 |
+
if (isRateLimited(req)) {
|
| 1163 |
+
res.writeHead(429, { "Content-Type": "application/json" });
|
| 1164 |
+
res.end(JSON.stringify({ message: "Too many requests." }));
|
| 1165 |
+
return;
|
| 1166 |
+
}
|
| 1167 |
+
|
| 1168 |
+
if (!isAllowedUptimeSetupOrigin(req)) {
|
| 1169 |
+
res.writeHead(403, { "Content-Type": "application/json" });
|
| 1170 |
+
res.end(JSON.stringify({ message: "Invalid request origin." }));
|
| 1171 |
+
return;
|
| 1172 |
+
}
|
| 1173 |
+
|
| 1174 |
const body = await readRequestBody(req);
|
| 1175 |
const parsed = JSON.parse(body || "{}");
|
| 1176 |
const apiKey = String(parsed.apiKey || "").trim();
|
| 1177 |
|
| 1178 |
+
if (!isValidUptimeApiKey(apiKey)) {
|
| 1179 |
res.writeHead(400, { "Content-Type": "application/json" });
|
| 1180 |
res.end(
|
| 1181 |
JSON.stringify({
|
| 1182 |
+
message: "A valid API key is required.",
|
| 1183 |
}),
|
| 1184 |
);
|
| 1185 |
return;
|
start.sh
CHANGED
|
@@ -1,5 +1,7 @@
|
|
| 1 |
#!/bin/bash
|
| 2 |
-
set -
|
|
|
|
|
|
|
| 3 |
|
| 4 |
# ────────────────────────────────────────────────────────────────
|
| 5 |
# HuggingClaw - OpenClaw Gateway for HF Spaces
|
|
@@ -127,95 +129,13 @@ if [ -n "$HF_TOKEN" ]; then
|
|
| 127 |
fi
|
| 128 |
fi
|
| 129 |
|
| 130 |
-
# ββ
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
echo "
|
| 137 |
-
DATASET_CHECK=$(curl -s -o /dev/null -w "%{http_code}" \
|
| 138 |
-
-H "Authorization: Bearer $HF_TOKEN" \
|
| 139 |
-
"https://huggingface.co/api/datasets/${HF_USERNAME}/${BACKUP_DATASET}" \
|
| 140 |
-
--max-time 10 2>/dev/null || echo "000")
|
| 141 |
-
|
| 142 |
-
if [ "$DATASET_CHECK" = "404" ]; then
|
| 143 |
-
echo " π Dataset not found, creating ${HF_USERNAME}/${BACKUP_DATASET}..."
|
| 144 |
-
CREATE_RESULT=$(curl -s -w "\n%{http_code}" \
|
| 145 |
-
-X POST "https://huggingface.co/api/repos/create" \
|
| 146 |
-
-H "Authorization: Bearer $HF_TOKEN" \
|
| 147 |
-
-H "Content-Type: application/json" \
|
| 148 |
-
-d "{\"type\":\"dataset\",\"name\":\"${BACKUP_DATASET}\",\"private\":true}" \
|
| 149 |
-
--max-time 15 2>/dev/null || echo "error")
|
| 150 |
-
CREATE_STATUS=$(echo "$CREATE_RESULT" | tail -1)
|
| 151 |
-
if [ "$CREATE_STATUS" = "200" ] || [ "$CREATE_STATUS" = "201" ]; then
|
| 152 |
-
echo " ✅ Dataset created: ${HF_USERNAME}/${BACKUP_DATASET} (private)"
|
| 153 |
-
else
|
| 154 |
-
echo " ⚠️ Could not create dataset (HTTP $CREATE_STATUS). Create it manually:"
|
| 155 |
-
echo " https://huggingface.co/datasets/create"
|
| 156 |
-
fi
|
| 157 |
-
elif [ "$DATASET_CHECK" = "200" ]; then
|
| 158 |
-
echo " ✅ Dataset exists"
|
| 159 |
-
else
|
| 160 |
-
echo " ⚠️ Could not check dataset (HTTP $DATASET_CHECK)"
|
| 161 |
-
fi
|
| 162 |
-
|
| 163 |
-
# Restore workspace
|
| 164 |
-
echo "📦 Restoring workspace..."
|
| 165 |
-
WORKSPACE="/home/node/.openclaw/workspace"
|
| 166 |
-
GIT_USER_EMAIL="${WORKSPACE_GIT_USER:-openclaw@example.com}"
|
| 167 |
-
GIT_USER_NAME="${WORKSPACE_GIT_NAME:-OpenClaw Bot}"
|
| 168 |
-
|
| 169 |
-
cd "$WORKSPACE"
|
| 170 |
-
if [ ! -d ".git" ]; then
|
| 171 |
-
git init -q
|
| 172 |
-
git remote add origin "$BACKUP_URL"
|
| 173 |
-
else
|
| 174 |
-
git remote set-url origin "$BACKUP_URL"
|
| 175 |
-
fi
|
| 176 |
-
|
| 177 |
-
git config user.email "$GIT_USER_EMAIL"
|
| 178 |
-
git config user.name "$GIT_USER_NAME"
|
| 179 |
-
|
| 180 |
-
if git fetch origin main 2>/dev/null; then
|
| 181 |
-
git reset --hard origin/main 2>/dev/null && echo " ✅ Workspace restored!"
|
| 182 |
-
else
|
| 183 |
-
echo " ⚠️ No remote data yet, starting fresh."
|
| 184 |
-
fi
|
| 185 |
-
cd /
|
| 186 |
-
fi
|
| 187 |
-
|
| 188 |
-
# ── Restore persisted OpenClaw state (if present) ──
|
| 189 |
-
STATE_BACKUP_ROOT="/home/node/.openclaw/workspace/.huggingclaw-state/openclaw"
|
| 190 |
-
if [ -d "$STATE_BACKUP_ROOT" ]; then
|
| 191 |
-
echo "🧠 Restoring OpenClaw state..."
|
| 192 |
-
for source_path in "$STATE_BACKUP_ROOT"/*; do
|
| 193 |
-
[ -e "$source_path" ] || continue
|
| 194 |
-
name="$(basename "$source_path")"
|
| 195 |
-
target_path="/home/node/.openclaw/${name}"
|
| 196 |
-
|
| 197 |
-
rm -rf "$target_path"
|
| 198 |
-
mkdir -p "$(dirname "$target_path")"
|
| 199 |
-
cp -R "$source_path" "$target_path"
|
| 200 |
-
done
|
| 201 |
-
echo " ✅ OpenClaw state restored"
|
| 202 |
-
fi
|
| 203 |
-
|
| 204 |
-
# ── Restore persisted WhatsApp credentials (if present) ──
|
| 205 |
-
WA_BACKUP_DIR="/home/node/.openclaw/workspace/.huggingclaw-state/credentials/whatsapp/default"
|
| 206 |
-
WA_CREDS_DIR="/home/node/.openclaw/credentials/whatsapp/default"
|
| 207 |
-
if [ "$WHATSAPP_ENABLED_NORMALIZED" = "true" ] && [ -d "$WA_BACKUP_DIR" ]; then
|
| 208 |
-
WA_FILE_COUNT=$(find "$WA_BACKUP_DIR" -type f | wc -l | tr -d ' ')
|
| 209 |
-
if [ "$WA_FILE_COUNT" -ge 2 ]; then
|
| 210 |
-
echo "📱 Restoring WhatsApp credentials..."
|
| 211 |
-
rm -rf "$WA_CREDS_DIR"
|
| 212 |
-
mkdir -p "$(dirname "$WA_CREDS_DIR")"
|
| 213 |
-
cp -R "$WA_BACKUP_DIR" "$WA_CREDS_DIR"
|
| 214 |
-
chmod -R go-rwx /home/node/.openclaw/credentials/whatsapp 2>/dev/null || true
|
| 215 |
-
echo " ✅ WhatsApp credentials restored"
|
| 216 |
-
else
|
| 217 |
-
echo " ⚠️ Saved WhatsApp credentials look incomplete (${WA_FILE_COUNT} files), skipping restore."
|
| 218 |
-
fi
|
| 219 |
fi
|
| 220 |
|
| 221 |
# ── Build config ──
|
|
@@ -419,8 +339,8 @@ printf " β %-40s β\n" "Browser: ✅ ${BROWSER_EXECUTABLE_PATH}"
|
|
| 419 |
else
|
| 420 |
printf " β %-40s β\n" "Browser: β unavailable"
|
| 421 |
fi
|
| 422 |
-
if [ -n "$
|
| 423 |
-
printf " β %-40s β\n" "Backup: ✅ ${
|
| 424 |
else
|
| 425 |
printf " β %-40s β\n" "Backup: β not configured"
|
| 426 |
fi
|
|
@@ -434,7 +354,7 @@ printf " β %-40s β\n" "Control UI: https://${SPACE_HOST}/app"
|
|
| 434 |
printf " β %-40s β\n" "Dashboard: https://${SPACE_HOST}"
|
| 435 |
fi
|
| 436 |
SYNC_STATUS="β disabled"
|
| 437 |
-
if [ -n "$
|
| 438 |
SYNC_STATUS="✅ every ${SYNC_INTERVAL:-180}s"
|
| 439 |
fi
|
| 440 |
printf " β %-40s β\n" "Auto-sync: $SYNC_STATUS"
|
|
@@ -459,7 +379,7 @@ graceful_shutdown() {
|
|
| 459 |
|
| 460 |
if [ -f "/home/node/app/workspace-sync.py" ]; then
|
| 461 |
echo "💾 Saving OpenClaw state before exit..."
|
| 462 |
-
python3 /home/node/app/workspace-sync.py
|
| 463 |
echo " ⚠️ Could not complete shutdown sync"
|
| 464 |
fi
|
| 465 |
|
|
@@ -530,7 +450,9 @@ fi
|
|
| 530 |
warmup_browser
|
| 531 |
|
| 532 |
# 12. Start Workspace Sync after startup settles
|
| 533 |
-
|
|
|
|
|
|
|
| 534 |
|
| 535 |
# Wait for gateway (allows trap to fire)
|
| 536 |
wait $GATEWAY_PID
|
|
|
|
| 1 |
#!/bin/bash
|
| 2 |
+
set -euo pipefail
|
| 3 |
+
|
| 4 |
+
umask 0077
|
| 5 |
|
| 6 |
# ────────────────────────────────────────────────────────────────
|
| 7 |
# HuggingClaw - OpenClaw Gateway for HF Spaces
|
|
|
|
| 129 |
fi
|
| 130 |
fi
|
| 131 |
|
| 132 |
+
# ── Restore workspace/state from HF Dataset ──
|
| 133 |
+
BACKUP_DATASET="${BACKUP_DATASET_NAME:-huggingclaw-backup}"
|
| 134 |
+
if [ -n "${HF_TOKEN:-}" ]; then
|
| 135 |
+
echo "📦 Restoring workspace and state from HF Dataset..."
|
| 136 |
+
python3 /home/node/app/workspace-sync.py restore || true
|
| 137 |
+
else
|
| 138 |
+
echo "HF_TOKEN is not set. Running without dataset persistence."
|
| 139 |
fi
|
| 140 |
|
| 141 |
# ── Build config ──
|
|
|
|
| 339 |
else
|
| 340 |
printf " ║ %-40s ║\n" "Browser: ❌ unavailable"
|
| 341 |
fi
|
| 342 |
+
if [ -n "${HF_TOKEN:-}" ]; then
|
| 343 |
+
printf " ║ %-40s ║\n" "Backup: ✅ ${BACKUP_DATASET:-huggingclaw-backup} (auto namespace)"
|
| 344 |
else
|
| 345 |
printf " ║ %-40s ║\n" "Backup: ❌ not configured"
|
| 346 |
fi
|
|
|
|
| 354 |
printf " ║ %-40s ║\n" "Dashboard: https://${SPACE_HOST}"
|
| 355 |
fi
|
| 356 |
SYNC_STATUS="❌ disabled"
|
| 357 |
+
if [ -n "${HF_TOKEN:-}" ]; then
|
| 358 |
SYNC_STATUS="✅ every ${SYNC_INTERVAL:-180}s"
|
| 359 |
fi
|
| 360 |
printf " ║ %-40s ║\n" "Auto-sync: $SYNC_STATUS"
|
|
|
|
| 379 |
|
| 380 |
if [ -f "/home/node/app/workspace-sync.py" ]; then
|
| 381 |
echo "💾 Saving OpenClaw state before exit..."
|
| 382 |
+
python3 /home/node/app/workspace-sync.py sync-once || \
|
| 383 |
echo " ⚠️ Could not complete shutdown sync"
|
| 384 |
fi
|
| 385 |
|
|
|
|
| 450 |
warmup_browser
|
| 451 |
|
| 452 |
# 12. Start Workspace Sync after startup settles
|
| 453 |
+
if [ -n "${HF_TOKEN:-}" ]; then
|
| 454 |
+
python3 -u /home/node/app/workspace-sync.py loop &
|
| 455 |
+
fi
|
| 456 |
|
| 457 |
# Wait for gateway (allows trap to fire)
|
| 458 |
wait $GATEWAY_PID
|
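The start.sh changes above gate the restore, the background loop, and the shutdown sync on `HF_TOKEN` alone, and the banner labels the dataset "(auto namespace)". A minimal Python sketch of that fallback order (`HF_USERNAME`, then `SPACE_AUTHOR_NAME`, then the account behind the token) could look like the following. The function name `resolve_repo_id` and the injected `whoami` callable are hypothetical stand-ins so the example runs offline; the script in this commit calls `HfApi.whoami()` instead.

```python
from typing import Callable, Mapping, Optional


def resolve_repo_id(
    env: Mapping[str, str],
    whoami: Optional[Callable[[], dict]] = None,
) -> str:
    """Return '<namespace>/<dataset>' from the first namespace source available."""
    dataset = env.get("BACKUP_DATASET_NAME", "huggingclaw-backup").strip()
    # Explicit settings win: HF_USERNAME first, then SPACE_AUTHOR_NAME.
    namespace = (
        env.get("HF_USERNAME", "").strip()
        or env.get("SPACE_AUTHOR_NAME", "").strip()
    )
    if not namespace and whoami is not None:
        # Hypothetical stand-in for an authenticated whoami lookup;
        # only consulted when neither variable is set.
        namespace = str(whoami().get("name") or "").strip()
    if not namespace:
        raise RuntimeError("Could not determine the Hugging Face namespace for backups.")
    return f"{namespace}/{dataset}"


if __name__ == "__main__":
    print(resolve_repo_id({"HF_USERNAME": "alice"}))
    print(resolve_repo_id({}, whoami=lambda: {"name": "bob"}))
```

Passing the environment and the lookup as arguments keeps the sketch testable without network access; it is not meant to mirror the script's module-level globals.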
workspace-sync.py
CHANGED
|
@@ -1,24 +1,39 @@
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
-
HuggingClaw
|
| 4 |
-
Uses huggingface_hub Python library instead of git for more reliable
|
| 5 |
-
HF Dataset operations (handles auth, LFS, retries automatically).
|
| 6 |
|
| 7 |
-
|
|
|
|
|
|
|
| 8 |
"""
|
| 9 |
|
|
|
|
|
|
|
| 10 |
import os
|
|
|
|
|
|
|
| 11 |
import sys
|
|
|
|
|
|
|
| 12 |
import time
|
| 13 |
-
import signal
|
| 14 |
-
import shutil
|
| 15 |
-
import subprocess
|
| 16 |
from pathlib import Path
|
| 17 |
|
| 18 |
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
|
| 19 |
|
|
|
|
|
|
|
|
|
|
| 20 |
OPENCLAW_HOME = Path("/home/node/.openclaw")
|
| 21 |
WORKSPACE = OPENCLAW_HOME / "workspace"
|
| 22 |
STATE_DIR = WORKSPACE / ".huggingclaw-state"
|
| 23 |
OPENCLAW_STATE_BACKUP_DIR = STATE_DIR / "openclaw"
|
| 24 |
EXCLUDED_STATE_NAMES = {
|
|
@@ -27,41 +42,32 @@ EXCLUDED_STATE_NAMES = {
|
|
| 27 |
"gateway.log",
|
| 28 |
"browser",
|
| 29 |
}
|
| 30 |
-
WHATSAPP_CREDS_DIR =
|
| 31 |
WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
|
| 32 |
RESET_MARKER = WORKSPACE / ".reset_credentials"
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
HF_USERNAME = os.environ.get("HF_USERNAME", "")
|
| 37 |
-
BACKUP_DATASET = os.environ.get("BACKUP_DATASET_NAME", "huggingclaw-backup")
|
| 38 |
-
WEBHOOK_URL = os.environ.get("WEBHOOK_URL", "")
|
| 39 |
-
WHATSAPP_ENABLED = os.environ.get("WHATSAPP_ENABLED", "").strip().lower() == "true"
|
| 40 |
-
|
| 41 |
-
running = True
|
| 42 |
|
| 43 |
-
def signal_handler(sig, frame):
|
| 44 |
-
global running
|
| 45 |
-
running = False
|
| 46 |
|
| 47 |
-
|
| 48 |
-
|
| 49 |
|
| 50 |
|
| 51 |
def count_files(path: Path) -> int:
|
| 52 |
-
"""Count regular files recursively under a path."""
|
| 53 |
if not path.exists():
|
| 54 |
return 0
|
| 55 |
return sum(1 for child in path.rglob("*") if child.is_file())
|
| 56 |
|
| 57 |
|
| 58 |
def snapshot_state_into_workspace() -> None:
|
| 59 |
-
"""
|
| 60 |
-
Mirror persistent state into the workspace-backed dataset repo.
|
| 61 |
-
|
| 62 |
-
This keeps WhatsApp credentials in a hidden folder that is synced together
|
| 63 |
-
with the workspace, without changing the live credentials location.
|
| 64 |
-
"""
|
| 65 |
try:
|
| 66 |
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
| 67 |
if OPENCLAW_STATE_BACKUP_DIR.exists():
|
|
@@ -77,8 +83,8 @@ def snapshot_state_into_workspace() -> None:
|
|
| 77 |
shutil.copytree(source_path, backup_path)
|
| 78 |
elif source_path.is_file():
|
| 79 |
shutil.copy2(source_path, backup_path)
|
| 80 |
-
except Exception as
|
| 81 |
-
print(f" ⚠️ Could not snapshot OpenClaw state: {
|
| 82 |
|
| 83 |
try:
|
| 84 |
if not WHATSAPP_ENABLED:
|
|
@@ -106,233 +112,281 @@ def snapshot_state_into_workspace() -> None:
|
|
| 106 |
if WHATSAPP_BACKUP_DIR.exists():
|
| 107 |
shutil.rmtree(WHATSAPP_BACKUP_DIR, ignore_errors=True)
|
| 108 |
shutil.copytree(WHATSAPP_CREDS_DIR, WHATSAPP_BACKUP_DIR)
|
| 109 |
-
except Exception as
|
| 110 |
-
print(f" ⚠️ Could not snapshot WhatsApp state: {
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
def
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
)
|
| 121 |
-
return result.returncode != 0
|
| 122 |
-
except Exception:
|
| 123 |
-
return False
|
| 124 |
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
try:
|
| 128 |
-
import json
|
| 129 |
-
data = {
|
| 130 |
-
"status": status,
|
| 131 |
-
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
| 132 |
-
"message": message
|
| 133 |
-
}
|
| 134 |
-
with open("/tmp/sync-status.json", "w") as f:
|
| 135 |
-
json.dump(data, f)
|
| 136 |
-
except Exception as e:
|
| 137 |
-
print(f" ⚠️ Could not write sync status: {e}")
|
| 138 |
-
|
| 139 |
-
def trigger_webhook(event, status, message):
|
| 140 |
-
"""Trigger webhook notification."""
|
| 141 |
-
if not WEBHOOK_URL:
|
| 142 |
-
return
|
| 143 |
-
try:
|
| 144 |
-
import urllib.request
|
| 145 |
-
import json
|
| 146 |
-
data = json.dumps({
|
| 147 |
-
"event": event,
|
| 148 |
-
"status": status,
|
| 149 |
-
"message": message,
|
| 150 |
-
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
| 151 |
-
}).encode('utf-8')
|
| 152 |
-
req = urllib.request.Request(WEBHOOK_URL, data=data, headers={'Content-Type': 'application/json'})
|
| 153 |
-
urllib.request.urlopen(req, timeout=10)
|
| 154 |
-
except Exception as e:
|
| 155 |
-
print(f" ⚠️ Webhook delivery failed: {e}")
|
| 156 |
-
|
| 157 |
-
def sync_with_hf_hub():
|
| 158 |
-
"""Sync workspace using huggingface_hub library."""
|
| 159 |
-
try:
|
| 160 |
-
from huggingface_hub import HfApi, upload_folder
|
| 161 |
|
| 162 |
-
api = HfApi(token=HF_TOKEN)
|
| 163 |
-
repo_id = f"{HF_USERNAME}/{BACKUP_DATASET}"
|
| 164 |
|
| 165 |
-
|
| 166 |
try:
|
| 167 |
-
|
| 168 |
-
except
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
)
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
return False
|
| 194 |
|
|
|
|
|
|
|
| 195 |
|
| 196 |
-
def sync_with_git():
|
| 197 |
-
"""Fallback: sync workspace using git."""
|
| 198 |
try:
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
return False
|
| 212 |
|
| 213 |
|
| 214 |
-
def
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
if not
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
| 222 |
-
write_sync_status("syncing", f"Starting sync at {ts}")
|
| 223 |
-
|
| 224 |
-
if use_hf_hub:
|
| 225 |
-
if sync_with_hf_hub():
|
| 226 |
-
print(f"📁 Workspace sync (hf_hub): pushed changes ({ts})")
|
| 227 |
-
write_sync_status("success", "Successfully pushed to HF Hub")
|
| 228 |
-
return
|
| 229 |
-
|
| 230 |
-
if sync_with_git():
|
| 231 |
-
print(f"📁 Workspace sync (git fallback): pushed changes ({ts})")
|
| 232 |
-
write_sync_status("success", "Successfully pushed via git fallback")
|
| 233 |
-
return
|
| 234 |
-
|
| 235 |
-
msg = f"Workspace sync: failed ({ts}), will retry"
|
| 236 |
-
print(f"📁 {msg}")
|
| 237 |
-
write_sync_status("error", msg)
|
| 238 |
-
trigger_webhook("sync", "error", msg)
|
| 239 |
-
return
|
| 240 |
-
|
| 241 |
-
if sync_with_git():
|
| 242 |
-
print(f"📁 Workspace sync (git): pushed changes ({ts})")
|
| 243 |
-
write_sync_status("success", "Successfully pushed via git")
|
| 244 |
-
return
|
| 245 |
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
trigger_webhook("sync", "error", msg)
|
| 250 |
|
|
|
|
|
|
|
|
|
|
| 251 |
|
| 252 |
-
|
| 253 |
-
if
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
return
|
| 257 |
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
|
| 263 |
-
|
| 264 |
-
|
| 265 |
|
| 266 |
-
if not use_hf_hub and not git_dir.exists():
|
| 267 |
-
print("📁 Workspace sync: no git repo and no HF credentials, skipping.")
|
| 268 |
-
return
|
| 269 |
|
| 270 |
-
|
|
|
|
| 271 |
|
| 272 |
-
if not has_changes():
|
| 273 |
-
print("📁 Workspace sync: no changes to persist.")
|
| 274 |
-
write_sync_status("configured", "No new state changes to sync.")
|
| 275 |
-
return
|
| 276 |
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
if use_hf_hub:
|
| 281 |
-
if sync_with_hf_hub():
|
| 282 |
-
print(f"📁 Workspace sync (hf_hub): pushed changes ({ts})")
|
| 283 |
-
write_sync_status("success", "Shutdown sync pushed to HF Hub")
|
| 284 |
-
return
|
| 285 |
-
if sync_with_git():
|
| 286 |
-
print(f"📁 Workspace sync (git fallback): pushed changes ({ts})")
|
| 287 |
-
write_sync_status("success", "Shutdown sync pushed via git fallback")
|
| 288 |
-
return
|
| 289 |
-
write_sync_status("error", "Shutdown sync failed")
|
| 290 |
-
print("📁 Workspace sync: shutdown sync failed.")
|
| 291 |
-
return
|
| 292 |
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
|
| 296 |
-
|
|
|
|
|
|
|
|
|
|
| 297 |
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
return
|
| 301 |
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
return
|
| 305 |
|
| 306 |
-
|
| 307 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 308 |
|
| 309 |
-
|
| 310 |
-
|
| 311 |
-
return
|
| 312 |
|
| 313 |
-
|
| 314 |
-
if use_hf_hub:
|
| 315 |
-
write_sync_status("configured", f"Backup enabled. Waiting for next sync in {INTERVAL}s.")
|
| 316 |
-
else:
|
| 317 |
-
write_sync_status("configured", f"Git sync enabled. Waiting for next sync in {INTERVAL}s.")
|
| 318 |
|
| 319 |
-
# Give the gateway a short head start before the first sync probe.
|
| 320 |
-
time.sleep(INITIAL_DELAY)
|
| 321 |
|
| 322 |
-
|
| 323 |
-
|
| 324 |
-
else:
|
| 325 |
-
print(f"📁 Workspace sync started (git): every {INTERVAL}s")
|
| 326 |
|
| 327 |
-
|
|
|
|
| 328 |
|
| 329 |
-
|
| 330 |
-
|
| 331 |
-
if
|
| 332 |
-
|
| 333 |
|
| 334 |
-
|
|
|
|
| 335 |
|
| 336 |
|
| 337 |
if __name__ == "__main__":
|
| 338 |
-
main()
|
|
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
+
HuggingClaw workspace/state backup via huggingface_hub.
|
|
|
|
|
|
|
| 4 |
|
| 5 |
+
This keeps OpenClaw workspace data, app state, and optional WhatsApp
|
| 6 |
+
credentials inside a private HF dataset without embedding HF tokens in git
|
| 7 |
+
remotes or requiring a manual HF_USERNAME secret.
|
| 8 |
"""
|
| 9 |
|
| 10 |
+
import hashlib
|
| 11 |
+
import json
|
| 12 |
import os
|
| 13 |
+
import shutil
|
| 14 |
+
import signal
|
| 15 |
import sys
|
| 16 |
+
import tempfile
|
| 17 |
+
import threading
|
| 18 |
import time
|
|
|
|
|
|
|
|
|
|
| 19 |
from pathlib import Path
|
| 20 |
|
| 21 |
os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
|
| 22 |
|
| 23 |
+
from huggingface_hub import HfApi, snapshot_download, upload_folder
|
| 24 |
+
from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError
|
| 25 |
+
|
| 26 |
OPENCLAW_HOME = Path("/home/node/.openclaw")
|
| 27 |
WORKSPACE = OPENCLAW_HOME / "workspace"
|
| 28 |
+
STATUS_FILE = Path("/tmp/sync-status.json")
|
| 29 |
+
INTERVAL = int(os.environ.get("SYNC_INTERVAL", "180"))
|
| 30 |
+
INITIAL_DELAY = int(os.environ.get("SYNC_START_DELAY", "10"))
|
| 31 |
+
HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
|
| 32 |
+
HF_USERNAME = os.environ.get("HF_USERNAME", "").strip()
|
| 33 |
+
SPACE_AUTHOR_NAME = os.environ.get("SPACE_AUTHOR_NAME", "").strip()
|
| 34 |
+
BACKUP_DATASET_NAME = os.environ.get("BACKUP_DATASET_NAME", "huggingclaw-backup").strip()
|
| 35 |
+
WHATSAPP_ENABLED = os.environ.get("WHATSAPP_ENABLED", "").strip().lower() == "true"
|
| 36 |
+
|
| 37 |
STATE_DIR = WORKSPACE / ".huggingclaw-state"
|
| 38 |
OPENCLAW_STATE_BACKUP_DIR = STATE_DIR / "openclaw"
|
| 39 |
EXCLUDED_STATE_NAMES = {
|
|
|
|
| 42 |
"gateway.log",
|
| 43 |
"browser",
|
| 44 |
}
|
| 45 |
+
WHATSAPP_CREDS_DIR = OPENCLAW_HOME / "credentials" / "whatsapp" / "default"
|
| 46 |
WHATSAPP_BACKUP_DIR = STATE_DIR / "credentials" / "whatsapp" / "default"
|
| 47 |
RESET_MARKER = WORKSPACE / ".reset_credentials"
|
| 48 |
+
HF_API = HfApi(token=HF_TOKEN) if HF_TOKEN else None
|
| 49 |
+
STOP_EVENT = threading.Event()
|
| 50 |
+
_REPO_ID_CACHE: str | None = None
|
| 51 |
|
|
|
|
|
|
|
|
|
|
| 52 |
|
| 53 |
+
def write_status(status: str, message: str) -> None:
|
| 54 |
+
payload = {
|
| 55 |
+
"status": status,
|
| 56 |
+
"message": message,
|
| 57 |
+
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
| 58 |
+
}
|
| 59 |
+
tmp_path = STATUS_FILE.with_suffix(".tmp")
|
| 60 |
+
tmp_path.write_text(json.dumps(payload), encoding="utf-8")
|
| 61 |
+
tmp_path.replace(STATUS_FILE)
|
| 62 |
|
| 63 |
|
| 64 |
def count_files(path: Path) -> int:
|
|
|
|
| 65 |
if not path.exists():
|
| 66 |
return 0
|
| 67 |
return sum(1 for child in path.rglob("*") if child.is_file())
|
| 68 |
|
| 69 |
|
| 70 |
def snapshot_state_into_workspace() -> None:
|
| 71 |
try:
|
| 72 |
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
| 73 |
if OPENCLAW_STATE_BACKUP_DIR.exists():
|
|
|
|
| 83 |
shutil.copytree(source_path, backup_path)
|
| 84 |
elif source_path.is_file():
|
| 85 |
shutil.copy2(source_path, backup_path)
|
| 86 |
+
except Exception as exc:
|
| 87 |
+
print(f" ⚠️ Could not snapshot OpenClaw state: {exc}")
|
| 88 |
|
| 89 |
try:
|
| 90 |
if not WHATSAPP_ENABLED:
|
|
|
|
| 112 |
if WHATSAPP_BACKUP_DIR.exists():
|
| 113 |
shutil.rmtree(WHATSAPP_BACKUP_DIR, ignore_errors=True)
|
| 114 |
shutil.copytree(WHATSAPP_CREDS_DIR, WHATSAPP_BACKUP_DIR)
|
| 115 |
+
except Exception as exc:
|
| 116 |
+
print(f" ⚠️ Could not snapshot WhatsApp state: {exc}")
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
def restore_embedded_state() -> None:
|
| 120 |
+
state_backup_root = STATE_DIR / "openclaw"
|
| 121 |
+
if state_backup_root.is_dir():
|
| 122 |
+
print("🧠 Restoring OpenClaw state...")
|
| 123 |
+
for source_path in state_backup_root.iterdir():
|
| 124 |
+
name = source_path.name
|
| 125 |
+
target_path = OPENCLAW_HOME / name
|
| 126 |
+
shutil.rmtree(target_path, ignore_errors=True)
|
| 127 |
+
if target_path.is_file():
|
| 128 |
+
target_path.unlink(missing_ok=True)
|
| 129 |
+
target_path.parent.mkdir(parents=True, exist_ok=True)
|
| 130 |
+
if source_path.is_dir():
|
| 131 |
+
shutil.copytree(source_path, target_path)
|
| 132 |
+
else:
|
| 133 |
+
shutil.copy2(source_path, target_path)
|
| 134 |
+
print(" ✅ OpenClaw state restored")
|
| 135 |
+
|
| 136 |
+
if WHATSAPP_ENABLED and WHATSAPP_BACKUP_DIR.is_dir():
|
| 137 |
+
file_count = count_files(WHATSAPP_BACKUP_DIR)
|
| 138 |
+
if file_count >= 2:
|
| 139 |
+
print("📱 Restoring WhatsApp credentials...")
|
| 140 |
+
shutil.rmtree(WHATSAPP_CREDS_DIR, ignore_errors=True)
|
| 141 |
+
WHATSAPP_CREDS_DIR.parent.mkdir(parents=True, exist_ok=True)
|
| 142 |
+
shutil.copytree(WHATSAPP_BACKUP_DIR, WHATSAPP_CREDS_DIR)
|
| 143 |
+
os.chmod(OPENCLAW_HOME / "credentials", 0o700)
|
| 144 |
+
print(" ✅ WhatsApp credentials restored")
|
| 145 |
+
else:
|
| 146 |
+
print(f" ⚠️ Saved WhatsApp credentials look incomplete ({file_count} files), skipping restore.")
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
def resolve_backup_namespace() -> str:
|
| 150 |
+
global _REPO_ID_CACHE
|
| 151 |
+
if _REPO_ID_CACHE:
|
| 152 |
+
return _REPO_ID_CACHE
|
| 153 |
+
|
| 154 |
+
namespace = HF_USERNAME or SPACE_AUTHOR_NAME
|
| 155 |
+
if not namespace and HF_API is not None:
|
| 156 |
+
whoami = HF_API.whoami()
|
| 157 |
+
namespace = whoami.get("name") or whoami.get("user") or ""
|
| 158 |
+
|
| 159 |
+
namespace = str(namespace).strip()
|
| 160 |
+
if not namespace:
|
| 161 |
+
raise RuntimeError(
|
| 162 |
+
"Could not determine the Hugging Face username for backups. "
|
| 163 |
+
"Set HF_USERNAME or use a token tied to your account."
|
| 164 |
)
|
|
|
|
|
|
|
|
|
|
| 165 |
|
| 166 |
+
_REPO_ID_CACHE = f"{namespace}/{BACKUP_DATASET_NAME}"
|
| 167 |
+
return _REPO_ID_CACHE
|
| 168 |
|
|
|
|
|
|
|
| 169 |
|
| 170 |
+
def ensure_repo_exists() -> str:
|
| 171 |
+
repo_id = resolve_backup_namespace()
|
| 172 |
+
try:
|
| 173 |
+
HF_API.repo_info(repo_id=repo_id, repo_type="dataset")
|
| 174 |
+
except RepositoryNotFoundError:
|
| 175 |
+
HF_API.create_repo(repo_id=repo_id, repo_type="dataset", private=True)
|
| 176 |
+
return repo_id
|
| 177 |
+
|
| 178 |
+
|
| 179 |
+
def metadata_marker(root: Path) -> tuple[int, int, int]:
|
| 180 |
+
if not root.exists():
|
| 181 |
+
return (0, 0, 0)
|
| 182 |
+
|
| 183 |
+
file_count = 0
|
| 184 |
+
total_size = 0
|
| 185 |
+
newest_mtime = 0
|
| 186 |
+
for path in root.rglob("*"):
|
| 187 |
+
if not path.is_file():
|
| 188 |
+
continue
|
| 189 |
+
rel = path.relative_to(root).as_posix()
|
| 190 |
+
if rel.startswith(".git/"):
|
| 191 |
+
continue
|
| 192 |
try:
|
| 193 |
+
stat = path.stat()
|
| 194 |
+
except OSError:
|
| 195 |
+
continue
|
| 196 |
+
file_count += 1
|
| 197 |
+
total_size += int(stat.st_size)
|
| 198 |
+
newest_mtime = max(newest_mtime, int(stat.st_mtime_ns))
|
| 199 |
+
return (file_count, total_size, newest_mtime)
|
| 200 |
+
|
| 201 |
+
|
| 202 |
+
def fingerprint_dir(root: Path) -> str:
|
| 203 |
+
hasher = hashlib.sha256()
|
| 204 |
+
if not root.exists():
|
| 205 |
+
return hasher.hexdigest()
|
| 206 |
+
|
| 207 |
+
for path in sorted(p for p in root.rglob("*") if p.is_file()):
|
| 208 |
+
rel = path.relative_to(root).as_posix()
|
| 209 |
+
if rel.startswith(".git/"):
|
| 210 |
+
continue
|
| 211 |
+
hasher.update(rel.encode("utf-8"))
|
| 212 |
+
with path.open("rb") as handle:
|
| 213 |
+
for chunk in iter(lambda: handle.read(1024 * 1024), b""):
|
| 214 |
+
hasher.update(chunk)
|
| 215 |
+
return hasher.hexdigest()
|
| 216 |
+
|
| 217 |
+
|
| 218 |
+
def create_snapshot_dir(source_root: Path) -> Path:
|
| 219 |
+
staging_root = Path(tempfile.mkdtemp(prefix="huggingclaw-sync-"))
|
| 220 |
+
for path in sorted(source_root.rglob("*")):
|
| 221 |
+
rel = path.relative_to(source_root)
|
| 222 |
+
rel_posix = rel.as_posix()
|
| 223 |
+
if rel_posix.startswith(".git/") or rel_posix == ".git":
|
| 224 |
+
continue
|
| 225 |
+
target = staging_root / rel
|
| 226 |
+
if path.is_dir():
|
| 227 |
+
target.mkdir(parents=True, exist_ok=True)
|
| 228 |
+
continue
|
| 229 |
+
target.parent.mkdir(parents=True, exist_ok=True)
|
| 230 |
+
shutil.copy2(path, target)
|
| 231 |
+
return staging_root
|
| 232 |
+
|
| 233 |
+
|
| 234 |
+
def restore_workspace() -> bool:
|
| 235 |
+
if not HF_TOKEN:
|
| 236 |
+
write_status("disabled", "HF_TOKEN is not configured.")
|
| 237 |
return False
|
| 238 |
|
| 239 |
+
repo_id = resolve_backup_namespace()
|
| 240 |
+
write_status("restoring", f"Restoring workspace from {repo_id}")
|
| 241 |
|
|
|
|
|
|
|
| 242 |
try:
|
| 243 |
+
with tempfile.TemporaryDirectory() as tmpdir:
|
| 244 |
+
snapshot_download(
|
| 245 |
+
repo_id=repo_id,
|
| 246 |
+
repo_type="dataset",
|
| 247 |
+
token=HF_TOKEN,
|
| 248 |
+
local_dir=tmpdir,
|
| 249 |
+
)
|
| 250 |
+
|
| 251 |
+
tmp_path = Path(tmpdir)
|
| 252 |
+
if not any(tmp_path.iterdir()):
|
| 253 |
+
write_status("fresh", "Backup dataset is empty. Starting fresh.")
|
| 254 |
+
return True
|
| 255 |
+
|
| 256 |
+
WORKSPACE.mkdir(parents=True, exist_ok=True)
|
| 257 |
+
for child in list(WORKSPACE.iterdir()):
|
| 258 |
+
if child.name == ".git":
|
| 259 |
+
continue
|
| 260 |
+
if child.is_dir():
|
| 261 |
+
shutil.rmtree(child, ignore_errors=True)
|
| 262 |
+
else:
|
| 263 |
+
child.unlink(missing_ok=True)
|
| 264 |
+
|
| 265 |
+
for child in tmp_path.iterdir():
|
| 266 |
+
if child.name == ".git":
|
| 267 |
+
continue
|
| 268 |
+
destination = WORKSPACE / child.name
|
| 269 |
+
if child.is_dir():
|
| 270 |
+
shutil.copytree(child, destination)
|
| 271 |
+
else:
|
| 272 |
+
shutil.copy2(child, destination)
|
| 273 |
+
|
| 274 |
+
restore_embedded_state()
|
| 275 |
+
write_status("restored", f"Restored workspace from {repo_id}")
|
| 276 |
+
return True
|
| 277 |
+
except RepositoryNotFoundError:
|
| 278 |
+
write_status("fresh", f"Backup dataset {repo_id} does not exist yet.")
|
| 279 |
+
return True
|
| 280 |
+
except HfHubHTTPError as exc:
|
| 281 |
+
if exc.response is not None and exc.response.status_code == 404:
|
| 282 |
+
write_status("fresh", f"Backup dataset {repo_id} does not exist yet.")
|
| 283 |
+
return True
|
| 284 |
+
write_status("error", f"Restore failed: {exc}")
|
| 285 |
+
print(f"Restore failed: {exc}", file=sys.stderr)
|
| 286 |
+
return False
|
| 287 |
+
except Exception as exc:
|
| 288 |
+
write_status("error", f"Restore failed: {exc}")
|
| 289 |
+
print(f"Restore failed: {exc}", file=sys.stderr)
|
| 290 |
return False
|
| 291 |
|
| 292 |
|
| 293 |
+
def sync_once(
|
| 294 |
+
last_fingerprint: str | None = None,
|
| 295 |
+
last_marker: tuple[int, int, int] | None = None,
|
| 296 |
+
) -> tuple[str, tuple[int, int, int]]:
|
| 297 |
+
if not HF_TOKEN:
|
| 298 |
+
write_status("disabled", "HF_TOKEN is not configured.")
|
| 299 |
+
return (last_fingerprint or "", last_marker or (0, 0, 0))
|
| 300 |
|
| 301 |
+
snapshot_state_into_workspace()
|
| 302 |
+
repo_id = ensure_repo_exists()
|
| 303 |
+
current_marker = metadata_marker(WORKSPACE)
|
|
|
|
| 304 |
|
| 305 |
+
if last_marker is not None and current_marker == last_marker:
|
| 306 |
+
write_status("synced", "No workspace changes detected.")
|
| 307 |
+
return (last_fingerprint or "", current_marker)
|
| 308 |
|
| 309 |
+
current_fingerprint = fingerprint_dir(WORKSPACE)
|
| 310 |
+
if last_fingerprint is not None and current_fingerprint == last_fingerprint:
|
| 311 |
+
write_status("synced", "No workspace changes detected.")
|
| 312 |
+
return (last_fingerprint, current_marker)
|
|
|
|
| 313 |
|
| 314 |
+
write_status("syncing", f"Uploading workspace to {repo_id}")
|
| 315 |
+
snapshot_dir = create_snapshot_dir(WORKSPACE)
|
| 316 |
+
try:
|
| 317 |
+
upload_folder(
|
| 318 |
+
folder_path=str(snapshot_dir),
|
| 319 |
+
repo_id=repo_id,
|
| 320 |
+
repo_type="dataset",
|
| 321 |
+
token=HF_TOKEN,
|
| 322 |
+
commit_message=f"HuggingClaw sync {time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())}",
|
| 323 |
+
ignore_patterns=[".git/*", ".git"],
|
| 324 |
+
)
|
| 325 |
+
finally:
|
| 326 |
+
shutil.rmtree(snapshot_dir, ignore_errors=True)
|
| 327 |
|
| 328 |
+
write_status("success", f"Uploaded workspace to {repo_id}")
|
| 329 |
+
return (current_fingerprint, current_marker)
|
| 330 |
|
|
|
|
|
|
|
|
|
|
| 331 |
|
| 332 |
+
def handle_signal(_sig, _frame) -> None:
|
| 333 |
+
STOP_EVENT.set()
|
| 334 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 335 |
|
| 336 |
+
def loop() -> int:
|
| 337 |
+
signal.signal(signal.SIGTERM, handle_signal)
|
| 338 |
+
signal.signal(signal.SIGINT, handle_signal)
|
| 339 |
|
| 340 |
+
try:
|
| 341 |
+
repo_id = resolve_backup_namespace()
|
| 342 |
+
write_status("configured", f"Backup loop active for {repo_id} with {INTERVAL}s interval.")
|
| 343 |
+
except Exception as exc:
|
| 344 |
+
write_status("error", str(exc))
|
| 345 |
+
print(f"📁 Workspace sync: {exc}")
|
| 346 |
+
return 1
|
| 347 |
|
| 348 |
+
last_fingerprint = fingerprint_dir(WORKSPACE)
|
| 349 |
+
last_marker = metadata_marker(WORKSPACE)
|
|
|
|
| 350 |
|
| 351 |
+
time.sleep(INITIAL_DELAY)
|
| 352 |
+
print(f"📁 Workspace sync started: every {INTERVAL}s → {repo_id}")
|
|
|
|
| 353 |
|
| 354 |
+
while not STOP_EVENT.is_set():
|
| 355 |
+
try:
|
| 356 |
+
last_fingerprint, last_marker = sync_once(last_fingerprint, last_marker)
|
| 357 |
+
except Exception as exc:
|
| 358 |
+
write_status("error", f"Sync failed: {exc}")
|
| 359 |
+
print(f"📁 Workspace sync failed: {exc}")
|
| 360 |
|
| 361 |
+
if STOP_EVENT.wait(INTERVAL):
|
| 362 |
+
break
|
|
|
|
| 363 |
|
| 364 |
+
return 0
|
|
|
|
|
|
|
|
|
|
|
|
|
| 365 |
|
|
|
|
|
|
|
| 366 |
|
| 367 |
+
def main() -> int:
|
| 368 |
+
WORKSPACE.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
|
|
| 369 |
|
| 370 |
+
if len(sys.argv) < 2:
|
| 371 |
+
return loop()
|
| 372 |
|
| 373 |
+
command = sys.argv[1]
|
| 374 |
+
if command == "restore":
|
| 375 |
+
return 0 if restore_workspace() else 1
|
| 376 |
+
if command == "sync-once":
|
| 377 |
+
try:
|
| 378 |
+
sync_once()
|
| 379 |
+
return 0
|
| 380 |
+
except Exception as exc:
|
| 381 |
+
write_status("error", f"Shutdown sync failed: {exc}")
|
| 382 |
+
print(f"📁 Workspace sync: shutdown sync failed: {exc}")
|
| 383 |
+
return 1
|
| 384 |
+
if command == "loop":
|
| 385 |
+
return loop()
|
| 386 |
|
| 387 |
+
print(f"Unknown command: {command}", file=sys.stderr)
|
| 388 |
+
return 1
|
| 389 |
|
| 390 |
|
| 391 |
if __name__ == "__main__":
|
| 392 |
+
raise SystemExit(main())
|
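The new `sync_once` in the diff above skips uploads in two stages: a cheap metadata marker (file count, total size, newest mtime) is compared first, and the SHA-256 content fingerprint is computed only when that marker has moved. The following is a simplified, self-contained sketch of that idea, not the committed code: `needs_upload` is a hypothetical helper introduced for illustration, and the `.git/` exclusion and `OSError` handling from the diff are omitted for brevity.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def metadata_marker(root: Path) -> tuple[int, int, int]:
    """Cheap check: (file count, total bytes, newest mtime in nanoseconds)."""
    count = size = newest = 0
    for path in root.rglob("*"):
        if path.is_file():
            stat = path.stat()
            count += 1
            size += stat.st_size
            newest = max(newest, stat.st_mtime_ns)
    return (count, size, newest)


def fingerprint_dir(root: Path) -> str:
    """Expensive check: SHA-256 over relative paths and file contents."""
    hasher = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        hasher.update(path.relative_to(root).as_posix().encode("utf-8"))
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def needs_upload(root: Path, last_marker, last_fingerprint):
    """Return (changed, marker, fingerprint); hash only if the marker moved."""
    marker = metadata_marker(root)
    if marker == last_marker:
        return (False, marker, last_fingerprint)
    fingerprint = fingerprint_dir(root)
    return (fingerprint != last_fingerprint, marker, fingerprint)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        note = root / "note.txt"
        note.write_text("hello", encoding="utf-8")
        marker, fp = metadata_marker(root), fingerprint_dir(root)

        # Nothing touched: the marker matches, so no hashing and no upload.
        print(needs_upload(root, marker, fp)[0])

        # Only the mtime moves: the marker differs, but the content hash does not.
        bumped = note.stat().st_mtime_ns + 1_000_000_000
        os.utime(note, ns=(bumped, bumped))
        changed, marker, fp = needs_upload(root, marker, fp)
        print(changed)

        # Content changes: both marker and fingerprint differ, so upload.
        note.write_text("hello world", encoding="utf-8")
        print(needs_upload(root, marker, fp)[0])
```

The marker alone can miss an edit that preserves size, count, and newest mtime, which is presumably an accepted trade-off for avoiding a full hash of the workspace on every interval.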