davanstrien (HF Staff) committed
Commit e1eb5a6 · verified · Parent: e22d564

probe: docker sdk bucket mount test (UID 1000 per HF docs)

Files changed (3)
  1. Dockerfile +31 -0
  2. README.md +75 -5
  3. app.py +191 -0
Dockerfile ADDED
@@ -0,0 +1,31 @@
+ # Follows https://huggingface.co/docs/hub/spaces-sdks-docker#permissions as closely as possible.
+ # The only changes from the official example are the /data step below and the final CMD.
+ FROM python:3.12-slim
+
+ # Set up a new user named "user" with user ID 1000
+ RUN useradd -m -u 1000 user
+
+ # Switch to the "user" user
+ USER user
+
+ # Set home to the user's home directory
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ # Set the working directory to the user's home directory
+ WORKDIR $HOME/app
+
+ # Copy the current directory contents into the container at $HOME/app
+ # setting the owner to the user
+ COPY --chown=user . $HOME/app
+
+ # Try to make /data writable by the user (chown + chmod 777). This is exactly
+ # what the docs suggest for writable runtime directories. At runtime the bucket
+ # mount is attached on top of /data and hides these permissions; that is the
+ # bug this Space demonstrates.
+ USER root
+ RUN mkdir -p /data && chown -R user:user /data && chmod -R 777 /data
+ USER user
+
+ EXPOSE 7860
+ CMD ["python", "-u", "app.py"]
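The Dockerfile's `/data` step assumes that build-time ownership and mode survive to runtime. The following is a minimal stdlib sketch, assuming a Linux container, of a startup check that reports what the *running* process actually sees at a path. `describe_dir` is a hypothetical helper and is not part of this Space.

```python
import os
import stat
import sys


def describe_dir(path: str) -> dict:
    """Report what the running process sees at `path` (hypothetical helper).

    Build-time chown/chmod results are only visible if nothing has been
    mounted over `path`. When `is_mount` is True, the owner and mode below
    belong to the mounted filesystem's root, not to the directory baked
    into the image.
    """
    st = os.stat(path)
    return {
        "path": path,
        "is_mount": os.path.ismount(path),
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "process_uid": os.getuid(),
        # os.access checks with the real UID/GID; no setuid is involved here.
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Pass the directory to inspect as the first argument (defaults to ".").
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in describe_dir(target).items():
        print(f"{key}: {value}")
```

Run it with a path argument such as `/data`. If `is_mount` is True, the reported owner and mode are the mount's, which is why the build-time `chmod 777` no longer shows up.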
README.md CHANGED
@@ -1,10 +1,80 @@
  ---
- title: Bucket Sqlite Probe Docker
- emoji: 🌍
- colorFrom: indigo
- colorTo: yellow
+ title: bucket-sqlite-probe-docker
+ emoji: 🐳
+ colorFrom: red
+ colorTo: gray
  sdk: docker
+ app_port: 7860
  pinned: false
+ tags:
+ - bucket
+ - sqlite
+ - probe
+ - reference
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Bucket mount × SQLite probe: Docker SDK Space (UID 1000)
+
+ **This is the failing half of a matched pair.** The Gradio SDK half is at
+ [`davanstrien/bucket-sqlite-probe-gradio`](https://huggingface.co/spaces/davanstrien/bucket-sqlite-probe-gradio) and all of its probes pass. This Space runs the *same probe code* against the *same bucket*, but inside a Docker SDK container that follows the official [`spaces-sdks-docker#permissions`](https://huggingface.co/docs/hub/spaces-sdks-docker#permissions) guidance: it creates a `user` account with `uid=1000` and switches to it with `USER user` in the Dockerfile. Its write probes fail with `Permission denied` / `unable to open database file`.
+
+ The Dockerfile is intentionally as close to the official permissions example as possible. The only deviations are the `CMD` (which runs `app.py`) and an explicit `chown`/`chmod 777 /data` step, included to prove that build-time permissions are overridden by the runtime mount.
+
+ ## What it demonstrates
+
+ With bucket `davanstrien/search-v2-chroma` attached R/W at `/data`:
+
+ - The container runs as `uid=1000(user)`, exactly as `spaces-sdks-docker#permissions` recommends.
+ - `/data` is mounted by `hf-mount` with `idmapped,user_id=0,group_id=0,default_permissions`. That id-mapping pins the mount's writable UID to **0 (root)**, not the container's UID 1000.
+ - `ls -lan /data` shows `drwxr-xr-x 3 65534 65534` (nobody:nogroup, mode 755), so UID 1000 falls into the "other" class, which has no write bit. The `chmod 777` in the Dockerfile is silently overridden: the runtime mount is attached on top of the build-time directory and hides it entirely.
+ - Write probes all fail:
+ ```
+ FAIL touch /data/docker_probe_touch: PermissionError: [Errno 13] Permission denied: '/data/docker_probe_touch'
+ FAIL sqlite3 connect + CREATE + INSERT: OperationalError: unable to open database file
+ FAIL sqlite3 journal_mode=DELETE: OperationalError: unable to open database file
+ FAIL fcntl.flock LOCK_EX|LOCK_NB: PermissionError: [Errno 13] Permission denied: '/data/docker_probe.lock'
+ ```
+ - The control probe (`touch $HOME/app/control_probe`, on the container's build-time writable directory) succeeds, so the container is healthy and UID 1000 can write *somewhere*, just not to the bucket mount.
+
+ ## Why this matters
+
+ 1. The Docker Spaces docs explicitly tell you to run as UID 1000. There's no note that this is incompatible with Storage Bucket mounts.
+ 2. The Storage Buckets blog post ([buckets as working layer](https://huggingface.co/blog/davanstrien/buckets-as-working-layer)) implies buckets are a drop-in R/W volume for Spaces.
+ 3. These two bits of guidance are silently incompatible today because the FUSE mount is provisioned with `user_id=0,group_id=0`. A root container sees the mount as writable; a UID 1000 container does not.
+ 4. Any tool that keeps its database on disk (ChromaDB's SQLite store, persistent DuckDB, LMDB, RocksDB) in a Docker SDK Space that follows the official permissions guidance will fail to open that database on a bucket mount. `huggingface-datasets-search-v2` hit this; [trackio](https://github.com/gradio-app/trackio) doesn't because Gradio SDK Spaces run as root.
+
+ ## The fix (almost certainly)
+
+ The mount provisioning layer should do one of the following:
+
+ 1. Mount with `user_id=1000,group_id=1000` for Docker SDK Spaces (the conventional UID).
+ 2. Mount with the Space's runtime UID, detected dynamically (cleanest).
+ 3. Chmod the mount root to `0777` at provisioning time (hacky, but works for any UID).
+
+ All three are infra-side changes. The container user can't fix this: any `chown` / `chmod` in the Dockerfile is overridden when the mount is attached at runtime.
+
+ ## Reproducing
+
+ 1. Fork or duplicate this Space.
+ 2. Create or choose a Storage Bucket you own.
+ 3. Attach it R/W at `/data` via **Space settings → Volumes** (UI) or via:
+ ```python
+ from huggingface_hub import HfApi, Volume  # requires huggingface_hub >= 1.9.1
+ HfApi().set_space_volumes(
+     "your-namespace/bucket-sqlite-probe-docker",
+     volumes=[Volume(type="bucket", source="your-namespace/your-bucket", mount_path="/data")],
+ )
+ ```
+ 4. Restart the Space. Probe output appears in the startup logs and at the Space URL.
+
+ ## Related
+
+ - Matched Gradio SDK probe Space (passing): https://huggingface.co/spaces/davanstrien/bucket-sqlite-probe-gradio
+ - `gradio-app/trackio#465`: trackio's PR switching to the bucket backend (works because Gradio SDK ⇒ root)
+ - `huggingface/huggingface_hub#4054`: `set_space_volumes` payload fix (required for this Space to attach the bucket)
+ - Docker Spaces permissions docs: https://huggingface.co/docs/hub/spaces-sdks-docker#permissions
+ - Blog post: [Buckets as working layer](https://huggingface.co/blog/davanstrien/buckets-as-working-layer)
+
+ ---
+
+ Throwaway investigative Space. Kept public as a reference example. Do not rely on it for production.
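The README explains the failure in terms of the mount root showing up as `65534:65534` with mode 755. The following is a simplified model of the classic POSIX owner/group/other check that `default_permissions` asks the kernel to enforce. It treats UID 0 as bypassing the mode bits, which approximates `CAP_DAC_OVERRIDE`, and it deliberately ignores ACLs, other capabilities, id-mapping, and read-only mounts. `can_write` is a hypothetical helper for illustration only. Creating a file also needs the search (x) bit on the directory, but mode 755 grants that to every class, so the w bit is what decides the outcome here.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set[int]) -> bool:
    """Simplified POSIX write check (hypothetical helper, not kernel code).

    Exactly one permission class applies: owner if the UIDs match, else group
    if the owning GID is among the process's groups, else "other".
    """
    if uid == 0:
        # Approximation of root's CAP_DAC_OVERRIDE.
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# /data as reported by `ls -lan /data`: drwxr-xr-x 65534 65534
print(can_write(0o755, 65534, 65534, uid=1000, gids={1000}))  # False: "other" class has no w bit
print(can_write(0o755, 65534, 65534, uid=0, gids={0}))        # True: root bypasses mode bits
print(can_write(0o777, 65534, 65534, uid=1000, gids={1000}))  # True: what the build-time chmod intended
```

The model matches the README's observation that a root container can write to the mount while a UID 1000 container cannot.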
app.py ADDED
@@ -0,0 +1,191 @@
+ """Minimal bucket mount probe for a Docker SDK Space following
+ https://huggingface.co/docs/hub/spaces-sdks-docker#permissions (UID 1000).
+
+ Runs the probe once at import time so results appear in startup logs,
+ then serves the output via a tiny http.server on port 7860 so the Space
+ has something to show. No Gradio dependency: this keeps the container minimal
+ and the probe behavior identical to the Gradio SDK probe Space for an
+ apples-to-apples comparison.
+ """
+
+ from __future__ import annotations
+
+ import http.server
+ import os
+ import pwd
+ import socketserver
+ import sqlite3
+ import subprocess
+ import traceback
+
+ try:
+     import fcntl
+ except ImportError:
+     fcntl = None
+
+
+ def _run_probe() -> str:
+     lines = []
+     lines.append("=" * 60)
+     lines.append("BUCKET MOUNT PROBE — Docker SDK Space (UID 1000)")
+     lines.append("=" * 60)
+
+     try:
+         uid = os.getuid()
+         user = pwd.getpwuid(uid).pw_name
+         lines.append(f"uid={uid} user={user} euid={os.geteuid()} gid={os.getgid()}")
+     except Exception as e:
+         lines.append(f"uid/user lookup failed: {e}")
+
+     lines.append(f"SYSTEM={os.environ.get('SYSTEM')!r}")
+     lines.append(f"SPACE_ID={os.environ.get('SPACE_ID')!r}")
+     lines.append(f"HOME={os.environ.get('HOME')!r}")
+
+     lines.append("")
+     lines.append("--- ls -lan /data ---")
+     try:
+         out = subprocess.run(
+             ["ls", "-lan", "/data"], capture_output=True, text=True, timeout=10
+         )
+         lines.append(out.stdout or "(empty)")
+         if out.stderr:
+             lines.append(f"stderr: {out.stderr}")
+     except Exception as e:
+         lines.append(f"ls failed: {e}")
+
+     lines.append("--- stat /data ---")
+     try:
+         out = subprocess.run(
+             ["stat", "/data"], capture_output=True, text=True, timeout=10
+         )
+         lines.append(out.stdout or "(empty)")
+     except Exception as e:
+         lines.append(f"stat failed: {e}")
+
+     lines.append("--- mount | grep /data ---")
+     try:
+         out = subprocess.run(
+             "mount | grep /data || true",
+             shell=True,
+             capture_output=True,
+             text=True,
+             timeout=10,
+         )
+         lines.append(out.stdout or "(no match)")
+     except Exception as e:
+         lines.append(f"mount failed: {e}")
+
+     lines.append("")
+     lines.append("--- write probes ---")
+
+     def probe(label: str, fn):
+         try:
+             fn()
+             lines.append(f"OK {label}")
+         except Exception as e:
+             lines.append(f"FAIL {label}: {type(e).__name__}: {e}")
+
+     probe(
+         "touch /data/docker_probe_touch",
+         lambda: open("/data/docker_probe_touch", "w").close(),
+     )
+
+     def _sqlite_create():
+         conn = sqlite3.connect("/data/docker_probe.db", timeout=5.0)
+         conn.execute("CREATE TABLE IF NOT EXISTS t(x INTEGER)")
+         conn.execute("INSERT INTO t VALUES (1)")
+         conn.commit()
+         conn.close()
+
+     probe("sqlite3 connect + CREATE + INSERT", _sqlite_create)
+
+     def _sqlite_delete_journal():
+         conn = sqlite3.connect("/data/docker_probe_delete.db", timeout=5.0)
+         conn.execute("PRAGMA journal_mode = DELETE")
+         conn.execute("CREATE TABLE IF NOT EXISTS t(x INTEGER)")
+         conn.commit()
+         conn.close()
+
+     probe("sqlite3 journal_mode=DELETE", _sqlite_delete_journal)
+
+     if fcntl is not None:
+
+         def _flock():
+             # The with-block closes the file even if flock() raises.
+             with open("/data/docker_probe.lock", "w") as f:
+                 fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
+                 fcntl.flock(f.fileno(), fcntl.LOCK_UN)
+
+         probe("fcntl.flock LOCK_EX|LOCK_NB", _flock)
+     else:
+         lines.append("SKIP fcntl.flock: fcntl not importable")
+
+     # Control: can we write to $HOME/app (the chown'd build-time dir)?
+     # This should always work; it's the "normal" writable path for UID 1000.
+     lines.append("")
+     lines.append("--- control: write to $HOME/app ---")
+     probe(
+         "touch $HOME/app/control_probe",
+         lambda: open(os.path.expanduser("~/app/control_probe"), "w").close(),
+     )
+
+     lines.append("=" * 60)
+     return "\n".join(lines)
+
+
+ try:
+     STARTUP_PROBE = _run_probe()
+ except Exception:
+     STARTUP_PROBE = "startup probe crashed:\n" + traceback.format_exc()
+
+ print(STARTUP_PROBE, flush=True)
+
+ _HTML = """<!doctype html>
+ <html><head><meta charset="utf-8"><title>bucket-sqlite-probe-docker</title>
+ <style>
+ body {{ font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
+ max-width: 900px; margin: 2em auto; padding: 0 1em;
+ background: #0b0f17; color: #dfe6f0; }}
+ h1 {{ color: #ff8f6e; }}
+ pre {{ background: #121826; padding: 1em; border-radius: 6px;
+ overflow-x: auto; font-size: 13px; line-height: 1.45; }}
+ a {{ color: #7cc4ff; }}
+ .note {{ color: #9aa7ba; font-size: 14px; }}
+ </style>
+ </head><body>
+ <h1>bucket-sqlite-probe-docker</h1>
+ <p class="note">
+ Matched pair with
+ <a href="https://huggingface.co/spaces/davanstrien/bucket-sqlite-probe-gradio">
+ bucket-sqlite-probe-gradio</a>.
+ This half runs as <code>UID 1000</code> per
+ <a href="https://huggingface.co/docs/hub/spaces-sdks-docker#permissions">
+ spaces-sdks-docker#permissions</a>. Output below captured at container startup.
+ </p>
+ <pre>{body}</pre>
+ </body></html>
+ """
+
+
+ class _Handler(http.server.BaseHTTPRequestHandler):
+     def do_GET(self):  # noqa: N802
+         import html
+
+         body = _HTML.format(body=html.escape(STARTUP_PROBE))
+         data = body.encode("utf-8")
+         self.send_response(200)
+         self.send_header("Content-Type", "text/html; charset=utf-8")
+         self.send_header("Content-Length", str(len(data)))
+         self.end_headers()
+         self.wfile.write(data)
+
+     def log_message(self, fmt, *args):  # noqa: N802
+         # Quiet the access log; keep stderr clean for probe output visibility.
+         return
+
+
+ if __name__ == "__main__":
+     port = int(os.environ.get("PORT", "7860"))
+     print(f"Serving probe output on 0.0.0.0:{port}", flush=True)
+     with socketserver.TCPServer(("0.0.0.0", port), _Handler) as srv:
+         srv.serve_forever()
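The probe shows that SQLite reports a non-writable directory only as `unable to open database file`. A Docker SDK Space could instead fail fast at startup with a message that names the UID, owner, and mode. The following is a minimal sketch under that assumption. `DirNotWritableError`, `ensure_writable_dir`, and `open_db` are hypothetical names, not part of this Space or any library. The check creates and deletes a throwaway file with `tempfile.NamedTemporaryFile(dir=...)`, which exercises the same directory write permission SQLite needs, because SQLite creates its journal and WAL files next to the database file.

```python
import os
import sqlite3
import tempfile


class DirNotWritableError(RuntimeError):
    """Raised when a data directory cannot host new files (hypothetical)."""


def ensure_writable_dir(path: str) -> None:
    """Fail fast, with a readable message, if `path` cannot hold new files."""
    try:
        # Creates a uniquely named file in `path` and deletes it on exit.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        try:
            st = os.stat(path)
            detail = f"owner {st.st_uid}:{st.st_gid}, mode {oct(st.st_mode & 0o777)}"
        except OSError:
            detail = "path cannot be stat'ed"
        raise DirNotWritableError(
            f"{path} is not writable by uid={os.getuid()} ({detail}): {exc}"
        ) from exc


def open_db(directory: str, name: str = "app.db") -> sqlite3.Connection:
    """Open a SQLite database only after confirming its directory is writable."""
    ensure_writable_dir(directory)
    return sqlite3.connect(os.path.join(directory, name))
```

Calling `open_db("/data")` early in an app's startup would surface the UID/mode mismatch from the README directly, instead of leaving a downstream library to report a generic SQLite error.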