Commit 97b3442 by Lzy01241010 (verified) · 1 parent: 856a97c

Delete blog_demo

blog_demo/README.md DELETED
@@ -1,80 +0,0 @@
- # Blog chat widget — Quest-4B
-
- Tiny proxy + one-page chat UI you can lift into a blog to let readers
- talk to the Quest-4B HF Inference Endpoint.
-
- ## Why a proxy
-
- - **Token safety.** `HF_TOKEN` must never ship in client-side JS. Anyone
-   loading the page could grab it and run jobs on your HF account.
- - **CORS.** HF Inference Endpoints don't emit permissive CORS headers, so
-   a browser `fetch` straight to the endpoint is blocked even if the
-   token problem were solved.
-
- The proxy (`server.py`) holds the token server-side, validates incoming
- requests, and streams the model's reply back.
-
- ## Run locally
-
- ```bash
- cd blog_demo
- python3 -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
-
- export HF_TOKEN=hf_xxx
- export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
- # optional:
- export QUEST_ENDPOINT_MODEL=tgi # "osunlp/Quest-4B" if the container is vLLM
- export ALLOWED_ORIGINS=http://127.0.0.1:8000
-
- python server.py
- ```
-
- Open <http://127.0.0.1:8000/>, ask a question, watch the reply stream in.
-
- Health check: <http://127.0.0.1:8000/health> returns whether the token
- and base URL are wired up without leaking them.
-
- ## What gets sent upstream
-
- ```
- POST {QUEST_BASE_URL}/chat/completions
- Headers: Authorization: Bearer $HF_TOKEN
- Body: {
-   "model": "tgi",
-   "messages": [...],
-   "temperature": 0.4,
-   "max_tokens": 1024,
-   "stream": true
- }
- ```
-
- Any OpenAI-compatible endpoint (vLLM, TGI, SGLang, …) responds to this
- shape. The proxy pipes the upstream SSE frames straight to the browser;
- the page parses `choices[0].delta.content` to render the streaming answer.
-
- ## Deploying on the blog
-
- Pick whichever backend is closest to where the blog is hosted:
-
- | Host | How |
- |---|---|
- | Next.js / Vercel | paste the `POST /api/chat` handler logic into `app/api/chat/route.ts` (use Node's `fetch` + `ReadableStream`), set `HF_TOKEN` and `QUEST_BASE_URL` in Vercel env vars |
- | Cloudflare Workers | port the proxy to a Worker, put `HF_TOKEN` in Worker Secrets, bind your blog domain as `ALLOWED_ORIGINS` |
- | FastAPI behind nginx | run `server.py` under `systemd` or `supervisor`, proxy `/api/chat` from the blog hostname |
- | Hugging Face Space (Docker) | drop the whole folder in a Docker Space, set `HF_TOKEN` and `QUEST_BASE_URL` as Space Secrets |
-
- ### Lock it down before going public
-
- 1. **Origin allowlist** — set `ALLOWED_ORIGINS=https://your-blog.com`
-    so other sites can't call your proxy from a browser.
- 2. **Rate limit** — add an IP-based limit (e.g. `slowapi` for FastAPI,
-    Cloudflare Rate Limiting for Workers). A single abusive visitor can
-    drain your endpoint budget fast.
- 3. **Input caps** — the proxy already trims each message to 8000 chars
-    and caps history at 40 messages; tune these for your use case.
- 4. **Fine-grained token** — create a new HF token with access only to
-    the Quest endpoint so a leak can't touch anything else.
- 5. **Observability** — log request counts, latency, and 4xx/5xx rates
-    so you notice abuse early.
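
The rate-limit item in the README's lock-down list can be illustrated with a small in-memory sliding-window limiter. This is only a sketch under stated assumptions: `SlidingWindowLimiter` and its parameters are hypothetical names, not part of the deleted `server.py`, and it is not how `slowapi` or Cloudflare Rate Limiting work internally. State lives in one process, so it would not hold across multiple workers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-key limiter: at most `max_requests` per `window_seconds`."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a request for `key` (e.g. a client IP) and say whether it may proceed."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a FastAPI handler, one plausible use is to call `limiter.allow(request.client.host)` at the top of `/api/chat` and raise `HTTPException(429, ...)` when it returns `False`. Behind nginx the client address would have to come from a forwarded header that the reverse proxy sets.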
 
blog_demo/index.html DELETED
@@ -1,238 +0,0 @@
- <!doctype html>
- <html lang="en">
-   <head>
-     <meta charset="utf-8" />
-     <meta name="viewport" content="width=device-width,initial-scale=1" />
-     <title>Quest-4B chat demo</title>
-     <style>
-       :root {
-         --bg: #f2f4f8;
-         --paper: #ffffff;
-         --text: #0d1117;
-         --muted: #64748b;
-         --accent: #be5b2b;
-         --line: rgba(10, 15, 40, 0.1);
-       }
-       * { box-sizing: border-box; }
-       body {
-         margin: 0;
-         background: var(--bg);
-         color: var(--text);
-         font: 15px/1.55 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
-           sans-serif;
-       }
-       .shell {
-         max-width: 760px;
-         margin: 32px auto;
-         padding: 0 20px;
-       }
-       h1 {
-         font-size: 1.6rem;
-         margin: 0 0 4px;
-       }
-       .sub {
-         color: var(--muted);
-         margin: 0 0 20px;
-         font-size: 0.92rem;
-       }
-       .sub code { background: #fff; padding: 1px 6px; border-radius: 6px; border: 1px solid var(--line); }
-       .card {
-         background: var(--paper);
-         border: 1px solid var(--line);
-         border-radius: 14px;
-         box-shadow: 0 1px 2px rgba(10, 15, 40, 0.05),
-           0 2px 10px rgba(10, 15, 40, 0.06);
-         padding: 20px;
-       }
-       #log {
-         min-height: 220px;
-         max-height: 60vh;
-         overflow-y: auto;
-         padding: 4px 6px;
-         margin-bottom: 14px;
-       }
-       .msg { margin: 0 0 14px; }
-       .msg .who {
-         font-size: 0.75rem;
-         font-weight: 700;
-         letter-spacing: 0.06em;
-         text-transform: uppercase;
-         color: var(--muted);
-         margin-bottom: 4px;
-       }
-       .msg.assistant .who { color: var(--accent); }
-       .msg .body { white-space: pre-wrap; word-wrap: break-word; }
-       form { display: flex; gap: 10px; align-items: stretch; }
-       textarea {
-         flex: 1;
-         min-height: 46px;
-         max-height: 160px;
-         resize: vertical;
-         padding: 12px 14px;
-         border: 1px solid var(--line);
-         border-radius: 12px;
-         font: inherit;
-         outline: none;
-       }
-       textarea:focus { border-color: var(--accent); box-shadow: 0 0 0 3px rgba(190,91,43,0.15); }
-       button {
-         background: var(--text);
-         color: #fff;
-         border: 0;
-         border-radius: 999px;
-         padding: 0 22px;
-         font-weight: 600;
-         cursor: pointer;
-       }
-       button:disabled { opacity: 0.55; cursor: default; }
-       .status { color: var(--muted); font-size: 0.85rem; margin-top: 10px; }
-       .status.err { color: #dc2626; }
-     </style>
-   </head>
-   <body>
-     <div class="shell">
-       <h1>Quest-4B chat demo</h1>
-       <p class="sub">
-         Front-end calls <code>/api/chat</code> on this host; the Python proxy
-         adds <code>Authorization: Bearer $HF_TOKEN</code> and forwards the
-         request to <code>$QUEST_BASE_URL</code>.
-       </p>
-       <div class="card">
-         <div id="log"></div>
-         <form id="f">
-           <textarea
-             id="q"
-             placeholder="Ask Quest-4B something... (Enter to send, Shift+Enter for newline)"
-             autofocus
-           ></textarea>
-           <button id="send" type="submit">Send</button>
-         </form>
-         <div class="status" id="status"></div>
-       </div>
-     </div>
-     <script>
-       const log = document.getElementById("log");
-       const form = document.getElementById("f");
-       const input = document.getElementById("q");
-       const send = document.getElementById("send");
-       const status = document.getElementById("status");
-       const history = [];
-
-       function addMessage(role, text) {
-         const el = document.createElement("div");
-         el.className = "msg " + role;
-         const who = document.createElement("div");
-         who.className = "who";
-         who.textContent = role === "user" ? "You" : "Quest-4B";
-         const body = document.createElement("div");
-         body.className = "body";
-         body.textContent = text;
-         el.appendChild(who);
-         el.appendChild(body);
-         log.appendChild(el);
-         log.scrollTop = log.scrollHeight;
-         return body;
-       }
-
-       function setStatus(text, isError = false) {
-         status.textContent = text || "";
-         status.classList.toggle("err", Boolean(isError));
-       }
-
-       async function send_message(content) {
-         history.push({ role: "user", content });
-         addMessage("user", content);
-         const assistantBody = addMessage("assistant", "…");
-         setStatus("Waiting for the endpoint…");
-         send.disabled = true;
-
-         try {
-           const res = await fetch("/api/chat", {
-             method: "POST",
-             headers: { "Content-Type": "application/json" },
-             body: JSON.stringify({ messages: history, temperature: 0.4 }),
-           });
-           if (!res.ok || !res.body) {
-             const text = await res.text();
-             assistantBody.textContent = "";
-             throw new Error(text || res.statusText);
-           }
-
-           const reader = res.body.getReader();
-           const decoder = new TextDecoder();
-           let buffer = "";
-           let acc = "";
-           assistantBody.textContent = "";
-
-           while (true) {
-             const { value, done } = await reader.read();
-             if (done) break;
-             buffer += decoder.decode(value, { stream: true });
-             const lines = buffer.split("\n");
-             buffer = lines.pop() || "";
-             for (const raw of lines) {
-               const line = raw.trim();
-               if (!line.startsWith("data:")) continue;
-               const payload = line.slice(5).trim();
-               if (!payload || payload === "[DONE]") continue;
-               try {
-                 const obj = JSON.parse(payload);
-                 if (obj.error) {
-                   throw new Error(
-                     "endpoint " +
-                       (obj.error.status || "?") +
-                       ": " +
-                       (obj.error.body || "unknown")
-                   );
-                 }
-                 const delta = obj.choices?.[0]?.delta?.content || "";
-                 if (delta) {
-                   acc += delta;
-                   assistantBody.textContent = acc;
-                   log.scrollTop = log.scrollHeight;
-                 }
-               } catch (parseErr) {
-                 if (parseErr.message?.startsWith("endpoint ")) throw parseErr;
-               }
-             }
-           }
-           history.push({ role: "assistant", content: acc });
-           setStatus("");
-         } catch (err) {
-           assistantBody.textContent =
-             assistantBody.textContent || "[no response]";
-           setStatus(String(err.message || err), true);
-         } finally {
-           send.disabled = false;
-           input.focus();
-         }
-       }
-
-       form.addEventListener("submit", (e) => {
-         e.preventDefault();
-         const text = input.value.trim();
-         if (!text) return;
-         input.value = "";
-         send_message(text);
-       });
-       input.addEventListener("keydown", (e) => {
-         if (e.key === "Enter" && !e.shiftKey) {
-           e.preventDefault();
-           form.requestSubmit();
-         }
-       });
-
-       fetch("/health")
-         .then((r) => r.json())
-         .then((j) => {
-           if (!j.has_token || !j.has_base_url) {
-             setStatus(
-               "Server is running but HF_TOKEN / QUEST_BASE_URL are not set — chat will 500 until you export them.",
-               true
-             );
-           }
-         })
-         .catch(() => setStatus("Cannot reach /health", true));
-     </script>
-   </body>
- </html>
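
The streaming loop in the page's script buffers bytes, splits on newlines, keeps the trailing partial line, and reads `choices[0].delta.content` from each `data:` frame. A rough Python port of that same logic may be handy for exercising the proxy from a script instead of a browser. The function name `iter_deltas` is hypothetical, and the error shape (`{"error": {"status", "body"}}`) is assumed to be the one the proxy emits.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_deltas(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield content deltas from an SSE byte stream split at arbitrary points."""
    # Incremental decoder so a multi-byte character split across chunks survives.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split("\n")
        for raw in lines:
            line = raw.strip()
            if not line.startswith("data:"):
                continue
            payload = line[5:].strip()
            if not payload or payload == "[DONE]":
                continue
            try:
                obj = json.loads(payload)
            except json.JSONDecodeError:
                continue  # like the page, silently skip malformed frames
            if not isinstance(obj, dict):
                continue
            err = obj.get("error")
            if isinstance(err, dict):
                status = err.get("status") or "?"
                body = err.get("body") or "unknown"
                raise RuntimeError(f"endpoint {status}: {body}")
            choices = obj.get("choices") or [{}]
            first = choices[0] if isinstance(choices[0], dict) else {}
            delta = (first.get("delta") or {}).get("content") or ""
            if delta:
                yield delta
```

Joining the yielded strings gives the full reply, which mirrors how the page accumulates `acc`.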
 
blog_demo/requirements.txt DELETED
@@ -1,3 +0,0 @@
- fastapi>=0.110
- uvicorn[standard]>=0.27
- httpx>=0.27
 
blog_demo/server.py DELETED
@@ -1,169 +0,0 @@
- """
- Minimal proxy server so a static blog page can safely chat with the
- Quest-4B HF Inference Endpoint.
-
- Why a proxy at all?
-
- 1. HF_TOKEN must not be put into client-side JS -- the moment you
-    ship it to visitors, the token is stolen and anyone can rack up a bill
-    on your HF account.
- 2. HF Inference Endpoints do not emit permissive CORS headers, so even
-    without the token concern, a browser `fetch` straight to the endpoint
-    would be blocked.
-
- This tiny FastAPI app holds the token server-side, forwards chat turns
- to QUEST_BASE_URL, and streams the response back to the browser.
-
- Run locally:
-
-     cd blog_demo
-     pip install fastapi uvicorn httpx
-     export HF_TOKEN=hf_xxx
-     export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
-     python server.py
-
- Then open http://127.0.0.1:8000/ in a browser.
- """
- from __future__ import annotations
-
- import json
- import os
- from typing import Any, Dict, List
-
- import httpx
- from fastapi import FastAPI, HTTPException, Request
- from fastapi.middleware.cors import CORSMiddleware
- from fastapi.responses import FileResponse, StreamingResponse
-
- HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
- QUEST_BASE_URL = os.environ.get("QUEST_BASE_URL", "").strip().rstrip("/")
- QUEST_MODEL = os.environ.get("QUEST_ENDPOINT_MODEL", "tgi").strip() or "tgi"
-
- ALLOWED_ORIGINS = [
-     o.strip()
-     for o in os.environ.get(
-         "ALLOWED_ORIGINS",
-         "http://127.0.0.1:8000,http://localhost:8000",
-     ).split(",")
-     if o.strip()
- ]
-
- REQUEST_TIMEOUT = float(os.environ.get("REQUEST_TIMEOUT", "600"))
-
- app = FastAPI(title="Quest-4B blog proxy")
-
- app.add_middleware(
-     CORSMiddleware,
-     allow_origins=ALLOWED_ORIGINS,
-     allow_methods=["GET", "POST", "OPTIONS"],
-     allow_headers=["Content-Type"],
-     allow_credentials=False,
- )
-
- STATIC_DIR = os.path.dirname(os.path.abspath(__file__))
-
-
- @app.get("/")
- def index() -> FileResponse:
-     return FileResponse(os.path.join(STATIC_DIR, "index.html"))
-
-
- @app.get("/health")
- def health() -> Dict[str, Any]:
-     return {
-         "ok": True,
-         "has_token": bool(HF_TOKEN),
-         "has_base_url": bool(QUEST_BASE_URL),
-         "model": QUEST_MODEL,
-     }
-
-
- def _validate_config() -> None:
-     if not HF_TOKEN:
-         raise HTTPException(500, "HF_TOKEN is not set on the server")
-     if not QUEST_BASE_URL:
-         raise HTTPException(500, "QUEST_BASE_URL is not set on the server")
-
-
- def _sanitise_messages(raw: Any) -> List[Dict[str, str]]:
-     if not isinstance(raw, list) or not raw:
-         raise HTTPException(400, "`messages` must be a non-empty array")
-     cleaned: List[Dict[str, str]] = []
-     for m in raw:
-         if not isinstance(m, dict):
-             raise HTTPException(400, "each message must be an object")
-         role = str(m.get("role", "")).strip()
-         content = m.get("content", "")
-         if role not in {"system", "user", "assistant"}:
-             raise HTTPException(400, f"invalid role: {role!r}")
-         if not isinstance(content, str):
-             raise HTTPException(400, "message.content must be a string")
-         cleaned.append({"role": role, "content": content[:8000]})
-     if len(cleaned) > 40:
-         cleaned = cleaned[-40:]
-     return cleaned
-
-
- @app.post("/api/chat")
- async def chat(request: Request) -> StreamingResponse:
-     _validate_config()
-     try:
-         body = await request.json()
-     except Exception as exc:
-         raise HTTPException(400, f"invalid json: {exc}") from exc
-
-     messages = _sanitise_messages(body.get("messages"))
-     temperature = float(body.get("temperature", 0.4))
-     max_tokens = int(body.get("max_tokens", 1024))
-     max_tokens = max(32, min(max_tokens, 4096))
-
-     payload = {
-         "model": QUEST_MODEL,
-         "messages": messages,
-         "temperature": max(0.0, min(temperature, 1.5)),
-         "max_tokens": max_tokens,
-         "stream": True,
-     }
-
-     upstream_url = f"{QUEST_BASE_URL}/chat/completions"
-     headers = {
-         "Authorization": f"Bearer {HF_TOKEN}",
-         "Content-Type": "application/json",
-         "Accept": "text/event-stream",
-     }
-
-     async def relay() -> Any:
-         timeout = httpx.Timeout(REQUEST_TIMEOUT, connect=15.0)
-         async with httpx.AsyncClient(timeout=timeout) as client:
-             try:
-                 async with client.stream(
-                     "POST", upstream_url, json=payload, headers=headers
-                 ) as upstream:
-                     if upstream.status_code >= 400:
-                         text = (await upstream.aread()).decode("utf-8", errors="replace")
-                         err = json.dumps(
-                             {"error": {"status": upstream.status_code, "body": text[:800]}}
-                         )
-                         yield f"data: {err}\n\n".encode()
-                         yield b"data: [DONE]\n\n"
-                         return
-                     async for chunk in upstream.aiter_raw():
-                         if chunk:
-                             yield chunk
-             except httpx.HTTPError as exc:
-                 err = json.dumps({"error": {"status": 502, "body": str(exc)}})
-                 yield f"data: {err}\n\n".encode()
-                 yield b"data: [DONE]\n\n"
-
-     return StreamingResponse(relay(), media_type="text/event-stream")
-
-
- if __name__ == "__main__":
-     import uvicorn
-
-     uvicorn.run(
-         "server:app",
-         host=os.environ.get("HOST", "127.0.0.1"),
-         port=int(os.environ.get("PORT", "8000")),
-         reload=False,
-     )
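
In `chat`, `body.get(...)`, `float(...)`, and `int(...)` run on client-supplied JSON without a guard, so a non-object body or a non-numeric value raises an uncaught exception (which FastAPI would typically report as a 500) rather than a 400. Below is a minimal sketch of a stricter parser using the same defaults and clamps as the handler. `parse_sampling_params` and `BadRequest` are hypothetical names, not part of the deleted file.

```python
from typing import Any, Tuple

# Bounds and defaults copied from the handler in server.py.
TEMPERATURE_RANGE = (0.0, 1.5)
MAX_TOKENS_RANGE = (32, 4096)


class BadRequest(ValueError):
    """Hypothetical error type; a handler could map it to HTTPException(400, str(exc))."""


def parse_sampling_params(body: Any) -> Tuple[float, int]:
    """Return (temperature, max_tokens), clamped, or raise BadRequest."""
    if not isinstance(body, dict):
        raise BadRequest("request body must be a JSON object")
    try:
        temperature = float(body.get("temperature", 0.4))
        max_tokens = int(body.get("max_tokens", 1024))
    except (TypeError, ValueError, OverflowError) as exc:
        # Covers strings like "hot", nulls, nested objects, and infinite values.
        raise BadRequest(f"temperature/max_tokens must be numeric: {exc}") from exc
    if temperature != temperature:  # NaN never equals itself
        raise BadRequest("temperature must not be NaN")
    low_t, high_t = TEMPERATURE_RANGE
    low_m, high_m = MAX_TOKENS_RANGE
    return max(low_t, min(temperature, high_t)), max(low_m, min(max_tokens, high_m))
```

A handler could call this right after `request.json()` and convert `BadRequest` into `HTTPException(400, ...)`, keeping the clamping behaviour unchanged for well-formed input.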