Delete blog_demo
Browse files- blog_demo/README.md +0 -80
- blog_demo/index.html +0 -238
- blog_demo/requirements.txt +0 -3
- blog_demo/server.py +0 -169
blog_demo/README.md
DELETED
|
@@ -1,80 +0,0 @@
|
|
| 1 |
-
# Blog chat widget — Quest-4B
|
| 2 |
-
|
| 3 |
-
Tiny proxy + one-page chat UI you can lift into a blog to let readers
|
| 4 |
-
talk to the Quest-4B HF Inference Endpoint.
|
| 5 |
-
|
| 6 |
-
## Why a proxy
|
| 7 |
-
|
| 8 |
-
- **Token safety.** `HF_TOKEN` can never ship in client-side JS. Anyone
|
| 9 |
-
loading the page would grab it and run jobs on your HF account.
|
| 10 |
-
- **CORS.** HF Inference Endpoints don't emit permissive CORS headers, so
|
| 11 |
-
a browser `fetch` straight to the endpoint is blocked even if the
|
| 12 |
-
token problem were solved.
|
| 13 |
-
|
| 14 |
-
The proxy (server.py) holds the token server-side, validates incoming
|
| 15 |
-
requests, and streams the model's reply back.
|
| 16 |
-
|
| 17 |
-
## Run locally
|
| 18 |
-
|
| 19 |
-
```bash
|
| 20 |
-
cd blog_demo
|
| 21 |
-
python3 -m venv .venv
|
| 22 |
-
source .venv/bin/activate
|
| 23 |
-
pip install -r requirements.txt
|
| 24 |
-
|
| 25 |
-
export HF_TOKEN=hf_xxx
|
| 26 |
-
export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
|
| 27 |
-
# optional:
|
| 28 |
-
export QUEST_ENDPOINT_MODEL=tgi # "osunlp/Quest-4B" if the container is vLLM
|
| 29 |
-
export ALLOWED_ORIGINS=http://127.0.0.1:8000
|
| 30 |
-
|
| 31 |
-
python server.py
|
| 32 |
-
```
|
| 33 |
-
|
| 34 |
-
Open <http://127.0.0.1:8000/>, ask a question, watch the reply stream in.
|
| 35 |
-
|
| 36 |
-
Health check: <http://127.0.0.1:8000/health> returns whether the token
|
| 37 |
-
and base URL are wired up without leaking them.
|
| 38 |
-
|
| 39 |
-
## What gets sent upstream
|
| 40 |
-
|
| 41 |
-
```
|
| 42 |
-
POST {QUEST_BASE_URL}/chat/completions
|
| 43 |
-
Headers: Authorization: Bearer $HF_TOKEN
|
| 44 |
-
Body: {
|
| 45 |
-
"model": "tgi",
|
| 46 |
-
"messages": [...],
|
| 47 |
-
"temperature": 0.4,
|
| 48 |
-
"max_tokens": 1024,
|
| 49 |
-
"stream": true
|
| 50 |
-
}
|
| 51 |
-
```
|
| 52 |
-
|
| 53 |
-
Any OpenAI-compatible endpoint (vLLM, TGI, SGLang, …) responds to this
|
| 54 |
-
shape. The proxy pipes the upstream SSE frames straight to the browser;
|
| 55 |
-
the page parses `choices[].delta.content` to render the streaming answer.
|
| 56 |
-
|
| 57 |
-
## Deploying on the blog
|
| 58 |
-
|
| 59 |
-
Pick whichever backend is closest to where the blog is hosted:
|
| 60 |
-
|
| 61 |
-
| Host | How |
|
| 62 |
-
|---|---|
|
| 63 |
-
| Next.js / Vercel | paste the `POST /api/chat` handler logic into `app/api/chat/route.ts` (use Node's `fetch` + `ReadableStream`), set `HF_TOKEN` and `QUEST_BASE_URL` in Vercel env vars |
|
| 64 |
-
| Cloudflare Workers | port the proxy to a Worker, put `HF_API_TOKEN` in Worker Secrets, bind your blog domain as `ALLOWED_ORIGINS` |
|
| 65 |
-
| FastAPI behind nginx | run `server.py` under `systemd` or `supervisor`, proxy `/api/chat` from the blog hostname |
|
| 66 |
-
| Hugging Face Space (Docker) | drop the whole folder in a Docker Space, set `HF_TOKEN` and `QUEST_BASE_URL` as Space Secrets |
|
| 67 |
-
|
| 68 |
-
### Lock it down before going public
|
| 69 |
-
|
| 70 |
-
1. **Origin allowlist** — set `ALLOWED_ORIGINS=https://your-blog.com`
|
| 71 |
-
so other sites can't call your proxy from a browser.
|
| 72 |
-
2. **Rate limit** — add an IP-based limit (e.g. `slowapi` for FastAPI,
|
| 73 |
-
Cloudflare Rate Limiting for Workers). A single abusive visitor can
|
| 74 |
-
drain your endpoint budget fast.
|
| 75 |
-
3. **Input caps** — the proxy already trims each message to 8000 chars
|
| 76 |
-
and caps history at 40 turns; tune these for your use case.
|
| 77 |
-
4. **Fine-grained token** — create a new HF token with access only to
|
| 78 |
-
the Quest endpoint so a leak can't touch anything else.
|
| 79 |
-
5. **Observability** — log request counts, latency, and 4xx/5xx rates
|
| 80 |
-
so you notice abuse early.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
blog_demo/index.html
DELETED
|
@@ -1,238 +0,0 @@
|
|
| 1 |
-
<!doctype html>
|
| 2 |
-
<html lang="en">
|
| 3 |
-
<head>
|
| 4 |
-
<meta charset="utf-8" />
|
| 5 |
-
<meta name="viewport" content="width=device-width,initial-scale=1" />
|
| 6 |
-
<title>Quest-4B chat demo</title>
|
| 7 |
-
<style>
|
| 8 |
-
:root {
|
| 9 |
-
--bg: #f2f4f8;
|
| 10 |
-
--paper: #ffffff;
|
| 11 |
-
--text: #0d1117;
|
| 12 |
-
--muted: #64748b;
|
| 13 |
-
--accent: #be5b2b;
|
| 14 |
-
--line: rgba(10, 15, 40, 0.1);
|
| 15 |
-
}
|
| 16 |
-
* { box-sizing: border-box; }
|
| 17 |
-
body {
|
| 18 |
-
margin: 0;
|
| 19 |
-
background: var(--bg);
|
| 20 |
-
color: var(--text);
|
| 21 |
-
font: 15px/1.55 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
|
| 22 |
-
sans-serif;
|
| 23 |
-
}
|
| 24 |
-
.shell {
|
| 25 |
-
max-width: 760px;
|
| 26 |
-
margin: 32px auto;
|
| 27 |
-
padding: 0 20px;
|
| 28 |
-
}
|
| 29 |
-
h1 {
|
| 30 |
-
font-size: 1.6rem;
|
| 31 |
-
margin: 0 0 4px;
|
| 32 |
-
}
|
| 33 |
-
.sub {
|
| 34 |
-
color: var(--muted);
|
| 35 |
-
margin: 0 0 20px;
|
| 36 |
-
font-size: 0.92rem;
|
| 37 |
-
}
|
| 38 |
-
.sub code { background: #fff; padding: 1px 6px; border-radius: 6px; border: 1px solid var(--line); }
|
| 39 |
-
.card {
|
| 40 |
-
background: var(--paper);
|
| 41 |
-
border: 1px solid var(--line);
|
| 42 |
-
border-radius: 14px;
|
| 43 |
-
box-shadow: 0 1px 2px rgba(10, 15, 40, 0.05),
|
| 44 |
-
0 2px 10px rgba(10, 15, 40, 0.06);
|
| 45 |
-
padding: 20px;
|
| 46 |
-
}
|
| 47 |
-
#log {
|
| 48 |
-
min-height: 220px;
|
| 49 |
-
max-height: 60vh;
|
| 50 |
-
overflow-y: auto;
|
| 51 |
-
padding: 4px 6px;
|
| 52 |
-
margin-bottom: 14px;
|
| 53 |
-
}
|
| 54 |
-
.msg { margin: 0 0 14px; }
|
| 55 |
-
.msg .who {
|
| 56 |
-
font-size: 0.75rem;
|
| 57 |
-
font-weight: 700;
|
| 58 |
-
letter-spacing: 0.06em;
|
| 59 |
-
text-transform: uppercase;
|
| 60 |
-
color: var(--muted);
|
| 61 |
-
margin-bottom: 4px;
|
| 62 |
-
}
|
| 63 |
-
.msg.assistant .who { color: var(--accent); }
|
| 64 |
-
.msg .body { white-space: pre-wrap; word-wrap: break-word; }
|
| 65 |
-
form { display: flex; gap: 10px; align-items: stretch; }
|
| 66 |
-
textarea {
|
| 67 |
-
flex: 1;
|
| 68 |
-
min-height: 46px;
|
| 69 |
-
max-height: 160px;
|
| 70 |
-
resize: vertical;
|
| 71 |
-
padding: 12px 14px;
|
| 72 |
-
border: 1px solid var(--line);
|
| 73 |
-
border-radius: 12px;
|
| 74 |
-
font: inherit;
|
| 75 |
-
outline: none;
|
| 76 |
-
}
|
| 77 |
-
textarea:focus { border-color: var(--accent); box-shadow: 0 0 0 3px rgba(190,91,43,0.15); }
|
| 78 |
-
button {
|
| 79 |
-
background: var(--text);
|
| 80 |
-
color: #fff;
|
| 81 |
-
border: 0;
|
| 82 |
-
border-radius: 999px;
|
| 83 |
-
padding: 0 22px;
|
| 84 |
-
font-weight: 600;
|
| 85 |
-
cursor: pointer;
|
| 86 |
-
}
|
| 87 |
-
button:disabled { opacity: 0.55; cursor: default; }
|
| 88 |
-
.status { color: var(--muted); font-size: 0.85rem; margin-top: 10px; }
|
| 89 |
-
.status.err { color: #dc2626; }
|
| 90 |
-
</style>
|
| 91 |
-
</head>
|
| 92 |
-
<body>
|
| 93 |
-
<div class="shell">
|
| 94 |
-
<h1>Quest-4B chat demo</h1>
|
| 95 |
-
<p class="sub">
|
| 96 |
-
Front-end calls <code>/api/chat</code> on this host; the Python proxy
|
| 97 |
-
adds <code>Authorization: Bearer $HF_TOKEN</code> and forwards the
|
| 98 |
-
request to <code>$QUEST_BASE_URL</code>.
|
| 99 |
-
</p>
|
| 100 |
-
<div class="card">
|
| 101 |
-
<div id="log"></div>
|
| 102 |
-
<form id="f">
|
| 103 |
-
<textarea
|
| 104 |
-
id="q"
|
| 105 |
-
placeholder="Ask Quest-4B something... (Enter to send, Shift+Enter for newline)"
|
| 106 |
-
autofocus
|
| 107 |
-
></textarea>
|
| 108 |
-
<button id="send" type="submit">Send</button>
|
| 109 |
-
</form>
|
| 110 |
-
<div class="status" id="status"></div>
|
| 111 |
-
</div>
|
| 112 |
-
</div>
|
| 113 |
-
<script>
|
| 114 |
-
const log = document.getElementById("log");
|
| 115 |
-
const form = document.getElementById("f");
|
| 116 |
-
const input = document.getElementById("q");
|
| 117 |
-
const send = document.getElementById("send");
|
| 118 |
-
const status = document.getElementById("status");
|
| 119 |
-
const history = [];
|
| 120 |
-
|
| 121 |
-
function addMessage(role, text) {
|
| 122 |
-
const el = document.createElement("div");
|
| 123 |
-
el.className = "msg " + role;
|
| 124 |
-
const who = document.createElement("div");
|
| 125 |
-
who.className = "who";
|
| 126 |
-
who.textContent = role === "user" ? "You" : "Quest-4B";
|
| 127 |
-
const body = document.createElement("div");
|
| 128 |
-
body.className = "body";
|
| 129 |
-
body.textContent = text;
|
| 130 |
-
el.appendChild(who);
|
| 131 |
-
el.appendChild(body);
|
| 132 |
-
log.appendChild(el);
|
| 133 |
-
log.scrollTop = log.scrollHeight;
|
| 134 |
-
return body;
|
| 135 |
-
}
|
| 136 |
-
|
| 137 |
-
function setStatus(text, isError = false) {
|
| 138 |
-
status.textContent = text || "";
|
| 139 |
-
status.classList.toggle("err", Boolean(isError));
|
| 140 |
-
}
|
| 141 |
-
|
| 142 |
-
async function send_message(content) {
|
| 143 |
-
history.push({ role: "user", content });
|
| 144 |
-
addMessage("user", content);
|
| 145 |
-
const assistantBody = addMessage("assistant", "…");
|
| 146 |
-
setStatus("Waiting for the endpoint…");
|
| 147 |
-
send.disabled = true;
|
| 148 |
-
|
| 149 |
-
try {
|
| 150 |
-
const res = await fetch("/api/chat", {
|
| 151 |
-
method: "POST",
|
| 152 |
-
headers: { "Content-Type": "application/json" },
|
| 153 |
-
body: JSON.stringify({ messages: history, temperature: 0.4 }),
|
| 154 |
-
});
|
| 155 |
-
if (!res.ok || !res.body) {
|
| 156 |
-
const text = await res.text();
|
| 157 |
-
assistantBody.textContent = "";
|
| 158 |
-
throw new Error(text || res.statusText);
|
| 159 |
-
}
|
| 160 |
-
|
| 161 |
-
const reader = res.body.getReader();
|
| 162 |
-
const decoder = new TextDecoder();
|
| 163 |
-
let buffer = "";
|
| 164 |
-
let acc = "";
|
| 165 |
-
assistantBody.textContent = "";
|
| 166 |
-
|
| 167 |
-
while (true) {
|
| 168 |
-
const { value, done } = await reader.read();
|
| 169 |
-
if (done) break;
|
| 170 |
-
buffer += decoder.decode(value, { stream: true });
|
| 171 |
-
const lines = buffer.split("\n");
|
| 172 |
-
buffer = lines.pop() || "";
|
| 173 |
-
for (const raw of lines) {
|
| 174 |
-
const line = raw.trim();
|
| 175 |
-
if (!line.startsWith("data:")) continue;
|
| 176 |
-
const payload = line.slice(5).trim();
|
| 177 |
-
if (!payload || payload === "[DONE]") continue;
|
| 178 |
-
try {
|
| 179 |
-
const obj = JSON.parse(payload);
|
| 180 |
-
if (obj.error) {
|
| 181 |
-
throw new Error(
|
| 182 |
-
"endpoint " +
|
| 183 |
-
(obj.error.status || "?") +
|
| 184 |
-
": " +
|
| 185 |
-
(obj.error.body || "unknown")
|
| 186 |
-
);
|
| 187 |
-
}
|
| 188 |
-
const delta = obj.choices?.[0]?.delta?.content || "";
|
| 189 |
-
if (delta) {
|
| 190 |
-
acc += delta;
|
| 191 |
-
assistantBody.textContent = acc;
|
| 192 |
-
log.scrollTop = log.scrollHeight;
|
| 193 |
-
}
|
| 194 |
-
} catch (parseErr) {
|
| 195 |
-
if (parseErr.message?.startsWith("endpoint ")) throw parseErr;
|
| 196 |
-
}
|
| 197 |
-
}
|
| 198 |
-
}
|
| 199 |
-
history.push({ role: "assistant", content: acc });
|
| 200 |
-
setStatus("");
|
| 201 |
-
} catch (err) {
|
| 202 |
-
assistantBody.textContent =
|
| 203 |
-
assistantBody.textContent || "[no response]";
|
| 204 |
-
setStatus(String(err.message || err), true);
|
| 205 |
-
} finally {
|
| 206 |
-
send.disabled = false;
|
| 207 |
-
input.focus();
|
| 208 |
-
}
|
| 209 |
-
}
|
| 210 |
-
|
| 211 |
-
form.addEventListener("submit", (e) => {
|
| 212 |
-
e.preventDefault();
|
| 213 |
-
const text = input.value.trim();
|
| 214 |
-
if (!text) return;
|
| 215 |
-
input.value = "";
|
| 216 |
-
send_message(text);
|
| 217 |
-
});
|
| 218 |
-
input.addEventListener("keydown", (e) => {
|
| 219 |
-
if (e.key === "Enter" && !e.shiftKey) {
|
| 220 |
-
e.preventDefault();
|
| 221 |
-
form.requestSubmit();
|
| 222 |
-
}
|
| 223 |
-
});
|
| 224 |
-
|
| 225 |
-
fetch("/health")
|
| 226 |
-
.then((r) => r.json())
|
| 227 |
-
.then((j) => {
|
| 228 |
-
if (!j.has_token || !j.has_base_url) {
|
| 229 |
-
setStatus(
|
| 230 |
-
"Server is running but HF_TOKEN / QUEST_BASE_URL are not set — chat will 500 until you export them.",
|
| 231 |
-
true
|
| 232 |
-
);
|
| 233 |
-
}
|
| 234 |
-
})
|
| 235 |
-
.catch(() => setStatus("Cannot reach /health", true));
|
| 236 |
-
</script>
|
| 237 |
-
</body>
|
| 238 |
-
</html>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
blog_demo/requirements.txt
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
fastapi>=0.110
|
| 2 |
-
uvicorn[standard]>=0.27
|
| 3 |
-
httpx>=0.27
|
|
|
|
|
|
|
|
|
|
|
|
blog_demo/server.py
DELETED
|
@@ -1,169 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Minimal proxy server so a static blog page can safely chat with the
|
| 3 |
-
Quest-4B HF Inference Endpoint.
|
| 4 |
-
|
| 5 |
-
Why a proxy at all?
|
| 6 |
-
|
| 7 |
-
1. The browser cannot put HF_TOKEN into client-side JS -- the moment you
|
| 8 |
-
ship it to visitors, the token is stolen and anyone can rack up a bill
|
| 9 |
-
on your HF account.
|
| 10 |
-
2. HF Inference Endpoints do not emit permissive CORS headers, so even
|
| 11 |
-
without the token concern, a browser `fetch` straight to the endpoint
|
| 12 |
-
would be blocked.
|
| 13 |
-
|
| 14 |
-
This tiny FastAPI app holds the token server-side, forwards chat turns
|
| 15 |
-
to QUEST_BASE_URL, and streams the response back to the browser.
|
| 16 |
-
|
| 17 |
-
Run locally:
|
| 18 |
-
|
| 19 |
-
cd blog_demo
|
| 20 |
-
pip install fastapi uvicorn httpx
|
| 21 |
-
export HF_TOKEN=hf_xxx
|
| 22 |
-
export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
|
| 23 |
-
python server.py
|
| 24 |
-
|
| 25 |
-
Then open http://127.0.0.1:8000/ in a browser.
|
| 26 |
-
"""
|
| 27 |
-
from __future__ import annotations
|
| 28 |
-
|
| 29 |
-
import json
|
| 30 |
-
import os
|
| 31 |
-
from typing import Any, Dict, List
|
| 32 |
-
|
| 33 |
-
import httpx
|
| 34 |
-
from fastapi import FastAPI, HTTPException, Request
|
| 35 |
-
from fastapi.middleware.cors import CORSMiddleware
|
| 36 |
-
from fastapi.responses import FileResponse, StreamingResponse
|
| 37 |
-
|
| 38 |
-
HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
|
| 39 |
-
QUEST_BASE_URL = os.environ.get("QUEST_BASE_URL", "").strip().rstrip("/")
|
| 40 |
-
QUEST_MODEL = os.environ.get("QUEST_ENDPOINT_MODEL", "tgi").strip() or "tgi"
|
| 41 |
-
|
| 42 |
-
ALLOWED_ORIGINS = [
|
| 43 |
-
o.strip()
|
| 44 |
-
for o in os.environ.get(
|
| 45 |
-
"ALLOWED_ORIGINS",
|
| 46 |
-
"http://127.0.0.1:8000,http://localhost:8000",
|
| 47 |
-
).split(",")
|
| 48 |
-
if o.strip()
|
| 49 |
-
]
|
| 50 |
-
|
| 51 |
-
REQUEST_TIMEOUT = float(os.environ.get("REQUEST_TIMEOUT", "600"))
|
| 52 |
-
|
| 53 |
-
app = FastAPI(title="Quest-4B blog proxy")
|
| 54 |
-
|
| 55 |
-
app.add_middleware(
|
| 56 |
-
CORSMiddleware,
|
| 57 |
-
allow_origins=ALLOWED_ORIGINS,
|
| 58 |
-
allow_methods=["GET", "POST", "OPTIONS"],
|
| 59 |
-
allow_headers=["Content-Type"],
|
| 60 |
-
allow_credentials=False,
|
| 61 |
-
)
|
| 62 |
-
|
| 63 |
-
STATIC_DIR = os.path.dirname(os.path.abspath(__file__))
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
@app.get("/")
|
| 67 |
-
def index() -> FileResponse:
|
| 68 |
-
return FileResponse(os.path.join(STATIC_DIR, "index.html"))
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
@app.get("/health")
|
| 72 |
-
def health() -> Dict[str, Any]:
|
| 73 |
-
return {
|
| 74 |
-
"ok": True,
|
| 75 |
-
"has_token": bool(HF_TOKEN),
|
| 76 |
-
"has_base_url": bool(QUEST_BASE_URL),
|
| 77 |
-
"model": QUEST_MODEL,
|
| 78 |
-
}
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
def _validate_config() -> None:
|
| 82 |
-
if not HF_TOKEN:
|
| 83 |
-
raise HTTPException(500, "HF_TOKEN is not set on the server")
|
| 84 |
-
if not QUEST_BASE_URL:
|
| 85 |
-
raise HTTPException(500, "QUEST_BASE_URL is not set on the server")
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def _sanitise_messages(raw: Any) -> List[Dict[str, str]]:
|
| 89 |
-
if not isinstance(raw, list) or not raw:
|
| 90 |
-
raise HTTPException(400, "`messages` must be a non-empty array")
|
| 91 |
-
cleaned: List[Dict[str, str]] = []
|
| 92 |
-
for m in raw:
|
| 93 |
-
if not isinstance(m, dict):
|
| 94 |
-
raise HTTPException(400, "each message must be an object")
|
| 95 |
-
role = str(m.get("role", "")).strip()
|
| 96 |
-
content = m.get("content", "")
|
| 97 |
-
if role not in {"system", "user", "assistant"}:
|
| 98 |
-
raise HTTPException(400, f"invalid role: {role!r}")
|
| 99 |
-
if not isinstance(content, str):
|
| 100 |
-
raise HTTPException(400, "message.content must be a string")
|
| 101 |
-
cleaned.append({"role": role, "content": content[:8000]})
|
| 102 |
-
if len(cleaned) > 40:
|
| 103 |
-
cleaned = cleaned[-40:]
|
| 104 |
-
return cleaned
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
@app.post("/api/chat")
|
| 108 |
-
async def chat(request: Request) -> StreamingResponse:
|
| 109 |
-
_validate_config()
|
| 110 |
-
try:
|
| 111 |
-
body = await request.json()
|
| 112 |
-
except Exception as exc:
|
| 113 |
-
raise HTTPException(400, f"invalid json: {exc}") from exc
|
| 114 |
-
|
| 115 |
-
messages = _sanitise_messages(body.get("messages"))
|
| 116 |
-
temperature = float(body.get("temperature", 0.4))
|
| 117 |
-
max_tokens = int(body.get("max_tokens", 1024))
|
| 118 |
-
max_tokens = max(32, min(max_tokens, 4096))
|
| 119 |
-
|
| 120 |
-
payload = {
|
| 121 |
-
"model": QUEST_MODEL,
|
| 122 |
-
"messages": messages,
|
| 123 |
-
"temperature": max(0.0, min(temperature, 1.5)),
|
| 124 |
-
"max_tokens": max_tokens,
|
| 125 |
-
"stream": True,
|
| 126 |
-
}
|
| 127 |
-
|
| 128 |
-
upstream_url = f"{QUEST_BASE_URL}/chat/completions"
|
| 129 |
-
headers = {
|
| 130 |
-
"Authorization": f"Bearer {HF_TOKEN}",
|
| 131 |
-
"Content-Type": "application/json",
|
| 132 |
-
"Accept": "text/event-stream",
|
| 133 |
-
}
|
| 134 |
-
|
| 135 |
-
async def relay() -> Any:
|
| 136 |
-
timeout = httpx.Timeout(REQUEST_TIMEOUT, connect=15.0)
|
| 137 |
-
async with httpx.AsyncClient(timeout=timeout) as client:
|
| 138 |
-
try:
|
| 139 |
-
async with client.stream(
|
| 140 |
-
"POST", upstream_url, json=payload, headers=headers
|
| 141 |
-
) as upstream:
|
| 142 |
-
if upstream.status_code >= 400:
|
| 143 |
-
text = (await upstream.aread()).decode("utf-8", errors="replace")
|
| 144 |
-
err = json.dumps(
|
| 145 |
-
{"error": {"status": upstream.status_code, "body": text[:800]}}
|
| 146 |
-
)
|
| 147 |
-
yield f"data: {err}\n\n".encode()
|
| 148 |
-
yield b"data: [DONE]\n\n"
|
| 149 |
-
return
|
| 150 |
-
async for chunk in upstream.aiter_raw():
|
| 151 |
-
if chunk:
|
| 152 |
-
yield chunk
|
| 153 |
-
except httpx.HTTPError as exc:
|
| 154 |
-
err = json.dumps({"error": {"status": 502, "body": str(exc)}})
|
| 155 |
-
yield f"data: {err}\n\n".encode()
|
| 156 |
-
yield b"data: [DONE]\n\n"
|
| 157 |
-
|
| 158 |
-
return StreamingResponse(relay(), media_type="text/event-stream")
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
if __name__ == "__main__":
|
| 162 |
-
import uvicorn
|
| 163 |
-
|
| 164 |
-
uvicorn.run(
|
| 165 |
-
"server:app",
|
| 166 |
-
host=os.environ.get("HOST", "127.0.0.1"),
|
| 167 |
-
port=int(os.environ.get("PORT", "8000")),
|
| 168 |
-
reload=False,
|
| 169 |
-
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|