Space: osunlp/QUEST

Lzy01241010 committed 10 days ago

Commit 97b3442 (verified) · 1 parent: 856a97c

Delete blog_demo

Files changed (4)

blog_demo/README.md +0 -80
blog_demo/index.html +0 -238
blog_demo/requirements.txt +0 -3
blog_demo/server.py +0 -169

blog_demo/README.md DELETED

@@ -1,80 +0,0 @@
-# Blog chat widget — Quest-4B
-Tiny proxy + one-page chat UI you can lift into a blog to let readers
-talk to the Quest-4B HF Inference Endpoint.
-## Why a proxy
-- **Token safety.** `HF_TOKEN` must never ship in client-side JS. Anyone
-  loading the page could copy it and run jobs on your HF account.
-- **CORS.** HF Inference Endpoints don't emit permissive CORS headers, so
-  a browser `fetch` straight to the endpoint is blocked even if the
-  token problem were solved.
-The proxy (server.py) holds the token server-side, validates incoming
-requests, and streams the model's reply back.
-## Run locally
-```bash
-cd blog_demo
-python3 -m venv .venv
-source .venv/bin/activate
-pip install -r requirements.txt
-export HF_TOKEN=hf_xxx
-export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
-# optional:
-export QUEST_ENDPOINT_MODEL=tgi           # "osunlp/Quest-4B" if the container is vLLM
-export ALLOWED_ORIGINS=http://127.0.0.1:8000
-python server.py
-```
-Open <http://127.0.0.1:8000/>, ask a question, watch the reply stream in.
-Health check: <http://127.0.0.1:8000/health> returns whether the token
-and base URL are wired up without leaking them.
-## What gets sent upstream
-```
-POST  {QUEST_BASE_URL}/chat/completions
-Headers: Authorization: Bearer $HF_TOKEN
-Body: {
-  "model": "tgi",
-  "messages": [...],
-  "temperature": 0.4,
-  "max_tokens": 1024,
-  "stream": true
-}
-```
-Any OpenAI-compatible endpoint (vLLM, TGI, SGLang, …) accepts this
-request shape. The proxy pipes the upstream SSE frames straight to the browser;
-the page reads `choices[0].delta.content` from each frame to render the streaming answer.
-## Deploying on the blog
-Pick whichever backend is closest to where the blog is hosted:
-| Host | How |
-|---|---|
-| Next.js / Vercel | paste the `POST /api/chat` handler logic into `app/api/chat/route.ts` (use Node's `fetch` + `ReadableStream`), set `HF_TOKEN` and `QUEST_BASE_URL` in Vercel env vars |
-| Cloudflare Workers | port the proxy to a Worker, put `HF_TOKEN` in Worker Secrets, set `ALLOWED_ORIGINS` to your blog's origin |
-| FastAPI behind nginx | run `server.py` under `systemd` or `supervisor`, proxy `/api/chat` from the blog hostname |
-| Hugging Face Space (Docker) | drop the whole folder in a Docker Space, set `HF_TOKEN` and `QUEST_BASE_URL` as Space Secrets |
-### Lock it down before going public
-1. **Origin allowlist** — set `ALLOWED_ORIGINS=https://your-blog.com`
-   so other sites can't call your proxy from a browser.
-2. **Rate limit** — add an IP-based limit (e.g. `slowapi` for FastAPI,
-   Cloudflare Rate Limiting for Workers). A single abusive visitor can
-   drain your endpoint budget fast.
-3. **Input caps** — the proxy already trims each message to 8000 chars
-   and keeps only the last 40 messages; tune these for your use case.
-4. **Fine-grained token** — create a new HF token with access only to
-   the Quest endpoint so a leak can't touch anything else.
-5. **Observability** — log request counts, latency, and 4xx/5xx rates
-   so you notice abuse early.
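
Item 2 of the lock-down list above recommends an IP-based rate limit. As a rough illustration of the idea only, and not a substitute for `slowapi` or Cloudflare Rate Limiting, the following is a minimal dependency-free sliding-window limiter. The class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are hypothetical and were not part of the deleted `server.py`. State lives in one process's memory, so this sketch assumes a single worker.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key.

    Hypothetical helper: the key would typically be the client IP.
    """

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over budget: the caller should answer 429
        recent.append(now)
        return True
```

In a proxy like the one described here, a handler could call `allow()` with the client address before contacting the endpoint and return HTTP 429 when it yields `False`. Behind nginx or another reverse proxy the address would have to come from a trusted forwarding header instead.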

blog_demo/index.html DELETED

@@ -1,238 +0,0 @@
-<!doctype html>
-<html lang="en">
-  <head>
-    <meta charset="utf-8" />
-    <meta name="viewport" content="width=device-width,initial-scale=1" />
-    <title>Quest-4B chat demo</title>
-    <style>
-      :root {
-        --bg: #f2f4f8;
-        --paper: #ffffff;
-        --text: #0d1117;
-        --muted: #64748b;
-        --accent: #be5b2b;
-        --line: rgba(10, 15, 40, 0.1);
-      }
-      * { box-sizing: border-box; }
-      body {
-        margin: 0;
-        background: var(--bg);
-        color: var(--text);
-        font: 15px/1.55 -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
-          sans-serif;
-      }
-      .shell {
-        max-width: 760px;
-        margin: 32px auto;
-        padding: 0 20px;
-      }
-      h1 {
-        font-size: 1.6rem;
-        margin: 0 0 4px;
-      }
-      .sub {
-        color: var(--muted);
-        margin: 0 0 20px;
-        font-size: 0.92rem;
-      }
-      .sub code { background: #fff; padding: 1px 6px; border-radius: 6px; border: 1px solid var(--line); }
-      .card {
-        background: var(--paper);
-        border: 1px solid var(--line);
-        border-radius: 14px;
-        box-shadow: 0 1px 2px rgba(10, 15, 40, 0.05),
-          0 2px 10px rgba(10, 15, 40, 0.06);
-        padding: 20px;
-      }
-      #log {
-        min-height: 220px;
-        max-height: 60vh;
-        overflow-y: auto;
-        padding: 4px 6px;
-        margin-bottom: 14px;
-      }
-      .msg { margin: 0 0 14px; }
-      .msg .who {
-        font-size: 0.75rem;
-        font-weight: 700;
-        letter-spacing: 0.06em;
-        text-transform: uppercase;
-        color: var(--muted);
-        margin-bottom: 4px;
-      }
-      .msg.assistant .who { color: var(--accent); }
-      .msg .body { white-space: pre-wrap; word-wrap: break-word; }
-      form { display: flex; gap: 10px; align-items: stretch; }
-      textarea {
-        flex: 1;
-        min-height: 46px;
-        max-height: 160px;
-        resize: vertical;
-        padding: 12px 14px;
-        border: 1px solid var(--line);
-        border-radius: 12px;
-        font: inherit;
-        outline: none;
-      }
-      textarea:focus { border-color: var(--accent); box-shadow: 0 0 0 3px rgba(190,91,43,0.15); }
-      button {
-        background: var(--text);
-        color: #fff;
-        border: 0;
-        border-radius: 999px;
-        padding: 0 22px;
-        font-weight: 600;
-        cursor: pointer;
-      }
-      button:disabled { opacity: 0.55; cursor: default; }
-      .status { color: var(--muted); font-size: 0.85rem; margin-top: 10px; }
-      .status.err { color: #dc2626; }
-    </style>
-  </head>
-  <body>
-    <div class="shell">
-      <h1>Quest-4B chat demo</h1>
-      <p class="sub">
-        Front-end calls <code>/api/chat</code> on this host; the Python proxy
-        adds <code>Authorization: Bearer $HF_TOKEN</code> and forwards the
-        request to <code>$QUEST_BASE_URL</code>.
-      </p>
-      <div class="card">
-        <div id="log"></div>
-        <form id="f">
-          <textarea
-            id="q"
-            placeholder="Ask Quest-4B something... (Enter to send, Shift+Enter for newline)"
-            autofocus
-          ></textarea>
-          <button id="send" type="submit">Send</button>
-        </form>
-        <div class="status" id="status"></div>
-      </div>
-    </div>
-    <script>
-      const log = document.getElementById("log");
-      const form = document.getElementById("f");
-      const input = document.getElementById("q");
-      const send = document.getElementById("send");
-      const status = document.getElementById("status");
-      const history = [];
-      function addMessage(role, text) {
-        const el = document.createElement("div");
-        el.className = "msg " + role;
-        const who = document.createElement("div");
-        who.className = "who";
-        who.textContent = role === "user" ? "You" : "Quest-4B";
-        const body = document.createElement("div");
-        body.className = "body";
-        body.textContent = text;
-        el.appendChild(who);
-        el.appendChild(body);
-        log.appendChild(el);
-        log.scrollTop = log.scrollHeight;
-        return body;
-      }
-      function setStatus(text, isError = false) {
-        status.textContent = text || "";
-        status.classList.toggle("err", Boolean(isError));
-      }
-      async function send_message(content) {
-        history.push({ role: "user", content });
-        addMessage("user", content);
-        const assistantBody = addMessage("assistant", "…");
-        setStatus("Waiting for the endpoint…");
-        send.disabled = true;
-        try {
-          const res = await fetch("/api/chat", {
-            method: "POST",
-            headers: { "Content-Type": "application/json" },
-            body: JSON.stringify({ messages: history, temperature: 0.4 }),
-          });
-          if (!res.ok || !res.body) {
-            const text = await res.text();
-            assistantBody.textContent = "";
-            throw new Error(text || res.statusText);
-          }
-          const reader = res.body.getReader();
-          const decoder = new TextDecoder();
-          let buffer = "";
-          let acc = "";
-          assistantBody.textContent = "";
-          while (true) {
-            const { value, done } = await reader.read();
-            if (done) break;
-            buffer += decoder.decode(value, { stream: true });
-            const lines = buffer.split("\n");
-            buffer = lines.pop() || "";
-            for (const raw of lines) {
-              const line = raw.trim();
-              if (!line.startsWith("data:")) continue;
-              const payload = line.slice(5).trim();
-              if (!payload || payload === "[DONE]") continue;
-              try {
-                const obj = JSON.parse(payload);
-                if (obj.error) {
-                  throw new Error(
-                    "endpoint " +
-                      (obj.error.status || "?") +
-                      ": " +
-                      (obj.error.body || "unknown")
-                  );
-                }
-                const delta = obj.choices?.[0]?.delta?.content || "";
-                if (delta) {
-                  acc += delta;
-                  assistantBody.textContent = acc;
-                  log.scrollTop = log.scrollHeight;
-                }
-              } catch (parseErr) {
-                if (parseErr.message?.startsWith("endpoint ")) throw parseErr;
-              }
-            }
-          }
-          history.push({ role: "assistant", content: acc });
-          setStatus("");
-        } catch (err) {
-          assistantBody.textContent =
-            assistantBody.textContent || "[no response]";
-          setStatus(String(err.message || err), true);
-        } finally {
-          send.disabled = false;
-          input.focus();
-        }
-      }
-      form.addEventListener("submit", (e) => {
-        e.preventDefault();
-        const text = input.value.trim();
-        if (!text) return;
-        input.value = "";
-        send_message(text);
-      });
-      input.addEventListener("keydown", (e) => {
-        if (e.key === "Enter" && !e.shiftKey) {
-          e.preventDefault();
-          form.requestSubmit();
-        }
-      });
-      fetch("/health")
-        .then((r) => r.json())
-        .then((j) => {
-          if (!j.has_token || !j.has_base_url) {
-            setStatus(
-              "Server is running but HF_TOKEN / QUEST_BASE_URL are not set — chat will 500 until you export them.",
-              true
-            );
-          }
-        })
-        .catch(() => setStatus("Cannot reach /health", true));
-    </script>
-  </body>
-</html>
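
The streaming loop in the page's script buffers partial lines, keeps only `data:` lines, skips `[DONE]`, and appends `choices[0].delta.content`. For readers who want to test that logic outside a browser, here is a minimal Python sketch of the same steps. The function name `iter_deltas` is hypothetical; the error format mirrors the `{"error": {"status", "body"}}` frames that the proxy emits.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_deltas(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield content deltas from raw SSE byte chunks (hypothetical helper)."""
    # Incremental decoding copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        lines = buffer.split("\n")
        buffer = lines.pop()  # the last piece may be an incomplete line
        for raw in lines:
            line = raw.strip()
            if not line.startswith("data:"):
                continue
            payload = line[5:].strip()
            if not payload or payload == "[DONE]":
                continue
            try:
                obj = json.loads(payload)
            except json.JSONDecodeError:
                continue  # ignore malformed frames, as the page does
            if not isinstance(obj, dict):
                continue
            error = obj.get("error")
            if error:
                raise RuntimeError(
                    f"endpoint {error.get('status', '?')}: "
                    f"{error.get('body', 'unknown')}"
                )
            choices = obj.get("choices") or [{}]
            delta = (choices[0].get("delta") or {}).get("content") or ""
            if delta:
                yield delta
```

Joining the yielded strings reproduces the accumulated answer that the page writes into the assistant message.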

blog_demo/requirements.txt DELETED

@@ -1,3 +0,0 @@
-fastapi>=0.110
-uvicorn[standard]>=0.27
-httpx>=0.27

blog_demo/server.py DELETED

@@ -1,169 +0,0 @@
-"""
-Minimal proxy server so a static blog page can safely chat with the
-Quest-4B HF Inference Endpoint.
-Why a proxy at all?
-1. HF_TOKEN must never be embedded in client-side JS -- once it ships
-   to visitors, anyone can copy it and rack up a bill on your HF
-   account.
-2. HF Inference Endpoints do not emit permissive CORS headers, so even
-   without the token concern, a browser `fetch` straight to the endpoint
-   would be blocked.
-This tiny FastAPI app holds the token server-side, forwards chat turns
-to QUEST_BASE_URL, and streams the response back to the browser.
-Run locally:
-    cd blog_demo
-    pip install fastapi uvicorn httpx
-    export HF_TOKEN=hf_xxx
-    export QUEST_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1/
-    python server.py
-Then open http://127.0.0.1:8000/ in a browser.
-"""
-from __future__ import annotations
-import json
-import os
-from typing import Any, Dict, List
-import httpx
-from fastapi import FastAPI, HTTPException, Request
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import FileResponse, StreamingResponse
-HF_TOKEN = os.environ.get("HF_TOKEN", "").strip()
-QUEST_BASE_URL = os.environ.get("QUEST_BASE_URL", "").strip().rstrip("/")
-QUEST_MODEL = os.environ.get("QUEST_ENDPOINT_MODEL", "tgi").strip() or "tgi"
-ALLOWED_ORIGINS = [
-    o.strip()
-    for o in os.environ.get(
-        "ALLOWED_ORIGINS",
-        "http://127.0.0.1:8000,http://localhost:8000",
-    ).split(",")
-    if o.strip()
-]
-REQUEST_TIMEOUT = float(os.environ.get("REQUEST_TIMEOUT", "600"))
-app = FastAPI(title="Quest-4B blog proxy")
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=ALLOWED_ORIGINS,
-    allow_methods=["GET", "POST", "OPTIONS"],
-    allow_headers=["Content-Type"],
-    allow_credentials=False,
-)
-STATIC_DIR = os.path.dirname(os.path.abspath(__file__))
-@app.get("/")
-def index() -> FileResponse:
-    return FileResponse(os.path.join(STATIC_DIR, "index.html"))
-@app.get("/health")
-def health() -> Dict[str, Any]:
-    return {
-        "ok": True,
-        "has_token": bool(HF_TOKEN),
-        "has_base_url": bool(QUEST_BASE_URL),
-        "model": QUEST_MODEL,
-    }
-def _validate_config() -> None:
-    if not HF_TOKEN:
-        raise HTTPException(500, "HF_TOKEN is not set on the server")
-    if not QUEST_BASE_URL:
-        raise HTTPException(500, "QUEST_BASE_URL is not set on the server")
-def _sanitise_messages(raw: Any) -> List[Dict[str, str]]:
-    if not isinstance(raw, list) or not raw:
-        raise HTTPException(400, "`messages` must be a non-empty array")
-    cleaned: List[Dict[str, str]] = []
-    for m in raw:
-        if not isinstance(m, dict):
-            raise HTTPException(400, "each message must be an object")
-        role = str(m.get("role", "")).strip()
-        content = m.get("content", "")
-        if role not in {"system", "user", "assistant"}:
-            raise HTTPException(400, f"invalid role: {role!r}")
-        if not isinstance(content, str):
-            raise HTTPException(400, "message.content must be a string")
-        cleaned.append({"role": role, "content": content[:8000]})
-    if len(cleaned) > 40:
-        cleaned = cleaned[-40:]
-    return cleaned
-@app.post("/api/chat")
-async def chat(request: Request) -> StreamingResponse:
-    _validate_config()
-    try:
-        body = await request.json()
-    except Exception as exc:
-        raise HTTPException(400, f"invalid json: {exc}") from exc
-    messages = _sanitise_messages(body.get("messages"))
-    temperature = float(body.get("temperature", 0.4))
-    max_tokens = int(body.get("max_tokens", 1024))
-    max_tokens = max(32, min(max_tokens, 4096))
-    payload = {
-        "model": QUEST_MODEL,
-        "messages": messages,
-        "temperature": max(0.0, min(temperature, 1.5)),
-        "max_tokens": max_tokens,
-        "stream": True,
-    }
-    upstream_url = f"{QUEST_BASE_URL}/chat/completions"
-    headers = {
-        "Authorization": f"Bearer {HF_TOKEN}",
-        "Content-Type": "application/json",
-        "Accept": "text/event-stream",
-    }
-    async def relay() -> Any:
-        timeout = httpx.Timeout(REQUEST_TIMEOUT, connect=15.0)
-        async with httpx.AsyncClient(timeout=timeout) as client:
-            try:
-                async with client.stream(
-                    "POST", upstream_url, json=payload, headers=headers
-                ) as upstream:
-                    if upstream.status_code >= 400:
-                        text = (await upstream.aread()).decode("utf-8", errors="replace")
-                        err = json.dumps(
-                            {"error": {"status": upstream.status_code, "body": text[:800]}}
-                        )
-                        yield f"data: {err}\n\n".encode()
-                        yield b"data: [DONE]\n\n"
-                        return
-                    async for chunk in upstream.aiter_raw():
-                        if chunk:
-                            yield chunk
-            except httpx.HTTPError as exc:
-                err = json.dumps({"error": {"status": 502, "body": str(exc)}})
-                yield f"data: {err}\n\n".encode()
-                yield b"data: [DONE]\n\n"
-    return StreamingResponse(relay(), media_type="text/event-stream")
-if __name__ == "__main__":
-    import uvicorn
-    uvicorn.run(
-        "server:app",
-        host=os.environ.get("HOST", "127.0.0.1"),
-        port=int(os.environ.get("PORT", "8000")),
-        reload=False,
-    )
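
One detail of the `chat` handler above: `float(...)` and `int(...)` are applied directly to client-supplied values, and `body.get` assumes the JSON body is an object. A non-numeric `temperature` or a top-level JSON array would therefore raise an uncaught exception, which surfaces as a 500 rather than a 400. The sketch below shows one way the parsing could be made defensive while keeping the same defaults and clamping ranges. The function name `parse_sampling_params` is hypothetical and was not part of the deleted file.

```python
import math
from typing import Any, Tuple


def parse_sampling_params(body: Any) -> Tuple[float, int]:
    """Return (temperature, max_tokens) clamped to the proxy's ranges.

    Hypothetical helper: raises ValueError on uninterpretable input so the
    caller can translate it into a 400 response.
    """
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    try:
        temperature = float(body.get("temperature", 0.4))
        max_tokens = int(body.get("max_tokens", 1024))
    except (TypeError, ValueError, OverflowError) as exc:
        raise ValueError(f"invalid sampling parameter: {exc}") from exc
    if math.isnan(temperature):
        raise ValueError("temperature must be a number")
    # Same bounds as the handler: 0.0..1.5 and 32..4096.
    temperature = max(0.0, min(temperature, 1.5))
    max_tokens = max(32, min(max_tokens, 4096))
    return temperature, max_tokens
```

A handler could wrap the call in `try`/`except ValueError` and raise `HTTPException(400, ...)`, matching how `_sanitise_messages` reports bad input.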