abhivsh committed · Commit 086f690 · verified · 1 Parent(s): ee58e34

Upload 2 files

Files changed (2)
  1. README.md +83 -0
  2. requirements.txt +19 -0
README.md ADDED
@@ -0,0 +1,83 @@
---
title: EnggSS RAG ChatBot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: other
---

# EnggSS RAG ChatBot

**Serving-only** HuggingFace Space — reads a pre-built private dataset; no PDF processing happens at runtime. Build the dataset locally with `preprocessing/create_dataset.py`, then deploy this Space to answer questions.

## How it works

```
Local machine (once)
PDFs → create_dataset.py → BAAI/bge-large-en-v1.5 embeddings
          │
          ▼
Private HuggingFace Dataset
          │
┌─────────────────────┘
▼ (Space startup)
Load dataset → NumPy float32 matrix (L2-normalised)
          │
          ▼ (each query, ~20 ms)
Embed query → cosine scores → MMR top-3
          │
          ▼
Qwen2.5-7B-Instruct (HF Inference API) → answer
          │
          ▼
Gradio UI
```
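The per-query path above (cosine scores over the L2-normalised matrix, then MMR re-ranking down to three chunks) can be sketched as follows. This is an illustrative sketch, not the app's actual code; `mmr_top_k` and `lambda_mult` are names chosen here:

```python
import numpy as np

def mmr_top_k(query_vec, doc_matrix, k=3, lambda_mult=0.5):
    """Maximal Marginal Relevance over L2-normalised embeddings.

    query_vec:  (d,)   unit-norm query embedding
    doc_matrix: (n, d) unit-norm chunk embeddings
    Returns indices of k chunks balancing relevance and diversity.
    """
    sims = doc_matrix @ query_vec          # cosine scores, since rows are unit-norm
    selected = [int(np.argmax(sims))]      # seed with the most relevant chunk
    while len(selected) < min(k, len(sims)):
        rest = [i for i in range(len(sims)) if i not in selected]
        # redundancy: each candidate's max similarity to anything already picked
        redundancy = (doc_matrix[rest] @ doc_matrix[selected].T).max(axis=1)
        mmr = lambda_mult * sims[rest] - (1 - lambda_mult) * redundancy
        selected.append(rest[int(np.argmax(mmr))])
    return selected
```

Because every row is unit-norm, the cosine step is a single matrix-vector product, which is what keeps the per-query cost in the tens of milliseconds.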

## Tabs

| Tab | Purpose |
|-----|---------|
| 💬 Q&A | Ask questions; see top-3 retrieved contexts + generated answer |
| 📊 Analytics | Total chunks, documents processed, per-file breakdown |
49
+ ## Required Space Secrets
50
+
51
+ Set in **Settings → Variables and Secrets**:
52
+
53
+ | Secret | Description |
54
+ |--------|-------------|
55
+ | `HF_TOKEN` | HuggingFace token — needs **read** access to the dataset repo |
56
+ | `HF_DATASET_REPO` | e.g. `your-org/enggss-rag-dataset` (created by preprocessing script) |
57
+

## Setup order

1. **Run preprocessing locally** (once, or whenever you add new PDFs):
   ```bash
   cd preprocessing
   pip install -r requirements.txt
   python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset
   ```
2. **Deploy this Space** — upload `app.py`, `requirements.txt`, and `README.md`
3. **Set the two secrets** above in **Settings → Variables and Secrets**
4. The Space restarts, loads the dataset, and is ready to answer questions

To add new PDFs later without rebuilding everything:
```bash
python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
```
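The startup load in step 4 can be sketched as below. The `embedding` column name is an assumption about what `create_dataset.py` stores; normalising rows once at startup is what lets each later query score reduce to a single matmul:

```python
import numpy as np
# from datasets import load_dataset  # executed at Space startup, needs HF_TOKEN

def to_search_matrix(embeddings):
    """Stack stored embedding lists into a float32 matrix with unit-norm rows."""
    mat = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.clip(norms, 1e-12, None)  # guard against zero vectors

# At startup (column name "embedding" is an assumption):
# ds = load_dataset(os.environ["HF_DATASET_REPO"], split="train",
#                   token=os.environ["HF_TOKEN"])
# matrix = to_search_matrix(ds["embedding"])
```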

## Local development

```bash
git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
cd enggss-rag-chatbot
pip install -r requirements.txt
# create .env with HF_TOKEN and HF_DATASET_REPO
python app.py
```
requirements.txt ADDED
@@ -0,0 +1,19 @@
# ── Dataset access ────────────────────────────────────────────────────────────
datasets>=2.18.0

# ── Query embedding (local, cached after first download ~1.3 GB) ──────────────
sentence-transformers
torch

# ── LLM chain ─────────────────────────────────────────────────────────────────
langchain-core>=1.2.0
langchain-huggingface

# ── UI (Gradio 5.x — sdk_version "5.0.0" in README.md) ────────────────────────
# Gradio 5.x rewrote oauth.py → no HfFolder import issue (was a 4.x bug)
# pydub is no longer a gradio 5.x dependency → pyaudioop not required
gradio>=5.0.0

# ── Utilities ─────────────────────────────────────────────────────────────────
numpy
python-dotenv