digifreely committed
Commit 21afc7b · verified · 1 Parent(s): 2ec04fb

Upload 3 files

Files changed (3)
  1. README.md +60 -6
  2. app.py +638 -0
  3. requirements.txt +33 -0
README.md CHANGED
@@ -1,13 +1,67 @@
  ---
- title: Chatinterface
- emoji: 🏆
- colorFrom: indigo
- colorTo: red
  sdk: gradio
- sdk_version: 6.11.0
  app_file: app.py
  pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Maria Learning Service
+ emoji: 📚
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 5.9.1
  app_file: app.py
  pinned: false
  license: mit
  ---

+ # Maria Learning Service
+
+ A FastAPI-based AI tutoring service powered by Qwen2.5-1.5B-Instruct (NF4 int4 via bitsandbytes, weights pre-cached to local disk) with ZeroGPU.
+
+ ## Endpoints
+
+ | Endpoint | Method | Description |
+ |------------|--------|----------------------------------------------------------|
+ | `/dataset` | POST | Pre-load FAISS index + metadata for a board/class/subject |
+ | `/chat` | POST | Main tutoring endpoint (requires `/dataset` called first) |
+ | `/health` | GET | Health check |
+
+ ## Workflow
+
+ **Always call `/dataset` before `/chat`.**
+ `/dataset` loads and caches the knowledge base for the requested board/class/subject.
+ `/chat` performs RAG against the cached dataset and runs inference.
+
+ ## Authentication
+
+ Pass **one** of these headers per request:
+
+ | Header | Description |
+ |--------|-------------|
+ | `auth_code` | Raw value whose SHA-256 must match the `HASH_VALUE` secret |
+ | `cf-turnstile-token` | Cloudflare Turnstile token verified against the `CF_SECRET_KEY` secret |
+
+ ## Secrets Required
+
+ Set these in your Space → Settings → Secrets:
+
+ - `HASH_VALUE` - SHA-256 hex digest of your auth code
+ - `CF_SECRET_KEY` - Cloudflare Turnstile secret key
+
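The `HASH_VALUE` secret can be generated with Python's standard library. A minimal sketch (the code value below is a placeholder, not a real secret):

```python
import hashlib

def auth_code_digest(code: str) -> str:
    """Return the SHA-256 hex digest to store as the HASH_VALUE secret."""
    return hashlib.sha256(code.encode()).hexdigest()

# "my-demo-code" is a placeholder; pick your own secret value.
digest = auth_code_digest("my-demo-code")
print(digest)
```

The raw code travels in the `auth_code` header while only its digest lives in the Space settings, so the secret store never holds the reusable credential itself.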
+ ## `/dataset` Reference
+
+ **Request**
+ ```json
+ {
+   "board": "NCERT",
+   "class": "Class 1",
+   "subject": "English"
+ }
+ ```
+
+ **Response**
+ ```json
+ {
+   "status": "ready",
+   "message": "Dataset Loaded"
+ }
+ ```
+
+ Repeated calls with the same `board/class/subject` are no-ops (served from cache).
+ `/chat` returns `412 Precondition Failed` if `/dataset` has not been called first.
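The two-step workflow above can be driven from any HTTP client. A minimal stdlib sketch; the base URL and auth code are placeholders, and no network call is actually made here:

```python
import json
import urllib.request

BASE_URL = "https://your-space.hf.space"  # placeholder, not this Space's real URL

def build_request(path: str, payload: dict, auth_code: str) -> urllib.request.Request:
    # Payloads are plain dicts because "class" is a Python reserved word
    # and cannot be passed as a keyword argument.
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "auth_code": auth_code},
        method="POST",
    )

def call(path: str, payload: dict, auth_code: str) -> dict:
    # Performs the actual HTTP round trip (not executed in this sketch).
    with urllib.request.urlopen(build_request(path, payload, auth_code)) as resp:
        return json.load(resp)

# Step 1: POST /dataset to prime the cache; step 2: POST /chat against it.
dataset_req = build_request(
    "/dataset",
    {"board": "NCERT", "class": "Class 1", "subject": "English"},
    "my-demo-code",
)
```

Calling `/chat` before `call("/dataset", …)` has succeeded yields the `412 Precondition Failed` described above.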
app.py ADDED
@@ -0,0 +1,638 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Maria Learning Service | app.py
+ # FastAPI + ZeroGPU (Qwen2.5-1.5B-Instruct, NF4 int4) + FAISS RAG + gTTS
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ import asyncio
+ import os
+ import gc
+ import json
+ import base64
+ import hashlib
+ import logging
+ import copy
+ from io import BytesIO
+ from typing import List, Any
+
+ import httpx
+ import numpy as np
+ import pandas as pd
+ import faiss
+ import gradio as gr
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.responses import JSONResponse
+ from pydantic import BaseModel
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from gtts import gTTS
+
+ # ── ZeroGPU: import spaces only when running inside HF Spaces ────────────────
+ try:
+     import spaces as _spaces
+     _ZEROGPU = True
+ except ImportError:
+     class _spaces:  # noqa: N801
+         @staticmethod
+         def GPU(fn):
+             return fn
+
+     _ZEROGPU = False
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s %(levelname)-8s %(message)s",
+ )
+ log = logging.getLogger(__name__)
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Config / Secrets
+ # ─────────────────────────────────────────────────────────────────────────────
+ HASH_VALUE = os.environ.get("HASH_VALUE", "")
+ CF_SECRET_KEY = os.environ.get("CF_SECRET_KEY", "")
+ HF_REPO_ID = "digifreely/Maria"
+ LLM_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # Change 1: updated model
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Preload: cache model weights to local disk + tokenizer at container start.
+ #
+ # Why not load directly to CPU in int4?
+ # bitsandbytes 4-bit quantized models raise ValueError on .to("cuda");
+ # quantized tensors cannot be device-moved after loading. The correct ZeroGPU
+ # pattern is:
+ #   • Startup : snapshot_download → weights land in HF cache on local disk.
+ #   • GPU call: load with BitsAndBytesConfig + device_map="auto" (CUDA).
+ #               This is fast because no network I/O occurs (cache hit).
+ #   • After   : delete model object and free GPU memory.
+ # ─────────────────────────────────────────────────────────────────────────────
+ _llm_tok = None        # tokenizer - lives on CPU for the container lifetime
+ _llm_cache_dir = None  # local path returned by snapshot_download
+
+
+ def _preload_model():
+     """Cache Qwen2.5-1.5B-Instruct weights to disk and load tokenizer on CPU."""
+     global _llm_tok, _llm_cache_dir
+     from transformers import AutoTokenizer
+
+     log.info("Downloading / verifying %s weights to local cache…", LLM_MODEL_ID)
+     _llm_cache_dir = snapshot_download(repo_id=LLM_MODEL_ID)
+     _llm_tok = AutoTokenizer.from_pretrained(_llm_cache_dir, trust_remote_code=True)
+     log.info("Model weights cached at: %s", _llm_cache_dir)
+
+
+ # Trigger preload immediately when the module is imported
+ _preload_model()
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Embedding model (CPU, loaded once per container lifetime)
+ # ─────────────────────────────────────────────────────────────────────────────
+ _emb_model = None
+
+ def _get_emb_model(name: str = "sentence-transformers/all-MiniLM-L6-v2"):
+     global _emb_model
+     if _emb_model is None:
+         from sentence_transformers import SentenceTransformer
+         log.info("Loading embedding model: %s", name)
+         _emb_model = SentenceTransformer(name)
+     return _emb_model
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Security helpers
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _check_auth_code(code: str) -> bool:
+     if not HASH_VALUE:
+         return False
+     return hashlib.sha256(code.encode()).hexdigest() == HASH_VALUE
+
+
+ async def _check_turnstile(token: str) -> bool:
+     if not CF_SECRET_KEY:
+         return False
+     try:
+         async with httpx.AsyncClient(timeout=8.0) as client:
+             resp = await client.post(
+                 "https://challenges.cloudflare.com/turnstile/v0/siteverify",
+                 data={"secret": CF_SECRET_KEY, "response": token},
+             )
+             return resp.json().get("success", False)
+     except Exception as exc:
+         log.error("Turnstile verification error: %s", exc)
+         return False
+
+
+ async def _authenticate(request: Request) -> bool:
+     auth_code = request.headers.get("auth_code")
+     cf_token = request.headers.get("cf-turnstile-token")
+     if auth_code:
+         return _check_auth_code(auth_code)
+     if cf_token:
+         return await _check_turnstile(cf_token)
+     return False
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Change 3: Dataset cache - populated by /dataset, consumed by /chat
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Key: (board, cls, subject) → (config, faiss_index, metadata)
+ _dataset_cache: dict = {}
+
+
+ def _dataset_key(board: str, cls: str, subject: str) -> tuple:
+     return (board.strip(), cls.strip(), subject.strip())
+
+
+ def _load_dataset(board: str, cls: str, subject: str):
+     """Download config / FAISS index / metadata from HF Hub and return them."""
+     prefix = f"knowledgebase/{board}/{cls}/{subject}"
+     log.info("Fetching dataset: %s", prefix)
+
+     config_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/config.json",
+         repo_type="dataset",
+     )
+     faiss_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/faiss_index.bin",
+         repo_type="dataset",
+     )
+     meta_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/metadata.parquet",
+         repo_type="dataset",
+     )
+
+     with open(config_path) as fh:
+         config = json.load(fh)
+
+     index = faiss.read_index(faiss_path)
+     metadata = pd.read_parquet(meta_path)
+     return config, index, metadata
+
+
+ def _rag_search(
+     query: str,
+     config: dict,
+     index,
+     metadata: pd.DataFrame,
+     k: int = 3,
+ ) -> List[str]:
+     """Embed query, search FAISS, return top-k text chunks."""
+     emb_model_name = config.get(
+         "embedding_model", "sentence-transformers/all-MiniLM-L6-v2"
+     )
+     emb = _get_emb_model(emb_model_name)
+     vec = emb.encode([query], normalize_embeddings=True).astype(np.float32)
+     _, idxs = index.search(vec, k)
+
+     text_cols = ["text", "content", "chunk", "passage", "answer", "description"]
+     chunks: List[str] = []
+     for i in idxs[0]:
+         if 0 <= i < len(metadata):
+             row = metadata.iloc[i]
+             for col in text_cols:
+                 if col in metadata.columns and pd.notna(row[col]):
+                     chunks.append(str(row[col])[:600])
+                     break
+     return chunks
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # LLM inference - NF4 int4 model loaded from disk cache into GPU per call.
+ # The @spaces.GPU decorator acquires the ZeroGPU slot for the duration.
+ # Tokenizer is reused from _llm_tok (already on CPU). Model is loaded fresh
+ # from the local disk cache (_llm_cache_dir) - no network I/O after startup.
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _model_generate(system_prompt: str, user_prompt: str) -> str:
+     import torch
+     from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+     quant_cfg = BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_compute_dtype=torch.float16,
+         bnb_4bit_use_double_quant=True,
+         bnb_4bit_quant_type="nf4",
+     )
+
+     log.info("Loading %s (NF4 int4) from disk cache to GPU…", LLM_MODEL_ID)
+     model = AutoModelForCausalLM.from_pretrained(
+         _llm_cache_dir,                 # local disk - no download
+         quantization_config=quant_cfg,
+         device_map="auto",              # maps directly to CUDA
+         trust_remote_code=True,
+     )
+     model.eval()
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": user_prompt},
+     ]
+
+     text = _llm_tok.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+     inputs = _llm_tok([text], return_tensors="pt").to(model.device)
+
+     with torch.no_grad():
+         out_ids = model.generate(
+             **inputs,
+             max_new_tokens=300,
+             temperature=0.7,
+             top_p=0.9,
+             do_sample=True,
+             repetition_penalty=1.1,
+             pad_token_id=_llm_tok.eos_token_id,
+         )
+
+     new_tokens = out_ids[0][inputs.input_ids.shape[1]:]
+     result = _llm_tok.decode(new_tokens, skip_special_tokens=True).strip()
+
+     # Release GPU memory before the ZeroGPU slot is returned
+     del model
+     gc.collect()
+     torch.cuda.empty_cache()
+
+     log.info("Inference complete. Output length: %d chars", len(result))
+     return result
+
+
+ # Apply ZeroGPU decorator
+ run_inference = _spaces.GPU(_model_generate)
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Text-to-Speech
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _tts_to_b64(text: str) -> str:
+     try:
+         tts = gTTS(text=text[:3000], lang="en", tld="co.uk", slow=False)
+         buf = BytesIO()
+         tts.write_to_fp(buf)
+         buf.seek(0)
+         return base64.b64encode(buf.read()).decode("utf-8")
+     except Exception as exc:
+         log.error("TTS error: %s", exc)
+         return ""
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Change 1: Optimized prompt builder - concise to fit within 300-token output
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _build_system_prompt(lp: dict, rag_chunks: List[str]) -> str:
+     persona = lp.get("teacher_persona", "A friendly and patient teacher")
+     student = lp.get("student_name", "Student")
+     chat_history = lp.get("chat_history", [])[-3:]  # reduced: last 3 turns
+     scratchpad = lp.get("scratchpad", [])[-2:]      # reduced: last 2 entries
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+
+     history_block = "\n".join(
+         f'S: {h.get("user_input","")}\nT: {h.get("system_output","")}'
+         for h in chat_history
+     ) or "None."
+
+     scratch_block = "\n".join(
+         f'[{s.get("chat_id","")}] {s.get("thought","")} | {s.get("action","")}'
+         for s in scratchpad
+     ) or "Empty."
+
+     rag_block = "\n---\n".join(rag_chunks) if rag_chunks else "No relevant content found."
+     cl_block = json.dumps(current_learning, indent=2) if current_learning else "[]"
+
+     return f"""You are {persona} teaching {student}, aged 6–12. Use simple English. Be warm and brief.
+
+ STUDENT: {student}
+ LEARNING OBJECTIVES:
+ {cl_block}
+
+ KNOWLEDGE BASE:
+ {rag_block}
+
+ RECENT CHAT:
+ {history_block}
+
+ NOTES:
+ {scratch_block}
+
+ TASK: Classify intent, respond to the student, return ONLY valid JSON. Keep "response" under 80 words.
+
+ INTENT RULES:
+ "block" - rude/inappropriate message. First time: redirect kindly. Repeat: end gently.
+ "questions" - off-topic question. Answer briefly from KB if found, then redirect.
+ "curriculum" - on-topic. Follow stages in order: teach → re_teach → show_and_tell → assess.
+   teach: explain using KB. re_teach: ask one check question; re-explain if wrong.
+   show_and_tell: ask a similar question. assess: pass=complete, fail=Not_Complete (retry).
+ "chitchat" - casual talk. Respond warmly, then bring up the learning topic.
+
+ OUTPUT - return ONLY this JSON:
+ {{
+   "intent": "<block|questions|curriculum|chitchat>",
+   "response": "<reply to student, max 80 words>",
+   "stage_updates": [{{"topic":"<topic>","goal":"<goal>","teach":"<complete|Not_Complete>","re_teach":"<complete|Not_Complete>","show_and_tell":"<complete|Not_Complete>","assess":"<complete|Not_Complete>"}}],
+   "thought": "<brief internal reasoning>",
+   "action": "<teach|re_teach|show_and_tell|assess|answer|redirect|discourage|end|chitchat>",
+   "observation": "<brief student observation>"
+ }}"""
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # JSON parser (robust - handles markdown fences, partial JSON, etc.)
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _parse_llm_output(raw: str) -> dict:
+     text = raw.strip()
+
+     if "```" in text:
+         for part in text.split("```"):
+             part = part.strip()
+             if part.startswith("json"):
+                 part = part[4:].strip()
+             try:
+                 return json.loads(part)
+             except json.JSONDecodeError:
+                 continue
+
+     try:
+         return json.loads(text)
+     except json.JSONDecodeError:
+         pass
+
+     start = text.find("{")
+     end = text.rfind("}") + 1
+     if start != -1 and end > start:
+         try:
+             return json.loads(text[start:end])
+         except json.JSONDecodeError:
+             pass
+
+     log.warning("Could not parse JSON from model output. Using raw text as response.")
+     return {
+         "intent": "questions",
+         "response": raw,
+         "stage_updates": [],
+         "thought": "",
+         "action": "answer",
+         "observation": "json_parse_failed",
+     }
+
+
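As a quick sanity check of the fallback chain described above, this standalone sketch (a trimmed re-implementation for illustration, not an import of the app module) recovers JSON from fenced model output:

```python
import json

def extract_json(raw: str) -> dict:
    """Trimmed fallback chain: markdown fences first, then a brace scan."""
    text = raw.strip()
    if "```" in text:
        # Try each fence-delimited segment; strip an optional "json" language tag.
        for part in text.split("```"):
            part = part.strip()
            if part.startswith("json"):
                part = part[4:].strip()
            try:
                return json.loads(part)
            except json.JSONDecodeError:
                continue
    # Last resort: widest {...} span in the raw text.
    start, end = text.find("{"), text.rfind("}") + 1
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end])
        except json.JSONDecodeError:
            pass
    return {"response": raw}

fenced = 'Here you go:\n```json\n{"intent": "chitchat", "response": "Hi!"}\n```'
print(extract_json(fenced)["intent"])  # chitchat
```

Small instruction-tuned models routinely wrap JSON in prose or fences, so a layered parser like this fails soft instead of dropping the reply.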
+ # ─────────────────────────────────────────────────────────────────────────────
+ # State updater
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _apply_state_updates(
+     lp: dict,
+     parsed: dict,
+     user_msg: str,
+     ai_msg: str,
+ ) -> dict:
+     lp = copy.deepcopy(lp)
+
+     history = lp.setdefault("chat_history", [])
+     new_id = (history[-1]["chat_id"] + 1) if history else 1
+     history.append({
+         "chat_id": new_id,
+         "user_input": user_msg,
+         "system_output": ai_msg,
+     })
+
+     scratch = lp.setdefault("scratchpad", [])
+     scratch.append({
+         "chat_id": new_id,
+         "thought": parsed.get("thought", ""),
+         "action": parsed.get("action", ""),
+         "action_input": user_msg,
+         "observation": parsed.get("observation", ""),
+     })
+
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+     valid_statuses = {"complete", "Not_Complete"}
+
+     for upd in parsed.get("stage_updates", []):
+         for item in current_learning:
+             if item.get("topic") == upd.get("topic"):
+                 for obj in item.get("learning_objectives", []):
+                     if obj.get("goal") == upd.get("goal"):
+                         for stage in ("teach", "re_teach", "show_and_tell", "assess"):
+                             val = upd.get(stage)
+                             if val in valid_statuses:
+                                 obj[stage] = val
+
+     lp.setdefault("assessment_stages", {})["current_learning"] = current_learning
+     return lp
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # FastAPI application
+ # ─────────────────────────────────────────────────────────────────────────────
+ _fastapi = FastAPI(
+     title="Maria Learning Service",
+     description="AI tutoring API powered by Qwen2.5-1.5B-Instruct with ZeroGPU.",
+     version="1.1.0",
+     docs_url="/docs",
+     redoc_url="/redoc",
+ )
+
+
+ class ChatRequest(BaseModel):
+     learning_path: dict[str, Any]
+     query: dict[str, Any]
+
+
+ class DatasetRequest(BaseModel):
+     board: str
+     subject: str
+     # "class" is a Python reserved word, so the JSON field is remapped to
+     # class_name by model_validate_with_class below.
+     class_name: str = ""
+
+     @classmethod
+     def model_validate_with_class(cls, data: dict):
+         data = dict(data)
+         if "class" in data:
+             data["class_name"] = data.pop("class")
+         return cls(**data)
+
+
+ @_fastapi.get("/health", tags=["Utility"])
+ async def health():
+     return {"status": "ok", "model": LLM_MODEL_ID, "zerogpu": _ZEROGPU}
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Change 3: /dataset endpoint
+ # ─────────────────────────────────────────────────────────────────────────────
+ @_fastapi.post("/dataset", tags=["Dataset"])
+ async def dataset(request: Request):
+     """
+     Pre-load the FAISS index, config, and metadata for a given board/class/subject.
+     Must be called before /chat. Subsequent calls with the same key are no-ops (cached).
+
+     Request body:
+         { "board": "NCERT", "class": "Class 1", "subject": "English" }
+
+     Response:
+         { "status": "ready", "message": "Dataset Loaded" }
+     """
+     # ── Authentication ──────────────────────────────────────────────────────
+     if not await _authenticate(request):
+         raise HTTPException(status_code=403, detail="Forbidden")
+
+     # ── Parse body manually to handle the "class" reserved keyword ──────────
+     try:
+         body = await request.json()
+     except Exception:
+         raise HTTPException(status_code=422, detail="Invalid JSON body")
+
+     board = str(body.get("board", "")).strip()
+     cls = str(body.get("class", "")).strip()
+     subject = str(body.get("subject", "")).strip()
+
+     if not all([board, cls, subject]):
+         raise HTTPException(
+             status_code=422,
+             detail="Request body must contain board, class, and subject",
+         )
+
+     key = _dataset_key(board, cls, subject)
+
+     # ── Return immediately if already cached ────────────────────────────────
+     if key in _dataset_cache:
+         log.info("Dataset cache hit: %s", key)
+         return JSONResponse({"status": "ready", "message": "Dataset Loaded"})
+
+     # ── Load and cache - run blocking HF I/O in a thread pool so the event
+     # loop is not frozen, but still await completion before responding. ──────
+     try:
+         config, faiss_index, metadata = await asyncio.to_thread(
+             _load_dataset, board, cls, subject
+         )
+         _dataset_cache[key] = (config, faiss_index, metadata)
+         log.info("Dataset cached for key: %s", key)
+     except Exception as exc:
+         log.error("Dataset load error: %s", exc)
+         raise HTTPException(
+             status_code=500,
+             detail=f"Could not load dataset for {board}/{cls}/{subject}: {exc}",
+         )
+
+     return JSONResponse({"status": "ready", "message": "Dataset Loaded"})
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # /chat endpoint - Change 4: uses dataset preloaded via /dataset
+ # ─────────────────────────────────────────────────────────────────────────────
+ @_fastapi.post("/chat", tags=["Tutor"])
+ async def chat(request: Request, body: ChatRequest):
+     # ── 1. Authentication ───────────────────────────────────────────────────
+     if not await _authenticate(request):
+         raise HTTPException(status_code=403, detail="Forbidden")
+
+     # ── 2. Validate request body ────────────────────────────────────────────
+     lp = body.learning_path
+     msg = body.query.get("request_message", "").strip()
+     if not msg:
+         raise HTTPException(status_code=422, detail="request_message must not be empty")
+
+     board = lp.get("board", "").strip()
+     cls = lp.get("class", "").strip()
+     subject = lp.get("subject", "").strip()
+
+     if not all([board, cls, subject]):
+         raise HTTPException(
+             status_code=422,
+             detail="learning_path must contain board, class, and subject",
+         )
+
+     # ── 3. Change 4: Retrieve dataset from cache (must call /dataset first) ─
+     key = _dataset_key(board, cls, subject)
+     if key not in _dataset_cache:
+         raise HTTPException(
+             status_code=412,
+             detail=(
+                 f"Dataset for {board}/{cls}/{subject} is not loaded. "
+                 "Please call POST /dataset first."
+             ),
+         )
+     config, faiss_index, metadata = _dataset_cache[key]
+
+     # ── 4. RAG retrieval ────────────────────────────────────────────────────
+     try:
+         rag_chunks = _rag_search(msg, config, faiss_index, metadata)
+     except Exception as exc:
+         log.warning("RAG search failed (%s) - continuing without context", exc)
+         rag_chunks = []
+
+     # ── 5. Build prompt and run LLM (Change 2: only CPU→GPU move happens here)
+     system_prompt = _build_system_prompt(lp, rag_chunks)
+     user_prompt = f"Student: {msg}"
+
+     try:
+         raw_output = run_inference(system_prompt, user_prompt)
+     except Exception as exc:
+         log.error("Inference error: %s", exc)
+         raise HTTPException(status_code=500, detail=f"Inference failed: {exc}")
+
+     # ── 6. Parse structured output ──────────────────────────────────────────
+     parsed = _parse_llm_output(raw_output)
+     ai_text = parsed.get("response", raw_output).strip()
+
+     # ── 7. Text-to-speech ───────────────────────────────────────────────────
+     audio_b64 = _tts_to_b64(ai_text)
+
+     # ── 8. Update learning path state ───────────────────────────────────────
+     updated_lp = _apply_state_updates(lp, parsed, msg, ai_text)
+
+     # ── 9. Return response ──────────────────────────────────────────────────
+     return JSONResponse({
+         "learning_path": updated_lp,
+         "query": {
+             "response_message": {
+                 "text": ai_text,
+                 "visual": "No",
+                 "visual_content": "",
+                 "audio_output": audio_b64,
+             }
+         },
+     })
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Gradio shim
+ # ─────────────────────────────────────────────────────────────────────────────
+ with gr.Blocks(title="Maria Learning Service") as _gradio_ui:
+     gr.Markdown(
+         """
+ ## Maria Learning Service
+ This Space exposes a **REST API** - it is not a chat UI.
+
+ | Endpoint | Method | Description |
+ |-----------|--------|------------------------------------|
+ | `/dataset`| POST | Pre-load dataset (call before chat)|
+ | `/chat` | POST | Main tutoring endpoint |
+ | `/health` | GET | Health check |
+ | `/docs` | GET | Swagger UI |
+
+ Authenticate via the `auth_code` header or the `cf-turnstile-token` header.
+ """
+     )
+
+ # Mount Gradio UI at /ui - keeps FastAPI routes at root level
+ app = gr.mount_gradio_app(_fastapi, _gradio_ui, path="/ui")
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Entry point
+ # ─────────────────────────────────────────────────────────────────────────────
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(
+         "app:app",
+         host="0.0.0.0",
+         port=7860,
+         log_level="info",
+         workers=1,  # Single worker - ZeroGPU requires this
+     )
requirements.txt ADDED
@@ -0,0 +1,33 @@
+ # ── Web framework ─────────────────────────────────────────────────────────────
+ fastapi==0.115.6
+ uvicorn[standard]==0.32.1
+ pydantic==2.10.3
+ python-multipart==0.0.20
+ httpx==0.28.1
+
+ # ── HuggingFace ecosystem ─────────────────────────────────────────────────────
+ huggingface-hub>=0.27.0
+ transformers>=4.50.0
+ tokenizers>=0.21.0
+ safetensors>=0.5.0
+ accelerate>=1.3.0
+
+ # ── Quantisation (NF4 int4 via bitsandbytes) ──────────────────────────────────
+ bitsandbytes>=0.45.0
+
+ # ── Embeddings ────────────────────────────────────────────────────────────────
+ sentence-transformers>=3.3.0
+
+ # ── Vector search ─────────────────────────────────────────────────────────────
+ faiss-cpu>=1.9.0
+
+ # ── Data ──────────────────────────────────────────────────────────────────────
+ pandas>=2.2.0
+ pyarrow>=14.0.0
+ numpy>=1.26.0,<2.0.0
+
+ # ── Audio ─────────────────────────────────────────────────────────────────────
+ gTTS>=2.5.0
+
+ # ── ZeroGPU (pre-installed in HF Spaces; listed for local dev) ────────────────
+ # spaces  # auto-installed by HF Spaces runner - do NOT pin; omit if local