digifreely committed · verified · Commit 854c0d2 · 1 Parent(s): f42da42

Upload 3 files

Files changed (3)
  1. README.md +60 -6
  2. app.py +609 -0
  3. requirements.txt +27 -0
README.md CHANGED
@@ -1,13 +1,67 @@
  ---
- title: Chatnew
- emoji: 🌍
- colorFrom: pink
- colorTo: yellow
  sdk: gradio
- sdk_version: 6.11.0
  app_file: app.py
  pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Maria Learning Service
+ emoji: 📚
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 5.9.1
  app_file: app.py
  pinned: false
  license: mit
  ---

+ # Maria Learning Service
+
+ A FastAPI-based AI tutoring service powered by Qwen3-0.6B (float32, CPU-preloaded).
+
+ ## Endpoints
+
+ | Endpoint | Method | Description |
+ |------------|--------|----------------------------------------------------------|
+ | `/dataset` | POST | Pre-load FAISS index + metadata for a board/class/subject |
+ | `/chat` | POST | Main tutoring endpoint (requires `/dataset` called first) |
+ | `/health` | GET | Health check |
+
+ ## Workflow
+
+ **Always call `/dataset` before `/chat`.**
+ `/dataset` loads and caches the knowledge base for the requested board/class/subject.
+ `/chat` performs RAG against the cached dataset and runs inference.
+
+ ## Authentication
+
+ Pass **one** of these headers per request:
+
+ | Header | Description |
+ |--------|-------------|
+ | `auth_code` | Raw value whose SHA-256 must match the `HASH_VALUE` secret |
+ | `cf-turnstile-token` | Cloudflare Turnstile token verified against the `CF_SECRET_KEY` secret |
+
+ ## Secrets Required
+
+ Set these in your Space → Settings → Secrets:
+
+ - `HASH_VALUE` — SHA-256 hex digest of your auth code
+ - `CF_SECRET_KEY` — Cloudflare Turnstile secret key
+
+ ## `/dataset` Reference
+
+ **Request**
+ ```json
+ {
+   "board": "NCERT",
+   "class": "Class 1",
+   "subject": "English"
+ }
+ ```
+
+ **Response**
+ ```json
+ {
+   "status": "ready",
+   "message": "Dataset Loaded"
+ }
+ ```
+
+ Repeated calls with the same `board/class/subject` are no-ops (served from cache).
+ `/chat` returns `412 Precondition Failed` if `/dataset` has not been called first.
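The `/dataset` → `/chat` workflow and the `auth_code` header can be sketched from the client side. This is a minimal sketch: the auth code and request bodies below are hypothetical examples, and the requests are only constructed here, not sent.

```python
import hashlib

# Hypothetical auth code; the HASH_VALUE Space secret is its SHA-256 hex digest.
AUTH_CODE = "my-auth-code"
HASH_VALUE = hashlib.sha256(AUTH_CODE.encode()).hexdigest()

headers = {"auth_code": AUTH_CODE}

# 1. Pre-load the knowledge base (must happen before /chat).
dataset_body = {"board": "NCERT", "class": "Class 1", "subject": "English"}

# 2. Then chat against the cached dataset.
chat_body = {
    "learning_path": {"board": "NCERT", "class": "Class 1", "subject": "English"},
    "query": {"request_message": "What is a noun?"},
}

# Mirrors the server-side comparison in _check_auth_code().
assert hashlib.sha256(headers["auth_code"].encode()).hexdigest() == HASH_VALUE
```

The server never sees the secret itself, only the digest, so rotating the auth code only requires updating `HASH_VALUE`.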
app.py ADDED
@@ -0,0 +1,609 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Maria Learning Service | app.py
+ # FastAPI + CPU (Qwen3-0.6B, float32, preloaded) + FAISS RAG + gTTS
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ import asyncio
+ import os
+ import gc
+ import json
+ import base64
+ import hashlib
+ import logging
+ import copy
+ from io import BytesIO
+ from typing import List, Any, Optional
+
+ import httpx
+ import numpy as np
+ import pandas as pd
+ import faiss
+ import gradio as gr
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.responses import JSONResponse
+ from pydantic import BaseModel
+ from huggingface_hub import hf_hub_download
+ from gtts import gTTS
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s %(levelname)-8s %(message)s",
+ )
+ log = logging.getLogger(__name__)
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Config / Secrets
+ # ─────────────────────────────────────────────────────────────────────────────
+ HASH_VALUE = os.environ.get("HASH_VALUE", "")
+ CF_SECRET_KEY = os.environ.get("CF_SECRET_KEY", "")
+ HF_REPO_ID = "digifreely/Maria"
+ LLM_MODEL_ID = "Qwen/Qwen3-0.6B"  # Qwen3 0.6B — CPU, float32
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Preload: model + tokenizer loaded once into CPU RAM at container start.
+ #
+ # bitsandbytes 4-bit is CUDA-only; for CPU we load the 0.6B model in float32,
+ # which is lightweight enough to reside in memory for the container lifetime.
+ # No per-request load/unload cycle — the model object is reused directly.
+ # ─────────────────────────────────────────────────────────────────────────────
+ _llm_tok = None    # tokenizer
+ _llm_model = None  # model — lives in CPU RAM for the container lifetime
+
+ def _preload_model():
+     """Load Qwen3-0.6B tokenizer + model into CPU RAM at container start."""
+     global _llm_tok, _llm_model
+     import torch
+     from transformers import AutoTokenizer, AutoModelForCausalLM
+
+     log.info("Loading %s to CPU RAM…", LLM_MODEL_ID)
+     _llm_tok = AutoTokenizer.from_pretrained(LLM_MODEL_ID, trust_remote_code=True)
+     _llm_model = AutoModelForCausalLM.from_pretrained(
+         LLM_MODEL_ID,
+         torch_dtype=torch.float32,  # float32 for CPU compatibility
+         device_map="cpu",
+         trust_remote_code=True,
+     )
+     _llm_model.eval()
+     log.info("Model loaded on CPU — ready for inference.")
+
+ # Trigger preload immediately when the module is imported
+ _preload_model()
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Embedding model (CPU, loaded once per container lifetime)
+ # ─────────────────────────────────────────────────────────────────────────────
+ _emb_model = None
+
+ def _get_emb_model(name: str = "sentence-transformers/all-MiniLM-L6-v2"):
+     global _emb_model
+     if _emb_model is None:
+         from sentence_transformers import SentenceTransformer
+         log.info("Loading embedding model: %s", name)
+         _emb_model = SentenceTransformer(name)
+     return _emb_model
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Security helpers
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _check_auth_code(code: str) -> bool:
+     if not HASH_VALUE:
+         return False
+     return hashlib.sha256(code.encode()).hexdigest() == HASH_VALUE
+
+
+ async def _check_turnstile(token: str) -> bool:
+     if not CF_SECRET_KEY:
+         return False
+     try:
+         async with httpx.AsyncClient(timeout=8.0) as client:
+             resp = await client.post(
+                 "https://challenges.cloudflare.com/turnstile/v0/siteverify",
+                 data={"secret": CF_SECRET_KEY, "response": token},
+             )
+             return resp.json().get("success", False)
+     except Exception as exc:
+         log.error("Turnstile verification error: %s", exc)
+         return False
+
+
+ async def _authenticate(request: Request) -> bool:
+     auth_code = request.headers.get("auth_code")
+     cf_token = request.headers.get("cf-turnstile-token")
+     if auth_code:
+         return _check_auth_code(auth_code)
+     if cf_token:
+         return await _check_turnstile(cf_token)
+     return False
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Change 3: Dataset cache — populated by /dataset, consumed by /chat
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Key: (board, cls, subject) → (config, faiss_index, metadata)
+ _dataset_cache: dict = {}
+
+
+ def _dataset_key(board: str, cls: str, subject: str) -> tuple:
+     return (board.strip(), cls.strip(), subject.strip())
+
+
+ def _load_dataset(board: str, cls: str, subject: str):
+     """Download config / FAISS index / metadata from HF Hub and return them."""
+     prefix = f"knowledgebase/{board}/{cls}/{subject}"
+     log.info("Fetching dataset: %s", prefix)
+
+     config_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/config.json",
+         repo_type="dataset",
+     )
+     faiss_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/faiss_index.bin",
+         repo_type="dataset",
+     )
+     meta_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/metadata.parquet",
+         repo_type="dataset",
+     )
+
+     with open(config_path) as fh:
+         config = json.load(fh)
+
+     index = faiss.read_index(faiss_path)
+     metadata = pd.read_parquet(meta_path)
+     return config, index, metadata
+
+
+ def _rag_search(
+     query: str,
+     config: dict,
+     index,
+     metadata: pd.DataFrame,
+     k: int = 3,
+ ) -> List[str]:
+     """Embed query, search FAISS, return top-k text chunks."""
+     emb_model_name = config.get(
+         "embedding_model", "sentence-transformers/all-MiniLM-L6-v2"
+     )
+     emb = _get_emb_model(emb_model_name)
+     vec = emb.encode([query], normalize_embeddings=True).astype(np.float32)
+     _, idxs = index.search(vec, k)
+
+     text_cols = ["text", "content", "chunk", "passage", "answer", "description"]
+     chunks: List[str] = []
+     for i in idxs[0]:
+         if 0 <= i < len(metadata):
+             row = metadata.iloc[i]
+             for col in text_cols:
+                 if col in metadata.columns and pd.notna(row[col]):
+                     chunks.append(str(row[col])[:600])
+                     break
+     return chunks
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # LLM inference — uses the preloaded CPU model; no per-call load/unload.
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _model_generate(system_prompt: str, user_prompt: str) -> str:
+     import torch
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": user_prompt},
+     ]
+
+     text = _llm_tok.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+     inputs = _llm_tok([text], return_tensors="pt").to(_llm_model.device)
+
+     with torch.no_grad():
+         out_ids = _llm_model.generate(
+             **inputs,
+             max_new_tokens=150,
+             temperature=0.7,
+             top_p=0.9,
+             do_sample=True,
+             repetition_penalty=1.1,
+             pad_token_id=_llm_tok.eos_token_id,
+         )
+
+     new_tokens = out_ids[0][inputs.input_ids.shape[1]:]
+     result = _llm_tok.decode(new_tokens, skip_special_tokens=True).strip()
+
+     log.info("Inference complete. Output length: %d chars", len(result))
+     return result
+
+
+ # CPU inference — direct reference (no ZeroGPU decorator)
+ run_inference = _model_generate
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Text-to-Speech
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _tts_to_b64(text: str) -> str:
+     try:
+         tts = gTTS(text=text[:3000], lang="en", tld="co.uk", slow=False)
+         buf = BytesIO()
+         tts.write_to_fp(buf)
+         buf.seek(0)
+         return base64.b64encode(buf.read()).decode("utf-8")
+     except Exception as exc:
+         log.error("TTS error: %s", exc)
+         return ""
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Prompt builder — trimmed for 150-token output budget (Qwen3-0.6B, CPU)
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _build_system_prompt(lp: dict, rag_chunks: List[str]) -> str:
+     persona = lp.get("teacher_persona", "A friendly and patient teacher")
+     student = lp.get("student_name", "Student")
+     chat_history = lp.get("chat_history", [])[-2:]  # last 2 turns only
+     scratchpad = lp.get("scratchpad", [])[-1:]      # last 1 entry only
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+
+     history_block = "\n".join(
+         f'S: {h.get("user_input","")}\nT: {h.get("system_output","")}'
+         for h in chat_history
+     ) or "None."
+
+     scratch_block = "\n".join(
+         f'[{s.get("chat_id","")}] {s.get("thought","")} | {s.get("action","")}'
+         for s in scratchpad
+     ) or "Empty."
+
+     rag_block = "\n---\n".join(rag_chunks) if rag_chunks else "No relevant content found."
+     cl_block = json.dumps(current_learning, indent=2) if current_learning else "[]"
+
+     return f"""You are {persona} teaching {student}, aged 6–12. Use simple English. Be warm and brief.
+
+ STUDENT: {student}
+ LEARNING OBJECTIVES:
+ {cl_block}
+
+ KNOWLEDGE BASE:
+ {rag_block}
+
+ RECENT CHAT:
+ {history_block}
+
+ NOTES:
+ {scratch_block}
+
+ TASK: Classify intent, respond to the student, return ONLY valid JSON. Keep "response" under 60 words.
+
+ INTENT RULES:
+ "block" — rude/inappropriate message. First time: redirect kindly. Repeat: end gently.
+ "questions" — off-topic question. Answer briefly from KB if found, then redirect.
+ "curriculum" — on-topic. Follow stages in order: teach → re_teach → show_and_tell → assess.
+   teach: explain using KB. re_teach: ask one check question; re-explain if wrong.
+   show_and_tell: ask a similar question. assess: pass=complete, fail=Not_Complete (retry).
+ "chitchat" — casual talk. Respond warmly, then bring up learning topic.
+
+ OUTPUT — return ONLY this JSON:
+ {{
+   "intent": "<block|questions|curriculum|chitchat>",
+   "response": "<reply to student, max 60 words>",
+   "stage_updates": [{{"topic":"<topic>","goal":"<goal>","teach":"<complete|Not_Complete>","re_teach":"<complete|Not_Complete>","show_and_tell":"<complete|Not_Complete>","assess":"<complete|Not_Complete>"}}],
+   "thought": "<brief internal reasoning>",
+   "action": "<teach|re_teach|show_and_tell|assess|answer|redirect|discourage|end|chitchat>",
+   "observation": "<brief student observation>"
+ }}\
+ """
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # JSON parser (robust — handles markdown fences, partial JSON, etc.)
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _parse_llm_output(raw: str) -> dict:
+     text = raw.strip()
+
+     if "```" in text:
+         for part in text.split("```"):
+             part = part.strip()
+             if part.startswith("json"):
+                 part = part[4:].strip()
+             try:
+                 return json.loads(part)
+             except json.JSONDecodeError:
+                 continue
+
+     try:
+         return json.loads(text)
+     except json.JSONDecodeError:
+         pass
+
+     start = text.find("{")
+     end = text.rfind("}") + 1
+     if start != -1 and end > start:
+         try:
+             return json.loads(text[start:end])
+         except json.JSONDecodeError:
+             pass
+
+     log.warning("Could not parse JSON from model output. Using raw text as response.")
+     return {
+         "intent": "questions",
+         "response": raw,
+         "stage_updates": [],
+         "thought": "",
+         "action": "answer",
+         "observation": "json_parse_failed",
+     }
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # State updater
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _apply_state_updates(
+     lp: dict,
+     parsed: dict,
+     user_msg: str,
+     ai_msg: str,
+ ) -> dict:
+     lp = copy.deepcopy(lp)
+
+     history = lp.setdefault("chat_history", [])
+     new_id = (history[-1]["chat_id"] + 1) if history else 1
+     history.append({
+         "chat_id": new_id,
+         "user_input": user_msg,
+         "system_output": ai_msg,
+     })
+
+     scratch = lp.setdefault("scratchpad", [])
+     scratch.append({
+         "chat_id": new_id,
+         "thought": parsed.get("thought", ""),
+         "action": parsed.get("action", ""),
+         "action_input": user_msg,
+         "observation": parsed.get("observation", ""),
+     })
+
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+     valid_statuses = {"complete", "Not_Complete"}
+
+     for upd in parsed.get("stage_updates", []):
+         for item in current_learning:
+             if item.get("topic") == upd.get("topic"):
+                 for obj in item.get("learning_objectives", []):
+                     if obj.get("goal") == upd.get("goal"):
+                         for stage in ("teach", "re_teach", "show_and_tell", "assess"):
+                             val = upd.get(stage)
+                             if val in valid_statuses:
+                                 obj[stage] = val
+
+     lp.setdefault("assessment_stages", {})["current_learning"] = current_learning
+     return lp
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # FastAPI application
+ # ─────────────────────────────────────────────────────────────────────────────
+ _fastapi = FastAPI(
+     title="Maria Learning Service",
+     description="AI tutoring API powered by Qwen3-0.6B on CPU.",
+     version="1.1.0",
+     docs_url="/docs",
+     redoc_url="/redoc",
+ )
+
+
+ class ChatRequest(BaseModel):
+     learning_path: dict[str, Any]
+     query: dict[str, Any]
+
+
+ class DatasetRequest(BaseModel):
+     board: str
+     subject: str
+     # "class" is a Python reserved word, so the JSON field is remapped to
+     # class_name by the helper below. (The /dataset endpoint itself parses
+     # the raw request body instead of using this model.)
+     class_name: str = ""
+
+     class Config:
+         populate_by_name = True
+
+     @classmethod
+     def model_validate_with_class(cls, data: dict):
+         data = dict(data)
+         if "class" in data:
+             data["class_name"] = data.pop("class")
+         return cls(**data)
+
+
+ @_fastapi.get("/health", tags=["Utility"])
+ async def health():
+     return {"status": "ok", "model": LLM_MODEL_ID}
+
+ @_fastapi.get("/ping", tags=["Utility"])
+ async def ping(request: Request):
+     """Health-check endpoint – wakes the Space if sleeping."""
+     if not await _authenticate(request):
+         raise HTTPException(status_code=403, detail="Forbidden")
+     return JSONResponse(content={"status": "alive"})
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Change 3: /dataset endpoint
+ # ─────────────────────────────────────────────────────────────────────────────
+ @_fastapi.post("/dataset", tags=["Dataset"])
+ async def dataset(request: Request):
+     """
+     Pre-load the FAISS index, config, and metadata for a given board/class/subject.
+     Must be called before /chat. Subsequent calls with the same key are no-ops (cached).
+
+     Request body:
+         { "board": "NCERT", "class": "Class 1", "subject": "English" }
+
+     Response:
+         { "status": "ready", "message": "Dataset Loaded" }
+     """
+     # ── Authentication ──────────────────────────────────────────────────────
+     if not await _authenticate(request):
+         raise HTTPException(status_code=403, detail="Forbidden")
+
+     # ── Parse body manually to handle the "class" reserved keyword ─────────
+     try:
+         body = await request.json()
+     except Exception:
+         raise HTTPException(status_code=422, detail="Invalid JSON body")
+
+     board = str(body.get("board", "")).strip()
+     cls = str(body.get("class", "")).strip()
+     subject = str(body.get("subject", "")).strip()
+
+     if not all([board, cls, subject]):
+         raise HTTPException(
+             status_code=422,
+             detail="Request body must contain board, class, and subject",
+         )
+
+     key = _dataset_key(board, cls, subject)
+
+     # ── Return immediately if already cached ────────────────────────────────
+     if key in _dataset_cache:
+         log.info("Dataset cache hit: %s", key)
+         return JSONResponse({"status": "ready", "message": "Dataset Loaded"})
+
+     # ── Load and cache — run blocking HF I/O in a thread pool so the event
+     # loop is not frozen, but we still await completion before responding. ──
+     try:
+         config, faiss_index, metadata = await asyncio.to_thread(
+             _load_dataset, board, cls, subject
+         )
+         _dataset_cache[key] = (config, faiss_index, metadata)
+         log.info("Dataset cached for key: %s", key)
+     except Exception as exc:
+         log.error("Dataset load error: %s", exc)
+         raise HTTPException(
+             status_code=500,
+             detail=f"Could not load dataset for {board}/{cls}/{subject}: {exc}",
+         )
+
+     return JSONResponse({"status": "ready", "message": "Dataset Loaded"})
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # /chat endpoint — Change 4: uses dataset preloaded via /dataset
+ # ─────────────────────────────────────────────────────────────────────────────
+ @_fastapi.post("/chat", tags=["Tutor"])
+ async def chat(request: Request, body: ChatRequest):
+     # ── 1. Authentication ───────────────────────────────────────────────────
+     if not await _authenticate(request):
+         raise HTTPException(status_code=403, detail="Forbidden")
+
+     # ── 2. Validate request body ────────────────────────────────────────────
+     lp = body.learning_path
+     msg = body.query.get("request_message", "").strip()
+     if not msg:
+         raise HTTPException(status_code=422, detail="request_message must not be empty")
+
+     board = lp.get("board", "").strip()
+     cls = lp.get("class", "").strip()
+     subject = lp.get("subject", "").strip()
+
+     if not all([board, cls, subject]):
+         raise HTTPException(
+             status_code=422,
+             detail="learning_path must contain board, class, and subject",
+         )
+
+     # ── 3. Retrieve dataset from cache (must call /dataset first) ───────────
+     key = _dataset_key(board, cls, subject)
+     if key not in _dataset_cache:
+         raise HTTPException(
+             status_code=412,
+             detail=(
+                 f"Dataset for {board}/{cls}/{subject} is not loaded. "
+                 "Please call POST /dataset first."
+             ),
+         )
+     config, faiss_index, metadata = _dataset_cache[key]
+
+     # ── 4. RAG retrieval ────────────────────────────────────────────────────
+     try:
+         rag_chunks = _rag_search(msg, config, faiss_index, metadata)
+     except Exception as exc:
+         log.warning("RAG search failed (%s) — continuing without context", exc)
+         rag_chunks = []
+
+     # ── 5. Build prompt and run LLM (preloaded CPU model; no device moves) ──
+     system_prompt = _build_system_prompt(lp, rag_chunks)
+     user_prompt = f"Student: {msg}"
+
+     try:
+         raw_output = run_inference(system_prompt, user_prompt)
+     except Exception as exc:
+         log.error("Inference error: %s", exc)
+         raise HTTPException(status_code=500, detail=f"Inference failed: {exc}")
+
+     # ── 6. Parse structured output ──────────────────────────────────────────
+     parsed = _parse_llm_output(raw_output)
+     ai_text = parsed.get("response", raw_output).strip()
+
+     # ── 7. Text-to-speech ───────────────────────────────────────────────────
+     audio_b64 = _tts_to_b64(ai_text)
+
+     # ── 8. Update learning path state ───────────────────────────────────────
+     updated_lp = _apply_state_updates(lp, parsed, msg, ai_text)
+
+     # ── 9. Return response ──────────────────────────────────────────────────
+     return JSONResponse({
+         "learning_path": updated_lp,
+         "query": {
+             "response_message": {
+                 "text": ai_text,
+                 "visual": "No",
+                 "visual_content": "",
+                 "audio_output": audio_b64,
+             }
+         },
+     })
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Gradio shim
+ # ─────────────────────────────────────────────────────────────────────────────
+ with gr.Blocks(title="Maria Learning Service") as _gradio_ui:
+     gr.Markdown(
+         """
+ ## Maria Learning Service
+ This Space exposes a **REST API** — it is not a chat UI.
+
+ | Endpoint | Method | Description |
+ |-----------|--------|------------------------------------|
+ | `/dataset`| POST | Pre-load dataset (call before chat)|
+ | `/chat` | POST | Main tutoring endpoint |
+ | `/health` | GET | Health check |
+ | `/docs` | GET | Swagger UI |
+
+ Authenticate via `auth_code` header or `cf-turnstile-token` header.
+ """
+     )
+
+ # Mount Gradio UI at /ui — keeps FastAPI routes at root level
+ app = gr.mount_gradio_app(_fastapi, _gradio_ui, path="/ui")
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Entry point
+ # ─────────────────────────────────────────────────────────────────────────────
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(
+         "app:app",
+         host="0.0.0.0",
+         port=7860,
+         log_level="info",
+         workers=1,  # Single worker — shared in-memory model object
+     )
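The multi-stage JSON recovery used by `_parse_llm_output` in app.py can be exercised standalone. The sketch below re-implements just the fenced-block and brace-slice fallbacks (a simplified stand-in, not the full function with its logging and complete fallback dict):

```python
import json

def extract_json(raw: str) -> dict:
    """Recover a JSON object from raw LLM text: try fenced ```json blocks,
    then the whole string, then the outermost {...} slice."""
    text = raw.strip()
    if "```" in text:
        for part in text.split("```"):
            part = part.strip()
            if part.startswith("json"):
                part = part[4:].strip()  # drop the "json" language tag
            try:
                return json.loads(part)
            except json.JSONDecodeError:
                continue
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    start, end = text.find("{"), text.rfind("}") + 1
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end])
        except json.JSONDecodeError:
            pass
    # Last resort: wrap the raw text, mirroring the app's fallback intent.
    return {"intent": "questions", "response": raw}

fenced = 'Here you go:\n```json\n{"intent": "chitchat", "response": "Hi!"}\n```'
assert extract_json(fenced)["intent"] == "chitchat"

noisy = 'prefix {"intent": "curriculum", "response": "A noun names a thing."} suffix'
assert extract_json(noisy)["intent"] == "curriculum"
```

Trying progressively looser extraction strategies before giving up keeps the endpoint usable even when a small model wraps its JSON in prose or markdown fences.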
requirements.txt ADDED
@@ -0,0 +1,27 @@
+ # ── Web framework ─────────────────────────────────────────────────────────────
+ fastapi==0.115.6
+ uvicorn[standard]==0.32.1
+ pydantic==2.10.3
+ python-multipart==0.0.20
+ httpx==0.28.1
+
+ # ── HuggingFace ecosystem ─────────────────────────────────────────────────────
+ huggingface-hub>=0.27.0
+ transformers>=4.50.0
+ tokenizers>=0.21.0
+ safetensors>=0.5.0
+ accelerate>=1.3.0
+
+ # ── Embeddings ────────────────────────────────────────────────────────────────
+ sentence-transformers>=3.3.0
+
+ # ── Vector search ─────────────────────────────────────────────────────────────
+ faiss-cpu>=1.9.0
+
+ # ── Data ──────────────────────────────────────────────────────────────────────
+ pandas>=2.2.0
+ pyarrow>=14.0.0
+ numpy>=1.26.0,<2.0.0
+
+ # ── Audio ─────────────────────────────────────────────────────────────────────
+ gTTS>=2.5.0