Space: lablab-ai-amd-developer-hackathon / pathshala-ai


prasai-ap committed 3 days ago

Commit c46c77f (verified) · 1 parent: 7742d2e

Upload 3 files

Files changed (2)

README.md +8 -85
app.py +211 -593

README.md CHANGED

@@ -13,90 +13,13 @@ pinned: false
 Pathshala AI is a bilingual AI tutor demo for rural primary students in Nepal.
-The Gradio Space mirrors the local Streamlit/web app flow. It can upload a text-based
-PDF directly inside Hugging Face Spaces, accept a student question in English, Nepali,
-or romanized Nepali, retrieve relevant textbook portions, then returns:
-- English explanation
-- Nepali explanation
-- 3 simple quiz questions
-- Retrieved textbook sources
-- Basic quiz grading in Space-local mode
-- Parent/teacher summary note in Space-local mode
-## Deploy To Hugging Face Spaces
-1. Create a new Hugging Face Space.
-2. Choose `Gradio` as the SDK.
-3. Upload the files from this `hf_space/` folder into the root of the Space:
-   - `app.py`
-   - `requirements.txt`
-   - `README.md`
-4. Commit the files. Hugging Face will build and run the Space automatically.
-You can also deploy with Git:
-```bash
-git clone https://huggingface.co/spaces/YOUR_USERNAME/pathshala-ai
-cp hf_space/app.py pathshala-ai/app.py
-cp hf_space/requirements.txt pathshala-ai/requirements.txt
-cp hf_space/README.md pathshala-ai/README.md
-cd pathshala-ai
-git add .
-git commit -m "Deploy Pathshala AI Gradio demo"
-git push
-```
-## Recommended Submission Mode
-For the easiest hackathon submission, deploy the Space without `BACKEND_URL`.
-It will run a Space-local workflow:
-1. Upload a text-based PDF.
-2. Extract text with PyMuPDF.
-3. Create embeddings with `sentence-transformers`.
-4. Search the uploaded book in memory.
-5. Show Nepali quiz questions and retrieved textbook portions.
-For the full RAG workflow, first deploy the FastAPI backend somewhere public, then set `BACKEND_URL` in the Space settings.
-## Backend Mode
-Set `BACKEND_URL` to use the FastAPI backend:
-```bash
-BACKEND_URL=https://your-backend.example.com
-```
-In Hugging Face Spaces, add it under:
-```text
-Space settings -> Variables and secrets -> New variable
-```
-The app calls:
-- `POST /upload-textbook` for PDF uploads
-- `POST /ask` for bilingual textbook-grounded answers
-- `POST /grade-quiz` for quiz grading
-- `GET /parent-summary/{student_id}` for the parent/teacher summary
-The `/ask` request sends both the student question and the optional textbook context.
-If a user types context in the Space, the backend can answer from that context even when no PDF has been uploaded.
-If the backend returns `normalized_question`, the Space shows the interpreted question above the English explanation.
-## Mock Mode
-If `BACKEND_URL` is missing or the backend is unavailable, the Space uses local PDF extraction and in-memory retrieval. This supports text-based PDFs. For scanned PDFs or persistent student progress, deploy the backend and set `BACKEND_URL`.
-Example question:
-```text
-soil erosion vaneko ke ho
-```
-You can also try mixed romanized Nepali questions such as:
-```text
-photosynthesis vaneko ke ho vana
-```

 Pathshala AI is a bilingual AI tutor demo for rural primary students in Nepal.
+This Hugging Face Space supports:
+- Uploading a text-based PDF textbook directly in the Space
+- Asking questions in English, Nepali, or romanized Nepali
+- Retrieving relevant textbook portions from the uploaded PDF
+- Showing a simple English answer and Nepali explanation
+- Generating Nepali quiz questions
+- Basic quiz grading
+For scanned PDF OCR and persistent progress, deploy the FastAPI backend separately and add a Space variable named `BACKEND_URL`.
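
The `BACKEND_URL` switch described in the README can be sketched as follows. This is a minimal illustration, not the Space's actual code: `read_backend_url` and `answer_question` are hypothetical names, and the two callables stand in for the real backend request and the Space-local workflow. Only the `BACKEND_URL` variable name and the trailing-slash stripping come from `app.py`.

```python
import os
from typing import Callable, Mapping, Optional


def read_backend_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read BACKEND_URL and drop trailing slashes, as app.py does at import time."""
    source = os.environ if env is None else env
    return source.get("BACKEND_URL", "").rstrip("/")


def answer_question(
    question: str,
    backend_url: str,
    ask_backend: Callable[[str, str], Optional[str]],
    ask_local: Callable[[str], str],
) -> str:
    """Prefer the backend when a URL is configured, otherwise answer locally.

    ask_backend and ask_local are hypothetical stand-ins: the first is assumed
    to return None when the backend is unreachable or returns no usable answer.
    """
    if backend_url:
        result = ask_backend(backend_url, question)
        if result is not None:
            return result
    # No backend configured, or the backend gave nothing back.
    return ask_local(question)
```

With no `BACKEND_URL` set, `read_backend_url` returns an empty string and every question goes through the local path, which matches the Space's default behaviour.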

app.py CHANGED

@@ -1,6 +1,5 @@
 import json
 import os
-from typing import Any
 from functools import lru_cache
 from dotenv import load_dotenv
@@ -13,62 +12,31 @@ load_dotenv()
 APP_NAME = os.getenv("APP_NAME", "Pathshala AI")
 BACKEND_URL = os.getenv("BACKEND_URL", "").rstrip("/")
-UPLOAD_TIMEOUT_SECONDS = 900
-ASK_TIMEOUT_SECONDS = 180
-SHORT_TIMEOUT_SECONDS = 45
-EXAMPLE_QUESTION = "soil erosion vaneko ke ho"
-EXAMPLE_CONTEXT = (
-    "Soil erosion is the removal of topsoil by wind, water, or other natural forces. "
-    "It can make farmland less fertile and can be reduced by planting trees and grass."
-)
-MIN_CHUNK_CHARS = 250
-MAX_CHUNK_CHARS = 900
 EMBEDDING_MODEL = os.getenv(
     "EMBEDDING_MODEL",
     "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
 )
 def upload_textbook(pdf_path):
     if not pdf_path:
         return "Choose a PDF first.", "{}", gr.update()
-    if not BACKEND_URL:
-        return upload_textbook_locally(pdf_path)
-    try:
-        with open(pdf_path, "rb") as pdf_file:
-            response = requests.post(
-                f"{BACKEND_URL}/upload-textbook",
-                files={"file": (os.path.basename(pdf_path), pdf_file, "application/pdf")},
-                timeout=UPLOAD_TIMEOUT_SECONDS,
-            )
-        if response.ok:
-            result = response.json()
-            extraction_method = result.get("extraction_method")
-            method_text = f" Text extraction: {extraction_method}." if extraction_method else ""
-            return (
-                f"Uploaded {result['filename']} with {result['page_count']} pages "
-                f"and {result['chunk_count']} chunks.{method_text}",
-                "{}",
-                gr.update(value=""),
-            )
-        return _response_error(response, "Upload failed."), "{}", gr.update()
-    except requests.Timeout:
-        return "Backend is still processing the PDF. Try a smaller PDF for the demo.", "{}", gr.update()
-    except requests.RequestException as exc:
-        return f"Could not reach backend: {exc}", "{}", gr.update()
-    except OSError as exc:
-        return f"Could not read uploaded PDF: {exc}", "{}", gr.update()
-def upload_textbook_locally(pdf_path):
     try:
         extracted = extract_pdf_text(pdf_path)
         chunks = chunk_text(extracted["text"])
         if not chunks:
             return "No readable text chunks could be created from this PDF.", "{}", gr.update()
@@ -77,38 +45,48 @@ def upload_textbook_locally(pdf_path):
             "filename": os.path.basename(pdf_path),
             "page_count": extracted["page_count"],
             "chunk_count": len(chunks),
-            "extraction_method": extracted["extraction_method"],
             "chunks": chunks,
             "embeddings": embeddings.tolist(),
         }
-        return (
-            (
-                f"Uploaded {state['filename']} inside this Space with "
-                f"{state['page_count']} pages and {state['chunk_count']} chunks. "
-                f"Text extraction: {state['extraction_method']}."
-            ),
-            encode_state(state),
-            gr.update(value=""),
         )
     except Exception as exc:
-        return f"Could not process uploaded PDF in this Space: {exc}", "{}", gr.update()
-def ask_tutor(
-    question,
-    student_id,
-    textbook_context,
-    textbook_state,
-):
-    question = question.strip()
     student_id = (student_id or "hf-space-demo").strip()
-    textbook_context = textbook_context.strip()
     if not question:
         return (
             "Please type a student question.",
             "कृपया विद्यार्थीको प्रश्न लेख्नुहोस्।",
-            "1. Add a question first.\n2. Then try again.\n3. Use a textbook topic.",
             "",
             "Waiting for a question.",
             "{}",
@@ -116,259 +94,177 @@ def ask_tutor(
     if BACKEND_URL:
         backend_result = ask_backend(question, student_id, textbook_context)
-        if backend_result and not is_insufficient_backend_result(backend_result):
             return backend_result
-    return local_response(
-        question=question,
-        student_id=student_id,
-        textbook_context=textbook_context,
-        textbook_state=decode_state(textbook_state),
     )
-def ask_backend(
-    question: str,
-    student_id: str,
-    textbook_context: str,
-) -> tuple[str, str, str, str, str, dict[str, Any]] | None:
-    payload: dict[str, Any] = {
         "question": question,
         "student_id": student_id,
         "language_support": "English and Nepali",
     }
     if textbook_context:
         payload["textbook_context"] = textbook_context
     try:
-        response = requests.post(
-            f"{BACKEND_URL}/ask",
-            json=payload,
-            timeout=ASK_TIMEOUT_SECONDS,
-        )
-        response.raise_for_status()
         data = response.json()
-    except requests.RequestException:
-        return None
-    except ValueError:
         return None
-    return format_backend_response(data, student_id=student_id)
-def format_backend_response(
-    data: dict[str, Any],
-    student_id: str,
-) -> tuple[str, str, str, str, str, dict[str, Any]]:
-    english_answer = str(data.get("answer_english", "No English answer returned."))
-    normalized_question = str(data.get("normalized_question") or "").strip()
-    if normalized_question:
-        english_answer = f"Interpreted question: {normalized_question}\n\n{english_answer}"
     quiz_questions = data.get("quiz_questions", [])
-    state = {
         "quiz_id": data.get("quiz_id"),
         "quiz_questions": quiz_questions,
         "student_id": student_id,
     }
     return (
-        english_answer,
         str(data.get("answer_nepali", "नेपाली उत्तर प्राप्त भएन।")),
         format_quiz(quiz_questions),
         format_sources(data.get("retrieved_sources", [])),
         "Answered with the backend RAG workflow.",
-        encode_state(state),
     )
-def grade_quiz(
-    answer_1,
-    answer_2,
-    answer_3,
-    student_id,
-    quiz_state,
-):
-    quiz_state = decode_state(quiz_state)
-    quiz_id = quiz_state.get("quiz_id")
-    if not BACKEND_URL:
-        return grade_quiz_locally([answer_1, answer_2, answer_3], quiz_state)
-    if not quiz_id:
-        return "Ask the tutor first so a quiz can be created."
-    try:
-        response = requests.post(
-            f"{BACKEND_URL}/grade-quiz",
-            json={
-                "student_id": (student_id or "hf-space-demo").strip(),
-                "quiz_id": quiz_id,
-                "answers": [answer_1, answer_2, answer_3],
-            },
-            timeout=SHORT_TIMEOUT_SECONDS,
-        )
-        if not response.ok:
-            return _response_error(response, "Quiz grading failed.")
-        return format_grade(response.json())
-    except requests.Timeout:
-        return "Quiz grading timed out. Please try again."
-    except requests.RequestException as exc:
-        return f"Could not reach backend: {exc}"
-    except ValueError:
-        return "Quiz grading returned an invalid response."
-def grade_quiz_locally(answers: list[str], quiz_state: dict[str, Any]) -> str:
-    questions = quiz_state.get("quiz_questions", [])
-    expected_answers = quiz_state.get("expected_answers", [])
     if not questions:
         return "Ask the tutor first so a quiz can be created."
     score = 0
     lines = []
     for index, question in enumerate(questions[:3]):
-        student_answer = answers[index].strip() if index < len(answers) else ""
-        expected_answer = str(expected_answers[index] if index < len(expected_answers) else "")
-        is_correct = is_answer_close(student_answer, expected_answer)
-        if is_correct:
-            score += 1
-        status = "Correct" if is_correct else "Needs practice"
-        lines.append(f"{status}: {question}")
-        if not is_correct and expected_answer:
-            lines.append(f"Expected idea: {expected_answer}")
     return f"Score: {score} / {min(len(questions), 3)}\n" + "\n".join(lines)
-def is_answer_close(student_answer: str, expected_answer: str) -> bool:
-    student_tokens = set(normalize_answer(student_answer).split())
-    expected_tokens = set(normalize_answer(expected_answer).split())
-    if not student_tokens or not expected_tokens:
-        return False
-    overlap = len(student_tokens & expected_tokens) / max(len(expected_tokens), 1)
-    return overlap >= 0.35 or normalize_answer(student_answer) in normalize_answer(expected_answer)
-def normalize_answer(answer: str) -> str:
-    return " ".join(
-        word.strip(".,?!:;()[]{}\"'।").lower()
-        for word in answer.split()
-        if word.strip(".,?!:;()[]{}\"'।")
-    )
 def parent_summary(student_id):
     if not BACKEND_URL:
         return (
             "Parent/teacher summary\n\n"
-            "The student has practiced with the uploaded or pasted textbook context in this Space. "
-            "For persistent progress across sessions, deploy the FastAPI backend and set BACKEND_URL."
         )
-    student_id = (student_id or "hf-space-demo").strip()
     try:
         response = requests.get(
-            f"{BACKEND_URL}/parent-summary/{student_id}",
-            timeout=SHORT_TIMEOUT_SECONDS,
         )
         if not response.ok:
-            return _response_error(response, "Summary failed.")
-        summary = response.json()
-    except requests.Timeout:
-        return "Summary request timed out. Please try again."
-    except requests.RequestException as exc:
-        return f"Could not reach backend: {exc}"
-    except ValueError:
-        return "Summary returned an invalid response."
-    strengths = "\n".join(f"- {item}" for item in summary.get("strengths", []))
-    weak_topics = summary.get("weak_topics", [])
-    weak_topic_text = "\n".join(f"- {item}" for item in weak_topics) if weak_topics else "No weak topics recorded yet."
     return (
         f"Strengths\n{strengths}\n\n"
-        f"Weak topics\n{weak_topic_text}\n\n"
-        f"Suggested next practice\n{summary.get('suggested_next_practice', '')}\n\n"
-        f"Encouraging note\n{summary.get('encouraging_note', '')}\n\n"
-        f"Questions asked: {summary.get('questions_asked', 0)}"
     )
-def is_insufficient_backend_result(result: tuple[str, str, str, str, str, dict[str, Any]]) -> bool:
-    combined = " ".join(str(item) for item in result[:5]).lower()
-    markers = [
-        "not have enough textbook context",
-        "not enough textbook context",
-        "insufficient context",
-        "पर्याप्त जानकारी छैन",
-        "पर्याप्त सन्दर्भ छैन",
-    ]
-    return any(marker in combined for marker in markers)
-def extract_pdf_text(pdf_path: str) -> dict[str, Any]:
     import fitz
     page_texts = []
     with fitz.open(pdf_path) as document:
         for page in document:
             text = page.get_text("text").strip()
             if text:
                 page_texts.append(text)
-        page_count = document.page_count
     text = "\n\n".join(page_texts).strip()
     if not text:
         raise ValueError(
-            "No selectable text was found. For scanned PDFs, deploy with a backend "
-            "or paste a short textbook paragraph into the context box."
         )
-    return {
-        "text": text,
-        "page_count": page_count,
-        "extraction_method": "pymupdf-local",
-    }
-def chunk_text(text: str) -> list[str]:
     paragraphs = [part.strip() for part in text.splitlines() if part.strip()]
     chunks = []
     current = ""
     for paragraph in paragraphs:
         if len(current) + len(paragraph) + 2 <= MAX_CHUNK_CHARS:
             current = f"{current}\n{paragraph}".strip()
-            continue
-        if len(current) >= MIN_CHUNK_CHARS:
             chunks.append(current)
             current = paragraph
         else:
             current = f"{current}\n{paragraph}".strip()
     if current:
         chunks.append(current)
     return chunks or ([text.strip()] if text.strip() else [])
@@ -379,7 +275,7 @@ def get_embedding_model():
     return SentenceTransformer(EMBEDDING_MODEL)
-def embed_texts(texts: list[str]) -> np.ndarray:
     model = get_embedding_model()
     return np.asarray(
         model.encode(
@@ -391,27 +287,21 @@ def embed_texts(texts: list[str]) -> np.ndarray:
     )
-def retrieve_local_sources(
-    question: str,
-    textbook_state: dict[str, Any],
-    limit: int = 5,
-) -> list[dict[str, Any]]:
-    chunks = [str(chunk) for chunk in textbook_state.get("chunks", [])]
-    embeddings = np.asarray(textbook_state.get("embeddings", []), dtype=float)
     if not chunks or embeddings.size == 0:
         return []
     query_embedding = embed_texts([question])[0]
     scores = embeddings @ query_embedding
     top_indices = np.argsort(scores)[::-1][:limit]
     return [
         {
             "score": float(scores[index]),
             "text": chunks[index],
             "metadata": {
-                "filename": textbook_state.get("filename", "uploaded-textbook"),
                 "chunk_index": int(index),
             },
         }
@@ -419,179 +309,52 @@ def retrieve_local_sources(
     ]
-def mock_response(question: str, textbook_context: str) -> tuple[str, str, str, str, str, dict[str, Any]]:
-    context = textbook_context or EXAMPLE_CONTEXT
-    normalized_question = normalize_question_mock(question)
-    concept_answer = mock_english_explanation(normalized_question, context)
-    english = f"Interpreted question: {normalized_question}\n\n{concept_answer}"
-    nepali = mock_nepali_explanation(normalized_question, context)
-    quiz_questions = mock_quiz_questions(normalized_question)
-    return (
-        english,
-        nepali,
-        format_quiz(quiz_questions),
-        format_sources(
-            [
-                {
-                    "score": 1.0,
-                    "text": context,
-                    "metadata": {"filename": "demo-context", "chunk_index": 0},
-                }
-            ]
-        ),
-        "Demo fallback is active. Configure BACKEND_URL in Space settings for PDF upload, RAG search, quiz grading, and parent summary.",
-        encode_state({"quiz_questions": quiz_questions}),
-    )
-def local_response(
-    question: str,
-    student_id: str,
-    textbook_context: str,
-    textbook_state: dict[str, Any],
-) -> tuple[str, str, str, str, str, dict[str, Any]]:
-    normalized_question = normalize_question_mock(question)
-    sources = []
-    if textbook_context.strip():
-        sources = [
-            {
-                "score": 1.0,
-                "text": chunk,
-                "metadata": {"filename": "pasted-context", "chunk_index": index},
-            }
-            for index, chunk in enumerate(chunk_text(textbook_context)[:5])
-        ]
-    elif textbook_state.get("chunks") and textbook_state.get("embeddings"):
-        sources = retrieve_local_sources(normalized_question, textbook_state, limit=5)
-    context = "\n\n".join(str(source.get("text", "")) for source in sources).strip()
-    if not context:
-        return mock_response(question=question, textbook_context=textbook_context)
-    english = (
-        f"Interpreted question: {normalized_question}\n\n"
-        f"Answer from the uploaded textbook context:\n{truncate(context, max_length=700)}"
-    )
-    nepali = local_nepali_answer(normalized_question, context)
-    quiz_questions = local_nepali_quiz_questions(context)
-    quiz_state = {
-        "student_id": student_id,
-        "quiz_questions": quiz_questions,
-        "expected_answers": [source_answer(sources)] * 3,
-    }
-    return (
-        english,
-        nepali,
-        format_quiz(quiz_questions),
-        format_sources(sources),
-        "Answered with the Hugging Face Space local PDF workflow.",
-        encode_state(quiz_state),
-    )
-def mock_english_explanation(normalized_question: str, context: str) -> str:
-    text = f"{normalized_question} {context}".lower()
-    if "reflection" in text or "mirror" in text:
-        return (
-            "Reflection of light means light bounces back after hitting a surface. "
-            "A mirror reflects light in an orderly way, so we can see a clear image "
-            "of an object in it. Smooth, flat surfaces make clearer reflections, "
-            "while rough surfaces scatter light and do not show a clear image."
-        )
-    if "soil erosion" in text:
-        return (
-            "Soil erosion means the top fertile layer of soil is carried away by "
-            "water, wind, or other causes. It makes land less useful for growing "
-            "plants, so planting trees and grass helps protect the soil."
-        )
-    if "photosynthesis" in text:
-        return (
-            "Photosynthesis is the process by which green plants make their own food "
-            "using sunlight, water, and carbon dioxide. Chlorophyll in leaves helps "
-            "plants capture sunlight, and oxygen is released during the process."
-        )
-    if "fraction" in text:
-        return (
-            "A fraction shows a part of a whole. The top number tells how many parts "
-            "we have, and the bottom number tells how many equal parts the whole was "
-            "divided into."
-        )
-    return (
-        "Demo answer from the pasted textbook context: "
-        f"{truncate(context, max_length=450)}"
-    )
-def mock_nepali_explanation(normalized_question: str, context: str = "") -> str:
-    text = f"{normalized_question} {context}".lower()
-    if "reflection" in text or "mirror" in text:
-        return (
-            "प्रकाशको परावर्तन भनेको प्रकाश कुनै सतहमा ठोक्किएर फर्कनु हो। ऐनाले "
-            "प्रकाशलाई राम्रोसँग फर्काउँछ, त्यसैले त्यसमा वस्तुको प्रतिबिम्ब देखिन्छ। "
-            "समथर र चिल्लो सतहमा प्रतिबिम्ब प्रस्ट देखिन्छ, तर खस्रो सतहमा प्रकाश धेरै "
-            "दिशामा छरिने भएकाले प्रतिबिम्ब प्रस्ट देखिँदैन।"
-        )
-    if "soil erosion" in text:
         return (
-            "माटो कटान भनेको हावा, पानी वा अरू प्राकृतिक कारणले माटोको माथिल्लो "
-            "मलिलो भाग हट्नु हो। यसले खेतको माटो कमजोर बनाउन सक्छ। रूख र घाँस "
-            "लगाउँदा माटो जोगाउन मद्दत हुन्छ।"
         )
-    if "photosynthesis" in text:
         return (
             "प्रकाश संश्लेषण भनेको हरिया बिरुवाले घामको प्रकाश, पानी र कार्बन "
-            "डाइअक्साइड प्रयोग गरेर आफ्नो खाना बनाउने प्रक्रिया हो। यस क्रममा "
-            "अक्सिजन पनि निस्कन्छ।"
-        )
-    if "fraction" in text:
-        return (
-            "भिन्न भनेको कुनै पूर्ण वस्तुको भाग देखाउने संख्या हो। जस्तै, एउटा "
-            "रोटी बराबर भागमा काट्दा एक भागलाई भिन्नबाट देखाउन सकिन्छ।"
-        )
-    if "oxygen" in text:
-        return (
-            "अक्सिजन एउटा ग्यास हो। मानिस, जनावर र धेरै जीवहरूले सास फेर्दा "
-            "अक्सिजन प्रयोग गर्छन्। यो जीवनका लागि महत्त्वपूर्ण हुन्छ।"
         )
-    return "यो विषयलाई सरल रूपमा बुझ्न पाठ्यपुस्तकको सन्दर्भ पढेर मुख्य कुरा सम्झनुहोस्।"
-def local_nepali_answer(normalized_question: str, context: str) -> str:
-    known_answer = mock_nepali_explanation(normalized_question, context)
-    if known_answer != "यो विषयलाई सरल रूपमा बुझ्न पाठ्यपुस्तकको सन्दर्भ पढेर मुख्य कुरा सम्झनुहोस्।":
-        return known_answer
     if has_devanagari(context):
-        return (
-            "अपलोड गरिएको पाठ्यपुस्तकको सन्दर्भअनुसार मुख्य कुरा यस्तो छ:\n\n"
-            f"{truncate(context, max_length=700)}"
-        )
     return (
         "अपलोड गरिएको पाठ्यपुस्तकको सन्दर्भअनुसार यो विषय महत्त्वपूर्ण छ। "
-        "मुख्य शब्दहरू पढ्नुहोस्, उदाहरणसँग जोड्नुहोस्, र आफ्नै सरल शब्दमा उत्तर लेख्ने अभ्यास गर्नुहोस्।"
     )
-def local_nepali_quiz_questions(context: str) -> list[str]:
-    short_context = truncate(first_sentence(context), max_length=140)
     return [
         "प्राप्त पाठ्यपुस्तक सन्दर्भको मुख्य कुरा के हो?",
         f"यो वाक्यले के बुझाउँछ: {short_context}",
@@ -599,256 +362,144 @@ def local_nepali_quiz_questions(context: str) -> list[str]:
     ]
-def source_answer(sources: list[dict[str, Any]]) -> str:
     if not sources:
         return "पाठ्यपुस्तकको मुख्य कुरा।"
     text = str(sources[0].get("text", "")).strip()
-    return truncate(first_sentence(text) or text, max_length=220)
-def first_sentence(text: str) -> str:
     for separator in ["।", ".", "?", "!"]:
         if separator in text:
             return text.split(separator, 1)[0].strip() + separator
     return text.strip()
-def has_devanagari(text: str) -> bool:
     return any("\u0900" <= character <= "\u097f" for character in text)
-def normalize_question_mock(question: str) -> str:
-    text = question.lower()
-    if "soil erosion" in text or ("mato" in text and "katan" in text):
-        return "What is soil erosion?"
-    if "reflection" in text or "mirror" in text or "ainaa" in text or "aaina" in text:
-        return "What is reflection of light?"
-    if "photosynthesis" in text or ("prakash" in text and "sansleshan" in text):
-        return "What is photosynthesis?"
-    if "fraction" in text or "bhinn" in text:
-        return "What is a fraction?"
-    if "oxygen" in text or "aksijan" in text:
-        return "What is oxygen?"
-    mixed_topic = extract_mixed_language_topic(text)
-    if mixed_topic:
-        return f"What is {mixed_topic}?"
-    return question
-def extract_mixed_language_topic(text: str) -> str:
-    markers = [
-        " vaneko ",
-        " bhaneko ",
-        " vanya ",
-        " bhanya ",
-        " vanne ",
-        " bhanne ",
-    ]
-    if not any(marker in f" {text} " for marker in markers):
-        return ""
-    topic = f" {text} "
-    removable_phrases = [
-        " vaneko ",
-        " bhaneko ",
-        " vanya ",
-        " bhanya ",
-        " vanne ",
-        " bhanne ",
-        " ke ho ",
-        " k ho ",
-        " kya ho ",
-        " vana ",
-        " bhana ",
-        " ho ",
-        " ? ",
-    ]
-    for phrase in removable_phrases:
-        topic = topic.replace(phrase, " ")
-    topic = " ".join(topic.split()).strip(" ?.,")
-    if not topic or len(topic) > 80:
-        return ""
-    return topic
-def mock_quiz_questions(normalized_question: str) -> list[str]:
-    text = normalized_question.lower()
-    if "reflection" in text:
-        return [
-            "What happens to light during reflection?",
-            "Why does a mirror show a clear image?",
-            "Why do rough surfaces not show clear reflections?",
-        ]
-    return [
-        "What is the main idea from the explanation?",
-        "Can you give one simple example?",
-        "Can you explain it in your own words?",
-    ]
-def format_quiz(quiz_questions: list[Any]) -> str:
-    questions = [
-        str(question).strip()
-        for question in quiz_questions
-        if str(question).strip()
-    ]
-    if not questions:
-        questions = [
-            "What did you learn from the explanation?",
-            "Can you give one example?",
-            "Can you explain it to a friend?",
-        ]
     return "\n".join(
-        f"{index}. {question}"
-        for index, question in enumerate(questions[:3], start=1)
     )
-def format_sources(sources: list[Any]) -> str:
     if not sources:
         return "No retrieved sources returned."
     formatted = []
     for source in sources[:5]:
-        if not isinstance(source, dict):
-            continue
-        metadata = source.get("metadata", {}) if isinstance(source.get("metadata"), dict) else {}
         filename = metadata.get("filename", "textbook")
         chunk_index = metadata.get("chunk_index", "unknown")
-        score = source.get("score", 0)
-        text = str(source.get("text", "")).strip()
-        formatted.append(
-            f"Source: {filename}, chunk {chunk_index}, score {float(score):.3f}\n{text}"
-        )
-    return "\n\n".join(formatted) if formatted else "No retrieved sources returned."
-def format_grade(data: dict[str, Any]) -> str:
     lines = [f"Score: {data.get('score', 0)} / {data.get('total', 0)}"]
-    weak_areas = data.get("weak_areas", [])
-    if weak_areas:
-        lines.append(f"Weak areas: {', '.join(str(item) for item in weak_areas)}")
     for item in data.get("results", []):
         status = "Correct" if item.get("is_correct") else "Needs practice"
         lines.append(f"{status}: {item.get('question', '')}")
         if not item.get("is_correct"):
             lines.append(f"Expected idea: {item.get('expected_answer', '')}")
     return "\n".join(lines)
-def _response_error(response: requests.Response, fallback: str) -> str:
-    try:
-        return str(response.json().get("detail", fallback))
-    except ValueError:
-        return fallback
-def encode_state(state: dict[str, Any]) -> str:
     return json.dumps(state, ensure_ascii=False)
-def decode_state(state: Any) -> dict[str, Any]:
     if isinstance(state, dict):
         return state
     if not state:
         return {}
     try:
         decoded = json.loads(str(state))
     except (TypeError, ValueError):
         return {}
     return decoded if isinstance(decoded, dict) else {}
-def truncate(text: str, max_length: int) -> str:
     if len(text) <= max_length:
         return text
-    return f"{text[: max_length - 3]}..."
 with gr.Blocks(title=APP_NAME, theme=gr.themes.Soft()) as demo:
     gr.Markdown(
         """
         # Pathshala AI
-        Bilingual AI tutor for rural primary students in Nepal. Upload a PDF directly
-        in this Space, or connect a public backend for the full production workflow.
         """
     )
-    quiz_state = gr.State("{}")
     textbook_state = gr.State("{}")
     with gr.Row():
-        student_id_input = gr.Textbox(
-            label="Student ID",
-            value="hf-space-demo",
-            scale=1,
-        )
         status_output = gr.Textbox(
             label="Status",
             value=(
                 "Backend connected." if BACKEND_URL else
-                "Space-local PDF upload is active. Set BACKEND_URL for the full backend workflow."
             ),
             interactive=False,
-            scale=2,
         )
     with gr.Tab("Ask"):
         with gr.Row():
-            with gr.Column(scale=1):
-                pdf_input = gr.File(label="Upload textbook or worksheet PDF", file_types=[".pdf"], type="filepath")
                 upload_button = gr.Button("Upload PDF")
                 upload_output = gr.Textbox(label="Upload result", lines=3, interactive=False)
                 question_input = gr.Textbox(
                     label="Student question",
-                    placeholder=EXAMPLE_QUESTION,
                     value=EXAMPLE_QUESTION,
                     lines=2,
                 )
                 context_input = gr.Textbox(
                     label="Optional textbook context",
-                    placeholder="Paste a short textbook paragraph here.",
                     value=EXAMPLE_CONTEXT,
-                    lines=7,
                 )
                 ask_button = gr.Button("Ask Tutor", variant="primary")
-            with gr.Column(scale=1):
                 english_output = gr.Textbox(label="English explanation", lines=8)
                 nepali_output = gr.Textbox(label="Nepali explanation", lines=8)
                 quiz_output = gr.Textbox(label="3 quiz questions", lines=5)
         sources_output = gr.Textbox(label="Retrieved sources", lines=8)
     with gr.Tab("Quiz"):
@@ -860,40 +511,7 @@ with gr.Blocks(title=APP_NAME, theme=gr.themes.Soft()) as demo:
     with gr.Tab("Parent Summary"):
         summary_button = gr.Button("Show Parent/Teacher Summary")
-        summary_output = gr.Textbox(label="Summary", lines=14)
-    gr.Examples(
-        examples=[
-            [EXAMPLE_QUESTION, EXAMPLE_CONTEXT],
-            [
-                "What is reflection of light?",
-                (
-                    "When an object is placed in front of the mirror, the image is formed "
-                    "due to reflection of light from the mirror. Flat and smooth surfaces "
-                    "reflect light clearly, while rough surfaces do not."
-                ),
-            ],
-            [
-                "photosynthesis vaneko ke ho vana",
-                (
-                    "Photosynthesis is the process by which green plants use sunlight, "
-                    "water, and carbon dioxide to make food."
-                ),
-            ],
-        ],
-        inputs=[question_input, context_input],
-        outputs=[
-            english_output,
-            nepali_output,
-            quiz_output,
-            sources_output,
-            status_output,
-            quiz_state,
-        ],
-        fn=lambda question, context: ask_tutor(question, "hf-space-demo", context, "{}"),
-        api_name=False,
-        cache_examples=False,
-    )
     upload_button.click(
         fn=upload_textbook,

 import json
 import os
 from functools import lru_cache
 from dotenv import load_dotenv
 APP_NAME = os.getenv("APP_NAME", "Pathshala AI")
 BACKEND_URL = os.getenv("BACKEND_URL", "").rstrip("/")
 EMBEDDING_MODEL = os.getenv(
     "EMBEDDING_MODEL",
     "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
 )
+EXAMPLE_QUESTION = "mato katan bhaneko ke ho"
+EXAMPLE_CONTEXT = (
+    "माटो कटान भनेको पानी, हावा वा अरू कारणले माटोको माथिल्लो मलिलो भाग बग्नु हो। "
+    "रूख र घाँस रोप्दा माटो जोगाउन मद्दत हुन्छ।"
+)
+MIN_CHUNK_CHARS = 250
+MAX_CHUNK_CHARS = 900
 def upload_textbook(pdf_path):
     if not pdf_path:
         return "Choose a PDF first.", "{}", gr.update()
+    if BACKEND_URL:
+        backend_result = upload_to_backend(pdf_path)
+        if backend_result:
+            return backend_result
     try:
         extracted = extract_pdf_text(pdf_path)
         chunks = chunk_text(extracted["text"])
         if not chunks:
             return "No readable text chunks could be created from this PDF.", "{}", gr.update()
             "filename": os.path.basename(pdf_path),
             "page_count": extracted["page_count"],
             "chunk_count": len(chunks),
             "chunks": chunks,
             "embeddings": embeddings.tolist(),
         }
+        message = (
+            f"Uploaded {state['filename']} inside this Space with "
+            f"{state['page_count']} pages and {state['chunk_count']} chunks."
         )
+        return message, encode_state(state), gr.update(value="")
     except Exception as exc:
+        return f"Could not process uploaded PDF: {exc}", "{}", gr.update()
+def upload_to_backend(pdf_path):
+    try:
+        with open(pdf_path, "rb") as pdf_file:
+            response = requests.post(
+                f"{BACKEND_URL}/upload-textbook",
+                files={"file": (os.path.basename(pdf_path), pdf_file, "application/pdf")},
+                timeout=900,
+            )
+        if not response.ok:
+            return None
+        result = response.json()
+        message = (
+            f"Uploaded {result['filename']} with {result['page_count']} pages "
+            f"and {result['chunk_count']} chunks."
+        )
+        return message, "{}", gr.update(value="")
+    except (OSError, requests.RequestException, ValueError):
+        return None
+def ask_tutor(question, student_id, textbook_context, textbook_state):
+    question = (question or "").strip()
     student_id = (student_id or "hf-space-demo").strip()
+    textbook_context = (textbook_context or "").strip()
     if not question:
         return (
             "Please type a student question.",
             "कृपया विद्यार्थीको प्रश्न लेख्नुहोस्।",
+            "",
             "",
             "Waiting for a question.",
             "{}",
     if BACKEND_URL:
         backend_result = ask_backend(question, student_id, textbook_context)
+        if backend_result:
             return backend_result
+    state = decode_state(textbook_state)
+    sources = sources_from_context(textbook_context)
+    if not sources and state:
+        sources = retrieve_local_sources(normalize_question(question), state, limit=5)
+    if not sources:
+        sources = sources_from_context(EXAMPLE_CONTEXT)
+    context = "\n\n".join(source["text"] for source in sources)
+    english = (
+        f"Interpreted question: {normalize_question(question)}\n\n"
+        f"Answer from textbook context:\n{truncate(context, 700)}"
+    )
+    nepali = nepali_answer(normalize_question(question), context)
+    quiz_questions = nepali_quiz_questions(context)
+    quiz_state = {
+        "quiz_questions": quiz_questions,
+        "expected_answers": [source_answer(sources)] * 3,
+    }
+    return (
+        english,
+        nepali,
+        format_quiz(quiz_questions),
+        format_sources(sources),
+        "Answered with the Hugging Face Space local PDF workflow.",
+        encode_state(quiz_state),
     )
+def ask_backend(question, student_id, textbook_context):
+    payload = {
         "question": question,
         "student_id": student_id,
         "language_support": "English and Nepali",
     }
     if textbook_context:
         payload["textbook_context"] = textbook_context
     try:
+        response = requests.post(f"{BACKEND_URL}/ask", json=payload, timeout=180)
+        if not response.ok:
+            return None
         data = response.json()
+    except (requests.RequestException, ValueError):
         return None
     quiz_questions = data.get("quiz_questions", [])
+    english = str(data.get("answer_english", "No English answer returned."))
+    normalized = str(data.get("normalized_question") or "").strip()
+    if normalized:
+        english = f"Interpreted question: {normalized}\n\n{english}"
+    quiz_state = {
         "quiz_id": data.get("quiz_id"),
         "quiz_questions": quiz_questions,
         "student_id": student_id,
     }
     return (
+        english,
         str(data.get("answer_nepali", "नेपाली उत्तर प्राप्त भएन।")),
         format_quiz(quiz_questions),
         format_sources(data.get("retrieved_sources", [])),
         "Answered with the backend RAG workflow.",
+        encode_state(quiz_state),
     )
+def grade_quiz(answer_1, answer_2, answer_3, student_id, quiz_state):
+    state = decode_state(quiz_state)
+    if BACKEND_URL and state.get("quiz_id"):
+        try:
+            response = requests.post(
+                f"{BACKEND_URL}/grade-quiz",
+                json={
+                    "student_id": (student_id or "hf-space-demo").strip(),
+                    "quiz_id": state["quiz_id"],
+                    "answers": [answer_1, answer_2, answer_3],
+                },
+                timeout=45,
+            )
+            if response.ok:
+                return format_grade(response.json())
+        except (requests.RequestException, ValueError):
+            pass
+    questions = state.get("quiz_questions", [])
+    expected_answers = state.get("expected_answers", [])
     if not questions:
         return "Ask the tutor first so a quiz can be created."
+    answers = [answer_1, answer_2, answer_3]
     score = 0
     lines = []
     for index, question in enumerate(questions[:3]):
+        expected = str(expected_answers[index] if index < len(expected_answers) else "")
+        answer = str(answers[index] if index < len(answers) else "")
+        is_correct = is_answer_close(answer, expected)
+        score += 1 if is_correct else 0
+        lines.append(f"{'Correct' if is_correct else 'Needs practice'}: {question}")
+        if not is_correct and expected:
+            lines.append(f"Expected idea: {expected}")
     return f"Score: {score} / {min(len(questions), 3)}\n" + "\n".join(lines)
 def parent_summary(student_id):
     if not BACKEND_URL:
         return (
             "Parent/teacher summary\n\n"
+            "The student practiced with uploaded or pasted textbook context in this Space. "
+            "For persistent progress, deploy the FastAPI backend and set BACKEND_URL."
         )
     try:
         response = requests.get(
+            f"{BACKEND_URL}/parent-summary/{(student_id or '').strip() or 'hf-space-demo'}",
+            timeout=45,
         )
         if not response.ok:
+            return "Summary failed."
+        data = response.json()
+    except (requests.RequestException, ValueError):
+        return "Summary failed."
+    strengths = "\n".join(f"- {item}" for item in data.get("strengths", []))
+    weak_topics = data.get("weak_topics", [])
+    weak_text = "\n".join(f"- {item}" for item in weak_topics) if weak_topics else "No weak topics recorded yet."
     return (
         f"Strengths\n{strengths}\n\n"
+        f"Weak topics\n{weak_text}\n\n"
+        f"Suggested next practice\n{data.get('suggested_next_practice', '')}\n\n"
+        f"Encouraging note\n{data.get('encouraging_note', '')}"
     )
+def extract_pdf_text(pdf_path):
     import fitz
     page_texts = []
     with fitz.open(pdf_path) as document:
+        page_count = document.page_count
         for page in document:
             text = page.get_text("text").strip()
             if text:
                 page_texts.append(text)
     text = "\n\n".join(page_texts).strip()
     if not text:
         raise ValueError(
+            "No selectable text found. For scanned PDFs, use backend OCR or paste a paragraph."
         )
+    return {"text": text, "page_count": page_count}
+def chunk_text(text):
     paragraphs = [part.strip() for part in text.splitlines() if part.strip()]
     chunks = []
     current = ""
     for paragraph in paragraphs:
         if len(current) + len(paragraph) + 2 <= MAX_CHUNK_CHARS:
             current = f"{current}\n{paragraph}".strip()
+        elif len(current) >= MIN_CHUNK_CHARS:
             chunks.append(current)
             current = paragraph
         else:
             current = f"{current}\n{paragraph}".strip()
     if current:
         chunks.append(current)
     return chunks or ([text.strip()] if text.strip() else [])
     return SentenceTransformer(EMBEDDING_MODEL)
+def embed_texts(texts):
     model = get_embedding_model()
     return np.asarray(
         model.encode(
     )
+def retrieve_local_sources(question, state, limit=5):
+    chunks = [str(chunk) for chunk in state.get("chunks", [])]
+    embeddings = np.asarray(state.get("embeddings", []), dtype=float)
     if not chunks or embeddings.size == 0:
         return []
     query_embedding = embed_texts([question])[0]
     scores = embeddings @ query_embedding
     top_indices = np.argsort(scores)[::-1][:limit]
     return [
         {
             "score": float(scores[index]),
             "text": chunks[index],
             "metadata": {
+                "filename": state.get("filename", "uploaded-textbook"),
                 "chunk_index": int(index),
             },
         }
     ]
+def sources_from_context(text):
+    chunks = chunk_text(text)
+    return [
+        {
+            "score": 1.0,
+            "text": chunk,
+            "metadata": {"filename": "pasted-context", "chunk_index": index},
+        }
+        for index, chunk in enumerate(chunks[:5])
+    ]
+def normalize_question(question):
+    text = question.lower()
+    if "mato" in text and "katan" in text:
+        return "What is soil erosion?"
+    if "prakash" in text and "sansleshan" in text:
+        return "What is photosynthesis?"
+    if "bhinn" in text or "fraction" in text:
+        return "What is a fraction?"
+    return question
+def nepali_answer(question, context):
+    text = f"{question} {context}".lower()
+    if "soil erosion" in text or "माटो कटान" in context:
         return (
+            "माटो कटान भनेको पानी, हावा वा अरू कारणले माटोको माथिल्लो मलिलो भाग "
+            "बग्नु वा हट्नु हो। यसले जमिनको उर्वर शक्ति घटाउँछ। रूख, घाँस र बिरुवा "
+            "रोप्दा माटो जोगाउन मद्दत हुन्छ।"
         )
+    if "photosynthesis" in text or "प्रकाश संश्लेषण" in context:
         return (
             "प्रकाश संश्लेषण भनेको हरिया बिरुवाले घामको प्रकाश, पानी र कार्बन "
+            "डाइअक्साइड प्रयोग गरेर खाना बनाउने प्रक्रिया हो। यस क्रममा अक्सिजन पनि निस्कन्छ।"
         )
     if has_devanagari(context):
+        return "अपलोड गरिएको पाठ्यपुस्तकको सन्दर्भअनुसार मुख्य कुरा यस्तो छ:\n\n" + truncate(context, 700)
     return (
         "अपलोड गरिएको पाठ्यपुस्तकको सन्दर्भअनुसार यो विषय महत्त्वपूर्ण छ। "
+        "मुख्य शब्दहरू पढेर आफ्नै सरल शब्दमा उत्तर लेख्ने अभ्यास गर्नुहोस्।"
     )
+def nepali_quiz_questions(context):
+    short_context = truncate(first_sentence(context), 140)
     return [
         "प्राप्त पाठ्यपुस्तक सन्दर्भको मुख्य कुरा के हो?",
         f"यो वाक्यले के बुझाउँछ: {short_context}",
     ]
+def source_answer(sources):
     if not sources:
         return "पाठ्यपुस्तकको मुख्य कुरा।"
     text = str(sources[0].get("text", "")).strip()
+    return truncate(first_sentence(text) or text, 220)
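+def first_sentence(text):
+    # Cut at the earliest sentence terminator, not the first one in list order.
+    positions = [text.find(separator) for separator in ["।", ".", "?", "!"]]
+    cut = min((position for position in positions if position != -1), default=-1)
+    if cut != -1:
+        return text[:cut].strip() + text[cut]
     return text.strip()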
+def has_devanagari(text):
     return any("\u0900" <= character <= "\u097f" for character in text)
+def is_answer_close(student_answer, expected_answer):
+    student = normalize_answer(student_answer)
+    expected = normalize_answer(expected_answer)
+    if not student or not expected:
+        return False
+    student_tokens = set(student.split())
+    expected_tokens = set(expected.split())
+    overlap = len(student_tokens & expected_tokens) / max(len(expected_tokens), 1)
+    return overlap >= 0.35 or student in expected or expected in student
+def normalize_answer(answer):
+    return " ".join(
+        word.strip(".,?!:;()[]{}\"'।").lower()
+        for word in str(answer).split()
+        if word.strip(".,?!:;()[]{}\"'।")
+    )
+def format_quiz(questions):
+    clean_questions = [str(question).strip() for question in questions if str(question).strip()]
     return "\n".join(
+        f"{index}. {question}" for index, question in enumerate(clean_questions[:3], start=1)
     )
+def format_sources(sources):
     if not sources:
         return "No retrieved sources returned."
     formatted = []
     for source in sources[:5]:
+        metadata = source.get("metadata", {}) if isinstance(source, dict) else {}
         filename = metadata.get("filename", "textbook")
         chunk_index = metadata.get("chunk_index", "unknown")
+        score = float(source.get("score", 0)) if isinstance(source, dict) else 0
+        text = str(source.get("text", "")).strip() if isinstance(source, dict) else ""
+        formatted.append(f"Source: {filename}, chunk {chunk_index}, score {score:.3f}\n{text}")
+    return "\n\n".join(formatted)
+def format_grade(data):
     lines = [f"Score: {data.get('score', 0)} / {data.get('total', 0)}"]
     for item in data.get("results", []):
         status = "Correct" if item.get("is_correct") else "Needs practice"
         lines.append(f"{status}: {item.get('question', '')}")
         if not item.get("is_correct"):
             lines.append(f"Expected idea: {item.get('expected_answer', '')}")
     return "\n".join(lines)
+def encode_state(state):
     return json.dumps(state, ensure_ascii=False)
+def decode_state(state):
     if isinstance(state, dict):
         return state
     if not state:
         return {}
     try:
         decoded = json.loads(str(state))
     except (TypeError, ValueError):
         return {}
     return decoded if isinstance(decoded, dict) else {}
+def truncate(text, max_length):
+    text = str(text)
     if len(text) <= max_length:
         return text
+    return text[: max_length - 3] + "..."
 with gr.Blocks(title=APP_NAME, theme=gr.themes.Soft()) as demo:
     gr.Markdown(
         """
         # Pathshala AI
+        Upload a textbook PDF, ask a question, and get textbook-grounded bilingual help.
         """
     )
     textbook_state = gr.State("{}")
+    quiz_state = gr.State("{}")
     with gr.Row():
+        student_id_input = gr.Textbox(label="Student ID", value="hf-space-demo")
         status_output = gr.Textbox(
             label="Status",
             value=(
                 "Backend connected." if BACKEND_URL else
+                "Space-local PDF upload is active. Set BACKEND_URL for full backend OCR/progress."
             ),
             interactive=False,
         )
     with gr.Tab("Ask"):
         with gr.Row():
+            with gr.Column():
+                pdf_input = gr.File(
+                    label="Upload textbook or worksheet PDF",
+                    file_types=[".pdf"],
+                    type="filepath",
+                )
                 upload_button = gr.Button("Upload PDF")
                 upload_output = gr.Textbox(label="Upload result", lines=3, interactive=False)
                 question_input = gr.Textbox(
                     label="Student question",
                     value=EXAMPLE_QUESTION,
                     lines=2,
                 )
                 context_input = gr.Textbox(
                     label="Optional textbook context",
                     value=EXAMPLE_CONTEXT,
+                    lines=6,
                 )
                 ask_button = gr.Button("Ask Tutor", variant="primary")
+            with gr.Column():
                 english_output = gr.Textbox(label="English explanation", lines=8)
                 nepali_output = gr.Textbox(label="Nepali explanation", lines=8)
                 quiz_output = gr.Textbox(label="3 quiz questions", lines=5)
         sources_output = gr.Textbox(label="Retrieved sources", lines=8)
     with gr.Tab("Quiz"):
     with gr.Tab("Parent Summary"):
         summary_button = gr.Button("Show Parent/Teacher Summary")
+        summary_output = gr.Textbox(label="Summary", lines=10)
     upload_button.click(
         fn=upload_textbook,