# MathLingua — System Architecture Document ## 1. System Overview MathLingua is a bilingual adaptive math tutoring application for Spanish-speaking students (grades 6–8) transitioning to English-medium mathematics education. The system presents math word problems with 4 scaffolded hint levels and uses a hybrid adaptive algorithm to personalize difficulty progression. ``` ┌─────────────────────────────────────────────────────────────────────┐ │ MathLingua System │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │ │ Frontend │ │ Backend │ │ External Services │ │ │ │ (Next.js) │◄─►│ (Firebase) │◄─►│ (LLM / SLM) │ │ │ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │ │ Adaptive │ │ Firestore │ │ V1: Gemini API │ │ │ │ Engine │ │ Database │ │ V2: Qwen2.5-3B SLM │ │ │ │ (Client JS) │ │ │ │ (HF Inference EP) │ │ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ └─────────────────────────────────────────────────────────────────────┘ ``` --- ## 2. 
Component Architecture ### 2.1 Frontend — React / Next.js Application **Technology**: Next.js 14+ (App Router), TypeScript, Tailwind CSS **Hosting**: Firebase Hosting or Vercel #### Key Pages/Routes | Route | Component | Purpose | |---|---|---| | `/` | `LandingPage` | Login/signup, language preference | | `/dashboard` | `StudentDashboard` | Progress overview, session history, MCS/LDS charts | | `/practice` | `PracticeSession` | Adaptive practice from question database | | `/solve` | `CustomProblem` | "Input your question" — Gemini/SLM processes user-submitted problems | | `/session-report` | `SessionReport` | End-of-session summary with performance analytics | #### Core Frontend Components ``` src/ ├── components/ │ ├── ProblemDisplay/ │ │ ├── MathProblem.tsx # Renders word problem text │ │ ├── HintScaffold.tsx # L1/L2/L3/L4 progressive hint UI │ │ ├── AnswerInput.tsx # Numeric/expression answer entry │ │ └── SolutionReveal.tsx # L4 step-by-step solution display │ ├── Adaptive/ │ │ ├── DifficultyIndicator.tsx # Visual current-level indicator │ │ ├── ProgressBar.tsx # Session progress (e.g., 7/20) │ │ └── SessionTimer.tsx # Time tracking per problem │ ├── Dashboard/ │ │ ├── EloChart.tsx # Elo rating over time (Recharts) │ │ ├── TopicHeatmap.tsx # Performance by math topic │ │ ├── LDSMCSPanel.tsx # Language Dependency & Math Confidence │ │ └── StreakBadge.tsx # Gamification elements │ └── Shared/ │ ├── BilingualToggle.tsx # EN/ES interface language switch │ ├── MathRenderer.tsx # KaTeX for math expressions │ └── LoadingSkeleton.tsx ├── lib/ │ ├── adaptive-engine.ts # Elo + BKT + Thompson Sampling (client-side) │ ├── feature-engineer.ts # LDS & MCS computation │ ├── firebase.ts # Firebase SDK initialization │ └── llm-client.ts # Gemini/SLM API abstraction ├── hooks/ │ ├── useAdaptiveSession.ts # Manages session state + engine calls │ ├── useStudentProfile.ts # Reads/writes Firestore student state │ └── useQuestionQueue.ts # Pre-fetches next batch of questions └── 
types/ └── index.ts # TypeScript interfaces for all data structures ``` #### Hint Scaffold UI Flow ``` ┌─────────────────────────────────────┐ │ Problem displayed in original │ │ English at student's current level │ │ │ │ [Try to solve] [I need a hint →] │ └──────────────────────┬──────────────┘ │ click ▼ ┌─────────────────────────────────────┐ │ L1: Simplified English │ │ "A store has 24 apples..." │ │ │ │ [Got it!] [Still stuck →] │ └──────────────────────┬──────────────┘ │ click ▼ ┌─────────────────────────────────────┐ │ L2: Bilingual Keywords Inline │ │ "A store has 24 apples (manzanas)" │ │ "divided equally (dividido │ │ igualmente) among 6 boxes" │ │ │ │ [Got it!] [Still stuck →] │ └──────────────────────┬──────────────┘ │ click ▼ ┌─────────────────────────────────────┐ │ L3: Full Spanish Translation │ │ "Una tienda tiene 24 manzanas │ │ divididas igualmente entre 6 │ │ cajas. ¿Cuántas manzanas hay │ │ en cada caja?" │ │ │ │ [Got it!] [Show me the answer →] │ └──────────────────────┬──────────────┘ │ click ▼ ┌─────────────────────────────────────┐ │ L4: Step-by-Step Solution │ │ Step 1: Identify — 24 ÷ 6 │ │ Step 2: Calculate — 24 ÷ 6 = 4 │ │ Step 3: Answer — 4 apples per box │ │ │ │ [Next Problem →] │ └─────────────────────────────────────┘ ``` Each hint interaction is logged with timestamp to compute `escalation_speed` and `scaffold_time_ratio` for the LDS formula. --- ### 2.2 Adaptive Engine (Client-Side JavaScript) The adaptive engine runs **entirely in the browser** — no server round-trip needed for difficulty decisions. This ensures instant feedback and works offline after initial question batch load. 
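As a concrete example of the hint-interaction logging described above, here is a minimal sketch of the two timing features fed into the LDS formula. The names `HintEvent`, `escalationSpeed`, and `scaffoldTimeRatio` are hypothetical, and the definitions are assumptions: escalation speed as the mean time between consecutive hint clicks, and scaffold-time ratio as the fraction of the problem's total time spent after the first hint was opened.

```typescript
// Hypothetical shapes; the real types would live in src/types/index.ts.
interface HintEvent {
  level: 1 | 2 | 3 | 4;  // which scaffold level was opened
  timestamp: number;     // ms since epoch, logged on click
}

// Mean time (ms) between consecutive hint clicks.
// Fewer than two hints yields 0, meaning "no escalation measured".
export function escalationSpeed(events: HintEvent[]): number {
  if (events.length < 2) return 0;
  let total = 0;
  for (let i = 1; i < events.length; i++) {
    total += events[i].timestamp - events[i - 1].timestamp;
  }
  return total / (events.length - 1);
}

// Fraction of the problem's total time spent after the first hint was
// opened — an assumed proxy for "time spent inside scaffolds".
export function scaffoldTimeRatio(
  events: HintEvent[],
  problemStartMs: number,
  problemEndMs: number
): number {
  if (events.length === 0 || problemEndMs <= problemStartMs) return 0;
  return (problemEndMs - events[0].timestamp) / (problemEndMs - problemStartMs);
}
```

A student who escalates quickly through L1 → L2 → L3 produces a small `escalationSpeed` and a large `scaffoldTimeRatio`, both signals of high language dependency.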
#### Engine Components ``` ┌─────────────────────────────────────────────────┐ │ Adaptive Engine (client-side) │ │ │ │ ┌─────────────┐ ┌──────────┐ ┌────────────┐ │ │ │ Elo Rating │ │ BKT │ │ Thompson │ │ │ │ System │ │ Engine │ │ Sampler │ │ │ │ │ │ │ │ │ │ │ │ Updates │ │ P(know) │ │ Beta prior │ │ │ │ student & │ │ per │ │ per level, │ │ │ │ question │ │ topic │ │ ZPD window │ │ │ │ ratings │ │ │ │ │ │ │ └──────┬──────┘ └────┬─────┘ └─────┬──────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌───────────────────────────────────────────┐ │ │ │ Decision Orchestrator │ │ │ │ │ │ │ │ Input: weighted_outcome, features │ │ │ │ Output: next_level, decision_type │ │ │ │ (increase/maintain/decrease) │ │ │ └───────────────────────────────────────────┘ │ └─────────────────────────────────────────────────┘ ``` #### Elo Update Formula ``` weighted_outcome = { no_hint: 1.00 (solved without any scaffold) L1_only: 0.75 (needed simplified English) L2_used: 0.50 (needed bilingual keywords) L3_used: 0.25 (needed full translation) L4_used: 0.00 (needed answer reveal) } E_student = 1 / (1 + 10^((R_question - R_student) / 400)) R_student_new = R_student + K × (weighted_outcome - E_student) K = 32 (default), increased to 48 for first 10 interactions (cold-start acceleration) ``` #### BKT Parameters (per topic) | Parameter | Symbol | Default | Description | |---|---|---|---| | Prior knowledge | P(L₀) | 0.10 | Initial probability student knows topic | | Learn rate | P(T) | 0.15 | Probability of learning per opportunity | | Slip | P(S) | 0.10 | Probability of incorrect despite knowing | | Guess | P(G) | 0.25 | Probability of correct despite not knowing | Slip is adjusted based on hint usage: ``` P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized) ``` This models the intuition that using more scaffolds means apparent "correctness" is less certain. 
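The Elo update and the hint-adjusted BKT step above can be sketched together in client-side TypeScript. This is a minimal sketch, not the engine itself: `hintDepthNormalized` is assumed to be the max hint level used divided by 4, and the question-side rating update is omitted.

```typescript
const K_DEFAULT = 32;
const K_COLD_START = 48; // first 10 interactions

// weighted_outcome table: no hint = 1.00, L1 = 0.75, ..., L4 reveal = 0.00
export function weightedOutcome(maxHintLevel: 0 | 1 | 2 | 3 | 4): number {
  return 1 - maxHintLevel * 0.25;
}

// Student-side Elo update: E_student, then R_student + K × (outcome − E).
export function updateStudentElo(
  rStudent: number,
  rQuestion: number,
  outcome: number,
  totalInteractions: number
): number {
  const k = totalInteractions < 10 ? K_COLD_START : K_DEFAULT;
  const expected = 1 / (1 + Math.pow(10, (rQuestion - rStudent) / 400));
  return rStudent + k * (outcome - expected);
}

// One BKT observation step with the hint-adjusted slip.
// hintDepthNormalized: 0 = no scaffold used, 1 = full answer reveal.
export function bktUpdate(
  pKnow: number,
  isCorrect: boolean,
  hintDepthNormalized: number,
  params = { pT: 0.15, pS: 0.10, pG: 0.25 }
): number {
  const pS = params.pS * (1 + 0.5 * hintDepthNormalized); // P(S)_adjusted
  const posterior = isCorrect
    ? (pKnow * (1 - pS)) / (pKnow * (1 - pS) + (1 - pKnow) * params.pG)
    : (pKnow * pS) / (pKnow * pS + (1 - pKnow) * (1 - params.pG));
  return posterior + (1 - posterior) * params.pT; // learning transition
}
```

For example, an evenly matched student (R = 1000 vs. a 1000-rated question) who solves without hints moves from 1000 to 1016 with the default K = 32.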
#### Thompson Sampling with ZPD Windowing ``` For each candidate level l in ZPD window [current - 2, current + 3]: sample θ_l ~ Beta(α_l, β_l) score_l = θ_l × proximity_bonus(l, target_elo) Select level = argmax(score_l) proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²) ``` ZPD window is asymmetric (+3 upward, -2 downward) to encourage upward progression while preventing catastrophic failure. #### Progression Decision Rules | Condition | Decision | Action | |---|---|---| | weighted_outcome ≥ 0.75 AND P(know) ≥ 0.70 | **Increase** | Move up 1 sub-level | | weighted_outcome ≥ 0.85 AND streak ≥ 3 | **Skip** | Move up 2 sub-levels | | 0.40 ≤ weighted_outcome < 0.75 | **Maintain** | Stay at current level | | weighted_outcome < 0.40 OR streak_wrong ≥ 2 | **Decrease** | Move down 1 sub-level | | weighted_outcome < 0.25 AND P(know) < 0.30 | **Rapid Decrease** | Move down 2 sub-levels | --- ### 2.3 Firebase Backend **Services Used**: - Firebase Authentication (Google Sign-In, Email/Password) - Cloud Firestore (student state, question database, session logs) - Cloud Functions (LLM API calls, batch question generation, session reports) - Firebase Hosting (static frontend assets) #### Firestore Data Model ``` firestore/ ├── users/ │ └── {uid}/ │ ├── profile: { │ │ displayName, email, gradeLevel, preferredLanguage, │ │ createdAt, lastActive │ │ } │ ├── adaptiveState: { │ │ currentElo: number, // e.g., 1050 │ │ currentLevel: string, // e.g., "2.1" │ │ totalInteractions: number, │ │ topicMastery: { // BKT P(know) per topic │ │ "arithmetic": 0.72, │ │ "fractions": 0.45, │ │ "algebra_basic": 0.31, │ │ ... │ │ }, │ │ thompsonPriors: { // Beta(α,β) per level │ │ "1.1": { alpha: 12, beta: 3 }, │ │ "1.2": { alpha: 8, beta: 5 }, │ │ ... 
│ │ }, │ │ featureAverages: { │ │ avgLDS: 0.42, │ │ avgMCS: 0.61, │ │ recentLDS_5: [0.3, 0.4, 0.5, 0.35, 0.45], │ │ recentMCS_5: [0.6, 0.65, 0.58, 0.62, 0.7] │ │ }, │ │ streakCount: number, │ │ lastUpdated: timestamp │ │ } │ └── sessions/ │ └── {sessionId}/ │ ├── metadata: { │ │ startTime, endTime, questionsAttempted, │ │ questionsCorrect, avgWeightedOutcome, │ │ startElo, endElo, sessionLDS, sessionMCS │ │ } │ └── interactions/ │ └── {interactionId}: { │ questionId, level, topic, │ startTime, endTime, timeSpentMs, │ hintsUsed: [0,1,2,3,4], // which levels accessed │ hintTimestamps: { L1: ts, L2: ts, ... }, │ maxHintLevel: number, │ answer: string, │ isCorrect: boolean, │ attempts: number, │ weightedOutcome: number, │ lds: number, │ mcs: number, │ eloBeforeUpdate: number, │ eloAfterUpdate: number, │ adaptiveDecision: string │ } │ ├── questions/ │ └── {questionId}: { │ id, level, topic, subtopic, │ problemText, answer, answerNumeric, │ solutionSteps: [...], │ scaffolds: { │ L1_simplified: string, │ L2_bilingual: string, │ L3_spanish: string, │ L4_solution: string │ }, │ readability: { │ fleschKincaid: number, │ wordCount: number, │ difficultWords: number, │ avgSyllables: number │ }, │ eloRating: number, │ timesServed: number, │ avgOutcome: number, │ metadata: { │ source: "curated" | "generated", │ generatedBy: "gemini-2.0" | "qwen2.5-3b" | null, │ reviewedBy: string | null, │ createdAt: timestamp │ } │ } │ ├── questionIndex/ // Denormalized for fast queries │ └── byLevel/ │ └── {level}: { │ questionIds: [...], │ count: number │ } │ └── analytics/ // Aggregated (Cloud Functions) ├── dailyStats/ │ └── {date}: { activeUsers, sessionsCompleted, ... } └── cohortProgress/ └── {cohortId}: { avgElo, avgLDS, avgMCS, ... 
} ``` #### Firestore Security Rules ```javascript rules_version = '2'; service cloud.firestore { match /databases/{database}/documents { // Users can only read/write their own data match /users/{uid}/{document=**} { allow read, write: if request.auth != null && request.auth.uid == uid; } // Questions are readable by all authenticated users match /questions/{questionId} { allow read: if request.auth != null; allow write: if false; // Only admin/Cloud Functions } // Question index readable by all authenticated users match /questionIndex/{document=**} { allow read: if request.auth != null; allow write: if false; } // Analytics only accessible by admin match /analytics/{document=**} { allow read, write: if false; // Cloud Functions only } } } ``` --- ### 2.4 Cloud Functions (Serverless Backend) ``` functions/ ├── onUserCreate.ts # Initialize adaptive state for new user ├── generateScaffolds.ts # Call Gemini/SLM to create L1-L4 for a problem ├── batchGenerateQuestions.ts # Generate next 20 questions for session queue ├── processCustomProblem.ts # "Input your question" flow ├── generateSessionReport.ts # End-of-session analytics ├── updateQuestionStats.ts # Update question difficulty from outcomes └── scheduledAnalytics.ts # Daily aggregation (cron-triggered) ``` #### Key Cloud Function: `generateScaffolds` ```typescript // Triggered when student submits a custom problem or when // pre-generating scaffolds for database questions interface ScaffoldRequest { problemText: string; studentGradeLevel: number; currentLDS: number; // Informs simplification level } interface ScaffoldResponse { L1_simplified: string; // Simplified English L2_bilingual: string; // English with inline Spanish keywords L3_spanish: string; // Full Spanish translation L4_solution: string; // Step-by-step solution answer: string; answerNumeric: number; } // Prompt template for LLM const SCAFFOLD_PROMPT = ` You are a bilingual math tutor helping Spanish-speaking students (grades 6-8) learn math in 
English.

Given this math word problem:
"{problemText}"

Generate 4 scaffold levels:

**L1 (Simplified English):** Rewrite using shorter sentences, simpler
vocabulary (grade {adjustedGrade} reading level). Keep all math content
identical.

**L2 (Bilingual Keywords):** Take the original problem and add Spanish
translations in parentheses for key math and context vocabulary.
Format: "English word (palabra en español)".

**L3 (Full Spanish Translation):** Translate the complete problem to
natural, grade-appropriate Spanish. Ensure mathematical precision is
maintained.

**L4 (Step-by-Step Solution):** Provide a clear, numbered step-by-step
solution in English with the final numerical answer.

Return as JSON with keys: L1_simplified, L2_bilingual, L3_spanish,
L4_solution, answer, answerNumeric.
`;
```

#### Key Cloud Function: `batchGenerateQuestions`

```typescript
// Called when student reaches question 17 of 20 (prefetch trigger)
// Selects next 20 questions from database based on adaptive state
export const batchGenerateQuestions = onCall(async (request) => {
  const { uid } = request.auth;
  const state = await getAdaptiveState(uid);

  // Thompson Sampling selects level distribution for next batch
  const levelDistribution = thompsonSampleBatch(
    state.thompsonPriors,
    state.currentLevel,
    { batchSize: 20 }
  );
  // e.g., { "2.1": 5, "2.2": 8, "2.3": 5, "2.4": 2 }

  // Select questions, avoiding recently served ones
  const recentIds = await getRecentQuestionIds(uid, { lookback: 100 });
  const questions = await selectQuestions(levelDistribution, {
    excludeIds: recentIds,
    topicBalance: state.topicMastery // Favor weaker topics
  });

  // Ensure all questions have scaffolds generated
  const withScaffolds = await ensureScaffoldsGenerated(questions);

  return { questions: withScaffolds, sessionBatchId: generateId() };
});
```

---

### 2.5 LLM Service Layer

#### V1: Gemini API (Current)

```
┌────────────┐     HTTPS/REST     ┌──────────────────┐
│   Cloud    │ ──────────────────►│  Google Gemini   │
│  Function  │ ◄──────────────────│  2.0 Flash API   │
└────────────┘                    └──────────────────┘

Cost: ~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens
Latency: 200-800ms per scaffold generation
Rate limit: 60 RPM (free tier), 1000 RPM (paid)
```

#### V2: Qwen2.5-3B SLM (Planned)

```
┌────────────┐     HTTPS/REST     ┌──────────────────────────┐
│   Cloud    │ ──────────────────►│  HF Inference Endpoint   │
│  Function  │ ◄──────────────────│  Qwen2.5-3B-Instruct     │
└────────────┘                    │  (QLoRA fine-tuned)      │
                                  │  GPU: T4 or L4           │
                                  └──────────────────────────┘

Cost: ~$0.60/hr (T4) or ~$1.04/hr (L4)
Latency: 100-400ms per scaffold generation
Rate limit: Unlimited (dedicated endpoint)
```

#### LLM Client Abstraction

```typescript
// lib/llm-client.ts — Provider-agnostic interface
interface LLMProvider {
  generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
  generateQuestion(level: string, topic: string): Promise<QuestionWithScaffolds>;
  validateAnswer(problem: string, studentAnswer: string, correctAnswer: string): Promise<boolean>;
}

class GeminiProvider implements LLMProvider { ... }
class QwenSLMProvider implements LLMProvider { ...
} // Factory with fallback function createLLMClient(): LLMProvider { if (config.useSLM && config.slmEndpointAvailable) { return new QwenSLMProvider(config.slmEndpoint); } return new GeminiProvider(config.geminiApiKey); } ``` --- ### 2.6 SLM Fine-Tuning Pipeline ``` ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ Training │ │ Fine-Tune │ │ Deploy │ │ Data Prep │───►│ QLoRA SFT │───►│ HF Inference EP │ └──────────────┘ └──────────────┘ └──────────────────┘ Step 1: Collect 2,000-5,000 scaffold examples from Gemini V1 usage Step 2: Human review + quality filter → ~1,500 gold examples Step 3: QLoRA fine-tune Qwen2.5-3B-Instruct Step 4: Evaluate on held-out test set (BLEU, math accuracy, readability) Step 5: Deploy to HF Inference Endpoint Step 6: Shadow-test alongside Gemini (serve both, compare quality) Step 7: Full cutover when SLM matches Gemini quality ``` **Fine-tuning Configuration:** | Parameter | Value | Rationale | |---|---|---| | Base model | Qwen2.5-3B-Instruct | Best math+Spanish at 3B scale | | Method | QLoRA (4-bit NF4) | Fits single 16GB GPU | | LoRA rank (r) | 32 | Balance quality/efficiency for small dataset | | LoRA alpha | 64 | Standard 2× rank | | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | Full attention + MLP | | Learning rate | 2e-4 | Standard for QLoRA | | Epochs | 3-5 | Small dataset, monitor val loss | | Batch size | 4 (effective 16 with grad accum) | Memory constraint | | Max sequence length | 1024 | Sufficient for problem + all 4 scaffolds | | Warmup ratio | 0.05 | Short warmup for small dataset | --- ## 3. Data Flow Diagrams ### 3.1 Flow A: "Practice Problems" Mode ``` Student clicks "Start Practice" │ ▼ ┌─────────────────────────────────┐ │ 1. Load adaptive state from │ │ Firestore (Elo, BKT, priors) │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 2. 
Thompson Sampling selects │ │ next question level │ │ (ZPD window: current ±2/+3) │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 3. Fetch question from Firestore│ │ by level + topic balancing │ │ (avoid recently served) │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 4. Display problem, start timer │ │ Student reads and attempts │ └────────────┬────────────────────┘ │ ┌────────┴────────┐ │ Needs hints? │ ▼ No ▼ Yes ┌─────────┐ ┌───────────────────┐ │ Submit │ │ L1 → L2 → L3 → L4│ │ answer │ │ (each click logged │ └────┬────┘ │ with timestamp) │ │ └────────┬───────────┘ │ │ ▼ ▼ ┌─────────────────────────────────┐ │ 5. Compute weighted_outcome │ │ based on correctness + hints │ │ Compute LDS and MCS │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 6. Update Elo (student + Q) │ │ Update BKT P(know) for topic │ │ Update Thompson Beta priors │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 7. Progression decision: │ │ increase / maintain / decrease│ │ Select next level │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 8. Save interaction to Firestore│ │ Display "Next Problem" │ └────────────┬────────────────────┘ │ ▼ ┌────────┴────────┐ │ Q17 of 20? │ ▼ Yes ▼ No ┌─────────────┐ ┌──────────┐ │ Prefetch │ │ Loop to │ │ next batch │ │ step 2 │ │ (Cloud Fn) │ └──────────┘ └─────────────┘ │ At Q20: ▼ ┌─────────────────────────────────┐ │ 9. Generate session report │ │ (Cloud Function) │ │ Show summary to student │ └─────────────────────────────────┘ ``` ### 3.2 Flow B: "Input Your Question" Mode ``` Student types/pastes a math word problem │ ▼ ┌─────────────────────────────────┐ │ 1. 
Cloud Function: │ │ processCustomProblem │ │ - Validate it's a math │ │ word problem │ │ - Extract answer/solution │ │ - Call Gemini/SLM to generate│ │ L1, L2, L3, L4 scaffolds │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 2. Estimate difficulty level │ │ using readability metrics │ │ (FK grade, word count, etc.) │ │ Map to nearest Elo rating │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 3. Display problem with │ │ scaffold buttons active │ │ (same UI as Practice mode) │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 4. Student interacts, solves │ │ Same hint tracking as │ │ Practice mode │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 5. Update adaptive state │ │ (Elo, BKT, Thompson) │ │ Log interaction │ └────────────┬────────────────────┘ │ ▼ ┌─────────────────────────────────┐ │ 6. Offer: "Try another?" or │ │ "Switch to Practice Mode" │ │ (where engine auto-selects) │ └─────────────────────────────────┘ ``` --- ## 4. 
API Contracts

### 4.1 Client → Cloud Functions

```typescript
// POST /generateScaffolds
interface GenerateScaffoldsRequest {
  problemText: string;
  gradeLevel: number;   // 6, 7, or 8
  currentLDS: number;   // 0.0-1.0, informs simplification
}

interface GenerateScaffoldsResponse {
  scaffolds: {
    L1_simplified: string;
    L2_bilingual: string;
    L3_spanish: string;
    L4_solution: string;
  };
  answer: string;
  answerNumeric: number;
  estimatedLevel: string;   // e.g., "2.3"
  estimatedElo: number;     // e.g., 1100
  processingTimeMs: number;
}

// POST /batchGenerateQuestions
interface BatchRequest {
  batchSize: number;   // default 20
  // Auth token provides uid → adaptive state looked up server-side
}

interface BatchResponse {
  questions: QuestionWithScaffolds[];
  sessionBatchId: string;
}

// POST /submitInteraction
interface InteractionSubmission {
  sessionId: string;
  questionId: string;
  answer: string;
  isCorrect: boolean;
  timeSpentMs: number;
  hintsUsed: number[];                     // [0], [0,1], [0,1,2], etc.
  hintTimestamps: Record<string, number>;  // e.g., { L1: ts, L2: ts }
  attempts: number;
}

interface InteractionResponse {
  weightedOutcome: number;
  lds: number;
  mcs: number;
  newElo: number;
  newLevel: string;
  decision: "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";
  nextQuestion: QuestionWithScaffolds;  // Pre-selected
}

// POST /generateSessionReport
interface SessionReportRequest {
  sessionId: string;
}

interface SessionReportResponse {
  summary: {
    questionsAttempted: number;
    questionsCorrect: number;
    avgWeightedOutcome: number;
    eloChange: number;
    topicsStrong: string[];
    topicsWeak: string[];
    avgLDS: number;
    avgMCS: number;
    languageProgressNote: string;  // Generated text about L2 progress
  };
  recommendations: string[];  // e.g., "Focus on fractions vocabulary"
}
```

---

## 5.
Deployment Architecture ### 5.1 V1 Deployment (MVP) ``` ┌──────────────────────────────────────────────────────────┐ │ Firebase Project │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │ │ Firebase │ │ Cloud │ │ Cloud │ │ │ │ Hosting │ │ Firestore │ │ Functions │ │ │ │ (Next.js) │ │ (Database) │ │ (Node.js 20) │ │ │ │ │ │ │ │ │ │ │ │ Static + │ │ Student │ │ LLM calls │ │ │ │ SSR pages │ │ state, │ │ Batch gen │ │ │ │ │ │ questions, │ │ Reports │ │ │ │ │ │ sessions │ │ │ │ │ └──────────────┘ └──────────────┘ └───────┬──────────┘ │ │ │ │ └──────────────────────────────────────────────┼─────────────┘ │ HTTPS │ ▼ ┌──────────────────┐ │ Google Gemini │ │ 2.0 Flash API │ └──────────────────┘ Estimated monthly cost (100 students, 5 sessions/week): - Firebase Hosting: Free tier (~$0) - Firestore: ~$5/mo (reads/writes within free tier mostly) - Cloud Functions: ~$10/mo (invocations + compute) - Gemini API: ~$15-25/mo (scaffold generation) - Total: ~$30-40/mo ``` ### 5.2 V2 Deployment (SLM) ``` ┌──────────────────────────────────────────────────────────┐ │ Firebase Project │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │ │ Firebase │ │ Cloud │ │ Cloud │ │ │ │ Hosting │ │ Firestore │ │ Functions │ │ │ └──────────────┘ └──────────────┘ └───────┬──────────┘ │ │ │ │ └──────────────────────────────────────────────┼─────────────┘ │ ┌─────────────────┼──────────────┐ │ │ │ ▼ ▼ │ ┌──────────────────┐ ┌──────────────┐ │ │ HF Inference │ │ Gemini API │ │ │ Endpoint │ │ (fallback) │ │ │ Qwen2.5-3B │ └──────────────┘ │ │ QLoRA FT │ │ │ (T4 GPU) │ Shadow testing: │ └──────────────────┘ Both called, SLM │ response served, │ Gemini response │ logged for QA │ ─────────────────────┘ Estimated monthly cost (100 students): - Firebase: ~$15/mo (same as V1) - HF Inference Endpoint (T4, scale-to-zero): ~$50-100/mo (active only during school hours, ~8hrs/day × 20 days) - Gemini fallback: ~$5/mo (only when SLM is cold) - Total: ~$70-120/mo (but no per-token costs at 
scale) ``` ### 5.3 V3 Deployment (Scale) ``` When student count exceeds 500+, migrate to: ┌─────────────────────────────────────────────────────────────┐ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │ │ Vercel │ │ Firebase │ │ Cloud Run │ │ │ │ (Next.js) │ │ Firestore │ │ (API server) │ │ │ └──────────────┘ └──────────────┘ └───────┬──────────┘ │ │ │ │ │ ┌─────────────────────────┼──────┐ │ │ │ │ │ │ │ ▼ ▼ │ │ │ ┌──────────────────┐ ┌─────────────────┐ │ │ │ │ HF Inference EP │ │ IRT/DKT Model │ │ │ │ │ Qwen2.5-3B │ │ Server │ │ │ │ │ (Auto-scaling) │ │ (Python/FastAPI)│ │ │ │ └──────────────────┘ └─────────────────┘ │ │ │ │ │ │ + Deep Knowledge Tracing (DKT) replaces BKT │ │ │ + IRT item calibration from pooled student data │ │ │ + A/B testing framework for algorithm improvements │ │ └─────────────────────────────────────────────────────────────┘ ``` --- ## 6. Technology Stack Summary | Layer | Technology | Justification | |---|---|---| | Frontend Framework | Next.js 14+ (App Router) | SSR for SEO, React ecosystem, TypeScript | | UI Styling | Tailwind CSS + shadcn/ui | Rapid prototyping, consistent design | | Math Rendering | KaTeX | Fast client-side LaTeX rendering | | Charts | Recharts | React-native charting for dashboards | | Authentication | Firebase Auth | Google Sign-In, simple integration | | Database | Cloud Firestore | Real-time sync, offline support, serverless | | Serverless Functions | Firebase Cloud Functions (Node.js 20) | Low latency, Firebase integration | | LLM (V1) | Google Gemini 2.0 Flash | Low cost, fast, good multilingual | | SLM (V2) | Qwen2.5-3B-Instruct (QLoRA fine-tuned) | Best math+Spanish at 3B, Apache 2.0 | | SLM Hosting | HF Inference Endpoints (T4, scale-to-zero) | Cost-effective, no infra management | | Adaptive Engine | Client-side TypeScript | Zero-latency decisions, works offline | | State Management | Zustand + Firestore sync | Lightweight, persists across sessions | | Testing | Vitest + Playwright | Unit + 
E2E testing | | CI/CD | GitHub Actions | Automated testing + Firebase deploy | | Monitoring | Firebase Analytics + Crashlytics | User behavior + error tracking | --- ## 7. Security & Privacy Considerations ### 7.1 Data Protection - **COPPA Compliance**: Students are minors (ages 11-14). No personally identifiable information stored beyond email/display name. No third-party tracking. - **FERPA Alignment**: Performance data (Elo, LDS, MCS) is associated with uid only. Teachers/admins see aggregate data, never individual student identifiers. - **Data Encryption**: Firestore encrypts at rest (AES-256). All API calls over HTTPS/TLS 1.3. ### 7.2 API Security - Firebase Auth tokens required for all Cloud Function calls - Gemini/SLM API keys stored in Firebase environment secrets (never client-side) - Rate limiting on Cloud Functions to prevent abuse (max 10 scaffold generations per minute per user) ### 7.3 Content Safety - All LLM-generated scaffolds pass through a validation function checking: - Mathematical accuracy (answer matches expected) - Appropriate content (no adult/violent themes) - Language accuracy (Spanish translation verified against expected pattern) - Questions from the curated database are pre-reviewed; generated questions flagged for human review --- ## 8. Performance Targets | Metric | Target | Measurement | |---|---|---| | Time to first problem display | < 2 seconds | Lighthouse / Firebase Performance | | Adaptive decision latency | < 50ms | Client-side (no network) | | Scaffold generation (Gemini) | < 1.5 seconds | Cloud Function logs | | Scaffold generation (SLM) | < 800ms | HF Inference EP metrics | | Batch prefetch trigger → ready | < 5 seconds | 20 questions fetched at Q17 | | Offline capability | Full session | After initial batch load | | Concurrent users (V1) | 50 | Firebase free/Blaze tier | | Concurrent users (V2) | 500+ | HF auto-scaling endpoint |
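The scaffold validation pass described in §7.3 could take the following shape. This is a minimal sketch under stated assumptions: the function name `validateScaffolds`, the issue codes, and the `1e-6` numeric tolerance are all hypothetical, and the real language and content-appropriateness checks would require more than string heuristics (e.g., a moderation API or a second LLM pass).

```typescript
// Hypothetical scaffold payload, mirroring the Firestore question schema.
interface ScaffoldSet {
  L1_simplified: string;
  L2_bilingual: string;
  L3_spanish: string;
  L4_solution: string;
  answerNumeric: number;
}

// Returns a list of issue codes; an empty array means the scaffolds pass.
export function validateScaffolds(
  s: ScaffoldSet,
  expectedAnswer: number
): string[] {
  const issues: string[] = [];

  // Mathematical accuracy: generated answer must match the expected one.
  if (Math.abs(s.answerNumeric - expectedAnswer) > 1e-6) {
    issues.push("answer_mismatch");
  }

  // Completeness: every scaffold level must be non-empty.
  for (const [key, value] of Object.entries(s)) {
    if (typeof value === "string" && value.trim().length === 0) {
      issues.push(`empty_${key}`);
    }
  }

  // The L4 solution must actually state the final answer.
  if (!s.L4_solution.includes(String(expectedAnswer))) {
    issues.push("solution_missing_answer");
  }

  return issues;
}
```

Scaffolds that fail any check would be regenerated or flagged for human review rather than served to students.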