	# MathLingua — System Architecture Document

	## 1. System Overview

	MathLingua is a bilingual adaptive math tutoring application for Spanish-speaking students (grades 6–8) transitioning to English-medium mathematics education. The system presents math word problems with four scaffolded hint levels (L1–L4) and personalizes difficulty progression with a hybrid adaptive algorithm that combines Elo ratings, Bayesian Knowledge Tracing (BKT), and Thompson Sampling.

	```
	┌─────────────────────────────────────────────────────────────────────┐
	│ MathLingua System │
	│ │
	│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
	│ │ Frontend │ │ Backend │ │ External Services │ │
	│ │ (Next.js) │◄─►│ (Firebase) │◄─►│ (LLM / SLM) │ │
	│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
	│ │ │ │ │
	│ ▼ ▼ ▼ │
	│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
	│ │ Adaptive │ │ Firestore │ │ V1: Gemini API │ │
	│ │ Engine │ │ Database │ │ V2: Qwen2.5-3B SLM │ │
	│ │ (Client JS) │ │ │ │ (HF Inference EP) │ │
	│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
	└─────────────────────────────────────────────────────────────────────┘
	```

	---

	## 2. Component Architecture

	### 2.1 Frontend — React / Next.js Application

	Technology: Next.js 14+ (App Router), TypeScript, Tailwind CSS
	Hosting: Firebase Hosting or Vercel

	#### Key Pages/Routes

	| Route | Component | Purpose |
	|---|---|---|
	| `/` | `LandingPage` | Login/signup, language preference |
	| `/dashboard` | `StudentDashboard` | Progress overview, session history, MCS/LDS charts |
	| `/practice` | `PracticeSession` | Adaptive practice from question database |
	| `/solve` | `CustomProblem` | "Input your question": Gemini/SLM processes user-submitted problems |
	| `/session-report` | `SessionReport` | End-of-session summary with performance analytics |

	#### Core Frontend Components

	```
	src/
	├── components/
	│ ├── ProblemDisplay/
	│ │ ├── MathProblem.tsx # Renders word problem text
	│ │ ├── HintScaffold.tsx # L1/L2/L3/L4 progressive hint UI
	│ │ ├── AnswerInput.tsx # Numeric/expression answer entry
	│ │ └── SolutionReveal.tsx # L4 step-by-step solution display
	│ ├── Adaptive/
	│ │ ├── DifficultyIndicator.tsx # Visual current-level indicator
	│ │ ├── ProgressBar.tsx # Session progress (e.g., 7/20)
	│ │ └── SessionTimer.tsx # Time tracking per problem
	│ ├── Dashboard/
	│ │ ├── EloChart.tsx # Elo rating over time (Recharts)
	│ │ ├── TopicHeatmap.tsx # Performance by math topic
	│ │ ├── LDSMCSPanel.tsx # Language Dependency & Math Confidence
	│ │ └── StreakBadge.tsx # Gamification elements
	│ └── Shared/
	│ ├── BilingualToggle.tsx # EN/ES interface language switch
	│ ├── MathRenderer.tsx # KaTeX for math expressions
	│ └── LoadingSkeleton.tsx
	├── lib/
	│ ├── adaptive-engine.ts # Elo + BKT + Thompson Sampling (client-side)
	│ ├── feature-engineer.ts # LDS & MCS computation
	│ ├── firebase.ts # Firebase SDK initialization
	│ └── llm-client.ts # Gemini/SLM API abstraction
	├── hooks/
	│ ├── useAdaptiveSession.ts # Manages session state + engine calls
	│ ├── useStudentProfile.ts # Reads/writes Firestore student state
	│ └── useQuestionQueue.ts # Pre-fetches next batch of questions
	└── types/
	└── index.ts # TypeScript interfaces for all data structures
	```

	#### Hint Scaffold UI Flow

	```
	┌─────────────────────────────────────┐
	│ Problem displayed in original │
	│ English at student's current level │
	│ │
	│ [Try to solve] [I need a hint →] │
	└──────────────────────┬──────────────┘
	│ click
	▼
	┌─────────────────────────────────────┐
	│ L1: Simplified English │
	│ "A store has 24 apples..." │
	│ │
	│ [Got it!] [Still stuck →] │
	└──────────────────────┬──────────────┘
	│ click
	▼
	┌─────────────────────────────────────┐
	│ L2: Bilingual Keywords Inline │
	│ "A store has 24 apples (manzanas)" │
	│ "divided equally (dividido │
	│ igualmente) among 6 boxes" │
	│ │
	│ [Got it!] [Still stuck →] │
	└──────────────────────┬──────────────┘
	│ click
	▼
	┌─────────────────────────────────────┐
	│ L3: Full Spanish Translation │
	│ "Una tienda tiene 24 manzanas │
	│ divididas igualmente entre 6 │
	│ cajas. ¿Cuántas manzanas hay │
	│ en cada caja?" │
	│ │
	│ [Got it!] [Show me the answer →] │
	└──────────────────────┬──────────────┘
	│ click
	▼
	┌─────────────────────────────────────┐
	│ L4: Step-by-Step Solution │
	│ Step 1: Identify — 24 ÷ 6 │
	│ Step 2: Calculate — 24 ÷ 6 = 4 │
	│ Step 3: Answer — 4 apples per box │
	│ │
	│ [Next Problem →] │
	└─────────────────────────────────────┘
	```

	Each hint interaction is logged with a timestamp so that `escalation_speed` and `scaffold_time_ratio` can be computed for the LDS (language dependency) formula.

	---

	### 2.2 Adaptive Engine (Client-Side JavaScript)

	The adaptive engine runs entirely in the browser, so difficulty decisions need no server round-trip. This keeps feedback immediate and lets a session continue offline once the initial question batch has loaded.

	#### Engine Components

	```
	┌─────────────────────────────────────────────────┐
	│ Adaptive Engine (client-side) │
	│ │
	│ ┌─────────────┐ ┌──────────┐ ┌────────────┐ │
	│ │ Elo Rating │ │ BKT │ │ Thompson │ │
	│ │ System │ │ Engine │ │ Sampler │ │
	│ │ │ │ │ │ │ │
	│ │ Updates │ │ P(know) │ │ Beta prior │ │
	│ │ student & │ │ per │ │ per level, │ │
	│ │ question │ │ topic │ │ ZPD window │ │
	│ │ ratings │ │ │ │ │ │
	│ └──────┬──────┘ └────┬─────┘ └─────┬──────┘ │
	│ │ │ │ │
	│ ▼ ▼ ▼ │
	│ ┌───────────────────────────────────────────┐ │
	│ │ Decision Orchestrator │ │
	│ │ │ │
	│ │ Input: weighted_outcome, features │ │
	│ │ Output: next_level, decision_type │ │
	│ │ (increase/maintain/decrease) │ │
	│ └───────────────────────────────────────────┘ │
	└─────────────────────────────────────────────────┘
	```

	#### Elo Update Formula

	```
	weighted_outcome = {
	no_hint: 1.00 (solved without any scaffold)
	L1_only: 0.75 (needed simplified English)
	L2_used: 0.50 (needed bilingual keywords)
	L3_used: 0.25 (needed full translation)
	L4_used: 0.00 (needed answer reveal)
	}

	E_student = 1 / (1 + 10^((R_question - R_student) / 400))
	R_student_new = R_student + K × (weighted_outcome - E_student)

	K = 32 (default), increased to 48 for first 10 interactions (cold-start acceleration)
	```
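
The update above can be sketched in TypeScript as follows. This is a minimal illustration, not the contents of `adaptive-engine.ts`: the function names are hypothetical, and the sketch assumes that an incorrect final answer scores 0, which the outcome table does not state.

```typescript
// Outcome weight indexed by deepest hint level used (0 = no hint, 4 = solution shown),
// copied from the weighted_outcome table above.
const OUTCOME_BY_HINT_DEPTH: readonly number[] = [1.0, 0.75, 0.5, 0.25, 0.0];

// Assumption: an incorrect final answer scores 0 regardless of hints used.
function weightedOutcome(maxHintLevel: number, isCorrect: boolean): number {
  if (!isCorrect) return 0;
  const depth = Math.min(4, Math.max(0, Math.trunc(maxHintLevel)));
  return OUTCOME_BY_HINT_DEPTH[depth] ?? 0;
}

// E_student = 1 / (1 + 10^((R_question - R_student) / 400))
function expectedScore(studentElo: number, questionElo: number): number {
  return 1 / (1 + Math.pow(10, (questionElo - studentElo) / 400));
}

// K = 48 for the first 10 interactions (cold start), 32 afterwards.
// totalInteractions counts interactions completed before the current one.
function kFactor(totalInteractions: number): number {
  return totalInteractions < 10 ? 48 : 32;
}

// R_student_new = R_student + K * (weighted_outcome - E_student)
function updateStudentElo(
  studentElo: number,
  questionElo: number,
  outcome: number,
  totalInteractions: number
): number {
  const expected = expectedScore(studentElo, questionElo);
  return studentElo + kFactor(totalInteractions) * (outcome - expected);
}
```

For equal ratings the expected score is 0.5, so a hint-free correct answer moves an established student from 1000 to 1016 and a cold-start student from 1000 to 1024.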

	#### BKT Parameters (per topic)

	| Parameter | Symbol | Default | Description |
	|---|---|---|---|
	| Prior knowledge | P(L₀) | 0.10 | Initial probability student knows topic |
	| Learn rate | P(T) | 0.15 | Probability of learning per opportunity |
	| Slip | P(S) | 0.10 | Probability of incorrect despite knowing |
	| Guess | P(G) | 0.25 | Probability of correct despite not knowing |

	Slip is adjusted based on hint usage:
	```
	P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)
	```
	This models the intuition that the more scaffolds a student uses, the less a correct answer says about underlying mastery.
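
A sketch of one BKT step using the standard posterior-then-learn update with the adjusted slip. Two things here are assumptions rather than part of this document: `hint_depth_normalized` is taken to be the deepest hint level divided by 4, and the function names are illustrative.

```typescript
interface BktParams {
  pLearn: number; // P(T)
  pSlip: number;  // P(S)
  pGuess: number; // P(G)
}

// Defaults from the parameter table above.
const DEFAULT_BKT: BktParams = { pLearn: 0.15, pSlip: 0.10, pGuess: 0.25 };

// P(S)_adjusted = P(S) * (1 + 0.5 * hint_depth_normalized)
// Assumption: hint_depth_normalized = maxHintLevel / 4, clamped to [0, 1].
function adjustedSlip(pSlip: number, maxHintLevel: number): number {
  const hintDepthNormalized = Math.min(4, Math.max(0, maxHintLevel)) / 4;
  return pSlip * (1 + 0.5 * hintDepthNormalized);
}

// One BKT step: Bayesian posterior given the observation, then the learning transition.
function updateMastery(
  pKnow: number,
  isCorrect: boolean,
  maxHintLevel: number,
  params: BktParams = DEFAULT_BKT
): number {
  const slip = adjustedSlip(params.pSlip, maxHintLevel);
  const guess = params.pGuess;
  const posterior = isCorrect
    ? (pKnow * (1 - slip)) / (pKnow * (1 - slip) + (1 - pKnow) * guess)
    : (pKnow * slip) / (pKnow * slip + (1 - pKnow) * (1 - guess));
  return posterior + (1 - posterior) * params.pLearn;
}
```

Starting from the prior of 0.10, a hint-free correct answer gives a posterior of 0.09 / 0.315 ≈ 0.286, and the learning transition raises P(know) to about 0.393. A correct answer reached with more hints yields a smaller increase because the adjusted slip is larger.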

	#### Thompson Sampling with ZPD Windowing

	```
	For each candidate level l in ZPD window [current - 2, current + 3]:
	sample θ_l ~ Beta(α_l, β_l)
	score_l = θ_l × proximity_bonus(l, target_elo)

	Select level = argmax(score_l)

	proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)
	```

	The ZPD window is asymmetric (+3 upward, -2 downward) to encourage upward progression while capping how far difficulty can jump, so a student is never served a problem far beyond their current level.
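
The selection step can be sketched as below. This is an illustration under stated assumptions: levels are held in an ordered array and the window counts positions in that array, each level carries a representative Elo (`elo`), and Beta draws are built from Marsaglia-Tsang Gamma samples because JavaScript has no built-in Beta sampler. All names are hypothetical.

```typescript
type Rng = () => number; // uniform in [0, 1)

interface LevelArm {
  id: string;    // e.g. "2.1"
  elo: number;   // representative Elo for the level (assumed field)
  alpha: number; // Beta prior successes
  beta: number;  // Beta prior failures
}

// Standard normal via Box-Muller; 1 - rng() keeps the log argument above 0.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) via Marsaglia-Tsang, with the usual boost for shape < 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const t = 1 + c * x;
    if (t <= 0) continue;
    const v = t * t * t;
    const u = rng();
    if (u < 1 - 0.0331 * x * x * x * x) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// proximity_bonus(l, target) = exp(-0.5 * ((elo_l - target) / 100)^2)
function proximityBonus(levelElo: number, targetElo: number): number {
  const z = (levelElo - targetElo) / 100;
  return Math.exp(-0.5 * z * z);
}

// Draw one sample per level in [current - 2, current + 3] and return the best score.
function selectLevel(
  levels: LevelArm[],
  currentIndex: number,
  targetElo: number,
  rng: Rng = Math.random
): LevelArm {
  const lo = Math.max(0, currentIndex - 2);
  const hi = Math.min(levels.length - 1, currentIndex + 3);
  let best: LevelArm | undefined;
  let bestScore = -Infinity;
  for (const arm of levels.slice(lo, hi + 1)) {
    const score = sampleBeta(arm.alpha, arm.beta, rng) * proximityBonus(arm.elo, targetElo);
    if (score > bestScore) {
      best = arm;
      bestScore = score;
    }
  }
  if (!best) throw new Error("No candidate levels in the ZPD window");
  return best;
}
```

Passing the random source as a parameter keeps the sampler testable with a seeded generator; the default is `Math.random`.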

	#### Progression Decision Rules

	| Condition | Decision | Action |
	|---|---|---|
	| weighted_outcome ≥ 0.75 AND P(know) ≥ 0.70 | Increase | Move up 1 sub-level |
	| weighted_outcome ≥ 0.85 AND streak ≥ 3 | Skip | Move up 2 sub-levels |
	| 0.40 ≤ weighted_outcome < 0.75 | Maintain | Stay at current level |
	| weighted_outcome < 0.40 OR streak_wrong ≥ 2 | Decrease | Move down 1 sub-level |
	| weighted_outcome < 0.25 AND P(know) < 0.30 | Rapid Decrease | Move down 2 sub-levels |
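
Several of these conditions can hold at once (an outcome of 1.00 with a streak of 3 satisfies both Increase and Skip), so an implementation needs an evaluation order. The sketch below assumes the more specific rule wins (Rapid Decrease before Decrease, Skip before Increase) and that any case the table does not cover, such as a high outcome with P(know) below 0.70, falls back to Maintain. Both choices are assumptions, as are the names.

```typescript
type Decision = "skip" | "increase" | "maintain" | "decrease" | "rapid_decrease";

interface DecisionInput {
  weightedOutcome: number; // 0, 0.25, 0.5, 0.75, or 1
  pKnow: number;           // BKT P(know) for the question's topic
  streak: number;          // consecutive strong outcomes
  streakWrong: number;     // consecutive wrong answers
}

interface DecisionResult {
  decision: Decision;
  levelDelta: number; // sub-levels to move (+ up, - down)
}

// Assumed precedence: most specific rule first, Maintain as the fallback.
function decideProgression(input: DecisionInput): DecisionResult {
  const { weightedOutcome: w, pKnow, streak, streakWrong } = input;
  if (w < 0.25 && pKnow < 0.30) return { decision: "rapid_decrease", levelDelta: -2 };
  if (w < 0.40 || streakWrong >= 2) return { decision: "decrease", levelDelta: -1 };
  if (w >= 0.85 && streak >= 3) return { decision: "skip", levelDelta: 2 };
  if (w >= 0.75 && pKnow >= 0.70) return { decision: "increase", levelDelta: 1 };
  return { decision: "maintain", levelDelta: 0 };
}
```

Returning `levelDelta` alongside the label lets the caller apply the move without repeating the table.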

	---

	### 2.3 Firebase Backend

	Services Used:
	- Firebase Authentication (Google Sign-In, Email/Password)
	- Cloud Firestore (student state, question database, session logs)
	- Cloud Functions (LLM API calls, batch question generation, session reports)
	- Firebase Hosting (static frontend assets)

	#### Firestore Data Model

	```
	firestore/
	├── users/
	│ └── {uid}/
	│ ├── profile: {
	│ │ displayName, email, gradeLevel, preferredLanguage,
	│ │ createdAt, lastActive
	│ │ }
	│ ├── adaptiveState: {
	│ │ currentElo: number, // e.g., 1050
	│ │ currentLevel: string, // e.g., "2.1"
	│ │ totalInteractions: number,
	│ │ topicMastery: { // BKT P(know) per topic
	│ │ "arithmetic": 0.72,
	│ │ "fractions": 0.45,
	│ │ "algebra_basic": 0.31,
	│ │ ...
	│ │ },
	│ │ thompsonPriors: { // Beta(α,β) per level
	│ │ "1.1": { alpha: 12, beta: 3 },
	│ │ "1.2": { alpha: 8, beta: 5 },
	│ │ ...
	│ │ },
	│ │ featureAverages: {
	│ │ avgLDS: 0.42,
	│ │ avgMCS: 0.61,
	│ │ recentLDS_5: [0.3, 0.4, 0.5, 0.35, 0.45],
	│ │ recentMCS_5: [0.6, 0.65, 0.58, 0.62, 0.7]
	│ │ },
	│ │ streakCount: number,
	│ │ lastUpdated: timestamp
	│ │ }
	│ └── sessions/
	│ └── {sessionId}/
	│ ├── metadata: {
	│ │ startTime, endTime, questionsAttempted,
	│ │ questionsCorrect, avgWeightedOutcome,
	│ │ startElo, endElo, sessionLDS, sessionMCS
	│ │ }
	│ └── interactions/
	│ └── {interactionId}: {
	│ questionId, level, topic,
	│ startTime, endTime, timeSpentMs,
	│ hintsUsed: [0,1,2,3,4], // which levels accessed
	│ hintTimestamps: { L1: ts, L2: ts, ... },
	│ maxHintLevel: number,
	│ answer: string,
	│ isCorrect: boolean,
	│ attempts: number,
	│ weightedOutcome: number,
	│ lds: number,
	│ mcs: number,
	│ eloBeforeUpdate: number,
	│ eloAfterUpdate: number,
	│ adaptiveDecision: string
	│ }
	│
	├── questions/
	│ └── {questionId}: {
	│ id, level, topic, subtopic,
	│ problemText, answer, answerNumeric,
	│ solutionSteps: [...],
	│ scaffolds: {
	│ L1_simplified: string,
	│ L2_bilingual: string,
	│ L3_spanish: string,
	│ L4_solution: string
	│ },
	│ readability: {
	│ fleschKincaid: number,
	│ wordCount: number,
	│ difficultWords: number,
	│ avgSyllables: number
	│ },
	│ eloRating: number,
	│ timesServed: number,
	│ avgOutcome: number,
	│ metadata: {
	│ source: "curated" | "generated",
	│ generatedBy: "gemini-2.0" | "qwen2.5-3b" | null,
	│ reviewedBy: string | null,
	│ createdAt: timestamp
	│ }
	│ }
	│
	├── questionIndex/ // Denormalized for fast queries
	│ └── byLevel/
	│ └── {level}: {
	│ questionIds: [...],
	│ count: number
	│ }
	│
	└── analytics/ // Aggregated (Cloud Functions)
	├── dailyStats/
	│ └── {date}: { activeUsers, sessionsCompleted, ... }
	└── cohortProgress/
	└── {cohortId}: { avgElo, avgLDS, avgMCS, ... }
	```

	#### Firestore Security Rules

	```javascript
	rules_version = '2';
	service cloud.firestore {
	  match /databases/{database}/documents {
	    // Users can only read/write their own data
	    match /users/{uid}/{document=**} {
	      allow read, write: if request.auth != null && request.auth.uid == uid;
	    }
	    // Questions are readable by all authenticated users
	    match /questions/{questionId} {
	      allow read: if request.auth != null;
	      allow write: if false; // Only admin/Cloud Functions
	    }
	    // Question index readable by all authenticated users
	    match /questionIndex/{document=**} {
	      allow read: if request.auth != null;
	      allow write: if false;
	    }
	    // Analytics only accessible by admin
	    match /analytics/{document=**} {
	      allow read, write: if false; // Cloud Functions only
	    }
	  }
	}
	```

	---

	### 2.4 Cloud Functions (Serverless Backend)

	```
	functions/
	├── onUserCreate.ts # Initialize adaptive state for new user
	├── generateScaffolds.ts # Call Gemini/SLM to create L1-L4 for a problem
	├── batchGenerateQuestions.ts # Generate next 20 questions for session queue
	├── processCustomProblem.ts # "Input your question" flow
	├── generateSessionReport.ts # End-of-session analytics
	├── updateQuestionStats.ts # Update question difficulty from outcomes
	└── scheduledAnalytics.ts # Daily aggregation (cron-triggered)
	```

	#### Key Cloud Function: `generateScaffolds`

	```typescript
	// Triggered when student submits a custom problem or when
	// pre-generating scaffolds for database questions

	interface ScaffoldRequest {
	  problemText: string;
	  studentGradeLevel: number;
	  currentLDS: number; // Informs simplification level
	}

	interface ScaffoldResponse {
	  L1_simplified: string; // Simplified English
	  L2_bilingual: string;  // English with inline Spanish keywords
	  L3_spanish: string;    // Full Spanish translation
	  L4_solution: string;   // Step-by-step solution
	  answer: string;
	  answerNumeric: number;
	}

	// Prompt template for LLM
	const SCAFFOLD_PROMPT = `
	You are a bilingual math tutor helping Spanish-speaking students
	(grades 6-8) learn math in English.

	Given this math word problem:
	"{problemText}"

	Generate 4 scaffold levels:

	L1 (Simplified English): Rewrite using shorter sentences,
	simpler vocabulary (grade {adjustedGrade} reading level).
	Keep all math content identical.

	L2 (Bilingual Keywords): Take the original problem and add
	Spanish translations in parentheses for key math and context
	vocabulary. Format: "English word (palabra en español)".

	L3 (Full Spanish Translation): Translate the complete problem
	to natural, grade-appropriate Spanish. Ensure mathematical
	precision is maintained.

	L4 (Step-by-Step Solution): Provide a clear, numbered
	step-by-step solution in English with the final numerical answer.

	Return as JSON with keys: L1_simplified, L2_bilingual, L3_spanish,
	L4_solution, answer, answerNumeric.
	`;
	```
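
Two small helpers this function would plausibly need are sketched below: one fills the `{placeholder}` slots in the prompt, the other checks that the model's reply really has the six requested keys before it is stored. The helper names are hypothetical, and how `adjustedGrade` is derived from the grade level and `currentLDS` is not specified in this document, so the caller supplies it.

```typescript
interface ScaffoldResponse {
  L1_simplified: string;
  L2_bilingual: string;
  L3_spanish: string;
  L4_solution: string;
  answer: string;
  answerNumeric: number;
}

// Replace each {name} in the template; fail loudly if a value is missing.
function renderPrompt(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error("Missing prompt value: " + key);
    }
    return String(values[key]);
  });
}

// Parse and validate the model reply. Text around the JSON object is discarded,
// since models sometimes wrap JSON in extra prose or a code fence.
function parseScaffoldResponse(raw: string): ScaffoldResponse {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start < 0 || end <= start) throw new Error("No JSON object found in reply");
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== "object" || data === null) throw new Error("Expected a JSON object");
  const obj = data as Record<string, unknown>;

  const text = (key: string): string => {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error("Missing or empty field: " + key);
    }
    return value;
  };
  const answerNumeric = obj["answerNumeric"];
  if (typeof answerNumeric !== "number" || !Number.isFinite(answerNumeric)) {
    throw new Error("answerNumeric must be a finite number");
  }
  return {
    L1_simplified: text("L1_simplified"),
    L2_bilingual: text("L2_bilingual"),
    L3_spanish: text("L3_spanish"),
    L4_solution: text("L4_solution"),
    answer: text("answer"),
    answerNumeric,
  };
}
```

Rejecting malformed replies at this boundary means a bad generation surfaces as a retryable error instead of a question with a missing scaffold.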

	#### Key Cloud Function: `batchGenerateQuestions`

	```typescript
	// Called when the student reaches question 17 of 20 (prefetch trigger).
	// Selects the next 20 questions from the database based on adaptive state.
	// getAdaptiveState, thompsonSampleBatch, getRecentQuestionIds, selectQuestions,
	// ensureScaffoldsGenerated, and generateId are project helpers defined elsewhere.
	import { onCall, HttpsError } from "firebase-functions/v2/https";

	export const batchGenerateQuestions = onCall(async (request) => {
	  // request.auth is undefined for unauthenticated callers
	  if (!request.auth) {
	    throw new HttpsError("unauthenticated", "Sign-in required.");
	  }
	  const { uid } = request.auth;
	  const state = await getAdaptiveState(uid);

	  // Thompson Sampling selects level distribution for next batch
	  const levelDistribution = thompsonSampleBatch(
	    state.thompsonPriors,
	    state.currentLevel,
	    { batchSize: 20 }
	  );
	  // e.g., { "2.1": 5, "2.2": 8, "2.3": 5, "2.4": 2 }

	  // Select questions avoiding recently served ones
	  const recentIds = await getRecentQuestionIds(uid, { lookback: 100 });
	  const questions = await selectQuestions(levelDistribution, {
	    excludeIds: recentIds,
	    topicBalance: state.topicMastery, // Favor weaker topics
	  });

	  // Ensure all questions have scaffolds generated
	  const withScaffolds = await ensureScaffoldsGenerated(questions);

	  return { questions: withScaffolds, sessionBatchId: generateId() };
	});
	```

	---

	### 2.5 LLM Service Layer

	#### V1: Gemini API (Current)

	```
	┌────────────┐ HTTPS/REST ┌──────────────────┐
	│ Cloud │ ──────────────────►│ Google Gemini │
	│ Function │ ◄──────────────────│ 2.0 Flash API │
	└────────────┘ └──────────────────┘

	Cost: ~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens
	Latency: 200-800ms per scaffold generation
	Rate limit: 60 RPM (free tier), 1000 RPM (paid)
	```

	#### V2: Qwen2.5-3B SLM (Planned)

	```
	┌────────────┐ HTTPS/REST ┌──────────────────────────┐
	│ Cloud │ ──────────────────►│ HF Inference Endpoint │
	│ Function │ ◄──────────────────│ Qwen2.5-3B-Instruct │
	└────────────┘ │ (QLoRA fine-tuned) │
	│ GPU: T4 or L4 │
	└──────────────────────────┘

	Cost: ~$0.60/hr (T4) or ~$1.04/hr (L4)
	Latency: 100-400ms per scaffold generation
	Rate limit: Unlimited (dedicated endpoint)
	```

	#### LLM Client Abstraction

	```typescript
	// lib/llm-client.ts — Provider-agnostic interface

	interface LLMProvider {
	  generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
	  generateQuestion(level: string, topic: string): Promise<QuestionWithScaffolds>;
	  validateAnswer(problem: string, studentAnswer: string, correctAnswer: string): Promise<AnswerValidation>;
	}

	class GeminiProvider implements LLMProvider { ... }
	class QwenSLMProvider implements LLMProvider { ... }

	// Factory with fallback
	function createLLMClient(): LLMProvider {
	  if (config.useSLM && config.slmEndpointAvailable) {
	    return new QwenSLMProvider(config.slmEndpoint);
	  }
	  return new GeminiProvider(config.geminiApiKey);
	}
	```
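
The factory picks a provider once, at construction. If the fallback should also apply per request (for example when the SLM endpoint is cold or returns an error), a wrapper along these lines is one option. It is a sketch only: `ScaffoldProvider` is a simplified stand-in for `LLMProvider` with a single method, and `withFallback` is a hypothetical name.

```typescript
// Simplified stand-in for LLMProvider, generic so the sketch is self-contained.
interface ScaffoldProvider<Req, Res> {
  generateScaffolds(request: Req): Promise<Res>;
}

// Try the primary provider; on any error, report it and retry on the fallback.
function withFallback<Req, Res>(
  primary: ScaffoldProvider<Req, Res>,
  fallback: ScaffoldProvider<Req, Res>,
  onFallback: (error: unknown) => void = () => {}
): ScaffoldProvider<Req, Res> {
  return {
    async generateScaffolds(request: Req): Promise<Res> {
      try {
        return await primary.generateScaffolds(request);
      } catch (error) {
        onFallback(error); // e.g. log the failure for later inspection
        return fallback.generateScaffolds(request);
      }
    },
  };
}
```

Because the wrapper exposes the same interface as the providers it wraps, calling code does not need to know whether a fallback is configured.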

	---

	### 2.6 SLM Fine-Tuning Pipeline

	```
	┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
	│ Training │ │ Fine-Tune │ │ Deploy │
	│ Data Prep │───►│ QLoRA SFT │───►│ HF Inference EP │
	└──────────────┘ └──────────────┘ └──────────────────┘

	Step 1: Collect 2,000-5,000 scaffold examples from Gemini V1 usage
	Step 2: Human review + quality filter → ~1,500 gold examples
	Step 3: QLoRA fine-tune Qwen2.5-3B-Instruct
	Step 4: Evaluate on held-out test set (BLEU, math accuracy, readability)
	Step 5: Deploy to HF Inference Endpoint
	Step 6: Shadow-test alongside Gemini (serve both, compare quality)
	Step 7: Full cutover when SLM matches Gemini quality
	```

	Fine-tuning Configuration:

	| Parameter | Value | Rationale |
	|---|---|---|
	| Base model | Qwen2.5-3B-Instruct | Best math+Spanish at 3B scale |
	| Method | QLoRA (4-bit NF4) | Fits single 16GB GPU |
	| LoRA rank (r) | 32 | Balance quality/efficiency for small dataset |
	| LoRA alpha | 64 | Standard 2× rank |
	| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | Full attention + MLP |
	| Learning rate | 2e-4 | Standard for QLoRA |
	| Epochs | 3-5 | Small dataset, monitor val loss |
	| Batch size | 4 (effective 16 with grad accum) | Memory constraint |
	| Max sequence length | 1024 | Sufficient for problem + all 4 scaffolds |
	| Warmup ratio | 0.05 | Short warmup for small dataset |

	---

	## 3. Data Flow Diagrams

	### 3.1 Flow A: "Practice Problems" Mode

	```
	Student clicks "Start Practice"
	│
	▼
	┌─────────────────────────────────┐
	│ 1. Load adaptive state from │
	│ Firestore (Elo, BKT, priors) │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 2. Thompson Sampling selects │
	│ next question level │
	│ (ZPD window: current -2/+3) │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 3. Fetch question from Firestore│
	│ by level + topic balancing │
	│ (avoid recently served) │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 4. Display problem, start timer │
	│ Student reads and attempts │
	└────────────┬────────────────────┘
	│
	┌────────┴────────┐
	│ Needs hints? │
	▼ No ▼ Yes
	┌─────────┐ ┌───────────────────┐
	│ Submit │ │ L1 → L2 → L3 → L4│
	│ answer │ │ (each click logged │
	└────┬────┘ │ with timestamp) │
	│ └────────┬───────────┘
	│ │
	▼ ▼
	┌─────────────────────────────────┐
	│ 5. Compute weighted_outcome │
	│ based on correctness + hints │
	│ Compute LDS and MCS │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 6. Update Elo (student + Q) │
	│ Update BKT P(know) for topic │
	│ Update Thompson Beta priors │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 7. Progression decision: │
	│ increase / maintain / decrease│
	│ Select next level │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 8. Save interaction to Firestore│
	│ Display "Next Problem" │
	└────────────┬────────────────────┘
	│
	▼
	┌────────┴────────┐
	│ Q17 of 20? │
	▼ Yes ▼ No
	┌─────────────┐ ┌──────────┐
	│ Prefetch │ │ Loop to │
	│ next batch │ │ step 2 │
	│ (Cloud Fn) │ └──────────┘
	└─────────────┘
	│
	At Q20: ▼
	┌─────────────────────────────────┐
	│ 9. Generate session report │
	│ (Cloud Function) │
	│ Show summary to student │
	└─────────────────────────────────┘
	```

	### 3.2 Flow B: "Input Your Question" Mode

	```
	Student types/pastes a math word problem
	│
	▼
	┌─────────────────────────────────┐
	│ 1. Cloud Function: │
	│ processCustomProblem │
	│ - Validate it's a math │
	│ word problem │
	│ - Extract answer/solution │
	│ - Call Gemini/SLM to generate│
	│ L1, L2, L3, L4 scaffolds │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 2. Estimate difficulty level │
	│ using readability metrics │
	│ (FK grade, word count, etc.) │
	│ Map to nearest Elo rating │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 3. Display problem with │
	│ scaffold buttons active │
	│ (same UI as Practice mode) │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 4. Student interacts, solves │
	│ Same hint tracking as │
	│ Practice mode │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 5. Update adaptive state │
	│ (Elo, BKT, Thompson) │
	│ Log interaction │
	└────────────┬────────────────────┘
	│
	▼
	┌─────────────────────────────────┐
	│ 6. Offer: "Try another?" or │
	│ "Switch to Practice Mode" │
	│ (where engine auto-selects) │
	└─────────────────────────────────┘
	```
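
Step 2 can be illustrated with the Flesch-Kincaid grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below is a rough approximation under several assumptions: syllables are counted with a simple vowel-group heuristic, tokens without letters (such as "24") are not counted as words, and the calibration table that maps a grade to a level and Elo is entirely hypothetical. This document does not specify how the other readability metrics are weighted.

```typescript
// Heuristic syllable count: vowel groups, minus a silent trailing "e".
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.match(/[aeiouy]+/g);
  let count = groups ? groups.length : 0;
  if (w.endsWith("e") && !w.endsWith("le") && count > 1) count -= 1;
  return Math.max(1, count);
}

// Flesch-Kincaid grade level. Sentence splitting is naive, so decimals
// such as "3.5" are counted as a sentence break.
function fleschKincaidGrade(text: string): number {
  const sentences = Math.max(
    1,
    text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length
  );
  const words = text.split(/\s+/).filter((w) => /[A-Za-z]/.test(w));
  if (words.length === 0) return 0;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (syllables / words.length) - 15.59;
}

interface LevelCalibration {
  level: string;   // e.g. "2.1"
  elo: number;     // Elo assigned to the level
  fkGrade: number; // typical FK grade for the level (hypothetical values)
}

// Pick the calibration row whose FK grade is closest to the measured grade.
function nearestLevel(grade: number, table: LevelCalibration[]): LevelCalibration {
  let best: LevelCalibration | undefined;
  for (const row of table) {
    if (!best || Math.abs(row.fkGrade - grade) < Math.abs(best.fkGrade - grade)) {
      best = row;
    }
  }
  if (!best) throw new Error("Calibration table is empty");
  return best;
}
```

Very short, simple sentences can produce negative grades, so the calibration table should cover the low end of the scale.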

	---

	## 4. API Contracts

	### 4.1 Client → Cloud Functions

	```typescript
	// POST /generateScaffolds
	interface GenerateScaffoldsRequest {
	  problemText: string;
	  gradeLevel: number; // 6, 7, or 8
	  currentLDS: number; // 0.0-1.0, informs simplification
	}
	interface GenerateScaffoldsResponse {
	  scaffolds: {
	    L1_simplified: string;
	    L2_bilingual: string;
	    L3_spanish: string;
	    L4_solution: string;
	  };
	  answer: string;
	  answerNumeric: number;
	  estimatedLevel: string; // e.g., "2.3"
	  estimatedElo: number; // e.g., 1100
	  processingTimeMs: number;
	}

	// POST /batchGenerateQuestions
	interface BatchRequest {
	  batchSize: number; // default 20
	  // Auth token provides uid → adaptive state looked up server-side
	}
	interface BatchResponse {
	  questions: QuestionWithScaffolds[];
	  sessionBatchId: string;
	}

	// POST /submitInteraction
	interface InteractionSubmission {
	  sessionId: string;
	  questionId: string;
	  answer: string;
	  isCorrect: boolean;
	  timeSpentMs: number;
	  hintsUsed: number[]; // [0], [0,1], [0,1,2], etc.
	  hintTimestamps: Record<string, number>;
	  attempts: number;
	}
	interface InteractionResponse {
	  weightedOutcome: number;
	  lds: number;
	  mcs: number;
	  newElo: number;
	  newLevel: string;
	  decision: "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";
	  nextQuestion: QuestionWithScaffolds; // Pre-selected
	}

	// POST /generateSessionReport
	interface SessionReportRequest {
	  sessionId: string;
	}
	interface SessionReportResponse {
	  summary: {
	    questionsAttempted: number;
	    questionsCorrect: number;
	    avgWeightedOutcome: number;
	    eloChange: number;
	    topicsStrong: string[];
	    topicsWeak: string[];
	    avgLDS: number;
	    avgMCS: number;
	    languageProgressNote: string; // Generated text about L2 progress
	  };
	  recommendations: string[]; // e.g., "Focus on fractions vocabulary"
	}
	```

	---

	## 5. Deployment Architecture

	### 5.1 V1 Deployment (MVP)

	```
	┌──────────────────────────────────────────────────────────┐
	│ Firebase Project │
	│ │
	│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │
	│ │ Firebase │ │ Cloud │ │ Cloud │ │
	│ │ Hosting │ │ Firestore │ │ Functions │ │
	│ │ (Next.js) │ │ (Database) │ │ (Node.js 20) │ │
	│ │ │ │ │ │ │ │
	│ │ Static + │ │ Student │ │ LLM calls │ │
	│ │ SSR pages │ │ state, │ │ Batch gen │ │
	│ │ │ │ questions, │ │ Reports │ │
	│ │ │ │ sessions │ │ │ │
	│ └──────────────┘ └──────────────┘ └───────┬──────────┘ │
	│ │ │
	└──────────────────────────────────────────────┼─────────────┘
	│
	HTTPS │
	▼
	┌──────────────────┐
	│ Google Gemini │
	│ 2.0 Flash API │
	└──────────────────┘

	Estimated monthly cost (100 students, 5 sessions/week):
	- Firebase Hosting: Free tier (~$0)
	- Firestore: ~$5/mo (reads/writes within free tier mostly)
	- Cloud Functions: ~$10/mo (invocations + compute)
	- Gemini API: ~$15-25/mo (scaffold generation)
	- Total: ~$30-40/mo
	```

	### 5.2 V2 Deployment (SLM)

	```
	┌──────────────────────────────────────────────────────────┐
	│ Firebase Project │
	│ │
	│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │
	│ │ Firebase │ │ Cloud │ │ Cloud │ │
	│ │ Hosting │ │ Firestore │ │ Functions │ │
	│ └──────────────┘ └──────────────┘ └───────┬──────────┘ │
	│ │ │
	└──────────────────────────────────────────────┼─────────────┘
	│
	┌─────────────────┼──────────────┐
	│ │ │
	▼ ▼ │
	┌──────────────────┐ ┌──────────────┐ │
	│ HF Inference │ │ Gemini API │ │
	│ Endpoint │ │ (fallback) │ │
	│ Qwen2.5-3B │ └──────────────┘ │
	│ QLoRA FT │ │
	│ (T4 GPU) │ Shadow testing: │
	└──────────────────┘ Both called, SLM │
	response served, │
	Gemini response │
	logged for QA │
	─────────────────────┘

	Estimated monthly cost (100 students):
	- Firebase: ~$15/mo (same as V1)
	- HF Inference Endpoint (T4, scale-to-zero): ~$50-100/mo
	(active only during school hours, ~8hrs/day × 20 days)
	- Gemini fallback: ~$5/mo (only when SLM is cold)
	- Total: ~$70-120/mo (but no per-token costs at scale)
	```

	### 5.3 V3 Deployment (Scale)

	```
	When student count exceeds 500, migrate to:

	┌─────────────────────────────────────────────────────────────┐
	│ │
	│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
	│ │ Vercel │ │ Firebase │ │ Cloud Run │ │
	│ │ (Next.js) │ │ Firestore │ │ (API server) │ │
	│ └──────────────┘ └──────────────┘ └───────┬──────────┘ │
	│ │ │
	│ ┌─────────────────────────┼──────┐ │
	│ │ │ │ │
	│ ▼ ▼ │ │
	│ ┌──────────────────┐ ┌─────────────────┐ │ │
	│ │ HF Inference EP │ │ IRT/DKT Model │ │ │
	│ │ Qwen2.5-3B │ │ Server │ │ │
	│ │ (Auto-scaling) │ │ (Python/FastAPI)│ │ │
	│ └──────────────────┘ └─────────────────┘ │ │
	│ │ │
	│ + Deep Knowledge Tracing (DKT) replaces BKT │ │
	│ + IRT item calibration from pooled student data │ │
	│ + A/B testing framework for algorithm improvements │ │
	└─────────────────────────────────────────────────────────────┘
	```

	---

	## 6. Technology Stack Summary

	| Layer | Technology | Justification |
	|---|---|---|
	| Frontend Framework | Next.js 14+ (App Router) | SSR for SEO, React ecosystem, TypeScript |
	| UI Styling | Tailwind CSS + shadcn/ui | Rapid prototyping, consistent design |
	| Math Rendering | KaTeX | Fast client-side LaTeX rendering |
	| Charts | Recharts | React-based charting for dashboards |
	| Authentication | Firebase Auth | Google Sign-In, simple integration |
	| Database | Cloud Firestore | Real-time sync, offline support, serverless |
	| Serverless Functions | Firebase Cloud Functions (Node.js 20) | Low latency, Firebase integration |
	| LLM (V1) | Google Gemini 2.0 Flash | Low cost, fast, good multilingual |
	| SLM (V2) | Qwen2.5-3B-Instruct (QLoRA fine-tuned) | Best math+Spanish at 3B, Apache 2.0 |
	| SLM Hosting | HF Inference Endpoints (T4, scale-to-zero) | Cost-effective, no infra management |
	| Adaptive Engine | Client-side TypeScript | Zero-latency decisions, works offline |
	| State Management | Zustand + Firestore sync | Lightweight, persists across sessions |
	| Testing | Vitest + Playwright | Unit + E2E testing |
	| CI/CD | GitHub Actions | Automated testing + Firebase deploy |
	| Monitoring | Firebase Analytics + Crashlytics | User behavior + error tracking |

	---

	## 7. Security & Privacy Considerations

	### 7.1 Data Protection
	- COPPA Compliance: Students are minors (ages 11-14). No personally identifiable information stored beyond email/display name. No third-party tracking.
	- FERPA Alignment: Performance data (Elo, LDS, MCS) is associated with uid only. Teachers/admins see aggregate data, never individual student identifiers.
	- Data Encryption: Firestore encrypts at rest (AES-256). All API calls over HTTPS/TLS 1.3.

	### 7.2 API Security
	- Firebase Auth tokens required for all Cloud Function calls
	- Gemini/SLM API keys stored in Firebase environment secrets (never client-side)
	- Rate limiting on Cloud Functions to prevent abuse (max 10 scaffold generations per minute per user)
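
The per-user limit in the last item can be illustrated with a sliding-window counter. This sketch keeps timestamps in memory, which only limits calls within a single function instance; because Cloud Functions can run several instances at once, a real deployment would need a shared store. The store choice and the `createRateLimiter` name are assumptions, not part of this design.

```typescript
// Returns an allow(uid, nowMs) function that admits at most maxCalls
// per user within any windowMs-long window.
function createRateLimiter(maxCalls: number, windowMs: number) {
  const callsByUser = new Map<string, number[]>();

  return function allow(uid: string, nowMs: number): boolean {
    // Keep only timestamps still inside the window.
    const recent = (callsByUser.get(uid) ?? []).filter((t) => nowMs - t < windowMs);
    const allowed = recent.length < maxCalls;
    if (allowed) recent.push(nowMs);
    callsByUser.set(uid, recent);
    return allowed;
  };
}

// 10 scaffold generations per minute per user, as stated above.
const allowScaffoldCall = createRateLimiter(10, 60_000);
```

Taking the current time as an argument rather than reading the clock inside the function keeps the limiter deterministic under test.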

	### 7.3 Content Safety
	- All LLM-generated scaffolds pass through a validation function checking:
	  - Mathematical accuracy (answer matches expected)
	  - Appropriate content (no adult/violent themes)
	  - Language accuracy (Spanish translation verified against expected pattern)
	- Questions from the curated database are pre-reviewed; generated questions flagged for human review

	---

	## 8. Performance Targets

	| Metric | Target | Measurement |
	|---|---|---|
	| Time to first problem display | < 2 seconds | Lighthouse / Firebase Performance |
	| Adaptive decision latency | < 50ms | Client-side (no network) |
	| Scaffold generation (Gemini) | < 1.5 seconds | Cloud Function logs |
	| Scaffold generation (SLM) | < 800ms | HF Inference EP metrics |
	| Batch prefetch trigger → ready | < 5 seconds | 20 questions fetched at Q17 |
	| Offline capability | Full session | After initial batch load |
	| Concurrent users (V1) | 50 | Firebase free/Blaze tier |
	| Concurrent users (V2) | 500+ | HF auto-scaling endpoint |