MathLingua: System Architecture Document
1. System Overview
MathLingua is a bilingual adaptive math tutoring application for Spanish-speaking students (grades 6-8) transitioning to English-medium mathematics education. The system presents math word problems with four scaffolded hint levels (L1-L4) and uses a hybrid adaptive algorithm (Elo, BKT, and Thompson Sampling) to personalize difficulty progression.
┌──────────────────────────────────────────────────────────────────┐
│                        MathLingua System                         │
│                                                                  │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐  │
│  │   Frontend   │   │   Backend    │   │  External Services   │  │
│  │  (Next.js)   │◄─►│  (Firebase)  │◄─►│     (LLM / SLM)      │  │
│  └──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘  │
│         │                  │                      │              │
│         ▼                  ▼                      ▼              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐  │
│  │   Adaptive   │   │  Firestore   │   │ V1: Gemini API       │  │
│  │    Engine    │   │   Database   │   │ V2: Qwen2.5-3B SLM   │  │
│  │ (Client JS)  │   │              │   │ (HF Inference EP)    │  │
│  └──────────────┘   └──────────────┘   └──────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
2. Component Architecture
2.1 Frontend: React / Next.js Application
Technology: Next.js 14+ (App Router), TypeScript, Tailwind CSS
Hosting: Firebase Hosting or Vercel
Key Pages/Routes
| Route | Component | Purpose |
|---|---|---|
| / | LandingPage | Login/signup, language preference |
| /dashboard | StudentDashboard | Progress overview, session history, MCS/LDS charts |
| /practice | PracticeSession | Adaptive practice from question database |
| /solve | CustomProblem | "Input your question": Gemini/SLM processes user-submitted problems |
| /session-report | SessionReport | End-of-session summary with performance analytics |
Core Frontend Components
src/
├── components/
│   ├── ProblemDisplay/
│   │   ├── MathProblem.tsx          # Renders word problem text
│   │   ├── HintScaffold.tsx         # L1/L2/L3/L4 progressive hint UI
│   │   ├── AnswerInput.tsx          # Numeric/expression answer entry
│   │   └── SolutionReveal.tsx       # L4 step-by-step solution display
│   ├── Adaptive/
│   │   ├── DifficultyIndicator.tsx  # Visual current-level indicator
│   │   ├── ProgressBar.tsx          # Session progress (e.g., 7/20)
│   │   └── SessionTimer.tsx         # Time tracking per problem
│   ├── Dashboard/
│   │   ├── EloChart.tsx             # Elo rating over time (Recharts)
│   │   ├── TopicHeatmap.tsx         # Performance by math topic
│   │   ├── LDSMCSPanel.tsx          # Language Dependency & Math Confidence
│   │   └── StreakBadge.tsx          # Gamification elements
│   └── Shared/
│       ├── BilingualToggle.tsx      # EN/ES interface language switch
│       ├── MathRenderer.tsx         # KaTeX for math expressions
│       └── LoadingSkeleton.tsx
├── lib/
│   ├── adaptive-engine.ts           # Elo + BKT + Thompson Sampling (client-side)
│   ├── feature-engineer.ts          # LDS & MCS computation
│   ├── firebase.ts                  # Firebase SDK initialization
│   └── llm-client.ts                # Gemini/SLM API abstraction
├── hooks/
│   ├── useAdaptiveSession.ts        # Manages session state + engine calls
│   ├── useStudentProfile.ts         # Reads/writes Firestore student state
│   └── useQuestionQueue.ts          # Pre-fetches next batch of questions
└── types/
    └── index.ts                     # TypeScript interfaces for all data structures
Hint Scaffold UI Flow
┌─────────────────────────────────────┐
│ Problem displayed in original       │
│ English at student's current level  │
│                                     │
│ [Try to solve]  [I need a hint →]   │
└──────────────────┬──────────────────┘
                   │ click
                   ▼
┌─────────────────────────────────────┐
│ L1: Simplified English              │
│ "A store has 24 apples..."          │
│                                     │
│ [Got it!]  [Still stuck →]          │
└──────────────────┬──────────────────┘
                   │ click
                   ▼
┌─────────────────────────────────────┐
│ L2: Bilingual Keywords Inline       │
│ "A store has 24 apples (manzanas)"  │
│ "divided equally (dividido          │
│ igualmente) among 6 boxes"          │
│                                     │
│ [Got it!]  [Still stuck →]          │
└──────────────────┬──────────────────┘
                   │ click
                   ▼
┌─────────────────────────────────────┐
│ L3: Full Spanish Translation        │
│ "Una tienda tiene 24 manzanas       │
│ divididas igualmente entre 6        │
│ cajas. ¿Cuántas manzanas hay        │
│ en cada caja?"                      │
│                                     │
│ [Got it!]  [Show me the answer →]   │
└──────────────────┬──────────────────┘
                   │ click
                   ▼
┌─────────────────────────────────────┐
│ L4: Step-by-Step Solution           │
│ Step 1: Identify → 24 ÷ 6           │
│ Step 2: Calculate → 24 ÷ 6 = 4      │
│ Step 3: Answer → 4 apples per box   │
│                                     │
│ [Next Problem →]                    │
└─────────────────────────────────────┘
Each hint interaction is logged with a timestamp so that escalation_speed and scaffold_time_ratio can be computed for the LDS formula.
2.2 Adaptive Engine (Client-Side JavaScript)
The adaptive engine runs entirely in the browser, so difficulty decisions need no server round-trip. This gives instant feedback and lets a session continue offline once the initial question batch has loaded.
Engine Components
┌─────────────────────────────────────────────────┐
│          Adaptive Engine (client-side)          │
│                                                 │
│  ┌─────────────┐  ┌──────────┐  ┌────────────┐  │
│  │ Elo Rating  │  │ BKT      │  │ Thompson   │  │
│  │ System      │  │ Engine   │  │ Sampler    │  │
│  │             │  │          │  │            │  │
│  │ Updates     │  │ P(know)  │  │ Beta prior │  │
│  │ student &   │  │ per      │  │ per level, │  │
│  │ question    │  │ topic    │  │ ZPD window │  │
│  │ ratings     │  │          │  │            │  │
│  └──────┬──────┘  └────┬─────┘  └─────┬──────┘  │
│         │              │              │         │
│         ▼              ▼              ▼         │
│  ┌───────────────────────────────────────────┐  │
│  │           Decision Orchestrator           │  │
│  │                                           │  │
│  │ Input:  weighted_outcome, features        │  │
│  │ Output: next_level, decision_type         │  │
│  │         (increase/maintain/decrease)      │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
Elo Update Formula
weighted_outcome = {
no_hint: 1.00 (solved without any scaffold)
L1_only: 0.75 (needed simplified English)
L2_used: 0.50 (needed bilingual keywords)
L3_used: 0.25 (needed full translation)
L4_used: 0.00 (needed answer reveal)
}
E_student = 1 / (1 + 10^((R_question - R_student) / 400))
R_student_new = R_student + K × (weighted_outcome - E_student)
K = 32 (default), increased to 48 for first 10 interactions (cold-start acceleration)
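The update above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's `adaptive-engine.ts`: the names `HintDepth`, `expectedScore`, `kFactor`, and `updateStudentElo` are hypothetical, and the sketch assumes the answer was ultimately correct, since the formula does not say how an incorrect answer maps to `weighted_outcome`. The question-rating update is not shown because the document gives no formula for it.

```typescript
// Deepest scaffold level the student opened (0 = no hint, 4 = answer reveal).
type HintDepth = 0 | 1 | 2 | 3 | 4;

// weighted_outcome table from the formula above.
const WEIGHTED_OUTCOME: Record<HintDepth, number> = {
  0: 1.0,
  1: 0.75,
  2: 0.5,
  3: 0.25,
  4: 0.0,
};

// E_student = 1 / (1 + 10^((R_question - R_student) / 400))
function expectedScore(studentElo: number, questionElo: number): number {
  return 1 / (1 + Math.pow(10, (questionElo - studentElo) / 400));
}

// K = 48 for the first 10 interactions (cold start), 32 afterwards.
function kFactor(totalInteractions: number): number {
  return totalInteractions < 10 ? 48 : 32;
}

// R_student_new = R_student + K × (weighted_outcome - E_student)
function updateStudentElo(
  studentElo: number,
  questionElo: number,
  maxHintLevel: HintDepth,
  totalInteractions: number
): number {
  const outcome = WEIGHTED_OUTCOME[maxHintLevel];
  const expected = expectedScore(studentElo, questionElo);
  return studentElo + kFactor(totalInteractions) * (outcome - expected);
}
```

For an evenly matched student and question the expected score is 0.5, so solving with L2 help (outcome 0.50) leaves the rating unchanged, while an unaided solve raises it by half of K.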
BKT Parameters (per topic)
| Parameter | Symbol | Default | Description |
|---|---|---|---|
| Prior knowledge | P(L₀) | 0.10 | Initial probability student knows topic |
| Learn rate | P(T) | 0.15 | Probability of learning per opportunity |
| Slip | P(S) | 0.10 | Probability of incorrect despite knowing |
| Guess | P(G) | 0.25 | Probability of correct despite not knowing |
Slip is adjusted based on hint usage:
P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)
This models the intuition that a correct answer reached with more scaffolds is weaker evidence of mastery.
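A minimal sketch of how the adjusted slip could feed a standard BKT update follows. The posterior and learning-transition steps are the textbook BKT equations. The function names are hypothetical, and `hint_depth_normalized` is assumed here to be `maxHintLevel / 4`, which the document does not define.

```typescript
interface BktParams {
  pLearn: number; // P(T)
  pSlip: number;  // P(S)
  pGuess: number; // P(G)
}

// Defaults from the parameter table above.
const DEFAULT_BKT: BktParams = { pLearn: 0.15, pSlip: 0.10, pGuess: 0.25 };

// P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)
function adjustedSlip(pSlip: number, maxHintLevel: number): number {
  const hintDepthNormalized = maxHintLevel / 4; // assumption: 0 (no hint) to 1 (L4)
  return pSlip * (1 + 0.5 * hintDepthNormalized);
}

// One BKT step: Bayesian posterior given the observation, then the learning transition.
function updatePKnow(
  pKnow: number,
  isCorrect: boolean,
  maxHintLevel: number,
  params: BktParams = DEFAULT_BKT
): number {
  const slip = adjustedSlip(params.pSlip, maxHintLevel);
  const guess = params.pGuess;
  const posterior = isCorrect
    ? (pKnow * (1 - slip)) / (pKnow * (1 - slip) + (1 - pKnow) * guess)
    : (pKnow * slip) / (pKnow * slip + (1 - pKnow) * (1 - guess));
  return posterior + (1 - posterior) * params.pLearn;
}
```

Starting from the default prior of 0.10, an unaided correct answer moves P(know) to roughly 0.39, and the same correct answer after opening deeper hints moves it slightly less.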
Thompson Sampling with ZPD Windowing
For each candidate level l in ZPD window [current - 2, current + 3]:
  sample θ_l ~ Beta(α_l, β_l)
  score_l = θ_l × proximity_bonus(l, target_elo)
Select level = argmax(score_l)
proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)
ZPD window is asymmetric (+3 upward, -2 downward) to encourage upward progression while preventing catastrophic failure.
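The selection step can be sketched as below. Several pieces are assumptions made for illustration: levels are held in an ordered array so the ZPD window becomes an index range, each level carries its own Elo value (`elo`), and Beta draws use a Marsaglia-Tsang gamma sampler, which the document does not specify. All names are hypothetical.

```typescript
interface LevelArm {
  level: string; // e.g., "2.1"
  elo: number;   // assumed Elo value representing this level
  alpha: number; // Beta prior α
  beta: number;  // Beta prior β
}

// Standard normal draw via Box-Muller; 1 - rng() keeps the log argument in (0, 1].
function sampleStandardNormal(rng: () => number): number {
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
function sampleGamma(shape: number, rng: () => number): number {
  if (shape < 1) {
    // Boost small shapes: Gamma(a) = Gamma(a + 1) × U^(1/a)
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleStandardNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(α, β) draw as a ratio of two gamma draws.
function sampleBeta(alpha: number, beta: number, rng: () => number): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)
function proximityBonus(levelElo: number, targetElo: number): number {
  return Math.exp(-0.5 * ((levelElo - targetElo) / 100) ** 2);
}

// Pick the arm with the highest sampled score inside [current - 2, current + 3].
function selectLevel(
  arms: LevelArm[],
  currentIndex: number,
  targetElo: number,
  rng: () => number = Math.random
): LevelArm {
  const lo = Math.max(0, currentIndex - 2);
  const hi = Math.min(arms.length - 1, currentIndex + 3);
  let best = arms[lo];
  let bestScore = -Infinity;
  for (let i = lo; i <= hi; i++) {
    const theta = sampleBeta(arms[i].alpha, arms[i].beta, rng);
    const score = theta * proximityBonus(arms[i].elo, targetElo);
    if (score > bestScore) {
      bestScore = score;
      best = arms[i];
    }
  }
  return best;
}
```

Passing `rng` as a parameter keeps the sampler testable with a seeded generator; in the browser the default `Math.random` is enough.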
Progression Decision Rules
| Condition | Decision | Action |
|---|---|---|
| weighted_outcome ≥ 0.75 AND P(know) ≥ 0.70 | Increase | Move up 1 sub-level |
| weighted_outcome ≥ 0.85 AND streak ≥ 3 | Skip | Move up 2 sub-levels |
| 0.40 ≤ weighted_outcome < 0.75 | Maintain | Stay at current level |
| weighted_outcome < 0.40 OR streak_wrong ≥ 2 | Decrease | Move down 1 sub-level |
| weighted_outcome < 0.25 AND P(know) < 0.30 | Rapid Decrease | Move down 2 sub-levels |
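Several rows of the table overlap (for example, every Skip case with P(know) ≥ 0.70 also satisfies Increase), so an implementation has to choose an evaluation order. The sketch below assumes the more specific rule wins: Rapid Decrease before Decrease, and Skip before Increase. It also assumes that a case the table leaves uncovered (such as a high outcome with P(know) below 0.70 and no streak) falls back to Maintain. Both choices, and the names used, are illustrative rather than taken from the source.

```typescript
type Decision = "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";

interface DecisionInput {
  weightedOutcome: number; // 0.0 to 1.0
  pKnow: number;           // BKT P(know) for the current topic
  streak: number;          // consecutive successes
  streakWrong: number;     // consecutive failures
}

function decideProgression(input: DecisionInput): Decision {
  const { weightedOutcome, pKnow, streak, streakWrong } = input;
  // Assumed precedence: most specific downward rule first.
  if (weightedOutcome < 0.25 && pKnow < 0.3) return "rapid_decrease";
  if (weightedOutcome < 0.4 || streakWrong >= 2) return "decrease";
  // Assumed precedence: Skip is checked before the broader Increase rule.
  if (weightedOutcome >= 0.85 && streak >= 3) return "skip";
  if (weightedOutcome >= 0.75 && pKnow >= 0.7) return "increase";
  // Covers 0.40 ≤ outcome < 0.75 and any combination the table does not list.
  return "maintain";
}
```

The returned strings match the `decision` union used by `InteractionResponse` in the API contracts.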
2.3 Firebase Backend
Services Used:
- Firebase Authentication (Google Sign-In, Email/Password)
- Cloud Firestore (student state, question database, session logs)
- Cloud Functions (LLM API calls, batch question generation, session reports)
- Firebase Hosting (static frontend assets)
Firestore Data Model
firestore/
├── users/
│   └── {uid}/
│       ├── profile: {
│       │     displayName, email, gradeLevel, preferredLanguage,
│       │     createdAt, lastActive
│       │   }
│       ├── adaptiveState: {
│       │     currentElo: number,            // e.g., 1050
│       │     currentLevel: string,          // e.g., "2.1"
│       │     totalInteractions: number,
│       │     topicMastery: {                // BKT P(know) per topic
│       │       "arithmetic": 0.72,
│       │       "fractions": 0.45,
│       │       "algebra_basic": 0.31,
│       │       ...
│       │     },
│       │     thompsonPriors: {              // Beta(α,β) per level
│       │       "1.1": { alpha: 12, beta: 3 },
│       │       "1.2": { alpha: 8, beta: 5 },
│       │       ...
│       │     },
│       │     featureAverages: {
│       │       avgLDS: 0.42,
│       │       avgMCS: 0.61,
│       │       recentLDS_5: [0.3, 0.4, 0.5, 0.35, 0.45],
│       │       recentMCS_5: [0.6, 0.65, 0.58, 0.62, 0.7]
│       │     },
│       │     streakCount: number,
│       │     lastUpdated: timestamp
│       │   }
│       └── sessions/
│           └── {sessionId}/
│               ├── metadata: {
│               │     startTime, endTime, questionsAttempted,
│               │     questionsCorrect, avgWeightedOutcome,
│               │     startElo, endElo, sessionLDS, sessionMCS
│               │   }
│               └── interactions/
│                   └── {interactionId}: {
│                         questionId, level, topic,
│                         startTime, endTime, timeSpentMs,
│                         hintsUsed: [0,1,2,3,4],   // which levels accessed
│                         hintTimestamps: { L1: ts, L2: ts, ... },
│                         maxHintLevel: number,
│                         answer: string,
│                         isCorrect: boolean,
│                         attempts: number,
│                         weightedOutcome: number,
│                         lds: number,
│                         mcs: number,
│                         eloBeforeUpdate: number,
│                         eloAfterUpdate: number,
│                         adaptiveDecision: string
│                       }
│
├── questions/
│   └── {questionId}: {
│         id, level, topic, subtopic,
│         problemText, answer, answerNumeric,
│         solutionSteps: [...],
│         scaffolds: {
│           L1_simplified: string,
│           L2_bilingual: string,
│           L3_spanish: string,
│           L4_solution: string
│         },
│         readability: {
│           fleschKincaid: number,
│           wordCount: number,
│           difficultWords: number,
│           avgSyllables: number
│         },
│         eloRating: number,
│         timesServed: number,
│         avgOutcome: number,
│         metadata: {
│           source: "curated" | "generated",
│           generatedBy: "gemini-2.0" | "qwen2.5-3b" | null,
│           reviewedBy: string | null,
│           createdAt: timestamp
│         }
│       }
│
├── questionIndex/                   // Denormalized for fast queries
│   └── byLevel/
│       └── {level}: {
│             questionIds: [...],
│             count: number
│           }
│
└── analytics/                       // Aggregated (Cloud Functions)
    ├── dailyStats/
    │   └── {date}: { activeUsers, sessionsCompleted, ... }
    └── cohortProgress/
        └── {cohortId}: { avgElo, avgLDS, avgMCS, ... }
Firestore Security Rules
rules_version = '2';
service cloud.firestore {
match /databases/{database}/documents {
// Users can only read/write their own data
match /users/{uid}/{document=**} {
allow read, write: if request.auth != null && request.auth.uid == uid;
}
// Questions are readable by all authenticated users
match /questions/{questionId} {
allow read: if request.auth != null;
allow write: if false; // Only admin/Cloud Functions
}
// Question index readable by all authenticated users
match /questionIndex/{document=**} {
allow read: if request.auth != null;
allow write: if false;
}
// Analytics only accessible by admin
match /analytics/{document=**} {
allow read, write: if false; // Cloud Functions only
}
}
}
2.4 Cloud Functions (Serverless Backend)
functions/
├── onUserCreate.ts              # Initialize adaptive state for new user
├── generateScaffolds.ts         # Call Gemini/SLM to create L1-L4 for a problem
├── batchGenerateQuestions.ts    # Generate next 20 questions for session queue
├── processCustomProblem.ts      # "Input your question" flow
├── generateSessionReport.ts     # End-of-session analytics
├── updateQuestionStats.ts       # Update question difficulty from outcomes
└── scheduledAnalytics.ts        # Daily aggregation (cron-triggered)
Key Cloud Function: generateScaffolds
// Triggered when student submits a custom problem or when
// pre-generating scaffolds for database questions
interface ScaffoldRequest {
problemText: string;
studentGradeLevel: number;
currentLDS: number; // Informs simplification level
}
interface ScaffoldResponse {
L1_simplified: string; // Simplified English
L2_bilingual: string; // English with inline Spanish keywords
L3_spanish: string; // Full Spanish translation
L4_solution: string; // Step-by-step solution
answer: string;
answerNumeric: number;
}
// Prompt template for LLM
const SCAFFOLD_PROMPT = `
You are a bilingual math tutor helping Spanish-speaking students
(grades 6-8) learn math in English.
Given this math word problem:
"{problemText}"
Generate 4 scaffold levels:
**L1 (Simplified English):** Rewrite using shorter sentences,
simpler vocabulary (grade {adjustedGrade} reading level).
Keep all math content identical.
**L2 (Bilingual Keywords):** Take the original problem and add
Spanish translations in parentheses for key math and context
vocabulary. Format: "English word (palabra en español)".
**L3 (Full Spanish Translation):** Translate the complete problem
to natural, grade-appropriate Spanish. Ensure mathematical
precision is maintained.
**L4 (Step-by-Step Solution):** Provide a clear, numbered
step-by-step solution in English with the final numerical answer.
Return as JSON with keys: L1_simplified, L2_bilingual, L3_spanish,
L4_solution, answer, answerNumeric.
`;
Key Cloud Function: batchGenerateQuestions
import { onCall, HttpsError } from "firebase-functions/v2/https";

// Called when the student reaches question 17 of 20 (prefetch trigger).
// Selects the next 20 questions from the database based on adaptive state.
// Helpers take options objects because TypeScript has no named arguments.
export const batchGenerateQuestions = onCall(async (request) => {
  // request.auth is undefined when the caller is not signed in
  if (!request.auth) {
    throw new HttpsError("unauthenticated", "Sign-in required");
  }
  const { uid } = request.auth;
  const state = await getAdaptiveState(uid);

  // Thompson Sampling selects the level distribution for the next batch
  const levelDistribution = thompsonSampleBatch(
    state.thompsonPriors,
    state.currentLevel,
    { batchSize: 20 }
  );
  // e.g., { "2.1": 5, "2.2": 8, "2.3": 5, "2.4": 2 }

  // Select questions, avoiding recently served ones
  const recentIds = await getRecentQuestionIds(uid, { lookback: 100 });
  const questions = await selectQuestions(levelDistribution, {
    excludeIds: recentIds,
    topicBalance: state.topicMastery, // Favor weaker topics
  });

  // Ensure all questions have scaffolds generated
  const withScaffolds = await ensureScaffoldsGenerated(questions);
  return { questions: withScaffolds, sessionBatchId: generateId() };
});
2.5 LLM Service Layer
V1: Gemini API (Current)
┌────────────┐     HTTPS/REST      ┌──────────────────┐
│   Cloud    │ ──────────────────► │  Google Gemini   │
│  Function  │ ◄────────────────── │  2.0 Flash API   │
└────────────┘                     └──────────────────┘
Cost: ~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens
Latency: 200-800ms per scaffold generation
Rate limit: 60 RPM (free tier), 1000 RPM (paid)
V2: Qwen2.5-3B SLM (Planned)
┌────────────┐     HTTPS/REST      ┌──────────────────────────┐
│   Cloud    │ ──────────────────► │  HF Inference Endpoint   │
│  Function  │ ◄────────────────── │  Qwen2.5-3B-Instruct     │
└────────────┘                     │  (QLoRA fine-tuned)      │
                                   │  GPU: T4 or L4           │
                                   └──────────────────────────┘
Cost: ~$0.60/hr (T4) or ~$1.04/hr (L4)
Latency: 100-400ms per scaffold generation
Rate limit: Unlimited (dedicated endpoint)
LLM Client Abstraction
// lib/llm-client.ts: provider-agnostic interface
interface LLMProvider {
generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
generateQuestion(level: string, topic: string): Promise<QuestionWithScaffolds>;
validateAnswer(problem: string, studentAnswer: string, correctAnswer: string): Promise<AnswerValidation>;
}
class GeminiProvider implements LLMProvider { ... }
class QwenSLMProvider implements LLMProvider { ... }
// Factory with fallback
function createLLMClient(): LLMProvider {
if (config.useSLM && config.slmEndpointAvailable) {
return new QwenSLMProvider(config.slmEndpoint);
}
return new GeminiProvider(config.geminiApiKey);
}
2.6 SLM Fine-Tuning Pipeline
┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│  Training    │    │  Fine-Tune   │    │  Deploy          │
│  Data Prep   │───►│  QLoRA SFT   │───►│  HF Inference EP │
└──────────────┘    └──────────────┘    └──────────────────┘
Step 1: Collect 2,000-5,000 scaffold examples from Gemini V1 usage
Step 2: Human review + quality filter → ~1,500 gold examples
Step 3: QLoRA fine-tune Qwen2.5-3B-Instruct
Step 4: Evaluate on held-out test set (BLEU, math accuracy, readability)
Step 5: Deploy to HF Inference Endpoint
Step 6: Shadow-test alongside Gemini (serve both, compare quality)
Step 7: Full cutover when SLM matches Gemini quality
Fine-tuning Configuration:
| Parameter | Value | Rationale |
|---|---|---|
| Base model | Qwen2.5-3B-Instruct | Best math+Spanish at 3B scale |
| Method | QLoRA (4-bit NF4) | Fits single 16GB GPU |
| LoRA rank (r) | 32 | Balance quality/efficiency for small dataset |
| LoRA alpha | 64 | Standard 2× rank |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | Full attention + MLP |
| Learning rate | 2e-4 | Standard for QLoRA |
| Epochs | 3-5 | Small dataset, monitor val loss |
| Batch size | 4 (effective 16 with grad accum) | Memory constraint |
| Max sequence length | 1024 | Sufficient for problem + all 4 scaffolds |
| Warmup ratio | 0.05 | Short warmup for small dataset |
3. Data Flow Diagrams
3.1 Flow A: "Practice Problems" Mode
Student clicks "Start Practice"
             │
             ▼
┌─────────────────────────────────┐
│ 1. Load adaptive state from     │
│    Firestore (Elo, BKT, priors) │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Thompson Sampling selects    │
│    next question level          │
│    (ZPD window: current -2/+3)  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Fetch question from Firestore│
│    by level + topic balancing   │
│    (avoid recently served)      │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Display problem, start timer │
│    Student reads and attempts   │
└────────────┬────────────────────┘
             │
    ┌────────┴────────┐
    │  Needs hints?   │
    ▼ No              ▼ Yes
┌───────┐   ┌────────────────────┐
│ Submit│   │ L1 → L2 → L3 → L4  │
│ answer│   │ (each click logged │
└───┬───┘   │  with timestamp)   │
    │       └─────────┬──────────┘
    │                 │
    ▼                 ▼
┌─────────────────────────────────┐
│ 5. Compute weighted_outcome     │
│    based on correctness + hints │
│    Compute LDS and MCS          │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Update Elo (student + Q)     │
│    Update BKT P(know) for topic │
│    Update Thompson Beta priors  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 7. Progression decision:        │
│   increase / maintain / decrease│
│    Select next level            │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 8. Save interaction to Firestore│
│    Display "Next Problem"       │
└────────────┬────────────────────┘
             │
             ▼
    ┌────────┴────────┐
    │   Q17 of 20?    │
    ▼ Yes             ▼ No
┌─────────────┐   ┌──────────┐
│ Prefetch    │   │ Loop to  │
│ next batch  │   │ step 2   │
│ (Cloud Fn)  │   └──────────┘
└─────────────┘
       │
At Q20:▼
┌─────────────────────────────────┐
│ 9. Generate session report      │
│    (Cloud Function)             │
│    Show summary to student      │
└─────────────────────────────────┘
3.2 Flow B: "Input Your Question" Mode
Student types/pastes a math word problem
             │
             ▼
┌─────────────────────────────────┐
│ 1. Cloud Function:              │
│    processCustomProblem         │
│    - Validate it's a math       │
│      word problem               │
│    - Extract answer/solution    │
│    - Call Gemini/SLM to generate│
│      L1, L2, L3, L4 scaffolds   │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Estimate difficulty level    │
│    using readability metrics    │
│    (FK grade, word count, etc.) │
│    Map to nearest Elo rating    │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Display problem with         │
│    scaffold buttons active      │
│    (same UI as Practice mode)   │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Student interacts, solves    │
│    Same hint tracking as        │
│    Practice mode                │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 5. Update adaptive state        │
│    (Elo, BKT, Thompson)         │
│    Log interaction              │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Offer: "Try another?" or     │
│    "Switch to Practice Mode"    │
│    (where engine auto-selects)  │
└─────────────────────────────────┘
4. API Contracts
4.1 Client → Cloud Functions
// POST /generateScaffolds
interface GenerateScaffoldsRequest {
problemText: string;
gradeLevel: number; // 6, 7, or 8
currentLDS: number; // 0.0-1.0, informs simplification
}
interface GenerateScaffoldsResponse {
scaffolds: {
L1_simplified: string;
L2_bilingual: string;
L3_spanish: string;
L4_solution: string;
};
answer: string;
answerNumeric: number;
estimatedLevel: string; // e.g., "2.3"
estimatedElo: number; // e.g., 1100
processingTimeMs: number;
}
// POST /batchGenerateQuestions
interface BatchRequest {
batchSize: number; // default 20
// Auth token provides uid; adaptive state is looked up server-side
}
interface BatchResponse {
questions: QuestionWithScaffolds[];
sessionBatchId: string;
}
// POST /submitInteraction
interface InteractionSubmission {
sessionId: string;
questionId: string;
answer: string;
isCorrect: boolean;
timeSpentMs: number;
hintsUsed: number[]; // [0], [0,1], [0,1,2], etc.
hintTimestamps: Record<string, number>;
attempts: number;
}
interface InteractionResponse {
weightedOutcome: number;
lds: number;
mcs: number;
newElo: number;
newLevel: string;
decision: "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";
nextQuestion: QuestionWithScaffolds; // Pre-selected
}
// POST /generateSessionReport
interface SessionReportRequest {
sessionId: string;
}
interface SessionReportResponse {
summary: {
questionsAttempted: number;
questionsCorrect: number;
avgWeightedOutcome: number;
eloChange: number;
topicsStrong: string[];
topicsWeak: string[];
avgLDS: number;
avgMCS: number;
languageProgressNote: string; // Generated text about L2 progress
};
recommendations: string[]; // e.g., "Focus on fractions vocabulary"
}
5. Deployment Architecture
5.1 V1 Deployment (MVP)
┌──────────────────────────────────────────────────────────┐
│                     Firebase Project                     │
│                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │ Firebase    │  │ Cloud       │  │ Cloud            │  │
│  │ Hosting     │  │ Firestore   │  │ Functions        │  │
│  │ (Next.js)   │  │ (Database)  │  │ (Node.js 20)     │  │
│  │             │  │             │  │                  │  │
│  │ Static +    │  │ Student     │  │ LLM calls        │  │
│  │ SSR pages   │  │ state,      │  │ Batch gen        │  │
│  │             │  │ questions,  │  │ Reports          │  │
│  │             │  │ sessions    │  │                  │  │
│  └─────────────┘  └─────────────┘  └────────┬─────────┘  │
│                                             │            │
└─────────────────────────────────────────────┼────────────┘
                                              │
                                        HTTPS │
                                              ▼
                                     ┌──────────────────┐
                                     │  Google Gemini   │
                                     │  2.0 Flash API   │
                                     └──────────────────┘
Estimated monthly cost (100 students, 5 sessions/week):
- Firebase Hosting: Free tier (~$0)
- Firestore: ~$5/mo (reads/writes within free tier mostly)
- Cloud Functions: ~$10/mo (invocations + compute)
- Gemini API: ~$15-25/mo (scaffold generation)
- Total: ~$30-40/mo
5.2 V2 Deployment (SLM)
┌──────────────────────────────────────────────────────────┐
│                     Firebase Project                     │
│                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │ Firebase    │  │ Cloud       │  │ Cloud            │  │
│  │ Hosting     │  │ Firestore   │  │ Functions        │  │
│  └─────────────┘  └─────────────┘  └────────┬─────────┘  │
│                                             │            │
└─────────────────────────────────────────────┼────────────┘
                                              │
                    ┌─────────────────────────┼──────────────┐
                    │                         │              │
                    ▼                         ▼              │
           ┌──────────────────┐        ┌──────────────┐      │
           │ HF Inference     │        │ Gemini API   │      │
           │ Endpoint         │        │ (fallback)   │      │
           │ Qwen2.5-3B       │        └──────────────┘      │
           │ QLoRA FT         │                              │
           │ (T4 GPU)         │  Shadow testing:             │
           └──────────────────┘  Both called, SLM            │
                                 response served,            │
                                 Gemini response             │
                                 logged for QA               │
                                 ────────────────────────────┘
Estimated monthly cost (100 students):
- Firebase: ~$15/mo (same as V1)
- HF Inference Endpoint (T4, scale-to-zero): ~$50-100/mo
  (active only during school hours, ~8 hrs/day × 20 days)
- Gemini fallback: ~$5/mo (only when SLM is cold)
- Total: ~$70-120/mo (but no per-token costs at scale)
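As a rough check on these figures, the upper end of the endpoint estimate follows from the T4 rate quoted in section 2.5, assuming the endpoint is billed for the full 8 hours on each of 20 school days:

```latex
\begin{align*}
\text{Endpoint (T4)} &\approx \$0.60/\text{hr} \times 8\,\text{hr/day} \times 20\,\text{days/mo} = \$96/\text{mo} \\
\text{Total} &\approx \$15 + \$96 + \$5 = \$116/\text{mo}
\end{align*}
```

Both values sit at the top of the quoted ranges; the lower ends presumably reflect scale-to-zero idle time within school hours.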
5.3 V3 Deployment (Scale)
When the student count exceeds 500, migrate to:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │  Vercel      │  │  Firebase    │  │  Cloud Run       │   │
│  │  (Next.js)   │  │  Firestore   │  │  (API server)    │   │
│  └──────────────┘  └──────────────┘  └────────┬─────────┘   │
│                                               │             │
│             ┌─────────────────────────────────┼──────────┐  │
│             │                                 │          │  │
│             ▼                                 ▼          │  │
│    ┌──────────────────┐              ┌─────────────────┐ │  │
│    │ HF Inference EP  │              │ IRT/DKT Model   │ │  │
│    │ Qwen2.5-3B       │              │ Server          │ │  │
│    │ (Auto-scaling)   │              │ (Python/FastAPI)│ │  │
│    └──────────────────┘              └─────────────────┘ │  │
│                                                          │  │
│  + Deep Knowledge Tracing (DKT) replaces BKT             │  │
│  + IRT item calibration from pooled student data         │  │
│  + A/B testing framework for algorithm improvements      │  │
└──────────────────────────────────────────────────────────┴──┘
6. Technology Stack Summary
| Layer | Technology | Justification |
|---|---|---|
| Frontend Framework | Next.js 14+ (App Router) | SSR for SEO, React ecosystem, TypeScript |
| UI Styling | Tailwind CSS + shadcn/ui | Rapid prototyping, consistent design |
| Math Rendering | KaTeX | Fast client-side LaTeX rendering |
| Charts | Recharts | Charting library built on React components, for dashboards |
| Authentication | Firebase Auth | Google Sign-In, simple integration |
| Database | Cloud Firestore | Real-time sync, offline support, serverless |
| Serverless Functions | Firebase Cloud Functions (Node.js 20) | Low latency, Firebase integration |
| LLM (V1) | Google Gemini 2.0 Flash | Low cost, fast, good multilingual |
| SLM (V2) | Qwen2.5-3B-Instruct (QLoRA fine-tuned) | Best math+Spanish at 3B, Apache 2.0 |
| SLM Hosting | HF Inference Endpoints (T4, scale-to-zero) | Cost-effective, no infra management |
| Adaptive Engine | Client-side TypeScript | Zero-latency decisions, works offline |
| State Management | Zustand + Firestore sync | Lightweight, persists across sessions |
| Testing | Vitest + Playwright | Unit + E2E testing |
| CI/CD | GitHub Actions | Automated testing + Firebase deploy |
| Monitoring | Firebase Analytics + Crashlytics | User behavior + error tracking |
7. Security & Privacy Considerations
7.1 Data Protection
- COPPA Compliance: Students are minors (ages 11-14). No personally identifiable information stored beyond email/display name. No third-party tracking.
- FERPA Alignment: Performance data (Elo, LDS, MCS) is associated with uid only. Teachers/admins see aggregate data, never individual student identifiers.
- Data Encryption: Firestore encrypts at rest (AES-256). All API calls over HTTPS/TLS 1.3.
7.2 API Security
- Firebase Auth tokens required for all Cloud Function calls
- Gemini/SLM API keys stored in Firebase environment secrets (never client-side)
- Rate limiting on Cloud Functions to prevent abuse (max 10 scaffold generations per minute per user)
7.3 Content Safety
- All LLM-generated scaffolds pass through a validation function checking:
- Mathematical accuracy (answer matches expected)
- Appropriate content (no adult/violent themes)
- Language accuracy (Spanish translation verified against expected pattern)
- Questions from the curated database are pre-reviewed; generated questions flagged for human review
8. Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| Time to first problem display | < 2 seconds | Lighthouse / Firebase Performance |
| Adaptive decision latency | < 50ms | Client-side (no network) |
| Scaffold generation (Gemini) | < 1.5 seconds | Cloud Function logs |
| Scaffold generation (SLM) | < 800ms | HF Inference EP metrics |
| Batch prefetch trigger → ready | < 5 seconds | 20 questions fetched at Q17 |
| Offline capability | Full session | After initial batch load |
| Concurrent users (V1) | 50 | Firebase free/Blaze tier |
| Concurrent users (V2) | 500+ | HF auto-scaling endpoint |