MathLingua — System Architecture Document

1. System Overview

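MathLingua is a bilingual adaptive math tutoring application for Spanish-speaking students (grades 6–8) transitioning to English-medium mathematics education. Each math word problem comes with four scaffolded hint levels (simplified English, bilingual keywords, full Spanish translation, step-by-step solution). A hybrid adaptive algorithm combining Elo ratings, Bayesian Knowledge Tracing (BKT), and Thompson Sampling personalizes difficulty progression.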

┌─────────────────────────────────────────────────────────────────────┐
│                        MathLingua System                            │
│                                                                     │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐    │
│  │   Frontend    │   │   Backend    │   │   External Services  │    │
│  │  (Next.js)   │◄─►│  (Firebase)  │◄─►│  (LLM / SLM)        │    │
│  └──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘    │
│         │                  │                       │                │
│         ▼                  ▼                       ▼                │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐    │
│  │  Adaptive    │   │  Firestore   │   │  V1: Gemini API      │    │
│  │  Engine      │   │  Database    │   │  V2: Qwen2.5-3B SLM  │    │
│  │  (Client JS) │   │              │   │  (HF Inference EP)   │    │
│  └──────────────┘   └──────────────┘   └──────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘

2. Component Architecture

2.1 Frontend — React / Next.js Application

Technology: Next.js 14+ (App Router), TypeScript, Tailwind CSS
Hosting: Firebase Hosting or Vercel

Key Pages/Routes

Route	Component	Purpose
`/`	`LandingPage`	Login/signup, language preference
`/dashboard`	`StudentDashboard`	Progress overview, session history, MCS/LDS charts
`/practice`	`PracticeSession`	Adaptive practice from question database
`/solve`	`CustomProblem`	"Input your question" — Gemini/SLM processes user-submitted problems
`/session-report`	`SessionReport`	End-of-session summary with performance analytics

Core Frontend Components

src/
├── components/
│   ├── ProblemDisplay/
│   │   ├── MathProblem.tsx          # Renders word problem text
│   │   ├── HintScaffold.tsx         # L1/L2/L3/L4 progressive hint UI
│   │   ├── AnswerInput.tsx          # Numeric/expression answer entry
│   │   └── SolutionReveal.tsx       # L4 step-by-step solution display
│   ├── Adaptive/
│   │   ├── DifficultyIndicator.tsx  # Visual current-level indicator
│   │   ├── ProgressBar.tsx          # Session progress (e.g., 7/20)
│   │   └── SessionTimer.tsx         # Time tracking per problem
│   ├── Dashboard/
│   │   ├── EloChart.tsx             # Elo rating over time (Recharts)
│   │   ├── TopicHeatmap.tsx         # Performance by math topic
│   │   ├── LDSMCSPanel.tsx          # Language Dependency & Math Confidence
│   │   └── StreakBadge.tsx          # Gamification elements
│   └── Shared/
│       ├── BilingualToggle.tsx      # EN/ES interface language switch
│       ├── MathRenderer.tsx         # KaTeX for math expressions
│       └── LoadingSkeleton.tsx
├── lib/
│   ├── adaptive-engine.ts           # Elo + BKT + Thompson Sampling (client-side)
│   ├── feature-engineer.ts          # LDS & MCS computation
│   ├── firebase.ts                  # Firebase SDK initialization
│   └── llm-client.ts               # Gemini/SLM API abstraction
├── hooks/
│   ├── useAdaptiveSession.ts        # Manages session state + engine calls
│   ├── useStudentProfile.ts         # Reads/writes Firestore student state
│   └── useQuestionQueue.ts          # Pre-fetches next batch of questions
└── types/
    └── index.ts                     # TypeScript interfaces for all data structures

Hint Scaffold UI Flow

┌─────────────────────────────────────┐
│  Problem displayed in original      │
│  English at student's current level │
│                                     │
│  [Try to solve]  [I need a hint →]  │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L1: Simplified English             │
│  "A store has 24 apples..."         │
│                                     │
│  [Got it!]  [Still stuck →]         │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L2: Bilingual Keywords Inline      │
│  "A store has 24 apples (manzanas)" │
│  "divided equally (dividido         │
│   igualmente) among 6 boxes"        │
│                                     │
│  [Got it!]  [Still stuck →]         │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L3: Full Spanish Translation       │
│  "Una tienda tiene 24 manzanas      │
│   divididas igualmente entre 6      │
│   cajas. ¿Cuántas manzanas hay      │
│   en cada caja?"                    │
│                                     │
│  [Got it!]  [Show me the answer →]  │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L4: Step-by-Step Solution          │
│  Step 1: Identify — 24 ÷ 6         │
│  Step 2: Calculate — 24 ÷ 6 = 4    │
│  Step 3: Answer — 4 apples per box  │
│                                     │
│  [Next Problem →]                   │
└─────────────────────────────────────┘

Each hint interaction is logged with a timestamp so that escalation_speed and scaffold_time_ratio can be computed for the LDS formula.
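The document names escalation_speed and scaffold_time_ratio without defining them, so the following is only a sketch under assumed definitions: escalation speed as hints opened per second up to the last hint, and scaffold time ratio as the share of total time spent after the first hint was opened. The `HintLog` shape and the function name are hypothetical; the LDS formula itself is not reproduced here.

```typescript
// Assumed definitions for illustration; the real LDS inputs may differ.
interface HintLog {
  startTime: number;                      // ms epoch when the problem was shown
  endTime: number;                        // ms epoch when the answer was submitted
  hintTimestamps: Record<string, number>; // e.g. { L1: ts, L2: ts }
}

interface HintTimingFeatures {
  escalationSpeed: number;   // hints opened per second, from start to last hint
  scaffoldTimeRatio: number; // share of total time after the first hint, in [0, 1]
}

function hintTimingFeatures(log: HintLog): HintTimingFeatures {
  const stamps = Object.keys(log.hintTimestamps)
    .map((key) => log.hintTimestamps[key])
    .sort((a, b) => a - b);
  const total = log.endTime - log.startTime;
  if (stamps.length === 0 || total <= 0) {
    return { escalationSpeed: 0, scaffoldTimeRatio: 0 };
  }
  const first = stamps[0];
  const last = stamps[stamps.length - 1];
  // Guard against a zero interval when a hint is opened immediately.
  const secondsToLastHint = Math.max((last - log.startTime) / 1000, 1e-3);
  const ratio = (log.endTime - first) / total;
  return {
    escalationSpeed: stamps.length / secondsToLastHint,
    scaffoldTimeRatio: Math.min(Math.max(ratio, 0), 1),
  };
}
```

Under these assumptions, a student who opens L1 after 20 s and L2 after 40 s of a 60 s attempt gets an escalation speed of 0.05 hints per second and a scaffold time ratio of about 0.67.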

2.2 Adaptive Engine (Client-Side JavaScript)

The adaptive engine runs entirely in the browser, so difficulty decisions need no server round-trip. Feedback is immediate, and a session keeps working offline once the initial question batch has loaded.

Engine Components

┌─────────────────────────────────────────────────┐
│              Adaptive Engine (client-side)        │
│                                                   │
│  ┌─────────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Elo Rating  │  │   BKT    │  │  Thompson  │  │
│  │   System     │  │  Engine  │  │  Sampler   │  │
│  │             │  │          │  │            │  │
│  │ Updates     │  │ P(know)  │  │ Beta prior │  │
│  │ student &   │  │ per      │  │ per level, │  │
│  │ question    │  │ topic    │  │ ZPD window │  │
│  │ ratings     │  │          │  │            │  │
│  └──────┬──────┘  └────┬─────┘  └─────┬──────┘  │
│         │              │              │          │
│         ▼              ▼              ▼          │
│  ┌───────────────────────────────────────────┐   │
│  │         Decision Orchestrator             │   │
│  │                                           │   │
│  │  Input: weighted_outcome, features        │   │
│  │  Output: next_level, decision_type        │   │
│  │         (increase/maintain/decrease)       │   │
│  └───────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘

Elo Update Formula

weighted_outcome = {
    no_hint:  1.00 (solved without any scaffold)
    L1_only:  0.75 (needed simplified English)
    L2_used:  0.50 (needed bilingual keywords)
    L3_used:  0.25 (needed full translation)
    L4_used:  0.00 (needed answer reveal)
}

E_student = 1 / (1 + 10^((R_question - R_student) / 400))
R_student_new = R_student + K × (weighted_outcome - E_student)

K = 32 (default), raised to 48 for the first 10 interactions (cold-start acceleration)
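The update above can be sketched in TypeScript as follows. The function names are illustrative, and one assumption goes beyond the text: an incorrect final answer is scored as 0 regardless of which hints were opened.

```typescript
type HintDepth = 0 | 1 | 2 | 3 | 4; // deepest hint level opened; 0 = no hint

// Outcome weights from the list above, keyed by deepest hint used.
const OUTCOME_BY_DEPTH: Record<HintDepth, number> = {
  0: 1.0,
  1: 0.75,
  2: 0.5,
  3: 0.25,
  4: 0.0,
};

function weightedOutcome(isCorrect: boolean, maxHintLevel: HintDepth): number {
  // Assumption: a wrong answer scores 0 whatever hints were used.
  return isCorrect ? OUTCOME_BY_DEPTH[maxHintLevel] : 0;
}

// Expected score of the student against a question (standard Elo logistic).
function expectedScore(rStudent: number, rQuestion: number): number {
  return 1 / (1 + Math.pow(10, (rQuestion - rStudent) / 400));
}

// K = 48 during the first 10 interactions, 32 afterwards.
function kFactor(totalInteractions: number): number {
  return totalInteractions < 10 ? 48 : 32;
}

function updateStudentElo(
  rStudent: number,
  rQuestion: number,
  outcome: number,
  totalInteractions: number,
): number {
  const expected = expectedScore(rStudent, rQuestion);
  return rStudent + kFactor(totalInteractions) * (outcome - expected);
}
```

With equal student and question ratings the expected score is 0.5, so a no-hint solve after the cold-start phase raises the student's rating by 32 × 0.5 = 16 points, while an L2-assisted solve (outcome 0.50) leaves it unchanged.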

BKT Parameters (per topic)

Parameter	Symbol	Default	Description
Prior knowledge	P(L₀)	0.10	Initial probability student knows topic
Learn rate	P(T)	0.15	Probability of learning per opportunity
Slip	P(S)	0.10	Probability of incorrect despite knowing
Guess	P(G)	0.25	Probability of correct despite not knowing

Slip is adjusted based on hint usage:

P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)

This models the intuition that the more scaffolds a student used, the less a correct answer says about mastery of the topic.
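A minimal sketch of one BKT step with the hint-adjusted slip, using the standard BKT posterior and learning-transition equations. Mapping hint depth 0..4 linearly to hint_depth_normalized in [0, 1] is an assumption, as are the function and field names.

```typescript
interface BktParams {
  pTransit: number; // P(T)
  pSlip: number;    // P(S)
  pGuess: number;   // P(G)
}

const DEFAULT_BKT: BktParams = { pTransit: 0.15, pSlip: 0.10, pGuess: 0.25 };

function adjustedSlip(pSlip: number, maxHintLevel: number): number {
  const hintDepthNormalized = maxHintLevel / 4; // assumption: 0..4 mapped to 0..1
  return pSlip * (1 + 0.5 * hintDepthNormalized);
}

// One observation: Bayes update on the answer, then the learning transition.
function bktUpdate(
  pKnow: number,
  isCorrect: boolean,
  maxHintLevel: number,
  params: BktParams = DEFAULT_BKT,
): number {
  const s = adjustedSlip(params.pSlip, maxHintLevel);
  const g = params.pGuess;
  const posterior = isCorrect
    ? (pKnow * (1 - s)) / (pKnow * (1 - s) + (1 - pKnow) * g)
    : (pKnow * s) / (pKnow * s + (1 - pKnow) * (1 - g));
  return posterior + (1 - posterior) * params.pTransit;
}
```

Starting from the default prior of 0.10, a correct answer without hints moves P(know) to roughly 0.39; the same correct answer reached at hint depth 4 moves it slightly less, because the larger slip weakens the evidence.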

Thompson Sampling with ZPD Windowing

For each candidate level l in ZPD window [current - 2, current + 3]:
    sample θ_l ~ Beta(α_l, β_l)
    score_l = θ_l × proximity_bonus(l, target_elo)

Select level = argmax(score_l)

proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)

The ZPD (zone of proximal development) window is asymmetric (+3 upward, -2 downward) to encourage upward progression while preventing catastrophic failure.
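The selection step can be sketched as below. Several pieces are assumptions made for illustration: levels are represented as integer indices into an ordered list of sub-levels, `levelElo` supplies the Elo anchor for each level (the source does not say how elo_l is obtained), and the Beta sampler is built from a Marsaglia-Tsang Gamma sampler on top of a small seedable generator so the example is reproducible. In the browser, `Math.random` could be passed as the `rng` instead.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Tiny seedable LCG, good enough for a demo; not for production use.
function makeRng(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Standard normal via Box-Muller.
function sampleNormal(rng: Rng): number {
  return Math.sqrt(-2 * Math.log(1 - rng())) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) via Marsaglia-Tsang, with the usual boost for shape < 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

interface BetaPrior {
  alpha: number;
  beta: number;
}

// Returns the chosen level index within [current - 2, current + 3].
function selectLevel(
  current: number,
  priors: BetaPrior[],
  levelElo: number[], // assumed Elo anchor per level index
  targetElo: number,
  rng: Rng,
): number {
  const lo = Math.max(0, current - 2);
  const hi = Math.min(priors.length - 1, current + 3);
  let best = lo;
  let bestScore = -Infinity;
  for (let l = lo; l <= hi; l++) {
    const theta = sampleBeta(priors[l].alpha, priors[l].beta, rng);
    const z = (levelElo[l] - targetElo) / 100;
    const score = theta * Math.exp(-0.5 * z * z); // θ_l × proximity_bonus
    if (score > bestScore) {
      bestScore = score;
      best = l;
    }
  }
  return best;
}
```

Because θ is sampled rather than averaged, levels with few observations still get chosen occasionally, while the Gaussian proximity bonus keeps most selections near the target Elo.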

Progression Decision Rules

Condition	Decision	Action
weighted_outcome ≥ 0.75 AND P(know) ≥ 0.70	Increase	Move up 1 sub-level
weighted_outcome ≥ 0.85 AND streak ≥ 3	Skip	Move up 2 sub-levels
0.40 ≤ weighted_outcome < 0.75	Maintain	Stay at current level
weighted_outcome < 0.40 OR streak_wrong ≥ 2	Decrease	Move down 1 sub-level
weighted_outcome < 0.25 AND P(know) < 0.30	Rapid Decrease	Move down 2 sub-levels
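Several rows in the table overlap (for example, an outcome of 1.0 with a streak of 3 satisfies both Increase and Skip), and some inputs match no row (an outcome of 0.75 or higher with P(know) below 0.70 and no streak). The sketch below resolves this with an assumed precedence, most specific rule first, and falls back to Maintain when nothing matches. Both choices are assumptions, not something the table specifies.

```typescript
type Decision = "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";

interface DecisionInput {
  weightedOutcome: number; // 0.0 to 1.0
  pKnow: number;           // BKT P(know) for the current topic
  streak: number;          // consecutive correct answers
  streakWrong: number;     // consecutive incorrect answers
}

// Assumed precedence: rapid decrease, decrease, skip, increase, then maintain.
function decide(input: DecisionInput): Decision {
  const { weightedOutcome, pKnow, streak, streakWrong } = input;
  if (weightedOutcome < 0.25 && pKnow < 0.30) return "rapid_decrease";
  if (weightedOutcome < 0.40 || streakWrong >= 2) return "decrease";
  if (weightedOutcome >= 0.85 && streak >= 3) return "skip";
  if (weightedOutcome >= 0.75 && pKnow >= 0.70) return "increase";
  return "maintain"; // also covers inputs the table leaves unspecified
}

// Sub-level movement for each decision, as listed in the Action column.
const LEVEL_DELTA: Record<Decision, number> = {
  skip: 2,
  increase: 1,
  maintain: 0,
  decrease: -1,
  rapid_decrease: -2,
};
```

Keeping the precedence in a single function makes the tie-breaking explicit and easy to unit-test when thresholds are tuned.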

2.3 Firebase Backend

Services Used:

Firebase Authentication (Google Sign-In, Email/Password)
Cloud Firestore (student state, question database, session logs)
Cloud Functions (LLM API calls, batch question generation, session reports)
Firebase Hosting (static frontend assets)

Firestore Data Model

firestore/
├── users/
│   └── {uid}/
│       ├── profile: {
│       │     displayName, email, gradeLevel, preferredLanguage,
│       │     createdAt, lastActive
│       │   }
│       ├── adaptiveState: {
│       │     currentElo: number,         // e.g., 1050
│       │     currentLevel: string,       // e.g., "2.1"
│       │     totalInteractions: number,
│       │     topicMastery: {             // BKT P(know) per topic
│       │       "arithmetic": 0.72,
│       │       "fractions": 0.45,
│       │       "algebra_basic": 0.31,
│       │       ...
│       │     },
│       │     thompsonPriors: {           // Beta(α,β) per level
│       │       "1.1": { alpha: 12, beta: 3 },
│       │       "1.2": { alpha: 8, beta: 5 },
│       │       ...
│       │     },
│       │     featureAverages: {
│       │       avgLDS: 0.42,
│       │       avgMCS: 0.61,
│       │       recentLDS_5: [0.3, 0.4, 0.5, 0.35, 0.45],
│       │       recentMCS_5: [0.6, 0.65, 0.58, 0.62, 0.7]
│       │     },
│       │     streakCount: number,
│       │     lastUpdated: timestamp
│       │   }
│       └── sessions/
│           └── {sessionId}/
│               ├── metadata: {
│               │     startTime, endTime, questionsAttempted,
│               │     questionsCorrect, avgWeightedOutcome,
│               │     startElo, endElo, sessionLDS, sessionMCS
│               │   }
│               └── interactions/
│                   └── {interactionId}: {
│                         questionId, level, topic,
│                         startTime, endTime, timeSpentMs,
│                         hintsUsed: [0,1,2,3,4],  // which levels accessed
│                         hintTimestamps: { L1: ts, L2: ts, ... },
│                         maxHintLevel: number,
│                         answer: string,
│                         isCorrect: boolean,
│                         attempts: number,
│                         weightedOutcome: number,
│                         lds: number,
│                         mcs: number,
│                         eloBeforeUpdate: number,
│                         eloAfterUpdate: number,
│                         adaptiveDecision: string
│                       }
│
├── questions/
│   └── {questionId}: {
│         id, level, topic, subtopic,
│         problemText, answer, answerNumeric,
│         solutionSteps: [...],
│         scaffolds: {
│           L1_simplified: string,
│           L2_bilingual: string,
│           L3_spanish: string,
│           L4_solution: string
│         },
│         readability: {
│           fleschKincaid: number,
│           wordCount: number,
│           difficultWords: number,
│           avgSyllables: number
│         },
│         eloRating: number,
│         timesServed: number,
│         avgOutcome: number,
│         metadata: {
│           source: "curated" | "generated",
│           generatedBy: "gemini-2.0" | "qwen2.5-3b" | null,
│           reviewedBy: string | null,
│           createdAt: timestamp
│         }
│       }
│
├── questionIndex/                      // Denormalized for fast queries
│   └── byLevel/
│       └── {level}: {
│             questionIds: [...],
│             count: number
│           }
│
└── analytics/                          // Aggregated (Cloud Functions)
    ├── dailyStats/
    │   └── {date}: { activeUsers, sessionsCompleted, ... }
    └── cohortProgress/
        └── {cohortId}: { avgElo, avgLDS, avgMCS, ... }
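As an illustration of how the `featureAverages` block might be maintained after each interaction, the sketch below keeps the last five values in the `recent*_5` arrays and updates the averages incrementally. Treating `avgLDS` and `avgMCS` as lifetime means is an assumption (they could equally be session or windowed averages), and the helper names are hypothetical.

```typescript
interface FeatureAverages {
  avgLDS: number;
  avgMCS: number;
  recentLDS_5: number[];
  recentMCS_5: number[];
}

// Append a value and keep only the most recent `size` entries.
function pushRecent(window: number[], value: number, size = 5): number[] {
  return window.concat(value).slice(-size);
}

// Incremental mean; n is the number of values already folded into `avg`.
function runningMean(avg: number, n: number, value: number): number {
  return (avg * n + value) / (n + 1);
}

function updateFeatureAverages(
  prev: FeatureAverages,
  priorInteractions: number, // totalInteractions before this one
  lds: number,
  mcs: number,
): FeatureAverages {
  return {
    avgLDS: runningMean(prev.avgLDS, priorInteractions, lds),
    avgMCS: runningMean(prev.avgMCS, priorInteractions, mcs),
    recentLDS_5: pushRecent(prev.recentLDS_5, lds),
    recentMCS_5: pushRecent(prev.recentMCS_5, mcs),
  };
}
```

Returning a new object rather than mutating `prev` fits a single Firestore write of the whole `adaptiveState` document.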

Firestore Security Rules

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Users can only read/write their own data
    match /users/{uid}/{document=**} {
      allow read, write: if request.auth != null && request.auth.uid == uid;
    }
    // Questions are readable by all authenticated users
    match /questions/{questionId} {
      allow read: if request.auth != null;
      allow write: if false; // Only admin/Cloud Functions
    }
    // Question index readable by all authenticated users
    match /questionIndex/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    // Analytics only accessible by admin
    match /analytics/{document=**} {
      allow read, write: if false; // Cloud Functions only
    }
  }
}

2.4 Cloud Functions (Serverless Backend)

functions/
├── onUserCreate.ts          # Initialize adaptive state for new user
├── generateScaffolds.ts     # Call Gemini/SLM to create L1-L4 for a problem
├── batchGenerateQuestions.ts # Generate next 20 questions for session queue
├── processCustomProblem.ts  # "Input your question" flow
├── generateSessionReport.ts # End-of-session analytics
├── updateQuestionStats.ts   # Update question difficulty from outcomes
└── scheduledAnalytics.ts    # Daily aggregation (cron-triggered)

Key Cloud Function: `generateScaffolds`

// Triggered when student submits a custom problem or when
// pre-generating scaffolds for database questions

interface ScaffoldRequest {
  problemText: string;
  studentGradeLevel: number;
  currentLDS: number;  // Informs simplification level
}

interface ScaffoldResponse {
  L1_simplified: string;   // Simplified English
  L2_bilingual: string;    // English with inline Spanish keywords
  L3_spanish: string;      // Full Spanish translation
  L4_solution: string;     // Step-by-step solution
  answer: string;
  answerNumeric: number;
}

// Prompt template for LLM
const SCAFFOLD_PROMPT = `
You are a bilingual math tutor helping Spanish-speaking students 
(grades 6-8) learn math in English.

Given this math word problem:
"{problemText}"

Generate 4 scaffold levels:

**L1 (Simplified English):** Rewrite using shorter sentences, 
simpler vocabulary (grade {adjustedGrade} reading level). 
Keep all math content identical.

**L2 (Bilingual Keywords):** Take the original problem and add 
Spanish translations in parentheses for key math and context 
vocabulary. Format: "English word (palabra en español)".

**L3 (Full Spanish Translation):** Translate the complete problem 
to natural, grade-appropriate Spanish. Ensure mathematical 
precision is maintained.

**L4 (Step-by-Step Solution):** Provide a clear, numbered 
step-by-step solution in English with the final numerical answer.

Return as JSON with keys: L1_simplified, L2_bilingual, L3_spanish, 
L4_solution, answer, answerNumeric.
`;
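The template uses `{problemText}` and `{adjustedGrade}` as plain-text placeholders rather than JavaScript interpolation, so something has to substitute them before the call. A minimal sketch of such a helper follows; `renderPrompt` is a hypothetical name, and how `adjustedGrade` is derived from the grade level and LDS is left to the caller because the source does not define it.

```typescript
// Hypothetical helper: fills {name} placeholders in one pass and fails
// loudly when a value is missing, so a broken prompt never reaches the LLM.
function renderPrompt(
  template: string,
  values: Record<string, string | number>,
): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`Missing prompt value: ${key}`);
    }
    return String(values[key]);
  });
}

// Usage with a shortened stand-in for SCAFFOLD_PROMPT:
const demoPrompt = renderPrompt(
  'Problem: "{problemText}" (grade {adjustedGrade} reading level)',
  { problemText: "A store has 24 apples in 6 boxes.", adjustedGrade: 5 },
);
```

Because substitution happens in a single pass, placeholder-like text inside a student-submitted problem is inserted literally and is not expanded again.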

Key Cloud Function: `batchGenerateQuestions`

// Called when student reaches question 17 of 20 (prefetch trigger)
// Selects next 20 questions from database based on adaptive state

import { onCall, HttpsError } from "firebase-functions/v2/https";

// Helpers (getAdaptiveState, thompsonSampleBatch, getRecentQuestionIds,
// selectQuestions, ensureScaffoldsGenerated, generateId) live elsewhere in
// functions/; the option-object signatures shown here are illustrative.
export const batchGenerateQuestions = onCall(async (request) => {
  // request.auth is undefined for unauthenticated calls
  if (!request.auth) {
    throw new HttpsError("unauthenticated", "Sign-in required.");
  }
  const { uid } = request.auth;
  const state = await getAdaptiveState(uid);

  // Thompson Sampling selects level distribution for next batch
  const levelDistribution = thompsonSampleBatch(
    state.thompsonPriors,
    state.currentLevel,
    { batchSize: 20 }
  );
  // e.g., { "2.1": 5, "2.2": 8, "2.3": 5, "2.4": 2 }

  // Select questions avoiding recently served ones
  const recentIds = await getRecentQuestionIds(uid, { lookback: 100 });
  const questions = await selectQuestions(levelDistribution, {
    excludeIds: recentIds,
    topicBalance: state.topicMastery, // Favor weaker topics
  });

  // Ensure all questions have scaffolds generated
  const withScaffolds = await ensureScaffoldsGenerated(questions);

  return { questions: withScaffolds, sessionBatchId: generateId() };
});

2.5 LLM Service Layer

V1: Gemini API (Current)

┌────────────┐     HTTPS/REST     ┌──────────────────┐
│  Cloud      │ ──────────────────►│  Google Gemini    │
│  Function   │ ◄──────────────────│  2.0 Flash API    │
└────────────┘                    └──────────────────┘

Cost: ~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens
Latency: 200-800ms per scaffold generation
Rate limit: 60 RPM (free tier), 1000 RPM (paid)

V2: Qwen2.5-3B SLM (Planned)

┌────────────┐     HTTPS/REST     ┌──────────────────────────┐
│  Cloud      │ ──────────────────►│  HF Inference Endpoint   │
│  Function   │ ◄──────────────────│  Qwen2.5-3B-Instruct     │
└────────────┘                    │  (QLoRA fine-tuned)       │
                                  │  GPU: T4 or L4            │
                                  └──────────────────────────┘

Cost: ~$0.60/hr (T4) or ~$1.04/hr (L4)
Latency: 100-400ms per scaffold generation
Rate limit: Unlimited (dedicated endpoint)

LLM Client Abstraction

// lib/llm-client.ts — Provider-agnostic interface

interface LLMProvider {
  generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
  generateQuestion(level: string, topic: string): Promise<QuestionWithScaffolds>;
  validateAnswer(problem: string, studentAnswer: string, correctAnswer: string): Promise<AnswerValidation>;
}

class GeminiProvider implements LLMProvider { ... }
class QwenSLMProvider implements LLMProvider { ... }

// Factory with fallback
function createLLMClient(): LLMProvider {
  if (config.useSLM && config.slmEndpointAvailable) {
    return new QwenSLMProvider(config.slmEndpoint);
  }
  return new GeminiProvider(config.geminiApiKey);
}

2.6 SLM Fine-Tuning Pipeline

┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│  Training     │    │  Fine-Tune   │    │  Deploy          │
│  Data Prep    │───►│  QLoRA SFT   │───►│  HF Inference EP │
└──────────────┘    └──────────────┘    └──────────────────┘

Step 1: Collect 2,000-5,000 scaffold examples from Gemini V1 usage
Step 2: Human review + quality filter → ~1,500 gold examples
Step 3: QLoRA fine-tune Qwen2.5-3B-Instruct
Step 4: Evaluate on held-out test set (BLEU, math accuracy, readability)
Step 5: Deploy to HF Inference Endpoint
Step 6: Shadow-test alongside Gemini (serve both, compare quality)
Step 7: Full cutover when SLM matches Gemini quality

Fine-tuning Configuration:

Parameter	Value	Rationale
Base model	Qwen2.5-3B-Instruct	Best math+Spanish at 3B scale
Method	QLoRA (4-bit NF4)	Fits single 16GB GPU
LoRA rank (r)	32	Balance quality/efficiency for small dataset
LoRA alpha	64	Standard 2× rank
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj	Full attention + MLP
Learning rate	2e-4	Standard for QLoRA
Epochs	3-5	Small dataset, monitor val loss
Batch size	4 (effective 16 with grad accum)	Memory constraint
Max sequence length	1024	Sufficient for problem + all 4 scaffolds
Warmup ratio	0.05	Short warmup for small dataset

3. Data Flow Diagrams

3.1 Flow A: "Practice Problems" Mode

Student clicks "Start Practice"
         │
         ▼
┌─────────────────────────────────┐
│ 1. Load adaptive state from     │
│    Firestore (Elo, BKT, priors) │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Thompson Sampling selects    │
│    next question level          │
│    (ZPD window: current -2/+3)  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Fetch question from Firestore│
│    by level + topic balancing   │
│    (avoid recently served)      │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Display problem, start timer │
│    Student reads and attempts   │
└────────────┬────────────────────┘
             │
    ┌────────┴────────┐
    │  Needs hints?   │
    ▼ No              ▼ Yes
┌─────────┐   ┌───────────────────┐
│ Submit  │   │ L1 → L2 → L3 → L4│
│ answer  │   │ (each click logged │
└────┬────┘   │  with timestamp)   │
     │        └────────┬───────────┘
     │                 │
     ▼                 ▼
┌─────────────────────────────────┐
│ 5. Compute weighted_outcome     │
│    based on correctness + hints │
│    Compute LDS and MCS          │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Update Elo (student + Q)     │
│    Update BKT P(know) for topic │
│    Update Thompson Beta priors  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 7. Progression decision:        │
│    increase / maintain / decrease│
│    Select next level            │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 8. Save interaction to Firestore│
│    Display "Next Problem"       │
└────────────┬────────────────────┘
             │
             ▼
    ┌────────┴────────┐
    │ Q17 of 20?      │
    ▼ Yes             ▼ No
┌─────────────┐   ┌──────────┐
│ Prefetch    │   │ Loop to  │
│ next batch  │   │ step 2   │
│ (Cloud Fn)  │   └──────────┘
└─────────────┘
             │
    At Q20:  ▼
┌─────────────────────────────────┐
│ 9. Generate session report      │
│    (Cloud Function)             │
│    Show summary to student      │
└─────────────────────────────────┘

3.2 Flow B: "Input Your Question" Mode

Student types/pastes a math word problem
         │
         ▼
┌─────────────────────────────────┐
│ 1. Cloud Function:              │
│    processCustomProblem         │
│    - Validate it's a math       │
│      word problem               │
│    - Extract answer/solution    │
│    - Call Gemini/SLM to generate│
│      L1, L2, L3, L4 scaffolds  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Estimate difficulty level    │
│    using readability metrics    │
│    (FK grade, word count, etc.) │
│    Map to nearest Elo rating    │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Display problem with         │
│    scaffold buttons active      │
│    (same UI as Practice mode)   │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Student interacts, solves    │
│    Same hint tracking as        │
│    Practice mode                │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 5. Update adaptive state        │
│    (Elo, BKT, Thompson)         │
│    Log interaction               │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Offer: "Try another?" or     │
│    "Switch to Practice Mode"    │
│    (where engine auto-selects)  │
└─────────────────────────────────┘
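Step 2 of this flow estimates difficulty from readability metrics. The sketch below computes the Flesch-Kincaid grade with the published formula and a rough vowel-group syllable heuristic. The linear mapping from grade to Elo, including its anchor (grade 6 at 1000), slope, and clamping range, is a placeholder assumption; the source does not specify the mapping.

```typescript
// Rough syllable count: vowel groups, ignoring a trailing silent "e".
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.replace(/e$/, "").match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// Flesch-Kincaid grade level:
// 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
// Caveat: decimals such as "3.5" are miscounted as sentence breaks here.
function fleschKincaidGrade(text: string): number {
  const sentences = text
    .split(/[.!?]+/)
    .filter((s) => s.trim().length > 0).length;
  const words = text.split(/\s+/).filter((w) => /[A-Za-z]/.test(w));
  if (sentences === 0 || words.length === 0) return 0;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return (
    0.39 * (words.length / sentences) +
    11.8 * (syllables / words.length) -
    15.59
  );
}

// Placeholder mapping (assumption): linear in FK grade, clamped to a range.
function estimateElo(fkGrade: number, minElo = 800, maxElo = 1600): number {
  const elo = 1000 + 50 * (fkGrade - 6);
  return Math.round(Math.min(maxElo, Math.max(minElo, elo)));
}
```

A real implementation would likely combine the FK grade with the other stored readability fields (word count, difficult words) and calibrate the mapping against questions whose Elo is already known.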

4. API Contracts

4.1 Client → Cloud Functions

// POST /generateScaffolds
interface GenerateScaffoldsRequest {
  problemText: string;
  gradeLevel: number;          // 6, 7, or 8
  currentLDS: number;          // 0.0-1.0, informs simplification
}
interface GenerateScaffoldsResponse {
  scaffolds: {
    L1_simplified: string;
    L2_bilingual: string;
    L3_spanish: string;
    L4_solution: string;
  };
  answer: string;
  answerNumeric: number;
  estimatedLevel: string;      // e.g., "2.3"
  estimatedElo: number;        // e.g., 1100
  processingTimeMs: number;
}

// POST /batchGenerateQuestions
interface BatchRequest {
  batchSize: number;           // default 20
  // Auth token provides uid → adaptive state looked up server-side
}
interface BatchResponse {
  questions: QuestionWithScaffolds[];
  sessionBatchId: string;
}

// POST /submitInteraction
interface InteractionSubmission {
  sessionId: string;
  questionId: string;
  answer: string;
  isCorrect: boolean;
  timeSpentMs: number;
  hintsUsed: number[];         // [0], [0,1], [0,1,2], etc.
  hintTimestamps: Record<string, number>;
  attempts: number;
}
interface InteractionResponse {
  weightedOutcome: number;
  lds: number;
  mcs: number;
  newElo: number;
  newLevel: string;
  decision: "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";
  nextQuestion: QuestionWithScaffolds;  // Pre-selected
}

// POST /generateSessionReport
interface SessionReportRequest {
  sessionId: string;
}
interface SessionReportResponse {
  summary: {
    questionsAttempted: number;
    questionsCorrect: number;
    avgWeightedOutcome: number;
    eloChange: number;
    topicsStrong: string[];
    topicsWeak: string[];
    avgLDS: number;
    avgMCS: number;
    languageProgressNote: string;  // Generated text about L2 progress
  };
  recommendations: string[];      // e.g., "Focus on fractions vocabulary"
}

5. Deployment Architecture

5.1 V1 Deployment (MVP)

┌──────────────────────────────────────────────────────────┐
│                    Firebase Project                        │
│                                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │  Firebase    │  │  Cloud       │  │  Cloud           │  │
│  │  Hosting     │  │  Firestore   │  │  Functions       │  │
│  │  (Next.js)   │  │  (Database)  │  │  (Node.js 20)   │  │
│  │              │  │              │  │                  │  │
│  │  Static +    │  │  Student     │  │  LLM calls       │  │
│  │  SSR pages   │  │  state,      │  │  Batch gen       │  │
│  │              │  │  questions,  │  │  Reports         │  │
│  │              │  │  sessions    │  │                  │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
└──────────────────────────────────────────────┼─────────────┘
                                               │
                                    HTTPS      │
                                               ▼
                                  ┌──────────────────┐
                                  │  Google Gemini    │
                                  │  2.0 Flash API    │
                                  └──────────────────┘

Estimated monthly cost (100 students, 5 sessions/week):
- Firebase Hosting: Free tier (~$0)
- Firestore: ~$5/mo (reads/writes within free tier mostly)
- Cloud Functions: ~$10/mo (invocations + compute)
- Gemini API: ~$15-25/mo (scaffold generation)
- Total: ~$30-40/mo

5.2 V2 Deployment (SLM)

┌──────────────────────────────────────────────────────────┐
│                    Firebase Project                        │
│                                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │  Firebase    │  │  Cloud       │  │  Cloud           │  │
│  │  Hosting     │  │  Firestore   │  │  Functions       │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
└──────────────────────────────────────────────┼─────────────┘
                                               │
                              ┌─────────────────┼──────────────┐
                              │                 │              │
                              ▼                 ▼              │
                   ┌──────────────────┐  ┌──────────────┐     │
                   │  HF Inference    │  │  Gemini API  │     │
                   │  Endpoint        │  │  (fallback)  │     │
                   │  Qwen2.5-3B     │  └──────────────┘     │
                   │  QLoRA FT       │                        │
                   │  (T4 GPU)       │  Shadow testing:       │
                   └──────────────────┘  Both called, SLM     │
                                         response served,     │
                                         Gemini response      │
                                         logged for QA        │
                                         ─────────────────────┘

Estimated monthly cost (100 students):
- Firebase: ~$15/mo (same as V1)
- HF Inference Endpoint (T4, scale-to-zero): ~$50-100/mo
  (active only during school hours, ~8hrs/day × 20 days)
- Gemini fallback: ~$5/mo (only when SLM is cold)
- Total: ~$70-120/mo (but no per-token costs at scale)
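The upper end of the endpoint estimate can be checked with a few lines of arithmetic, using the T4 rate of about $0.60/hr quoted in section 2.5 and the stated schedule of 8 hours a day for 20 days. The function name is illustrative.

```typescript
// Monthly cost of a dedicated endpoint that only runs during school hours.
function endpointMonthlyCostUsd(
  hourlyUsd: number,
  hoursPerDay: number,
  daysPerMonth: number,
): number {
  return hourlyUsd * hoursPerDay * daysPerMonth;
}

const t4Monthly = endpointMonthlyCostUsd(0.6, 8, 20); // about $96
const firebaseMonthly = 15;      // from the V1 estimate
const geminiFallbackMonthly = 5; // cold-start fallback only
const v2UpperBound = t4Monthly + firebaseMonthly + geminiFallbackMonthly; // about $116
```

160 endpoint hours at $0.60 come to roughly $96, which sits at the top of the ~$50-100 range; adding Firebase and the Gemini fallback gives about $116, consistent with the ~$70-120 total. The lower end of the range assumes scale-to-zero keeps the endpoint idle for part of those hours.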

5.3 V3 Deployment (Scale)

When the student count exceeds 500, migrate to:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Vercel       │  │  Firebase    │  │  Cloud Run       │  │
│  │  (Next.js)    │  │  Firestore   │  │  (API server)    │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
│                    ┌─────────────────────────┼──────┐      │
│                    │                         │      │      │
│                    ▼                         ▼      │      │
│         ┌──────────────────┐    ┌─────────────────┐ │      │
│         │  HF Inference EP │    │  IRT/DKT Model  │ │      │
│         │  Qwen2.5-3B     │    │  Server          │ │      │
│         │  (Auto-scaling)  │    │  (Python/FastAPI)│ │      │
│         └──────────────────┘    └─────────────────┘ │      │
│                                                     │      │
│  + Deep Knowledge Tracing (DKT) replaces BKT        │      │
│  + IRT item calibration from pooled student data     │      │
│  + A/B testing framework for algorithm improvements  │      │
└─────────────────────────────────────────────────────────────┘

6. Technology Stack Summary

Layer	Technology	Justification
Frontend Framework	Next.js 14+ (App Router)	SSR for SEO, React ecosystem, TypeScript
UI Styling	Tailwind CSS + shadcn/ui	Rapid prototyping, consistent design
Math Rendering	KaTeX	Fast client-side LaTeX rendering
Charts	Recharts	Native React charting components for dashboards
Authentication	Firebase Auth	Google Sign-In, simple integration
Database	Cloud Firestore	Real-time sync, offline support, serverless
Serverless Functions	Firebase Cloud Functions (Node.js 20)	Low latency, Firebase integration
LLM (V1)	Google Gemini 2.0 Flash	Low cost, fast, good multilingual
SLM (V2)	Qwen2.5-3B-Instruct (QLoRA fine-tuned)	Best math+Spanish at 3B; note that the 3B variant is released under the Qwen Research License rather than Apache 2.0
SLM Hosting	HF Inference Endpoints (T4, scale-to-zero)	Cost-effective, no infra management
Adaptive Engine	Client-side TypeScript	Zero-latency decisions, works offline
State Management	Zustand + Firestore sync	Lightweight, persists across sessions
Testing	Vitest + Playwright	Unit + E2E testing
CI/CD	GitHub Actions	Automated testing + Firebase deploy
Monitoring	Firebase Analytics + Crashlytics	User behavior + error tracking

7. Security & Privacy Considerations

7.1 Data Protection

COPPA Compliance: Students are minors (ages 11-14). No personally identifiable information stored beyond email/display name. No third-party tracking.
FERPA Alignment: Performance data (Elo, LDS, MCS) is associated with uid only. Teachers/admins see aggregate data, never individual student identifiers.
Data Encryption: Firestore encrypts at rest (AES-256). All API calls over HTTPS/TLS 1.3.

7.2 API Security

Firebase Auth tokens required for all Cloud Function calls
Gemini/SLM API keys stored in Firebase environment secrets (never client-side)
Rate limiting on Cloud Functions to prevent abuse (max 10 scaffold generations per minute per user)

7.3 Content Safety

All LLM-generated scaffolds pass through a validation function checking:
- Mathematical accuracy (answer matches expected)
- Appropriate content (no adult/violent themes)
- Language accuracy (Spanish translation verified against expected pattern)
Questions from the curated database are pre-reviewed; generated questions flagged for human review
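A minimal sketch of the structural part of such a validation function is shown below. It covers only checks that can be done mechanically: non-empty scaffold texts, a finite numeric answer that matches the expected value when one is known, and at least one parenthesised gloss in L2. Content-appropriateness and translation-quality checks are not sketched here. The function name, the tolerance, and the L2 pattern are assumptions.

```typescript
interface ScaffoldResponse {
  L1_simplified: string;
  L2_bilingual: string;
  L3_spanish: string;
  L4_solution: string;
  answer: string;
  answerNumeric: number;
}

// Returns a list of problems; an empty list means the scaffold set passed.
function validateScaffolds(
  r: ScaffoldResponse,
  expectedNumeric?: number,
  tolerance = 1e-6,
): string[] {
  const problems: string[] = [];
  const texts: [string, string][] = [
    ["L1_simplified", r.L1_simplified],
    ["L2_bilingual", r.L2_bilingual],
    ["L3_spanish", r.L3_spanish],
    ["L4_solution", r.L4_solution],
  ];
  for (const [name, text] of texts) {
    if (text.trim().length === 0) problems.push(`${name} is empty`);
  }
  if (!isFinite(r.answerNumeric)) {
    problems.push("answerNumeric is not a finite number");
  } else if (
    expectedNumeric !== undefined &&
    Math.abs(r.answerNumeric - expectedNumeric) > tolerance
  ) {
    problems.push("answerNumeric does not match the expected answer");
  }
  // Assumed pattern: L2 should contain at least one "word (palabra)" gloss.
  if (!/\([^)]+\)/.test(r.L2_bilingual)) {
    problems.push("L2_bilingual has no parenthesised Spanish keyword");
  }
  return problems;
}
```

Returning a list of problems rather than a boolean lets the caller log exactly why a generated scaffold was rejected or flagged for human review.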

8. Performance Targets

Metric	Target	Measurement
Time to first problem display	< 2 seconds	Lighthouse / Firebase Performance
Adaptive decision latency	< 50ms	Client-side (no network)
Scaffold generation (Gemini)	< 1.5 seconds	Cloud Function logs
Scaffold generation (SLM)	< 800ms	HF Inference EP metrics
Batch prefetch trigger → ready	< 5 seconds	20 questions fetched at Q17
Offline capability	Full session	After initial batch load
Concurrent users (V1)	50	Firebase free/Blaze tier
Concurrent users (V2)	500+	HF auto-scaling endpoint