# MathLingua — System Architecture Document

## 1. System Overview

MathLingua is a bilingual adaptive math tutoring application for Spanish-speaking students (grades 6–8) transitioning to English-medium mathematics education. The system presents math word problems with 4 scaffolded hint levels and uses a hybrid adaptive algorithm (Elo rating, Bayesian Knowledge Tracing, and Thompson Sampling) to personalize difficulty progression.

```
┌─────────────────────────────────────────────────────────────────────┐
│                        MathLingua System                            │
│                                                                     │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐    │
│  │   Frontend    │   │   Backend    │   │   External Services  │    │
│  │  (Next.js)   │◄─►│  (Firebase)  │◄─►│  (LLM / SLM)        │    │
│  └──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘    │
│         │                  │                       │                │
│         ▼                  ▼                       ▼                │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐    │
│  │  Adaptive    │   │  Firestore   │   │  V1: Gemini API      │    │
│  │  Engine      │   │  Database    │   │  V2: Qwen2.5-3B SLM  │    │
│  │  (Client JS) │   │              │   │  (HF Inference EP)   │    │
│  └──────────────┘   └──────────────┘   └──────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 2. Component Architecture

### 2.1 Frontend — React / Next.js Application

**Technology**: Next.js 14+ (App Router), TypeScript, Tailwind CSS  
**Hosting**: Firebase Hosting or Vercel  

#### Key Pages/Routes

| Route | Component | Purpose |
|---|---|---|
| `/` | `LandingPage` | Login/signup, language preference |
| `/dashboard` | `StudentDashboard` | Progress overview, session history, MCS/LDS charts |
| `/practice` | `PracticeSession` | Adaptive practice from question database |
| `/solve` | `CustomProblem` | "Input your question" — Gemini/SLM processes user-submitted problems |
| `/session-report` | `SessionReport` | End-of-session summary with performance analytics |

#### Core Frontend Components

```
src/
├── components/
│   ├── ProblemDisplay/
│   │   ├── MathProblem.tsx          # Renders word problem text
│   │   ├── HintScaffold.tsx         # L1/L2/L3/L4 progressive hint UI
│   │   ├── AnswerInput.tsx          # Numeric/expression answer entry
│   │   └── SolutionReveal.tsx       # L4 step-by-step solution display
│   ├── Adaptive/
│   │   ├── DifficultyIndicator.tsx  # Visual current-level indicator
│   │   ├── ProgressBar.tsx          # Session progress (e.g., 7/20)
│   │   └── SessionTimer.tsx         # Time tracking per problem
│   ├── Dashboard/
│   │   ├── EloChart.tsx             # Elo rating over time (Recharts)
│   │   ├── TopicHeatmap.tsx         # Performance by math topic
│   │   ├── LDSMCSPanel.tsx          # Language Dependency & Math Confidence
│   │   └── StreakBadge.tsx          # Gamification elements
│   └── Shared/
│       ├── BilingualToggle.tsx      # EN/ES interface language switch
│       ├── MathRenderer.tsx         # KaTeX for math expressions
│       └── LoadingSkeleton.tsx
├── lib/
│   ├── adaptive-engine.ts           # Elo + BKT + Thompson Sampling (client-side)
│   ├── feature-engineer.ts          # LDS & MCS computation
│   ├── firebase.ts                  # Firebase SDK initialization
│   └── llm-client.ts               # Gemini/SLM API abstraction
├── hooks/
│   ├── useAdaptiveSession.ts        # Manages session state + engine calls
│   ├── useStudentProfile.ts         # Reads/writes Firestore student state
│   └── useQuestionQueue.ts          # Pre-fetches next batch of questions
└── types/
    └── index.ts                     # TypeScript interfaces for all data structures
```

#### Hint Scaffold UI Flow

```
┌─────────────────────────────────────┐
│  Problem displayed in original      │
│  English at student's current level │
│                                     │
│  [Try to solve]  [I need a hint →]  │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L1: Simplified English             │
│  "A store has 24 apples..."         │
│                                     │
│  [Got it!]  [Still stuck →]         │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L2: Bilingual Keywords Inline      │
│  "A store has 24 apples (manzanas)" │
│  "divided equally (dividido         │
│   igualmente) among 6 boxes"        │
│                                     │
│  [Got it!]  [Still stuck →]         │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L3: Full Spanish Translation       │
│  "Una tienda tiene 24 manzanas      │
│   divididas igualmente entre 6      │
│   cajas. ¿Cuántas manzanas hay      │
│   en cada caja?"                    │
│                                     │
│  [Got it!]  [Show me the answer →]  │
└──────────────────────┬──────────────┘
                       │ click
                       ▼
┌─────────────────────────────────────┐
│  L4: Step-by-Step Solution          │
│  Step 1: Identify — 24 ÷ 6         │
│  Step 2: Calculate — 24 ÷ 6 = 4    │
│  Step 3: Answer — 4 apples per box  │
│                                     │
│  [Next Problem →]                   │
└─────────────────────────────────────┘
```

Each hint interaction is logged with a timestamp so that `escalation_speed` and `scaffold_time_ratio` can be computed for the LDS formula.

---

### 2.2 Adaptive Engine (Client-Side JavaScript)

The adaptive engine runs **entirely in the browser**, so difficulty decisions need no server round-trip. This gives instant feedback and lets a session continue offline once the initial question batch has loaded.

#### Engine Components

```
┌─────────────────────────────────────────────────┐
│              Adaptive Engine (client-side)        │
│                                                   │
│  ┌─────────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Elo Rating  │  │   BKT    │  │  Thompson  │  │
│  │   System     │  │  Engine  │  │  Sampler   │  │
│  │             │  │          │  │            │  │
│  │ Updates     │  │ P(know)  │  │ Beta prior │  │
│  │ student &   │  │ per      │  │ per level, │  │
│  │ question    │  │ topic    │  │ ZPD window │  │
│  │ ratings     │  │          │  │            │  │
│  └──────┬──────┘  └────┬─────┘  └─────┬──────┘  │
│         │              │              │          │
│         ▼              ▼              ▼          │
│  ┌───────────────────────────────────────────┐   │
│  │         Decision Orchestrator             │   │
│  │                                           │   │
│  │  Input: weighted_outcome, features        │   │
│  │  Output: next_level, decision_type        │   │
│  │         (increase/maintain/decrease)       │   │
│  └───────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```

#### Elo Update Formula

```
weighted_outcome = {
    no_hint:  1.00 (solved without any scaffold)
    L1_only:  0.75 (needed simplified English)
    L2_used:  0.50 (needed bilingual keywords)
    L3_used:  0.25 (needed full translation)
    L4_used:  0.00 (needed answer reveal)
}

E_student = 1 / (1 + 10^((R_question - R_student) / 400))
R_student_new = R_student + K × (weighted_outcome - E_student)

K = 32 (default), increased to 48 for first 10 interactions (cold-start acceleration)
```
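
A minimal TypeScript sketch of the update above. Two details are assumptions because the spec does not state them: an incorrect final answer scores 0 regardless of hints, and the question's rating (also updated in Flow A, step 6) moves by the opposite amount with the same K, which is the usual symmetric Elo update.

```typescript
// Sketch of the Elo update. Assumptions (not in the spec): wrong answers score 0,
// and the question rating gets the mirror-image update with the same K.

// Weight indexed by the deepest hint level used (0 = no hint, 4 = L4 reveal).
const HINT_WEIGHTS = [1.0, 0.75, 0.5, 0.25, 0.0] as const;

function weightedOutcome(maxHintLevel: number, isCorrect: boolean): number {
  if (!isCorrect) return 0; // assumption: an incorrect answer earns no credit
  const level = Math.min(Math.max(Math.trunc(maxHintLevel), 0), 4);
  return HINT_WEIGHTS[level];
}

// E_student = 1 / (1 + 10^((R_question - R_student) / 400))
function expectedScore(rStudent: number, rQuestion: number): number {
  return 1 / (1 + Math.pow(10, (rQuestion - rStudent) / 400));
}

// K = 48 for the first 10 interactions (cold start), 32 afterwards.
function kFactor(totalInteractions: number): number {
  return totalInteractions < 10 ? 48 : 32;
}

function updateElo(
  rStudent: number,
  rQuestion: number,
  outcome: number,
  totalInteractions: number,
): { student: number; question: number } {
  const delta = kFactor(totalInteractions) * (outcome - expectedScore(rStudent, rQuestion));
  return { student: rStudent + delta, question: rQuestion - delta };
}

// Example: equal ratings, solved with L1 only, after the cold-start period.
const r = updateElo(1000, 1000, weightedOutcome(1, true), 25);
console.log(r); // { student: 1008, question: 992 }
```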

#### BKT Parameters (per topic)

| Parameter | Symbol | Default | Description |
|---|---|---|---|
| Prior knowledge | P(L₀) | 0.10 | Initial probability student knows topic |
| Learn rate | P(T) | 0.15 | Probability of learning per opportunity |
| Slip | P(S) | 0.10 | Probability of incorrect despite knowing |
| Guess | P(G) | 0.25 | Probability of correct despite not knowing |

Slip is adjusted based on hint usage:
```
P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)
```
This models the intuition that using more scaffolds means apparent "correctness" is less certain.
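
The update below is the standard BKT step (Bayes' rule on the observed answer, then the learning transition) with the table's defaults. The spec does not define `hint_depth_normalized`; this sketch assumes it is the deepest hint level used divided by 4, so an L4 reveal raises slip from 0.10 to 0.15.

```typescript
// Sketch of one BKT update for a single topic. The normalization of hint depth
// (deepest level / 4) is an assumption, not taken from the spec.

interface BktParams {
  pInit: number;  // P(L0)
  pLearn: number; // P(T)
  pSlip: number;  // P(S)
  pGuess: number; // P(G)
}

const DEFAULT_BKT: BktParams = { pInit: 0.1, pLearn: 0.15, pSlip: 0.1, pGuess: 0.25 };

// P(S)_adjusted = P(S) × (1 + 0.5 × hint_depth_normalized)
function adjustedSlip(pSlip: number, maxHintLevel: number): number {
  const depth = Math.min(Math.max(maxHintLevel, 0), 4) / 4; // assumed normalization
  return pSlip * (1 + 0.5 * depth);
}

// Returns P(know) after observing one answer and one learning opportunity.
function bktUpdate(
  pKnow: number,
  isCorrect: boolean,
  maxHintLevel: number,
  p: BktParams = DEFAULT_BKT,
): number {
  const slip = adjustedSlip(p.pSlip, maxHintLevel);
  const guess = p.pGuess;
  // Posterior P(L | observation) via Bayes' rule
  const posterior = isCorrect
    ? (pKnow * (1 - slip)) / (pKnow * (1 - slip) + (1 - pKnow) * guess)
    : (pKnow * slip) / (pKnow * slip + (1 - pKnow) * (1 - guess));
  // Learning transition: an unknown skill may become known after this opportunity
  return posterior + (1 - posterior) * p.pLearn;
}

// Example: a new student (P(L0) = 0.10) answers correctly without hints.
console.log(bktUpdate(DEFAULT_BKT.pInit, true, 0).toFixed(4)); // 0.3929
```

Because the adjusted slip grows with hint depth, a correct answer after an L4 reveal raises P(know) less than an unaided one.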

#### Thompson Sampling with ZPD Windowing

```
For each candidate level l in ZPD window [current - 2, current + 3]:
    sample θ_l ~ Beta(α_l, β_l)
    score_l = θ_l × proximity_bonus(l, target_elo)

Select level = argmax(score_l)

proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)
```

ZPD window is asymmetric (+3 upward, -2 downward) to encourage upward progression while preventing catastrophic failure.
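
TypeScript has no built-in Beta sampler, so this sketch draws θ_l as a ratio of two Gamma variates (Marsaglia-Tsang method). It assumes sub-levels are held in an ordered array with a per-level Elo available for `proximity_bonus`, that a level without a stored prior starts at Beta(1, 1), and that the window is clamped at the ends of the level list; the spec fixes none of these details.

```typescript
// Sketch: ZPD-windowed Thompson Sampling over difficulty sub-levels.
// Assumptions: ordered levels with known Elo, Beta(1, 1) default prior.

type Rng = () => number; // uniform float in [0, 1)

interface LevelInfo {
  id: string;  // e.g. "2.1"
  elo: number; // representative Elo for this sub-level (assumed available)
}

interface BetaPrior {
  alpha: number;
  beta: number;
}

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // in (0, 1], so log(u1) is finite
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw (Marsaglia-Tsang); shapes below 1 use the boost trick.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number;
    let v: number;
    do {
      x = sampleNormal(rng);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
      return d * v;
    }
  }
}

// Beta(alpha, beta) = X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// proximity_bonus(l, target) = exp(-0.5 × ((elo_l - target) / 100)²)
function proximityBonus(levelElo: number, targetElo: number): number {
  const z = (levelElo - targetElo) / 100;
  return Math.exp(-0.5 * z * z);
}

// Samples θ_l for each level in [current - 2, current + 3]; returns argmax θ_l × bonus.
function selectNextLevel(
  levels: LevelInfo[],
  currentIndex: number,
  priors: Record<string, BetaPrior>,
  targetElo: number,
  rng: Rng = Math.random,
): string {
  const lo = Math.max(0, currentIndex - 2);
  const hi = Math.min(levels.length - 1, currentIndex + 3);
  let bestId = levels[currentIndex].id;
  let bestScore = -Infinity;
  for (let i = lo; i <= hi; i++) {
    const { id, elo } = levels[i];
    const prior = priors[id] ?? { alpha: 1, beta: 1 };
    const score = sampleBeta(prior.alpha, prior.beta, rng) * proximityBonus(elo, targetElo);
    if (score > bestScore) {
      bestScore = score;
      bestId = id;
    }
  }
  return bestId;
}
```

A call such as `selectNextLevel(levels, currentIndex, state.thompsonPriors, state.currentElo)` would plug in the stored priors; using the student's current Elo as `target_elo` is itself an assumption. Passing an `rng` parameter keeps the sampler testable with a seeded generator.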

#### Progression Decision Rules

| Condition | Decision | Action |
|---|---|---|
| weighted_outcome ≥ 0.75 AND P(know) ≥ 0.70 | **Increase** | Move up 1 sub-level |
| weighted_outcome ≥ 0.85 AND streak ≥ 3 | **Skip** | Move up 2 sub-levels |
| 0.40 ≤ weighted_outcome < 0.75 | **Maintain** | Stay at current level |
| weighted_outcome < 0.40 OR streak_wrong ≥ 2 | **Decrease** | Move down 1 sub-level |
| weighted_outcome < 0.25 AND P(know) < 0.30 | **Rapid Decrease** | Move down 2 sub-levels |
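
The rows overlap (every Skip case also meets Increase, and every Rapid Decrease case also meets Decrease) and, outside the Skip row, leave weighted_outcome ≥ 0.75 with P(know) < 0.70 uncovered, so an implementation needs an explicit evaluation order. This sketch assumes the order rapid decrease, decrease, skip, increase, maintain, and falls back to Maintain in the uncovered case; none of these choices come from the spec.

```typescript
// Sketch of the progression table with an ASSUMED evaluation order.

type Decision = "skip" | "increase" | "maintain" | "decrease" | "rapid_decrease";

interface DecisionInput {
  weightedOutcome: number; // 0.0-1.0
  pKnow: number;           // BKT P(know) for the question's topic
  streak: number;          // consecutive successful interactions
  streakWrong: number;     // consecutive incorrect interactions
}

function decide(x: DecisionInput): { decision: Decision; levelDelta: number } {
  if (x.weightedOutcome < 0.25 && x.pKnow < 0.3) return { decision: "rapid_decrease", levelDelta: -2 };
  if (x.weightedOutcome < 0.4 || x.streakWrong >= 2) return { decision: "decrease", levelDelta: -1 };
  if (x.weightedOutcome >= 0.85 && x.streak >= 3) return { decision: "skip", levelDelta: 2 };
  if (x.weightedOutcome >= 0.75 && x.pKnow >= 0.7) return { decision: "increase", levelDelta: 1 };
  // Covers 0.40-0.75 and the untabulated case (>= 0.75 with P(know) < 0.70)
  return { decision: "maintain", levelDelta: 0 };
}

// Moves along an ordered list of sub-levels, clamped at both ends.
function nextLevel(levels: string[], current: string, delta: number): string {
  const i = levels.indexOf(current);
  if (i < 0) throw new Error(`Unknown level: ${current}`);
  return levels[Math.min(Math.max(i + delta, 0), levels.length - 1)];
}
```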

---

### 2.3 Firebase Backend

**Services Used**:
- Firebase Authentication (Google Sign-In, Email/Password)
- Cloud Firestore (student state, question database, session logs)
- Cloud Functions (LLM API calls, batch question generation, session reports)
- Firebase Hosting (static frontend assets)

#### Firestore Data Model

```
firestore/
├── users/
│   └── {uid}/
│       ├── profile: {
│       │     displayName, email, gradeLevel, preferredLanguage,
│       │     createdAt, lastActive
│       │   }
│       ├── adaptiveState: {
│       │     currentElo: number,         // e.g., 1050
│       │     currentLevel: string,       // e.g., "2.1"
│       │     totalInteractions: number,
│       │     topicMastery: {             // BKT P(know) per topic
│       │       "arithmetic": 0.72,
│       │       "fractions": 0.45,
│       │       "algebra_basic": 0.31,
│       │       ...
│       │     },
│       │     thompsonPriors: {           // Beta(α,β) per level
│       │       "1.1": { alpha: 12, beta: 3 },
│       │       "1.2": { alpha: 8, beta: 5 },
│       │       ...
│       │     },
│       │     featureAverages: {
│       │       avgLDS: 0.42,
│       │       avgMCS: 0.61,
│       │       recentLDS_5: [0.3, 0.4, 0.5, 0.35, 0.45],
│       │       recentMCS_5: [0.6, 0.65, 0.58, 0.62, 0.7]
│       │     },
│       │     streakCount: number,
│       │     lastUpdated: timestamp
│       │   }
│       └── sessions/
│           └── {sessionId}/
│               ├── metadata: {
│               │     startTime, endTime, questionsAttempted,
│               │     questionsCorrect, avgWeightedOutcome,
│               │     startElo, endElo, sessionLDS, sessionMCS
│               │   }
│               └── interactions/
│                   └── {interactionId}: {
│                         questionId, level, topic,
│                         startTime, endTime, timeSpentMs,
│                         hintsUsed: [0,1,2,3,4],  // which levels accessed
│                         hintTimestamps: { L1: ts, L2: ts, ... },
│                         maxHintLevel: number,
│                         answer: string,
│                         isCorrect: boolean,
│                         attempts: number,
│                         weightedOutcome: number,
│                         lds: number,
│                         mcs: number,
│                         eloBeforeUpdate: number,
│                         eloAfterUpdate: number,
│                         adaptiveDecision: string
│                       }
│
├── questions/
│   └── {questionId}: {
│         id, level, topic, subtopic,
│         problemText, answer, answerNumeric,
│         solutionSteps: [...],
│         scaffolds: {
│           L1_simplified: string,
│           L2_bilingual: string,
│           L3_spanish: string,
│           L4_solution: string
│         },
│         readability: {
│           fleschKincaid: number,
│           wordCount: number,
│           difficultWords: number,
│           avgSyllables: number
│         },
│         eloRating: number,
│         timesServed: number,
│         avgOutcome: number,
│         metadata: {
│           source: "curated" | "generated",
│           generatedBy: "gemini-2.0" | "qwen2.5-3b" | null,
│           reviewedBy: string | null,
│           createdAt: timestamp
│         }
│       }
│
├── questionIndex/                      // Denormalized for fast queries
│   └── byLevel/
│       └── {level}: {
│             questionIds: [...],
│             count: number
│           }
│
└── analytics/                          // Aggregated (Cloud Functions)
    ├── dailyStats/
    │   └── {date}: { activeUsers, sessionsCompleted, ... }
    └── cohortProgress/
        └── {cohortId}: { avgElo, avgLDS, avgMCS, ... }
```

#### Firestore Security Rules

```javascript
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Users can only read/write their own data
    match /users/{uid}/{document=**} {
      allow read, write: if request.auth != null && request.auth.uid == uid;
    }
    // Questions are readable by all authenticated users
    match /questions/{questionId} {
      allow read: if request.auth != null;
      allow write: if false; // Only admin/Cloud Functions
    }
    // Question index readable by all authenticated users
    match /questionIndex/{document=**} {
      allow read: if request.auth != null;
      allow write: if false;
    }
    // Analytics only accessible by admin
    match /analytics/{document=**} {
      allow read, write: if false; // Cloud Functions only
    }
  }
}
```

---

### 2.4 Cloud Functions (Serverless Backend)

```
functions/
├── onUserCreate.ts          # Initialize adaptive state for new user
├── generateScaffolds.ts     # Call Gemini/SLM to create L1-L4 for a problem
├── batchGenerateQuestions.ts # Generate next 20 questions for session queue
├── processCustomProblem.ts  # "Input your question" flow
├── generateSessionReport.ts # End-of-session analytics
├── updateQuestionStats.ts   # Update question difficulty from outcomes
└── scheduledAnalytics.ts    # Daily aggregation (cron-triggered)
```

#### Key Cloud Function: `generateScaffolds`

```typescript
// Triggered when student submits a custom problem or when
// pre-generating scaffolds for database questions

interface ScaffoldRequest {
  problemText: string;
  studentGradeLevel: number;
  currentLDS: number;  // Informs simplification level
}

interface ScaffoldResponse {
  L1_simplified: string;   // Simplified English
  L2_bilingual: string;    // English with inline Spanish keywords
  L3_spanish: string;      // Full Spanish translation
  L4_solution: string;     // Step-by-step solution
  answer: string;
  answerNumeric: number;
}

// Prompt template for LLM
const SCAFFOLD_PROMPT = `
You are a bilingual math tutor helping Spanish-speaking students 
(grades 6-8) learn math in English.

Given this math word problem:
"{problemText}"

Generate 4 scaffold levels:

**L1 (Simplified English):** Rewrite using shorter sentences, 
simpler vocabulary (grade {adjustedGrade} reading level). 
Keep all math content identical.

**L2 (Bilingual Keywords):** Take the original problem and add 
Spanish translations in parentheses for key math and context 
vocabulary. Format: "English word (palabra en español)".

**L3 (Full Spanish Translation):** Translate the complete problem 
to natural, grade-appropriate Spanish. Ensure mathematical 
precision is maintained.

**L4 (Step-by-Step Solution):** Provide a clear, numbered 
step-by-step solution in English with the final numerical answer.

Return as JSON with keys: L1_simplified, L2_bilingual, L3_spanish, 
L4_solution, answer, answerNumeric.
`;
```
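
The template marks its variables as `{problemText}` and `{adjustedGrade}` and asks for JSON, so the Cloud Function has to fill the placeholders and validate the reply before storing it. In this sketch, `fillTemplate` and `parseScaffoldResponse` are hypothetical helper names, how `adjustedGrade` is derived from grade level and LDS is left open, and stripping a Markdown code fence around the JSON is a defensive assumption about model output.

```typescript
// Sketch: fill the prompt template and validate the model's JSON reply.

interface ScaffoldResponse {
  L1_simplified: string;
  L2_bilingual: string;
  L3_spanish: string;
  L4_solution: string;
  answer: string;
  answerNumeric: number;
}

const REQUIRED_TEXT_FIELDS = [
  "L1_simplified",
  "L2_bilingual",
  "L3_spanish",
  "L4_solution",
  "answer",
] as const;

// Replaces {name} placeholders; unknown placeholders are left untouched.
function fillTemplate(template: string, vars: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in vars ? String(vars[key]) : match,
  );
}

// Parses the model reply; throws if a required field is missing or malformed.
function parseScaffoldResponse(raw: string): ScaffoldResponse {
  // Strip an optional fence of three backticks (with or without "json").
  const cleaned = raw.trim().replace(/^`{3}(?:json)?\s*/, "").replace(/\s*`{3}$/, "");
  const data: unknown = JSON.parse(cleaned);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("Scaffold reply is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const text: Record<string, string> = {};
  for (const key of REQUIRED_TEXT_FIELDS) {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Missing or empty field: ${key}`);
    }
    text[key] = value;
  }
  // Accept a number or a numeric string; anything else becomes NaN and is rejected.
  const rawNumeric = obj["answerNumeric"];
  const answerNumeric =
    typeof rawNumeric === "number"
      ? rawNumeric
      : typeof rawNumeric === "string" && rawNumeric.trim() !== ""
        ? Number(rawNumeric)
        : NaN;
  if (!Number.isFinite(answerNumeric)) {
    throw new Error("answerNumeric is not a finite number");
  }
  return {
    L1_simplified: text["L1_simplified"],
    L2_bilingual: text["L2_bilingual"],
    L3_spanish: text["L3_spanish"],
    L4_solution: text["L4_solution"],
    answer: text["answer"],
    answerNumeric,
  };
}
```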

#### Key Cloud Function: `batchGenerateQuestions`

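```typescript
import { onCall, HttpsError } from "firebase-functions/v2/https";
// Project helpers used below. The module path is illustrative; their
// implementations are not part of this document.
import {
  ensureScaffoldsGenerated,
  generateId,
  getAdaptiveState,
  getRecentQuestionIds,
  selectQuestions,
  thompsonSampleBatch,
} from "./adaptive";

// Called when the student reaches question 17 of 20 (prefetch trigger).
// Selects the next batch of questions from the database based on adaptive state.
export const batchGenerateQuestions = onCall<{ batchSize?: number }>(async (request) => {
  if (!request.auth) {
    throw new HttpsError("unauthenticated", "Sign-in is required.");
  }
  const { uid } = request.auth;
  const batchSize = request.data?.batchSize ?? 20; // BatchRequest default
  const state = await getAdaptiveState(uid);

  // Thompson Sampling selects the level distribution for the next batch
  const levelDistribution = thompsonSampleBatch(
    state.thompsonPriors,
    state.currentLevel,
    batchSize,
  );
  // e.g., { "2.1": 5, "2.2": 8, "2.3": 5, "2.4": 2 }

  // Select questions, excluding the last 100 served to this student
  const recentIds = await getRecentQuestionIds(uid, 100);
  const questions = await selectQuestions(levelDistribution, {
    excludeIds: recentIds,
    topicBalance: state.topicMastery, // Favor weaker topics
  });

  // Ensure all questions have scaffolds generated
  const withScaffolds = await ensureScaffoldsGenerated(questions);

  return { questions: withScaffolds, sessionBatchId: generateId() };
});
```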

---

### 2.5 LLM Service Layer

#### V1: Gemini API (Current)

```
┌────────────┐     HTTPS/REST     ┌──────────────────┐
│  Cloud      │ ──────────────────►│  Google Gemini    │
│  Function   │ ◄──────────────────│  2.0 Flash API    │
└────────────┘                    └──────────────────┘

Cost: ~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens
Latency: 200-800ms per scaffold generation
Rate limit: 60 RPM (free tier), 1000 RPM (paid)
```

#### V2: Qwen2.5-3B SLM (Planned)

```
┌────────────┐     HTTPS/REST     ┌──────────────────────────┐
│  Cloud      │ ──────────────────►│  HF Inference Endpoint   │
│  Function   │ ◄──────────────────│  Qwen2.5-3B-Instruct     │
└────────────┘                    │  (QLoRA fine-tuned)       │
                                  │  GPU: T4 or L4            │
                                  └──────────────────────────┘

Cost: ~$0.60/hr (T4) or ~$1.04/hr (L4)
Latency: 100-400ms per scaffold generation
Rate limit: None imposed by provider (dedicated endpoint); throughput bounded by GPU capacity
```

#### LLM Client Abstraction

```typescript
// lib/llm-client.ts: provider-agnostic interface, invoked from Cloud Functions only so API keys stay server-side (see 7.2)

interface LLMProvider {
  generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
  generateQuestion(level: string, topic: string): Promise<QuestionWithScaffolds>;
  validateAnswer(problem: string, studentAnswer: string, correctAnswer: string): Promise<AnswerValidation>;
}

class GeminiProvider implements LLMProvider { ... }
class QwenSLMProvider implements LLMProvider { ... }

// Factory: chooses the SLM when enabled and reachable, otherwise Gemini (decided once, at creation)
function createLLMClient(): LLMProvider {
  if (config.useSLM && config.slmEndpointAvailable) {
    return new QwenSLMProvider(config.slmEndpoint);
  }
  return new GeminiProvider(config.geminiApiKey);
}
```
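
`createLLMClient` picks a provider once; nothing retries on Gemini if an individual SLM call fails, for example while a scale-to-zero endpoint is still cold (the case the V2 cost estimate budgets a Gemini fallback for). A per-request wrapper could cover that. This sketch trims the interface to `generateScaffolds`; `FallbackProvider`, `withTimeout`, and the 2-second default timeout are hypothetical choices, not part of the design.

```typescript
// Sketch: per-request fallback from a primary provider (SLM) to a secondary one (Gemini).

interface ScaffoldContext {
  gradeLevel: number;
  currentLDS: number;
}

interface ScaffoldResponse {
  L1_simplified: string;
  L2_bilingual: string;
  L3_spanish: string;
  L4_solution: string;
  answer: string;
  answerNumeric: number;
}

interface ScaffoldProvider {
  readonly name: string;
  generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse>;
}

// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

class FallbackProvider implements ScaffoldProvider {
  readonly name: string;
  private readonly primary: ScaffoldProvider;
  private readonly secondary: ScaffoldProvider;
  private readonly timeoutMs: number;

  constructor(primary: ScaffoldProvider, secondary: ScaffoldProvider, timeoutMs = 2000) {
    this.primary = primary;
    this.secondary = secondary;
    this.timeoutMs = timeoutMs;
    this.name = `${primary.name}->${secondary.name}`;
  }

  async generateScaffolds(problem: string, context: ScaffoldContext): Promise<ScaffoldResponse> {
    try {
      return await withTimeout(this.primary.generateScaffolds(problem, context), this.timeoutMs);
    } catch (err) {
      // e.g. a cold scale-to-zero endpoint, a 5xx response, or a slow reply
      console.warn(`${this.primary.name} failed; falling back to ${this.secondary.name}`, err);
      return this.secondary.generateScaffolds(problem, context);
    }
  }
}
```

Usage would look like `new FallbackProvider(slmProvider, geminiProvider)`, where both arguments are instances of the providers above. The shadow-testing phase in 5.2 calls both models deliberately, which is a different pattern from this failover.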

---

### 2.6 SLM Fine-Tuning Pipeline

```
┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│  Training     │    │  Fine-Tune   │    │  Deploy          │
│  Data Prep    │───►│  QLoRA SFT   │───►│  HF Inference EP │
└──────────────┘    └──────────────┘    └──────────────────┘

Step 1: Collect 2,000-5,000 scaffold examples from Gemini V1 usage
Step 2: Human review + quality filter → ~1,500 gold examples
Step 3: QLoRA fine-tune Qwen2.5-3B-Instruct
Step 4: Evaluate on held-out test set (BLEU, math accuracy, readability)
Step 5: Deploy to HF Inference Endpoint
Step 6: Shadow-test alongside Gemini (serve both, compare quality)
Step 7: Full cutover when SLM matches Gemini quality
```

**Fine-tuning Configuration:**

| Parameter | Value | Rationale |
|---|---|---|
| Base model | Qwen2.5-3B-Instruct | Best math+Spanish at 3B scale |
| Method | QLoRA (4-bit NF4) | Fits single 16GB GPU |
| LoRA rank (r) | 32 | Balance quality/efficiency for small dataset |
| LoRA alpha | 64 | Standard 2× rank |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | Full attention + MLP |
| Learning rate | 2e-4 | Standard for QLoRA |
| Epochs | 3-5 | Small dataset, monitor val loss |
| Batch size | 4 (effective 16 with grad accum) | Memory constraint |
| Max sequence length | 1024 | Sufficient for problem + all 4 scaffolds |
| Warmup ratio | 0.05 | Short warmup for small dataset |
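
As a rough check on the schedule, assume all ~1,500 gold examples from Step 2 go into training (the held-out test set in Step 4 makes the real number somewhat smaller). With a per-device batch of 4 and an effective batch of 16:

$$
\begin{aligned}
\text{gradient accumulation steps} &= 16 / 4 = 4 \\
\text{optimizer steps per epoch} &\approx \lceil 1500 / 16 \rceil = 94 \\
\text{total steps (3 to 5 epochs)} &\approx 94 \times 3 = 282 \text{ to } 94 \times 5 = 470 \\
\text{warmup steps} &\approx 0.05 \times \text{total} \approx 14 \text{ to } 24
\end{aligned}
$$

The 0.05 ratio therefore amounts to roughly 14 to 24 warmup steps, in line with the short-warmup rationale in the table.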

---

## 3. Data Flow Diagrams

### 3.1 Flow A: "Practice Problems" Mode

```
Student clicks "Start Practice"
         │
         ▼
┌─────────────────────────────────┐
│ 1. Load adaptive state from     │
│    Firestore (Elo, BKT, priors) │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Thompson Sampling selects    │
│    next question level          │
│    (ZPD window: current -2/+3)  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Fetch question from Firestore│
│    by level + topic balancing   │
│    (avoid recently served)      │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Display problem, start timer │
│    Student reads and attempts   │
└────────────┬────────────────────┘
             │
    ┌────────┴────────┐
    │  Needs hints?   │
    ▼ No              ▼ Yes
┌─────────┐   ┌───────────────────┐
│ Submit  │   │ L1 → L2 → L3 → L4│
│ answer  │   │ (each click logged │
└────┬────┘   │  with timestamp)   │
     │        └────────┬───────────┘
     │                 │
     ▼                 ▼
┌─────────────────────────────────┐
│ 5. Compute weighted_outcome     │
│    based on correctness + hints │
│    Compute LDS and MCS          │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Update Elo (student + Q)     │
│    Update BKT P(know) for topic │
│    Update Thompson Beta priors  │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 7. Progression decision:        │
│    increase/maintain/decrease   │
│    Select next level            │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 8. Save interaction to Firestore│
│    Display "Next Problem"       │
└────────────┬────────────────────┘
             │
             ▼
    ┌────────┴────────┐
    │ Q17 of 20?      │
    ▼ Yes             ▼ No
┌─────────────┐   ┌──────────┐
│ Prefetch    │   │ Loop to  │
│ next batch  │   │ step 2   │
│ (Cloud Fn)  │   └──────────┘
└─────────────┘
             │
    At Q20:  ▼
┌─────────────────────────────────┐
│ 9. Generate session report      │
│    (Cloud Function)             │
│    Show summary to student      │
└─────────────────────────────────┘
```

### 3.2 Flow B: "Input Your Question" Mode

```
Student types/pastes a math word problem
         │
         ▼
┌─────────────────────────────────┐
│ 1. Cloud Function:              │
│    processCustomProblem         │
│    - Validate it's a math       │
│      word problem               │
│    - Extract answer/solution    │
│    - Call Gemini/SLM to generate│
│      L1, L2, L3, L4 scaffolds   │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 2. Estimate difficulty level    │
│    using readability metrics    │
│    (FK grade, word count, etc.) │
│    Map to nearest Elo rating    │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 3. Display problem with         │
│    scaffold buttons active      │
│    (same UI as Practice mode)   │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 4. Student interacts, solves    │
│    Same hint tracking as        │
│    Practice mode                │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 5. Update adaptive state        │
│    (Elo, BKT, Thompson)         │
│    Log interaction              │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────────────────┐
│ 6. Offer: "Try another?" or     │
│    "Switch to Practice Mode"    │
│    (where engine auto-selects)  │
└─────────────────────────────────┘
```
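
Step 2 relies on readability metrics like those stored under `questions/{questionId}.readability`. A minimal sketch of computing them follows. The Flesch-Kincaid grade formula is the standard one; the syllable counter is a rough English heuristic, only alphabetic tokens count as words (numbers are skipped), `difficultWords` is omitted, and the mapping from these features to an Elo rating is not specified in this document, so it is left out.

```typescript
// Sketch: readability features for custom-problem difficulty estimation.

interface ReadabilityFeatures {
  wordCount: number;
  sentenceCount: number;
  avgSyllables: number;
  fleschKincaid: number;
}

// Heuristic: count vowel groups, discounting a silent final "e" (but not "-le").
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  let count = (w.match(/[aeiouy]+/g) ?? []).length;
  if (w.endsWith("e") && !w.endsWith("le") && count > 1) count -= 1;
  return Math.max(1, count);
}

function readability(text: string): ReadabilityFeatures {
  const words = text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) ?? [];
  // Split on terminators followed by whitespace or end, so "2.5" stays intact.
  const sentences = text.split(/[.!?]+(?=\s|$)/).filter((s) => /[A-Za-z0-9]/.test(s));
  const wordCount = words.length;
  const sentenceCount = Math.max(1, sentences.length);
  const syllables = words.reduce((total, w) => total + countSyllables(w), 0);
  const avgSyllables = wordCount > 0 ? syllables / wordCount : 0;
  // FK grade = 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59
  const fleschKincaid =
    wordCount > 0 ? 0.39 * (wordCount / sentenceCount) + 11.8 * avgSyllables - 15.59 : 0;
  return { wordCount, sentenceCount, avgSyllables, fleschKincaid };
}
```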

---

## 4. API Contracts

### 4.1 Client → Cloud Functions

```typescript
// POST /generateScaffolds
interface GenerateScaffoldsRequest {
  problemText: string;
  gradeLevel: number;          // 6, 7, or 8
  currentLDS: number;          // 0.0-1.0, informs simplification
}
interface GenerateScaffoldsResponse {
  scaffolds: {
    L1_simplified: string;
    L2_bilingual: string;
    L3_spanish: string;
    L4_solution: string;
  };
  answer: string;
  answerNumeric: number;
  estimatedLevel: string;      // e.g., "2.3"
  estimatedElo: number;        // e.g., 1100
  processingTimeMs: number;
}

// POST /batchGenerateQuestions
interface BatchRequest {
  batchSize: number;           // default 20
  // Auth token provides uid → adaptive state looked up server-side
}
interface BatchResponse {
  questions: QuestionWithScaffolds[];
  sessionBatchId: string;
}

// POST /submitInteraction
interface InteractionSubmission {
  sessionId: string;
  questionId: string;
  answer: string;
  isCorrect: boolean;
  timeSpentMs: number;
  hintsUsed: number[];         // [0], [0,1], [0,1,2], etc.
  hintTimestamps: Record<string, number>;
  attempts: number;
}
interface InteractionResponse {
  weightedOutcome: number;
  lds: number;
  mcs: number;
  newElo: number;
  newLevel: string;
  decision: "increase" | "maintain" | "decrease" | "skip" | "rapid_decrease";
  nextQuestion: QuestionWithScaffolds;  // Pre-selected
}

// POST /generateSessionReport
interface SessionReportRequest {
  sessionId: string;
}
interface SessionReportResponse {
  summary: {
    questionsAttempted: number;
    questionsCorrect: number;
    avgWeightedOutcome: number;
    eloChange: number;
    topicsStrong: string[];
    topicsWeak: string[];
    avgLDS: number;
    avgMCS: number;
    languageProgressNote: string;  // Generated text about L2 progress
  };
  recommendations: string[];      // e.g., "Focus on fractions vocabulary"
}
```

---

## 5. Deployment Architecture

### 5.1 V1 Deployment (MVP)

```
┌──────────────────────────────────────────────────────────┐
│                    Firebase Project                        │
│                                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │  Firebase    │  │  Cloud       │  │  Cloud           │  │
│  │  Hosting     │  │  Firestore   │  │  Functions       │  │
│  │  (Next.js)   │  │  (Database)  │  │  (Node.js 20)   │  │
│  │              │  │              │  │                  │  │
│  │  Static +    │  │  Student     │  │  LLM calls       │  │
│  │  SSR pages   │  │  state,      │  │  Batch gen       │  │
│  │              │  │  questions,  │  │  Reports         │  │
│  │              │  │  sessions    │  │                  │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
└──────────────────────────────────────────────┼─────────────┘
                                               │
                                    HTTPS      │
                                               ▼
                                  ┌──────────────────┐
                                  │  Google Gemini    │
                                  │  2.0 Flash API    │
                                  └──────────────────┘

Estimated monthly cost (100 students, 5 sessions/week):
- Firebase Hosting: Free tier (~$0)
- Firestore: ~$5/mo (reads/writes within free tier mostly)
- Cloud Functions: ~$10/mo (invocations + compute)
- Gemini API: ~$15-25/mo (scaffold generation)
- Total: ~$30-40/mo
```

### 5.2 V2 Deployment (SLM)

```
┌──────────────────────────────────────────────────────────┐
│                    Firebase Project                        │
│                                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │  Firebase    │  │  Cloud       │  │  Cloud           │  │
│  │  Hosting     │  │  Firestore   │  │  Functions       │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
└──────────────────────────────────────────────┼─────────────┘
                                               │
                              ┌─────────────────┼──────────────┐
                              │                 │              │
                              ▼                 ▼              │
                   ┌──────────────────┐  ┌──────────────┐     │
                   │  HF Inference    │  │  Gemini API  │     │
                   │  Endpoint        │  │  (fallback)  │     │
                   │  Qwen2.5-3B     │  └──────────────┘     │
                   │  QLoRA FT       │                        │
                   │  (T4 GPU)       │  Shadow testing:       │
                   └──────────────────┘  Both called, SLM     │
                                         response served,     │
                                         Gemini response      │
                                         logged for QA        │
                                         ─────────────────────┘

Estimated monthly cost (100 students):
- Firebase: ~$15/mo (same as V1)
- HF Inference Endpoint (T4, scale-to-zero): ~$50-100/mo
  (active only during school hours, ~8hrs/day × 20 days)
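- Gemini fallback: ~$5/mo after cutover (only when SLM is cold; during shadow
  testing both models are called, so Gemini cost stays near the V1 level)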
- Total: ~$70-120/mo (but no per-token costs at scale)
```
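
The upper end of the endpoint estimate follows from the stated school-hours schedule at the T4 rate quoted in section 2.5 (about 0.60 USD/hr):

$$
8\,\tfrac{\text{h}}{\text{day}} \times 20\,\text{days} = 160\,\text{h/month},
\qquad
160\,\text{h} \times 0.60\,\tfrac{\text{USD}}{\text{h}} = 96\,\text{USD/month}
$$

With scale-to-zero, billing stops while no replica is running, so lighter daily usage moves the cost toward the lower end of the range.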

### 5.3 V3 Deployment (Scale)

```
When student count exceeds 500+, migrate to:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Vercel       │  │  Firebase    │  │  Cloud Run       │  │
│  │  (Next.js)    │  │  Firestore   │  │  (API server)    │  │
│  └──────────────┘  └──────────────┘  └───────┬──────────┘  │
│                                              │             │
│                    ┌─────────────────────────┼──────┐      │
│                    │                         │      │      │
│                    ▼                         ▼      │      │
│         ┌──────────────────┐    ┌─────────────────┐ │      │
│         │  HF Inference EP │    │  IRT/DKT Model  │ │      │
│         │  Qwen2.5-3B     │    │  Server          │ │      │
│         │  (Auto-scaling)  │    │  (Python/FastAPI)│ │      │
│         └──────────────────┘    └─────────────────┘ │      │
│                                                     │      │
│  + Deep Knowledge Tracing (DKT) replaces BKT        │      │
│  + IRT item calibration from pooled student data     │      │
│  + A/B testing framework for algorithm improvements  │      │
└─────────────────────────────────────────────────────────────┘
```

---

## 6. Technology Stack Summary

| Layer | Technology | Justification |
|---|---|---|
| Frontend Framework | Next.js 14+ (App Router) | SSR for SEO, React ecosystem, TypeScript |
| UI Styling | Tailwind CSS + shadcn/ui | Rapid prototyping, consistent design |
| Math Rendering | KaTeX | Fast client-side LaTeX rendering |
| Charts | Recharts | React-based charting for dashboards |
| Authentication | Firebase Auth | Google Sign-In, simple integration |
| Database | Cloud Firestore | Real-time sync, offline support, serverless |
| Serverless Functions | Firebase Cloud Functions (Node.js 20) | Low latency, Firebase integration |
| LLM (V1) | Google Gemini 2.0 Flash | Low cost, fast, good multilingual |
| SLM (V2) | Qwen2.5-3B-Instruct (QLoRA fine-tuned) | Best math+Spanish at 3B; the 3B checkpoint uses the Qwen Research License, not Apache 2.0 like most Qwen2.5 sizes |
| SLM Hosting | HF Inference Endpoints (T4, scale-to-zero) | Cost-effective, no infra management |
| Adaptive Engine | Client-side TypeScript | Zero-latency decisions, works offline |
| State Management | Zustand + Firestore sync | Lightweight, persists across sessions |
| Testing | Vitest + Playwright | Unit + E2E testing |
| CI/CD | GitHub Actions | Automated testing + Firebase deploy |
| Monitoring | Firebase Analytics + Performance Monitoring | User behavior + page-load metrics (Crashlytics supports mobile apps, not web) |

---

## 7. Security & Privacy Considerations

### 7.1 Data Protection
- **COPPA Compliance**: Students are minors (ages 11-14), and COPPA applies to those under 13. No personally identifiable information is stored beyond email/display name. No third-party tracking.
- **FERPA Alignment**: Performance data (Elo, LDS, MCS) is associated with uid only. Teachers/admins see aggregate data, never individual student identifiers.
- **Data Encryption**: Firestore encrypts data at rest (AES-256). All API calls use HTTPS (TLS 1.2 or later).

### 7.2 API Security
- Firebase Auth tokens required for all Cloud Function calls
- Gemini/SLM API keys stored as Cloud Functions secrets (backed by Google Cloud Secret Manager), never client-side
- Rate limiting on Cloud Functions to prevent abuse (max 10 scaffold generations per minute per user)
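
A minimal sliding-window limiter for the 10-per-minute rule might look like this. The class name and in-memory storage are illustrative assumptions, not part of the design.

```typescript
// Sketch: sliding-window limiter, default 10 requests per 60 s per user.
// State lives in one process; see the note below about multiple instances.

class SlidingWindowRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records a request and returns true if the user is still under the limit.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(userId, recent);
    return allowed;
  }
}

// Possible use inside a callable handler (uid from request.auth):
// if (!limiter.tryAcquire(uid)) throw new HttpsError("resource-exhausted", "Too many requests");
```

Cloud Functions can run several instances at once, each with its own `Map`, so this enforces the limit per instance only; a strict per-user limit would need shared state, such as a per-user Firestore document updated in a transaction.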

### 7.3 Content Safety
- All LLM-generated scaffolds pass through a validation function checking:
  - Mathematical accuracy (answer matches expected)
  - Appropriate content (no adult/violent themes)
  - Language accuracy (Spanish translation verified against expected pattern)
- Questions from the curated database are pre-reviewed; generated questions flagged for human review
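
The accuracy check ("answer matches expected") does not say how answers are compared. A minimal sketch, assuming numeric answers, parses common formats and compares with a tolerance so floating-point noise does not fail a correct value. Decimal commas such as "2,5", used in many Spanish-speaking countries, are deliberately rejected here and would need their own rule; expression answers are out of scope. All function names are hypothetical.

```typescript
// Sketch: numeric answer parsing and tolerant comparison.
// Accepts "4", "-0.5", "3/4", "-3/4", "1 1/2"; returns null for anything else.

function parseNumericAnswer(input: string): number | null {
  const s = input.trim();
  let m = /^(-?\d+(?:\.\d+)?)$/.exec(s);
  if (m) return Number(m[1]);
  m = /^(-?\d+)\s*\/\s*(\d+)$/.exec(s); // simple fraction
  if (m) {
    const den = Number(m[2]);
    return den === 0 ? null : Number(m[1]) / den;
  }
  m = /^(-?)(\d+)\s+(\d+)\s*\/\s*(\d+)$/.exec(s); // mixed number
  if (m) {
    const den = Number(m[4]);
    if (den === 0) return null;
    const magnitude = Number(m[2]) + Number(m[3]) / den;
    return m[1] === "-" ? -magnitude : magnitude;
  }
  return null;
}

// Relative tolerance absorbs floating-point noise such as 0.1 + 0.2 vs 0.3.
function answersMatch(a: number, b: number, relTol = 1e-6, absTol = 1e-9): boolean {
  return Math.abs(a - b) <= Math.max(absTol, relTol * Math.max(Math.abs(a), Math.abs(b)));
}

function isAnswerCorrect(submitted: string, expected: number): boolean {
  const value = parseNumericAnswer(submitted);
  return value !== null && answersMatch(value, expected);
}
```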

---

## 8. Performance Targets

| Metric | Target | Measurement |
|---|---|---|
| Time to first problem display | < 2 seconds | Lighthouse / Firebase Performance |
| Adaptive decision latency | < 50ms | Client-side (no network) |
| Scaffold generation (Gemini) | < 1.5 seconds | Cloud Function logs |
| Scaffold generation (SLM) | < 800ms | HF Inference EP metrics |
| Batch prefetch trigger → ready | < 5 seconds | 20 questions fetched at Q17 |
| Offline capability | Full session | After initial batch load |
| Concurrent users (V1) | 50 | Firebase Blaze plan (required for Cloud Functions) |
| Concurrent users (V2) | 500+ | HF auto-scaling endpoint |