muthuk1 committed 8dd8a8d (verified, parent a0ebf39): Add comprehensive MeDo-styled prompt document for recreating the full project

Files changed (1): PROMPT.md (added, +627)
# 🧠 PROMPT.md — Build OpenMAIC from Scratch

> **MeDo-Styled Prompts** to recreate the full OpenMAIC AI Interactive Classroom application.
> One-shot (single mega-prompt) and multi-shot (phased build) variants included.

---

## 📋 Table of Contents

1. [Project Overview](#-project-overview)
2. [Architecture Blueprint](#-architecture-blueprint)
3. [One-Shot Mega Prompt](#-one-shot-mega-prompt)
4. [Multi-Shot Phased Prompts](#-multi-shot-phased-prompts)
   - Phase 1: Foundation & Scaffold
   - Phase 2: Data Layer & State Management
   - Phase 3: AI Provider System
   - Phase 4: Generation Pipeline
   - Phase 5: Slide Renderer & Canvas
   - Phase 6: Multi-Agent Orchestration
   - Phase 7: Playback Engine & Roundtable
   - Phase 8: Interactive Widgets & PBL
   - Phase 9: Media Generation Pipeline
   - Phase 10: Audio System (TTS/ASR)
   - Phase 11: Chat System & Streaming
   - Phase 12: Settings, i18n, Export, Polish
5. [Key Technical Decisions](#-key-technical-decisions)

---

## 🎯 Project Overview

**OpenMAIC** is an open-source AI interactive classroom platform. Users upload a PDF, and the system generates an immersive multi-agent learning experience with:

- **AI-generated slide presentations** from PDF content
- **Multi-agent roundtable discussions** (teacher, assistant, student agents)
- **Real-time TTS/ASR** for voice-driven lectures
- **Interactive whiteboard** with collaborative drawing
- **Quiz generation** with auto-grading
- **Interactive widgets** (simulations, diagrams, code editors, 3D visualizations, games)
- **Project-Based Learning (PBL)** mode with MCP tool-calling agents
- **Media generation** (AI images and videos embedded in slides)
- **PowerPoint export** and classroom zip import/export
- **Five-language i18n** (zh-CN, en-US, ja-JP, ru-RU, ar-SA)

### Tech Stack

| Layer | Technology |
|-------|-----------|
| Framework | React 19 + Vite 6 |
| Routing | React Router DOM v7 |
| State | Zustand 5 (10 stores, including stage, canvas, settings, snapshot, keyboard) |
| Storage | Dexie (IndexedDB) — stages, scenes, audio, images, chat, outlines |
| UI | shadcn/ui (Radix primitives) + Tailwind CSS v4 + Motion (Framer) |
| AI SDK | Vercel AI SDK 6 + LangGraph (multi-agent director graph) |
| AI Providers | OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, GLM, MiniMax, Ollama, OpenRouter, +10 more |
| TTS/ASR | OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, Browser native, VoxCPM |
| Image Gen | Seedream, OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok |
| Video Gen | Seedance, Kling, Veo, Sora, MiniMax, Grok |
| Rich Text | ProseMirror (custom schema with marks for bold/italic/underline/color/align/indent/lists) |
| Charts | ECharts 6 |
| Diagrams | @xyflow/react (React Flow) |
| Export | pptxgenjs (PowerPoint), JSZip (classroom archives) |
| Math | KaTeX + Temml + mathml2omml (for PPTX export) |
| Code Highlighting | Shiki |
| PDF Parsing | unpdf + MinerU cloud + custom providers |

---

## 🏗 Architecture Blueprint

### Data Flow

```
PDF Upload → Outline Generation (SSE stream) → Scene Content Generation → Action Generation
        ↓
Media Generation (parallel)
        ↓
IndexedDB Storage (Dexie)
        ↓
Playback Engine (state machine)
        ↓
Roundtable UI ←→ Chat System ←→ Multi-Agent LangGraph
```

### Store Architecture (Zustand)

| Store | Purpose | Persistence |
|-------|---------|-------------|
| `useStageStore` | Current stage, scenes, outlines, generation status | IndexedDB |
| `useCanvasStore` | Viewport, zoom, selected elements, editing state | Memory |
| `useSettingsStore` | All provider configs, model selection, UI prefs | localStorage |
| `useSnapshotStore` | Undo/redo history | Memory |
| `useKeyboardStore` | Keyboard shortcut state | Memory |
| `useMediaGenerationStore` | Image/video generation tasks and status | IndexedDB |
| `useWhiteboardHistoryStore` | Whiteboard undo/redo per scene | Memory |
| `useWidgetIframeStore` | Widget iframe communication state | Memory |
| `useUserProfileStore` | User nickname, avatar, bio | localStorage |
| `useAgentRegistry` | Agent configs (default + custom + generated) | localStorage |

### Database Schema (Dexie/IndexedDB)

```
stages: id, name, description, createdAt, updatedAt, languageDirective, style, agentIds
scenes: id, stageId, type, title, order, content (JSON), actions (JSON), whiteboard (JSON)
audioFiles: id (audioId), blob, duration, format, text, voice
imageStore: id (storageId), blob, mimeType
chatSessions: id, stageId, sceneId, type, status, messages, config
outlines: id, stageId, outlines (JSON array)
mediaFiles: id (stageId_elementId), blob, mimeType
generatedAgents: id (stageId), agents (AgentConfig[])
```

### Prompt Template System

File-based with composition:
- `lib/prompts/templates/{promptId}/system.md` + `user.md`
- `lib/prompts/snippets/{name}.md`
- Syntax: `{{variable}}`, `{{snippet:name}}`, `{{#if flag}}...{{/if}}`
- 20+ prompt templates for outlines, slides, quizzes, actions, widgets, PBL, agents
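The template syntax above can be sketched in a few lines of TypeScript. This is an illustrative reduction, not the project's actual loader: the real system also resolves `{{snippet:name}}` from snippet files on disk, which is omitted here.

```typescript
// Minimal sketch of the {{variable}} / {{#if flag}} syntax described above.
// Function names mirror the spec's loader; the bodies are assumptions.
type Vars = Record<string, string | boolean | undefined>;

// Resolve {{#if flag}}...{{/if}} blocks: keep the body only when the flag is truthy.
function processConditionalBlocks(template: string, vars: Vars): string {
  return template.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_m, flag: string, body: string) => (vars[flag] ? body : "")
  );
}

// Resolve plain {{variable}} placeholders from the vars map.
function interpolateVariables(template: string, vars: Vars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_m, name: string) =>
    typeof vars[name] === "string" ? (vars[name] as string) : ""
  );
}

function buildPrompt(template: string, vars: Vars): string {
  return interpolateVariables(processConditionalBlocks(template, vars), vars);
}
```

For example, `buildPrompt("Teach {{topic}}.{{#if quiz}} Add a quiz.{{/if}}", { topic: "algebra", quiz: true })` yields `"Teach algebra. Add a quiz."`.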

### Action System

Three categories executed by the ActionEngine:
- **Fire-and-forget**: `spotlight`, `laser`, `play_video`
- **Synchronous** (wait for completion): `speech`, `wb_open`, `wb_close`, `wb_draw_text`, `wb_draw_shape`, `wb_draw_chart`, `wb_draw_latex`, `wb_draw_table`, `wb_draw_line`, `wb_draw_code`, `wb_edit_code`, `wb_clear`, `wb_delete`, `discussion`
- **Widget actions**: `widget_highlight`, `widget_setState`, `widget_annotation`, `widget_reveal`
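The split between fire-and-forget and synchronous actions can be sketched as a dispatch loop. The `runActions`/`execute` names and shapes below are assumptions for illustration; only the execution semantics come from the list above.

```typescript
// Illustrative sketch: fire-and-forget actions start and return immediately,
// synchronous actions are awaited before the next action in the sequence runs.
type Action = { name: string; params?: Record<string, unknown> };

const FIRE_AND_FORGET = new Set(["spotlight", "laser", "play_video"]);

async function runActions(
  actions: Action[],
  execute: (a: Action) => Promise<void>
): Promise<void> {
  for (const action of actions) {
    if (FIRE_AND_FORGET.has(action.name)) {
      void execute(action); // start the effect, don't block the sequence
    } else {
      await execute(action); // e.g. speech, wb_draw_* — wait for completion
    }
  }
}
```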

### Multi-Agent Orchestration (LangGraph)

```
START → director ──(end)──→ END
           │
           └─(next)→ agent_generate ──→ director (loop)
```

- Director: LLM-based for multi-agent, code-only for single-agent
- Agents: teacher (full slide + whiteboard control), assistant (whiteboard), student (whiteboard, short responses)
- Per-agent: persona prompt, allowed actions, TTS voice, avatar, color

### Playback Engine State Machine

```
        start()             pause()
idle ──────────→ playing ──────────→ paused
  ▲                ▲                    │
  │                │       resume()     │
  │                └────────────────────┘
  │   handleEndDiscussion()
  │   confirmDiscussion()
  └──────────────── live ──────────→ paused
```

---

## 🚀 ONE-SHOT MEGA PROMPT

> Use this single prompt to generate the entire application in one conversation.

```
Me: Build me "OpenMAIC" — an open-source AI interactive classroom platform.

Do: Create a React 19 + Vite 6 + TypeScript application with these EXACT specifications:

### CORE SETUP
- React Router DOM v7 with 4 lazy-loaded routes: / (HomePage), /classroom/:id (ClassroomPage), /generation-preview (GenerationPreviewPage), /eval/whiteboard (WhiteboardEvalPage)
- App.tsx wraps all routes in: BrowserRouter → ThemeProvider → I18nProvider → ServerProvidersInit → AccessCodeGuard → Toaster
- Tailwind CSS v4 with PostCSS, shadcn/ui (Radix-based), oklch color system with light/dark mode via CSS variables
- Path alias @/* → ./src/*

### STATE MANAGEMENT (Zustand 5)
Create 10 Zustand stores:
1. **useStageStore** — stage (name, description, languageDirective, style, agentIds), scenes[], currentSceneId, outlines[], generationStatus, chats[], mode ('autonomous'|'playback'). Actions: loadFromStorage, saveToStorage, addScene, setCurrentSceneId. Debounced IndexedDB persistence.
2. **useCanvasStore** — viewportSize, canvasScale, selectedElementIds, editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive. All canvas interaction state.
3. **useSettingsStore** — providerId, modelId, providersConfig (unified JSON), thinkingConfigs, ttsProviderId, ttsVoice, ttsSpeed, asrProviderId, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, agentMode ('preset'|'auto'), selectedAgentIds. Persisted via zustand/persist to localStorage. Has fetchServerProviders() to merge server-configured providers.
4. **useSnapshotStore** — history stack for undo/redo
5. **useKeyboardStore** — keyboard shortcut state
6. **useMediaGenerationStore** — tasks map (elementId → {status, objectUrl, error}), enqueueTasks, restoreFromDB
7. **useWhiteboardHistoryStore** — per-scene whiteboard undo/redo
8. **useWidgetIframeStore** — widget iframe postMessage communication
9. **useUserProfileStore** — nickname, avatar, bio. Persisted localStorage.
10. **useAgentRegistry** — agents map (id → AgentConfig), addAgent, updateAgent, deleteAgent. 3 default agents: teacher (AI teacher), assistant (AI助教, "AI teaching assistant"), student (好奇学生, "curious student"). Each has: id, name, role, persona, avatar, color, allowedActions, priority, voiceConfig, isDefault, isGenerated, boundStageId.

### DATABASE (Dexie/IndexedDB)
Database name 'maic-local-db' with tables: stages, scenes, audioFiles, imageStore, chatSessions, outlines, mediaFiles, generatedAgents. Full CRUD operations. Stage storage utilities: listStages, deleteStageData, renameStage, getFirstSlideByStages.

### AI PROVIDER SYSTEM
- Unified provider registry (PROVIDERS) with 15+ providers: openai, anthropic, google, deepseek, qwen, kimi, glm, minimax, siliconflow, doubao, hunyuan, xiaomi, grok, openrouter, ollama
- Each provider: id, name, type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon, models[]
- Each model: id, name, contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking)
- createLanguageModel() factory using @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google
- Thinking config system: toggleable, budgetAdjustable, defaultEnabled per model
- callLLM() and streamLLM() wrappers with thinking support

### GENERATION PIPELINE (Two-Stage)
**Stage 1 — Outline Generation:**
- POST /api/generate/scene-outlines-stream → SSE stream
- Input: PDF text + images + user requirements + agents + language
- Output: SceneOutline[] with type ('slide'|'quiz'|'interactive'|'pbl'), title, notes, order, mediaGenerations[]
- Incremental JSON parsing from LLM stream
- Generation Preview page with step-by-step visualization (outline streaming → agent profile generation → scene content → navigate to classroom)

**Stage 2 — Scene Generation (per outline):**
- generateSceneContent() → POST /api/generate/scene-content — generates slides/quiz/interactive/pbl content
- generateSceneActions() → POST /api/generate/scene-actions — generates teacher speech + visual actions for each scene
- createSceneWithActions() — assembles Scene object with elements, actions, whiteboard data
- Interactive post-processor: sanitizes HTML widgets, injects CSS isolation
- Action parser: extracts structured [{type, name, params}, {type: "text", content}] from LLM output

### SLIDE TYPE SYSTEM
Scene types with specific content structures:
- **slide**: PPTElement[] (text, image, shape, line, chart, table, latex, video, audio, code elements). Each element has: id, type, left, top, width, height, rotate, opacity, shadow, outline, fill, link, groupId, lock, name.
- **quiz**: QuizQuestion[] with type (single-choice, multiple-choice, fill-in-blank, true-false, short-answer), question text, options, correctAnswer, explanation
- **interactive**: WidgetConfig with type (simulation, diagram, code, game, visualization3d), HTML/iframe content, teacherActions[]
- **pbl**: PBLProjectConfig with roles, issues, workspaces

### SLIDE RENDERER
Full PowerPoint-compatible renderer in React:
- Editor/Canvas with viewport scaling, drag/drop, resize handles, rotation handles, alignment lines, grid, ruler
- Element types: TextElement (ProseMirror), ImageElement (clip masks, filters), ShapeElement (SVG paths, gradients, patterns), LineElement (cubic bezier, markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX), VideoElement, CodeElement (Shiki)
- ThumbnailSlide for sidebar previews
- ScreenElement for presentation mode
- useScaleElement, useDragElement, useRotateElement, useSelectElement hooks

### MULTI-AGENT ORCHESTRATION (LangGraph)
- StateGraph with OrchestratorState annotation
- Director node: multi-agent LLM decision or single-agent code-only
- agent_generate node: builds structured prompt per agent, streams response with tool calls
- statelessGenerate(): single-pass generation from messages + storeState
- Prompt builder: role guidelines per agent type, whiteboard ledger context, peer context, state context
- SSE streaming: StatelessEvent chunks with {type: 'text'|'tool_call'|'done'|'error', agentId, content}

### PLAYBACK ENGINE
Class-based PlaybackEngine with state machine (idle → playing → paused, idle → live → paused):
- Consumes Scene.actions[] sequentially via ActionEngine
- ActionEngine: processes spotlight (highlight element), laser (pointer effect), speech (TTS), whiteboard actions, discussion triggers
- Speech TTS: fetches audio from /api/generate/tts, plays via AudioPlayer, shows speech overlay
- Discussion triggers: pause playback, switch to live mode, enable chat input
- Auto-resume generation for pending outlines on classroom load
- Speed control: 1x, 1.25x, 1.5x, 2x

### ROUNDTABLE UI (95KB component)
Main classroom interaction panel with:
- Voice waveform animation during speech
- Agent avatars with speaking indicator
- Chat input with voice recording (ASR)
- Proactive discussion cards
- Slide navigation controls
- Presentation mode (fullscreen)
- Whiteboard toggle
- Playback progress bar
- Speed selector
- Thinking state indicator

### CHAT SYSTEM
- ChatSession: id, type (qa|discussion|lecture), status, messages (UIMessage[]), config (agentIds, maxTurns, triggerAgentId)
- useChatSessions hook: manages sessions, sends messages via POST /api/chat SSE, handles interruption, persists to IndexedDB
- StreamBuffer: buffers SSE text chunks, reveals text word-by-word for natural speech feel
- Message metadata: senderName, senderAvatar, agentId, agentColor, actions (spotlight/highlight/insert)
- Chat area with session list, message bubbles, inline action tags, lecture notes view

### WHITEBOARD SYSTEM
- Collaborative whiteboard overlay on slides
- Elements: text, shapes (rect/circle/triangle), charts, LaTeX, tables, lines, code blocks
- ActionEngine executes wb_draw_*, wb_delete, wb_clear, wb_open, wb_close
- WhiteboardCanvas with drawing tools
- WhiteboardHistory for undo/redo
- Whiteboard conflicts summarizer for multi-agent coordination

### INTERACTIVE WIDGETS
5 widget types; each generates self-contained HTML rendered in a sandboxed iframe:
1. **Simulation**: variable sliders, physics/math simulations, presets
2. **Diagram**: React Flow nodes/edges, decision trees, flowcharts
3. **Code**: executable code editor with output panel
4. **Game**: educational games (quizzes, puzzles)
5. **Visualization3D**: Three.js/WebGL 3D models
- Widget teacher actions: highlight, setState, annotation, reveal
- postMessage bridge for iframe ↔ parent communication

### PBL (Project-Based Learning)
- Agentic loop using Vercel AI SDK generateText + stopWhen
- MCP tools: ModeMCP, ProjectMCP, AgentMCP, IssueboardMCP
- Generates: project config, roles, issues, workspaces
- PBL renderer with: role selection, chat panel, issue board, workspace, guide

### MEDIA GENERATION
- Media orchestrator dispatches parallel API calls
- Image providers: Seedream (ByteDance), OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok
- Video providers: Seedance (ByteDance), Kling (Kuaishou), Veo (Google), MiniMax, Grok
- Async task pattern: submit → poll → download blob → IndexedDB
- MediaGenerationStore tracks task status per elementId

### AUDIO SYSTEM
- TTS providers: OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, VoxCPM, Browser native
- ASR providers: OpenAI Whisper, Qwen ASR, Browser native
- Voice resolver: maps agent voices across providers
- AudioPlayer: Web Audio API playback with speed control
- useAudioRecorder: MediaRecorder API for voice input
- useBrowserTTS/useDiscussionTTS: manages TTS lifecycle during discussions

### EXPORT SYSTEM
- PowerPoint export via pptxgenjs: converts PPTElement[] to PPTX with shapes, images, charts, tables, LaTeX→OMML
- Classroom ZIP export/import: stages + scenes + audio + images + agents as portable archive
- HTML parser for slide text → PPTX rich text conversion
- SVG path parser for shape export
- LaTeX → OMML converter via mathml2omml

### I18N SYSTEM
- i18next + react-i18next + resources-to-backend
- 5 locales: zh-CN, en-US, ja-JP, ru-RU, ar-SA
- Dynamic import: `import(\`./locales/${language}.json\`)`
- useI18n hook with locale detection from localStorage/navigator
- Language switcher component

### API ROUTES (27 endpoints)
All under /api/:
- /chat — SSE chat stream (multi-agent)
- /generate/scene-outlines-stream — SSE outline generation
- /generate/scene-content — scene content generation
- /generate/scene-actions — scene action generation
- /generate/agent-profiles — agent profile generation
- /generate/image — image generation
- /generate/video — video generation
- /generate/tts — single TTS audio generation
- /parse-pdf — PDF parsing
- /classroom — CRUD for server-stored classrooms
- /classroom-media/[classroomId]/[...path] — media file serving
- /generate-classroom — background classroom generation job
- /generate-classroom/[jobId] — job status polling
- /quiz-grade — LLM-based quiz grading
- /pbl/chat — PBL runtime chat
- /web-search — Tavily web search
- /proxy-media — CORS proxy for remote media
- /server-providers — server-configured provider list
- /verify-model, /verify-image-provider, /verify-video-provider, /verify-pdf-provider — credential verification
- /azure-voices — Azure TTS voice list
- /transcription — audio transcription
- /access-code/status, /access-code/verify — access code authentication
- /health — health check

### SECURITY
- Access code guard (HMAC-signed cookie)
- SSRF guard for server-side URL fetching
- Content Security Policy headers
- Input validation on all API routes

### TESTING
- Vitest for unit tests (29 test files)
- Playwright for E2E tests (4 test suites)
- Evaluation framework for whiteboard layout scoring and outline language detection

Build the complete application with all 950+ source files, full type safety, and production-ready error handling.
```
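The StreamBuffer described under CHAT SYSTEM above can be sketched as follows. The class name comes from the spec; the method names and the whitespace word-boundary rule are assumptions for illustration.

```typescript
// Sketch: SSE chunks are appended as they arrive, and the UI pulls complete
// words at its own pace for a natural word-by-word reveal.
class StreamBuffer {
  private buffer = "";
  private done = false;

  push(chunk: string): void { this.buffer += chunk; }
  end(): void { this.done = true; }

  // Return the next whitespace-delimited word, or null if none is complete yet.
  nextWord(): string | null {
    const match = this.buffer.match(/^\s*(\S+)(\s)/);
    if (match) {
      this.buffer = this.buffer.slice(match[0].length - 1);
      return match[1];
    }
    // Once the stream has ended, flush whatever remains.
    if (this.done && this.buffer.trim()) {
      const word = this.buffer.trim();
      this.buffer = "";
      return word;
    }
    return null;
  }
}
```

The key property: a partially received word ("wor" of "world") stays buffered until its trailing delimiter arrives or the stream ends, so the UI never reveals half a word.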

---

## 🔄 MULTI-SHOT PHASED PROMPTS

### Phase 1: Foundation & Scaffold

```
Me: Start building "OpenMAIC" — an AI interactive classroom. Set up the project foundation.

Do:
1. Initialize a React 19 + Vite 6 + TypeScript project
2. Configure: Tailwind CSS v4 with PostCSS, path alias @/* → ./src/*, oklch color system
3. Install core deps: react-router-dom, zustand, dexie, lucide-react, motion, sonner, clsx, tailwind-merge, class-variance-authority, nanoid, zod
4. Install shadcn/ui components: button, dialog, dropdown-menu, popover, tooltip, tabs, input, textarea, select, checkbox, switch, slider, scroll-area, command, alert-dialog, card, badge, carousel, separator, progress, label, hover-card, context-menu, collapsible, avatar, alert, field, input-group, button-group, combobox
5. Create App.tsx with BrowserRouter wrapping: ThemeProvider → I18nProvider → AccessCodeGuard → lazy routes (/, /classroom/:id, /generation-preview, /eval/whiteboard) → Toaster
6. Create ThemeProvider (light/dark/system, localStorage persist, document.documentElement.classList toggle)
7. Create I18nProvider with i18next + react-i18next + resources-to-backend, 5 locales (zh-CN, en-US, ja-JP, ru-RU, ar-SA), dynamic JSON imports, localStorage locale persistence
8. Create globals.css with the full oklch color system (:root + .dark), CSS custom properties for all shadcn tokens, a Tailwind @theme inline block, ProseMirror styles, and animation keyframes (wave, shimmer, breathing-bar, interactive-mode-breathe)
9. Create a createLogger utility (timestamp + level + tag formatting)
10. Verify: `vite build` succeeds with 0 errors
```
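One possible shape for the `createLogger` utility named in step 9 — the prompt only specifies "timestamp + level + tag formatting", so the signature and return shape below are assumptions.

```typescript
// Tagged logger sketch: each module creates its own logger with a fixed tag,
// and every line carries an ISO timestamp, level, and that tag.
type Level = "debug" | "info" | "warn" | "error";

function createLogger(tag: string) {
  const format = (level: Level, message: string): string =>
    `${new Date().toISOString()} [${level.toUpperCase()}] [${tag}] ${message}`;
  return {
    format, // exposed so the formatting itself is testable
    debug: (msg: string) => console.debug(format("debug", msg)),
    info: (msg: string) => console.info(format("info", msg)),
    warn: (msg: string) => console.warn(format("warn", msg)),
    error: (msg: string) => console.error(format("error", msg)),
  };
}
```

Usage: `const log = createLogger("playback"); log.info("engine started");`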

### Phase 2: Data Layer & State Management

```
Me: Build the data layer and state management for OpenMAIC.

Do:
1. Create Dexie database 'maic-local-db' with tables: stages (id, name, description, createdAt, updatedAt, languageDirective, style, currentSceneId, agentIds, interactiveMode), scenes (id, stageId, type, title, order, content, actions, whiteboard), audioFiles (id, blob, duration, format, text, voice), imageStore (id, blob, mimeType), chatSessions (id, stageId, sceneId, type, status, messages, config), outlines (id, stageId, outlines), mediaFiles (id, blob, mimeType), generatedAgents (id, agents)
2. Create stage-storage utilities: listStages() → StageListItem[], deleteStageData(), renameStage(), getFirstSlideByStages() → Record<string, Slide>
3. Create image-storage utilities: storePdfBlob(), loadPdfBlob(), storeImages(), loadImageMapping(), cleanupOldImages()
4. Create useStageStore (Zustand): stage, scenes[], currentSceneId, outlines[], chats[], mode, generationStatus, generationEpoch, failedOutlines[], toolbarState. Actions: setStage, addScene, updateScene, deleteScene, setCurrentSceneId, loadFromStorage (IndexedDB → state), saveToStorage (debounced state → IndexedDB), getCurrentScene()
5. Create useCanvasStore: viewportSize, canvasScale, selectedElementIds[], editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive, showGridLines, showRuler, snapToGrid
6. Create useSettingsStore with zustand/persist: providerId, modelId, thinkingConfigs, providersConfig, ttsProviderId/Voice/Speed, asrProviderId/Language, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, chatAreaCollapsed, agentMode, selectedAgentIds, fetchServerProviders(), all setters. Validate provider/model on rehydration.
7. Create useSnapshotStore, useKeyboardStore, useMediaGenerationStore, useWhiteboardHistoryStore, useWidgetIframeStore, useUserProfileStore
8. Create useAgentRegistry with zustand/persist: agents map, 3 default agents (teacher: "AI teacher" with full slide + whiteboard actions, priority 10; assistant: "AI助教" ("AI teaching assistant") with whiteboard-only actions, priority 5; student: "好奇学生" ("curious student") with whiteboard-only actions, priority 3). Each agent: id, name, role, persona (detailed teaching style), avatar, color, allowedActions, priority, voiceConfig, isDefault
9. Define all TypeScript types in lib/types/: slides.ts (PPTElement union with 10 element types, Slide, SlideTheme, SlideBackground), action.ts (20+ action types), stage.ts (Stage, Scene, SceneType, StageMode), chat.ts (ChatSession, StatelessChatRequest/Event), generation.ts (SceneOutline, UserRequirements, PdfImage), provider.ts (ProviderId, ProviderConfig, ModelInfo, ThinkingConfig), widgets.ts (5 widget configs), settings.ts, roundtable.ts, web-search.ts, pdf.ts, edit.ts, export.ts
```
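The "debounced state → IndexedDB" persistence in step 4 rests on a trailing-edge debounce: rapid store updates collapse into a single DB write. A minimal helper of that kind (names are illustrative, not the project's API):

```typescript
// Trailing-edge debounce: each call resets the timer, so fn runs once,
// waitMs after the last call in a burst — e.g. one IndexedDB write per
// burst of store mutations.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical usage in a store (saveToStorage is a placeholder name):
// const saveToStorage = debounce((state) => db.stages.put(state.stage), 300);
```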

### Phase 3: AI Provider System

```
Me: Build the AI provider system with 15+ LLM providers.

Do:
1. Create PROVIDERS registry with full configs for: openai (gpt-4o, gpt-5.5, o3-mini, o4-mini), anthropic (claude-4-sonnet, claude-3.7-sonnet), google (gemini-2.5-pro, gemini-2.5-flash), deepseek (deepseek-chat, deepseek-reasoner), qwen (qwen3-235b, qwen-max, qwen-plus), kimi (moonshot-v1-auto), glm (glm-4-plus, glm-z1-air), minimax (MiniMax-M1), siliconflow (meta-llama, Qwen, DeepSeek), doubao (doubao-pro, doubao-1.5-pro), hunyuan, xiaomi (MiMo-7B), grok (grok-3), openrouter (pass-through), ollama (local models)
2. Each provider: type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon path, models[] with contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking config)
3. createLanguageModel(config) factory: routes to @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google based on provider type. OpenAI-compatible providers use createOpenAI with custom baseURL.
4. Thinking config system: ThinkingConfig = {mode: 'disabled'|'auto'|'manual', enabled, budget?}. Per-model thinking capability (toggleable, budgetAdjustable, defaultEnabled). getThinkingMode() and pickThinkingBudget() utilities.
5. callLLM(model, options) — single-shot generation with thinking support. streamLLM(model, options) — streaming with thinking.
6. Model metadata: applyModelMetadata() enriches model configs with catalog data. getCatalogThinkingCapability() returns thinking support level.
7. Server-side resolveModel() — resolves model string + API key + base URL into LanguageModel instance, handles server-configured providers from env vars.
```
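A type-level sketch of the registry shape from steps 1–2. The field names come from the prompt; the single sample entry (base URL, window sizes) is illustrative, not authoritative, and `resolveFactoryName` only hints at the routing rule from step 3.

```typescript
// Registry shape for PROVIDERS, per the prompt's field list.
type ProviderType = "openai" | "anthropic" | "google" | "openai-compatible";

interface ModelInfo {
  id: string;
  name: string;
  contextWindow: number;
  outputWindow: number;
  capabilities: { streaming: boolean; tools: boolean; vision: boolean; thinking: boolean };
}

interface ProviderConfig {
  id: string;
  name: string;
  type: ProviderType;
  defaultBaseUrl: string;
  requiresApiKey: boolean;
  models: ModelInfo[];
}

// Sample entry — values are illustrative placeholders.
const PROVIDERS: Record<string, ProviderConfig> = {
  deepseek: {
    id: "deepseek",
    name: "DeepSeek",
    type: "openai-compatible",
    defaultBaseUrl: "https://api.deepseek.com/v1",
    requiresApiKey: true,
    models: [{
      id: "deepseek-chat",
      name: "DeepSeek Chat",
      contextWindow: 64_000,
      outputWindow: 8_000,
      capabilities: { streaming: true, tools: true, vision: false, thinking: false },
    }],
  },
};

// createLanguageModel would switch on provider.type; openai-compatible
// providers reuse the OpenAI client with a custom baseURL.
function resolveFactoryName(type: ProviderType): string {
  return type === "openai-compatible" ? "openai" : type;
}
```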

### Phase 4: Generation Pipeline

```
Me: Build the two-stage generation pipeline for creating classroom content from PDF.

Do:
1. Create prompt template system: lib/prompts/ with loader.ts (loadPrompt, buildPrompt, interpolateVariables, processSnippets, processConditionalBlocks), types.ts (PromptId, SnippetId). Templates in templates/{promptId}/system.md + user.md. Snippets in snippets/*.md. Syntax: {{variable}}, {{snippet:name}}, {{#if flag}}...{{/if}}.
2. Create 20+ prompt templates: requirements-to-outlines, interactive-outlines, slide-content, quiz-content, slide-actions, quiz-actions, interactive-actions, simulation-content, diagram-content, code-content, game-content, visualization3d-content, widget-teacher-actions, pbl-actions, pbl-design, agent-system (4 variants: base, wb-teacher, wb-assistant, wb-student), director, web-search-query-rewrite
3. Stage 1 — Outline Generator: generateSceneOutlinesFromRequirements() builds prompt from PDF content + user requirements + agent info + language directive. SSE API endpoint streams outlines as incremental JSON objects. Frontend GenerationPreview page shows step visualization.
4. Stage 2 — Scene Generator: generateSceneContent(outline, context, model) dispatches to slide/quiz/interactive/PBL generators based on outline.type. generateSceneActions(content, outline, context, model) generates teacher speech + visual action sequences. createSceneWithActions() assembles final Scene.
5. Scene builder: buildSceneFromOutline() converts generated content to PPTElement[]. uniquifyMediaElementIds() ensures globally unique IDs for media placeholders.
6. Interactive post-processor: sanitizes HTML, injects CSS isolation, wraps in responsive container.
7. Action parser: parseActionsFromStructuredOutput() extracts [{type:"action", name, params}, {type:"text", content}] from LLM JSON output.
8. JSON repair: parseJsonResponse() handles malformed LLM JSON with bracket balancing, markdown fence stripping, partial parse recovery.
9. Pipeline runner: createGenerationSession() + runGenerationPipeline() orchestrates the full flow with callbacks.
10. API routes: /api/generate/scene-outlines-stream (SSE), /api/generate/scene-content (POST), /api/generate/scene-actions (POST), /api/generate/agent-profiles (POST)
```
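The JSON-repair idea in step 8 can be sketched as: strip markdown fences, try a straight parse, and on failure append closers for any brackets left open so a truncated LLM response still parses. This is a deliberately naive sketch — it assumes the truncation did not happen inside a string literal, which a real implementation must handle.

```typescript
// Minimal parseJsonResponse sketch: fence stripping + bracket balancing.
function parseJsonResponse(raw: string): unknown {
  const text = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "") // strip leading ```json fence
    .replace(/`{3}\s*$/, "")           // strip trailing ``` fence
    .trim();
  try {
    return JSON.parse(text);
  } catch {
    // Track unclosed brackets and append their closers in reverse order.
    // (Naive: does not skip brackets that appear inside string values.)
    const stack: string[] = [];
    for (const ch of text) {
      if (ch === "{") stack.push("}");
      else if (ch === "[") stack.push("]");
      else if (ch === "}" || ch === "]") stack.pop();
    }
    return JSON.parse(text + stack.reverse().join(""));
  }
}
```

For example, the truncated response `{"outlines": [{"order": 1}` repairs to `{"outlines": [{"order": 1}]}` and parses.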

### Phase 5: Slide Renderer & Canvas

```
Me: Build the full slide renderer with PowerPoint-compatible elements and interactive canvas.

Do:
1. Create Editor/Canvas with: viewport scaling (useViewportSize), drag-to-select (useMouseSelection), element selection (useSelectElement), element dragging (useDragElement), element scaling (useScaleElement with 8 resize handles), element rotation (useRotateElement), alignment lines (AlignmentLine), grid lines (GridLines), ruler (Ruler), drop support (useDrop)
2. Create 10 element renderers:
   - TextElement: ProseMirror editor with custom schema (paragraph, heading, bulletList, orderedList, hardBreak, marks: bold, italic, underline, strikethrough, color, backgroundColor, fontSize, fontFamily, textAlign, textIndent, lineHeight, superscript, subscript, link)
   - ImageElement: clip paths (rect, ellipse, polygon), filters, flip, shadow, outline
   - ShapeElement: 20+ SVG path formulas (roundRect, triangle, parallelogram, trapezoid, etc), gradient fills (linear/radial), pattern fills
   - LineElement: cubic bezier curves, arrow markers, point dragging (useDragLineElement)
   - ChartElement: ECharts integration (bar, line, pie, scatter, radar, area)
   - TableElement: cell editing, merge, border styling
   - LatexElement: KaTeX rendering with Temml fallback
   - VideoElement: HTML5 video with poster, autoplay control
   - CodeElement: Shiki syntax highlighting with 50+ language grammars
   - AudioElement: audio player UI
3. Create operate overlays: CommonElementOperate, ImageElementOperate, ShapeElementOperate (keypoint drag for path shapes), LineElementOperate (endpoint drag), TableElementOperate, TextElementOperate, MultiSelectOperate
4. Create ThumbnailSlide for sidebar scene list (scaled-down readonly render)
5. Create ThumbnailInteractive for interactive widget previews
6. Create ScreenElement and ScreenCanvas for presentation mode
7. Create ViewportBackground (slide background with solid/gradient/image fills)
8. Create element hooks: useElementFill, useElementFlip, useElementOutline, useElementShadow
9. Create canvas operations hook: useCanvasOperations with element CRUD, alignment, distribution, z-order, grouping
```
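As one concrete reference point for step 1, handle-based scaling keeps the corner opposite the dragged handle fixed while the dragged corner absorbs the pointer delta. This sketch covers only the four corner handles of useScaleElement's eight, and the `Rect`/`Handle` types are illustrative, not the project's actual types:

```ts
type Rect = { left: number; top: number; width: number; height: number };
type Handle = "tl" | "tr" | "bl" | "br";

// Scale a rect by dragging one corner handle; the opposite corner stays fixed.
function scaleFromHandle(rect: Rect, handle: Handle, dx: number, dy: number, min = 10): Rect {
  let { left, top, width, height } = rect;
  // Left-side handles move the origin; right-side handles grow the width.
  if (handle.includes("l")) { left += dx; width -= dx; } else { width += dx; }
  // Top-side handles move the origin; bottom-side handles grow the height.
  if (handle.includes("t")) { top += dy; height -= dy; } else { height += dy; }
  // Clamp to a minimum size without letting the fixed corner drift.
  if (width < min) { if (handle.includes("l")) left -= min - width; width = min; }
  if (height < min) { if (handle.includes("t")) top -= min - height; height = min; }
  return { left, top, width, height };
}
```

The edge handles (top, bottom, left, right) and rotation-aware scaling build on the same fixed-point idea, just constrained to one axis.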

### Phase 6: Multi-Agent Orchestration

```
Me: Build the LangGraph-based multi-agent orchestration system.

Do:
1. Create OrchestratorState (LangGraph Annotation.Root): messages, storeState, availableAgentIds, maxTurns, languageModel, thinkingConfig, discussionContext, triggerAgentId, userProfile, agentConfigOverrides, turnSummaries[], whiteboardActions[], nextAgentId, isComplete, generatedChunks
2. Create director node: LLM-based multi-agent decision (who speaks next, what to do). Code fast-paths for turn 0 (trigger agent) and turn limits. Single-agent mode: pure code logic, no LLM call.
3. Create agent_generate node: resolves AgentConfig, builds structured prompt via buildStructuredPrompt(), streams LLM response, parses structured chunks [{type, name/content}], emits StatelessEvent via config.writer()
4. Create StateGraph: START → director → (end→END | next→agent_generate→director loop)
5. Create buildStructuredPrompt(): combines role guidelines, persona, state context (current slide elements), whiteboard ledger (spatial layout of all whiteboard elements), peer context (other agents' recent actions), available action descriptions, format examples
6. Create summarizers: conversation-summary (compress old messages), message-converter (UIMessage → OpenAI format), state-context (current slide description), whiteboard-ledger (virtual whiteboard spatial state), whiteboard-conflicts (detect conflicting draws), peer-context (recent agent actions)
7. Create director-prompt: buildDirectorPrompt() with agent profiles, conversation history, available tools. parseDirectorDecision() extracts {nextAgentId, reason, isComplete}
8. Create tool-schemas: getEffectiveActions(role) returns allowed action schemas. getActionDescriptions() generates human-readable action docs.
9. Create AISdkLangGraphAdapter: bridges Vercel AI SDK LanguageModel to LangGraph's BaseChatModel interface
10. Create statelessGenerate(): entry point called by /api/chat, invokes graph.stream(), yields StatelessEvent SSE chunks
```
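The director fast-paths in step 2 are plain code that runs before any LLM call: turn 0 routes to the trigger agent, the turn limit ends the discussion, and single-agent sessions never need an LLM director at all. A sketch with hypothetical names:

```ts
type Decision = { nextAgentId: string | null; isComplete: boolean };

// Returns a decision when no LLM call is needed, or null to defer
// to the LLM-based director node.
function directorFastPath(
  turn: number,
  maxTurns: number,
  agentIds: string[],
  triggerAgentId: string,
): Decision | null {
  if (turn >= maxTurns) return { nextAgentId: null, isComplete: true };      // turn limit reached
  if (turn === 0) return { nextAgentId: triggerAgentId, isComplete: false }; // trigger agent opens
  if (agentIds.length === 1) return { nextAgentId: agentIds[0], isComplete: false }; // single-agent mode
  return null; // multi-agent mid-discussion: let the LLM decide who speaks next
}
```

In the graph, this would sit at the top of the director node, so the conditional edge to `agent_generate` or `END` is driven by the same `Decision` shape in both the code and LLM paths.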

### Phase 7: Playback Engine & Roundtable

```
Me: Build the PlaybackEngine state machine and Roundtable UI.

Do:
1. Create PlaybackEngine class: state machine (idle/playing/paused/live), consumes Scene.actions[] via ActionEngine, manages scene transitions, handles discussion triggers, speed control (1x/1.25x/1.5x/2x)
2. Create ActionEngine: processes action queue — spotlight (dim other elements, highlight target), laser (red pointer effect), speech (fetch TTS → AudioPlayer → wait for completion), wb_open/close (toggle whiteboard overlay), wb_draw_* (add elements to whiteboard), wb_delete/clear, discussion (pause playback, switch to live mode), play_video, widget actions
3. Create AudioPlayer: Web Audio API wrapper with play/pause/stop, speed adjustment, volume control, onEnd callback
4. Create PlaybackEngine callbacks: onModeChange, onSceneChange, onActionStart/End, onSpeechStart/End, onDiscussionTrigger, onComplete
5. Create computePlaybackView() — derives presentation state from engine: currentSpeech, speakingAgentId, audioState, progress
6. Create Stage component (main container): integrates SceneSidebar + CanvasArea + Roundtable + ChatArea. Manages PlaybackEngine lifecycle, discussion flow (trigger → live chat → end → resume), TTS during discussions via useDiscussionTTS
7. Create Roundtable component: voice waveform bars, agent avatar ring with speaking indicator, chat input with send button + voice recording, proactive discussion cards, slide navigation (prev/next), playback controls (play/pause/speed), presentation mode toggle, whiteboard toggle, thinking state display, end flash animation
8. Create PresentationSpeechOverlay: full-screen speech text display during presentation mode
9. Create SceneSidebar: scene thumbnail list with drag-to-reorder, generation progress indicators, failed outline retry, home navigation
10. Create Header: back button, settings gear, theme switcher, language switcher, export dropdown (PPTX, classroom ZIP)
```

### Phase 8: Interactive Widgets & PBL

```
Me: Build the interactive widget system and PBL mode.

Do:
1. Create 5 widget content generators (each calls LLM with specialized prompts):
   - simulation-content: generates HTML with variable sliders, canvas/SVG visualization, physics formulas
   - diagram-content: generates React Flow JSON (nodes, edges, layout)
   - code-content: generates executable code with output panel, language selector
   - game-content: generates HTML5 game with scoring, levels, educational goals
   - visualization3d-content: generates Three.js scene with camera controls, annotations
2. Create InteractiveRenderer: sandboxed iframe loading widget HTML, postMessage bridge for teacher actions (highlight, setState, annotation, reveal)
3. Create widget teacher action generation: widget-teacher-actions prompt generates action sequence for teacher to guide students through widget
4. Create useWidgetIframeStore: register/unregister iframes, send setState/highlight/annotation/reveal messages
5. Create PBL generation system:
   - generatePBLContent() using Vercel AI SDK generateText with tools and stepCountIs stopWhen
   - MCP tools: ModeMCP (set PBL mode), ProjectMCP (set project config), AgentMCP (create agent roles), IssueboardMCP (create issues with acceptance criteria)
   - buildPBLSystemPrompt() with project topic, skills, language directive
6. Create PBL renderer components:
   - PBLRenderer: main container with role selection → workspace
   - RoleSelection: choose student role from generated options
   - ChatPanel: per-role chat with @mention routing to agents
   - IssueboardPanel: kanban-style issue tracking
   - Workspace: collaborative workspace area
   - Guide: step-by-step project guide
7. Create /api/pbl/chat endpoint: handles @mention routing, generates agent responses per role
```
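The postMessage bridge in step 2 reduces to a typed message envelope plus a dispatch on the widget side. A sketch with hypothetical message kinds — the `log` parameter stands in for real DOM effects inside the iframe:

```ts
// Hypothetical envelope the parent posts into the sandboxed widget iframe.
type WidgetMessage =
  | { kind: "setState"; payload: Record<string, unknown> }
  | { kind: "highlight"; target: string }
  | { kind: "reveal"; step: number };

// Widget-side handler: a discriminated-union switch keeps each teacher
// action type-checked instead of poking at untyped event.data.
function handleWidgetMessage(msg: WidgetMessage, log: string[]): void {
  switch (msg.kind) {
    case "setState": log.push(`state:${Object.keys(msg.payload).join(",")}`); break;
    case "highlight": log.push(`highlight:${msg.target}`); break;
    case "reveal": log.push(`reveal:${msg.step}`); break;
  }
}
```

In the real renderer, the iframe would register a `message` event listener, validate `event.origin`/shape, and then call a handler like this; the typed envelope is what makes the sandbox boundary auditable.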

### Phase 9: Media Generation Pipeline

```
Me: Build the media generation pipeline for AI images and videos.

Do:
1. Create MediaGenerationStore: tasks Map<elementId, {status, objectUrl, blob, error}>, enqueueTasks(), completeTask(), failTask(), restoreFromDB(), revokeObjectUrls()
2. Create media orchestrator: generateMediaForOutlines() collects all media requests from outlines[].mediaGenerations, filters by enabled providers, processes serially (API concurrency limits)
3. Create image provider adapters:
   - Seedream (ByteDance): POST to ark.cn-beijing.volces.com with HMAC auth
   - OpenAI Image: POST to /v1/images/generations
   - Qwen Image: POST to dashscope with async task pattern
   - Nano Banana: POST with banana.dev API
   - MiniMax Image: POST to api.minimax.chat
   - Grok Image: POST to api.x.ai
4. Create video provider adapters (all async task pattern: submit → poll → download):
   - Seedance (ByteDance): HMAC-signed requests, JWT token for kling
   - Kling (Kuaishou): JWT auth, task polling
   - Veo (Google DeepMind): OAuth, long-running operations
   - MiniMax Video, Grok Video
5. Each adapter: generate(config, options) → {url, blob}, testConnectivity(config) → boolean
6. Create /api/generate/image and /api/generate/video endpoints
7. Create /api/proxy-media endpoint for CORS proxy of remote media URLs
8. Create /api/verify-image-provider and /api/verify-video-provider for credential testing
9. Create MediaPopover UI: shows generation progress per media element, retry failed, preview generated media
```
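The per-element task bookkeeping in step 1 can be modeled as a plain Map keyed by element id. This sketch keeps only the status machine (pending → done/failed) and omits blobs, object URLs, and the IndexedDB restore path; names mirror the store's described API but are illustrative:

```ts
// Discriminated union so a task can't be "done" without a url
// or "failed" without an error.
type MediaTask =
  | { status: "pending" }
  | { status: "done"; url: string }
  | { status: "failed"; error: string };

class MediaTasks {
  tasks = new Map<string, MediaTask>();
  enqueue(ids: string[]) { for (const id of ids) this.tasks.set(id, { status: "pending" }); }
  complete(id: string, url: string) { this.tasks.set(id, { status: "done", url }); }
  fail(id: string, error: string) { this.tasks.set(id, { status: "failed", error }); }
  // Serial orchestrator drains this list one task at a time.
  pending(): string[] {
    return [...this.tasks.entries()].filter(([, t]) => t.status === "pending").map(([id]) => id);
  }
}
```

Keying by element id (rather than by request) is what lets a retry simply re-enqueue the same id and overwrite a `failed` entry in place.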

### Phase 10: Audio System (TTS/ASR)

```
Me: Build the TTS and ASR audio system with 8+ providers.

Do:
1. Create TTS provider registry with configs: openai-tts (alloy/echo/fable/onyx/nova/shimmer), azure-tts (500+ voices from azure.json), glm-tts, qwen-tts (sambert voices), minimax-tts (3 models), doubao-tts, elevenlabs-tts, voxcpm (custom voice cloning), browser-tts (Web Speech API)
2. Each TTS provider: id, name, requiresApiKey, defaultBaseUrl, icon, voices[], supportedFormats, speedRange
3. Create generateTTS(config, text) router: dispatches to provider-specific functions, returns {audio: Uint8Array, format: string}
4. Create ASR provider registry: openai-whisper, qwen-asr, browser-asr
5. Create transcribeAudio(config, audioBlob) router
6. Create voice resolver: getAvailableProvidersWithVoices(), maps agent voiceConfig to provider+voice
7. Create VoxCPM integration: custom voice profiles, VLLM model support, voice cloning
8. Create /api/generate/tts endpoint (single TTS generation)
9. Create /api/transcription endpoint
10. Create /api/azure-voices endpoint (Azure voice list)
11. Create useAudioRecorder hook: MediaRecorder API, audio visualization, silence detection
12. Create useBrowserTTS hook: Web Speech API fallback
13. Create useDiscussionTTS hook: manages TTS lifecycle during live discussions, queues speech, handles interruption
14. Create useTTSPreview hook: preview voice in settings
15. Create SpeechButton component: toggle voice recording with waveform
16. Create TTSConfigPopover: voice selector, speed slider, provider selector
```
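The `generateTTS` router in step 3 is essentially a dispatch table keyed by provider id, with an explicit error for unknown providers. A sketch with stubbed provider functions — real ones would call the provider HTTP APIs and this simplified signature takes the provider id directly instead of a full config object:

```ts
type TTSResult = { audio: Uint8Array; format: string };

// Stub providers: openai-tts returns bytes, browser-tts returns nothing
// because the Web Speech API plays audio directly rather than yielding a buffer.
const ttsProviders: Record<string, (text: string) => TTSResult> = {
  "openai-tts": (text) => ({ audio: new Uint8Array(text.length), format: "mp3" }),
  "browser-tts": () => ({ audio: new Uint8Array(0), format: "none" }),
};

function generateTTS(providerId: string, text: string): TTSResult {
  const fn = ttsProviders[providerId];
  if (!fn) throw new Error(`unknown TTS provider: ${providerId}`);
  return fn(text);
}
```

The same table shape extends naturally to the ASR router (`transcribeAudio`), which differs only in signature (audio in, text out).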

### Phase 11: Chat System & Streaming

```
Me: Build the chat system with SSE streaming and session management.

Do:
1. Create StatelessChatRequest type: messages (UIMessage[]), storeState ({stage, scenes, currentSceneId, mode}), config ({agentIds, sessionType, maxTurns, triggerAgentId}), model, apiKey, baseUrl, providerType, userProfile, agentConfigs
2. Create StatelessEvent type: {type: 'text'|'tool_call'|'error'|'done', agentId?, content?, toolName?, args?}
3. Create /api/chat POST endpoint: validates request, resolves model, invokes statelessGenerate(), streams SSE events via ReadableStream + TextEncoder
4. Create useChatSessions hook (53KB): manages multiple ChatSession instances per scene, sendMessage() → fetch SSE → parse events → update messages, handleInterrupt() → abort controller, auto-create QA session on first user message, create discussion session from proactive card, persist sessions to IndexedDB, restore on load
5. Create StreamBuffer: accumulates text chunks from SSE, reveals words incrementally for natural TTS sync, tracks reveal progress (0-1) for auto-scroll
6. Create ChatArea component: session tab list, message list with agent avatars + colors, inline action tags (spotlight/highlight buttons in messages), lecture notes view (extracted from speech actions), typing indicator
7. Create ChatSession component: handles individual session rendering, message input, send/interrupt buttons
8. Create ProactiveCard component: discussion invitation cards with topic, prompt, accept/skip buttons, animation
9. Create InlineActionTag component: clickable action buttons within messages (triggers spotlight/insert on slide)
10. Create LectureNotesView: extracts and displays all speech text from actions as structured notes
```
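The StreamBuffer in step 5 can be sketched as accumulate-then-reveal over whitespace-separated words. This is a minimal version; the real implementation would also pace `revealNext()` against TTS playback:

```ts
class StreamBuffer {
  private raw = "";
  private revealedCount = 0;

  // Accumulate raw SSE text chunks; chunk boundaries may split words,
  // so tokenization always re-runs over the full raw string.
  push(chunk: string) { this.raw += chunk; }

  private words(): string[] { return this.raw.split(/\s+/).filter(Boolean); }

  // Reveal one word per tick; returns null once everything is shown.
  revealNext(): string | null {
    const ws = this.words();
    return this.revealedCount < ws.length ? ws[this.revealedCount++] : null;
  }

  // Reveal progress in [0, 1], used to drive auto-scroll.
  progress(): number {
    const n = this.words().length;
    return n === 0 ? 0 : this.revealedCount / n;
  }
}
```

Because `words()` re-tokenizes the full buffer, a word split across two SSE chunks ("hel" + "lo") is still revealed as one word.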

### Phase 12: Settings, i18n, Export, Polish

```
Me: Build settings, internationalization, export system, and polish everything.

Do:
1. Create SettingsDialog with tabbed sections: General (theme, language, access code), Model (provider selector, model selector with search, API key input, base URL, thinking config toggle), Audio (TTS provider/voice/speed, ASR provider, per-agent voice assignment), Image (provider, model, API key, aspect ratio), Video (provider, model, API key), PDF (provider, API key), Web Search (provider, API key), Agent (agent list with add/edit/delete, persona editor, action permissions, priority)
2. Create ModelSelector: searchable dropdown with provider grouping, model capabilities badges (vision, tools, thinking), context window display
3. Create AddProviderDialog: custom provider registration with name, base URL, API key, models
4. Create ProviderConfigPanel: provider-specific settings form
5. Create i18n translation files for all 5 locales (1500+ translation keys each): home, classroom, settings, generation, chat, quiz, whiteboard, export, agents, audio, errors, common
6. Create LanguageSwitcher component: dropdown with locale labels + short codes
7. Create PowerPoint export: useExportPPTX hook converts scenes to PPTX via pptxgenjs. Handles: text with rich formatting, images with clip paths, shapes with SVG paths, charts (ECharts → static image), tables, LaTeX (KaTeX → MathML → OMML), videos (poster image), code blocks (syntax-highlighted HTML)
8. Create classroom ZIP export/import: useExportClassroom hook creates ZIP with manifest.json + scenes + audio + images + agents. useImportClassroom hook parses ZIP and restores to IndexedDB.
9. Create HTML parser for PPTX: lexer → parser → format → stringify pipeline for converting ProseMirror HTML to PPTX rich text runs
10. Create LaTeX to OMML converter chain: KaTeX → MathML → OMML (via mathml2omml package) for PowerPoint math equations
11. Create AccessCodeGuard + AccessCodeModal: HMAC-signed token verification, cookie persistence
12. Create ServerProvidersInit: fetches /api/server-providers on mount, merges into settings store
13. Create UserProfile component: expandable pill with avatar picker (12 built-in + custom upload), nickname editor, bio textarea
14. Create GeneratingProgress component: step-by-step progress during classroom generation
15. Create OutlinesEditor: edit generated outlines before scene generation (reorder, delete, rename)
16. Add all CSS animations, transitions, hover states, dark mode variants
17. Verify: full build succeeds, all routes render, settings persist, i18n switches correctly
```
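For the translation files in step 5, a lookup helper with a fallback chain keeps missing keys visible instead of rendering blank strings. A hypothetical sketch (the project may use an i18n library with its own resolver instead):

```ts
type Dict = Record<string, string>;

// Resolve a key: active locale first, then the fallback locale (e.g. English),
// and finally the key itself so untranslated strings are easy to spot.
function t(key: string, locale: Dict, fallback: Dict): string {
  return locale[key] ?? fallback[key] ?? key;
}
```

With 1500+ keys per locale, the key-as-last-resort behavior doubles as a cheap audit: any literal `chat.something` visible in the UI is a missing translation.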

---

## 🔧 Key Technical Decisions

| Decision | Rationale |
|----------|-----------|
| **Zustand over Redux** | Simpler API, better TypeScript support, no boilerplate, built-in persist middleware |
| **Dexie over raw IndexedDB** | Type-safe queries, promise-based API, versioned migrations, compound indexes |
| **Vercel AI SDK** | Unified streaming interface across 15+ providers, built-in tool calling, thinking support |
| **LangGraph for orchestration** | Stateful graph execution, conditional routing, streaming writer API, battle-tested |
| **ProseMirror over Slate/TipTap** | Lower-level control needed for PowerPoint-compatible rich text, custom schema/marks |
| **Vite over Next.js (conversion)** | Client-only SPA, no SSR needed, faster builds, simpler deployment |
| **File-based prompts** | Version-controllable, composable via snippets, conditional blocks for feature flags |
| **ActionEngine pattern** | Unified sync/async action execution, same types for live streaming and playback |
| **IndexedDB for everything** | Offline-first, large blob storage (audio/images), no server dependency for user data |
| **iframe sandbox for widgets** | Security isolation for LLM-generated HTML/JS, postMessage for controlled communication |