
🧠 PROMPT.md — Build MultiMind Classroom from Scratch

MeDo-styled prompts to recreate the full MultiMind Classroom AI interactive classroom application. One-shot (single mega-prompt) and multi-shot (phased build) variants are included.


📋 Table of Contents

  1. Project Overview
  2. Architecture Blueprint
  3. One-Shot Mega Prompt
  4. Multi-Shot Phased Prompts
    • Phase 1: Foundation & Scaffold
    • Phase 2: Data Layer & State Management
    • Phase 3: AI Provider System
    • Phase 4: Generation Pipeline
    • Phase 5: Slide Renderer & Canvas
    • Phase 6: Multi-Agent Orchestration
    • Phase 7: Playback Engine & Roundtable
    • Phase 8: Interactive Widgets & PBL
    • Phase 9: Media Generation Pipeline
    • Phase 10: Audio System (TTS/ASR)
    • Phase 11: Chat System & Streaming
    • Phase 12: Settings, i18n, Export, Polish
  5. Key Technical Decisions

🎯 Project Overview

MultiMind Classroom is an open-source AI interactive classroom platform. Users upload a PDF, and the system generates an immersive multi-agent learning experience with:

  • AI-generated slide presentations from PDF content
  • Multi-agent roundtable discussions (teacher, assistant, student agents)
  • Real-time TTS/ASR for voice-driven lectures
  • Interactive whiteboard with collaborative drawing
  • Quiz generation with auto-grading
  • Interactive widgets (simulations, diagrams, code editors, 3D visualizations, games)
  • Project-Based Learning (PBL) mode with MCP tool-calling agents
  • Media generation (AI images + videos embedded in slides)
  • PowerPoint export and classroom zip import/export
  • 5-language i18n (zh-CN, en-US, ja-JP, ru-RU, ar-SA)

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | React 19 + Vite 6 |
| Routing | React Router DOM v7 |
| State | Zustand 5 (10 stores: stage, canvas, settings, snapshot, keyboard, media generation, whiteboard history, widget iframe, user profile, agent registry) |
| Storage | Dexie (IndexedDB) — stages, scenes, audio, images, chat, outlines |
| UI | shadcn/ui (Radix primitives) + Tailwind CSS v4 + Motion (Framer) |
| AI SDK | Vercel AI SDK 6 + LangGraph (multi-agent director graph) |
| AI Providers | OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, GLM, MiniMax, Ollama, OpenRouter, +10 more |
| TTS/ASR | OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, Browser native, VoxCPM |
| Image Gen | Seedream, OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok |
| Video Gen | Seedance, Kling, Veo, Sora, MiniMax, Grok |
| Rich Text | ProseMirror (custom schema with marks for bold/italic/underline/color/align/indent/lists) |
| Charts | ECharts 6 |
| Diagrams | @xyflow/react (React Flow) |
| Export | pptxgenjs (PowerPoint), JSZip (classroom archives) |
| Math | KaTeX + Temml + mathml2omml (for PPTX export) |
| Code Highlighting | Shiki |
| PDF Parsing | unpdf + MinerU cloud + custom providers |

πŸ— Architecture Blueprint

Data Flow

PDF Upload → Outline Generation (SSE stream) → Scene Content Generation → Action Generation
                                                        ↓
                                                  Media Generation (parallel)
                                                        ↓
                                              IndexedDB Storage (Dexie)
                                                        ↓
                                              Playback Engine (state machine)
                                                        ↓
                                    Roundtable UI ←→ Chat System ←→ Multi-Agent LangGraph

Store Architecture (Zustand)

| Store | Purpose | Persistence |
| --- | --- | --- |
| useStageStore | Current stage, scenes, outlines, generation status | IndexedDB |
| useCanvasStore | Viewport, zoom, selected elements, editing state | Memory |
| useSettingsStore | All provider configs, model selection, UI prefs | localStorage |
| useSnapshotStore | Undo/redo history | Memory |
| useKeyboardStore | Keyboard shortcut state | Memory |
| useMediaGenerationStore | Image/video generation tasks and status | IndexedDB |
| useWhiteboardHistoryStore | Whiteboard undo/redo per scene | Memory |
| useWidgetIframeStore | Widget iframe communication state | Memory |
| useUserProfileStore | User nickname, avatar, bio | localStorage |
| useAgentRegistry | Agent configs (default + custom + generated) | localStorage |

Database Schema (Dexie/IndexedDB)

stages:        id, name, description, createdAt, updatedAt, languageDirective, style, agentIds
scenes:        id, stageId, type, title, order, content (JSON), actions (JSON), whiteboard (JSON)
audioFiles:    id (audioId), blob, duration, format, text, voice
imageStore:    id (storageId), blob, mimeType
chatSessions:  id, stageId, sceneId, type, status, messages, config
outlines:      id, stageId, outlines (JSON array)
mediaFiles:    id (stageId_elementId), blob, mimeType
generatedAgents: id (stageId), agents (AgentConfig[])
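The composite-key conventions above (e.g. `mediaFiles` keyed by `stageId_elementId`) are easy to get subtly wrong; here is a minimal sketch of helpers for building and splitting such keys (the function names are illustrative, not from the source, and the split assumes the stage id itself contains no underscore):

```typescript
// Illustrative helpers for the `stageId_elementId` key convention above.
// Assumption: stage ids contain no "_", so splitting on the FIRST "_" is safe
// even when element ids themselves contain underscores.
export function mediaFileKey(stageId: string, elementId: string): string {
  return `${stageId}_${elementId}`;
}

export function parseMediaFileKey(key: string): { stageId: string; elementId: string } {
  const sep = key.indexOf("_");
  if (sep === -1) throw new Error(`Invalid mediaFiles key: ${key}`);
  return { stageId: key.slice(0, sep), elementId: key.slice(sep + 1) };
}
```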

Prompt Template System

File-based with composition:

  • lib/prompts/templates/{promptId}/system.md + user.md
  • lib/prompts/snippets/{name}.md
  • Syntax: {{variable}}, {{snippet:name}}, {{#if flag}}...{{/if}}
  • 20+ prompt templates for outlines, slides, quizzes, actions, widgets, PBL, agents
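A sketch of what the loader's interpolation step might look like, assuming snippets are passed in as an in-memory map (the function shape is illustrative; the real loader also reads templates from disk and may support nesting):

```typescript
type Vars = Record<string, string | boolean | undefined>;

// Minimal sketch of template composition: conditional blocks first,
// then snippet inclusion, then plain variable interpolation.
export function buildPrompt(
  template: string,
  vars: Vars,
  snippets: Record<string, string> = {}
): string {
  // {{#if flag}}...{{/if}} — keep the body only when the flag is truthy
  let out = template.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_, flag, body) => (vars[flag] ? body : "")
  );
  // {{snippet:name}} — inline a named snippet
  out = out.replace(/\{\{snippet:([\w-]+)\}\}/g, (_, name) => snippets[name] ?? "");
  // {{variable}} — plain interpolation (missing vars become empty strings)
  out = out.replace(/\{\{(\w+)\}\}/g, (_, key) => String(vars[key] ?? ""));
  return out;
}
```

Running conditionals before snippets means a snippet referenced only inside a false `{{#if}}` block is never inlined, which keeps disabled features out of the prompt entirely.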

Action System

Three categories executed by ActionEngine:

  • Fire-and-forget: spotlight, laser, play_video
  • Synchronous (wait for completion): speech, wb_open, wb_close, wb_draw_text, wb_draw_shape, wb_draw_chart, wb_draw_latex, wb_draw_table, wb_draw_line, wb_draw_code, wb_edit_code, wb_clear, wb_delete, discussion
  • Widget actions: widget_highlight, widget_setState, widget_annotation, widget_reveal
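A dispatcher can branch on these categories; a hedged sketch (the sets mirror the lists above, but the function itself is illustrative):

```typescript
// Category sets taken from the action lists above.
const FIRE_AND_FORGET = new Set(["spotlight", "laser", "play_video"]);
const SYNCHRONOUS = new Set([
  "speech", "wb_open", "wb_close", "wb_draw_text", "wb_draw_shape",
  "wb_draw_chart", "wb_draw_latex", "wb_draw_table", "wb_draw_line",
  "wb_draw_code", "wb_edit_code", "wb_clear", "wb_delete", "discussion",
]);

type ActionCategory = "fire-and-forget" | "synchronous" | "widget";

// Widget actions share a common "widget_" prefix, so they can be matched
// structurally instead of being enumerated.
export function categorize(type: string): ActionCategory | null {
  if (FIRE_AND_FORGET.has(type)) return "fire-and-forget";
  if (SYNCHRONOUS.has(type)) return "synchronous";
  if (type.startsWith("widget_")) return "widget";
  return null; // unknown action — caller decides whether to skip or fail
}
```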

Multi-Agent Orchestration (LangGraph)

START → director ──(end)──→ END
           │
           └─(next)→ agent_generate ──→ director (loop)
  • Director: LLM-based for multi-agent, code-only for single-agent
  • Agents: teacher (full slide+whiteboard control), assistant (whiteboard), student (whiteboard, short responses)
  • Per-agent: persona prompt, allowed actions, TTS voice, avatar, color

Playback Engine State Machine

         start()                pause()
idle ──────────→ playing ──────────→ paused
  ▲                 ▲                   │
  │                 │  resume()         │
  │                 └───────────────────┘
  │  handleEndDiscussion()
  │                    confirmDiscussion()
  └──────────────── live ──────────→ paused

🚀 ONE-SHOT MEGA PROMPT

Use this single prompt to generate the entire application in one conversation.

Me: Build me "MultiMind Classroom" — an open-source AI interactive classroom platform.

Do: Create a React 19 + Vite 6 + TypeScript application with these EXACT specifications:

### CORE SETUP
- React Router DOM v7 with 4 lazy-loaded routes: / (HomePage), /classroom/:id (ClassroomPage), /generation-preview (GenerationPreviewPage), /eval/whiteboard (WhiteboardEvalPage)
- App.tsx wraps all routes in: BrowserRouter → ThemeProvider → I18nProvider → ServerProvidersInit → AccessCodeGuard → Toaster
- Tailwind CSS v4 with PostCSS, shadcn/ui (Radix-based), oklch color system with light/dark mode via CSS variables
- Path alias @/* → ./src/*

### STATE MANAGEMENT (Zustand 5)
Create 10 Zustand stores:
1. **useStageStore** — stage (name, description, languageDirective, style, agentIds), scenes[], currentSceneId, outlines[], generationStatus, chats[], mode ('autonomous'|'playback'). Actions: loadFromStorage, saveToStorage, addScene, setCurrentSceneId. Debounced IndexedDB persistence.
2. **useCanvasStore** — viewportSize, canvasScale, selectedElementIds, editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive. All canvas interaction state.
3. **useSettingsStore** — providerId, modelId, providersConfig (unified JSON), thinkingConfigs, ttsProviderId, ttsVoice, ttsSpeed, asrProviderId, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, agentMode ('preset'|'auto'), selectedAgentIds. Persisted via zustand/persist to localStorage. Has fetchServerProviders() to merge server-configured providers.
4. **useSnapshotStore** — history stack for undo/redo
5. **useKeyboardStore** — keyboard shortcut state
6. **useMediaGenerationStore** — tasks map (elementId → {status, objectUrl, error}), enqueueTasks, restoreFromDB
7. **useWhiteboardHistoryStore** — per-scene whiteboard undo/redo
8. **useWidgetIframeStore** — widget iframe postMessage communication
9. **useUserProfileStore** — nickname, avatar, bio. Persisted to localStorage.
10. **useAgentRegistry** — agents map (id → AgentConfig), addAgent, updateAgent, deleteAgent. 3 default agents: teacher (AI teacher), assistant ("AI助教", AI teaching assistant), student ("好奇学生", curious student). Each has: id, name, role, persona, avatar, color, allowedActions, priority, voiceConfig, isDefault, isGenerated, boundStageId.

### DATABASE (Dexie/IndexedDB)
Database name 'multimind-db' with tables: stages, scenes, audioFiles, imageStore, chatSessions, outlines, mediaFiles, generatedAgents. Full CRUD operations. Stage storage utilities: listStages, deleteStageData, renameStage, getFirstSlideByStages.

### AI PROVIDER SYSTEM
- Unified provider registry (PROVIDERS) with 15+ providers: openai, anthropic, google, deepseek, qwen, kimi, glm, minimax, siliconflow, doubao, hunyuan, xiaomi, grok, openrouter, ollama
- Each provider: id, name, type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon, models[]
- Each model: id, name, contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking)
- createLanguageModel() factory using @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google
- Thinking config system: toggleable, budgetAdjustable, defaultEnabled per model
- callLLM() and streamLLM() wrappers with thinking support

### GENERATION PIPELINE (Two-Stage)
**Stage 1 — Outline Generation:**
- POST /api/generate/scene-outlines-stream → SSE stream
- Input: PDF text + images + user requirements + agents + language
- Output: SceneOutline[] with type ('slide'|'quiz'|'interactive'|'pbl'), title, notes, order, mediaGenerations[]
- Incremental JSON parsing from LLM stream
- Generation Preview page with step-by-step visualization (outline streaming → agent profile generation → scene content → navigate to classroom)

**Stage 2 — Scene Generation (per outline):**
- generateSceneContent() → POST /api/generate/scene-content — generates slides/quiz/interactive/pbl content
- generateSceneActions() → POST /api/generate/scene-actions — generates teacher speech + visual actions for each scene
- createSceneWithActions() — assembles Scene object with elements, actions, whiteboard data
- Interactive post-processor: sanitizes HTML widgets, injects CSS isolation
- Action parser: extracts structured [{type, name, params}, {type: "text", content}] from LLM output
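The structured-output contract lends itself to a small validating parser; a hedged sketch of the idea (the fence stripping and filter rules are assumptions, not the project's exact implementation):

```typescript
type Chunk =
  | { type: "action"; name: string; params?: Record<string, unknown> }
  | { type: "text"; content: string };

// Illustrative parser for the [{type, name, params}, {type: "text", content}]
// contract above. Assumes the model returns a JSON array, possibly wrapped
// in a markdown code fence; malformed chunks are dropped rather than thrown.
export function parseStructuredOutput(raw: string): Chunk[] {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  const parsed = JSON.parse(stripped);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array of chunks");
  return parsed.filter(
    (c): c is Chunk =>
      (c?.type === "action" && typeof c.name === "string") ||
      (c?.type === "text" && typeof c.content === "string")
  );
}
```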

### SLIDE TYPE SYSTEM
Scene types with specific content structures:
- **slide**: PPTElement[] (text, image, shape, line, chart, table, latex, video, audio, code elements). Each element has: id, type, left, top, width, height, rotate, opacity, shadow, outline, fill, link, groupId, lock, name.
- **quiz**: QuizQuestion[] with type (single-choice, multiple-choice, fill-in-blank, true-false, short-answer), question text, options, correctAnswer, explanation
- **interactive**: WidgetConfig with type (simulation, diagram, code, game, visualization3d), HTML/iframe content, teacherActions[]
- **pbl**: PBLProjectConfig with roles, issues, workspaces

### SLIDE RENDERER
Full PowerPoint-compatible renderer in React:
- Editor/Canvas with viewport scaling, drag/drop, resize handles, rotation handles, alignment lines, grid, ruler
- Element types: TextElement (ProseMirror), ImageElement (clip masks, filters), ShapeElement (SVG paths, gradients, patterns), LineElement (cubic bezier, markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX), VideoElement, CodeElement (Shiki)
- ThumbnailSlide for sidebar previews
- ScreenElement for presentation mode
- useScaleElement, useDragElement, useRotateElement, useSelectElement hooks

### MULTI-AGENT ORCHESTRATION (LangGraph)
- StateGraph with OrchestratorState annotation
- Director node: multi-agent LLM decision or single-agent code-only
- agent_generate node: builds structured prompt per agent, streams response with tool calls
- statelessGenerate(): single-pass generation from messages + storeState
- Prompt builder: role guidelines per agent type, whiteboard ledger context, peer context, state context
- SSE streaming: StatelessEvent chunks with {type: 'text'|'tool_call'|'done'|'error', agentId, content}

### PLAYBACK ENGINE
Class-based PlaybackEngine with state machine (idle → playing → paused, idle → live → paused):
- Consumes Scene.actions[] sequentially via ActionEngine
- ActionEngine: processes spotlight (highlight element), laser (pointer effect), speech (TTS), whiteboard actions, discussion triggers
- Speech TTS: fetches audio from /api/generate/tts, plays via AudioPlayer, shows speech overlay
- Discussion triggers: pause playback, switch to live mode, enable chat input
- Auto-resume generation for pending outlines on classroom load
- Speed control: 1x, 1.25x, 1.5x, 2x

### ROUNDTABLE UI (95KB component)
Main classroom interaction panel with:
- Voice waveform animation during speech
- Agent avatars with speaking indicator
- Chat input with voice recording (ASR)
- Proactive discussion cards
- Slide navigation controls
- Presentation mode (fullscreen)
- Whiteboard toggle
- Playback progress bar
- Speed selector
- Thinking state indicator

### CHAT SYSTEM
- ChatSession: id, type (qa|discussion|lecture), status, messages (UIMessage[]), config (agentIds, maxTurns, triggerAgentId)
- useChatSessions hook: manages sessions, sends messages via POST /api/chat SSE, handles interruption, persists to IndexedDB
- StreamBuffer: buffers SSE text chunks, reveals text word-by-word for natural speech feel
- Message metadata: senderName, senderAvatar, agentId, agentColor, actions (spotlight/highlight/insert)
- Chat area with session list, message bubbles, inline action tags, lecture notes view
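The word-by-word reveal can be isolated into a small pull-based buffer; a minimal sketch of the StreamBuffer idea (the real component is presumably timer-driven; this shows only the buffering, and the class shape is an assumption):

```typescript
// Pull-based sketch: SSE chunks go in, whole words come out, so the UI can
// reveal text at a natural pace regardless of chunk boundaries.
export class StreamBuffer {
  private buffer = "";

  push(chunk: string): void {
    this.buffer += chunk;
  }

  // Returns the next complete word (including its trailing whitespace),
  // or null if only a partial word is buffered so far.
  nextWord(): string | null {
    const match = this.buffer.match(/^\S+\s+/);
    if (!match) return null;
    this.buffer = this.buffer.slice(match[0].length);
    return match[0];
  }

  // Flush whatever remains once the stream is done.
  flush(): string {
    const rest = this.buffer;
    this.buffer = "";
    return rest;
  }
}
```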

### WHITEBOARD SYSTEM
- Collaborative whiteboard overlay on slides
- Elements: text, shapes (rect/circle/triangle), charts, LaTeX, tables, lines, code blocks
- ActionEngine executes wb_draw_*, wb_delete, wb_clear, wb_open, wb_close
- WhiteboardCanvas with drawing tools
- WhiteboardHistory for undo/redo
- Whiteboard conflicts summarizer for multi-agent coordination

### INTERACTIVE WIDGETS
5 widget types; each generates self-contained HTML rendered in a sandboxed iframe:
1. **Simulation**: variable sliders, physics/math simulations, presets
2. **Diagram**: React Flow nodes/edges, decision trees, flowcharts
3. **Code**: executable code editor with output panel
4. **Game**: educational games (quizzes, puzzles)
5. **Visualization3D**: Three.js/WebGL 3D models
- Widget teacher actions: highlight, setState, annotation, reveal
- postMessage bridge for iframe ↔ parent communication

### PBL (Project-Based Learning)
- Agentic loop using Vercel AI SDK generateText + stopWhen
- MCP tools: ModeMCP, ProjectMCP, AgentMCP, IssueboardMCP
- Generates: project config, roles, issues, workspaces
- PBL renderer with: role selection, chat panel, issue board, workspace, guide

### MEDIA GENERATION
- Media orchestrator dispatches parallel API calls
- Image providers: Seedream (ByteDance), OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok
- Video providers: Seedance (ByteDance), Kling (Kuaishou), Veo (Google), MiniMax, Grok
- Async task pattern: submit → poll → download blob → IndexedDB
- MediaGenerationStore tracks task status per elementId
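The submit → poll → download flow can be written once with the provider calls injected; an illustrative sketch (the function names and status shape are assumptions, not the project's API):

```typescript
type TaskStatus = { done: boolean; resultUrl?: string; error?: string };

// Generic async-task runner for the pattern above. The three provider calls
// are injected, which keeps the control flow provider-agnostic and testable.
export async function runMediaTask(
  submit: () => Promise<string>,                  // returns a provider task id
  poll: (taskId: string) => Promise<TaskStatus>,  // provider status endpoint
  download: (url: string) => Promise<Uint8Array>, // fetch the result bytes
  intervalMs = 0
): Promise<Uint8Array> {
  const taskId = await submit();
  for (;;) {
    const status = await poll(taskId);
    if (status.error) throw new Error(status.error);
    if (status.done && status.resultUrl) return download(status.resultUrl);
    if (intervalMs > 0) await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

A real implementation would also cap the number of polls and persist the blob to IndexedDB; both are omitted here.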

### AUDIO SYSTEM
- TTS providers: OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, VoxCPM, Browser native
- ASR providers: OpenAI Whisper, Qwen ASR, Browser native
- Voice resolver: maps agent voices across providers
- AudioPlayer: Web Audio API playback with speed control
- useAudioRecorder: MediaRecorder API for voice input
- useBrowserTTS/useDiscussionTTS: manages TTS lifecycle during discussions

### EXPORT SYSTEM
- PowerPoint export via pptxgenjs: converts PPTElement[] to PPTX with shapes, images, charts, tables, LaTeX→OMML
- Classroom ZIP export/import: stages + scenes + audio + images + agents as portable archive
- HTML parser for slide text → PPTX rich text conversion
- SVG path parser for shape export
- LaTeX → OMML converter via mathml2omml

### I18N SYSTEM
- i18next + react-i18next + resources-to-backend
- 5 locales: zh-CN, en-US, ja-JP, ru-RU, ar-SA
- Dynamic import: ``import(`./locales/${language}.json`)``
- useI18n hook with locale detection from localStorage/navigator
- Language switcher component
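Locale detection from localStorage/navigator typically falls back through exact match, then primary-subtag match, then a default; a sketch under those assumed rules (the matching precedence is an assumption, not the project's exact logic):

```typescript
const SUPPORTED = ["zh-CN", "en-US", "ja-JP", "ru-RU", "ar-SA"] as const;
type Locale = (typeof SUPPORTED)[number];

// Resolve a UI locale from a stored preference and the browser language.
// Fallback chain (assumed): exact tag → same primary subtag → en-US.
export function resolveLocale(stored: string | null, navigatorLang: string): Locale {
  const candidates = [stored, navigatorLang].filter((l): l is string => !!l);
  for (const tag of candidates) {
    const exact = SUPPORTED.find((l) => l === tag);
    if (exact) return exact;
    const prefix = SUPPORTED.find((l) => l.split("-")[0] === tag.split("-")[0]);
    if (prefix) return prefix;
  }
  return "en-US";
}
```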

### API ROUTES (27 endpoints)
All under /api/:
- /chat — SSE chat stream (multi-agent)
- /generate/scene-outlines-stream — SSE outline generation
- /generate/scene-content — scene content generation
- /generate/scene-actions — scene action generation
- /generate/agent-profiles — agent profile generation
- /generate/image — image generation
- /generate/video — video generation
- /generate/tts — single TTS audio generation
- /parse-pdf — PDF parsing
- /classroom — CRUD for server-stored classrooms
- /classroom-media/[classroomId]/[...path] — media file serving
- /generate-classroom — background classroom generation job
- /generate-classroom/[jobId] — job status polling
- /quiz-grade — LLM-based quiz grading
- /pbl/chat — PBL runtime chat
- /web-search — Tavily web search
- /proxy-media — CORS proxy for remote media
- /server-providers — server-configured provider list
- /verify-model, /verify-image-provider, /verify-video-provider, /verify-pdf-provider — credential verification
- /azure-voices — Azure TTS voice list
- /transcription — audio transcription
- /access-code/status, /access-code/verify — access code authentication
- /health — health check

### SECURITY
- Access code guard (HMAC-signed cookie)
- SSRF guard for server-side URL fetching
- Content Security Policy headers
- Input validation on all API routes

### TESTING
- Vitest for unit tests (29 test files)
- Playwright for E2E tests (4 test suites)
- Evaluation framework for whiteboard layout scoring and outline language detection

Build the complete application with all 950+ source files, full type safety, and production-ready error handling.

🔄 MULTI-SHOT PHASED PROMPTS

Phase 1: Foundation & Scaffold

Me: Start building "MultiMind Classroom" — an AI interactive classroom. Set up the project foundation.

Do:
1. Initialize React 19 + Vite 6 + TypeScript project
2. Configure: Tailwind CSS v4 with PostCSS, path alias @/* → ./src/*, oklch color system
3. Install core deps: react-router-dom, zustand, dexie, lucide-react, motion, sonner, clsx, tailwind-merge, class-variance-authority, nanoid, zod
4. Install shadcn/ui components: button, dialog, dropdown-menu, popover, tooltip, tabs, input, textarea, select, checkbox, switch, slider, scroll-area, command, alert-dialog, card, badge, carousel, separator, progress, label, hover-card, context-menu, collapsible, avatar, alert, field, input-group, button-group, combobox
5. Create App.tsx with BrowserRouter wrapping: ThemeProvider → I18nProvider → AccessCodeGuard → lazy routes (/, /classroom/:id, /generation-preview, /eval/whiteboard) → Toaster
6. Create ThemeProvider (light/dark/system, localStorage persist, document.documentElement.classList toggle)
7. Create I18nProvider with i18next + react-i18next + resources-to-backend, 5 locales (zh-CN, en-US, ja-JP, ru-RU, ar-SA), dynamic JSON imports, localStorage locale persistence
8. Create globals.css with full oklch color system (:root + .dark), CSS custom properties for all shadcn tokens, Tailwind @theme inline block, ProseMirror styles, animation keyframes (wave, shimmer, breathing-bar, interactive-mode-breathe)
9. Create createLogger utility (timestamp + level + tag formatting)
10. Verify: `vite build` succeeds with 0 errors

Phase 2: Data Layer & State Management

Me: Build the data layer and state management for MultiMind Classroom.

Do:
1. Create Dexie database 'multimind-db' with tables: stages (id, name, description, createdAt, updatedAt, languageDirective, style, currentSceneId, agentIds, interactiveMode), scenes (id, stageId, type, title, order, content, actions, whiteboard), audioFiles (id, blob, duration, format, text, voice), imageStore (id, blob, mimeType), chatSessions (id, stageId, sceneId, type, status, messages, config), outlines (id, stageId, outlines), mediaFiles (id, blob, mimeType), generatedAgents (id, agents)
2. Create stage-storage utilities: listStages() → StageListItem[], deleteStageData(), renameStage(), getFirstSlideByStages() → Record<string, Slide>
3. Create image-storage utilities: storePdfBlob(), loadPdfBlob(), storeImages(), loadImageMapping(), cleanupOldImages()
4. Create useStageStore (Zustand): stage, scenes[], currentSceneId, outlines[], chats[], mode, generationStatus, generationEpoch, failedOutlines[], toolbarState. Actions: setStage, addScene, updateScene, deleteScene, setCurrentSceneId, loadFromStorage (IndexedDB → state), saveToStorage (debounced state → IndexedDB), getCurrentScene()
5. Create useCanvasStore: viewportSize, canvasScale, selectedElementIds[], editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive, showGridLines, showRuler, snapToGrid
6. Create useSettingsStore with zustand/persist: providerId, modelId, thinkingConfigs, providersConfig, ttsProviderId/Voice/Speed, asrProviderId/Language, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, chatAreaCollapsed, agentMode, selectedAgentIds, fetchServerProviders(), all setters. Validate provider/model on rehydration.
7. Create useSnapshotStore, useKeyboardStore, useMediaGenerationStore, useWhiteboardHistoryStore, useWidgetIframeStore, useUserProfileStore
8. Create useAgentRegistry with zustand/persist: agents map, 3 default agents (teacher: "AI teacher" with full slide+whiteboard actions, priority 10; assistant: "AI助教" (AI teaching assistant) with whiteboard-only, priority 5; student: "好奇学生" (curious student) with whiteboard-only, priority 3). Each agent: id, name, role, persona (detailed teaching style), avatar, color, allowedActions, priority, voiceConfig, isDefault
9. Define all TypeScript types in lib/types/: slides.ts (PPTElement union with 10 element types, Slide, SlideTheme, SlideBackground), action.ts (20+ action types), stage.ts (Stage, Scene, SceneType, StageMode), chat.ts (ChatSession, StatelessChatRequest/Event), generation.ts (SceneOutline, UserRequirements, PdfImage), provider.ts (ProviderId, ProviderConfig, ModelInfo, ThinkingConfig), widgets.ts (5 widget configs), settings.ts, roundtable.ts, web-search.ts, pdf.ts, edit.ts, export.ts

Phase 3: AI Provider System

Me: Build the AI provider system with 15+ LLM providers.

Do:
1. Create PROVIDERS registry with full configs for: openai (gpt-4o, gpt-5.5, o3-mini, o4-mini), anthropic (claude-4-sonnet, claude-3.7-sonnet), google (gemini-2.5-pro, gemini-2.5-flash), deepseek (deepseek-chat, deepseek-reasoner), qwen (qwen3-235b, qwen-max, qwen-plus), kimi (moonshot-v1-auto), glm (glm-4-plus, glm-z1-air), minimax (MiniMax-M1), siliconflow (meta-llama, Qwen, DeepSeek), doubao (doubao-pro, doubao-1.5-pro), hunyuan, xiaomi (MiMo-7B), grok (grok-3), openrouter (pass-through), ollama (local models)
2. Each provider: type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon path, models[] with contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking config)
3. createLanguageModel(config) factory: routes to @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google based on provider type. OpenAI-compatible providers use createOpenAI with custom baseURL.
4. Thinking config system: ThinkingConfig = {mode: 'disabled'|'auto'|'manual', enabled, budget?}. Per-model thinking capability (toggleable, budgetAdjustable, defaultEnabled). getThinkingMode() and pickThinkingBudget() utilities.
5. callLLM(model, options) — single-shot generation with thinking support. streamLLM(model, options) — streaming with thinking.
6. Model metadata: applyModelMetadata() enriches model configs with catalog data. getCatalogThinkingCapability() returns thinking support level.
7. Server-side resolveModel() — resolves model string + API key + base URL into LanguageModel instance, handles server-configured providers from env vars.
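How a per-model thinking capability and a user-level ThinkingConfig might combine; a hedged sketch (the precedence rules here are assumptions, not the project's exact logic):

```typescript
type ThinkingCapability = {
  toggleable: boolean;
  budgetAdjustable: boolean;
  defaultEnabled: boolean;
};
type ThinkingConfig = {
  mode: "disabled" | "auto" | "manual";
  enabled: boolean;
  budget?: number;
};

// Assumed precedence: no config / "auto" follows the model default;
// "disabled" always wins; "manual" honors the user toggle only when the
// model actually allows toggling.
export function isThinkingEnabled(cap: ThinkingCapability, cfg?: ThinkingConfig): boolean {
  if (!cfg || cfg.mode === "auto") return cap.defaultEnabled;
  if (cfg.mode === "disabled") return false;
  return cap.toggleable ? cfg.enabled : cap.defaultEnabled;
}

// A budget is only meaningful when thinking is on and the model accepts one.
export function pickThinkingBudget(
  cap: ThinkingCapability,
  cfg?: ThinkingConfig
): number | undefined {
  if (!isThinkingEnabled(cap, cfg)) return undefined;
  return cap.budgetAdjustable ? cfg?.budget : undefined;
}
```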

Phase 4: Generation Pipeline

Me: Build the two-stage generation pipeline for creating classroom content from PDF.

Do:
1. Create prompt template system: lib/prompts/ with loader.ts (loadPrompt, buildPrompt, interpolateVariables, processSnippets, processConditionalBlocks), types.ts (PromptId, SnippetId). Templates in templates/{promptId}/system.md + user.md. Snippets in snippets/*.md. Syntax: {{variable}}, {{snippet:name}}, {{#if flag}}...{{/if}}.
2. Create 20+ prompt templates: requirements-to-outlines, interactive-outlines, slide-content, quiz-content, slide-actions, quiz-actions, interactive-actions, simulation-content, diagram-content, code-content, game-content, visualization3d-content, widget-teacher-actions, pbl-actions, pbl-design, agent-system (4 variants: base, wb-teacher, wb-assistant, wb-student), director, web-search-query-rewrite
3. Stage 1 — Outline Generator: generateSceneOutlinesFromRequirements() builds prompt from PDF content + user requirements + agent info + language directive. SSE API endpoint streams outlines as incremental JSON objects. Frontend GenerationPreview page shows step visualization.
4. Stage 2 — Scene Generator: generateSceneContent(outline, context, model) dispatches to slide/quiz/interactive/PBL generators based on outline.type. generateSceneActions(content, outline, context, model) generates teacher speech + visual action sequences. createSceneWithActions() assembles final Scene.
5. Scene builder: buildSceneFromOutline() converts generated content to PPTElement[]. uniquifyMediaElementIds() ensures globally unique IDs for media placeholders.
6. Interactive post-processor: sanitizes HTML, injects CSS isolation, wraps in responsive container.
7. Action parser: parseActionsFromStructuredOutput() extracts [{type:"action", name, params}, {type:"text", content}] from LLM JSON output.
8. JSON repair: parseJsonResponse() handles malformed LLM JSON with bracket balancing, markdown fence stripping, partial parse recovery.
9. Pipeline runner: createGenerationSession() + runGenerationPipeline() orchestrates the full flow with callbacks.
10. API routes: /api/generate/scene-outlines-stream (SSE), /api/generate/scene-content (POST), /api/generate/scene-actions (POST), /api/generate/agent-profiles (POST)
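The JSON-repair step (item 8) is the most failure-prone part of the pipeline; a minimal sketch of fence stripping plus bracket balancing (partial-parse recovery is omitted, and the repair assumes truncation happened outside a string literal):

```typescript
// Illustrative repair for malformed LLM JSON: strip markdown fences, cut any
// leading prose, then close unbalanced brackets before parsing.
export function parseJsonResponse(raw: string): unknown {
  let text = raw.trim().replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/, "");
  const start = text.search(/[[{]/);
  if (start === -1) throw new Error("no JSON found in response");
  text = text.slice(start);
  try {
    return JSON.parse(text);
  } catch {
    // Bracket balancing: collect expected closers, skipping string contents.
    const stack: string[] = [];
    let inString = false;
    let escaped = false;
    for (const ch of text) {
      if (escaped) { escaped = false; continue; }
      if (ch === "\\") { escaped = true; continue; }
      if (ch === '"') { inString = !inString; continue; }
      if (inString) continue;
      if (ch === "{") stack.push("}");
      else if (ch === "[") stack.push("]");
      else if (ch === "}" || ch === "]") stack.pop();
    }
    // Drop a trailing comma, then append the missing closers in reverse order.
    text = text.replace(/,\s*$/, "");
    return JSON.parse(text + stack.reverse().join(""));
  }
}
```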

Phase 5: Slide Renderer & Canvas

Me: Build the full slide renderer with PowerPoint-compatible elements and interactive canvas.

Do:
1. Create Editor/Canvas with: viewport scaling (useViewportSize), drag-to-select (useMouseSelection), element selection (useSelectElement), element dragging (useDragElement), element scaling (useScaleElement with 8 resize handles), element rotation (useRotateElement), alignment lines (AlignmentLine), grid lines (GridLines), ruler (Ruler), drop support (useDrop)
2. Create 10 element renderers:
   - TextElement: ProseMirror editor with custom schema (paragraph, heading, bulletList, orderedList, hardBreak, marks: bold, italic, underline, strikethrough, color, backgroundColor, fontSize, fontFamily, textAlign, textIndent, lineHeight, superscript, subscript, link)
   - ImageElement: clip paths (rect, ellipse, polygon), filters, flip, shadow, outline
   - ShapeElement: 20+ SVG path formulas (roundRect, triangle, parallelogram, trapezoid, etc), gradient fills (linear/radial), pattern fills
   - LineElement: cubic bezier curves, arrow markers, point dragging (useDragLineElement)
   - ChartElement: ECharts integration (bar, line, pie, scatter, radar, area)
   - TableElement: cell editing, merge, border styling
   - LatexElement: KaTeX rendering with Temml fallback
   - VideoElement: HTML5 video with poster, autoplay control
   - CodeElement: Shiki syntax highlighting with 50+ language grammars
   - AudioElement: audio player UI
3. Create operate overlays: CommonElementOperate, ImageElementOperate, ShapeElementOperate (keypoint drag for path shapes), LineElementOperate (endpoint drag), TableElementOperate, TextElementOperate, MultiSelectOperate
4. Create ThumbnailSlide for sidebar scene list (scaled-down readonly render)
5. Create ThumbnailInteractive for interactive widget previews
6. Create ScreenElement and ScreenCanvas for presentation mode
7. Create ViewportBackground (slide background with solid/gradient/image fills)
8. Create element hooks: useElementFill, useElementFlip, useElementOutline, useElementShadow
9. Create canvas operations hook: useCanvasOperations with element CRUD, alignment, distribution, z-order, grouping

Phase 6: Multi-Agent Orchestration

Me: Build the LangGraph-based multi-agent orchestration system.

Do:
1. Create OrchestratorState (LangGraph Annotation.Root): messages, storeState, availableAgentIds, maxTurns, languageModel, thinkingConfig, discussionContext, triggerAgentId, userProfile, agentConfigOverrides, turnSummaries[], whiteboardActions[], nextAgentId, isComplete, generatedChunks
2. Create director node: LLM-based multi-agent decision (who speaks next, what to do). Code fast-paths for turn 0 (trigger agent) and turn limits. Single-agent mode: pure code logic, no LLM call.
3. Create agent_generate node: resolves AgentConfig, builds structured prompt via buildStructuredPrompt(), streams LLM response, parses structured chunks [{type, name/content}], emits StatelessEvent via config.writer()
4. Create StateGraph: START → director → (end→END | next→agent_generate→director loop)
5. Create buildStructuredPrompt(): combines role guidelines, persona, state context (current slide elements), whiteboard ledger (spatial layout of all whiteboard elements), peer context (other agents' recent actions), available action descriptions, format examples
6. Create summarizers: conversation-summary (compress old messages), message-converter (UIMessage → OpenAI format), state-context (current slide description), whiteboard-ledger (virtual whiteboard spatial state), whiteboard-conflicts (detect conflicting draws), peer-context (recent agent actions)
7. Create director-prompt: buildDirectorPrompt() with agent profiles, conversation history, available tools. parseDirectorDecision() extracts {nextAgentId, reason, isComplete}
8. Create tool-schemas: getEffectiveActions(role) returns allowed action schemas. getActionDescriptions() generates human-readable action docs.
9. Create AISdkLangGraphAdapter: bridges Vercel AI SDK LanguageModel to LangGraph's BaseChatModel interface
10. Create statelessGenerate(): entry point called by /api/chat, invokes graph.stream(), yields StatelessEvent SSE chunks
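The director's decision parsing (item 7) reduces to extracting and normalizing a small JSON object; an illustrative sketch (the field defaults and the fail-safe behavior are assumptions):

```typescript
type DirectorDecision = { nextAgentId: string | null; reason: string; isComplete: boolean };

// Extract the director's JSON decision from a possibly fenced LLM reply and
// normalize missing fields. On an unparseable reply we end the discussion
// rather than loop forever (an assumed fail-safe).
export function parseDirectorDecision(raw: string): DirectorDecision {
  const stripped = raw.trim().replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/, "");
  const match = stripped.match(/\{[\s\S]*\}/);
  if (match) {
    try {
      const obj = JSON.parse(match[0]);
      return {
        nextAgentId: typeof obj.nextAgentId === "string" ? obj.nextAgentId : null,
        reason: typeof obj.reason === "string" ? obj.reason : "",
        // If the model omitted isComplete, treat "no next agent" as done.
        isComplete: Boolean(obj.isComplete ?? (obj.nextAgentId == null)),
      };
    } catch {
      /* fall through to the safe default */
    }
  }
  return { nextAgentId: null, reason: "unparseable director reply", isComplete: true };
}
```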

Phase 7: Playback Engine & Roundtable

Me: Build the PlaybackEngine state machine and Roundtable UI.

Do:
1. Create PlaybackEngine class: state machine (idle/playing/paused/live), consumes Scene.actions[] via ActionEngine, manages scene transitions, handles discussion triggers, speed control (1x/1.25x/1.5x/2x)
2. Create ActionEngine: processes action queue — spotlight (dim other elements, highlight target), laser (red pointer effect), speech (fetch TTS → AudioPlayer → wait for completion), wb_open/close (toggle whiteboard overlay), wb_draw_* (add elements to whiteboard), wb_delete/clear, discussion (pause playback, switch to live mode), play_video, widget actions
3. Create AudioPlayer: Web Audio API wrapper with play/pause/stop, speed adjustment, volume control, onEnd callback
4. Create PlaybackEngine callbacks: onModeChange, onSceneChange, onActionStart/End, onSpeechStart/End, onDiscussionTrigger, onComplete
5. Create computePlaybackView() — derives presentation state from engine: currentSpeech, speakingAgentId, audioState, progress
6. Create Stage component (main container): integrates SceneSidebar + CanvasArea + Roundtable + ChatArea. Manages PlaybackEngine lifecycle, discussion flow (trigger β†’ live chat β†’ end β†’ resume), TTS during discussions via useDiscussionTTS
7. Create Roundtable component: voice waveform bars, agent avatar ring with speaking indicator, chat input with send button + voice recording, proactive discussion cards, slide navigation (prev/next), playback controls (play/pause/speed), presentation mode toggle, whiteboard toggle, thinking state display, end flash animation
8. Create PresentationSpeechOverlay: full-screen speech text display during presentation mode
9. Create SceneSidebar: scene thumbnail list with drag-to-reorder, generation progress indicators, failed outline retry, home navigation
10. Create Header: back button, settings gear, theme switcher, language switcher, export dropdown (PPTX, classroom ZIP)
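
The idle/playing/paused/live modes from step 1 form a small transition table. A framework-free sketch (the event names are illustrative, and the real engine also drives the ActionEngine and scene changes on each transition):

```typescript
type PlaybackMode = "idle" | "playing" | "paused" | "live";
type PlaybackEvent = "play" | "pause" | "discussion" | "discussionEnd" | "stop";

// Transition table: rows are current modes, entries map events to next modes.
// Event names are illustrative; the real engine's triggers may differ.
const transitions: Record<PlaybackMode, Partial<Record<PlaybackEvent, PlaybackMode>>> = {
  idle: { play: "playing" },
  playing: { pause: "paused", discussion: "live", stop: "idle" },
  paused: { play: "playing", stop: "idle" },
  live: { discussionEnd: "playing", stop: "idle" },
};

// Events that are not valid in the current mode are ignored.
function nextMode(mode: PlaybackMode, event: PlaybackEvent): PlaybackMode {
  return transitions[mode][event] ?? mode;
}
```

Keeping the table explicit makes the "discussion pauses playback, discussion end resumes it" flow in step 2 a single lookup rather than scattered conditionals.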

Phase 8: Interactive Widgets & PBL

Me: Build the interactive widget system and PBL mode.

Do:
1. Create 5 widget content generators (each calls LLM with specialized prompts):
   - simulation-content: generates HTML with variable sliders, canvas/SVG visualization, physics formulas
   - diagram-content: generates React Flow JSON (nodes, edges, layout)
   - code-content: generates executable code with output panel, language selector
   - game-content: generates HTML5 game with scoring, levels, educational goals
   - visualization3d-content: generates Three.js scene with camera controls, annotations
2. Create InteractiveRenderer: sandboxed iframe loading widget HTML, postMessage bridge for teacher actions (highlight, setState, annotation, reveal)
3. Create widget teacher action generation: widget-teacher-actions prompt generates action sequence for teacher to guide students through widget
4. Create useWidgetIframeStore: register/unregister iframes, send setState/highlight/annotation/reveal messages
5. Create PBL generation system:
   - generatePBLContent() using Vercel AI SDK generateText with tools and stepCountIs stopWhen
   - MCP tools: ModeMCP (set PBL mode), ProjectMCP (set project config), AgentMCP (create agent roles), IssueboardMCP (create issues with acceptance criteria)
   - buildPBLSystemPrompt() with project topic, skills, language directive
6. Create PBL renderer components:
   - PBLRenderer: main container with role selection β†’ workspace
   - RoleSelection: choose student role from generated options
   - ChatPanel: per-role chat with @mention routing to agents
   - IssueboardPanel: kanban-style issue tracking
   - Workspace: collaborative workspace area
   - Guide: step-by-step project guide
7. Create /api/pbl/chat endpoint: handles @mention routing, generates agent responses per role
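
The postMessage bridge in steps 2-4 can be sketched as a pure reducer over teacher actions; the message shapes below are illustrative, not the project's actual wire format, and in the sandboxed iframe this logic would sit inside a window "message" listener:

```typescript
// Illustrative teacher-action message shapes for the iframe bridge.
type TeacherAction =
  | { kind: "highlight"; selector: string }
  | { kind: "setState"; state: Record<string, unknown> }
  | { kind: "annotation"; text: string }
  | { kind: "reveal"; step: number };

interface WidgetState {
  values: Record<string, unknown>;
  highlighted: string[];
  annotations: string[];
  revealedStep: number;
}

// Pure reducer so the bridge logic is testable outside an iframe.
function applyTeacherAction(state: WidgetState, action: TeacherAction): WidgetState {
  switch (action.kind) {
    case "highlight":
      return { ...state, highlighted: [...state.highlighted, action.selector] };
    case "setState":
      return { ...state, values: { ...state.values, ...action.state } };
    case "annotation":
      return { ...state, annotations: [...state.annotations, action.text] };
    case "reveal":
      return { ...state, revealedStep: action.step };
  }
}

const initial: WidgetState = { values: {}, highlighted: [], annotations: [], revealedStep: 0 };
const afterActions = applyTeacherAction(
  applyTeacherAction(initial, { kind: "setState", state: { mass: 2 } }),
  { kind: "reveal", step: 3 },
);
```

Because the widget HTML is LLM-generated, routing everything through one typed action channel like this keeps the sandbox boundary narrow.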

Phase 9: Media Generation Pipeline

Me: Build the media generation pipeline for AI images and videos.

Do:
1. Create MediaGenerationStore: tasks Map<elementId, {status, objectUrl, blob, error}>, enqueueTasks(), completeTask(), failTask(), restoreFromDB(), revokeObjectUrls()
2. Create media orchestrator: generateMediaForOutlines() collects all media requests from outlines[].mediaGenerations, filters by enabled providers, processes serially (API concurrency limits)
3. Create image provider adapters:
   - Seedream (ByteDance): POST to ark.cn-beijing.volces.com with HMAC auth
   - OpenAI Image: POST to /v1/images/generations
   - Qwen Image: POST to dashscope with async task pattern
   - Nano Banana: POST with banana.dev API
   - MiniMax Image: POST to api.minimax.chat
   - Grok Image: POST to api.x.ai
4. Create video provider adapters (all async task pattern: submit → poll → download):
   - Seedance (ByteDance): HMAC-signed requests, task polling
   - Kling (Kuaishou): JWT auth, task polling
   - Veo (Google DeepMind): OAuth, long-running operations
   - MiniMax Video, Grok Video
5. Each adapter: generate(config, options) β†’ {url, blob}, testConnectivity(config) β†’ boolean
6. Create /api/generate/image and /api/generate/video endpoints
7. Create /api/proxy-media endpoint for CORS proxy of remote media URLs
8. Create /api/verify-image-provider and /api/verify-video-provider for credential testing
9. Create MediaPopover UI: shows generation progress per media element, retry failed, preview generated media
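
All of the video adapters in step 4 share the submit → poll → download shape, so the loop can be written once with the provider calls injected. A sketch (function names and the zero default poll interval are illustrative):

```typescript
type TaskStatus = { done: boolean; url?: string; error?: string };

// Generic async-task loop: submit returns a task id, poll is repeated until
// the task reports done (or an error), then the result URL is returned.
// Names are illustrative; real adapters add timeouts and backoff.
async function runMediaTask(
  submit: () => Promise<string>,
  poll: (taskId: string) => Promise<TaskStatus>,
  intervalMs = 0,
): Promise<string> {
  const taskId = await submit();
  for (;;) {
    const status = await poll(taskId);
    if (status.error) throw new Error(status.error);
    if (status.done && status.url) return status.url;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Each provider adapter then only supplies its own submit and poll functions (HMAC, JWT, or OAuth details stay local to the adapter).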

Phase 10: Audio System (TTS/ASR)

Me: Build the TTS and ASR audio system with 8+ providers.

Do:
1. Create TTS provider registry with configs: openai-tts (alloy/echo/fable/onyx/nova/shimmer), azure-tts (500+ voices from azure.json), glm-tts, qwen-tts (sambert voices), minimax-tts (3 models), doubao-tts, elevenlabs-tts, voxcpm (custom voice cloning), browser-tts (Web Speech API)
2. Each TTS provider: id, name, requiresApiKey, defaultBaseUrl, icon, voices[], supportedFormats, speedRange
3. Create generateTTS(config, text) router: dispatches to provider-specific functions, returns {audio: Uint8Array, format: string}
4. Create ASR provider registry: openai-whisper, qwen-asr, browser-asr
5. Create transcribeAudio(config, audioBlob) router
6. Create voice resolver: getAvailableProvidersWithVoices(), maps agent voiceConfig to provider+voice
7. Create VoxCPM integration: custom voice profiles, VLLM model support, voice cloning
8. Create /api/generate/tts endpoint (single TTS generation)
9. Create /api/transcription endpoint
10. Create /api/azure-voices endpoint (Azure voice list)
11. Create useAudioRecorder hook: MediaRecorder API, audio visualization, silence detection
12. Create useBrowserTTS hook: Web Speech API fallback
13. Create useDiscussionTTS hook: manages TTS lifecycle during live discussions, queues speech, handles interruption
14. Create useTTSPreview hook: preview voice in settings
15. Create SpeechButton component: toggle voice recording with waveform
16. Create TTSConfigPopover: voice selector, speed slider, provider selector
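
The generateTTS router in step 3 is essentially a registry dispatch. A synchronous, framework-free sketch (the real generators are async and call provider HTTP APIs; names here are illustrative):

```typescript
type TTSConfig = { provider: string; voice: string; speed: number };
type TTSResult = { audio: Uint8Array; format: string };
type TTSGenerator = (config: TTSConfig, text: string) => TTSResult;

// Registry of provider-specific generators keyed by provider id.
const ttsRegistry = new Map<string, TTSGenerator>();

// Illustrative router; the real one is async and returns provider errors.
function generateTTSSketch(config: TTSConfig, text: string): TTSResult {
  const generator = ttsRegistry.get(config.provider);
  if (!generator) throw new Error(`Unknown TTS provider: ${config.provider}`);
  return generator(config, text);
}

// A fake provider standing in for a real adapter.
ttsRegistry.set("fake-tts", (_config, text) => ({
  audio: new TextEncoder().encode(text), // placeholder bytes, not real audio
  format: "mp3",
}));
```

The same registry shape serves transcribeAudio in step 5, with the ASR providers registered instead.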

Phase 11: Chat System & Streaming

Me: Build the chat system with SSE streaming and session management.

Do:
1. Create StatelessChatRequest type: messages (UIMessage[]), storeState ({stage, scenes, currentSceneId, mode}), config ({agentIds, sessionType, maxTurns, triggerAgentId}), model, apiKey, baseUrl, providerType, userProfile, agentConfigs
2. Create StatelessEvent type: {type: 'text'|'tool_call'|'error'|'done', agentId?, content?, toolName?, args?}
3. Create /api/chat POST endpoint: validates request, resolves model, invokes statelessGenerate(), streams SSE events via ReadableStream + TextEncoder
4. Create useChatSessions hook (53KB): manages multiple ChatSession instances per scene, sendMessage() → fetch SSE → parse events → update messages, handleInterrupt() → abort controller, auto-create QA session on first user message, create discussion session from proactive card, persist sessions to IndexedDB, restore on load
5. Create StreamBuffer: accumulates text chunks from SSE, reveals words incrementally for natural TTS sync, tracks reveal progress (0-1) for auto-scroll
6. Create ChatArea component: session tab list, message list with agent avatars + colors, inline action tags (spotlight/highlight buttons in messages), lecture notes view (extracted from speech actions), typing indicator
7. Create ChatSession component: handles individual session rendering, message input, send/interrupt buttons
8. Create ProactiveCard component: discussion invitation cards with topic, prompt, accept/skip buttons, animation
9. Create InlineActionTag component: clickable action buttons within messages (triggers spotlight/insert on slide)
10. Create LectureNotesView: extracts and displays all speech text from actions as structured notes
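
Step 5's StreamBuffer can be sketched as a cursor over accumulated text that reveals one whitespace-delimited word per tick; the real class also coordinates with TTS timing, and the method names here are illustrative:

```typescript
// Illustrative sketch of the StreamBuffer word-reveal mechanism.
class StreamBufferSketch {
  private text = "";
  private revealedChars = 0;

  // Called for each SSE text chunk.
  append(chunk: string): void {
    this.text += chunk;
  }

  // Reveals the next word (with its leading whitespace); null when caught up.
  revealNext(): string | null {
    const rest = this.text.slice(this.revealedChars);
    const match = rest.match(/^\s*\S+/);
    if (!match) return null;
    this.revealedChars += match[0].length;
    return match[0];
  }

  // 0-1 progress over the text received so far (used for auto-scroll).
  get progress(): number {
    return this.text.length === 0 ? 0 : this.revealedChars / this.text.length;
  }
}
```

Decoupling "received" from "revealed" is what lets the UI pace the text to the TTS audio instead of dumping chunks as they arrive.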

Phase 12: Settings, i18n, Export, Polish

Me: Build settings, internationalization, export system, and polish everything.

Do:
1. Create SettingsDialog with tabbed sections: General (theme, language, access code), Model (provider selector, model selector with search, API key input, base URL, thinking config toggle), Audio (TTS provider/voice/speed, ASR provider, per-agent voice assignment), Image (provider, model, API key, aspect ratio), Video (provider, model, API key), PDF (provider, API key), Web Search (provider, API key), Agent (agent list with add/edit/delete, persona editor, action permissions, priority)
2. Create ModelSelector: searchable dropdown with provider grouping, model capabilities badges (vision, tools, thinking), context window display
3. Create AddProviderDialog: custom provider registration with name, base URL, API key, models
4. Create ProviderConfigPanel: provider-specific settings form
5. Create i18n translation files for all 5 locales (1500+ translation keys each): home, classroom, settings, generation, chat, quiz, whiteboard, export, agents, audio, errors, common
6. Create LanguageSwitcher component: dropdown with locale labels + short codes
7. Create PowerPoint export: useExportPPTX hook converts scenes to PPTX via pptxgenjs. Handles: text with rich formatting, images with clip paths, shapes with SVG paths, charts (ECharts → static image), tables, LaTeX (KaTeX → MathML → OMML), videos (poster image), code blocks (syntax-highlighted HTML)
8. Create classroom ZIP export/import: useExportClassroom hook creates ZIP with manifest.json + scenes + audio + images + agents. useImportClassroom hook parses ZIP and restores to IndexedDB.
9. Create HTML parser for PPTX: lexer → parser → format → stringify pipeline for converting ProseMirror HTML to PPTX rich text runs
10. Create LaTeX to OMML converter chain: KaTeX → MathML → OMML (via mathml2omml package) for PowerPoint math equations
11. Create AccessCodeGuard + AccessCodeModal: HMAC-signed token verification, cookie persistence
12. Create ServerProvidersInit: fetches /api/server-providers on mount, merges into settings store
13. Create UserProfile component: expandable pill with avatar picker (12 built-in + custom upload), nickname editor, bio textarea
14. Create GeneratingProgress component: step-by-step progress during classroom generation
15. Create OutlinesEditor: edit generated outlines before scene generation (reorder, delete, rename)
16. Add all CSS animations, transitions, hover states, dark mode variants
17. Verify: full build succeeds, all routes render, settings persist, i18n switches correctly
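
For the classroom ZIP in step 8, the manifest is worth validating on import before anything touches IndexedDB. A sketch with an illustrative manifest shape (the real manifest.json fields may differ):

```typescript
// Illustrative manifest shape; the project's actual fields may differ.
interface ClassroomManifest {
  version: number;
  name: string;
  sceneIds: string[];
}

// Parse and validate before restoring anything to IndexedDB.
function parseManifest(json: string): ClassroomManifest {
  const raw = JSON.parse(json) as Partial<ClassroomManifest>;
  if (typeof raw.version !== "number" || typeof raw.name !== "string" || !Array.isArray(raw.sceneIds)) {
    throw new Error("Invalid classroom manifest");
  }
  return { version: raw.version, name: raw.name, sceneIds: raw.sceneIds };
}

const manifest: ClassroomManifest = { version: 1, name: "Demo", sceneIds: ["s1", "s2"] };
const roundTripped = parseManifest(JSON.stringify(manifest));
```

Failing fast here gives useImportClassroom a clean error path for corrupted or foreign ZIPs.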

🔧 Key Technical Decisions

| Decision | Rationale |
| --- | --- |
| Zustand over Redux | Simpler API, better TypeScript support, no boilerplate, built-in persist middleware |
| Dexie over raw IndexedDB | Type-safe queries, promise-based API, versioned migrations, compound indexes |
| Vercel AI SDK | Unified streaming interface across 15+ providers, built-in tool calling, thinking support |
| LangGraph for orchestration | Stateful graph execution, conditional routing, streaming writer API, battle-tested |
| ProseMirror over Slate/TipTap | Lower-level control needed for PowerPoint-compatible rich text, custom schema/marks |
| Vite over Next.js (conversion) | Client-only SPA, no SSR needed, faster builds, simpler deployment |
| File-based prompts | Version-controllable, composable via snippets, conditional blocks for feature flags |
| ActionEngine pattern | Unified sync/async action execution, same types for live streaming and playback |
| IndexedDB for everything | Offline-first, large blob storage (audio/images), no server dependency for user data |
| iframe sandbox for widgets | Security isolation for LLM-generated HTML/JS, postMessage for controlled communication |

🖼 Frontend — Full UI Build Prompt

F1: Page Layouts & Routing

4 pages, all lazy-loaded via `React.lazy()` + `<Suspense>`:

| Route | Component | Layout |
| --- | --- | --- |
| / | HomePage (890 lines) | Full-screen gradient bg (from-slate-50 to-slate-100). Top-right floating toolbar pill (glass morphism: bg-white/60 backdrop-blur-md border rounded-full) with 3-way theme toggle (Sun/Moon/Monitor icons cycling), LanguageSwitcher dropdown, Settings gear button. Center: animated logo + tagline. Main card: requirement textarea (auto-resize) + PDF upload button + InteractiveMode toggle (Atom icon with breathing glow) + GenerationToolbar below. Below main card: collapsible "Recent Classrooms" grid (responsive 1→2→3→4 cols) with ThumbnailSlide previews, date badges, rename/delete with inline confirmation overlay. User profile expandable pill (top-left). Classroom import via hidden file input. Full search with real-time filtering. |
| /generation-preview | GenerationPreviewPage (900 lines) | Centered card with vertical step progress. Steps: PDF Analysis (scanning laser animation), Web Search (globe + card stack), Outline Generation (streaming outline cards with type icons), Agent Generation (reveals agents with AgentRevealModal), Scene Content (slide assembly animation), Actions (action sequence visualization). Error state with retry. Auto-navigates to /classroom/:id on completion. |
| /classroom/:id | ClassroomPage (180 lines) | Loads classroom from IndexedDB (or server fallback), restores agents, auto-resumes pending generation. Full-screen flex layout: SceneSidebar (left, collapsible) + center column (Header + CanvasArea with Whiteboard overlay) + ChatArea (right, resizable width, collapsible) + Roundtable (bottom overlay). |
| /eval/whiteboard | WhiteboardEvalPage (60 lines) | Debug tool. Bootstraps synthetic stage/scene, renders ScreenElement for whiteboard layout evaluation. |

F2: Component Hierarchy (203 files)

Top-10 largest components by complexity:

| Component | Lines | Responsibility |
| --- | --- | --- |
| Roundtable | 2094 | Main classroom interaction: 12-bar voice waveform, agent avatar ring with speaking indicator, chat input with Send/Mic toggle, ProactiveCard for discussion invitations, slide nav, playback controls (Play/Pause + speed 1×→2×), presentation mode, whiteboard toggle, volume slider with mute, thinking state, end-of-discussion flash, PresentationSpeechOverlay |
| useChatSessions | 1525 | Hook managing all chat state: session CRUD, SSE streaming (fetch → ReadableStream → TextDecoder → JSON.parse per line), abort controller for interruption, tool call execution, IndexedDB persistence |
| Stage | 1271 | Master orchestrator: creates PlaybackEngine, wires all callbacks, manages ActionEngine + AudioPlayer lifecycles, computes playbackView, handles discussion flow, connects useDiscussionTTS |
| PromptInput | 1267 | Rich text input with @mention support, file attachments, voice recording integration, model selector inline |
| TTSSettings | 1264 | Full TTS configuration: provider list with enable/disable toggles, voice preview with play button, speed slider, custom model CRUD, VoxCPM configuration |
| SettingsDialog | 1143 | Tabbed dialog (10 tabs): providers, image, video, tts, asr, pdf, web-search, general, agents. Left sidebar navigation with icons. Provider list column pattern. |
| AgentBar | 997 | Agent selection/configuration for generation: horizontal scrollable agent cards with checkbox, voice config popover per agent, shuffle random selection |
| QuizView | 985 | Full quiz interface: 5 question types, 4 phases (not_started → answering → grading → reviewing), score pie chart, per-question feedback, retry, draft persistence |
| GenerationToolbar | 893 | Toolbar: model selector, PDF upload, PDF provider selector, web search toggle, thinking config, media settings popover |
| AudioSettings | 799 | TTS + ASR combined settings: provider cards with logo, API key inputs, voice selector with preview |

Full directory structure with responsibilities (all under components/):

access-code-guard.tsx      — Fetches /api/access-code/status, shows modal if auth needed
access-code-modal.tsx      — Animated overlay: shield icon, input, submit, success checkmark
agent/
  agent-avatar.tsx         — Avatar: URL→AvatarImage or emoji→AvatarFallback, 3 sizes (sm/md/lg), color ring
  agent-bar.tsx            — Agent selection bar with per-agent voice config popover
  agent-config-panel.tsx   — Edit agent: name, role, persona textarea, color picker, priority slider
  agent-reveal-modal.tsx   — Staggered card flip animation revealing generated agents with role icons and sparkle particles
ai-elements/               — 19 Vercel AI SDK UI components (message, prompt-input, code-block, reasoning, sources, etc.)
audio/
  speech-button.tsx        — Mic toggle with waveform bars animation, long-press to record
  tts-config-popover.tsx   — Voice + speed selector popover
canvas/
  canvas-area.tsx          — Main slide display: SceneRenderer + Whiteboard overlay + play hint + CanvasToolbar
  canvas-toolbar.tsx       — Bottom toolbar: sidebar toggle, slide nav, play/pause, volume, speed, whiteboard, presentation, chat toggle
chat/
  chat-area.tsx            — Right panel: session tabs, message list, lecture notes toggle
  chat-session.tsx         — Individual session: messages with agent avatars, input, send/stop buttons
  inline-action-tag.tsx    — Clickable action buttons in messages (spotlight/highlight)
  lecture-notes-view.tsx   — Extracted speech texts as structured notes
  proactive-card.tsx       — Discussion invitation: topic text, accept/skip, animated border gradient
  session-list.tsx         — Horizontal tab bar for QA/discussion/lecture sessions
  use-chat-sessions.ts     — Master hook for all chat state + SSE streaming
generation/
  generating-progress.tsx  — Step progress with completion checkmarks
  generation-toolbar.tsx   — Model + PDF + search + thinking + media toolbar
  media-popover.tsx        — Image/video provider/model/API key configuration popover
  outlines-editor.tsx      — Edit outlines: drag reorder, delete, rename, type badges
header.tsx                 — Top bar: back arrow, title, settings, theme switcher, language, export menu
language-switcher.tsx      — Dropdown: 5 locales with native labels + short codes
roundtable/                — index.tsx (2094), presentation-speech-overlay.tsx (498), audio-indicator.tsx, constants.ts
scene-renderers/           — classroom-complete.tsx, interactive-renderer.tsx, pbl-renderer.tsx, pbl/ (6 files), quiz-renderer.tsx, quiz-view.tsx
server-providers-init.tsx  — Side-effect: fetches server providers on mount
settings/                  — 17 files total (see F8 Phase 6 for details)
slide-renderer/
  Editor/Canvas/           — index.tsx (415), 5 canvas sub-components, Operate/ (7 files), hooks/ (11 hooks)
  Editor/                  — HighlightOverlay, LaserOverlay, ScreenCanvas, ScreenElement, SpotlightOverlay, ZoomWrapper
  components/element/      — 10 element types (Text, Image, Shape, Line, Chart, Table, Latex, Video, Code) + hooks/
  components/ThumbnailSlide/, ThumbnailInteractive/
stage.tsx (1271)           — Master classroom orchestrator
stage/scene-renderer.tsx   — Routes scene.type → Canvas/QuizRenderer/InteractiveRenderer/PBLRenderer
stage/scene-sidebar.tsx    — Left sidebar: home, thumbnail list, generation progress, failed retry
ui/                        — 32 shadcn/ui primitives (see F3)
user-profile.tsx           — Expandable pill: avatar picker, name editor, bio textarea
whiteboard/                — index.tsx (container), whiteboard-canvas.tsx (445), whiteboard-history.tsx

F3: shadcn/ui Component Library (32 primitives)

All in components/ui/, built on Radix primitives + CVA:

alert-dialog (184), alert (73), avatar (96), avatar-display (29), badge (45), button (67, variants: default/destructive/outline/secondary/ghost/link, sizes: default/sm/lg/icon), button-group (78), card (92), carousel (231, embla-carousel-react), checkbox (28), collapsible (21), combobox (275, cmdk + popover), command (180, cmdk), context-menu (239), dialog (142), dropdown-menu (242), field (224, @base-ui/react), hover-card (38), input (19), input-group (144), label (21), popover (31), progress (31), scroll-area (55), select (184), separator (28), slider (25), sonner (45, uses custom useTheme), switch (29), tabs (80), textarea (18), tooltip (57)
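
The CVA pattern behind these primitives boils down to composing a base class string with per-variant lookups. A dependency-free sketch (the real components use the class-variance-authority package, and the class strings here are abbreviated):

```typescript
// Minimal reimplementation of the CVA variant-selection idea; the real
// button uses class-variance-authority and carries many more utilities.
type ButtonVariant = "default" | "destructive" | "outline" | "secondary" | "ghost" | "link";
type ButtonSize = "default" | "sm" | "lg" | "icon";

const base = "inline-flex items-center justify-center rounded-md text-sm";
const variantClasses: Record<ButtonVariant, string> = {
  default: "bg-primary text-primary-foreground",
  destructive: "bg-destructive text-white",
  outline: "border bg-background",
  secondary: "bg-secondary text-secondary-foreground",
  ghost: "hover:bg-accent",
  link: "underline-offset-4 hover:underline",
};
const sizeClasses: Record<ButtonSize, string> = {
  default: "h-9 px-4 py-2",
  sm: "h-8 px-3",
  lg: "h-10 px-6",
  icon: "size-9",
};

function buttonClassName(variant: ButtonVariant = "default", size: ButtonSize = "default"): string {
  return [base, variantClasses[variant], sizeClasses[size]].join(" ");
}
```

Typing the variant and size unions is what gives the `<Button variant="ghost" size="icon">` call sites compile-time checking.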

F4: Hooks & Contexts (15 hooks, 2 contexts)

| Hook | Lines | Purpose |
| --- | --- | --- |
| useCanvasOperations | 587 | Element CRUD, alignment, distribution, z-order, group/ungroup, clipboard, delete, select all |
| useSceneGenerator | 576 | Orchestrates scene generation: generateRemaining(), retrySingleOutline(), stop() |
| useDiscussionTTS | 343 | TTS during live discussions: queue speech chunks, play sequentially, handle interruption |
| useAudioRecorder | 325 | MediaRecorder API: start/stop, audio visualization, silence detection, output Blob |
| useOrderElement | 191 | Z-order operations: bring to front, send to back, move forward/backward |
| useBrowserASR | 155 | Web Speech API recognition: start/stop, interim/final results, language |
| useBrowserTTS | 150 | Web Speech API synthesis: speak, cancel, voice selection, speed/pitch |
| useStreamingText | 124 | Word-by-word text reveal from StreamBuffer, progress tracking (0→1) |
| useDraftCache | 95 | Generic localStorage cache for form drafts with TTL |
| useTheme | 71 | Theme context: light/dark/system, resolvedTheme, media query listener, localStorage |
| useI18n | 66 | I18n context: locale, setLocale, t(), browser detection, localStorage |
| useSlideBackgroundStyle | 54 | Computes CSS background from SlideBackground type |
| useHistorySnapshot | 41 | Wraps snapshot store: push, undo, redo, canUndo/canRedo |
| useExportPPTX | ~1000 | PowerPoint export: PPTElement[] → pptxgenjs calls, HTML→rich text, SVG→polygon, LaTeX→OMML |
| useExportClassroom | ~200 | ZIP export: manifest.json + scenes + audio + images + agents |

Contexts: SceneContext (211 lines — provides current scene data via SceneProvider, useSceneData(), useSceneSelector()), MediaStageContext (18 lines — provides stageId for IndexedDB keys)
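
One common way to implement a selector hook like useSceneSelector is over an external store with get/subscribe, which React then reads via useSyncExternalStore. A framework-free sketch of the store half (names illustrative, not the project's actual implementation):

```typescript
// Minimal external store: get/set plus change notification, the shape a
// selector hook (via useSyncExternalStore) consumes. Illustrative only.
function createStore<T>(initial: T) {
  let state = initial;
  const listeners = new Set<() => void>();
  return {
    getState: () => state,
    setState(next: T): void {
      state = next;
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: () => void): () => void {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}

const sceneStore = createStore({ currentSceneId: "s1", title: "Intro" });
let notifications = 0;
const unsubscribe = sceneStore.subscribe(() => { notifications += 1; });
sceneStore.setState({ currentSceneId: "s2", title: "Forces" });
unsubscribe();
sceneStore.setState({ currentSceneId: "s3", title: "Energy" });
```

The selector layer then re-renders a component only when its selected slice changes, which matters for hot paths like per-frame playback state.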

F5: CSS System, Animations & Theming

globals.css (218 lines): @import 'tailwindcss' + 'tw-animate-css' + 'shadcn/tailwind.css'. @custom-variant dark. @theme inline with 30+ oklch color tokens. :root light theme (--primary: #722ed1 purple). .dark theme (--primary: #8b47ea). --radius: 0.625rem base.

6 Keyframe Animations: wave (audio bars), breathing-bar-1/2/3 (speech indicators), shimmer (skeleton loading), interactive-mode-breathe (button glow)

Motion Patterns: <motion.div initial/animate/exit> for enter/exit, <AnimatePresence> for conditional, spring physics (damping:20, stiffness:300), layout for reflows, staggerChildren:0.1, gesture (whileHover scale:1.02, whileTap scale:0.97)

F6: Configuration Objects (13 config files)

shapes.ts (1031 lines, 20+ SVG path formulas), symbol.ts (700, unicode categories), animation.ts (200, enter/exit animation defs), theme.ts (100, 10+ preset themes), hotkey.ts (130, keyboard shortcuts), image-clip.ts (170, clip path presets), latex.ts (200, symbol palette), chart.ts (70, chart type presets), font.ts (40, font families), lines.ts (40, line styles), element.ts (10, default dimensions), mime.ts (15, MIME mapping), storage.ts (3, localStorage keys)

F7: One-Shot Frontend Mega Prompt

Me: Build the complete frontend for MultiMind Classroom — every component, page, hook, and animation.

Do: Create 203 React components, 15 hooks, 2 contexts, 32 shadcn/ui primitives, and 13 config files:

PAGES: HomePage (890 lines, gradient bg, floating toolbar pill with theme/language/settings, centered logo animation, main card with auto-resize Textarea + InteractiveMode Atom toggle + GenerationToolbar + gradient Generate button, collapsible Recent Classrooms responsive grid with ThumbnailSlide previews + rename/delete + search, UserProfileCard expandable pill). GenerationPreviewPage (900 lines, vertical step list with 6 StepVisualizer animations: PdfScan laser, WebSearch globe+cards, StreamingOutlines stagger, AgentReveal flip-in cards, Content assembly, Actions sequence, AgentRevealModal with staggered rotateY flip + role icons + color borders). ClassroomPage (180 lines, loads from IndexedDB, renders Stage). WhiteboardEvalPage (60 lines, debug tool).

STAGE (1271 lines): PlaybackEngine lifecycle, ActionEngine+AudioPlayer wiring, discussion flow state machine, useDiscussionTTS integration, fullscreen container ref, AlertDialog confirmations. Layout: SceneSidebar (left) + Header+CanvasArea+Whiteboard (center) + ChatArea (right, resizable) + Roundtable (bottom overlay).

ROUNDTABLE (2094 lines): 12 motion.div voice waveform bars (peaks 14-27px, durations 0.53-0.78s), agent avatar ring with color+speaking pulse, Textarea chat input + Send ArrowUp + Mic toggle, ProactiveCard gradient border, slide ChevronLeft/Right nav, Play/Pause toggle, speed dropdown 1×/1.25×/1.5×/2×, Repeat restart, Volume slider+mute, PencilLine whiteboard toggle, Maximize2 presentation mode, Loader2 thinking state, PresentationSpeechOverlay word-by-word reveal.

CANVAS: Editor/Canvas (415 lines) with 11 hooks (viewport, select, drag, scale with 8 handles, rotate, mouse selection, line drag, keypoint move, create, drop, common). 10 element renderers (Text/ProseMirror with 10 marks, Image with clip+filters, Shape with SVG paths+gradients, Line with bezier+markers, Chart/ECharts, Table, Latex/KaTeX, Video, Code/Shiki). 7 Operate overlays. ThumbnailSlide. ScreenElement/ScreenCanvas. Spotlight/Highlight/Laser overlays.

WHITEBOARD: Container with slide-up animation, toolbar (Eraser+History+Close), WhiteboardCanvas (pan/zoom, AnimatedElement staggered entrance scale 0→1 delay index*0.06s, cascade exit reverse-order rotate+scale→0), WhiteboardHistory snapshot timeline.

CHAT: useChatSessions (1525 lines, SSE streaming, abort, IndexedDB persistence), ChatArea (session tabs + message list + lecture notes), ChatSession (agent avatars+colors, markdown, inline action tags), ProactiveCard (animated gradient border), LectureNotesView.

QUIZ: QuizView (985 lines, 5 question types: single-choice radio, multiple-choice checkbox, fill-in-blank input, true-false toggle, short-answer textarea+voice. 4 phases: not_started→answering→grading→reviewing. Score pie chart, feedback accordion, draft persistence).

SETTINGS: SettingsDialog (1143 lines, 10 tabs with left nav icons, ProviderListColumn pattern), 17 sub-files. ModelSelector (Combobox with search, provider grouping, capability badges Eye/Wrench/Brain). AgentBar (997 lines, scrollable cards, voice config hierarchy popover). TTSSettings (1264 lines), AudioSettings (799 lines), ASRSettings (559 lines).

SCENE RENDERERS: ClassroomComplete (confetti 7 colors, scene type breakdown, score summary), InteractiveRenderer (iframe sandbox + postMessage), PBLRenderer + 6 sub-components (RoleSelection, ChatPanel, IssueboardPanel, Workspace, Guide).

UI PRIMITIVES: 32 shadcn/ui components on Radix + CVA. Button variants (default/destructive/outline/secondary/ghost/link, sizes default/sm/lg/icon). Full dark mode. cn() utility everywhere.

All components use: cn() for conditional classes, useI18n() for all text, motion/react for animations, lucide-react for icons, readonly props, sonner for toasts, controlled state (useState).

F8: Multi-Shot Frontend Build (6 phases)

F-Phase 1: UI Primitives & Layout Shell

Me: Build the UI foundation β€” shadcn/ui components, layout shell, and page routing.

Do:
1. Create all 32 shadcn/ui components in components/ui/ with Radix primitives, CVA variants, cn() utility
2. Create globals.css: @import tailwindcss + tw-animate-css + shadcn/tailwind.css. @custom-variant dark. @theme inline with 30+ oklch tokens. :root light (--primary:#722ed1). .dark (--primary:#8b47ea). 6 keyframe animations. scrollbar-hide utility. ProseMirror styles.
3. Create App.tsx: BrowserRouter → ThemeProvider → I18nProvider → ServerProvidersInit → AccessCodeGuard → Suspense → Routes → Toaster
4. Create page shells for all 4 routes
5. Create Header: back arrow, settings gear, theme switcher (Sun/Moon/Monitor cycle), LanguageSwitcher, export dropdown
6. Create AccessCodeGuard + AccessCodeModal (animated overlay, shield icon, input, success animation)
7. Create UserProfileCard: collapsible pill, avatar grid (12 SVGs + upload), nickname edit, bio textarea, Motion expand/collapse
8. Create LanguageSwitcher: dropdown with 5 locales, click-outside close
9. Create cn() utility, createLogger(), all 13 config files (shapes.ts 1031 lines, animation.ts, theme.ts, hotkey.ts, image-clip.ts, latex.ts, chart.ts, font.ts, lines.ts, element.ts, mime.ts, storage.ts, symbol.ts)
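
Step 9's cn() helper is, at its core, a conditional class joiner. A sketch of that part alone (the project's cn() additionally runs clsx and tailwind-merge to deduplicate conflicting Tailwind utilities, which this sketch omits):

```typescript
// Conditional class-name joiner: falsy entries are dropped.
// Simplified: no tailwind-merge conflict resolution here.
type ClassValue = string | number | false | null | undefined;

function cn(...values: ClassValue[]): string {
  return values.filter(Boolean).join(" ");
}

const active = true;
const className = cn("btn", active && "btn-active", null, undefined, false);
```

This is the shape every component in the tree calls, e.g. `cn("rounded-md", isOpen && "shadow-lg")`.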

F-Phase 2: HomePage & Generation Flow

Me: Build the HomePage and GenerationPreviewPage with all interactions.

Do:
1. HomePage (890 lines): gradient bg, fixed toolbar pill, centered logo animation, main card with Textarea + InteractiveMode toggle (Atom breathing) + GenerationToolbar + gradient Generate button
2. GenerationToolbar: inline model selector, PDF upload (Paperclip + badge), PDF provider Select, web search Globe toggle, thinking Brain popover, MediaPopover
3. Recent Classrooms: collapsible chevron, responsive grid, ClassroomCard (ThumbnailSlide, metadata badge, name tooltip+copy, rename, delete overlay confirmation)
4. Search: InputGroup with Search icon, real-time filter, AnimatePresence
5. GenerationPreviewPage (900 lines): vertical step list, 6 StepVisualizers (PdfScan laser, WebSearch globe+cards, StreamingOutlines stagger, AgentGeneration, Content, Actions)
6. AgentRevealModal: full-screen overlay, staggered flip-in (rotateY), role icons (👨‍🏫/📚/🎓), color borders, auto-continue
7. OutlinesEditor: drag-reorder, delete/rename, type badges
8. Wire flow: HomePage form → sessionStorage → GenerationPreviewPage SSE → IndexedDB → navigate /classroom/:id

F-Phase 3: Slide Renderer & Canvas System

Me: Build the full slide renderer with all 10 element types and interactive canvas.

Do:
1. Editor/Canvas (415 lines): viewport scaling, element rendering loop, mouse events
2. 11 canvas hooks: useViewportSize, useSelectElement, useDragElement, useScaleElement (8 resize handles), useRotateElement, useMouseSelection, useDragLineElement, useMoveShapeKeypoint, useInsertFromCreateSelection, useDrop, useCommonOperate
3. 10 element renderers: TextElement (ProseMirror with paragraph/heading/bulletList/orderedList + 10 marks: bold/italic/underline/strikethrough/forecolor/backcolor/fontsize/fontname/textAlign/lineHeight/subscript/superscript/link), ImageElement (clip-path + CSS filters + flip), ShapeElement (SVG path formulas + gradient/pattern fills), LineElement (cubic bezier + arrow markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX + Temml), VideoElement, CodeElement (Shiki 50+ grammars)
4. 7 Operate overlays per element type
5. ThumbnailSlide, ScreenElement/ScreenCanvas, ViewportBackground
6. HighlightOverlay, LaserOverlay, SpotlightOverlay

F-Phase 4: Classroom Layout

Me: Build the classroom layout β€” Stage orchestrator, sidebar, canvas area, chat.

Do:
1. Stage (1271 lines): PlaybackEngine lifecycle, discussion flow, useDiscussionTTS, sidebar/chat/whiteboard state
2. SceneSidebar (559 lines): home button, ThumbnailSlide list, active highlight, generation progress, failed retry, collapse animation
3. CanvasArea (274 lines): SceneRenderer routing, Whiteboard overlay, play hint, CanvasToolbar
4. CanvasToolbar (440 lines): sidebar toggle, slide nav, play/pause, speed dropdown, volume slider+mute, whiteboard toggle, chat toggle, presentation mode, stop discussion
5. ChatArea (340 lines): resizable right panel (drag handle min 280px max 500px), session tabs, message list, lecture notes
6. ChatSession (367 lines): agent avatar+color bubbles, markdown, inline action tags, input+send/stop
7. SessionList, ProactiveCard (gradient border animation), InlineActionTag, LectureNotesView
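
The resizable ChatArea in step 5 reduces to clamping the dragged width into the stated 280-500 px range on every pointer-move; a one-line sketch:

```typescript
// Clamp the dragged panel width to the ChatArea bounds from the spec
// (280 px min, 500 px max); called on each drag-handle pointer move.
function clampPanelWidth(width: number, min = 280, max = 500): number {
  return Math.min(max, Math.max(min, width));
}
```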

F-Phase 5: Roundtable, Whiteboard & Interactive

Me: Build Roundtable, Whiteboard, and interactive renderers.

Do:
1. Roundtable (2094 lines): 12-bar waveform, agent avatars with speaking pulse, chat input+Send+Mic, ProactiveCard, slide nav, playback controls, volume, speed, whiteboard toggle, presentation mode, thinking state
2. PresentationSpeechOverlay (498 lines): fullscreen speech word-by-word reveal, agent avatar, breathing bars
3. Whiteboard container: AnimatePresence slide-up, toolbar (Eraser/History/Close), element count badge
4. WhiteboardCanvas (445 lines): pan+zoom, AnimatedElement staggered entrance (scale 0→1, delay index*0.06s), cascade exit
5. WhiteboardHistory: snapshot timeline with thumbnails and restore
6. InteractiveRenderer: iframe sandbox + postMessage, QuizRenderer → QuizView (985 lines, 5 question types, 4 phases)
7. ClassroomComplete: confetti (7 colors), scene type breakdown, score summary, encouragement
8. PBL components: PBLRenderer, RoleSelection, ChatPanel, IssueboardPanel, Workspace, Guide
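
The staggered entrance and reverse-order cascade exit in steps 3-4 come down to per-index delays. A sketch in milliseconds (60 ms being the 0.06 s step from the spec, expressed as an integer to keep the arithmetic exact):

```typescript
// Entrance: element i starts i * step after the first element.
function entranceDelayMs(index: number, stepMs = 60): number {
  return index * stepMs;
}

// Cascade exit: the last-drawn element leaves first, so delays run in reverse.
function exitDelayMs(index: number, total: number, stepMs = 60): number {
  return (total - 1 - index) * stepMs;
}
```

These values feed straight into each AnimatedElement's motion transition delay.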

F-Phase 6: Settings, Agents & Export

Me: Build Settings dialog, Agent system UI, and Export UI.

Do:
1. SettingsDialog (1143 lines): two-column (left nav 10 tabs + right content), ProviderListColumn pattern
2. ProviderConfigPanel (438 lines): API key, base URL, model list, test connection
3. ModelSelector (423 lines): Combobox search, provider grouping, capability badges (Eye/Wrench/Brain)
4. ModelEditDialog, AddProviderDialog, AddAudioProviderDialog
5. All settings sub-pages: GeneralSettings, ImageSettings, VideoSettings, TTSSettings (1264 lines), ASRSettings (559), AudioSettings (799), PDFSettings (303), WebSearchSettings
6. AgentBar (997 lines): scrollable cards, voice config popover with provider→model→voice hierarchy + search
7. AgentAvatar, AgentConfigPanel (persona editor, color picker, priority slider, action checkboxes)
8. Export UI in Header: PPTX (useExportPPTX), ZIP (useExportClassroom), Import (file input), loading toasts
9. MediaPopover (460 lines): image/video provider+model+key, enable toggles, aspect ratio
10. GeneratingProgress: step indicators with elapsed time