# 🧠 PROMPT.md: Build MultiMind Classroom from Scratch

> **MeDo-Styled Prompts** to recreate the full MultiMind Classroom AI Interactive Classroom application.
> One-shot (single mega-prompt) and Multi-shot (phased build) variants included.

---

## 📋 Table of Contents

1. [Project Overview](#-project-overview)
2. [Architecture Blueprint](#-architecture-blueprint)
3. [One-Shot Mega Prompt](#-one-shot-mega-prompt)
4. [Multi-Shot Phased Prompts](#-multi-shot-phased-prompts)
   - Phase 1: Foundation & Scaffold
   - Phase 2: Data Layer & State Management
   - Phase 3: AI Provider System
   - Phase 4: Generation Pipeline
   - Phase 5: Slide Renderer & Canvas
   - Phase 6: Multi-Agent Orchestration
   - Phase 7: Playback Engine & Roundtable
   - Phase 8: Interactive Widgets & PBL
   - Phase 9: Media Generation Pipeline
   - Phase 10: Audio System (TTS/ASR)
   - Phase 11: Chat System & Streaming
   - Phase 12: Settings, i18n, Export, Polish
5. [Key Technical Decisions](#-key-technical-decisions)

---

## 🎯 Project Overview

**MultiMind Classroom** is an open-source AI interactive classroom platform.
Users upload a PDF, the system generates an immersive multi-agent learning experience with: - **AI-generated slide presentations** from PDF content - **Multi-agent roundtable discussions** (teacher, assistant, student agents) - **Real-time TTS/ASR** for voice-driven lectures - **Interactive whiteboard** with collaborative drawing - **Quiz generation** with auto-grading - **Interactive widgets** (simulations, diagrams, code editors, 3D visualizations, games) - **Project-Based Learning (PBL)** mode with MCP tool-calling agents - **Media generation** (AI images + videos embedded in slides) - **PowerPoint export** and classroom zip import/export - **5-language i18n** (zh-CN, en-US, ja-JP, ru-RU, ar-SA) ### Tech Stack | Layer | Technology | |-------|-----------| | Framework | React 19 + Vite 6 | | Routing | React Router DOM v7 | | State | Zustand 5 (5 stores: stage, canvas, settings, snapshot, keyboard) | | Storage | Dexie (IndexedDB) β€” stages, scenes, audio, images, chat, outlines | | UI | shadcn/ui (Radix primitives) + Tailwind CSS v4 + Motion (Framer) | | AI SDK | Vercel AI SDK 6 + LangGraph (multi-agent director graph) | | AI Providers | OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, GLM, MiniMax, Ollama, OpenRouter, +10 more | | TTS/ASR | OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, Browser native, VoxCPM | | Image Gen | Seedream, OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok | | Video Gen | Seedance, Kling, Veo, Sora, MiniMax, Grok | | Rich Text | ProseMirror (custom schema with marks for bold/italic/underline/color/align/indent/lists) | | Charts | ECharts 6 | | Diagrams | @xyflow/react (React Flow) | | Export | pptxgenjs (PowerPoint), JSZip (classroom archives) | | Math | KaTeX + Temml + mathml2omml (for PPTX export) | | Code Highlighting | Shiki | | PDF Parsing | unpdf + MinerU cloud + custom providers | --- ## πŸ— Architecture Blueprint ### Data Flow ``` PDF Upload β†’ Outline Generation (SSE stream) β†’ Scene Content Generation β†’ 
Action Generation ↓ Media Generation (parallel) ↓ IndexedDB Storage (Dexie) ↓ Playback Engine (state machine) ↓ Roundtable UI ←→ Chat System ←→ Multi-Agent LangGraph ``` ### Store Architecture (Zustand) | Store | Purpose | Persistence | |-------|---------|-------------| | `useStageStore` | Current stage, scenes, outlines, generation status | IndexedDB | | `useCanvasStore` | Viewport, zoom, selected elements, editing state | Memory | | `useSettingsStore` | All provider configs, model selection, UI prefs | localStorage | | `useSnapshotStore` | Undo/redo history | Memory | | `useKeyboardStore` | Keyboard shortcut state | Memory | | `useMediaGenerationStore` | Image/video generation tasks and status | IndexedDB | | `useWhiteboardHistoryStore` | Whiteboard undo/redo per scene | Memory | | `useWidgetIframeStore` | Widget iframe communication state | Memory | | `useUserProfileStore` | User nickname, avatar, bio | localStorage | | `useAgentRegistry` | Agent configs (default + custom + generated) | localStorage | ### Database Schema (Dexie/IndexedDB) ``` stages: id, name, description, createdAt, updatedAt, languageDirective, style, agentIds scenes: id, stageId, type, title, order, content (JSON), actions (JSON), whiteboard (JSON) audioFiles: id (audioId), blob, duration, format, text, voice imageStore: id (storageId), blob, mimeType chatSessions: id, stageId, sceneId, type, status, messages, config outlines: id, stageId, outlines (JSON array) mediaFiles: id (stageId_elementId), blob, mimeType generatedAgents: id (stageId), agents (AgentConfig[]) ``` ### Prompt Template System File-based with composition: - `lib/prompts/templates/{promptId}/system.md` + `user.md` - `lib/prompts/snippets/{name}.md` - Syntax: `{{variable}}`, `{{snippet:name}}`, `{{#if flag}}...{{/if}}` - 20+ prompt templates for outlines, slides, quizzes, actions, widgets, PBL, agents ### Action System Two categories executed by ActionEngine: - **Fire-and-forget**: `spotlight`, `laser`, `play_video` - 
**Synchronous** (wait for completion): `speech`, `wb_open`, `wb_close`, `wb_draw_text`, `wb_draw_shape`, `wb_draw_chart`, `wb_draw_latex`, `wb_draw_table`, `wb_draw_line`, `wb_draw_code`, `wb_edit_code`, `wb_clear`, `wb_delete`, `discussion` - **Widget actions**: `widget_highlight`, `widget_setState`, `widget_annotation`, `widget_reveal` ### Multi-Agent Orchestration (LangGraph) ``` START β†’ director ──(end)──→ END β”‚ └─(next)β†’ agent_generate ──→ director (loop) ``` - Director: LLM-based for multi-agent, code-only for single-agent - Agents: teacher (full slide+whiteboard control), assistant (whiteboard), student (whiteboard, short responses) - Per-agent: persona prompt, allowed actions, TTS voice, avatar, color ### Playback Engine State Machine ``` start() pause() idle ──────────→ playing ──────────→ paused β–² β–² β”‚ β”‚ β”‚ resume() β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ handleEndDiscussion() β”‚ confirmDiscussion() └──────────────── live ──────────→ paused ``` --- ## πŸš€ ONE-SHOT MEGA PROMPT > Use this single prompt to generate the entire application in one conversation. ``` Me: Build me "MultiMind Classroom" β€” an open-source AI interactive classroom platform. Do: Create a React 19 + Vite 6 + TypeScript application with these EXACT specifications: ### CORE SETUP - React Router DOM v7 with 4 lazy-loaded routes: / (HomePage), /classroom/:id (ClassroomPage), /generation-preview (GenerationPreviewPage), /eval/whiteboard (WhiteboardEvalPage) - App.tsx wraps all routes in: BrowserRouter β†’ ThemeProvider β†’ I18nProvider β†’ ServerProvidersInit β†’ AccessCodeGuard β†’ Toaster - Tailwind CSS v4 with PostCSS, shadcn/ui (Radix-based), oklch color system with light/dark mode via CSS variables - Path alias @/* β†’ ./src/* ### STATE MANAGEMENT (Zustand 5) Create 10 Zustand stores: 1. 
**useStageStore** β€” stage (name, description, languageDirective, style, agentIds), scenes[], currentSceneId, outlines[], generationStatus, chats[], mode ('autonomous'|'playback'). Actions: loadFromStorage, saveToStorage, addScene, setCurrentSceneId. Debounced IndexedDB persistence. 2. **useCanvasStore** β€” viewportSize, canvasScale, selectedElementIds, editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive. All canvas interaction state. 3. **useSettingsStore** β€” providerId, modelId, providersConfig (unified JSON), thinkingConfigs, ttsProviderId, ttsVoice, ttsSpeed, asrProviderId, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, agentMode ('preset'|'auto'), selectedAgentIds. Persisted via zustand/persist to localStorage. Has fetchServerProviders() to merge server-configured providers. 4. **useSnapshotStore** β€” history stack for undo/redo 5. **useKeyboardStore** β€” keyboard shortcut state 6. **useMediaGenerationStore** β€” tasks map (elementId β†’ {status, objectUrl, error}), enqueueTasks, restoreFromDB 7. **useWhiteboardHistoryStore** β€” per-scene whiteboard undo/redo 8. **useWidgetIframeStore** β€” widget iframe postMessage communication 9. **useUserProfileStore** β€” nickname, avatar, bio. Persisted localStorage. 10. **useAgentRegistry** β€” agents map (id β†’ AgentConfig), addAgent, updateAgent, deleteAgent. 3 default agents: teacher (AI teacher), assistant (AIεŠ©ζ•™), student (ε₯½ε₯‡ε­¦η”Ÿ). Each has: id, name, role, persona, avatar, color, allowedActions, priority, voiceConfig, isDefault, isGenerated, boundStageId. ### DATABASE (Dexie/IndexedDB) Database name 'multimind-db' with tables: stages, scenes, audioFiles, imageStore, chatSessions, outlines, mediaFiles, generatedAgents. Full CRUD operations. Stage storage utilities: listStages, deleteStageData, renameStage, getFirstSlideByStages. 
### AI PROVIDER SYSTEM
- Unified provider registry (PROVIDERS) with 15+ providers: openai, anthropic, google, deepseek, qwen, kimi, glm, minimax, siliconflow, doubao, hunyuan, xiaomi, grok, openrouter, ollama
- Each provider: id, name, type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon, models[]
- Each model: id, name, contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking)
- createLanguageModel() factory using @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google
- Thinking config system: toggleable, budgetAdjustable, defaultEnabled per model
- callLLM() and streamLLM() wrappers with thinking support

### GENERATION PIPELINE (Two-Stage)
**Stage 1: Outline Generation**
- POST /api/generate/scene-outlines-stream → SSE stream
- Input: PDF text + images + user requirements + agents + language
- Output: SceneOutline[] with type ('slide'|'quiz'|'interactive'|'pbl'), title, notes, order, mediaGenerations[]
- Incremental JSON parsing from LLM stream
- Generation Preview page with step-by-step visualization (outline streaming → agent profile generation → scene content → navigate to classroom)

**Stage 2: Scene Generation (per outline)**
- generateSceneContent() → POST /api/generate/scene-content: generates slides/quiz/interactive/pbl content
- generateSceneActions() → POST /api/generate/scene-actions: generates teacher speech + visual actions for each scene
- createSceneWithActions(): assembles Scene object with elements, actions, whiteboard data
- Interactive post-processor: sanitizes HTML widgets, injects CSS isolation
- Action parser: extracts structured [{type, name, params}, {type: "text", content}] from LLM output

### SLIDE TYPE SYSTEM
Scene types with specific content structures:
- **slide**: PPTElement[] (text, image, shape, line, chart, table, latex, video, audio, code elements).
Each element has: id, type, left, top, width, height, rotate, opacity, shadow, outline, fill, link, groupId, lock, name. - **quiz**: QuizQuestion[] with type (single-choice, multiple-choice, fill-in-blank, true-false, short-answer), question text, options, correctAnswer, explanation - **interactive**: WidgetConfig with type (simulation, diagram, code, game, visualization3d), HTML/iframe content, teacherActions[] - **pbl**: PBLProjectConfig with roles, issues, workspaces ### SLIDE RENDERER Full PowerPoint-compatible renderer in React: - Editor/Canvas with viewport scaling, drag/drop, resize handles, rotation handles, alignment lines, grid, ruler - Element types: TextElement (ProseMirror), ImageElement (clip masks, filters), ShapeElement (SVG paths, gradients, patterns), LineElement (cubic bezier, markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX), VideoElement, CodeElement (Shiki) - ThumbnailSlide for sidebar previews - ScreenElement for presentation mode - useScaleElement, useDragElement, useRotateElement, useSelectElement hooks ### MULTI-AGENT ORCHESTRATION (LangGraph) - StateGraph with OrchestratorState annotation - Director node: multi-agent LLM decision or single-agent code-only - agent_generate node: builds structured prompt per agent, streams response with tool calls - statelessGenerate(): single-pass generation from messages + storeState - Prompt builder: role guidelines per agent type, whiteboard ledger context, peer context, state context - SSE streaming: StatelessEvent chunks with {type: 'text'|'tool_call'|'done'|'error', agentId, content} ### PLAYBACK ENGINE Class-based PlaybackEngine with state machine (idle β†’ playing β†’ paused, idle β†’ live β†’ paused): - Consumes Scene.actions[] sequentially via ActionEngine - ActionEngine: processes spotlight (highlight element), laser (pointer effect), speech (TTS), whiteboard actions, discussion triggers - Speech TTS: fetches audio from /api/generate/tts, plays via AudioPlayer, shows speech 
overlay - Discussion triggers: pause playback, switch to live mode, enable chat input - Auto-resume generation for pending outlines on classroom load - Speed control: 1x, 1.25x, 1.5x, 2x ### ROUNDTABLE UI (95KB component) Main classroom interaction panel with: - Voice waveform animation during speech - Agent avatars with speaking indicator - Chat input with voice recording (ASR) - Proactive discussion cards - Slide navigation controls - Presentation mode (fullscreen) - Whiteboard toggle - Playback progress bar - Speed selector - Thinking state indicator ### CHAT SYSTEM - ChatSession: id, type (qa|discussion|lecture), status, messages (UIMessage[]), config (agentIds, maxTurns, triggerAgentId) - useChatSessions hook: manages sessions, sends messages via POST /api/chat SSE, handles interruption, persists to IndexedDB - StreamBuffer: buffers SSE text chunks, reveals text word-by-word for natural speech feel - Message metadata: senderName, senderAvatar, agentId, agentColor, actions (spotlight/highlight/insert) - Chat area with session list, message bubbles, inline action tags, lecture notes view ### WHITEBOARD SYSTEM - Collaborative whiteboard overlay on slides - Elements: text, shapes (rect/circle/triangle), charts, LaTeX, tables, lines, code blocks - ActionEngine executes wb_draw_*, wb_delete, wb_clear, wb_open, wb_close - WhiteboardCanvas with drawing tools - WhiteboardHistory for undo/redo - Whiteboard conflicts summarizer for multi-agent coordination ### INTERACTIVE WIDGETS 5 widget types, each generates self-contained HTML rendered in sandboxed iframe: 1. **Simulation**: variable sliders, physics/math simulations, presets 2. **Diagram**: React Flow nodes/edges, decision trees, flowcharts 3. **Code**: executable code editor with output panel 4. **Game**: educational games (quizzes, puzzles) 5. 
**Visualization3D**: Three.js/WebGL 3D models - Widget teacher actions: highlight, setState, annotation, reveal - postMessage bridge for iframe ↔ parent communication ### PBL (Project-Based Learning) - Agentic loop using Vercel AI SDK generateText + stopWhen - MCP tools: ModeMCP, ProjectMCP, AgentMCP, IssueboardMCP - Generates: project config, roles, issues, workspaces - PBL renderer with: role selection, chat panel, issue board, workspace, guide ### MEDIA GENERATION - Media orchestrator dispatches parallel API calls - Image providers: Seedream (ByteDance), OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok - Video providers: Seedance (ByteDance), Kling (Kuaishou), Veo (Google), MiniMax, Grok - Async task pattern: submit β†’ poll β†’ download blob β†’ IndexedDB - MediaGenerationStore tracks task status per elementId ### AUDIO SYSTEM - TTS providers: OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, VoxCPM, Browser native - ASR providers: OpenAI Whisper, Qwen ASR, Browser native - Voice resolver: maps agent voices across providers - AudioPlayer: Web Audio API playback with speed control - useAudioRecorder: MediaRecorder API for voice input - useBrowserTTS/useDiscussionTTS: manages TTS lifecycle during discussions ### EXPORT SYSTEM - PowerPoint export via pptxgenjs: converts PPTElement[] to PPTX with shapes, images, charts, tables, LaTeXβ†’OMML - Classroom ZIP export/import: stages + scenes + audio + images + agents as portable archive - HTML parser for slide text β†’ PPTX rich text conversion - SVG path parser for shape export - LaTeX β†’ OMML converter via mathml2omml ### I18N SYSTEM - i18next + react-i18next + resources-to-backend - 5 locales: zh-CN, en-US, ja-JP, ru-RU, ar-SA - Dynamic import: `import(\`./locales/${language}.json\`)` - useI18n hook with locale detection from localStorage/navigator - Language switcher component ### API ROUTES (27 endpoints) All under /api/: - /chat β€” SSE chat stream (multi-agent) - /generate/scene-outlines-stream β€” SSE 
outline generation - /generate/scene-content β€” scene content generation - /generate/scene-actions β€” scene action generation - /generate/agent-profiles β€” agent profile generation - /generate/image β€” image generation - /generate/video β€” video generation - /generate/tts β€” single TTS audio generation - /parse-pdf β€” PDF parsing - /classroom β€” CRUD for server-stored classrooms - /classroom-media/[classroomId]/[...path] β€” media file serving - /generate-classroom β€” background classroom generation job - /generate-classroom/[jobId] β€” job status polling - /quiz-grade β€” LLM-based quiz grading - /pbl/chat β€” PBL runtime chat - /web-search β€” Tavily web search - /proxy-media β€” CORS proxy for remote media - /server-providers β€” server-configured provider list - /verify-model, /verify-image-provider, /verify-video-provider, /verify-pdf-provider β€” credential verification - /azure-voices β€” Azure TTS voice list - /transcription β€” audio transcription - /access-code/status, /access-code/verify β€” access code authentication - /health β€” health check ### SECURITY - Access code guard (HMAC-signed cookie) - SSRF guard for server-side URL fetching - Content Security Policy headers - Input validation on all API routes ### TESTING - Vitest for unit tests (29 test files) - Playwright for E2E tests (4 test suites) - Evaluation framework for whiteboard layout scoring and outline language detection Build the complete application with all 950+ source files, full type safety, and production-ready error handling. ``` --- ## πŸ”„ MULTI-SHOT PHASED PROMPTS ### Phase 1: Foundation & Scaffold ``` Me: Start building "MultiMind Classroom" β€” an AI interactive classroom. Set up the project foundation. Do: 1. Initialize React 19 + Vite 6 + TypeScript project 2. Configure: Tailwind CSS v4 with PostCSS, path alias @/* β†’ ./src/*, oklch color system 3. 
Install core deps: react-router-dom, zustand, dexie, lucide-react, motion, sonner, clsx, tailwind-merge, class-variance-authority, nanoid, zod 4. Install shadcn/ui components: button, dialog, dropdown-menu, popover, tooltip, tabs, input, textarea, select, checkbox, switch, slider, scroll-area, command, alert-dialog, card, badge, carousel, separator, progress, label, hover-card, context-menu, collapsible, avatar, alert, field, input-group, button-group, combobox 5. Create App.tsx with BrowserRouter wrapping: ThemeProvider β†’ I18nProvider β†’ AccessCodeGuard β†’ lazy routes (/, /classroom/:id, /generation-preview, /eval/whiteboard) β†’ Toaster 6. Create ThemeProvider (light/dark/system, localStorage persist, document.documentElement.classList toggle) 7. Create I18nProvider with i18next + react-i18next + resources-to-backend, 5 locales (zh-CN, en-US, ja-JP, ru-RU, ar-SA), dynamic JSON imports, localStorage locale persistence 8. Create globals.css with full oklch color system (:root + .dark), CSS custom properties for all shadcn tokens, Tailwind @theme inline block, ProseMirror styles, animation keyframes (wave, shimmer, breathing-bar, interactive-mode-breathe) 9. Create createLogger utility (timestamp + level + tag formatting) 10. Verify: `vite build` succeeds with 0 errors ``` ### Phase 2: Data Layer & State Management ``` Me: Build the data layer and state management for MultiMind Classroom. Do: 1. Create Dexie database 'multimind-db' with tables: stages (id, name, description, createdAt, updatedAt, languageDirective, style, currentSceneId, agentIds, interactiveMode), scenes (id, stageId, type, title, order, content, actions, whiteboard), audioFiles (id, blob, duration, format, text, voice), imageStore (id, blob, mimeType), chatSessions (id, stageId, sceneId, type, status, messages, config), outlines (id, stageId, outlines), mediaFiles (id, blob, mimeType), generatedAgents (id, agents) 2. 
Create stage-storage utilities: listStages() β†’ StageListItem[], deleteStageData(), renameStage(), getFirstSlideByStages() β†’ Record 3. Create image-storage utilities: storePdfBlob(), loadPdfBlob(), storeImages(), loadImageMapping(), cleanupOldImages() 4. Create useStageStore (Zustand): stage, scenes[], currentSceneId, outlines[], chats[], mode, generationStatus, generationEpoch, failedOutlines[], toolbarState. Actions: setStage, addScene, updateScene, deleteScene, setCurrentSceneId, loadFromStorage (IndexedDB β†’ state), saveToStorage (debounced state β†’ IndexedDB), getCurrentScene() 5. Create useCanvasStore: viewportSize, canvasScale, selectedElementIds[], editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive, showGridLines, showRuler, snapToGrid 6. Create useSettingsStore with zustand/persist: providerId, modelId, thinkingConfigs, providersConfig, ttsProviderId/Voice/Speed, asrProviderId/Language, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, chatAreaCollapsed, agentMode, selectedAgentIds, fetchServerProviders(), all setters. Validate provider/model on rehydration. 7. Create useSnapshotStore, useKeyboardStore, useMediaGenerationStore, useWhiteboardHistoryStore, useWidgetIframeStore, useUserProfileStore 8. Create useAgentRegistry with zustand/persist: agents map, 3 default agents (teacher: "AI teacher" with full slide+whiteboard actions priority 10, assistant: "AIεŠ©ζ•™" with whiteboard-only priority 5, student: "ε₯½ε₯‡ε­¦η”Ÿ" with whiteboard-only priority 3). Each agent: id, name, role, persona (detailed teaching style), avatar, color, allowedActions, priority, voiceConfig, isDefault 9. 
Define all TypeScript types in lib/types/: slides.ts (PPTElement union with 10 element types, Slide, SlideTheme, SlideBackground), action.ts (20+ action types), stage.ts (Stage, Scene, SceneType, StageMode), chat.ts (ChatSession, StatelessChatRequest/Event), generation.ts (SceneOutline, UserRequirements, PdfImage), provider.ts (ProviderId, ProviderConfig, ModelInfo, ThinkingConfig), widgets.ts (5 widget configs), settings.ts, roundtable.ts, web-search.ts, pdf.ts, edit.ts, export.ts ``` ### Phase 3: AI Provider System ``` Me: Build the AI provider system with 15+ LLM providers. Do: 1. Create PROVIDERS registry with full configs for: openai (gpt-4o, gpt-5.5, o3-mini, o4-mini), anthropic (claude-4-sonnet, claude-3.7-sonnet), google (gemini-2.5-pro, gemini-2.5-flash), deepseek (deepseek-chat, deepseek-reasoner), qwen (qwen3-235b, qwen-max, qwen-plus), kimi (moonshot-v1-auto), glm (glm-4-plus, glm-z1-air), minimax (MiniMax-M1), siliconflow (meta-llama, Qwen, DeepSeek), doubao (doubao-pro, doubao-1.5-pro), hunyuan, xiaomi (MiMo-7B), grok (grok-3), openrouter (pass-through), ollama (local models) 2. Each provider: type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon path, models[] with contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking config) 3. createLanguageModel(config) factory: routes to @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google based on provider type. OpenAI-compatible providers use createOpenAI with custom baseURL. 4. Thinking config system: ThinkingConfig = {mode: 'disabled'|'auto'|'manual', enabled, budget?}. Per-model thinking capability (toggleable, budgetAdjustable, defaultEnabled). getThinkingMode() and pickThinkingBudget() utilities. 5. callLLM(model, options) β€” single-shot generation with thinking support. streamLLM(model, options) β€” streaming with thinking. 6. Model metadata: applyModelMetadata() enriches model configs with catalog data. 
getCatalogThinkingCapability() returns thinking support level. 7. Server-side resolveModel() β€” resolves model string + API key + base URL into LanguageModel instance, handles server-configured providers from env vars. ``` ### Phase 4: Generation Pipeline ``` Me: Build the two-stage generation pipeline for creating classroom content from PDF. Do: 1. Create prompt template system: lib/prompts/ with loader.ts (loadPrompt, buildPrompt, interpolateVariables, processSnippets, processConditionalBlocks), types.ts (PromptId, SnippetId). Templates in templates/{promptId}/system.md + user.md. Snippets in snippets/*.md. Syntax: {{variable}}, {{snippet:name}}, {{#if flag}}...{{/if}}. 2. Create 20+ prompt templates: requirements-to-outlines, interactive-outlines, slide-content, quiz-content, slide-actions, quiz-actions, interactive-actions, simulation-content, diagram-content, code-content, game-content, visualization3d-content, widget-teacher-actions, pbl-actions, pbl-design, agent-system (4 variants: base, wb-teacher, wb-assistant, wb-student), director, web-search-query-rewrite 3. Stage 1 β€” Outline Generator: generateSceneOutlinesFromRequirements() builds prompt from PDF content + user requirements + agent info + language directive. SSE API endpoint streams outlines as incremental JSON objects. Frontend GenerationPreview page shows step visualization. 4. Stage 2 β€” Scene Generator: generateSceneContent(outline, context, model) dispatches to slide/quiz/interactive/PBL generators based on outline.type. generateSceneActions(content, outline, context, model) generates teacher speech + visual action sequences. createSceneWithActions() assembles final Scene. 5. Scene builder: buildSceneFromOutline() converts generated content to PPTElement[]. uniquifyMediaElementIds() ensures globally unique IDs for media placeholders. 6. Interactive post-processor: sanitizes HTML, injects CSS isolation, wraps in responsive container. 7. 
Action parser: parseActionsFromStructuredOutput() extracts [{type:"action", name, params}, {type:"text", content}] from LLM JSON output. 8. JSON repair: parseJsonResponse() handles malformed LLM JSON with bracket balancing, markdown fence stripping, partial parse recovery. 9. Pipeline runner: createGenerationSession() + runGenerationPipeline() orchestrates the full flow with callbacks. 10. API routes: /api/generate/scene-outlines-stream (SSE), /api/generate/scene-content (POST), /api/generate/scene-actions (POST), /api/generate/agent-profiles (POST) ``` ### Phase 5: Slide Renderer & Canvas ``` Me: Build the full slide renderer with PowerPoint-compatible elements and interactive canvas. Do: 1. Create Editor/Canvas with: viewport scaling (useViewportSize), drag-to-select (useMouseSelection), element selection (useSelectElement), element dragging (useDragElement), element scaling (useScaleElement with 8 resize handles), element rotation (useRotateElement), alignment lines (AlignmentLine), grid lines (GridLines), ruler (Ruler), drop support (useDrop) 2. 
Create 10 element renderers: - TextElement: ProseMirror editor with custom schema (paragraph, heading, bulletList, orderedList, hardBreak, marks: bold, italic, underline, strikethrough, color, backgroundColor, fontSize, fontFamily, textAlign, textIndent, lineHeight, superscript, subscript, link) - ImageElement: clip paths (rect, ellipse, polygon), filters, flip, shadow, outline - ShapeElement: 20+ SVG path formulas (roundRect, triangle, parallelogram, trapezoid, etc), gradient fills (linear/radial), pattern fills - LineElement: cubic bezier curves, arrow markers, point dragging (useDragLineElement) - ChartElement: ECharts integration (bar, line, pie, scatter, radar, area) - TableElement: cell editing, merge, border styling - LatexElement: KaTeX rendering with Temml fallback - VideoElement: HTML5 video with poster, autoplay control - CodeElement: Shiki syntax highlighting with 50+ language grammars - AudioElement: audio player UI 3. Create operate overlays: CommonElementOperate, ImageElementOperate, ShapeElementOperate (keypoint drag for path shapes), LineElementOperate (endpoint drag), TableElementOperate, TextElementOperate, MultiSelectOperate 4. Create ThumbnailSlide for sidebar scene list (scaled-down readonly render) 5. Create ThumbnailInteractive for interactive widget previews 6. Create ScreenElement and ScreenCanvas for presentation mode 7. Create ViewportBackground (slide background with solid/gradient/image fills) 8. Create element hooks: useElementFill, useElementFlip, useElementOutline, useElementShadow 9. Create canvas operations hook: useCanvasOperations with element CRUD, alignment, distribution, z-order, grouping ``` ### Phase 6: Multi-Agent Orchestration ``` Me: Build the LangGraph-based multi-agent orchestration system. Do: 1. 
Create OrchestratorState (LangGraph Annotation.Root): messages, storeState, availableAgentIds, maxTurns, languageModel, thinkingConfig, discussionContext, triggerAgentId, userProfile, agentConfigOverrides, turnSummaries[], whiteboardActions[], nextAgentId, isComplete, generatedChunks 2. Create director node: LLM-based multi-agent decision (who speaks next, what to do). Code fast-paths for turn 0 (trigger agent) and turn limits. Single-agent mode: pure code logic, no LLM call. 3. Create agent_generate node: resolves AgentConfig, builds structured prompt via buildStructuredPrompt(), streams LLM response, parses structured chunks [{type, name/content}], emits StatelessEvent via config.writer() 4. Create StateGraph: START β†’ director β†’ (endβ†’END | nextβ†’agent_generateβ†’director loop) 5. Create buildStructuredPrompt(): combines role guidelines, persona, state context (current slide elements), whiteboard ledger (spatial layout of all whiteboard elements), peer context (other agents' recent actions), available action descriptions, format examples 6. Create summarizers: conversation-summary (compress old messages), message-converter (UIMessage β†’ OpenAI format), state-context (current slide description), whiteboard-ledger (virtual whiteboard spatial state), whiteboard-conflicts (detect conflicting draws), peer-context (recent agent actions) 7. Create director-prompt: buildDirectorPrompt() with agent profiles, conversation history, available tools. parseDirectorDecision() extracts {nextAgentId, reason, isComplete} 8. Create tool-schemas: getEffectiveActions(role) returns allowed action schemas. getActionDescriptions() generates human-readable action docs. 9. Create AISdkLangGraphAdapter: bridges Vercel AI SDK LanguageModel to LangGraph's BaseChatModel interface 10. 
Create statelessGenerate(): entry point called by /api/chat, invokes graph.stream(), yields StatelessEvent SSE chunks ``` ### Phase 7: Playback Engine & Roundtable ``` Me: Build the PlaybackEngine state machine and Roundtable UI. Do: 1. Create PlaybackEngine class: state machine (idle/playing/paused/live), consumes Scene.actions[] via ActionEngine, manages scene transitions, handles discussion triggers, speed control (1x/1.25x/1.5x/2x) 2. Create ActionEngine: processes action queue β€” spotlight (dim other elements, highlight target), laser (red pointer effect), speech (fetch TTS β†’ AudioPlayer β†’ wait for completion), wb_open/close (toggle whiteboard overlay), wb_draw_* (add elements to whiteboard), wb_delete/clear, discussion (pause playback, switch to live mode), play_video, widget actions 3. Create AudioPlayer: Web Audio API wrapper with play/pause/stop, speed adjustment, volume control, onEnd callback 4. Create PlaybackEngine callbacks: onModeChange, onSceneChange, onActionStart/End, onSpeechStart/End, onDiscussionTrigger, onComplete 5. Create computePlaybackView() β€” derives presentation state from engine: currentSpeech, speakingAgentId, audioState, progress 6. Create Stage component (main container): integrates SceneSidebar + CanvasArea + Roundtable + ChatArea. Manages PlaybackEngine lifecycle, discussion flow (trigger β†’ live chat β†’ end β†’ resume), TTS during discussions via useDiscussionTTS 7. Create Roundtable component: voice waveform bars, agent avatar ring with speaking indicator, chat input with send button + voice recording, proactive discussion cards, slide navigation (prev/next), playback controls (play/pause/speed), presentation mode toggle, whiteboard toggle, thinking state display, end flash animation 8. Create PresentationSpeechOverlay: full-screen speech text display during presentation mode 9. Create SceneSidebar: scene thumbnail list with drag-to-reorder, generation progress indicators, failed outline retry, home navigation 10. 
Create Header: back button, settings gear, theme switcher, language switcher, export dropdown (PPTX, classroom ZIP)
```

### Phase 8: Interactive Widgets & PBL

```
Me: Build the interactive widget system and PBL mode.
Do:
1. Create 5 widget content generators (each calls LLM with specialized prompts):
   - simulation-content: generates HTML with variable sliders, canvas/SVG visualization, physics formulas
   - diagram-content: generates React Flow JSON (nodes, edges, layout)
   - code-content: generates executable code with output panel, language selector
   - game-content: generates HTML5 game with scoring, levels, educational goals
   - visualization3d-content: generates Three.js scene with camera controls, annotations
2. Create InteractiveRenderer: sandboxed iframe loading widget HTML, postMessage bridge for teacher actions (highlight, setState, annotation, reveal)
3. Create widget teacher action generation: widget-teacher-actions prompt generates action sequence for teacher to guide students through widget
4. Create useWidgetIframeStore: register/unregister iframes, send setState/highlight/annotation/reveal messages
5. Create PBL generation system:
   - generatePBLContent() using Vercel AI SDK generateText with tools and stepCountIs stopWhen
   - MCP tools: ModeMCP (set PBL mode), ProjectMCP (set project config), AgentMCP (create agent roles), IssueboardMCP (create issues with acceptance criteria)
   - buildPBLSystemPrompt() with project topic, skills, language directive
6. Create PBL renderer components:
   - PBLRenderer: main container with role selection → workspace
   - RoleSelection: choose student role from generated options
   - ChatPanel: per-role chat with @mention routing to agents
   - IssueboardPanel: kanban-style issue tracking
   - Workspace: collaborative workspace area
   - Guide: step-by-step project guide
7.
Create /api/pbl/chat endpoint: handles @mention routing, generates agent responses per role
```

### Phase 9: Media Generation Pipeline

```
Me: Build the media generation pipeline for AI images and videos.
Do:
1. Create MediaGenerationStore: tasks Map, enqueueTasks(), completeTask(), failTask(), restoreFromDB(), revokeObjectUrls()
2. Create media orchestrator: generateMediaForOutlines() collects all media requests from outlines[].mediaGenerations, filters by enabled providers, processes serially (API concurrency limits)
3. Create image provider adapters:
   - Seedream (ByteDance): POST to ark.cn-beijing.volces.com with HMAC auth
   - OpenAI Image: POST to /v1/images/generations
   - Qwen Image: POST to dashscope with async task pattern
   - Nano Banana: POST with banana.dev API
   - MiniMax Image: POST to api.minimax.chat
   - Grok Image: POST to api.x.ai
4. Create video provider adapters (all async task pattern: submit → poll → download):
   - Seedance (ByteDance): HMAC-signed requests, JWT token for kling
   - Kling (Kuaishou): JWT auth, task polling
   - Veo (Google DeepMind): OAuth, long-running operations
   - MiniMax Video, Grok Video
5. Each adapter: generate(config, options) → {url, blob}, testConnectivity(config) → boolean
6. Create /api/generate/image and /api/generate/video endpoints
7. Create /api/proxy-media endpoint for CORS proxy of remote media URLs
8. Create /api/verify-image-provider and /api/verify-video-provider for credential testing
9. Create MediaPopover UI: shows generation progress per media element, retry failed, preview generated media
```

### Phase 10: Audio System (TTS/ASR)

```
Me: Build the TTS and ASR audio system with 8+ providers.
Do:
1. Create TTS provider registry with configs: openai-tts (alloy/echo/fable/onyx/nova/shimmer), azure-tts (500+ voices from azure.json), glm-tts, qwen-tts (sambert voices), minimax-tts (3 models), doubao-tts, elevenlabs-tts, voxcpm (custom voice cloning), browser-tts (Web Speech API)
2.
Each TTS provider: id, name, requiresApiKey, defaultBaseUrl, icon, voices[], supportedFormats, speedRange
3. Create generateTTS(config, text) router: dispatches to provider-specific functions, returns {audio: Uint8Array, format: string}
4. Create ASR provider registry: openai-whisper, qwen-asr, browser-asr
5. Create transcribeAudio(config, audioBlob) router
6. Create voice resolver: getAvailableProvidersWithVoices(), maps agent voiceConfig to provider+voice
7. Create VoxCPM integration: custom voice profiles, VLLM model support, voice cloning
8. Create /api/generate/tts endpoint (single TTS generation)
9. Create /api/transcription endpoint
10. Create /api/azure-voices endpoint (Azure voice list)
11. Create useAudioRecorder hook: MediaRecorder API, audio visualization, silence detection
12. Create useBrowserTTS hook: Web Speech API fallback
13. Create useDiscussionTTS hook: manages TTS lifecycle during live discussions, queues speech, handles interruption
14. Create useTTSPreview hook: preview voice in settings
15. Create SpeechButton component: toggle voice recording with waveform
16. Create TTSConfigPopover: voice selector, speed slider, provider selector
```

### Phase 11: Chat System & Streaming

```
Me: Build the chat system with SSE streaming and session management.
Do:
1. Create StatelessChatRequest type: messages (UIMessage[]), storeState ({stage, scenes, currentSceneId, mode}), config ({agentIds, sessionType, maxTurns, triggerAgentId}), model, apiKey, baseUrl, providerType, userProfile, agentConfigs
2. Create StatelessEvent type: {type: 'text'|'tool_call'|'error'|'done', agentId?, content?, toolName?, args?}
3. Create /api/chat POST endpoint: validates request, resolves model, invokes statelessGenerate(), streams SSE events via ReadableStream + TextEncoder
4.
Create useChatSessions hook (53KB): manages multiple ChatSession instances per scene, sendMessage() → fetch SSE → parse events → update messages, handleInterrupt() → abort controller, auto-create QA session on first user message, create discussion session from proactive card, persist sessions to IndexedDB, restore on load
5. Create StreamBuffer: accumulates text chunks from SSE, reveals words incrementally for natural TTS sync, tracks reveal progress (0-1) for auto-scroll
6. Create ChatArea component: session tab list, message list with agent avatars + colors, inline action tags (spotlight/highlight buttons in messages), lecture notes view (extracted from speech actions), typing indicator
7. Create ChatSession component: handles individual session rendering, message input, send/interrupt buttons
8. Create ProactiveCard component: discussion invitation cards with topic, prompt, accept/skip buttons, animation
9. Create InlineActionTag component: clickable action buttons within messages (triggers spotlight/insert on slide)
10. Create LectureNotesView: extracts and displays all speech text from actions as structured notes
```

### Phase 12: Settings, i18n, Export, Polish

```
Me: Build settings, internationalization, export system, and polish everything.
Do:
1. Create SettingsDialog with tabbed sections: General (theme, language, access code), Model (provider selector, model selector with search, API key input, base URL, thinking config toggle), Audio (TTS provider/voice/speed, ASR provider, per-agent voice assignment), Image (provider, model, API key, aspect ratio), Video (provider, model, API key), PDF (provider, API key), Web Search (provider, API key), Agent (agent list with add/edit/delete, persona editor, action permissions, priority)
2. Create ModelSelector: searchable dropdown with provider grouping, model capabilities badges (vision, tools, thinking), context window display
3.
Create AddProviderDialog: custom provider registration with name, base URL, API key, models
4. Create ProviderConfigPanel: provider-specific settings form
5. Create i18n translation files for all 5 locales (1500+ translation keys each): home, classroom, settings, generation, chat, quiz, whiteboard, export, agents, audio, errors, common
6. Create LanguageSwitcher component: dropdown with locale labels + short codes
7. Create PowerPoint export: useExportPPTX hook converts scenes to PPTX via pptxgenjs. Handles: text with rich formatting, images with clip paths, shapes with SVG paths, charts (ECharts → static image), tables, LaTeX (KaTeX → MathML → OMML), videos (poster image), code blocks (syntax-highlighted HTML)
8. Create classroom ZIP export/import: useExportClassroom hook creates ZIP with manifest.json + scenes + audio + images + agents. useImportClassroom hook parses ZIP and restores to IndexedDB.
9. Create HTML parser for PPTX: lexer → parser → format → stringify pipeline for converting ProseMirror HTML to PPTX rich text runs
10. Create LaTeX to OMML converter chain: KaTeX → MathML → OMML (via mathml2omml package) for PowerPoint math equations
11. Create AccessCodeGuard + AccessCodeModal: HMAC-signed token verification, cookie persistence
12. Create ServerProvidersInit: fetches /api/server-providers on mount, merges into settings store
13. Create UserProfile component: expandable pill with avatar picker (12 built-in + custom upload), nickname editor, bio textarea
14. Create GeneratingProgress component: step-by-step progress during classroom generation
15. Create OutlinesEditor: edit generated outlines before scene generation (reorder, delete, rename)
16. Add all CSS animations, transitions, hover states, dark mode variants
17.
Verify: full build succeeds, all routes render, settings persist, i18n switches correctly
```

---

## 🔧 Key Technical Decisions

| Decision | Rationale |
|----------|-----------|
| **Zustand over Redux** | Simpler API, better TypeScript support, no boilerplate, built-in persist middleware |
| **Dexie over raw IndexedDB** | Type-safe queries, promise-based API, versioned migrations, compound indexes |
| **Vercel AI SDK** | Unified streaming interface across 15+ providers, built-in tool calling, thinking support |
| **LangGraph for orchestration** | Stateful graph execution, conditional routing, streaming writer API, battle-tested |
| **ProseMirror over Slate/TipTap** | Lower-level control needed for PowerPoint-compatible rich text, custom schema/marks |
| **Vite over Next.js (conversion)** | Client-only SPA, no SSR needed, faster builds, simpler deployment |
| **File-based prompts** | Version-controllable, composable via snippets, conditional blocks for feature flags |
| **ActionEngine pattern** | Unified sync/async action execution, same types for live streaming and playback |
| **IndexedDB for everything** | Offline-first, large blob storage (audio/images), no server dependency for user data |
| **iframe sandbox for widgets** | Security isolation for LLM-generated HTML/JS, postMessage for controlled communication |

---

## 🖼 Frontend - Full UI Build Prompt

### F1: Page Layouts & Routing

**4 pages, all lazy-loaded via `React.lazy()` + `<Suspense>`:**

| Route | Component | Layout |
|-------|-----------|--------|
| `/` | `HomePage` (890 lines) | Full-screen gradient bg (`from-slate-50 to-slate-100`). Top-right floating toolbar pill (glass morphism: bg-white/60 backdrop-blur-md border rounded-full) with 3-way theme toggle (Sun/Moon/Monitor icons cycling), LanguageSwitcher dropdown, Settings gear button. Center: animated logo + tagline.
Main card: requirement textarea (auto-resize) + PDF upload button + InteractiveMode toggle (Atom icon with breathing glow) + GenerationToolbar below. Below main card: collapsible "Recent Classrooms" grid (responsive 1→2→3→4 cols) with ThumbnailSlide previews, date badges, rename/delete with inline confirmation overlay. User profile expandable pill (top-left). Classroom import via hidden file input. Full search with real-time filtering. |
| `/generation-preview` | `GenerationPreviewPage` (900 lines) | Centered card with vertical step progress. Steps: PDF Analysis (scanning laser animation), Web Search (globe + card stack), Outline Generation (streaming outline cards with type icons), Agent Generation (reveals agents with `AgentRevealModal`), Scene Content (slide assembly animation), Actions (action sequence visualization). Error state with retry. Auto-navigates to `/classroom/:id` on completion. |
| `/classroom/:id` | `ClassroomPage` (180 lines) | Loads classroom from IndexedDB (or server fallback), restores agents, auto-resumes pending generation. Full-screen flex layout: `SceneSidebar` (left, collapsible) + center column (`Header` + `CanvasArea` with `Whiteboard` overlay) + `ChatArea` (right, resizable width, collapsible) + `Roundtable` (bottom overlay). |
| `/eval/whiteboard` | `WhiteboardEvalPage` (60 lines) | Debug tool. Bootstraps synthetic stage/scene, renders `ScreenElement` for whiteboard layout evaluation.
|

### F2: Component Hierarchy (203 files)

**Top-10 largest components by complexity:**

| Component | Lines | Responsibility |
|-----------|-------|---------------|
| `Roundtable` | 2094 | Main classroom interaction: 12-bar voice waveform, agent avatar ring with speaking indicator, chat input with Send/Mic toggle, ProactiveCard for discussion invitations, slide nav, playback controls (Play/Pause + speed 1×→2×), presentation mode, whiteboard toggle, volume slider with mute, thinking state, end-of-discussion flash, PresentationSpeechOverlay |
| `useChatSessions` | 1525 | Hook managing all chat state: session CRUD, SSE streaming (fetch → ReadableStream → TextDecoder → JSON.parse per line), abort controller for interruption, tool call execution, IndexedDB persistence |
| `Stage` | 1271 | Master orchestrator: creates PlaybackEngine, wires all callbacks, manages ActionEngine + AudioPlayer lifecycles, computes playbackView, handles discussion flow, connects useDiscussionTTS |
| `PromptInput` | 1267 | Rich text input with @mention support, file attachments, voice recording integration, model selector inline |
| `TTSSettings` | 1264 | Full TTS configuration: provider list with enable/disable toggles, voice preview with play button, speed slider, custom model CRUD, VoxCPM configuration |
| `SettingsDialog` | 1143 | Tabbed dialog (10 tabs): providers, image, video, tts, asr, pdf, web-search, general, agents. Left sidebar navigation with icons. Provider list column pattern.
|
| `AgentBar` | 997 | Agent selection/configuration for generation: horizontal scrollable agent cards with checkbox, voice config popover per agent, shuffle random selection |
| `QuizView` | 985 | Full quiz interface: 5 question types, 4 phases (not_started → answering → grading → reviewing), score pie chart, per-question feedback, retry, draft persistence |
| `GenerationToolbar` | 893 | Toolbar: model selector, PDF upload, PDF provider selector, web search toggle, thinking config, media settings popover |
| `AudioSettings` | 799 | TTS + ASR combined settings: provider cards with logo, API key inputs, voice selector with preview |

**Full directory structure with responsibilities (all under `components/`):**

```
access-code-guard.tsx - Fetches /api/access-code/status, shows modal if auth needed
access-code-modal.tsx - Animated overlay: shield icon, input, submit, success checkmark
agent/
  agent-avatar.tsx - Avatar: URL→AvatarImage or emoji→AvatarFallback, 3 sizes (sm/md/lg), color ring
  agent-bar.tsx - Agent selection bar with per-agent voice config popover
  agent-config-panel.tsx - Edit agent: name, role, persona textarea, color picker, priority slider
  agent-reveal-modal.tsx - Staggered card flip animation revealing generated agents with role icons and sparkle particles
ai-elements/ - 19 Vercel AI SDK UI components (message, prompt-input, code-block, reasoning, sources, etc.)
audio/
  speech-button.tsx - Mic toggle with waveform bars animation, long-press to record
  tts-config-popover.tsx - Voice + speed selector popover
canvas/
  canvas-area.tsx - Main slide display: SceneRenderer + Whiteboard overlay + play hint + CanvasToolbar
  canvas-toolbar.tsx - Bottom toolbar: sidebar toggle, slide nav, play/pause, volume, speed, whiteboard, presentation, chat toggle
chat/
  chat-area.tsx - Right panel: session tabs, message list, lecture notes toggle
  chat-session.tsx - Individual session: messages with agent avatars, input, send/stop buttons
  inline-action-tag.tsx - Clickable action buttons in messages (spotlight/highlight)
  lecture-notes-view.tsx - Extracted speech texts as structured notes
  proactive-card.tsx - Discussion invitation: topic text, accept/skip, animated border gradient
  session-list.tsx - Horizontal tab bar for QA/discussion/lecture sessions
  use-chat-sessions.ts - Master hook for all chat state + SSE streaming
generation/
  generating-progress.tsx - Step progress with completion checkmarks
  generation-toolbar.tsx - Model + PDF + search + thinking + media toolbar
  media-popover.tsx - Image/video provider/model/API key configuration popover
  outlines-editor.tsx - Edit outlines: drag reorder, delete, rename, type badges
header.tsx - Top bar: back arrow, title, settings, theme switcher, language, export menu
language-switcher.tsx - Dropdown: 5 locales with native labels + short codes
roundtable/ - index.tsx (2094), presentation-speech-overlay.tsx (498), audio-indicator.tsx, constants.ts
scene-renderers/ - classroom-complete.tsx, interactive-renderer.tsx, pbl-renderer.tsx, pbl/ (6 files), quiz-renderer.tsx, quiz-view.tsx
server-providers-init.tsx - Side-effect: fetches server providers on mount
settings/ - 17 files total (see F8 Phase 6 for details)
slide-renderer/
  Editor/Canvas/ - index.tsx (415), 5 canvas sub-components, Operate/ (7 files), hooks/ (11 hooks)
  Editor/ - HighlightOverlay, LaserOverlay,
ScreenCanvas, ScreenElement, SpotlightOverlay, ZoomWrapper
  components/element/ - 10 element types (Text, Image, Shape, Line, Chart, Table, Latex, Video, Code) + hooks/
  components/ThumbnailSlide/, ThumbnailInteractive/
stage.tsx (1271) - Master classroom orchestrator
stage/scene-renderer.tsx - Routes scene.type → Canvas/QuizRenderer/InteractiveRenderer/PBLRenderer
stage/scene-sidebar.tsx - Left sidebar: home, thumbnail list, generation progress, failed retry
ui/ - 32 shadcn/ui primitives (see F3)
user-profile.tsx - Expandable pill: avatar picker, name editor, bio textarea
whiteboard/ - index.tsx (container), whiteboard-canvas.tsx (445), whiteboard-history.tsx
```

### F3: shadcn/ui Component Library (32 primitives)

All in `components/ui/`, built on Radix primitives + CVA: `alert-dialog` (184), `alert` (73), `avatar` (96), `avatar-display` (29), `badge` (45), `button` (67, variants: default/destructive/outline/secondary/ghost/link, sizes: default/sm/lg/icon), `button-group` (78), `card` (92), `carousel` (231, embla-carousel-react), `checkbox` (28), `collapsible` (21), `combobox` (275, cmdk + popover), `command` (180, cmdk), `context-menu` (239), `dialog` (142), `dropdown-menu` (242), `field` (224, @base-ui/react), `hover-card` (38), `input` (19), `input-group` (144), `label` (21), `popover` (31), `progress` (31), `scroll-area` (55), `select` (184), `separator` (28), `slider` (25), `sonner` (45, uses custom useTheme), `switch` (29), `tabs` (80), `textarea` (18), `tooltip` (57)

### F4: Hooks & Contexts (15 hooks, 2 contexts)

| Hook | Lines | Purpose |
|------|-------|---------|
| `useCanvasOperations` | 587 | Element CRUD, alignment, distribution, z-order, group/ungroup, clipboard, delete, select all |
| `useSceneGenerator` | 576 | Orchestrates scene generation: generateRemaining(), retrySingleOutline(), stop() |
| `useDiscussionTTS` | 343 | TTS during live discussions: queue speech chunks, play sequentially, handle interruption |
| `useAudioRecorder` |
325 | MediaRecorder API: start/stop, audio visualization, silence detection, output Blob |
| `useOrderElement` | 191 | Z-order operations: bring to front, send to back, move forward/backward |
| `useBrowserASR` | 155 | Web Speech API recognition: start/stop, interim/final results, language |
| `useBrowserTTS` | 150 | Web Speech API synthesis: speak, cancel, voice selection, speed/pitch |
| `useStreamingText` | 124 | Word-by-word text reveal from StreamBuffer, progress tracking (0→1) |
| `useDraftCache` | 95 | Generic localStorage cache for form drafts with TTL |
| `useTheme` | 71 | Theme context: light/dark/system, resolvedTheme, media query listener, localStorage |
| `useI18n` | 66 | I18n context: locale, setLocale, t(), browser detection, localStorage |
| `useSlideBackgroundStyle` | 54 | Computes CSS background from SlideBackground type |
| `useHistorySnapshot` | 41 | Wraps snapshot store: push, undo, redo, canUndo/canRedo |
| `useExportPPTX` | ~1000 | PowerPoint export: PPTElement[] → pptxgenjs calls, HTML→rich text, SVG→polygon, LaTeX→OMML |
| `useExportClassroom` | ~200 | ZIP export: manifest.json + scenes + audio + images + agents |

**Contexts:** `SceneContext` (211 lines - provides current scene data via `SceneProvider`, `useSceneData()`, `useSceneSelector()`), `MediaStageContext` (18 lines - provides stageId for IndexedDB keys)

### F5: CSS System, Animations & Theming

**globals.css (218 lines):** `@import 'tailwindcss'` + `'tw-animate-css'` + `'shadcn/tailwind.css'`. `@custom-variant dark`. `@theme inline` with 30+ oklch color tokens. `:root` light theme (--primary: #722ed1 purple). `.dark` theme (--primary: #8b47ea). `--radius: 0.625rem` base.
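
The light/dark/system behaviour that `useTheme` and the `.dark` class rely on can be sketched as a few pure functions. This is a minimal illustration, not the project's hook: `ThemeMode`, `resolveTheme`, `nextThemeMode`, and `rootClassNames` are hypothetical names, and the light → dark → system cycle order is an assumption based on the Sun/Moon/Monitor toggle described in F1.

```typescript
// Hypothetical names throughout; the real useTheme source is not shown in this document.
type ThemeMode = "light" | "dark" | "system";
type ResolvedTheme = "light" | "dark";

// Resolve the stored preference against the OS-level preference
// (in a browser, systemPrefersDark would come from a prefers-color-scheme media query).
function resolveTheme(mode: ThemeMode, systemPrefersDark: boolean): ResolvedTheme {
  if (mode === "system") return systemPrefersDark ? "dark" : "light";
  return mode;
}

// Assumed cycle order for the 3-way toggle: Sun (light) -> Moon (dark) -> Monitor (system).
const THEME_CYCLE: readonly ThemeMode[] = ["light", "dark", "system"];

function nextThemeMode(current: ThemeMode): ThemeMode {
  const index = THEME_CYCLE.indexOf(current);
  return THEME_CYCLE[(index + 1) % THEME_CYCLE.length];
}

// Compute the root element's class list: `dark` is present only for the dark theme,
// which is what the `.dark` block in globals.css keys off.
function rootClassNames(existing: readonly string[], theme: ResolvedTheme): string[] {
  const withoutDark = existing.filter((name) => name !== "dark");
  return theme === "dark" ? [...withoutDark, "dark"] : withoutDark;
}
```

Keeping the resolution logic free of DOM access makes it easy to unit-test; a hook would only need to feed in the media query result and write the resulting class list to the document root.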

**6 Keyframe Animations:** `wave` (audio bars), `breathing-bar-1/2/3` (speech indicators), `shimmer` (skeleton loading), `interactive-mode-breathe` (button glow)

**Motion Patterns:** `` for enter/exit, `` for conditional, spring physics (`damping:20, stiffness:300`), `layout` for reflows, `staggerChildren:0.1`, gesture (`whileHover scale:1.02`, `whileTap scale:0.97`)

### F6: Configuration Objects (13 config files)

`shapes.ts` (1031 lines, 20+ SVG path formulas), `symbol.ts` (~700, unicode categories), `animation.ts` (~200, enter/exit animation defs), `theme.ts` (~100, 10+ preset themes), `hotkey.ts` (~130, keyboard shortcuts), `image-clip.ts` (~170, clip path presets), `latex.ts` (~200, symbol palette), `chart.ts` (~70, chart type presets), `font.ts` (~40, font families), `lines.ts` (~40, line styles), `element.ts` (~10, default dimensions), `mime.ts` (~15, MIME mapping), `storage.ts` (~3, localStorage keys)

### F7: One-Shot Frontend Mega Prompt

```
Me: Build the complete frontend for MultiMind Classroom - every component, page, hook, and animation.
Do: Create 203 React components, 15 hooks, 2 contexts, 32 shadcn/ui primitives, and 13 config files:

PAGES: HomePage (890 lines, gradient bg, floating toolbar pill with theme/language/settings, centered logo animation, main card with auto-resize Textarea + InteractiveMode Atom toggle + GenerationToolbar + gradient Generate button, collapsible Recent Classrooms responsive grid with ThumbnailSlide previews + rename/delete + search, UserProfileCard expandable pill). GenerationPreviewPage (900 lines, vertical step list with 6 StepVisualizer animations: PdfScan laser, WebSearch globe+cards, StreamingOutlines stagger, AgentReveal flip-in cards, Content assembly, Actions sequence, AgentRevealModal with staggered rotateY flip + role icons + color borders). ClassroomPage (180 lines, loads from IndexedDB, renders Stage). WhiteboardEvalPage (60 lines, debug tool).

STAGE (1271 lines): PlaybackEngine lifecycle, ActionEngine+AudioPlayer wiring, discussion flow state machine, useDiscussionTTS integration, fullscreen container ref, AlertDialog confirmations. Layout: SceneSidebar (left) + Header+CanvasArea+Whiteboard (center) + ChatArea (right, resizable) + Roundtable (bottom overlay).

ROUNDTABLE (2094 lines): 12 motion.div voice waveform bars (peaks 14-27px, durations 0.53-0.78s), agent avatar ring with color+speaking pulse, Textarea chat input + Send ArrowUp + Mic toggle, ProactiveCard gradient border, slide ChevronLeft/Right nav, Play/Pause toggle, speed dropdown 1×/1.25×/1.5×/2×, Repeat restart, Volume slider+mute, PencilLine whiteboard toggle, Maximize2 presentation mode, Loader2 thinking state, PresentationSpeechOverlay word-by-word reveal.

CANVAS: Editor/Canvas (415 lines) with 11 hooks (viewport, select, drag, scale with 8 handles, rotate, mouse selection, line drag, keypoint move, create, drop, common). 10 element renderers (Text/ProseMirror with 10 marks, Image with clip+filters, Shape with SVG paths+gradients, Line with bezier+markers, Chart/ECharts, Table, Latex/KaTeX, Video, Code/Shiki). 7 Operate overlays. ThumbnailSlide. ScreenElement/ScreenCanvas. Spotlight/Highlight/Laser overlays.

WHITEBOARD: Container with slide-up animation, toolbar (Eraser+History+Close), WhiteboardCanvas (pan/zoom, AnimatedElement staggered entrance scale 0→1 delay index*0.06s, cascade exit reverse-order rotate+scale→0), WhiteboardHistory snapshot timeline.

CHAT: useChatSessions (1525 lines, SSE streaming, abort, IndexedDB persistence), ChatArea (session tabs + message list + lecture notes), ChatSession (agent avatars+colors, markdown, inline action tags), ProactiveCard (animated gradient border), LectureNotesView.

QUIZ: QuizView (985 lines, 5 question types: single-choice radio, multiple-choice checkbox, fill-in-blank input, true-false toggle, short-answer textarea+voice. 4 phases: not_started→answering→grading→reviewing.
Score pie chart, feedback accordion, draft persistence).

SETTINGS: SettingsDialog (1143 lines, 10 tabs with left nav icons, ProviderListColumn pattern), 17 sub-files. ModelSelector (Combobox with search, provider grouping, capability badges Eye/Wrench/Brain). AgentBar (997 lines, scrollable cards, voice config hierarchy popover). TTSSettings (1264 lines), AudioSettings (799 lines), ASRSettings (559 lines).

SCENE RENDERERS: ClassroomComplete (confetti 7 colors, scene type breakdown, score summary), InteractiveRenderer (iframe sandbox + postMessage), PBLRenderer + 6 sub-components (RoleSelection, ChatPanel, IssueboardPanel, Workspace, Guide).

UI PRIMITIVES: 32 shadcn/ui components on Radix + CVA. Button variants (default/destructive/outline/secondary/ghost/link, sizes default/sm/lg/icon). Full dark mode. cn() utility everywhere.

All components use: cn() for conditional classes, useI18n() for all text, motion/react for animations, lucide-react for icons, readonly props, sonner for toasts, controlled state (useState).
```

### F8: Multi-Shot Frontend Build (6 phases)

#### F-Phase 1: UI Primitives & Layout Shell

```
Me: Build the UI foundation - shadcn/ui components, layout shell, and page routing.
Do:
1. Create all 32 shadcn/ui components in components/ui/ with Radix primitives, CVA variants, cn() utility
2. Create globals.css: @import tailwindcss + tw-animate-css + shadcn/tailwind.css. @custom-variant dark. @theme inline with 30+ oklch tokens. :root light (--primary:#722ed1). .dark (--primary:#8b47ea). 6 keyframe animations. scrollbar-hide utility. ProseMirror styles.
3. Create App.tsx: BrowserRouter → ThemeProvider → I18nProvider → ServerProvidersInit → AccessCodeGuard → Suspense → Routes → Toaster
4. Create page shells for all 4 routes
5. Create Header: back arrow, settings gear, theme switcher (Sun/Moon/Monitor cycle), LanguageSwitcher, export dropdown
6.
Create AccessCodeGuard + AccessCodeModal (animated overlay, shield icon, input, success animation)
7. Create UserProfileCard: collapsible pill, avatar grid (12 SVGs + upload), nickname edit, bio textarea, Motion expand/collapse
8. Create LanguageSwitcher: dropdown with 5 locales, click-outside close
9. Create cn() utility, createLogger(), all 13 config files (shapes.ts 1031 lines, animation.ts, theme.ts, hotkey.ts, image-clip.ts, latex.ts, chart.ts, font.ts, lines.ts, element.ts, mime.ts, storage.ts, symbol.ts)
```

#### F-Phase 2: HomePage & Generation Flow

```
Me: Build the HomePage and GenerationPreviewPage with all interactions.
Do:
1. HomePage (890 lines): gradient bg, fixed toolbar pill, centered logo animation, main card with Textarea + InteractiveMode toggle (Atom breathing) + GenerationToolbar + gradient Generate button
2. GenerationToolbar: inline model selector, PDF upload (Paperclip + badge), PDF provider Select, web search Globe toggle, thinking Brain popover, MediaPopover
3. Recent Classrooms: collapsible chevron, responsive grid, ClassroomCard (ThumbnailSlide, metadata badge, name tooltip+copy, rename, delete overlay confirmation)
4. Search: InputGroup with Search icon, real-time filter, AnimatePresence
5. GenerationPreviewPage (900 lines): vertical step list, 6 StepVisualizers (PdfScan laser, WebSearch globe+cards, StreamingOutlines stagger, AgentGeneration, Content, Actions)
6. AgentRevealModal: full-screen overlay, staggered flip-in (rotateY), role icons (👨‍🏫/📚/🎓), color borders, auto-continue
7. OutlinesEditor: drag-reorder, delete/rename, type badges
8. Wire flow: HomePage form → sessionStorage → GenerationPreviewPage SSE → IndexedDB → navigate /classroom/:id
```

#### F-Phase 3: Slide Renderer & Canvas System

```
Me: Build the full slide renderer with all 10 element types and interactive canvas.
Do:
1. Editor/Canvas (415 lines): viewport scaling, element rendering loop, mouse events
2.
11 canvas hooks: useViewportSize, useSelectElement, useDragElement, useScaleElement (8 resize handles), useRotateElement, useMouseSelection, useDragLineElement, useMoveShapeKeypoint, useInsertFromCreateSelection, useDrop, useCommonOperate
3. 10 element renderers: TextElement (ProseMirror with paragraph/heading/bulletList/orderedList + 10 marks: bold/italic/underline/strikethrough/forecolor/backcolor/fontsize/fontname/textAlign/lineHeight/subscript/superscript/link), ImageElement (clip-path + CSS filters + flip), ShapeElement (SVG path formulas + gradient/pattern fills), LineElement (cubic bezier + arrow markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX + Temml), VideoElement, CodeElement (Shiki 50+ grammars)
4. 7 Operate overlays per element type
5. ThumbnailSlide, ScreenElement/ScreenCanvas, ViewportBackground
6. HighlightOverlay, LaserOverlay, SpotlightOverlay
```

#### F-Phase 4: Classroom Layout

```
Me: Build the classroom layout - Stage orchestrator, sidebar, canvas area, chat.
Do:
1. Stage (1271 lines): PlaybackEngine lifecycle, discussion flow, useDiscussionTTS, sidebar/chat/whiteboard state
2. SceneSidebar (559 lines): home button, ThumbnailSlide list, active highlight, generation progress, failed retry, collapse animation
3. CanvasArea (274 lines): SceneRenderer routing, Whiteboard overlay, play hint, CanvasToolbar
4. CanvasToolbar (440 lines): sidebar toggle, slide nav, play/pause, speed dropdown, volume slider+mute, whiteboard toggle, chat toggle, presentation mode, stop discussion
5. ChatArea (340 lines): resizable right panel (drag handle min 280px max 500px), session tabs, message list, lecture notes
6. ChatSession (367 lines): agent avatar+color bubbles, markdown, inline action tags, input+send/stop
7. SessionList, ProactiveCard (gradient border animation), InlineActionTag, LectureNotesView
```

#### F-Phase 5: Roundtable, Whiteboard & Interactive

```
Me: Build Roundtable, Whiteboard, and interactive renderers.
Do:
1.
Roundtable (2094 lines): 12-bar waveform, agent avatars with speaking pulse, chat input+Send+Mic, ProactiveCard, slide nav, playback controls, volume, speed, whiteboard toggle, presentation mode, thinking state
2. PresentationSpeechOverlay (498 lines): fullscreen speech word-by-word reveal, agent avatar, breathing bars
3. Whiteboard container: AnimatePresence slide-up, toolbar (Eraser/History/Close), element count badge
4. WhiteboardCanvas (445 lines): pan+zoom, AnimatedElement staggered entrance (scale 0→1, delay index*0.06s), cascade exit
5. WhiteboardHistory: snapshot timeline with thumbnails and restore
6. InteractiveRenderer: iframe sandbox + postMessage, QuizRenderer → QuizView (985 lines, 5 question types, 4 phases)
7. ClassroomComplete: confetti (7 colors), scene type breakdown, score summary, encouragement
8. PBL components: PBLRenderer, RoleSelection, ChatPanel, IssueboardPanel, Workspace, Guide
```

#### F-Phase 6: Settings, Agents & Export

```
Me: Build Settings dialog, Agent system UI, and Export UI.
Do:
1. SettingsDialog (1143 lines): two-column (left nav 10 tabs + right content), ProviderListColumn pattern
2. ProviderConfigPanel (438 lines): API key, base URL, model list, test connection
3. ModelSelector (423 lines): Combobox search, provider grouping, capability badges (Eye/Wrench/Brain)
4. ModelEditDialog, AddProviderDialog, AddAudioProviderDialog
5. All settings sub-pages: GeneralSettings, ImageSettings, VideoSettings, TTSSettings (1264 lines), ASRSettings (559), AudioSettings (799), PDFSettings (303), WebSearchSettings
6. AgentBar (997 lines): scrollable cards, voice config popover with provider→model→voice hierarchy + search
7. AgentAvatar, AgentConfigPanel (persona editor, color picker, priority slider, action checkboxes)
8. Export UI in Header: PPTX (useExportPPTX), ZIP (useExportClassroom), Import (file input), loading toasts
9. MediaPopover (460 lines): image/video provider+model+key, enable toggles, aspect ratio
10.
GeneratingProgress: step indicators with elapsed time
```
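
As an illustration of the ModelSelector behaviour listed in F-Phase 6 (search, provider grouping, capability badges), the data side could look roughly like the sketch below. The `ModelEntry` shape and the function names `filterModels`, `groupByProvider`, and `capabilityBadges` are hypothetical; the document does not show the project's actual model registry types.

```typescript
// Hypothetical shape; the real model registry types are not shown in this document.
interface ModelEntry {
  id: string;
  name: string;
  providerId: string;
  capabilities: { vision?: boolean; tools?: boolean; thinking?: boolean };
}

// Case-insensitive match against model name or id; an empty query matches everything.
function filterModels(models: readonly ModelEntry[], query: string): ModelEntry[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [...models];
  return models.filter(
    (m) => m.name.toLowerCase().includes(needle) || m.id.toLowerCase().includes(needle),
  );
}

// Group models by provider, preserving first-seen provider order (Map keeps insertion order).
function groupByProvider(models: readonly ModelEntry[]): Map<string, ModelEntry[]> {
  const groups = new Map<string, ModelEntry[]>();
  for (const model of models) {
    const bucket = groups.get(model.providerId);
    if (bucket) bucket.push(model);
    else groups.set(model.providerId, [model]);
  }
  return groups;
}

// Badge labels in a fixed order; the UI description maps these to Eye / Wrench / Brain icons.
function capabilityBadges(model: ModelEntry): string[] {
  const badges: string[] = [];
  if (model.capabilities.vision) badges.push("vision");
  if (model.capabilities.tools) badges.push("tools");
  if (model.capabilities.thinking) badges.push("thinking");
  return badges;
}
```

A Combobox would typically call `filterModels` on each keystroke and render one group per `groupByProvider` entry, which keeps the component itself free of list-processing logic.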