muthuk1 committed 8dd8a8d (verified, parent a0ebf39): Add comprehensive MeDo-styled prompt document for recreating the full project

Files changed (1): PROMPT.md (added, +627)
# 🧠 PROMPT.md — Build OpenMAIC from Scratch

> **MeDo-Styled Prompts** to recreate the full OpenMAIC AI Interactive Classroom application.
> One-shot (single mega-prompt) and multi-shot (phased build) variants included.

---

## 📋 Table of Contents

1. [Project Overview](#-project-overview)
2. [Architecture Blueprint](#-architecture-blueprint)
3. [One-Shot Mega Prompt](#-one-shot-mega-prompt)
4. [Multi-Shot Phased Prompts](#-multi-shot-phased-prompts)
   - Phase 1: Foundation & Scaffold
   - Phase 2: Data Layer & State Management
   - Phase 3: AI Provider System
   - Phase 4: Generation Pipeline
   - Phase 5: Slide Renderer & Canvas
   - Phase 6: Multi-Agent Orchestration
   - Phase 7: Playback Engine & Roundtable
   - Phase 8: Interactive Widgets & PBL
   - Phase 9: Media Generation Pipeline
   - Phase 10: Audio System (TTS/ASR)
   - Phase 11: Chat System & Streaming
   - Phase 12: Settings, i18n, Export, Polish
5. [Key Technical Decisions](#-key-technical-decisions)

---

## 🎯 Project Overview

**OpenMAIC** is an open-source AI interactive classroom platform. Users upload a PDF, and the system generates an immersive multi-agent learning experience with:

- **AI-generated slide presentations** from PDF content
- **Multi-agent roundtable discussions** (teacher, assistant, student agents)
- **Real-time TTS/ASR** for voice-driven lectures
- **Interactive whiteboard** with collaborative drawing
- **Quiz generation** with auto-grading
- **Interactive widgets** (simulations, diagrams, code editors, 3D visualizations, games)
- **Project-Based Learning (PBL)** mode with MCP tool-calling agents
- **Media generation** (AI images and videos embedded in slides)
- **PowerPoint export** and classroom zip import/export
- **Five-language i18n** (zh-CN, en-US, ja-JP, ru-RU, ar-SA)

### Tech Stack

| Layer | Technology |
|-------|-----------|
| Framework | React 19 + Vite 6 |
| Routing | React Router DOM v7 |
| State | Zustand 5 (10 stores, including stage, canvas, settings, snapshot, keyboard) |
| Storage | Dexie (IndexedDB) — stages, scenes, audio, images, chat, outlines |
| UI | shadcn/ui (Radix primitives) + Tailwind CSS v4 + Motion (Framer) |
| AI SDK | Vercel AI SDK 6 + LangGraph (multi-agent director graph) |
| AI Providers | OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, GLM, MiniMax, Ollama, OpenRouter, +10 more |
| TTS/ASR | OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, Browser native, VoxCPM |
| Image Gen | Seedream, OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok |
| Video Gen | Seedance, Kling, Veo, Sora, MiniMax, Grok |
| Rich Text | ProseMirror (custom schema with marks for bold/italic/underline/color/align/indent/lists) |
| Charts | ECharts 6 |
| Diagrams | @xyflow/react (React Flow) |
| Export | pptxgenjs (PowerPoint), JSZip (classroom archives) |
| Math | KaTeX + Temml + mathml2omml (for PPTX export) |
| Code Highlighting | Shiki |
| PDF Parsing | unpdf + MinerU cloud + custom providers |

---

## 🏗 Architecture Blueprint

### Data Flow

```
PDF Upload → Outline Generation (SSE stream) → Scene Content Generation → Action Generation
        ↓
Media Generation (parallel)
        ↓
IndexedDB Storage (Dexie)
        ↓
Playback Engine (state machine)
        ↓
Roundtable UI ←→ Chat System ←→ Multi-Agent LangGraph
```

### Store Architecture (Zustand)

| Store | Purpose | Persistence |
|-------|---------|-------------|
| `useStageStore` | Current stage, scenes, outlines, generation status | IndexedDB |
| `useCanvasStore` | Viewport, zoom, selected elements, editing state | Memory |
| `useSettingsStore` | All provider configs, model selection, UI prefs | localStorage |
| `useSnapshotStore` | Undo/redo history | Memory |
| `useKeyboardStore` | Keyboard shortcut state | Memory |
| `useMediaGenerationStore` | Image/video generation tasks and status | IndexedDB |
| `useWhiteboardHistoryStore` | Whiteboard undo/redo per scene | Memory |
| `useWidgetIframeStore` | Widget iframe communication state | Memory |
| `useUserProfileStore` | User nickname, avatar, bio | localStorage |
| `useAgentRegistry` | Agent configs (default + custom + generated) | localStorage |

### Database Schema (Dexie/IndexedDB)

```
stages: id, name, description, createdAt, updatedAt, languageDirective, style, agentIds
scenes: id, stageId, type, title, order, content (JSON), actions (JSON), whiteboard (JSON)
audioFiles: id (audioId), blob, duration, format, text, voice
imageStore: id (storageId), blob, mimeType
chatSessions: id, stageId, sceneId, type, status, messages, config
outlines: id, stageId, outlines (JSON array)
mediaFiles: id (stageId_elementId), blob, mimeType
generatedAgents: id (stageId), agents (AgentConfig[])
```

### Prompt Template System

File-based with composition:
- `lib/prompts/templates/{promptId}/system.md` + `user.md`
- `lib/prompts/snippets/{name}.md`
- Syntax: `{{variable}}`, `{{snippet:name}}`, `{{#if flag}}...{{/if}}`
- 20+ prompt templates for outlines, slides, quizzes, actions, widgets, PBL, agents
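The template syntax above can be sketched in a few lines of TypeScript. This is an illustrative reduction, not the project's actual loader: the real system also resolves `{{snippet:name}}` from snippet files on disk, which is omitted here.

```typescript
// Minimal sketch of the {{variable}} / {{#if flag}} syntax described above.
// Function names mirror the spec's loader; the bodies are assumptions.
type Vars = Record<string, string | boolean | undefined>;

// Resolve {{#if flag}}...{{/if}} blocks: keep the body only when the flag is truthy.
function processConditionalBlocks(template: string, vars: Vars): string {
  return template.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_m, flag: string, body: string) => (vars[flag] ? body : "")
  );
}

// Resolve plain {{variable}} placeholders from the vars map.
function interpolateVariables(template: string, vars: Vars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_m, name: string) =>
    typeof vars[name] === "string" ? (vars[name] as string) : ""
  );
}

function buildPrompt(template: string, vars: Vars): string {
  return interpolateVariables(processConditionalBlocks(template, vars), vars);
}
```

For example, `buildPrompt("Teach {{topic}}.{{#if quiz}} Add a quiz.{{/if}}", { topic: "algebra", quiz: true })` yields `"Teach algebra. Add a quiz."`.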

### Action System

Three categories executed by the ActionEngine:
- **Fire-and-forget**: `spotlight`, `laser`, `play_video`
- **Synchronous** (wait for completion): `speech`, `wb_open`, `wb_close`, `wb_draw_text`, `wb_draw_shape`, `wb_draw_chart`, `wb_draw_latex`, `wb_draw_table`, `wb_draw_line`, `wb_draw_code`, `wb_edit_code`, `wb_clear`, `wb_delete`, `discussion`
- **Widget actions**: `widget_highlight`, `widget_setState`, `widget_annotation`, `widget_reveal`
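The split between fire-and-forget and synchronous actions can be sketched as a dispatch loop. The `runActions`/`execute` names and shapes below are assumptions for illustration; only the execution semantics come from the list above.

```typescript
// Illustrative sketch: fire-and-forget actions start and return immediately,
// synchronous actions are awaited before the next action in the sequence runs.
type Action = { name: string; params?: Record<string, unknown> };

const FIRE_AND_FORGET = new Set(["spotlight", "laser", "play_video"]);

async function runActions(
  actions: Action[],
  execute: (a: Action) => Promise<void>
): Promise<void> {
  for (const action of actions) {
    if (FIRE_AND_FORGET.has(action.name)) {
      void execute(action); // start the effect, don't block the sequence
    } else {
      await execute(action); // e.g. speech, wb_draw_* — wait for completion
    }
  }
}
```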

### Multi-Agent Orchestration (LangGraph)

```
START → director ──(end)──→ END
           │
           └─(next)→ agent_generate ──→ director (loop)
```

- Director: LLM-based for multi-agent, code-only for single-agent
- Agents: teacher (full slide + whiteboard control), assistant (whiteboard), student (whiteboard, short responses)
- Per-agent: persona prompt, allowed actions, TTS voice, avatar, color

### Playback Engine State Machine

```
        start()             pause()
idle ──────────→ playing ──────────→ paused
  ▲                ▲                    │
  │                │       resume()     │
  │                └────────────────────┘
  │   handleEndDiscussion()
  │   confirmDiscussion()
  └──────────────── live ──────────→ paused
```

---

## 🚀 ONE-SHOT MEGA PROMPT

> Use this single prompt to generate the entire application in one conversation.

```
Me: Build me "OpenMAIC" — an open-source AI interactive classroom platform.

Do: Create a React 19 + Vite 6 + TypeScript application with these EXACT specifications:

### CORE SETUP
- React Router DOM v7 with 4 lazy-loaded routes: / (HomePage), /classroom/:id (ClassroomPage), /generation-preview (GenerationPreviewPage), /eval/whiteboard (WhiteboardEvalPage)
- App.tsx wraps all routes in: BrowserRouter → ThemeProvider → I18nProvider → ServerProvidersInit → AccessCodeGuard → Toaster
- Tailwind CSS v4 with PostCSS, shadcn/ui (Radix-based), oklch color system with light/dark mode via CSS variables
- Path alias @/* → ./src/*

### STATE MANAGEMENT (Zustand 5)
Create 10 Zustand stores:
1. **useStageStore** — stage (name, description, languageDirective, style, agentIds), scenes[], currentSceneId, outlines[], generationStatus, chats[], mode ('autonomous'|'playback'). Actions: loadFromStorage, saveToStorage, addScene, setCurrentSceneId. Debounced IndexedDB persistence.
2. **useCanvasStore** — viewportSize, canvasScale, selectedElementIds, editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive. All canvas interaction state.
3. **useSettingsStore** — providerId, modelId, providersConfig (unified JSON), thinkingConfigs, ttsProviderId, ttsVoice, ttsSpeed, asrProviderId, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, agentMode ('preset'|'auto'), selectedAgentIds. Persisted via zustand/persist to localStorage. Has fetchServerProviders() to merge server-configured providers.
4. **useSnapshotStore** — history stack for undo/redo
5. **useKeyboardStore** — keyboard shortcut state
6. **useMediaGenerationStore** — tasks map (elementId → {status, objectUrl, error}), enqueueTasks, restoreFromDB
7. **useWhiteboardHistoryStore** — per-scene whiteboard undo/redo
8. **useWidgetIframeStore** — widget iframe postMessage communication
9. **useUserProfileStore** — nickname, avatar, bio. Persisted localStorage.
10. **useAgentRegistry** — agents map (id → AgentConfig), addAgent, updateAgent, deleteAgent. 3 default agents: teacher (AI teacher), assistant (AI助教, "AI teaching assistant"), student (好奇学生, "curious student"). Each has: id, name, role, persona, avatar, color, allowedActions, priority, voiceConfig, isDefault, isGenerated, boundStageId.

### DATABASE (Dexie/IndexedDB)
Database name 'maic-local-db' with tables: stages, scenes, audioFiles, imageStore, chatSessions, outlines, mediaFiles, generatedAgents. Full CRUD operations. Stage storage utilities: listStages, deleteStageData, renameStage, getFirstSlideByStages.

### AI PROVIDER SYSTEM
- Unified provider registry (PROVIDERS) with 15+ providers: openai, anthropic, google, deepseek, qwen, kimi, glm, minimax, siliconflow, doubao, hunyuan, xiaomi, grok, openrouter, ollama
- Each provider: id, name, type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon, models[]
- Each model: id, name, contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking)
- createLanguageModel() factory using @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google
- Thinking config system: toggleable, budgetAdjustable, defaultEnabled per model
- callLLM() and streamLLM() wrappers with thinking support

### GENERATION PIPELINE (Two-Stage)
**Stage 1 — Outline Generation:**
- POST /api/generate/scene-outlines-stream → SSE stream
- Input: PDF text + images + user requirements + agents + language
- Output: SceneOutline[] with type ('slide'|'quiz'|'interactive'|'pbl'), title, notes, order, mediaGenerations[]
- Incremental JSON parsing from LLM stream
- Generation Preview page with step-by-step visualization (outline streaming → agent profile generation → scene content → navigate to classroom)

**Stage 2 — Scene Generation (per outline):**
- generateSceneContent() → POST /api/generate/scene-content — generates slides/quiz/interactive/pbl content
- generateSceneActions() → POST /api/generate/scene-actions — generates teacher speech + visual actions for each scene
- createSceneWithActions() — assembles Scene object with elements, actions, whiteboard data
- Interactive post-processor: sanitizes HTML widgets, injects CSS isolation
- Action parser: extracts structured [{type, name, params}, {type: "text", content}] from LLM output

### SLIDE TYPE SYSTEM
Scene types with specific content structures:
- **slide**: PPTElement[] (text, image, shape, line, chart, table, latex, video, audio, code elements). Each element has: id, type, left, top, width, height, rotate, opacity, shadow, outline, fill, link, groupId, lock, name.
- **quiz**: QuizQuestion[] with type (single-choice, multiple-choice, fill-in-blank, true-false, short-answer), question text, options, correctAnswer, explanation
- **interactive**: WidgetConfig with type (simulation, diagram, code, game, visualization3d), HTML/iframe content, teacherActions[]
- **pbl**: PBLProjectConfig with roles, issues, workspaces

### SLIDE RENDERER
Full PowerPoint-compatible renderer in React:
- Editor/Canvas with viewport scaling, drag/drop, resize handles, rotation handles, alignment lines, grid, ruler
- Element types: TextElement (ProseMirror), ImageElement (clip masks, filters), ShapeElement (SVG paths, gradients, patterns), LineElement (cubic bezier, markers), ChartElement (ECharts), TableElement, LatexElement (KaTeX), VideoElement, CodeElement (Shiki)
- ThumbnailSlide for sidebar previews
- ScreenElement for presentation mode
- useScaleElement, useDragElement, useRotateElement, useSelectElement hooks

### MULTI-AGENT ORCHESTRATION (LangGraph)
- StateGraph with OrchestratorState annotation
- Director node: multi-agent LLM decision or single-agent code-only
- agent_generate node: builds structured prompt per agent, streams response with tool calls
- statelessGenerate(): single-pass generation from messages + storeState
- Prompt builder: role guidelines per agent type, whiteboard ledger context, peer context, state context
- SSE streaming: StatelessEvent chunks with {type: 'text'|'tool_call'|'done'|'error', agentId, content}

### PLAYBACK ENGINE
Class-based PlaybackEngine with state machine (idle → playing → paused, idle → live → paused):
- Consumes Scene.actions[] sequentially via ActionEngine
- ActionEngine: processes spotlight (highlight element), laser (pointer effect), speech (TTS), whiteboard actions, discussion triggers
- Speech TTS: fetches audio from /api/generate/tts, plays via AudioPlayer, shows speech overlay
- Discussion triggers: pause playback, switch to live mode, enable chat input
- Auto-resume generation for pending outlines on classroom load
- Speed control: 1x, 1.25x, 1.5x, 2x

### ROUNDTABLE UI (95KB component)
Main classroom interaction panel with:
- Voice waveform animation during speech
- Agent avatars with speaking indicator
- Chat input with voice recording (ASR)
- Proactive discussion cards
- Slide navigation controls
- Presentation mode (fullscreen)
- Whiteboard toggle
- Playback progress bar
- Speed selector
- Thinking state indicator

### CHAT SYSTEM
- ChatSession: id, type (qa|discussion|lecture), status, messages (UIMessage[]), config (agentIds, maxTurns, triggerAgentId)
- useChatSessions hook: manages sessions, sends messages via POST /api/chat SSE, handles interruption, persists to IndexedDB
- StreamBuffer: buffers SSE text chunks, reveals text word-by-word for natural speech feel
- Message metadata: senderName, senderAvatar, agentId, agentColor, actions (spotlight/highlight/insert)
- Chat area with session list, message bubbles, inline action tags, lecture notes view

### WHITEBOARD SYSTEM
- Collaborative whiteboard overlay on slides
- Elements: text, shapes (rect/circle/triangle), charts, LaTeX, tables, lines, code blocks
- ActionEngine executes wb_draw_*, wb_delete, wb_clear, wb_open, wb_close
- WhiteboardCanvas with drawing tools
- WhiteboardHistory for undo/redo
- Whiteboard conflicts summarizer for multi-agent coordination

### INTERACTIVE WIDGETS
5 widget types; each generates self-contained HTML rendered in a sandboxed iframe:
1. **Simulation**: variable sliders, physics/math simulations, presets
2. **Diagram**: React Flow nodes/edges, decision trees, flowcharts
3. **Code**: executable code editor with output panel
4. **Game**: educational games (quizzes, puzzles)
5. **Visualization3D**: Three.js/WebGL 3D models
- Widget teacher actions: highlight, setState, annotation, reveal
- postMessage bridge for iframe ↔ parent communication

### PBL (Project-Based Learning)
- Agentic loop using Vercel AI SDK generateText + stopWhen
- MCP tools: ModeMCP, ProjectMCP, AgentMCP, IssueboardMCP
- Generates: project config, roles, issues, workspaces
- PBL renderer with: role selection, chat panel, issue board, workspace, guide

### MEDIA GENERATION
- Media orchestrator dispatches parallel API calls
- Image providers: Seedream (ByteDance), OpenAI Image, Qwen Image, Nano Banana, MiniMax, Grok
- Video providers: Seedance (ByteDance), Kling (Kuaishou), Veo (Google), MiniMax, Grok
- Async task pattern: submit → poll → download blob → IndexedDB
- MediaGenerationStore tracks task status per elementId

### AUDIO SYSTEM
- TTS providers: OpenAI, Azure, GLM, Qwen, MiniMax, Doubao, ElevenLabs, VoxCPM, Browser native
- ASR providers: OpenAI Whisper, Qwen ASR, Browser native
- Voice resolver: maps agent voices across providers
- AudioPlayer: Web Audio API playback with speed control
- useAudioRecorder: MediaRecorder API for voice input
- useBrowserTTS/useDiscussionTTS: manages TTS lifecycle during discussions

### EXPORT SYSTEM
- PowerPoint export via pptxgenjs: converts PPTElement[] to PPTX with shapes, images, charts, tables, LaTeX→OMML
- Classroom ZIP export/import: stages + scenes + audio + images + agents as portable archive
- HTML parser for slide text → PPTX rich text conversion
- SVG path parser for shape export
- LaTeX → OMML converter via mathml2omml

### I18N SYSTEM
- i18next + react-i18next + resources-to-backend
- 5 locales: zh-CN, en-US, ja-JP, ru-RU, ar-SA
- Dynamic import: `import(\`./locales/${language}.json\`)`
- useI18n hook with locale detection from localStorage/navigator
- Language switcher component

### API ROUTES (27 endpoints)
All under /api/:
- /chat — SSE chat stream (multi-agent)
- /generate/scene-outlines-stream — SSE outline generation
- /generate/scene-content — scene content generation
- /generate/scene-actions — scene action generation
- /generate/agent-profiles — agent profile generation
- /generate/image — image generation
- /generate/video — video generation
- /generate/tts — single TTS audio generation
- /parse-pdf — PDF parsing
- /classroom — CRUD for server-stored classrooms
- /classroom-media/[classroomId]/[...path] — media file serving
- /generate-classroom — background classroom generation job
- /generate-classroom/[jobId] — job status polling
- /quiz-grade — LLM-based quiz grading
- /pbl/chat — PBL runtime chat
- /web-search — Tavily web search
- /proxy-media — CORS proxy for remote media
- /server-providers — server-configured provider list
- /verify-model, /verify-image-provider, /verify-video-provider, /verify-pdf-provider — credential verification
- /azure-voices — Azure TTS voice list
- /transcription — audio transcription
- /access-code/status, /access-code/verify — access code authentication
- /health — health check

### SECURITY
- Access code guard (HMAC-signed cookie)
- SSRF guard for server-side URL fetching
- Content Security Policy headers
- Input validation on all API routes

### TESTING
- Vitest for unit tests (29 test files)
- Playwright for E2E tests (4 test suites)
- Evaluation framework for whiteboard layout scoring and outline language detection

Build the complete application with all 950+ source files, full type safety, and production-ready error handling.
```
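The StreamBuffer described under CHAT SYSTEM above can be sketched as follows. The class name comes from the spec; the method names and the whitespace word-boundary rule are assumptions for illustration.

```typescript
// Sketch: SSE chunks are appended as they arrive, and the UI pulls complete
// words at its own pace for a natural word-by-word reveal.
class StreamBuffer {
  private buffer = "";
  private done = false;

  push(chunk: string): void { this.buffer += chunk; }
  end(): void { this.done = true; }

  // Return the next whitespace-delimited word, or null if none is complete yet.
  nextWord(): string | null {
    const match = this.buffer.match(/^\s*(\S+)(\s)/);
    if (match) {
      this.buffer = this.buffer.slice(match[0].length - 1);
      return match[1];
    }
    // Once the stream has ended, flush whatever remains.
    if (this.done && this.buffer.trim()) {
      const word = this.buffer.trim();
      this.buffer = "";
      return word;
    }
    return null;
  }
}
```

The key property: a partially received word ("wor" of "world") stays buffered until its trailing delimiter arrives or the stream ends, so the UI never reveals half a word.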

---

## 🔄 MULTI-SHOT PHASED PROMPTS

### Phase 1: Foundation & Scaffold

```
Me: Start building "OpenMAIC" — an AI interactive classroom. Set up the project foundation.

Do:
1. Initialize a React 19 + Vite 6 + TypeScript project
2. Configure: Tailwind CSS v4 with PostCSS, path alias @/* → ./src/*, oklch color system
3. Install core deps: react-router-dom, zustand, dexie, lucide-react, motion, sonner, clsx, tailwind-merge, class-variance-authority, nanoid, zod
4. Install shadcn/ui components: button, dialog, dropdown-menu, popover, tooltip, tabs, input, textarea, select, checkbox, switch, slider, scroll-area, command, alert-dialog, card, badge, carousel, separator, progress, label, hover-card, context-menu, collapsible, avatar, alert, field, input-group, button-group, combobox
5. Create App.tsx with BrowserRouter wrapping: ThemeProvider → I18nProvider → AccessCodeGuard → lazy routes (/, /classroom/:id, /generation-preview, /eval/whiteboard) → Toaster
6. Create ThemeProvider (light/dark/system, localStorage persist, document.documentElement.classList toggle)
7. Create I18nProvider with i18next + react-i18next + resources-to-backend, 5 locales (zh-CN, en-US, ja-JP, ru-RU, ar-SA), dynamic JSON imports, localStorage locale persistence
8. Create globals.css with the full oklch color system (:root + .dark), CSS custom properties for all shadcn tokens, a Tailwind @theme inline block, ProseMirror styles, and animation keyframes (wave, shimmer, breathing-bar, interactive-mode-breathe)
9. Create a createLogger utility (timestamp + level + tag formatting)
10. Verify: `vite build` succeeds with 0 errors
```
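One possible shape for the `createLogger` utility named in step 9 — the prompt only specifies "timestamp + level + tag formatting", so the signature and return shape below are assumptions.

```typescript
// Tagged logger sketch: each module creates its own logger with a fixed tag,
// and every line carries an ISO timestamp, level, and that tag.
type Level = "debug" | "info" | "warn" | "error";

function createLogger(tag: string) {
  const format = (level: Level, message: string): string =>
    `${new Date().toISOString()} [${level.toUpperCase()}] [${tag}] ${message}`;
  return {
    format, // exposed so the formatting itself is testable
    debug: (msg: string) => console.debug(format("debug", msg)),
    info: (msg: string) => console.info(format("info", msg)),
    warn: (msg: string) => console.warn(format("warn", msg)),
    error: (msg: string) => console.error(format("error", msg)),
  };
}
```

Usage: `const log = createLogger("playback"); log.info("engine started");`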

### Phase 2: Data Layer & State Management

```
Me: Build the data layer and state management for OpenMAIC.

Do:
1. Create Dexie database 'maic-local-db' with tables: stages (id, name, description, createdAt, updatedAt, languageDirective, style, currentSceneId, agentIds, interactiveMode), scenes (id, stageId, type, title, order, content, actions, whiteboard), audioFiles (id, blob, duration, format, text, voice), imageStore (id, blob, mimeType), chatSessions (id, stageId, sceneId, type, status, messages, config), outlines (id, stageId, outlines), mediaFiles (id, blob, mimeType), generatedAgents (id, agents)
2. Create stage-storage utilities: listStages() → StageListItem[], deleteStageData(), renameStage(), getFirstSlideByStages() → Record<string, Slide>
3. Create image-storage utilities: storePdfBlob(), loadPdfBlob(), storeImages(), loadImageMapping(), cleanupOldImages()
4. Create useStageStore (Zustand): stage, scenes[], currentSceneId, outlines[], chats[], mode, generationStatus, generationEpoch, failedOutlines[], toolbarState. Actions: setStage, addScene, updateScene, deleteScene, setCurrentSceneId, loadFromStorage (IndexedDB → state), saveToStorage (debounced state → IndexedDB), getCurrentScene()
5. Create useCanvasStore: viewportSize, canvasScale, selectedElementIds[], editingElementId, isDrawing, creatingElement, ctrlOrShiftKeyActive, showGridLines, showRuler, snapToGrid
6. Create useSettingsStore with zustand/persist: providerId, modelId, thinkingConfigs, providersConfig, ttsProviderId/Voice/Speed, asrProviderId/Language, imageProviderId, videoProviderId, pdfProviderId, webSearchProviderId, playbackSpeed, sidebarCollapsed, chatAreaWidth, chatAreaCollapsed, agentMode, selectedAgentIds, fetchServerProviders(), all setters. Validate provider/model on rehydration.
7. Create useSnapshotStore, useKeyboardStore, useMediaGenerationStore, useWhiteboardHistoryStore, useWidgetIframeStore, useUserProfileStore
8. Create useAgentRegistry with zustand/persist: agents map, 3 default agents (teacher: "AI teacher" with full slide + whiteboard actions, priority 10; assistant: "AI助教" ("AI teaching assistant") with whiteboard-only actions, priority 5; student: "好奇学生" ("curious student") with whiteboard-only actions, priority 3). Each agent: id, name, role, persona (detailed teaching style), avatar, color, allowedActions, priority, voiceConfig, isDefault
9. Define all TypeScript types in lib/types/: slides.ts (PPTElement union with 10 element types, Slide, SlideTheme, SlideBackground), action.ts (20+ action types), stage.ts (Stage, Scene, SceneType, StageMode), chat.ts (ChatSession, StatelessChatRequest/Event), generation.ts (SceneOutline, UserRequirements, PdfImage), provider.ts (ProviderId, ProviderConfig, ModelInfo, ThinkingConfig), widgets.ts (5 widget configs), settings.ts, roundtable.ts, web-search.ts, pdf.ts, edit.ts, export.ts
```
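The "debounced state → IndexedDB" persistence in step 4 rests on a trailing-edge debounce: rapid store updates collapse into a single DB write. A minimal helper of that kind (names are illustrative, not the project's API):

```typescript
// Trailing-edge debounce: each call resets the timer, so fn runs once,
// waitMs after the last call in a burst — e.g. one IndexedDB write per
// burst of store mutations.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical usage in a store (saveToStorage is a placeholder name):
// const saveToStorage = debounce((state) => db.stages.put(state.stage), 300);
```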

### Phase 3: AI Provider System

```
Me: Build the AI provider system with 15+ LLM providers.

Do:
1. Create PROVIDERS registry with full configs for: openai (gpt-4o, gpt-5.5, o3-mini, o4-mini), anthropic (claude-4-sonnet, claude-3.7-sonnet), google (gemini-2.5-pro, gemini-2.5-flash), deepseek (deepseek-chat, deepseek-reasoner), qwen (qwen3-235b, qwen-max, qwen-plus), kimi (moonshot-v1-auto), glm (glm-4-plus, glm-z1-air), minimax (MiniMax-M1), siliconflow (meta-llama, Qwen, DeepSeek), doubao (doubao-pro, doubao-1.5-pro), hunyuan, xiaomi (MiMo-7B), grok (grok-3), openrouter (pass-through), ollama (local models)
2. Each provider: type ('openai'|'anthropic'|'google'|'openai-compatible'), defaultBaseUrl, requiresApiKey, icon path, models[] with contextWindow, outputWindow, capabilities (streaming, tools, vision, thinking config)
3. createLanguageModel(config) factory: routes to @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google based on provider type. OpenAI-compatible providers use createOpenAI with custom baseURL.
4. Thinking config system: ThinkingConfig = {mode: 'disabled'|'auto'|'manual', enabled, budget?}. Per-model thinking capability (toggleable, budgetAdjustable, defaultEnabled). getThinkingMode() and pickThinkingBudget() utilities.
5. callLLM(model, options) — single-shot generation with thinking support. streamLLM(model, options) — streaming with thinking.
6. Model metadata: applyModelMetadata() enriches model configs with catalog data. getCatalogThinkingCapability() returns thinking support level.
7. Server-side resolveModel() — resolves model string + API key + base URL into LanguageModel instance, handles server-configured providers from env vars.
```
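A type-level sketch of the registry shape from steps 1–2. The field names come from the prompt; the single sample entry (base URL, window sizes) is illustrative, not authoritative, and `resolveFactoryName` only hints at the routing rule from step 3.

```typescript
// Registry shape for PROVIDERS, per the prompt's field list.
type ProviderType = "openai" | "anthropic" | "google" | "openai-compatible";

interface ModelInfo {
  id: string;
  name: string;
  contextWindow: number;
  outputWindow: number;
  capabilities: { streaming: boolean; tools: boolean; vision: boolean; thinking: boolean };
}

interface ProviderConfig {
  id: string;
  name: string;
  type: ProviderType;
  defaultBaseUrl: string;
  requiresApiKey: boolean;
  models: ModelInfo[];
}

// Sample entry — values are illustrative placeholders.
const PROVIDERS: Record<string, ProviderConfig> = {
  deepseek: {
    id: "deepseek",
    name: "DeepSeek",
    type: "openai-compatible",
    defaultBaseUrl: "https://api.deepseek.com/v1",
    requiresApiKey: true,
    models: [{
      id: "deepseek-chat",
      name: "DeepSeek Chat",
      contextWindow: 64_000,
      outputWindow: 8_000,
      capabilities: { streaming: true, tools: true, vision: false, thinking: false },
    }],
  },
};

// createLanguageModel would switch on provider.type; openai-compatible
// providers reuse the OpenAI client with a custom baseURL.
function resolveFactoryName(type: ProviderType): string {
  return type === "openai-compatible" ? "openai" : type;
}
```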

### Phase 4: Generation Pipeline

```
Me: Build the two-stage generation pipeline for creating classroom content from PDF.

Do:
1. Create prompt template system: lib/prompts/ with loader.ts (loadPrompt, buildPrompt, interpolateVariables, processSnippets, processConditionalBlocks), types.ts (PromptId, SnippetId). Templates in templates/{promptId}/system.md + user.md. Snippets in snippets/*.md. Syntax: {{variable}}, {{snippet:name}}, {{#if flag}}...{{/if}}.
2. Create 20+ prompt templates: requirements-to-outlines, interactive-outlines, slide-content, quiz-content, slide-actions, quiz-actions, interactive-actions, simulation-content, diagram-content, code-content, game-content, visualization3d-content, widget-teacher-actions, pbl-actions, pbl-design, agent-system (4 variants: base, wb-teacher, wb-assistant, wb-student), director, web-search-query-rewrite
3. Stage 1 — Outline Generator: generateSceneOutlinesFromRequirements() builds prompt from PDF content + user requirements + agent info + language directive. SSE API endpoint streams outlines as incremental JSON objects. Frontend GenerationPreview page shows step visualization.
4. Stage 2 — Scene Generator: generateSceneContent(outline, context, model) dispatches to slide/quiz/interactive/PBL generators based on outline.type. generateSceneActions(content, outline, context, model) generates teacher speech + visual action sequences. createSceneWithActions() assembles final Scene.
5. Scene builder: buildSceneFromOutline() converts generated content to PPTElement[]. uniquifyMediaElementIds() ensures globally unique IDs for media placeholders.
6. Interactive post-processor: sanitizes HTML, injects CSS isolation, wraps in responsive container.
7. Action parser: parseActionsFromStructuredOutput() extracts [{type:"action", name, params}, {type:"text", content}] from LLM JSON output.
8. JSON repair: parseJsonResponse() handles malformed LLM JSON with bracket balancing, markdown fence stripping, partial parse recovery.
9. Pipeline runner: createGenerationSession() + runGenerationPipeline() orchestrates the full flow with callbacks.
10. API routes: /api/generate/scene-outlines-stream (SSE), /api/generate/scene-content (POST), /api/generate/scene-actions (POST), /api/generate/agent-profiles (POST)
```
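The JSON-repair idea in step 8 can be sketched as: strip markdown fences, try a straight parse, and on failure append closers for any brackets left open so a truncated LLM response still parses. This is a deliberately naive sketch — it assumes the truncation did not happen inside a string literal, which a real implementation must handle.

```typescript
// Minimal parseJsonResponse sketch: fence stripping + bracket balancing.
function parseJsonResponse(raw: string): unknown {
  const text = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "") // strip leading ```json fence
    .replace(/`{3}\s*$/, "")           // strip trailing ``` fence
    .trim();
  try {
    return JSON.parse(text);
  } catch {
    // Track unclosed brackets and append their closers in reverse order.
    // (Naive: does not skip brackets that appear inside string values.)
    const stack: string[] = [];
    for (const ch of text) {
      if (ch === "{") stack.push("}");
      else if (ch === "[") stack.push("]");
      else if (ch === "}" || ch === "]") stack.pop();
    }
    return JSON.parse(text + stack.reverse().join(""));
  }
}
```

For example, the truncated response `{"outlines": [{"order": 1}` repairs to `{"outlines": [{"order": 1}]}` and parses.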

### Phase 5: Slide Renderer & Canvas

```
Me: Build the full slide renderer with PowerPoint-compatible elements and interactive canvas.

Do:
1. Create Editor/Canvas with: viewport scaling (useViewportSize), drag-to-select (useMouseSelection), element selection (useSelectElement), element dragging (useDragElement), element scaling (useScaleElement with 8 resize handles), element rotation (useRotateElement), alignment lines (AlignmentLine), grid lines (GridLines), ruler (Ruler), drop support (useDrop)
2. Create 10 element renderers:
   - TextElement: ProseMirror editor with custom schema (paragraph, heading, bulletList, orderedList, hardBreak, marks: bold, italic, underline, strikethrough, color, backgroundColor, fontSize, fontFamily, textAlign, textIndent, lineHeight, superscript, subscript, link)
   - ImageElement: clip paths (rect, ellipse, polygon), filters, flip, shadow, outline
   - ShapeElement: 20+ SVG path formulas (roundRect, triangle, parallelogram, trapezoid, etc), gradient fills (linear/radial), pattern fills
   - LineElement: cubic bezier curves, arrow markers, point dragging (useDragLineElement)
   - ChartElement: ECharts integration (bar, line, pie, scatter, radar, area)
   - TableElement: cell editing, merge, border styling
   - LatexElement: KaTeX rendering with Temml fallback
   - VideoElement: HTML5 video with poster, autoplay control
   - CodeElement: Shiki syntax highlighting with 50+ language grammars
   - AudioElement: audio player UI
3. Create operate overlays: CommonElementOperate, ImageElementOperate, ShapeElementOperate (keypoint drag for path shapes), LineElementOperate (endpoint drag), TableElementOperate, TextElementOperate, MultiSelectOperate
4. Create ThumbnailSlide for sidebar scene list (scaled-down readonly render)
5. Create ThumbnailInteractive for interactive widget previews
6. Create ScreenElement and ScreenCanvas for presentation mode
7. Create ViewportBackground (slide background with solid/gradient/image fills)
8. Create element hooks: useElementFill, useElementFlip, useElementOutline, useElementShadow
9. Create canvas operations hook: useCanvasOperations with element CRUD, alignment, distribution, z-order, grouping
```
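As one concrete reference point for step 1, handle-based scaling keeps the corner opposite the dragged handle fixed while the dragged corner absorbs the pointer delta. This sketch covers only the four corner handles of useScaleElement's eight, and the `Rect`/`Handle` types are illustrative, not the project's actual types:

```ts
type Rect = { left: number; top: number; width: number; height: number };
type Handle = "tl" | "tr" | "bl" | "br";

// Scale a rect by dragging one corner handle; the opposite corner stays fixed.
function scaleFromHandle(rect: Rect, handle: Handle, dx: number, dy: number, min = 10): Rect {
  let { left, top, width, height } = rect;
  // Left-side handles move the origin; right-side handles grow the width.
  if (handle.includes("l")) { left += dx; width -= dx; } else { width += dx; }
  // Top-side handles move the origin; bottom-side handles grow the height.
  if (handle.includes("t")) { top += dy; height -= dy; } else { height += dy; }
  // Clamp to a minimum size without letting the fixed corner drift.
  if (width < min) { if (handle.includes("l")) left -= min - width; width = min; }
  if (height < min) { if (handle.includes("t")) top -= min - height; height = min; }
  return { left, top, width, height };
}
```

The edge handles (top, bottom, left, right) and rotation-aware scaling build on the same fixed-point idea, just constrained to one axis.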

### Phase 6: Multi-Agent Orchestration

```
Me: Build the LangGraph-based multi-agent orchestration system.

Do:
1. Create OrchestratorState (LangGraph Annotation.Root): messages, storeState, availableAgentIds, maxTurns, languageModel, thinkingConfig, discussionContext, triggerAgentId, userProfile, agentConfigOverrides, turnSummaries[], whiteboardActions[], nextAgentId, isComplete, generatedChunks
2. Create director node: LLM-based multi-agent decision (who speaks next, what to do). Code fast-paths for turn 0 (trigger agent) and turn limits. Single-agent mode: pure code logic, no LLM call.
3. Create agent_generate node: resolves AgentConfig, builds structured prompt via buildStructuredPrompt(), streams LLM response, parses structured chunks [{type, name/content}], emits StatelessEvent via config.writer()
4. Create StateGraph: START → director → (end→END | next→agent_generate→director loop)
5. Create buildStructuredPrompt(): combines role guidelines, persona, state context (current slide elements), whiteboard ledger (spatial layout of all whiteboard elements), peer context (other agents' recent actions), available action descriptions, format examples
6. Create summarizers: conversation-summary (compress old messages), message-converter (UIMessage → OpenAI format), state-context (current slide description), whiteboard-ledger (virtual whiteboard spatial state), whiteboard-conflicts (detect conflicting draws), peer-context (recent agent actions)
7. Create director-prompt: buildDirectorPrompt() with agent profiles, conversation history, available tools. parseDirectorDecision() extracts {nextAgentId, reason, isComplete}
8. Create tool-schemas: getEffectiveActions(role) returns allowed action schemas. getActionDescriptions() generates human-readable action docs.
9. Create AISdkLangGraphAdapter: bridges Vercel AI SDK LanguageModel to LangGraph's BaseChatModel interface
10. Create statelessGenerate(): entry point called by /api/chat, invokes graph.stream(), yields StatelessEvent SSE chunks
```
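The director fast-paths in step 2 are plain code that runs before any LLM call: turn 0 routes to the trigger agent, the turn limit ends the discussion, and single-agent sessions never need an LLM director at all. A sketch with hypothetical names:

```ts
type Decision = { nextAgentId: string | null; isComplete: boolean };

// Returns a decision when no LLM call is needed, or null to defer
// to the LLM-based director node.
function directorFastPath(
  turn: number,
  maxTurns: number,
  agentIds: string[],
  triggerAgentId: string,
): Decision | null {
  if (turn >= maxTurns) return { nextAgentId: null, isComplete: true };      // turn limit reached
  if (turn === 0) return { nextAgentId: triggerAgentId, isComplete: false }; // trigger agent opens
  if (agentIds.length === 1) return { nextAgentId: agentIds[0], isComplete: false }; // single-agent mode
  return null; // multi-agent mid-discussion: let the LLM decide who speaks next
}
```

In the graph, this would sit at the top of the director node, so the conditional edge to `agent_generate` or `END` is driven by the same `Decision` shape in both the code and LLM paths.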

### Phase 7: Playback Engine & Roundtable

```
Me: Build the PlaybackEngine state machine and Roundtable UI.

Do:
1. Create PlaybackEngine class: state machine (idle/playing/paused/live), consumes Scene.actions[] via ActionEngine, manages scene transitions, handles discussion triggers, speed control (1x/1.25x/1.5x/2x)
2. Create ActionEngine: processes action queue — spotlight (dim other elements, highlight target), laser (red pointer effect), speech (fetch TTS → AudioPlayer → wait for completion), wb_open/close (toggle whiteboard overlay), wb_draw_* (add elements to whiteboard), wb_delete/clear, discussion (pause playback, switch to live mode), play_video, widget actions
3. Create AudioPlayer: Web Audio API wrapper with play/pause/stop, speed adjustment, volume control, onEnd callback
4. Create PlaybackEngine callbacks: onModeChange, onSceneChange, onActionStart/End, onSpeechStart/End, onDiscussionTrigger, onComplete
5. Create computePlaybackView() — derives presentation state from engine: currentSpeech, speakingAgentId, audioState, progress
6. Create Stage component (main container): integrates SceneSidebar + CanvasArea + Roundtable + ChatArea. Manages PlaybackEngine lifecycle, discussion flow (trigger → live chat → end → resume), TTS during discussions via useDiscussionTTS
7. Create Roundtable component: voice waveform bars, agent avatar ring with speaking indicator, chat input with send button + voice recording, proactive discussion cards, slide navigation (prev/next), playback controls (play/pause/speed), presentation mode toggle, whiteboard toggle, thinking state display, end flash animation
8. Create PresentationSpeechOverlay: full-screen speech text display during presentation mode
9. Create SceneSidebar: scene thumbnail list with drag-to-reorder, generation progress indicators, failed outline retry, home navigation
10. Create Header: back button, settings gear, theme switcher, language switcher, export dropdown (PPTX, classroom ZIP)
```

### Phase 8: Interactive Widgets & PBL

```
Me: Build the interactive widget system and PBL mode.

Do:
1. Create 5 widget content generators (each calls LLM with specialized prompts):
   - simulation-content: generates HTML with variable sliders, canvas/SVG visualization, physics formulas
   - diagram-content: generates React Flow JSON (nodes, edges, layout)
   - code-content: generates executable code with output panel, language selector
   - game-content: generates HTML5 game with scoring, levels, educational goals
   - visualization3d-content: generates Three.js scene with camera controls, annotations
2. Create InteractiveRenderer: sandboxed iframe loading widget HTML, postMessage bridge for teacher actions (highlight, setState, annotation, reveal)
3. Create widget teacher action generation: widget-teacher-actions prompt generates action sequence for teacher to guide students through widget
4. Create useWidgetIframeStore: register/unregister iframes, send setState/highlight/annotation/reveal messages
5. Create PBL generation system:
   - generatePBLContent() using Vercel AI SDK generateText with tools and stepCountIs stopWhen
   - MCP tools: ModeMCP (set PBL mode), ProjectMCP (set project config), AgentMCP (create agent roles), IssueboardMCP (create issues with acceptance criteria)
   - buildPBLSystemPrompt() with project topic, skills, language directive
6. Create PBL renderer components:
   - PBLRenderer: main container with role selection → workspace
   - RoleSelection: choose student role from generated options
   - ChatPanel: per-role chat with @mention routing to agents
   - IssueboardPanel: kanban-style issue tracking
   - Workspace: collaborative workspace area
   - Guide: step-by-step project guide
7. Create /api/pbl/chat endpoint: handles @mention routing, generates agent responses per role
```
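The postMessage bridge in step 2 reduces to a typed message envelope plus a dispatch on the widget side. A sketch with hypothetical message kinds — the `log` parameter stands in for real DOM effects inside the iframe:

```ts
// Hypothetical envelope the parent posts into the sandboxed widget iframe.
type WidgetMessage =
  | { kind: "setState"; payload: Record<string, unknown> }
  | { kind: "highlight"; target: string }
  | { kind: "reveal"; step: number };

// Widget-side handler: a discriminated-union switch keeps each teacher
// action type-checked instead of poking at untyped event.data.
function handleWidgetMessage(msg: WidgetMessage, log: string[]): void {
  switch (msg.kind) {
    case "setState": log.push(`state:${Object.keys(msg.payload).join(",")}`); break;
    case "highlight": log.push(`highlight:${msg.target}`); break;
    case "reveal": log.push(`reveal:${msg.step}`); break;
  }
}
```

In the real renderer, the iframe would register a `message` event listener, validate `event.origin`/shape, and then call a handler like this; the typed envelope is what makes the sandbox boundary auditable.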

### Phase 9: Media Generation Pipeline

```
Me: Build the media generation pipeline for AI images and videos.

Do:
1. Create MediaGenerationStore: tasks Map<elementId, {status, objectUrl, blob, error}>, enqueueTasks(), completeTask(), failTask(), restoreFromDB(), revokeObjectUrls()
2. Create media orchestrator: generateMediaForOutlines() collects all media requests from outlines[].mediaGenerations, filters by enabled providers, processes serially (API concurrency limits)
3. Create image provider adapters:
   - Seedream (ByteDance): POST to ark.cn-beijing.volces.com with HMAC auth
   - OpenAI Image: POST to /v1/images/generations
   - Qwen Image: POST to dashscope with async task pattern
   - Nano Banana: POST with banana.dev API
   - MiniMax Image: POST to api.minimax.chat
   - Grok Image: POST to api.x.ai
4. Create video provider adapters (all async task pattern: submit → poll → download):
   - Seedance (ByteDance): HMAC-signed requests, JWT token for kling
   - Kling (Kuaishou): JWT auth, task polling
   - Veo (Google DeepMind): OAuth, long-running operations
   - MiniMax Video, Grok Video
5. Each adapter: generate(config, options) → {url, blob}, testConnectivity(config) → boolean
6. Create /api/generate/image and /api/generate/video endpoints
7. Create /api/proxy-media endpoint for CORS proxy of remote media URLs
8. Create /api/verify-image-provider and /api/verify-video-provider for credential testing
9. Create MediaPopover UI: shows generation progress per media element, retry failed, preview generated media
```
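The per-element task bookkeeping in step 1 can be modeled as a plain Map keyed by element id. This sketch keeps only the status machine (pending → done/failed) and omits blobs, object URLs, and the IndexedDB restore path; names mirror the store's described API but are illustrative:

```ts
// Discriminated union so a task can't be "done" without a url
// or "failed" without an error.
type MediaTask =
  | { status: "pending" }
  | { status: "done"; url: string }
  | { status: "failed"; error: string };

class MediaTasks {
  tasks = new Map<string, MediaTask>();
  enqueue(ids: string[]) { for (const id of ids) this.tasks.set(id, { status: "pending" }); }
  complete(id: string, url: string) { this.tasks.set(id, { status: "done", url }); }
  fail(id: string, error: string) { this.tasks.set(id, { status: "failed", error }); }
  // Serial orchestrator drains this list one task at a time.
  pending(): string[] {
    return [...this.tasks.entries()].filter(([, t]) => t.status === "pending").map(([id]) => id);
  }
}
```

Keying by element id (rather than by request) is what lets a retry simply re-enqueue the same id and overwrite a `failed` entry in place.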

### Phase 10: Audio System (TTS/ASR)

```
Me: Build the TTS and ASR audio system with 8+ providers.

Do:
1. Create TTS provider registry with configs: openai-tts (alloy/echo/fable/onyx/nova/shimmer), azure-tts (500+ voices from azure.json), glm-tts, qwen-tts (sambert voices), minimax-tts (3 models), doubao-tts, elevenlabs-tts, voxcpm (custom voice cloning), browser-tts (Web Speech API)
2. Each TTS provider: id, name, requiresApiKey, defaultBaseUrl, icon, voices[], supportedFormats, speedRange
3. Create generateTTS(config, text) router: dispatches to provider-specific functions, returns {audio: Uint8Array, format: string}
4. Create ASR provider registry: openai-whisper, qwen-asr, browser-asr
5. Create transcribeAudio(config, audioBlob) router
6. Create voice resolver: getAvailableProvidersWithVoices(), maps agent voiceConfig to provider+voice
7. Create VoxCPM integration: custom voice profiles, VLLM model support, voice cloning
8. Create /api/generate/tts endpoint (single TTS generation)
9. Create /api/transcription endpoint
10. Create /api/azure-voices endpoint (Azure voice list)
11. Create useAudioRecorder hook: MediaRecorder API, audio visualization, silence detection
12. Create useBrowserTTS hook: Web Speech API fallback
13. Create useDiscussionTTS hook: manages TTS lifecycle during live discussions, queues speech, handles interruption
14. Create useTTSPreview hook: preview voice in settings
15. Create SpeechButton component: toggle voice recording with waveform
16. Create TTSConfigPopover: voice selector, speed slider, provider selector
```
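The `generateTTS` router in step 3 is essentially a dispatch table keyed by provider id, with an explicit error for unknown providers. A sketch with stubbed provider functions — real ones would call the provider HTTP APIs and this simplified signature takes the provider id directly instead of a full config object:

```ts
type TTSResult = { audio: Uint8Array; format: string };

// Stub providers: openai-tts returns bytes, browser-tts returns nothing
// because the Web Speech API plays audio directly rather than yielding a buffer.
const ttsProviders: Record<string, (text: string) => TTSResult> = {
  "openai-tts": (text) => ({ audio: new Uint8Array(text.length), format: "mp3" }),
  "browser-tts": () => ({ audio: new Uint8Array(0), format: "none" }),
};

function generateTTS(providerId: string, text: string): TTSResult {
  const fn = ttsProviders[providerId];
  if (!fn) throw new Error(`unknown TTS provider: ${providerId}`);
  return fn(text);
}
```

The same table shape extends naturally to the ASR router (`transcribeAudio`), which differs only in signature (audio in, text out).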

### Phase 11: Chat System & Streaming

```
Me: Build the chat system with SSE streaming and session management.

Do:
1. Create StatelessChatRequest type: messages (UIMessage[]), storeState ({stage, scenes, currentSceneId, mode}), config ({agentIds, sessionType, maxTurns, triggerAgentId}), model, apiKey, baseUrl, providerType, userProfile, agentConfigs
2. Create StatelessEvent type: {type: 'text'|'tool_call'|'error'|'done', agentId?, content?, toolName?, args?}
3. Create /api/chat POST endpoint: validates request, resolves model, invokes statelessGenerate(), streams SSE events via ReadableStream + TextEncoder
4. Create useChatSessions hook (53KB): manages multiple ChatSession instances per scene, sendMessage() → fetch SSE → parse events → update messages, handleInterrupt() → abort controller, auto-create QA session on first user message, create discussion session from proactive card, persist sessions to IndexedDB, restore on load
5. Create StreamBuffer: accumulates text chunks from SSE, reveals words incrementally for natural TTS sync, tracks reveal progress (0-1) for auto-scroll
6. Create ChatArea component: session tab list, message list with agent avatars + colors, inline action tags (spotlight/highlight buttons in messages), lecture notes view (extracted from speech actions), typing indicator
7. Create ChatSession component: handles individual session rendering, message input, send/interrupt buttons
8. Create ProactiveCard component: discussion invitation cards with topic, prompt, accept/skip buttons, animation
9. Create InlineActionTag component: clickable action buttons within messages (triggers spotlight/insert on slide)
10. Create LectureNotesView: extracts and displays all speech text from actions as structured notes
```
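The StreamBuffer in step 5 can be sketched as accumulate-then-reveal over whitespace-separated words. This is a minimal version; the real implementation would also pace `revealNext()` against TTS playback:

```ts
class StreamBuffer {
  private raw = "";
  private revealedCount = 0;

  // Accumulate raw SSE text chunks; chunk boundaries may split words,
  // so tokenization always re-runs over the full raw string.
  push(chunk: string) { this.raw += chunk; }

  private words(): string[] { return this.raw.split(/\s+/).filter(Boolean); }

  // Reveal one word per tick; returns null once everything is shown.
  revealNext(): string | null {
    const ws = this.words();
    return this.revealedCount < ws.length ? ws[this.revealedCount++] : null;
  }

  // Reveal progress in [0, 1], used to drive auto-scroll.
  progress(): number {
    const n = this.words().length;
    return n === 0 ? 0 : this.revealedCount / n;
  }
}
```

Because `words()` re-tokenizes the full buffer, a word split across two SSE chunks ("hel" + "lo") is still revealed as one word.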

### Phase 12: Settings, i18n, Export, Polish

```
Me: Build settings, internationalization, export system, and polish everything.

Do:
1. Create SettingsDialog with tabbed sections: General (theme, language, access code), Model (provider selector, model selector with search, API key input, base URL, thinking config toggle), Audio (TTS provider/voice/speed, ASR provider, per-agent voice assignment), Image (provider, model, API key, aspect ratio), Video (provider, model, API key), PDF (provider, API key), Web Search (provider, API key), Agent (agent list with add/edit/delete, persona editor, action permissions, priority)
2. Create ModelSelector: searchable dropdown with provider grouping, model capabilities badges (vision, tools, thinking), context window display
3. Create AddProviderDialog: custom provider registration with name, base URL, API key, models
4. Create ProviderConfigPanel: provider-specific settings form
5. Create i18n translation files for all 5 locales (1500+ translation keys each): home, classroom, settings, generation, chat, quiz, whiteboard, export, agents, audio, errors, common
6. Create LanguageSwitcher component: dropdown with locale labels + short codes
7. Create PowerPoint export: useExportPPTX hook converts scenes to PPTX via pptxgenjs. Handles: text with rich formatting, images with clip paths, shapes with SVG paths, charts (ECharts → static image), tables, LaTeX (KaTeX → MathML → OMML), videos (poster image), code blocks (syntax-highlighted HTML)
8. Create classroom ZIP export/import: useExportClassroom hook creates ZIP with manifest.json + scenes + audio + images + agents. useImportClassroom hook parses ZIP and restores to IndexedDB.
9. Create HTML parser for PPTX: lexer → parser → format → stringify pipeline for converting ProseMirror HTML to PPTX rich text runs
10. Create LaTeX to OMML converter chain: KaTeX → MathML → OMML (via mathml2omml package) for PowerPoint math equations
11. Create AccessCodeGuard + AccessCodeModal: HMAC-signed token verification, cookie persistence
12. Create ServerProvidersInit: fetches /api/server-providers on mount, merges into settings store
13. Create UserProfile component: expandable pill with avatar picker (12 built-in + custom upload), nickname editor, bio textarea
14. Create GeneratingProgress component: step-by-step progress during classroom generation
15. Create OutlinesEditor: edit generated outlines before scene generation (reorder, delete, rename)
16. Add all CSS animations, transitions, hover states, dark mode variants
17. Verify: full build succeeds, all routes render, settings persist, i18n switches correctly
```
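For the translation files in step 5, a lookup helper with a fallback chain keeps missing keys visible instead of rendering blank strings. A hypothetical sketch (the project may use an i18n library with its own resolver instead):

```ts
type Dict = Record<string, string>;

// Resolve a key: active locale first, then the fallback locale (e.g. English),
// and finally the key itself so untranslated strings are easy to spot.
function t(key: string, locale: Dict, fallback: Dict): string {
  return locale[key] ?? fallback[key] ?? key;
}
```

With 1500+ keys per locale, the key-as-last-resort behavior doubles as a cheap audit: any literal `chat.something` visible in the UI is a missing translation.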

---

## 🔧 Key Technical Decisions

| Decision | Rationale |
|----------|-----------|
| **Zustand over Redux** | Simpler API, better TypeScript support, no boilerplate, built-in persist middleware |
| **Dexie over raw IndexedDB** | Type-safe queries, promise-based API, versioned migrations, compound indexes |
| **Vercel AI SDK** | Unified streaming interface across 15+ providers, built-in tool calling, thinking support |
| **LangGraph for orchestration** | Stateful graph execution, conditional routing, streaming writer API, battle-tested |
| **ProseMirror over Slate/TipTap** | Lower-level control needed for PowerPoint-compatible rich text, custom schema/marks |
| **Vite over Next.js (conversion)** | Client-only SPA, no SSR needed, faster builds, simpler deployment |
| **File-based prompts** | Version-controllable, composable via snippets, conditional blocks for feature flags |
| **ActionEngine pattern** | Unified sync/async action execution, same types for live streaming and playback |
| **IndexedDB for everything** | Offline-first, large blob storage (audio/images), no server dependency for user data |
| **iframe sandbox for widgets** | Security isolation for LLM-generated HTML/JS, postMessage for controlled communication |