# Surrogate-1 Feature Roadmap

**Updated**: 2026-04-28

**Status legend**: ✅ shipped │ 🚧 in progress │ ⏳ planned │ 💡 idea

---

## 🟢 Already Shipped (Foundation)

### Pipeline (parallel orchestrate)

- ✅ 6-stage chain: SA → [Architect ∥ QA-TDD] → DEV → [QA-Verify ∥ OPS] → Reviewer
- ✅ Direct LLM call (skips the broken tool-loop)
- ✅ Marker extraction → real code blocks → real files in cwd
- ✅ Auto-commit + git push on APPROVE
- ✅ 12-rung LLM ladder (Cerebras / Groq / Gemini × 2 / Samba / GH Models / Chutes / OR × 2 / **HF Router × 4**)

### Data + Knowledge

- ✅ 26 public datasets covering all SDLC domains
- ✅ Training-pair feedback loop (every stage → ~/.surrogate/training-pairs.jsonl → HF dataset every 3 min)
- ✅ Web research preamble (DDG search → context for PRD/orchestrate)
- ✅ Agentic crawler (URL frontier + visited stamps + BFS link discovery, 6 workers)
- ✅ Skill synthesis daemon (3-min cycles → ~/.surrogate/skills/{cat}/SKILL.md)
- ✅ Continuous scrape (8 workers, 5-30s cool-down)

### Models (Ollama on HF)

- ✅ qwen3-coder:30b-a3b (primary, 16GB MoE)
- ✅ devstral:24b (Mistral SWE-agent, 53.6% SWE-bench)
- ✅ qwen2.5-coder:14b (fallback)
- ✅ yi-coder:9b (128k context)
- ✅ nomic-embed-text (RAG embeddings)

### Agent Roster (19 SDLC experts)

- ✅ solution-architect, tech-architect (design)
- ✅ dev-frontend, dev-backend, dev-mobile, dev-fullstack, dev-database (impl)
- ✅ qa-engineer, qa-perf, qa-security (test)
- ✅ devops, sre, cloud-architect (infra)
- ✅ devsecops, cloud-security (security)
- ✅ data-engineer, ml-engineer (data/ML)
- ✅ tech-writer, reviewer (docs/gate)

### Infrastructure

- ✅ HF Space (free CPU tier, 16GB) running 24/7
- ✅ /data persistent volume (state + logs + memory + skills + sessions + training-pairs)
- ✅ Backward-compat symlinks (~/.claude/* → ~/.surrogate/*)
- ✅ Mac CLI cleaned up (only 20 essential files kept, 118 daemons archived)
- ✅ Status server: /, /health,
  /logs/{name}, /logs-list

---

## 🔴 Must-Have (next 30 days)

### Reliability + Observability

1. ⏳ Heartbeat alarm → Discord webhook if the HF Space is down for >5 min
2. ⏳ Auto-retry on transient errors (provider 429/503 → wait + retry on the next rung)
3. ⏳ Cost meter per stage (tokens × $/1M, alert at >$1/day)
4. ⏳ Regression test suite (run nightly: orchestrate test fixtures, expect APPROVE)
5. ⏳ Dataset upload deduplication (md5 of slice → skip if identical to the last upload)
6. ⏳ Token-pool health check (rotate to the next token on 429)
7. ⏳ Disk usage alert (>80% of /data → clean up the oldest scrape state)
8. ⏳ Memory leak watchdog (kill and restart any daemon whose RSS exceeds 1.5GB)
9. ⏳ Crash recovery (auto-resume the cron loop on SIGCHLD)
10. ⏳ Weekly snapshot of the scrape ledger to an HF dataset

### PRD + Project bootstrap

11. ⏳ Claude Projects-style PRD wizard (single description input → auto-extract → 1-3 follow-ups → PRD)
12. ⏳ PRD template library (web app / API / CLI / mobile / data pipeline / ML)
13. ⏳ Auto-detect existing repo → reverse-engineer surrogate.md
14. ⏳ PRD versioning (v1, v2 with diff)
15. ⏳ "Spec mode": refine the PRD interactively before writing any code

### Pipeline quality

16. ⏳ Self-critique loop (after DEV: model A reviews model B's output → re-run DEV if NEEDS-WORK)
17. ⏳ Regression tests on touched files (re-run existing tests)
18. ⏳ Lint + type-check + security scan in the pipeline (ruff, mypy, semgrep)
19. ⏳ Diff approval UI (show changes before commit, especially in yolo mode)
20. ⏳ Search-replace block edits (Aider-style, less risky than full-file rewrites)

### Domain expert routing

21. ⏳ Auto-route the DEV stage to a specialist (frontend/backend/mobile/iac) based on task keywords
22. ⏳ Multi-specialist parallel work (e.g., backend API + frontend UI in the same task → spawn both)
23. ⏳ Specialist-specific eval (frontend agent → check WCAG; backend → check for N+1 queries)

### Memory + Context

24. ⏳ Episodic memory (retrieve from the last 50 sessions for similar tasks)
25.
โณ Procedural memory (how-to library auto-generated from successful runs) 26. โณ Project context cache (surrogate.md + repo-map persisted across sessions) 27. โณ Cross-project pattern share (skill from project A โ†’ applicable to project B) 28. โณ Long-term retention (key decisions โ†’ ADR auto-generation) ### Self-improvement loop 29. โณ Reflexion lessons โ†’ injected into next-similar-task prompt 30. โณ Failed orchestrate โ†’ root-cause analysis โ†’ improvement queue 31. โณ Weekly LoRA fine-tune trigger (on accumulated training pairs, autotrain) 32. โณ A/B test prompts (variant A vs B, pick winner by APPROVE rate) 33. โณ Voyager-style skill crystallization (pattern repeated 3+ times โ†’ permanent skill) ### Datasets + Training 34. โณ SRE postmortem corpus (scrape danluu/post-mortems โ†’ ~600 incident โ†’ instruction-pair) 35. โณ AWS Well-Architected synthetic Q/A (PDFs โ†’ distilabel pipeline โ†’ 5k pairs) 36. โณ Internal axentx code โ†’ instruction pairs (commit messages + diffs) 37. โณ Training pair quality scoring (filter low-quality before HF upload) 38. โณ DPO preference pairs from reviewer (chosen/rejected from REWORK cycles) 39. โณ Synthetic ADR generation (real OSS examples โ†’ expand via distilabel) ### Tools + Integrations 40. โณ MCP client support (Claude Desktop schema โ€” connect external tools) 41. โณ ToolSearch lazy-load (don't blow context on full tool list) 42. โณ Constitutional Critic from ~/.surrogate/agents/roster.json (auto-load) 43. โณ Repo-map context (tree-sitter symbol graph โ†’ smarter file selection) 44. โณ Tool-call traces saved as training data (every tool use โ†’ pair) ### Security + Safety 45. โณ Secret-scan pre-commit hook (gitleaks integration) 46. โณ Rate limit per-IP (HF Space /chat endpoint) 47. โณ Allowlist/denylist for git push (don't push to main without flag) 48. โณ PII scrubber for training pairs (remove emails, IPs, names before upload) 49. 
โณ Sandbox tool execution (no rm -rf, no curl |sh, no destructive ops) 50. โณ Audit log for every orchestrate run (who/what/when/result) ### Multi-modal + I/O 51. โณ Voice input (Whisper transcribe โ†’ surrogate) 52. โณ Image input (architectural diagrams โ†’ analysis) 53. โณ Screen recording โ†’ video โ†’ tutorial agent 54. โณ Discord voice channel (TTS responses) ### CLI UX 55. โณ /resume (continue past session) 56. โณ /diff (show pending changes before commit) 57. โณ /undo (rollback last orchestrate via git stash) 58. โณ /share (publish session as gist for review) 59. โณ Tab autocomplete for slash commands 60. โณ Cost-meter live in statusline (running $ this session) ### Cloud / multi-region 61. โณ Mirror to Cloudflare Workers AI (free tier backup) 62. โณ Egress whitelist for Discord on HF Pro tier 63. โณ HF Space upgrade auto-scale (when load > 80%) 64. โณ Backup strategy: weekly snapshot of /data โ†’ HF dataset ### Codebase intelligence 65. โณ Symbol search (tree-sitter index, not just text grep) 66. โณ Cross-file refactor (rename across project safely) 67. โณ Type-aware code completion (LSP integration) 68. โณ Dead code detection (vulture, ts-prune) 69. โณ Dependency graph viz (per-project) ### Training data flywheel 70. โณ Trace storage on HF (axentx/surrogate-1-traces dataset) 71. โณ Auto-tag training pairs by domain (frontend/backend/etc) 72. โณ Quality gate before training pair upload (โ‰ฅ N tokens, well-formed) 73. โณ Weekly eval on SWE-bench-Lite (track improvement) 74. โณ DPO data generation (REWORK cycles โ†’ preference pairs) ### Discord + notifications 75. โณ Discord webhook for every commit (axentx repo notifications) 76. โณ Daily digest webhook (commits + pairs + scrape stats) 77. โณ Failure alerts (orchestrate fail โ†’ ping) 78. โณ Slash commands `/orchestrate "task"` from Discord ### HF integrations 79. โณ TEI server (text-embeddings-inference) for RAG 80. 
โณ TGI server (text-generation-inference) for self-hosted LLM 81. โณ autotrain weekly LoRA on training pairs 82. โณ HF Inference Providers as primary (paid bypass) 83. โณ HF Spaces gradio UI (visualize chain status) ### Agent quality 84. โณ Specialist eval per agent (e.g., dev-backend on RealWorld benchmark) 85. โณ Multi-model consensus on critical decisions (architecture, security) 86. โณ Constitutional rules (no hard-coded secrets, validate input) 87. โณ Tool use tracking per agent (which tools each agent calls) 88. โณ Persona consistency check (review for tone/style mid-thread) ### Project management 89. โณ Burndown chart per surrogate.md plan 90. โณ Story-point estimation from PRD 91. โณ Auto-create GitHub issues from `- [ ]` plan items 92. โณ PR description auto-write from commit list 93. โณ Sprint retrospective auto-summary ### Performance 94. โณ Profile + optimize orchestrate cycle time (target < 90s p50) 95. โณ Streaming responses (LLM tokens flow live, don't wait for full) 96. โณ Local cache for repeated identical prompts 97. โณ Parallel model calls (race fastest-first, kill rest) 98. โณ Edge inference (qwen3-coder on Cerebras WaferScale via API) ### Compliance + Governance 99. โณ License audit per file generated (OSS license compatibility) 100. โณ Commit signing (gpg/sigstore) --- ## ๐Ÿ’ก Nice-to-Have (future) ### Multi-agent collaboration 1. ๐Ÿ’ก MoA (Mixture of Agents) โ€” 3 LLMs propose, judge picks best 2. ๐Ÿ’ก Debate mode (2 agents argue, third synthesizes) 3. ๐Ÿ’ก Tournament-style code review (3 reviewers, majority verdict) 4. ๐Ÿ’ก Hierarchical agents (manager โ†’ workers โ†’ reporter) 5. ๐Ÿ’ก Autonomous research squad (3 agents split topics, merge findings) ### UI / UX 6. ๐Ÿ’ก Web dashboard (real-time pipeline status, training pair count, model health) 7. ๐Ÿ’ก VSCode extension (`surrogate /auto` from editor) 8. ๐Ÿ’ก IntelliJ plugin 9. ๐Ÿ’ก Mobile app (iOS/Android) for on-the-go orchestrate 10. 
    💡 Apple Watch glance (current task status)

### Voice + Audio

11. 💡 Whisper realtime transcription
12. 💡 ElevenLabs TTS for status reports
13. 💡 Daily audio briefing podcast
14. 💡 Voice clone of the user for replies

### Visual

15. 💡 Architecture diagram auto-generation (mermaid → SVG)
16. 💡 Dependency graph live render
17. 💡 Heat map of code changes per file
18. 💡 3D codebase visualization (gource-style)

### Integrations

19. 💡 Linear / Jira sync (pull tickets, update status)
20. 💡 Slack bot
21. 💡 Microsoft Teams bot
22. 💡 Notion sync (PRD ↔ Notion page)
23. 💡 Figma plugin (design → code via DEV agent)
24. 💡 Storybook integration (component dev)
25. 💡 Sentry integration (errors → fix queue)
26. 💡 PagerDuty integration (incident → SRE agent)
27. 💡 GitHub Copilot bridge (delegate complex tasks to Surrogate)
28. 💡 Cursor IDE integration

### ML / Self-improvement

29. 💡 RLHF from APPROVE/REWORK signals
30. 💡 RLAIF (AI feedback on agent outputs)
31. 💡 Continual pre-training on the axentx code corpus
32. 💡 Distillation (qwen-coder-30B → 7B for edge)
33. 💡 Quantization-aware fine-tuning
34. 💡 Speculative decoding for faster inference
35. 💡 Mixture-of-experts custom training

### Datasets

36. 💡 Real-time scrape of GitHub trending (every 1h)
37. 💡 Scrape Hacker News top stories daily
38. 💡 Scrape Reddit r/programming weekly
39. 💡 Scrape Twitter dev threads (X API tier 1 = $100/mo, skip)
40. 💡 Curated YouTube transcripts (developer talks, RustConf, KubeCon)
41. 💡 Scrape arxiv-sanity for AI papers
42. 💡 Crawl AWS/GCP/Azure docs nightly
43. 💡 PR diff archive (axentx's own PRs as training data)
44. 💡 Stack Overflow accepted answers (filtered from the data dump)
45. 💡 GitHub issue resolutions (closed issue → PR linkage)

### Cloud / Deployment

46. 💡 Multi-region HF Spaces (ap-southeast + us-east + eu-west)
47. 💡 K8s deployment manifests (move beyond HF when scale demands)
48.
    💡 Kubernetes operator for axentx orchestration
49. 💡 Lambda@Edge for global low-latency inference
50. 💡 IPFS publishing of PRDs (decentralized)

### Privacy + Security

51. 💡 E2E encryption for Discord chat
52. 💡 Air-gapped mode (Mac-only, no cloud)
53. 💡 Federated learning (multiple users contribute, no central data)
54. 💡 Zero-knowledge proofs for code provenance
55. 💡 Confidential computing (Intel SGX) for sensitive code
56. 💡 GDPR compliance toolkit (PII scrub, right-to-delete)
57. 💡 SOC 2 Type II readiness checklist
58. 💡 ISO 27001 audit prep

### Specialty agents

59. 💡 Compiler engineer (LLVM, optimization passes)
60. 💡 Embedded systems (microcontroller code, real-time)
61. 💡 Game dev (Unity, Unreal, Godot)
62. 💡 Blockchain (Solidity, smart contracts, security)
63. 💡 Quantum computing (Qiskit, circuits)
64. 💡 Robotics (ROS, motion planning)
65. 💡 Bioinformatics (BLAST, sequence analysis)
66. 💡 Quantitative finance (backtesting, risk)
67. 💡 Climate modeling
68. 💡 Legal tech (contract review)

### Education

69. 💡 Teach mode (explain decisions step-by-step for learners)
70. 💡 Pair programming mode (turn-taking with the user)
71. 💡 Code review school (annotated learning examples)
72. 💡 Daily challenge generator (LeetCode-style, personalized)
73. 💡 Concept explainer (DDD, hexagonal, CAP theorem on demand)

### Productivity

74. 💡 Calendar integration (block focus time when in flow)
75. 💡 Pomodoro mode
76. 💡 Energy/mood tracker (suggest a break when fatigued)
77. 💡 Distraction blocker (no Twitter while Surrogate is active)
78. 💡 Focus music generator (lo-fi via Suno API)

### Emerging tech

79. 💡 ASI safety guardrails (following Anthropic's Constitutional AI)
80. 💡 World model simulation (test ideas in a synthetic environment)
81. 💡 Causal reasoning (vs correlation)
82. 💡 Theorem prover integration (Lean, Coq for verified code)
83. 💡 Differential privacy in training
84.
    💡 Explainable AI for code reviews

### Localization

85. 💡 Thai-native pipeline (code and comments in Thai)
86. 💡 Japanese, Korean, Chinese support
87. 💡 RTL languages (Arabic, Hebrew)
88. 💡 Thai-fluent local LLM (typhoon, openthaigpt)
89. 💡 Cultural code review (idioms per locale)

### Marketing + community

90. 💡 Public Surrogate-1 demo Space (read-only)
91. 💡 Twitter bot posting daily Surrogate-1 wins
92. 💡 GitHub Discussions for the community
93. 💡 Discord server for users
94. 💡 Newsletter (weekly improvements)
95. 💡 Blog (axentx engineering)

### Speculative

96. 💡 Surrogate-2 (fully local inference, no cloud dependency)
97. 💡 Custom silicon (FPGA optimized for qwen-coder)
98. 💡 BCI integration (Neuralink-style direct intent)
99. 💡 Physical robot (Boston Dynamics + Surrogate brain)
100. 💡 ASI alignment research collaboration

---

## Current Cadence (auto-running on HF)

| Task | Frequency | Status |
|---|---|---|
| Continuous scrape | 8 workers, 5-30s cool-down | ✅ |
| Agentic crawler | 6 workers, BFS frontier | ✅ |
| Skill synthesis | every 3 min | ✅ |
| surrogate-dev-loop | every 2 min | ✅ |
| work-queue producer | every 5 min | ✅ |
| training-pair push to HF | every 3 min | ✅ |
| auto-orchestrate-loop | every 20 min | ✅ |
| research-apply | every 30 min | ✅ |
| keyword tuner | every 60 min | ✅ |
| research-loop | every 6h | ✅ |
| dataset-enrich | every 12h | ✅ |

## Verified working (2026-04-28)

- 5 commits to the HF dataset in 12 min (~4047 pairs uploaded)
- Pipeline produces real Python/Go code with DDD patterns
- Reviewer issues APPROVE / REWORK / REJECT verdicts
- Training feedback loop is closed (every stage → HF)
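## Sketch: ladder fall-through on transient errors (planned)

Must-Have items 2 and 6 describe a policy for the LLM ladder: when a provider answers 429 or 503, wait and retry on the next rung. The Python below is a minimal sketch of that policy under stated assumptions, not the shipped ladder code. `Rung`, `ProviderError`, `TRANSIENT_STATUSES`, and the backoff numbers are all hypothetical; each rung is assumed to wrap its provider in a `prompt -> completion` callable that raises `ProviderError` with the HTTP status.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Statuses treated as transient in this sketch (taken from the roadmap item: provider 429/503).
TRANSIENT_STATUSES = frozenset({429, 503})


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises with the HTTP status it received."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class Rung:
    """One hypothetical ladder rung: a provider name plus a prompt -> completion callable."""

    name: str
    call: Callable[[str], str]


def call_ladder(
    rungs: Sequence[Rung],
    prompt: str,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Return (rung name, completion) from the first rung that answers.

    A transient status triggers a backoff wait and a fall-through to the next rung;
    any other status is treated as non-retryable in this sketch and re-raised unchanged.
    """
    failures: List[str] = []
    for index, rung in enumerate(rungs):
        try:
            return rung.name, rung.call(prompt)
        except ProviderError as exc:
            if exc.status not in TRANSIENT_STATUSES:
                raise
            failures.append(f"{rung.name}: {exc}")
            if index + 1 < len(rungs):
                # Exponential backoff on consecutive transient failures, capped at max_delay.
                sleep(min(max_delay, base_delay * 2 ** (len(failures) - 1)))
    raise RuntimeError("all rungs exhausted: " + "; ".join(failures))
```

Passing `sleep` in as a parameter lets the policy be exercised without real waits. Whether statuses such as 401 should also fall through (a bad key is specific to one provider) is left open here.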
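## Sketch: skipping duplicate dataset uploads (planned)

Must-Have item 5 (md5 of slice → skip if same as last) could look roughly like the Python sketch below. It assumes the slice is a list of JSONL lines taken from ~/.surrogate/training-pairs.jsonl, that the digest of the last successful upload is kept in a small state file, and that `upload` is a hypothetical callable wrapping the actual HF push. The function names and the state-file layout are illustrative, not taken from the Surrogate-1 code.

```python
import hashlib
from pathlib import Path
from typing import Callable, List


def slice_digest(lines: List[str]) -> str:
    """MD5 over the slice's lines; used only as a change detector, not for security."""
    digest = hashlib.md5(usedforsecurity=False)  # keyword needs Python 3.9+
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # keep line boundaries so ["ab"] and ["a", "b"] differ
    return digest.hexdigest()


def upload_if_changed(
    lines: List[str],
    state_path: Path,
    upload: Callable[[List[str]], None],
) -> bool:
    """Upload the slice unless its digest matches the last successful upload."""
    current = slice_digest(lines)
    if state_path.exists() and state_path.read_text(encoding="utf-8").strip() == current:
        return False  # identical to the previous upload: skip the dataset commit
    upload(lines)  # if this raises, the state file is untouched and the next cycle retries
    state_path.write_text(current, encoding="utf-8")
    return True
```

The digest is recorded only after `upload` returns, so a failed push is retried on the next 3-minute cycle instead of being silently marked as done.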