# SentinelBrain Next Training Phase Plan
Created: 2026-05-03
## Current Result
The v2 realignment completed 5,000 steps and preserved full optimizer/progress state, but executable-code quality is not yet acceptable:
- `frankenstein_v2_best.pt`: 0/8 Pass@1, 62.5% syntax rate.
- `frankenstein_v2_final.pt`: 0/8 Pass@1, 75.0% syntax rate.
- `sentinelbrain_pretrain_step2471_hf.pt`: 0/8 Pass@1, 87.5% syntax rate.
The next phase should not be another broad corpus realignment. It should be a narrow, measurable SFT and auto-critic loop focused on producing valid, executable assistant outputs.
## Phase 3 Objective
Recover and improve instruction-following and code-generation behavior while preserving the useful realignment progress.
Primary gates:
- Python stub benchmark: at least 40% Pass@1 and a 95% syntax rate before extending past 1,000 steps.
- MBPP/HumanEval sample: measurable improvement in every eval window, with no syntax regression.
- Chat format probe: responses must use the requested format and stop cleanly.
- Safety/data probe: no leaked secrets, no private-key blocks, no repetitive boilerplate.
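The first gate above reduces to a simple predicate. This is an illustrative sketch, not an existing script; `passes_gate` and its argument names are assumptions:

```python
def passes_gate(pass_at_1: float, syntax_rate: float) -> bool:
    """Gate for extending past 1,000 steps: >=40% Pass@1 and >=95% syntax rate."""
    return pass_at_1 >= 0.40 and syntax_rate >= 0.95
```

Note that all three current checkpoints fail this gate on Pass@1 alone (0/8).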
## Data Mix
Use the cleaned SFT dataset at `/mnt/scratch/datasets/combined/sft_combined_ready.jsonl` as the base, then rebalance before training:
- 45% executable Python and TypeScript tasks: HumanEval-style stubs, MBPP-style prompts, unit-test repair, CLI scripts, API handlers.
- 20% code editing and diff output: unified diffs, bug fixes, refactors, failing-test-to-patch examples.
- 15% tool-use and agent workflows: file search, terminal commands, deployment diagnostics, function-call JSON.
- 10% system/admin/devops: Linux, Docker, nginx, pm2, Azure, SSH, logs.
- 10% general instruction/chat: concise natural language, summarization, planning.
Avoid over-weighting audio/prose/OCR rows in this phase. Keep those for a later multimodal run after chat/code behavior is stable.
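The rebalancing above can be enforced with weighted category sampling at batch-construction time. A minimal sketch; the category names are illustrative labels, not fields that exist in the JSONL:

```python
import random

# Target Phase 3 mix (weights from the list above); category names are placeholders.
MIX = {
    "executable_code": 0.45,
    "code_edit_diff": 0.20,
    "tool_use_agent": 0.15,
    "sysadmin_devops": 0.10,
    "general_chat": 0.10,
}

def sample_category(rng: random.Random) -> str:
    """Draw one training-example category according to the target mix."""
    return rng.choices(list(MIX), weights=list(MIX.values()), k=1)[0]
```

Keeping the weights in one dict makes it easy to assert they sum to 1.0 before a run starts.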
## Auto-Critic Pipeline
For each generated training candidate:
1. Normalize to a strict prompt/response or ChatML schema.
2. Run syntax checks for code outputs.
3. Run unit tests when a test harness is available.
4. Score format compliance: required function names, JSON validity, diff parseability, stop tokens.
5. Reject outputs with repetition, generic filler, missing entry points, invalid tokens, or secret-like strings.
6. Keep only examples that pass the critic or have a repair trajectory showing the failed output and corrected output.
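Three of the critic checks (syntax, secret scanning, repetition) are cheap to sketch in pure Python. The secret patterns and repetition window below are assumptions for illustration, not the production rule set:

```python
import ast
import re

# Assumed patterns: PEM private-key headers and inline API-key assignments.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S{16,}"),
]

def syntax_pass(code: str) -> bool:
    """True if the candidate parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def no_secret(text: str) -> bool:
    """True if no secret-like string is present."""
    return not any(p.search(text) for p in SECRET_PATTERNS)

def no_repetition(text: str, window: int = 30) -> bool:
    """Reject if any `window`-char chunk repeats back-to-back (assumed window size)."""
    for i in range(len(text) - 2 * window):
        if text[i:i + window] == text[i + window:i + 2 * window]:
            return False
    return True
```

Unit-test execution (step 3) would additionally need sandboxing, which is out of scope for this sketch.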
Recommended critic labels:
- `syntax_pass`
- `tests_pass`
- `entrypoint_match`
- `format_pass`
- `no_secret`
- `no_repetition`
- `accepted`
Train on accepted final answers plus curated repair traces, not raw failed generations.
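The label set above can travel with each example as a small record, with `accepted` derived rather than stored independently. A sketch; the field names come straight from the label list, but the record type itself is hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass
class CriticResult:
    """Per-example critic labels; `accepted` is derived from the individual checks."""
    syntax_pass: bool
    tests_pass: bool
    entrypoint_match: bool
    format_pass: bool
    no_secret: bool
    no_repetition: bool

    @property
    def accepted(self) -> bool:
        # Accept only if every individual check passed.
        return all(asdict(self).values())
```

Deriving `accepted` avoids the label drifting out of sync with its component checks.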
## Training Schedule
Start from `sentinelbrain_pretrain_step2471_hf.pt` or `frankenstein_v2_best.pt` only after a short format probe. If `frankenstein_v2_best.pt` continues to emit invalid boilerplate, use the pretrain anchor and reintroduce v2 weights later via low-LR distillation.
Suggested run:
- Steps 0-250: frozen experts; train attention/router/norms and output behavior at low LR.
- Steps 250-1,000: unfreeze selected layers; SFT on the executable/code-repair mix.
- Eval every 250 steps; save model-only checkpoints and full optimizer checkpoints at each gate.
- Stop automatically if the syntax rate drops for two consecutive evals or Pass@1 does not improve within 750 steps.
- Continue to 3,000-5,000 steps only after hitting the 1,000-step gates.
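The freeze/unfreeze schedule above amounts to a step-indexed lookup of trainable parameter groups. A minimal sketch; the group names are placeholders for whatever substring filter selects those tensors in the actual model, not real module names:

```python
# Step ranges and the parameter groups trainable in each phase.
# Group names are placeholders, not actual SentinelBrain module names.
PHASES = [
    (0, 250, {"attention", "router", "norm", "lm_head"}),
    (250, 1000, {"attention", "router", "norm", "lm_head", "selected_experts"}),
]

def trainable_groups(step: int) -> set:
    """Return the parameter groups that should receive gradients at `step`."""
    for start, end, groups in PHASES:
        if start <= step < end:
            return groups
    return set()  # past step 1,000: re-check the gates before continuing
```

In a real loop this would drive `requires_grad` toggling per named parameter at each phase boundary.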
## Benchmark Set
Always run benchmarks on the MI300X server:
- Local 8-task smoke benchmark for quick regression checks.
- HumanEval subset and MBPP subset with executable tests.
- JSON/function-call validity suite.
- Unified-diff parse/apply suite.
- DevOps shell-command reasoning suite.
Persist every benchmark JSON under `/mnt/scratch/benchmark_results/` and copy selected reports into the HF release folder.
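Persisting every run can be as simple as writing one timestamped JSON per benchmark. A sketch; the `bench_<timestamp>.json` naming scheme is an assumption, not an existing convention:

```python
import json
import time
from pathlib import Path

def persist_benchmark(results: dict, out_dir: str = "/mnt/scratch/benchmark_results") -> Path:
    """Write one timestamped benchmark report as JSON and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: bench_YYYYMMDD_HHMMSS.json
    path = out / f"bench_{time.strftime('%Y%m%d_%H%M%S')}.json"
    path.write_text(json.dumps(results, indent=2, sort_keys=True))
    return path
```

Sorted keys keep the reports diff-friendly when copying selected ones into the HF release folder.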
## Operational Rules
- Keep full optimizer checkpoints on server/Azure, not the local PC.
- Store model-only best checkpoints separately for HF and chat loading.
- Never expose chat using a checkpoint that fails the format probe unless it is clearly labeled experimental.
- Use `SENTINEL_WEB_CHAT_DISABLED=0` only after confirming no training job is active and VRAM has enough headroom.
- Prefer `frankenstein_v2_best.pt` for validation-loss experiments, but prefer the pretrain anchor for code SFT if code probes remain better there.