# Feature Demo: F006 — GRPO Training Pipeline

> **Generated:** 2026-03-28T07:42:55Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F006](FEATURES.json)

---

## What This Feature Does

This feature gives you a single notebook workflow to train an SQLEnv policy with GRPO, then compare behavior before and after training. The user-facing goal is simple: run one notebook and see whether the trained policy explores the database more strategically than a random baseline.

From a user perspective, success means the workflow is reproducible, the learning signal is visible, and the random-vs-trained comparison is easy to inspect in one place.

---

## What Is Already Proven

### Verified in This Demo Run

- Confirmed the training extra can import TRL GRPO classes locally (`trl-grpo-import-ok`).
- Ran the error-handling unit suite (`6 passed`), covering model-load failure, question-load failure modes, OOM guidance, and parse-fallback logging behavior.
- Ran the notebook-oriented E2E smoke suite (`5 passed`), covering structure, difficulty filtering, training step execution, and transcript generation.
- Ran the integration suite (`2 passed`), covering the rollout + reward flow and unparseable-action recovery.
- Attempted to launch the notebook UI; the local environment currently lacks a `jupyter` binary (captured below).

### Previously Verified Evidence

- `FEATURES.json` (F006) records independent verification as **68/68 tests passed** with verifier result `approved` at `2026-03-28T07:37:20Z`.
- Implementation spec Section 7 records the full verification command passing and a prior TRL import check.

---

## What Still Needs User Verification

- Open and run `notebooks/train_grpo.ipynb` interactively on a machine with Jupyter available.
- Validate the visual learning curve in the notebook output.
- Validate side-by-side transcript quality (random vs trained) with your preferred model/runtime.
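Since the remaining items all depend on a working Jupyter install, a quick preflight check can save a failed launch. A minimal stdlib-only sketch (the helper name is illustrative, not part of the project):

```python
import shutil


def notebook_runtime_status() -> str:
    # Preflight: report whether a `jupyter` binary is on PATH before
    # attempting `uv run jupyter notebook notebooks/train_grpo.ipynb`.
    if shutil.which("jupyter") is None:
        return "jupyter-missing"
    return "jupyter-ok"


print(notebook_runtime_status())
```

If this prints `jupyter-missing`, install Jupyter into the environment first (e.g. alongside the `training` extra) before attempting the interactive steps.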
---

## Quickstart / Verification Steps

> Run these commands to see the feature in action:

```bash
uv sync --extra training
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

If you want the interactive notebook UI, install Jupyter in your environment first.

---

## Live Local Proof

### Attempt to Launch the Training Notebook UI

This is the user-facing entrypoint described in the spec.

```bash
uv run jupyter notebook "notebooks/train_grpo.ipynb" --no-browser --port 8899
```

```
error: Failed to spawn: `jupyter`
  Caused by: No such file or directory (os error 2)
```

What to notice: the notebook launch path is correct, but this environment does not currently have Jupyter installed, so interactive verification is handed off to the user.

### Verify GRPO Training Dependencies Resolve Locally

```bash
uv run --extra training python -c "from trl import GRPOConfig, GRPOTrainer; print('trl-grpo-import-ok')"
```

```
trl-grpo-import-ok
```

What to notice: the TRL GRPO surface required by the notebook is available in this environment when using the `training` extra.

---

## Existing Evidence

- Source: `specs/FEATURES.json` (F006.verification_evidence)
- `tests_run: 68`, `tests_passed: 68`, `verifier_result: approved`
- Command recorded: `uv run --with pytest pytest tests/unit/test_grpo_config.py tests/unit/test_prompts.py tests/unit/test_rollout.py tests/unit/test_rewards.py tests/unit/test_error_handling.py tests/integration/test_training_pipeline.py tests/e2e/test_training_e2e.py -v`

---

## Manual Verification Checklist

1. Install the notebook runtime (`jupyter`) and training deps (`uv sync --extra training`).
2. Launch the notebook: `jupyter notebook notebooks/train_grpo.ipynb`.
3. Run all cells end-to-end.
4. Confirm training completes without runtime errors.
5. Confirm the reward/learning curve is rendered.
6. Confirm the random vs trained transcript comparison appears and is readable.
7. Confirm model artifacts are written to the configured output directory.

---

## Edge Cases Exercised

### Error-path handling (bad model, missing/invalid questions, parse fallback)

```bash
uv run --with pytest pytest tests/unit/test_error_handling.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpA8Pzif/bin/python
collecting ... collected 6 items

tests/unit/test_error_handling.py::test_model_load_error_bad_name PASSED [ 16%]
tests/unit/test_error_handling.py::test_question_load_missing_file PASSED [ 33%]
tests/unit/test_error_handling.py::test_question_load_empty_file PASSED [ 50%]
tests/unit/test_error_handling.py::test_question_load_invalid_json PASSED [ 66%]
tests/unit/test_error_handling.py::test_oom_guidance PASSED [ 83%]
tests/unit/test_error_handling.py::test_action_parse_fallback_logged PASSED [100%]

============================== 6 passed in 4.68s ===============================
```

Why this matters: this verifies that the most important failure modes fail clearly instead of silently.

### Unparseable action recovery in integration flow

```bash
uv run --with pytest pytest tests/integration/test_training_pipeline.py -v
```

```
============================= test session starts ==============================
platform darwin -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/hjerp/.cache/uv/builds-v0/.tmpn3aEqJ/bin/python
collecting ... collected 2 items

tests/integration/test_training_pipeline.py::test_training_pipeline_flow_with_reward_functions PASSED [ 50%]
tests/integration/test_training_pipeline.py::test_unparseable_action_recovers_and_episode_continues PASSED [100%]

============================== 2 passed in 3.87s ===============================
```

Why this matters: malformed model output does not crash the episode loop; training can continue.

### Verification command mismatch in this environment (`--timeout` flag)

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v --timeout=300
```

```
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --timeout=300
  inifile: /Users/hjerp/Projects/sql-env/pyproject.toml
  rootdir: /Users/hjerp/Projects/sql-env
```

Why this matters: the spec-listed command assumes timeout-plugin support; a local fallback without `--timeout` was required.

---

## Test Evidence (Optional)

> Supplementary proof that the feature works correctly across all scenarios.
> The Live Demo section above shows how to use the feature; this section shows it was tested.

| Test Suite | Tests | Status |
|---|---|---|
| Error handling unit tests | 6 | All passed |
| E2E training notebook smoke tests | 5 | All passed |
| Integration training pipeline tests | 2 | All passed |

Representative command (run in this demo):

```bash
uv run --with pytest pytest tests/e2e/test_training_e2e.py -v
```

Result summary:

```
5 passed in 3.83s
```

---

## Feature Links

- Implementation spec: `specs/F006-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F006-VERIFICATION_SPEC.md`

---

*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F006` to refresh.*
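Addendum: the `--timeout` mismatch noted above can be guarded against by probing for the `pytest-timeout` plugin before appending the flag. A minimal sketch (the helper name is illustrative; note it probes the current interpreter's environment, so it is only indicative when `uv run` resolves to the same environment):

```python
import importlib.util
import shlex

BASE_CMD = "uv run --with pytest pytest tests/e2e/test_training_e2e.py -v"


def e2e_command(timeout_seconds: int = 300) -> list[str]:
    # Append --timeout only when the pytest-timeout plugin (which provides
    # the flag) is importable; otherwise fall back to the plain invocation,
    # as this demo run had to.
    cmd = shlex.split(BASE_CMD)
    if importlib.util.find_spec("pytest_timeout") is not None:
        cmd.append(f"--timeout={timeout_seconds}")
    return cmd


print(" ".join(e2e_command()))
```

Alternatively, pulling the plugin in ad hoc (`uv run --with pytest --with pytest-timeout pytest … --timeout=300`) lets the spec-listed command run unmodified.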