# Exploration

Ideas, technology research, and ad-hoc investigation notes. This is a scratchpad -- content here is not system-of-record.

**Diataxis type:** Exploration (learning-oriented, not yet distilled)

## What Goes Here

- Technology evaluations and comparisons
- Prototype findings
- External API exploration
- Performance investigations
- Ideation and backlog notes

## What Does NOT Go Here

- Durable learnings (go to `docs/learnings/`)
- Design decisions (go to `docs/design-docs/`)
- Implementation specs (go to `specs/`)
- Operational how-to guides (go to `docs/guides/`)

## Exploration Index

| Topic | Type | Date | Summary |
|-------|------|------|---------|
| [grpo-collapse-analysis.md](grpo-collapse-analysis.md) | Investigation | 2026-04 | Post-mortem on Qwen3-1.7B GRPO collapse into degenerate null-argument tool calls |
| [grpo-plateau-plan.md](grpo-plateau-plan.md) | Investigation | 2026-04 | Interventions to push past 30-40% accuracy plateau in GRPO training |
| [grpo-training-session-log.md](grpo-training-session-log.md) | Investigation | 2026-04 | Running log of SFT warmup + GRPO training sessions on Colab L4 |
| [rl-vs-icl-research.md](rl-vs-icl-research.md) | Comparison | 2026-04 | When GRPO training adds value over pure prompting for small SQL agents |
| [train-grpo-walkthrough.md](train-grpo-walkthrough.md) | Prototype | 2026-04 | Step-by-step companion guide for train_grpo.ipynb |

## Types

- **Tech Eval:** Evaluating a library, framework, or service
- **Prototype:** Findings from exploratory prototyping
- **Investigation:** Deep dive into a specific problem
- **Comparison:** Side-by-side analysis of options

## Graduating Content

When exploration produces durable insights:

1. Extract patterns to `docs/learnings/.md`
2. Create reference files in `docs/references/` for agent context
3. Create how-to guides in `docs/guides/` for operational procedures