mekosotto committed
Commit
0435d80
·
1 Parent(s): 8418e43

superpowerfixes.md

docs/superpowers/plans/2026-05-02-codex-review-fixes.md ADDED
@@ -0,0 +1,502 @@
1
+ # codex Branch Review-Fix Implementation Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
+
5
+ **Goal:** Fix the 5 "Important" issues from the code review of commit `c0a7163` on the `codex` branch — silent label rename, frontend MRI shape mismatch, ambiguous BBB routing, parquet write race, and dropped out-of-stage tool calls.
6
+
7
+ **Architecture:** Surgical edits only. No refactors, no API changes, no new modules. Each task is one Important issue from the review, in order of demo risk (highest first). TDD where logic changes; direct edits where it's a comment, log line, or Streamlit widget.
8
+
9
+ **Tech Stack:** Python 3.11, pytest, pydantic v2, Streamlit (frontend), onnxruntime (MRI model), `src/core/logger.get_logger`.
10
+
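+ Tasks 1 and 5 assert on log records via `caplog.at_level(..., logger="src.models.mri_model")` and `logger="src.agents.orchestrator"`, which only works if each touched module holds a module-level logger named after its import path. A minimal sketch of the assumed convention (the factory comes from the Tech Stack line above; its exact signature is not verified here):
+
+ ```python
+ # Assumed to exist already at the top of src/models/mri_model.py and
+ # src/agents/orchestrator.py; get_logger is the project's logging factory.
+ from src.core.logger import get_logger
+
+ logger = get_logger(__name__)  # logger name equals the module path
+ ```
+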
11
+ **Branch policy:** Work on the existing `codex` branch. Each task = one commit. Run `pytest -q` after every task; do not proceed to the next task if anything regresses.
12
+
13
+ ---
14
+
15
+ ## File Map
16
+
17
+ | File | Change | Why |
18
+ |------|--------|-----|
19
+ | `src/models/mri_model.py` | Modify `predict_with_proba` (lines 96-99) | Add `logger.warning` on label/proba length mismatch |
20
+ | `tests/models/test_mri_model.py` | Add 1 test | Cover the warning path |
21
+ | `src/frontend/app.py` | Modify MRI Image Model panel (lines 1321-1361) | Replace hardcoded `[64,64,64]` with a number-input control |
22
+ | `src/agents/routing.py` | Modify `_looks_like_mri_input` (lines 65-70) | Treat existing local directory as MRI even without a slash |
23
+ | `tests/agents/test_routing.py` | Create | Cover the new MRI-dir-without-slash branch + regression for SMILES |
24
+ | `src/agents/tools.py` | Add comment near lines 105 / 132 | Document the parquet race; no behavior change |
25
+ | `src/agents/orchestrator.py` | Modify `_select_tool_calls` (lines 222-235) | `logger.info` when an out-of-stage tool call is dropped |
26
+ | `tests/agents/test_orchestrator.py` | Add 1 test | Verify the log emits |
27
+
28
+ ---
29
+
30
+ ## Task 1: MRI label-rename warning
31
+
32
+ **Files:**
33
+ - Modify: `src/models/mri_model.py:96-99`
34
+ - Test: `tests/models/test_mri_model.py` (add one method to `TestMRIDLModel`)
35
+
36
+ - [ ] **Step 1: Write the failing test**
37
+
38
+ Append this method inside `class TestMRIDLModel` in `tests/models/test_mri_model.py`:
39
+
40
+ ```python
41
+ def test_predict_warns_on_label_count_mismatch(
42
+     self, tmp_path: Path, caplog: pytest.LogCaptureFixture
43
+ ) -> None:
44
+     artifact = build_dummy_mri_onnx(tmp_path / "mri_model.onnx")
45
+     model = mri_model.load(artifact)
46
+
47
+     with caplog.at_level("WARNING", logger="src.models.mri_model"):
48
+         result = mri_model.predict_nifti(
49
+             model,
50
+             _FIXTURE_MRI,
51
+             target_shape=(8, 8, 8),
52
+             label_names=("control", "abnormal", "extra"),
53
+         )
54
+
55
+     assert result["label_text"] in {"class_0", "class_1"}
56
+     assert any(
57
+         "label_names length" in rec.message and "overriding" in rec.message
58
+         for rec in caplog.records
59
+     ), [rec.message for rec in caplog.records]
60
+ ```
61
+
62
+ - [ ] **Step 2: Run the test to verify it fails**
63
+
64
+ Run: `pytest tests/models/test_mri_model.py::TestMRIDLModel::test_predict_warns_on_label_count_mismatch -v`
65
+ Expected: FAIL — no `WARNING` record is emitted (the current code overrides labels silently).
66
+
67
+ - [ ] **Step 3: Add the warning in `predict_with_proba`**
68
+
69
+ In `src/models/mri_model.py`, replace the silent override block (around lines 96-99) with:
70
+
71
+ ```python
72
+ if len(labels) != proba.shape[0]:
73
+     logger.warning(
74
+         "label_names length (%d) does not match model output dim (%d); "
75
+         "overriding with class_0..class_N. Provided labels: %r",
76
+         len(labels),
77
+         proba.shape[0],
78
+         list(labels),
79
+     )
80
+     labels = tuple(f"class_{i}" for i in range(proba.shape[0]))
81
+ ```
82
+
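+ With the three-label call from Step 1 against the 2-class dummy model, this format string yields a record like `label_names length (3) does not match model output dim (2); overriding with class_0..class_N. Provided labels: ['control', 'abnormal', 'extra']`, which is what the test's `"label_names length"` and `"overriding"` substring checks match.
+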
83
+ - [ ] **Step 4: Run the test to verify it passes**
84
+
85
+ Run: `pytest tests/models/test_mri_model.py -v`
86
+ Expected: PASS for both the new test and the existing `test_predict_nifti_with_dummy_onnx` (which uses a 2-class dummy model with 2 labels — no warning expected).
87
+
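+ For contrast, a sketch of the matched-length call that must stay silent (fixture names reused from Step 1; that `label_text` echoes a provided label when lengths match is an assumption consistent with the override logic):
+
+ ```python
+ # Matched lengths: 2 labels for a 2-class model, so no WARNING is expected.
+ result = mri_model.predict_nifti(
+     model,
+     _FIXTURE_MRI,
+     target_shape=(8, 8, 8),
+     label_names=("control", "abnormal"),
+ )
+ assert result["label_text"] in {"control", "abnormal"}
+ ```
+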
88
+ - [ ] **Step 5: Run full suite to confirm no regression**
89
+
90
+ Run: `pytest -q`
91
+ Expected: 244 passed, 1 skipped (one more test than the 243-test baseline).
92
+
93
+ - [ ] **Step 6: Commit**
94
+
95
+ ```bash
96
+ git add src/models/mri_model.py tests/models/test_mri_model.py
97
+ git commit -m "fix(mri/model): warn when label_names length != model output dim (was silent override)"
98
+ ```
99
+
100
+ ---
101
+
102
+ ## Task 2: Frontend MRI `target_shape` control
103
+
104
+ **Files:**
105
+ - Modify: `src/frontend/app.py:1321-1361` (the `#### MRI Image Model` block inside `_render_mri_panel`)
106
+
107
+ This is a Streamlit widget and the project has no UI tests for it. Verify by skimming the edited block, importing the module (Step 2), and checking the rendered panel during Final Verification.
108
+
109
+ - [ ] **Step 1: Replace the hardcoded shape with three number inputs**
110
+
111
+ In `src/frontend/app.py`, replace this block:
112
+
113
+ ```python
114
+ mri_labels = st.text_input(
115
+     "Class labels",
116
+     "control,abnormal",
117
+     key="mri_predict_labels",
118
+ )
119
+ if st.button("Predict MRI image", key="mri_predict"):
120
+     labels = [x.strip() for x in mri_labels.split(",") if x.strip()]
121
+     payload: dict = {
122
+         "input_path": mri_image,
123
+         "target_shape": [64, 64, 64],
124
+     }
125
+ ```
126
+
127
+ with:
128
+
129
+ ```python
130
+ mri_labels = st.text_input(
131
+     "Class labels",
132
+     "control,abnormal",
133
+     key="mri_predict_labels",
134
+ )
135
+ shape_cols = st.columns(3)
136
+ target_d = shape_cols[0].number_input(
137
+     "Resize D", min_value=1, max_value=256, value=64, step=1, key="mri_predict_d"
138
+ )
139
+ target_h = shape_cols[1].number_input(
140
+     "Resize H", min_value=1, max_value=256, value=64, step=1, key="mri_predict_h"
141
+ )
142
+ target_w = shape_cols[2].number_input(
143
+     "Resize W", min_value=1, max_value=256, value=64, step=1, key="mri_predict_w"
144
+ )
145
+ st.caption(
146
+     "Defaults to 64³ for production exports. Use 8³ when testing with the "
147
+     "dummy ONNX fixture from `tests/fixtures/build_dummy_mri_onnx.py`."
148
+ )
149
+ if st.button("Predict MRI image", key="mri_predict"):
150
+     labels = [x.strip() for x in mri_labels.split(",") if x.strip()]
151
+     payload: dict = {
152
+         "input_path": mri_image,
153
+         "target_shape": [int(target_d), int(target_h), int(target_w)],
154
+     }
155
+ ```
156
+
157
+ - [ ] **Step 2: Sanity-check by importing the module**
158
+
159
+ Run: `python -c "import src.frontend.app"`
160
+ Expected: no `SyntaxError`, no `ImportError`.
161
+
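+ If the installed Streamlit ships the `streamlit.testing.v1` harness (Streamlit 1.28+), a scripted smoke check is also possible. A sketch under two assumptions: the harness is available, and the app script reaches the MRI panel without a live backend:
+
+ ```python
+ # Optional smoke check for the new widgets; not part of the plan's suite.
+ from streamlit.testing.v1 import AppTest
+
+ at = AppTest.from_file("src/frontend/app.py")
+ at.run(timeout=30)
+ for key in ("mri_predict_d", "mri_predict_h", "mri_predict_w"):
+     assert at.number_input(key=key).value == 64  # defaults from the edit above
+ ```
+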
162
+ - [ ] **Step 3: Run full suite to confirm no regression**
163
+
164
+ Run: `pytest -q`
165
+ Expected: same pass count as after Task 1 (no new tests, no test removed).
166
+
167
+ - [ ] **Step 4: Commit**
168
+
169
+ ```bash
170
+ git add src/frontend/app.py
171
+ git commit -m "feat(frontend/mri): expose target_shape as 3 number inputs (was hardcoded 64³)"
172
+ ```
173
+
174
+ ---
175
+
176
+ ## Task 3: Routing — directory-without-slash heuristic
177
+
178
+ **Files:**
179
+ - Modify: `src/agents/routing.py:65-70` (`_looks_like_mri_input`)
180
+ - Create: `tests/agents/test_routing.py` (no existing test file for this module — see verification below)
181
+
182
+ Verify first that no test file for `routing.py` exists yet. Run `ls tests/agents/`. If `test_routing.py` already exists, append the new tests to its existing test class instead of creating the file.
183
+
184
+ - [ ] **Step 1: Write the failing tests**
185
+
186
+ Create `tests/agents/test_routing.py`:
187
+
188
+ ```python
189
+ """Tests for src.agents.routing — deterministic workflow-guard fallbacks."""
190
+ from __future__ import annotations
191
+
192
+ from pathlib import Path
193
+
194
+ import pytest
195
+
196
+ from src.agents.routing import route_pipeline_input
197
+
198
+
199
+ class TestRoutePipelineInput:
200
+ def test_smiles_routes_to_bbb(self) -> None:
201
+ assert route_pipeline_input("CCO") == (
202
+ "run_bbb_pipeline",
203
+ {"smiles": "CCO", "top_k": 5},
204
+ )
205
+
206
+ def test_eeg_path_routes_to_eeg(self) -> None:
207
+ name, args = route_pipeline_input("data/raw/sample.fif")
208
+ assert name == "run_eeg_pipeline"
209
+ assert args == {"input_path": "data/raw/sample.fif"}
210
+
211
+ def test_nifti_path_routes_to_mri_with_parent_dir(self) -> None:
212
+ name, args = route_pipeline_input("data/raw/subjects/subject_0.nii.gz")
213
+ assert name == "run_mri_pipeline"
214
+ assert args["input_dir"] == "data/raw/subjects"
215
+
216
+ def test_existing_local_dir_without_slash_routes_to_mri(
217
+ self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
218
+ ) -> None:
219
+ monkeypatch.chdir(tmp_path)
220
+ (tmp_path / "subject_dir").mkdir()
221
+ name, args = route_pipeline_input("subject_dir")
222
+ assert name == "run_mri_pipeline"
223
+ assert args["input_dir"] == "subject_dir"
224
+
225
+ def test_bare_string_with_no_matching_dir_still_routes_to_bbb(
226
+ self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
227
+ ) -> None:
228
+ monkeypatch.chdir(tmp_path)
229
+ # Nothing on disk named "Aspirin" — should be treated as a SMILES-like token
230
+ name, args = route_pipeline_input("Aspirin")
231
+ assert name == "run_bbb_pipeline"
232
+ assert args == {"smiles": "Aspirin", "top_k": 5}
233
+ ```
234
+
235
+ - [ ] **Step 2: Run the tests; note whether the new MRI-dir test fails**
236
+
237
+ Run: `pytest tests/agents/test_routing.py -v`
238
+ Expected: at least 4 PASS. `test_existing_local_dir_without_slash_routes_to_mri` fails only if `_looks_like_mri_input` is gated behind a path-shape check (`_looks_like_path`) that a bare `subject_dir` does not satisfy; if its `path.exists() and path.is_dir()` branch runs unconditionally, the test passes as-is. Record the outcome, since it determines whether Step 3 needs a code change.
239
+
240
+ - [ ] **Step 3: Loosen the MRI heuristic (only if Step 2 failed)**
+
+ For reference, the current `_looks_like_mri_input` in `src/agents/routing.py` (lines 65-70) reads:
+
+ ```python
+ def _looks_like_mri_input(path: Path, lower: str) -> bool:
+     if lower.endswith(".nii.gz") or path.suffix.lower() == ".nii":
+         return True
+     if path.exists() and path.is_dir():
+         return True
+     return not path.suffix and _looks_like_path(str(path))
+ ```
+
+ Audit before editing: `_looks_like_mri_input` is called from `route_pipeline_input` for every non-EEG input, and its `path.exists() and path.is_dir()` branch does not require a slash, so the new test may already pass on this code. If Step 2 passed, this task collapses to "add the regression tests only": skip the code change and go straight to Step 4.
+
+ If Step 2 failed, the gap is in `_primary_input` (which strips quotes) or in a path-resolution detail. Debug by running:
+
+ ```python
+ from pathlib import Path
+ print(Path("subject_dir").exists(), Path("subject_dir").is_dir())
+ ```
+
+ …inside the `tmp_path` directory. The fix, if needed, is to also check `Path.cwd() / path` explicitly (for a relative `path` this resolves to the same location as `path.exists()`, so if even this does not help, look upstream at `_primary_input`):
+
+ ```python
+ def _looks_like_mri_input(path: Path, lower: str) -> bool:
+     if lower.endswith(".nii.gz") or path.suffix.lower() == ".nii":
+         return True
+     if path.exists() and path.is_dir():
+         return True
+     cwd_candidate = Path.cwd() / path
+     if cwd_candidate.exists() and cwd_candidate.is_dir():
+         return True
+     return not path.suffix and _looks_like_path(str(path))
+ ```
275
+
276
+ - [ ] **Step 4: Run tests to verify all 5 pass**
277
+
278
+ Run: `pytest tests/agents/test_routing.py -v`
279
+ Expected: 5 PASS.
280
+
281
+ - [ ] **Step 5: Run full suite**
282
+
283
+ Run: `pytest -q`
284
+ Expected: 249 passed, 1 skipped (243 baseline + 1 from Task 1 + 5 from this task = 249).
285
+
286
+ - [ ] **Step 6: Commit**
287
+
288
+ ```bash
289
+ git add src/agents/routing.py tests/agents/test_routing.py
+ git commit -m "fix(agents/routing): treat bare existing local directory as MRI input + tests"
+ ```
+
+ If Step 3 collapsed to tests-only, use a message like `test(agents/routing): add regression coverage for pipeline routing` instead.
292
+
293
+ ---
294
+
295
+ ## Task 4: Document the parquet write race in `tools.py`
296
+
297
+ **Files:**
298
+ - Modify: `src/agents/tools.py` near lines 99-105 and 126-133
299
+
300
+ No test — this is purely a comment that documents a known limitation. The reviewer flagged that concurrent `/agent/run` calls race on `data/processed/eeg_features.parquet` and `mri_features.parquet`; for the hackathon a TODO is sufficient. Do NOT change behavior in this task.
301
+
302
+ - [ ] **Step 1: Add the TODO comment to the EEG executor**
303
+
304
+ In `src/agents/tools.py`, replace this block:
305
+
306
+ ```python
307
+ def _make_eeg_executor(processed_dir: Path) -> Callable[[EEGPipelineInput], EEGPipelineOutput]:
308
+     """Closure factory: EEG pipeline, writes output under processed_dir."""
309
+     def execute(inp: EEGPipelineInput) -> EEGPipelineOutput:
310
+         from src.api.schemas import EEGRequest
311
+         from src.api import routes as api_routes
312
+         from fastapi import HTTPException
313
+         out_path = processed_dir / "eeg_features.parquet"
314
+ ```
315
+
316
+ with:
317
+
318
+ ```python
319
+ def _make_eeg_executor(processed_dir: Path) -> Callable[[EEGPipelineInput], EEGPipelineOutput]:
320
+     """Closure factory: EEG pipeline, writes output under processed_dir."""
321
+     def execute(inp: EEGPipelineInput) -> EEGPipelineOutput:
322
+         from src.api.schemas import EEGRequest
323
+         from src.api import routes as api_routes
324
+         from fastapi import HTTPException
325
+         # TODO(post-hackathon): per-call output path. Concurrent /agent/run
326
+         # invocations race on this file and clobber each other's MLflow runs.
327
+         out_path = processed_dir / "eeg_features.parquet"
328
+ ```
329
+
330
+ - [ ] **Step 2: Add the same TODO to the MRI executor**
331
+
332
+ In the same file, replace this block:
333
+
334
+ ```python
335
+ def _make_mri_executor(processed_dir: Path) -> Callable[[MRIPipelineInput], MRIPipelineOutput]:
336
+     """Closure factory: MRI pipeline, writes output under processed_dir."""
337
+     def execute(inp: MRIPipelineInput) -> MRIPipelineOutput:
338
+         from src.api.schemas import MRIRequest
339
+         from src.api import routes as api_routes
340
+         from fastapi import HTTPException
341
+         out_path = processed_dir / "mri_features.parquet"
342
+ ```
343
+
344
+ with:
345
+
346
+ ```python
347
+ def _make_mri_executor(processed_dir: Path) -> Callable[[MRIPipelineInput], MRIPipelineOutput]:
348
+     """Closure factory: MRI pipeline, writes output under processed_dir."""
349
+     def execute(inp: MRIPipelineInput) -> MRIPipelineOutput:
350
+         from src.api.schemas import MRIRequest
351
+         from src.api import routes as api_routes
352
+         from fastapi import HTTPException
353
+         # TODO(post-hackathon): per-call output path. Concurrent /agent/run
354
+         # invocations race on this file and clobber each other's MLflow runs.
355
+         out_path = processed_dir / "mri_features.parquet"
356
+ ```
357
+
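+ The eventual fix the TODO points at is small, but it changes what downstream readers of these paths see, which is why it stays out of scope here. A sketch only (the filename scheme is illustrative, not a project convention):
+
+ ```python
+ # Post-hackathon sketch: a unique per-call output file removes the race.
+ from uuid import uuid4
+
+ out_path = processed_dir / f"mri_features_{uuid4().hex}.parquet"
+ ```
+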
358
+ - [ ] **Step 3: Run full suite**
359
+
360
+ Run: `pytest -q`
361
+ Expected: same pass count as after Task 3 (no behavior change, no test change).
362
+
363
+ - [ ] **Step 4: Commit**
364
+
365
+ ```bash
366
+ git add src/agents/tools.py
367
+ git commit -m "docs(agents/tools): TODO race on shared parquet output for concurrent /agent/run"
368
+ ```
369
+
370
+ ---
371
+
372
+ ## Task 5: Log dropped out-of-stage tool calls
373
+
374
+ **Files:**
375
+ - Modify: `src/agents/orchestrator.py:222-235` (`_select_tool_calls`)
376
+ - Test: `tests/agents/test_orchestrator.py` (add one method to `TestOrchestrator`)
377
+
378
+ - [ ] **Step 1: Write the failing test**
379
+
380
+ Append this method inside `class TestOrchestrator` in `tests/agents/test_orchestrator.py`:
381
+
382
+ ```python
383
+ def test_workflow_drops_out_of_stage_tool_call_with_log(
384
+     self, caplog: pytest.LogCaptureFixture
385
+ ) -> None:
386
+     client = MagicMock()
387
+     client.chat.completions.create.side_effect = [
388
+         # During the pipeline stage the model wrongly calls retrieve_context
389
+         _fake_choice_with_tool_call("retrieve_context", {"query": "x", "k": 4}),
390
+         # After the workflow guard runs the BBB pipeline, model produces text
391
+         _fake_choice_with_text("Skipping retrieval."),
392
+         # Then the guard runs retrieve_context, model finalizes
393
+         _fake_choice_with_text("Final answer."),
394
+     ]
395
+     orch = Orchestrator(
396
+         llm_client=client,
397
+         tools=_make_workflow_tools(),
398
+         system_prompt="sys",
399
+         model="stub-model",
400
+         max_steps=5,
401
+         enforce_workflow=True,
402
+         workflow_pipeline_tools={"run_bbb_pipeline"},
403
+         workflow_retrieval_tool="retrieve_context",
404
+         workflow_router=lambda user_input, context: (
405
+             "run_bbb_pipeline",
406
+             {"smiles": user_input},
407
+         ),
408
+         workflow_query_builder=lambda user_input, pipeline_trace, context: "q",
409
+     )
410
+     with caplog.at_level("INFO", logger="src.agents.orchestrator"):
411
+         result = orch.run("CCO")
412
+     assert result.finish_reason == "complete"
413
+     assert any(
414
+         "dropped out-of-stage tool call" in rec.message
415
+         and "retrieve_context" in rec.message
416
+         and "stage=pipeline" in rec.message
417
+         for rec in caplog.records
418
+     ), [rec.message for rec in caplog.records]
419
+ ```
420
+
421
+ - [ ] **Step 2: Run the test to verify it fails**
422
+
423
+ Run: `pytest tests/agents/test_orchestrator.py::TestOrchestrator::test_workflow_drops_out_of_stage_tool_call_with_log -v`
424
+ Expected: FAIL — no INFO record matches because today the drop is silent.
425
+
426
+ - [ ] **Step 3: Add the log line in `_select_tool_calls`**
427
+
428
+ In `src/agents/orchestrator.py`, replace `_select_tool_calls` (lines 222-235) with:
429
+
430
+ ```python
431
+ def _select_tool_calls(self, tool_calls: list[Any], stage: str) -> list[Any]:
+     if not self._enforce_workflow:
+         return list(tool_calls)
+     if stage == "pipeline":
+         allowed = self._workflow_pipeline_tools
+     elif stage == "retrieve":
+         allowed = {self._workflow_retrieval_tool}
+     else:
+         allowed = set()
+     for tc in tool_calls:
+         if tc.function.name in allowed:
+             return [tc]
+     for tc in tool_calls:
+         logger.info(
+             "dropped out-of-stage tool call: name=%s stage=%s",
+             tc.function.name,
+             stage,
+         )
+     return []
463
+ ```
464
+
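+ With the Step 1 scenario (the model calling `retrieve_context` during the pipeline stage), this format string produces `dropped out-of-stage tool call: name=retrieve_context stage=pipeline`, which is exactly what the test's three substring checks match.
+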
465
+ - [ ] **Step 4: Run the test to verify it passes**
466
+
467
+ Run: `pytest tests/agents/test_orchestrator.py -v`
468
+ Expected: PASS for the new test AND for `test_enforced_workflow_falls_back_when_model_skips_tool_calls` (the existing workflow test should still pass — the new code only adds logging, it does not change return values).
469
+
470
+ - [ ] **Step 5: Run full suite**
471
+
472
+ Run: `pytest -q`
473
+ Expected: 250 passed, 1 skipped (249 from after Task 4 + 1 from this task = 250).
474
+
475
+ - [ ] **Step 6: Commit**
476
+
477
+ ```bash
478
+ git add src/agents/orchestrator.py tests/agents/test_orchestrator.py
479
+ git commit -m "fix(agents/orchestrator): log dropped out-of-stage tool calls (was silent)"
480
+ ```
481
+
482
+ ---
483
+
484
+ ## Final Verification
485
+
486
+ After Task 5, run these checks before claiming the plan is complete:
487
+
488
+ - [ ] `pytest -q` → expect **250 passed, 1 skipped** (5 commits past `c0a7163`).
489
+ - [ ] `git log --oneline c0a7163..HEAD` → expect 5 commits in the order above.
490
+ - [ ] `git diff --stat c0a7163..HEAD` → expect changes only in:
491
+ - `src/models/mri_model.py`
492
+ - `src/frontend/app.py`
493
+ - `src/agents/routing.py`
494
+ - `src/agents/tools.py`
495
+ - `src/agents/orchestrator.py`
496
+ - `tests/models/test_mri_model.py`
497
+ - `tests/agents/test_orchestrator.py`
498
+ - `tests/agents/test_routing.py` (created)
499
+ - [ ] No changes outside that list (no doc edits, no schema edits, no API route edits).
500
+ - [ ] Run the frontend dev server (`streamlit run src/frontend/app.py` or whatever the project's dev command is — check `README.md`) and confirm the MRI Image Model panel shows three "Resize D/H/W" number inputs and the caption.
501
+
502
+ If all five checks pass, the codex branch is ready for the hackathon demo.