# ClashCR Verification Plan
## Goal
Validate the opponent card tracker on real gameplay recordings from the MuMu and BlueStacks emulators before making any accuracy claims.
## Dataset Requirements
- At least 20 full battles.
- Multiple arenas/maps.
- Multiple resolutions (e.g., 1280x720, 1920x1080, emulator native).
- Single/double/triple elixir periods.
- Normal cards, spells, buildings, champions, evolutions, heroes, tower troops.
- Negative recordings: lobby/menu screens, quiet battle periods with no opponent plays.
## Labeling Format
CSV with columns:
- timestamp (float, seconds)
- frame_idx (int)
- side ('opponent', 'own', 'unknown')
- card_key (string, normalized card name)
- confidence (float, 1.0 for manual labels)
- manual_note (string, optional)
- source ('manual')
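
As a rough illustration of the format above, the following sketch parses and validates a label file. It assumes the CSV has a header row whose names match the column list; the `Label` dataclass and `parse_labels` helper are hypothetical names for this example and are not part of the `clashcr` CLI.

```python
import csv
import io
from dataclasses import dataclass

VALID_SIDES = {"opponent", "own", "unknown"}
COLUMNS = ["timestamp", "frame_idx", "side", "card_key",
           "confidence", "manual_note", "source"]


@dataclass
class Label:
    timestamp: float   # seconds from the start of the recording
    frame_idx: int
    side: str
    card_key: str
    confidence: float  # 1.0 for manual labels
    manual_note: str
    source: str


def parse_labels(fileobj):
    """Parse label rows; raise ValueError on a missing column or invalid side."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    labels = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["side"] not in VALID_SIDES:
            raise ValueError(f"line {line_no}: invalid side {row['side']!r}")
        labels.append(Label(
            timestamp=float(row["timestamp"]),
            frame_idx=int(row["frame_idx"]),
            side=row["side"],
            card_key=row["card_key"].strip(),
            confidence=float(row["confidence"]),
            manual_note=row["manual_note"] or "",  # optional column value
            source=row["source"],
        ))
    return labels


# Made-up sample rows, for illustration only.
SAMPLE = """timestamp,frame_idx,side,card_key,confidence,manual_note,source
12.5,100,opponent,hog_rider,1.0,,manual
20.0,160,own,fireball,1.0,clear cast,manual
"""
labels = parse_labels(io.StringIO(SAMPLE))
```

Rejecting malformed rows early keeps labeling typos from silently changing the metrics later on.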
## Metrics
- **Precision**: correct_predictions / total_predictions
- **Recall**: correct_predictions / total_labels
- **F1**: harmonic mean of precision and recall
- **False Positives per Minute**: FP / recording_duration_seconds * 60
- **Missed Events**: number of labeled events with no matching prediction (false negatives)
- **Mean Timing Error**: average |pred_timestamp - label_timestamp| over matched pairs
- **Median Timing Error**: median of the same absolute timing errors
- **Confusion Matrix**: per-card breakdown of predicted card vs. labeled card
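
The scalar metrics above could be computed roughly as follows. This is a minimal sketch, not the tracker's actual evaluator: it assumes events are `(timestamp_seconds, card_key)` tuples, that a prediction counts as correct when it pairs one-to-one with a label of the same card inside a time window, and that the window is 2.0 seconds. The window value and the function names are assumptions made for this example.

```python
from statistics import mean, median


def match_events(preds, labels, window_s):
    """Pair predictions with labels one-to-one, closest same-card pairs first.

    preds and labels are lists of (timestamp_seconds, card_key) tuples.
    Returns a list of (pred_index, label_index) pairs.
    """
    candidates = []
    for i, (pt, pc) in enumerate(preds):
        for j, (lt, lc) in enumerate(labels):
            if pc == lc and abs(pt - lt) <= window_s:
                candidates.append((abs(pt - lt), i, j))
    candidates.sort()
    used_preds, used_labels, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_preds and j not in used_labels:
            used_preds.add(i)
            used_labels.add(j)
            pairs.append((i, j))
    return pairs


def compute_metrics(preds, labels, duration_s, window_s=2.0):
    """Compute the scalar metrics; window_s=2.0 is an assumed default."""
    pairs = match_events(preds, labels, window_s)
    tp = len(pairs)
    fp = len(preds) - tp    # predictions with no matching label
    fn = len(labels) - tp   # labels with no matching prediction
    # Convention chosen here: report 0.0 when a denominator is empty.
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(labels) if labels else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    errors = [abs(preds[i][0] - labels[j][0]) for i, j in pairs]
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fp_per_minute": fp / duration_s * 60,
        "missed_events": fn,
        "mean_timing_error": mean(errors) if errors else None,
        "median_timing_error": median(errors) if errors else None,
    }


# Made-up events for a 120-second recording, for illustration only.
preds = [(10.25, "hog_rider"), (15.0, "fireball"), (30.0, "zap"), (41.0, "giant")]
labels = [(10.0, "hog_rider"), (15.5, "fireball"), (40.0, "giant"), (50.0, "log")]
metrics = compute_metrics(preds, labels, duration_s=120.0)
```

In this made-up example three of the four predictions match a label, so precision and recall are both 0.75, one event is missed, and the false-positive rate is 0.5 per minute.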
## Acceptance Targets
- 0 lobby/menu false positives.
- 0 repeated spurious events (spam) while no card is being played.
- False positives per minute near 0 on negative/quiet recordings.
- Every emitted event must include raw visual evidence.
- 100% accuracy may be claimed only on held-out labeled recordings, and only if every opponent card event is detected within the allowed time window and no false events are emitted.
- If true 100% is not achievable from single-screen public data, state that plainly and identify exactly what additional labeled data or visual signal is required.
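
The first two targets can be checked mechanically on negative recordings, where every emitted event is by definition a false positive. The sketch below assumes events are `(timestamp_seconds, card_key)` tuples and treats two same-card events less than 1.0 second apart as repeated spam. That threshold and both function names are assumptions for illustration.

```python
def find_repeat_bursts(events, min_gap_s=1.0):
    """Return (card_key, earlier_ts, later_ts) for consecutive same-card
    events that are less than min_gap_s seconds apart."""
    last_seen = {}
    bursts = []
    for ts, card in sorted(events):
        prev = last_seen.get(card)
        if prev is not None and ts - prev < min_gap_s:
            bursts.append((card, prev, ts))
        last_seen[card] = ts
    return bursts


def check_negative_recording(events, min_gap_s=1.0):
    """Summarize a negative recording (lobby, menu, or quiet battle period).

    Any event at all is a false positive, so the check passes only when
    the tracker emitted nothing.
    """
    return {
        "false_positives": len(events),
        "repeat_bursts": find_repeat_bursts(events, min_gap_s),
        "passed": len(events) == 0,
    }


# Made-up tracker output on a lobby recording, for illustration only.
lobby_events = [(5.0, "zap"), (5.25, "zap"), (5.5, "zap"), (9.0, "log")]
report = check_negative_recording(lobby_events)
```

Reporting the bursts separately from the raw count helps distinguish a detector that fires once on a menu animation from one that re-emits the same card every few frames.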
## Commands
```bash
# Record
clashcr record-battle --config config.yaml --output data/live-recordings/session-001 --seconds 180 --fps 8
# Label manually by editing data/live-recordings/session-001/labels.csv
# Evaluate on a held-out labeled recording
clashcr evaluate-recording --config config.yaml --recording data/live-recordings/session-heldout --labels data/live-recordings/session-heldout/labels.csv
```
## Known Limitations (Expected)
- Spell detection relies on heuristic color signatures and may miss spells with subtle visual effects.
- Hero/evolution detection requires a YOLO model trained on those units.
- The RoyaleAPI static dataset is stale; an official API token is required for the current card list.
- Without labeled recordings, no accuracy claims can be made.