---
license: cc-by-4.0
library_name: scikit-learn
tags:
- hackathon
- tabular-classification
- alphahack
- factor-model
pipeline_tag: tabular-classification
---

# AlphaHack Model 2 — Winner Predictor (ex-ante)

A `GradientBoostingClassifier` + `StandardScaler` bundle that produces a **prize-probability score** for a hackathon project from 23 ex-ante structural features (i.e. features the competitor controls before submission, not post-submission engagement signals).

Companion: [Model 1 — regime classifier](https://huggingface.co/xenosaac/alphahack-models/tree/main/model1-regime-classifier)

## The frozen feature list

These are the exact 23 features the released pkl was fit on. Reproduction **requires this exact list, in this order**: the model will silently produce garbage on any other feature subset or column order.

```python
WINNER_PREDICTOR_FINAL_FEATURES = [
    "B02_problem_clarity_score",
    "B04_pain_point_universality_score",
    "B08_theme_alignment_score",
    "B09_novelty_score",
    "D05_wow_factor_score",
    "F03_memorability_score",
    "I01_criteria_alignment",
    "A05_prize_pool_usd",
    "A06_total_submissions",
    "A09_num_prize_categories",
    "A11_theme_keywords",
    "C01_tech_stack_count",
    "C06_has_github",
    "C07_has_live_demo",
    "D01_has_demo_video",
    "D07_num_images",
    "G01_team_size",
    "B10_hype_alignment",
    "C02_tech_categories",
    "C03_has_ml_ai",
    "C04_has_hardware",
    "I03_submission_completeness",
    "J02_competition_density",
]
```

## Hyperparameters (frozen, verified from pkl bytes)

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

# The scaler is applied to the 23 features before the classifier sees them
scaler = StandardScaler()
model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    subsample=1.0,
    random_state=42,
)
```

## Headline metrics

These are the cross-validated metrics from the original training run, reported across several target labels.
Cross-validation uses **GroupKFold by `event_id`**, so no event appears in both the train and test folds:

| Target | AUC | 95% CI |
|---|---|---|
| `label_sponsor_prize` | **0.908** | [0.859, 0.947] |
| `label_top_winner` | **0.743** | [0.644, 0.869] |
| `label_any_prize` | **0.602** | [0.491, 0.713] |

Per-year `label_sponsor_prize` AUC stability: 2024 = 0.914, 2025 = 0.898, 2026 = 0.913.

The published pkl, trained on `is_winner` (binary), reaches **train AUC 0.978 / train accuracy 0.940** on the full corpus when reproduced via `hackalpha train-model2 --feature-set legacy`. These are in-sample figures, not estimates of out-of-sample performance.

## Stability sanity check (different artifact!)

The `honest_backtest.json` shipped alongside this pkl is a **different** artifact from the headline metrics: it is a 7-year, single-target (`is_winner`) binary walk-forward backtest rather than a GroupKFold multi-target run. It is included as a stability sanity check, not as the headline report:

| Year | Accuracy | AUC | Top-1 | Top-3 | n_test |
|---|---|---|---|---|---|
| 2020 | 0.739 | 0.662 | 0.547 | 0.848 | 6,238 |
| 2021 | 0.720 | 0.690 | 0.581 | 0.874 | 9,016 |
| 2022 | 0.710 | 0.699 | 0.626 | 0.901 | 9,403 |
| 2023 | 0.730 | 0.682 | 0.538 | 0.878 | 8,944 |
| 2024 | 0.724 | 0.670 | 0.536 | 0.860 | 10,132 |
| 2025 | 0.772 | 0.655 | 0.519 | 0.823 | 19,976 |
| 2026 | 0.860 | 0.741 | 0.428 | 0.754 | 11,069 |
| **Mean** | **0.751** | **0.685** | **0.539** | **0.848** | — |

The walk-forward mean AUC of 0.685 is the **most conservative** estimate of model performance, and per-year AUC stays within 0.655–0.741 across all 7 years. The two numbers answer different questions: the headline 0.908 sponsor-prize AUC is the strongest signal the model carries, while the walk-forward 0.685 measures how well it separates `is_winner` projects from all other entrants.
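The two evaluation protocols above differ mainly in how rows are split into train and test. A minimal sketch, assuming scikit-learn and purely synthetic data: the `event_id` and `year` arrays and the helper names `make_model`, `groupkfold_auc`, and `walk_forward_auc` are hypothetical stand-ins, not the `hackalpha` pipeline. The walk-forward helper assumes an expanding window (train on all strictly earlier years); the exact protocol behind `honest_backtest.json` is not documented here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_model():
    # Frozen hyperparameters from this card; the scaler is refit inside each fold
    return make_pipeline(
        StandardScaler(),
        GradientBoostingClassifier(
            n_estimators=200, max_depth=4, learning_rate=0.1,
            subsample=1.0, random_state=42,
        ),
    )


def groupkfold_auc(X, y, event_id, n_splits=5):
    """Mean AUC over folds in which no event_id appears in both train and test."""
    aucs = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=event_id):
        # GroupKFold guarantees disjoint groups; the assert just makes that explicit
        assert not set(event_id[train_idx]) & set(event_id[test_idx])
        model = make_model().fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
    return float(np.mean(aucs))


def walk_forward_auc(X, y, year):
    """Per-year AUC, training only on strictly earlier years (assumed protocol)."""
    results = {}
    for test_year in np.unique(year)[1:]:
        train, test = year < test_year, year == test_year
        model = make_model().fit(X[train], y[train])
        results[int(test_year)] = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
    return results


# Synthetic stand-in data (hypothetical, not the released parquet)
rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 23))            # 23 columns, mirroring the frozen feature count
event_id = rng.integers(0, 40, size=n)  # hypothetical event identifiers
year = 2020 + event_id % 4              # each event belongs to exactly one year
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0.8).astype(int)

cv_auc = groupkfold_auc(X, y, event_id)
wf_auc = walk_forward_auc(X, y, year)
print("GroupKFold AUC:", round(cv_auc, 3))
print("Walk-forward AUC by year:", {k: round(v, 3) for k, v in wf_auc.items()})
```

Keeping the scaler inside the pipeline means each fold's scaling statistics come only from its training rows. The printed numbers reflect the synthetic data only and say nothing about the released model.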
## Loading the model

```python
import joblib
import numpy as np
import pyarrow.parquet as pq

bundle = joblib.load("winner_predictor_final.pkl")
model = bundle["model"]        # GradientBoostingClassifier
scaler = bundle["scaler"]      # StandardScaler
features = bundle["features"]  # the 23-feature list above (order matters)

# Score projects from the released parquet
table = pq.read_table(
    "alphahack_features_v7.parquet",
    columns=features + ["project_id", "is_winner"],
)
# Build the feature matrix in bundle order; missing values are zero-filled before scaling
X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float)
X_raw = np.where(np.isnan(X_raw), 0.0, X_raw)
X = scaler.transform(X_raw)
proba = model.predict_proba(X)[:, 1]  # P(winner)
```

## Reproducing this artifact

```bash
pip install hackalpha
hackalpha train-model2 \
  --features data/merged/alphahack_features_v7.parquet \
  --feature-set legacy \
  --model-out data/models/winner_predictor_final.pkl
```

The CLI's `--feature-set legacy` mode writes a pkl with the exact 23-feature schema and the hyperparameters listed above.

## Why `SIGNAL_FEATURES` differs from this pkl

The companion repo's `hackalpha/research/multi_target_model.py` defines `SIGNAL_FEATURES` as a 39-feature, ablation-validated set defined after the IQ_/EP_/SP_/PQ_/JI_ factor families were added. **17 of those features are not present in the released `alphahack_features_v7.parquet`**, because that parquet was frozen before the new factor families were merged into the pipeline.

For byte-format reproduction of this released pkl, use `WINNER_PREDICTOR_FINAL_FEATURES` (the 23 above), not `SIGNAL_FEATURES`. To train against `SIGNAL_FEATURES`, run `hackalpha merge` fresh on the released projects/events/annotations to produce a v7+ parquet that includes the newer columns.

## Honest known-failure mode

A single prospective trial was conducted in April 2026 at a sponsor-API hackathon. The model recommended a strategy with a "STRONG GO" verdict; the implementation built around that strategy did not place.
This is one data point, but a salient negative one: the model's retrospective strength (sponsor-prize AUC 0.908) **did not translate** into a prospective win in this single test.

The most plausible failure mode (documented in the post-trial analysis) is that the model rewards sponsor-stack depth, presentation polish, and perceived completeness, but does not adequately penalize **shallow user workflows**. A project can score high on "sponsor integration" and "demo quality" while still feeling like a showcase rather than a real mini-product. The companion repo's `v2-depth-aware` upgrade mitigates this by adding explicit workflow-depth, repeated-value, and state-change checks to the strategy generator; the released pkl predates that upgrade.

## Limitations

- Built on public Devpost hackathons only (English-language).
- Only one prospective trial has been run: the sample is tiny, and the model's recommendation did not place.
- Within-event z-score normalization in `hackalpha merge` means the model expects features to be normalized **within an event's project pool**, not globally. Online single-project scoring is not directly supported without re-injecting the event context (the other projects in the same event).
- The `label_any_prize` AUC's 95% CI lower bound (0.491) falls below 0.5, so performance on that target is not clearly better than chance; use it with caution.

## License

CC BY 4.0.
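The within-event normalization limitation listed under Limitations is easiest to see on a toy pool. A minimal sketch in pandas, assuming `hackalpha merge` applies a plain per-event z-score (event mean and sample standard deviation); the real formula, its handling of zero-variance or single-project events, and the helper `zscore_within_event` are assumptions, not the package's API. The column name is borrowed from the feature list purely for illustration.

```python
import numpy as np
import pandas as pd


def zscore_within_event(df, feature_cols, event_col="event_id"):
    """Hypothetical helper: standardize each feature against its own event's pool."""
    grouped = df.groupby(event_col)[feature_cols]
    mean = grouped.transform("mean")
    # Sample std per event; zero spread is treated as undefined
    std = grouped.transform("std").replace(0.0, np.nan)
    # Single-project or constant events have no usable spread -> fill with 0.0
    return ((df[feature_cols] - mean) / std).fillna(0.0)


pool = pd.DataFrame({
    "event_id": ["e1", "e1", "e1", "e2"],
    "C01_tech_stack_count": [2.0, 4.0, 6.0, 9.0],
})
z = zscore_within_event(pool, ["C01_tech_stack_count"])
print(z["C01_tech_stack_count"].tolist())  # -> [-1.0, 0.0, 1.0, 0.0]

# Scoring a new project: append it to its event's pool, then normalize
new = pd.DataFrame({"event_id": ["e1"], "C01_tech_stack_count": [8.0]})
with_context = pd.concat([pool, new], ignore_index=True)
z_new = zscore_within_event(with_context, ["C01_tech_stack_count"]).iloc[-1, 0]
print(round(z_new, 3))  # -> 1.162
```

The lone project in `e2` collapses to 0.0 whatever its raw value, which is why a single new project carries no usable signal until it is normalized alongside the rest of its event.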