| --- |
| license: cc-by-4.0 |
| library_name: scikit-learn |
| tags: |
| - hackathon |
| - tabular-classification |
| - alphahack |
| - factor-model |
| pipeline_tag: tabular-classification |
| --- |
| |
| # AlphaHack Model 2 — Winner Predictor (ex-ante) |
|
|
| A `GradientBoostingClassifier` + `StandardScaler` bundle that produces |
| a **prize-probability score** for a hackathon project from 23 ex-ante |
| structural features (i.e. features the competitor controls before |
| submission, not post-submission engagement signals). |
|
|
| Companion: [Model 1 — regime classifier](https://huggingface.co/xenosaac/alphahack-models/tree/main/model1-regime-classifier) |
|
|
| ## The frozen feature list |
|
|
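| These are the exact 23 features the released pkl was fit on. Reproduction |
| **requires this exact list, in this order**. The bundle is fed a plain |
| array, so reordered or substituted columns silently produce meaningless |
| scores, and a different column count raises a scikit-learn `ValueError`. |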
|
|
| ```python |
| WINNER_PREDICTOR_FINAL_FEATURES = [ |
| "B02_problem_clarity_score", "B04_pain_point_universality_score", |
| "B08_theme_alignment_score", "B09_novelty_score", |
| "D05_wow_factor_score", "F03_memorability_score", |
| "I01_criteria_alignment", |
| "A05_prize_pool_usd", "A06_total_submissions", |
| "A09_num_prize_categories", "A11_theme_keywords", |
| "C01_tech_stack_count", "C06_has_github", "C07_has_live_demo", |
| "D01_has_demo_video", "D07_num_images", "G01_team_size", |
| "B10_hype_alignment", |
| "C02_tech_categories", "C03_has_ml_ai", "C04_has_hardware", |
| "I03_submission_completeness", "J02_competition_density", |
| ] |
| ``` |
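Because a reordered array fails silently, it can help to check column names before scoring. The following is a minimal sketch, not part of `hackalpha`: `check_feature_schema` is a hypothetical helper, and the three-item list stands in for the real 23-item one.

```python
from typing import Sequence


def check_feature_schema(columns: Sequence[str], expected: Sequence[str]) -> None:
    """Raise ValueError unless `columns` matches `expected` name by name, in order."""
    columns, expected = list(columns), list(expected)
    missing = [name for name in expected if name not in columns]
    extra = [name for name in columns if name not in expected]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, unexpected={extra}")
    if columns != expected:
        raise ValueError("features are in a different order or contain duplicates")


# Three-feature stand-in for WINNER_PREDICTOR_FINAL_FEATURES (illustration only).
expected = ["B09_novelty_score", "C06_has_github", "G01_team_size"]
check_feature_schema(expected, expected)  # passes silently
```

Calling the helper with the column names of the frame you are about to score turns a silent ordering mistake into an explicit error.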
|
|
| ## Hyperparameters (frozen, verified from pkl bytes) |
|
|
| ```python |
| GradientBoostingClassifier( |
| n_estimators=200, |
| max_depth=4, |
| learning_rate=0.1, |
| subsample=1.0, |
| random_state=42, |
| ) |
| # preceded by sklearn.preprocessing.StandardScaler |
| ``` |
|
|
| ## Headline metrics |
|
|
| These are the cross-validated metrics from the original training run, |
| reported across multi-target setups. **GroupKFold by event_id**, so |
| no event appears in both train and test: |
| |
| | Target | AUC | 95% CI | |
| |---|---|---| |
| | `label_sponsor_prize` | **0.908** | [0.859, 0.947] | |
| | `label_top_winner` | **0.743** | [0.644, 0.869] | |
| | `label_any_prize` | **0.602** | [0.491, 0.713] | |
| |
| Per-year sponsor_prize stability: 2024 = 0.914 / 2025 = 0.898 / 2026 = 0.913. |
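The grouping constraint (no event in both train and test) can be illustrated with a small pure-Python splitter. This is a sketch of the idea only, not scikit-learn's `GroupKFold` implementation and not the original training code. The function name, the fold count, and the toy event ids are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def group_kfold_indices(
    groups: Sequence[Hashable], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each group lands in exactly one test fold."""
    rows_by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for row, group in enumerate(groups):
        rows_by_group[group].append(row)
    if n_splits < 2 or n_splits > len(rows_by_group):
        raise ValueError("n_splits must be between 2 and the number of distinct groups")

    # Greedy balancing: largest groups first, each into the currently smallest fold.
    fold_rows: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        smallest = min(range(n_splits), key=lambda k: len(fold_rows[k]))
        fold_rows[smallest].extend(rows_by_group[group])

    for k in range(n_splits):
        test_idx = sorted(fold_rows[k])
        train_idx = sorted(i for j in range(n_splits) if j != k for i in fold_rows[j])
        yield train_idx, test_idx


# Toy event ids, one per project row (illustration only).
event_ids = ["e1", "e1", "e2", "e3", "e3", "e3", "e4"]
folds = list(group_kfold_indices(event_ids, n_splits=2))
```

In practice `sklearn.model_selection.GroupKFold` with `groups=event_id` gives the same guarantee; the sketch just makes the invariant visible.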
| |
| The published pkl, trained on the binary `is_winner` target, reaches |
| **train AUC 0.978 / train accuracy 0.940** on the full corpus when |
| reproduced via `hackalpha train-model2 --feature-set legacy`. These are |
| in-sample figures and should not be read as held-out performance. |
| |
| ## Stability sanity check (different artifact!) |
| |
| The `honest_backtest.json` shipped alongside this pkl is a **different** |
| artifact from the headline metrics — it's a 7-year single-target binary |
| walk-forward (`is_winner`) instead of a GroupKFold multi-target run. |
| It's included as a stability sanity check, not as the headline report: |
|
|
| | Year | Accuracy | AUC | Top-1 | Top-3 | n_test | |
| |---|---|---|---|---|---| |
| | 2020 | 0.739 | 0.662 | 0.547 | 0.848 | 6,238 | |
| | 2021 | 0.720 | 0.690 | 0.581 | 0.874 | 9,016 | |
| | 2022 | 0.710 | 0.699 | 0.626 | 0.901 | 9,403 | |
| | 2023 | 0.730 | 0.682 | 0.538 | 0.878 | 8,944 | |
| | 2024 | 0.724 | 0.670 | 0.536 | 0.860 | 10,132 | |
| | 2025 | 0.772 | 0.655 | 0.519 | 0.823 | 19,976 | |
| | 2026 | 0.860 | 0.741 | 0.428 | 0.754 | 11,069 | |
| | **Mean** | **0.751** | **0.685** | **0.539** | **0.848** | — | |
| |
| The mean walk-forward AUC of 0.685 (per-year range 0.655 to 0.741) is the |
| **most conservative** estimate for the binary `is_winner` target, and it |
| stays reasonably stable across the 7 test years. The headline 0.908 |
| sponsor-prize AUC is the strongest signal the model carries; the |
| walk-forward 0.685 is the binary-winner all-comers signal. |
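For readers unfamiliar with walk-forward evaluation, here is a minimal sketch of how such splits can be generated: each test year is scored by a model fit only on earlier years. An expanding training window is an assumption (the card does not say whether the window expands or rolls), and `walk_forward_splits` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    years: Sequence[int], first_test_year: int
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (test_year, train_idx, test_idx), training on all strictly earlier years."""
    for test_year in sorted(set(years)):
        if test_year < first_test_year:
            continue
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        if train_idx:  # a test year with no history cannot be evaluated
            yield test_year, train_idx, test_idx


# Toy event years, one per project row (illustration only).
years = [2018, 2019, 2019, 2020, 2020, 2021]
splits = list(walk_forward_splits(years, first_test_year=2020))
# splits == [(2020, [0, 1, 2], [3, 4]), (2021, [0, 1, 2, 3, 4], [5])]
```

Each row of the table above would then correspond to one yielded `test_year`, with metrics computed on `test_idx` only.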
| |
| ## Loading the model |
| |
| ```python |
| import joblib |
| import numpy as np |
| import pyarrow.parquet as pq |
| |
| bundle = joblib.load("winner_predictor_final.pkl") |
| model = bundle["model"] # GradientBoostingClassifier |
| scaler = bundle["scaler"] # StandardScaler |
| features = bundle["features"] # the 23-feature list above |
| |
| # Score projects from the released parquet |
| table = pq.read_table( |
| "alphahack_features_v7.parquet", |
| columns=features + ["project_id", "is_winner"], |
| ) |
| X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float) |
| X_raw = np.where(np.isnan(X_raw), 0.0, X_raw) |
| X = scaler.transform(X_raw) |
| proba = model.predict_proba(X)[:, 1] # P(winner) |
| ``` |
| |
| ## Reproducing this artifact |
|
|
| ```bash |
| pip install hackalpha |
| hackalpha train-model2 \ |
| --features data/merged/alphahack_features_v7.parquet \ |
| --feature-set legacy \ |
| --model-out data/models/winner_predictor_final.pkl |
| ``` |
|
|
| The CLI's `--feature-set legacy` mode writes a pkl with the exact |
| 23-feature schema and hyperparameters listed above. |
|
|
| ## Why `SIGNAL_FEATURES` differs from this pkl |
| |
| The companion repo's `hackalpha/research/multi_target_model.py` defines |
| `SIGNAL_FEATURES` as a 39-feature ablation-validated set (post-IQ_/EP_/ |
| SP_/PQ_/JI_ factor families). **17 of those features are not present |
| in the released `alphahack_features_v7.parquet`** — that parquet was |
| frozen before the new factor families were merged into the pipeline. |
| To reproduce this released pkl, use |
| `WINNER_PREDICTOR_FINAL_FEATURES` (the 23 above), not `SIGNAL_FEATURES`. |
|
|
| To train against `SIGNAL_FEATURES`, run `hackalpha merge` fresh on the |
| released projects/events/annotations to produce a v7+ parquet that |
| includes the newer columns. |
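Before training against a larger feature set, it is worth confirming which columns a given parquet actually contains. The sketch below shows the comparison on plain lists. `missing_features` is a hypothetical helper, and `IQ_example_factor` is a made-up name used only for illustration.

```python
from typing import List, Sequence


def missing_features(required: Sequence[str], available: Sequence[str]) -> List[str]:
    """Return names in `required` that are absent from `available`, in `required` order."""
    have = set(available)
    return [name for name in required if name not in have]


# In practice `available` could come from pyarrow.parquet.read_schema(path).names.
required = ["B09_novelty_score", "IQ_example_factor", "G01_team_size"]
available = ["G01_team_size", "B09_novelty_score", "project_id"]
print(missing_features(required, available))  # ['IQ_example_factor']
```

An empty result means the parquet can serve the requested feature set; a non-empty one lists the columns a fresh `hackalpha merge` would need to supply.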
|
|
| ## Honest known-failure mode |
|
|
| A single prospective trial was conducted in April 2026 at a sponsor-API |
| hackathon. The model recommended a strategy with a "STRONG GO" verdict; |
| the implementation built around that strategy did not place. This is one |
| data point, but a salient negative one: the model's retrospective |
| strength (sponsor-prize AUC 0.908) **did not translate** to a |
| prospective win in this single test. |
|
|
| The most plausible failure mode (documented in the post-trial analysis): |
| the model rewards sponsor-stack depth + presentation polish + perceived |
| completeness, but does not adequately penalize **shallow user |
| workflows** — projects can score high on "sponsor integration" + "demo |
| quality" while still feeling like a showcase rather than a real |
| mini-product. The companion repo's `v2-depth-aware` upgrade adds |
| explicit workflow-depth + repeated-value + state-change checks to the |
| strategy generator to mitigate this; the released pkl predates that |
| upgrade. |
|
|
| ## Limitations |
|
|
| - Built on Devpost public hackathons only (English). |
| - Only one prospective trial has been run, and the resulting project did not place. |
| - Within-event z-score normalization in `hackalpha merge` means the |
| model expects features to be normalized **within an event's project |
| pool**, not globally. Online single-project scoring is not directly |
| supported without re-injecting the event context. |
| - The 95% CI for `label_any_prize` AUC has a lower bound of 0.491, below |
| the 0.5 chance level, so use that target with caution. |
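The within-event normalization point can be made concrete with a short sketch. It is not the `hackalpha merge` implementation. Using the population standard deviation and mapping zero-variance events to 0.0 are both assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence, Tuple


def zscore_within_event(
    values: Sequence[float], event_ids: Sequence[Hashable]
) -> List[float]:
    """Standardize each value against the mean and std of its own event's projects."""
    by_event: Dict[Hashable, List[float]] = defaultdict(list)
    for value, event in zip(values, event_ids):
        by_event[event].append(value)
    stats: Dict[Hashable, Tuple[float, float]] = {
        event: (mean(vals), pstdev(vals)) for event, vals in by_event.items()
    }
    scaled = []
    for value, event in zip(values, event_ids):
        mu, sigma = stats[event]
        # Assumption: an event with no spread contributes 0.0 rather than NaN.
        scaled.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return scaled


team_size = [1.0, 3.0, 2.0, 2.0]
events = ["e1", "e1", "e2", "e2"]
print(zscore_within_event(team_size, events))  # [-1.0, 1.0, 0.0, 0.0]
```

A project scored alone forms a one-row event, so under this scheme every feature collapses to 0.0. That is why single-project scoring needs the rest of the event's pool.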
|
|
| ## License |
|
|
| CC BY 4.0. |
|
|