license: cc-by-4.0
library_name: scikit-learn
tags:
- hackathon
- tabular-classification
- alphahack
- factor-model
pipeline_tag: tabular-classification
AlphaHack Model 2 — Winner Predictor (ex-ante)
A GradientBoostingClassifier + StandardScaler bundle that produces
a prize-probability score for a hackathon project from 23 ex-ante
structural features (i.e. features the competitor controls before
submission, not post-submission engagement signals).
Companion: Model 1 — regime classifier
The frozen feature list
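These are the exact 23 features the released pkl was fit on, in this order. Reproduction requires this exact list: the scaler and model consume columns by position, so a matrix with a different column count is rejected by scikit-learn, while reordered or substituted columns silently produce meaningless scores.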
WINNER_PREDICTOR_FINAL_FEATURES = [
    "B02_problem_clarity_score", "B04_pain_point_universality_score",
    "B08_theme_alignment_score", "B09_novelty_score",
    "D05_wow_factor_score", "F03_memorability_score",
    "I01_criteria_alignment",
    "A05_prize_pool_usd", "A06_total_submissions",
    "A09_num_prize_categories", "A11_theme_keywords",
    "C01_tech_stack_count", "C06_has_github", "C07_has_live_demo",
    "D01_has_demo_video", "D07_num_images", "G01_team_size",
    "B10_hype_alignment",
    "C02_tech_categories", "C03_has_ml_ai", "C04_has_hardware",
    "I03_submission_completeness", "J02_competition_density",
]
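Because a reordered feature matrix fails silently, it can help to check column names before scoring. The following is a minimal sketch, not part of hackalpha: `check_feature_order` is a hypothetical helper, and the three-name list in the demo stands in for the full 23-name list.

```python
# Hypothetical guard (not a hackalpha API): verify column names and order
# before handing a feature matrix to the scaler and model.
def check_feature_order(columns, expected):
    """Raise ValueError unless `columns` equals `expected`, name for name, in order."""
    columns, expected = list(columns), list(expected)
    missing = [name for name in expected if name not in columns]
    extra = [name for name in columns if name not in expected]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    if len(columns) != len(expected):
        raise ValueError("duplicate column names present")
    for pos, (got, want) in enumerate(zip(columns, expected)):
        if got != want:
            raise ValueError(f"column order differs at position {pos}: {got!r} != {want!r}")


# Toy demo with a shortened list; in practice pass the full frozen list.
demo_expected = ["B09_novelty_score", "G01_team_size", "C06_has_github"]
check_feature_order(demo_expected, demo_expected)  # passes silently
```

Calling the helper with the same names in a different order raises a `ValueError` that names the first mismatched position.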
Hyperparameters (frozen, verified from pkl bytes)
GradientBoostingClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    subsample=1.0,
    random_state=42,
)
# preceded by sklearn.preprocessing.StandardScaler
Headline metrics
These are the cross-validated metrics from the original training run, reported per target across the multi-target setups. Folds use GroupKFold grouped by event_id, so no event appears in both train and test:
| Target | AUC | 95% CI |
|---|---|---|
| label_sponsor_prize | 0.908 | [0.859, 0.947] |
| label_top_winner | 0.743 | [0.644, 0.869] |
| label_any_prize | 0.602 | [0.491, 0.713] |
Per-year sponsor_prize stability: 2024 = 0.914 / 2025 = 0.898 / 2026 = 0.913.
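To illustrate what grouping by event_id guarantees, here is a small pure-Python sketch of a grouped k-fold split. It is an illustration only: the original run presumably used scikit-learn's `GroupKFold`, the number of folds is not stated in this card (so `n_splits=3` is an assumption), and the event ids are toy data.

```python
from collections import defaultdict


def group_k_fold(event_ids, n_splits=5):
    """Yield (train_idx, test_idx) pairs where each event sits wholly on one side."""
    rows_by_event = defaultdict(list)
    for row, event in enumerate(event_ids):
        rows_by_event[event].append(row)
    # Greedy balancing: give the largest remaining event to the smallest fold.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(rows_by_event, key=lambda e: (-len(rows_by_event[e]), str(e)))
    for event in ordered:
        min(folds, key=len).extend(rows_by_event[event])
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(event_ids)) if i not in held_out]
        yield train_idx, sorted(test_idx)


# Toy data: ten projects spread over six events.
event_ids = ["e1", "e1", "e2", "e3", "e3", "e3", "e4", "e5", "e5", "e6"]
for train_idx, test_idx in group_k_fold(event_ids, n_splits=3):
    train_events = {event_ids[i] for i in train_idx}
    test_events = {event_ids[i] for i in test_idx}
    assert train_events.isdisjoint(test_events)  # no event leaks across the split
```

Splitting by event rather than by row matters here because projects from the same event share event-level features such as prize pool and submission count.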
The published pkl, trained on is_winner (binary), reaches train AUC
0.978 / train accuracy 0.940 on the full corpus when reproduced via
hackalpha train-model2 --feature-set legacy. These are in-sample figures, not held-out estimates.
Stability sanity check (different artifact!)
The honest_backtest.json shipped alongside this pkl is a different
artifact from the headline metrics — it's a 7-year single-target binary
walk-forward (is_winner) instead of a GroupKFold multi-target run.
It's included as a stability sanity check, not as the headline report:
| Year | Accuracy | AUC | Top-1 | Top-3 | n_test |
|---|---|---|---|---|---|
| 2020 | 0.739 | 0.662 | 0.547 | 0.848 | 6,238 |
| 2021 | 0.720 | 0.690 | 0.581 | 0.874 | 9,016 |
| 2022 | 0.710 | 0.699 | 0.626 | 0.901 | 9,403 |
| 2023 | 0.730 | 0.682 | 0.538 | 0.878 | 8,944 |
| 2024 | 0.724 | 0.670 | 0.536 | 0.860 | 10,132 |
| 2025 | 0.772 | 0.655 | 0.519 | 0.823 | 19,976 |
| 2026 | 0.860 | 0.741 | 0.428 | 0.754 | 11,069 |
| Mean | 0.751 | 0.685 | 0.539 | 0.848 | — |
The mean walk-forward AUC of 0.685 is the most conservative estimate of performance on the binary is_winner target, and yearly AUC stays between 0.655 and 0.741 across the 7 test years. The headline 0.908 sponsor-prize AUC is the strongest signal the model carries; the walk-forward 0.685 is the binary-winner, all-comers signal.
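A walk-forward evaluation differs from GroupKFold in that each test year is scored by a model fit only on earlier data. The sketch below shows one way to generate such splits. It assumes an expanding training window (all years strictly before the test year), which this card does not confirm; the function name and the year values are illustrative.

```python
def walk_forward_splits(years, first_test_year):
    """Yield (test_year, train_idx, test_idx); training rows come only from earlier years."""
    for test_year in sorted(set(years)):
        if test_year < first_test_year:
            continue
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        if train_idx:  # skip a test year that has no history to train on
            yield test_year, train_idx, test_idx


# Toy data: one entry per project, giving the year of its event.
years = [2018, 2019, 2019, 2020, 2020, 2021, 2022]
for test_year, train_idx, test_idx in walk_forward_splits(years, first_test_year=2020):
    print(test_year, len(train_idx), len(test_idx))
# prints:
# 2020 3 2
# 2021 5 1
# 2022 6 1
```

Each row of the table above corresponds to one such test year; the model would be refit on `train_idx` and scored on `test_idx`.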
Loading the model
import joblib
import numpy as np
import pyarrow.parquet as pq
bundle = joblib.load("winner_predictor_final.pkl")
model = bundle["model"] # GradientBoostingClassifier
scaler = bundle["scaler"] # StandardScaler
features = bundle["features"] # the 23-feature list above
# Score projects from the released parquet
table = pq.read_table(
    "alphahack_features_v7.parquet",
    columns=features + ["project_id", "is_winner"],
)
X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float)
X_raw = np.where(np.isnan(X_raw), 0.0, X_raw)
X = scaler.transform(X_raw)
proba = model.predict_proba(X)[:, 1] # P(winner)
Reproducing this artifact
pip install hackalpha
hackalpha train-model2 \
--features data/merged/alphahack_features_v7.parquet \
--feature-set legacy \
--model-out data/models/winner_predictor_final.pkl
The CLI's --feature-set legacy mode writes a pkl with the exact
23-feature schema and hyperparameters listed above.
Why SIGNAL_FEATURES differs from this pkl
The companion repo's hackalpha/research/multi_target_model.py defines
SIGNAL_FEATURES as a 39-feature ablation-validated set (post-IQ_/EP_/
SP_/PQ_/JI_ factor families). 17 of those features are not present
in the released alphahack_features_v7.parquet — that parquet was
frozen before the new factor families were merged into the pipeline.
For byte-format reproduction of this released pkl, use
WINNER_PREDICTOR_FINAL_FEATURES (the 23 above), not SIGNAL_FEATURES.
To train against SIGNAL_FEATURES, run hackalpha merge fresh on the
released projects/events/annotations to produce a v7+ parquet that
includes the newer columns.
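Before training on a feature set, it is cheap to check which of its columns a given parquet actually contains. The sketch below is a hypothetical helper, not part of hackalpha; `IQ_hypothetical_factor` is a made-up placeholder name, since the real SIGNAL_FEATURES names are not listed in this card.

```python
def split_by_availability(wanted, available_columns):
    """Return (present, missing) lists, preserving the order of `wanted`."""
    available = set(available_columns)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing


# With pyarrow, the column names could come from
# pq.read_schema("alphahack_features_v7.parquet").names; a toy list is used here.
parquet_columns = ["project_id", "is_winner", "B09_novelty_score", "G01_team_size"]
wanted = ["B09_novelty_score", "G01_team_size", "IQ_hypothetical_factor"]

present, missing = split_by_availability(wanted, parquet_columns)
print(missing)  # ['IQ_hypothetical_factor']
```

A non-empty `missing` list is the signal to regenerate the parquet before training rather than to drop columns quietly.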
Honest known-failure mode
A single prospective trial was conducted in April 2026 at a sponsor-API hackathon. The model recommended a strategy with a "STRONG GO" verdict; the implementation built around that strategy did not place. This is one data point, but a salient negative one: the model's retrospective strength (sponsor-prize AUC 0.908) did not translate to a prospective win in this single test.
The most plausible failure mode (documented in the post-trial analysis):
the model rewards sponsor-stack depth + presentation polish + perceived
completeness, but does not adequately penalize shallow user
workflows — projects can score high on "sponsor integration" + "demo
quality" while still feeling like a showcase rather than a real
mini-product. The companion repo's v2-depth-aware upgrade adds
explicit workflow-depth + repeated-value + state-change checks to the
strategy generator to mitigate this; the released pkl predates that
upgrade.
Limitations
- Built on Devpost public hackathons only (English).
- Single prospective trial; small sample, but it lost.
- Within-event z-score normalization in hackalpha merge means the model expects features to be normalized within an event's project pool, not globally. Online single-project scoring is not directly supported without re-injecting the event context.
- The 95% CI for label_any_prize AUC has a lower bound of 0.491, so it includes 0.5 (chance level); use that target with caution.
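The within-event normalization point can be made concrete with a short sketch. This is not the hackalpha merge implementation: the use of the population standard deviation and the choice to map constant columns to 0.0 are assumptions made for illustration, and the values are toy data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_event(values, event_ids):
    """Z-score each value against the other projects in the same event."""
    by_event = defaultdict(list)
    for value, event in zip(values, event_ids):
        by_event[event].append(value)
    # Assumption: population standard deviation (ddof=0) per event.
    stats = {event: (mean(vs), pstdev(vs)) for event, vs in by_event.items()}
    normalized = []
    for value, event in zip(values, event_ids):
        mu, sigma = stats[event]
        # Assumption: a feature that is constant within the event maps to 0.0.
        normalized.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return normalized


# Toy data: team sizes for three projects in event A and two in event B.
team_sizes = [2, 4, 6, 1, 1]
events = ["A", "A", "A", "B", "B"]
z = zscore_within_event(team_sizes, events)
```

A project scored on its own forms a pool of one, so every feature has zero spread and normalizes to 0.0 under this sketch. That is why scoring a single project requires the rest of its event's pool.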
License
CC BY 4.0.