---
license: cc-by-4.0
library_name: scikit-learn
tags:
- hackathon
- tabular-classification
- alphahack
- factor-model
pipeline_tag: tabular-classification
---
# AlphaHack Model 2 — Winner Predictor (ex-ante)
A `GradientBoostingClassifier` + `StandardScaler` bundle that produces
a **prize-probability score** for a hackathon project from 23 ex-ante
structural features (i.e. features the competitor controls before
submission, not post-submission engagement signals).
Companion: [Model 1 — regime classifier](https://huggingface.co/xenosaac/alphahack-models/tree/main/model1-regime-classifier)
## The frozen feature list
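These are the exact 23 features the released pkl was fit on. Reproduction
**requires this exact list, in this order**. A matrix with the wrong number
of columns is rejected by the scaler, but 23 columns that are misordered or
substituted are scored without complaint and the output is meaningless.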
```python
WINNER_PREDICTOR_FINAL_FEATURES = [
    "B02_problem_clarity_score", "B04_pain_point_universality_score",
    "B08_theme_alignment_score", "B09_novelty_score",
    "D05_wow_factor_score", "F03_memorability_score",
    "I01_criteria_alignment",
    "A05_prize_pool_usd", "A06_total_submissions",
    "A09_num_prize_categories", "A11_theme_keywords",
    "C01_tech_stack_count", "C06_has_github", "C07_has_live_demo",
    "D01_has_demo_video", "D07_num_images", "G01_team_size",
    "B10_hype_alignment",
    "C02_tech_categories", "C03_has_ml_ai", "C04_has_hardware",
    "I03_submission_completeness", "J02_competition_density",
]
```
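Because a misordered matrix is not caught downstream, it can help to compare the column list you are about to score against the frozen list first. The following is a minimal sketch, not part of the released code: `check_feature_schema` is a hypothetical helper, and the three-feature list is a stand-in for the full 23.

```python
from typing import Sequence


def check_feature_schema(expected: Sequence[str], actual: Sequence[str]) -> None:
    """Raise ValueError unless `actual` matches `expected` in content and order."""
    missing = [f for f in expected if f not in actual]
    extra = [f for f in actual if f not in expected]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, unexpected={extra}")
    if list(expected) != list(actual):
        # Same names, different order: the scaler would still run, but each
        # column would be standardized with another feature's statistics.
        raise ValueError("feature order differs from the frozen list")


# Three-feature stand-in for the 23-feature list (names taken from the list above).
frozen = ["B09_novelty_score", "C06_has_github", "G01_team_size"]
check_feature_schema(frozen, list(frozen))  # passes silently
```

In practice `expected` would be `WINNER_PREDICTOR_FINAL_FEATURES` and `actual` the column order of the matrix passed to the scaler.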
## Hyperparameters (frozen, verified from pkl bytes)
```python
GradientBoostingClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    subsample=1.0,
    random_state=42,
)
# preceded by sklearn.preprocessing.StandardScaler
```
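One way to confirm that a reproduced model carries these settings is to diff the frozen values against the estimator's own parameters. This is a sketch under assumptions: `FROZEN_PARAMS` and `diff_params` are illustrative names, and a plain dict stands in for the output of scikit-learn's `model.get_params()` so the example runs without the pkl.

```python
FROZEN_PARAMS = {
    "n_estimators": 200,
    "max_depth": 4,
    "learning_rate": 0.1,
    "subsample": 1.0,
    "random_state": 42,
}


def diff_params(expected: dict, actual: dict) -> dict:
    """Return {name: (expected, actual)} for every frozen value that differs."""
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }


# With the real bundle: diff_params(FROZEN_PARAMS, model.get_params())
reproduced = dict(FROZEN_PARAMS, max_depth=3)  # deliberately wrong stand-in
print(diff_params(FROZEN_PARAMS, reproduced))  # {'max_depth': (4, 3)}
```

An empty result means every listed hyperparameter matches; parameters not listed above are ignored by this check.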
## Headline metrics
These are the cross-validated metrics from the original training run,
reported across multi-target setups. **GroupKFold by event_id**, so
no event appears in both train and test:
| Target | AUC | 95% CI |
|---|---|---|
| `label_sponsor_prize` | **0.908** | [0.859, 0.947] |
| `label_top_winner` | **0.743** | [0.644, 0.869] |
| `label_any_prize` | **0.602** | [0.491, 0.713] |
Per-year sponsor_prize stability: 2024 = 0.914 / 2025 = 0.898 / 2026 = 0.913.
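The grouping constraint behind these numbers can be illustrated with a small, dependency-free sketch. It is not scikit-learn's `GroupKFold` implementation and not the original training code; `group_kfold_indices` is a hypothetical helper that greedily balances fold sizes, and the event ids are made up.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def group_kfold_indices(
    groups: Sequence[Hashable], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    if n_splits > len(rows_by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Greedy balancing: place the largest groups first, each into the fold
    # that currently holds the fewest rows.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for g in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        min(folds, key=len).extend(rows_by_group[g])

    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


# Made-up event ids: one entry per project row.
event_ids = ["e1", "e1", "e2", "e3", "e3", "e3", "e4"]
splits = list(group_kfold_indices(event_ids, n_splits=2))
for train, test in splits:
    print("train events:", sorted({event_ids[i] for i in train}),
          "test events:", sorted({event_ids[i] for i in test}))
```

Splitting by row instead of by event would let projects from the same hackathon land on both sides, which tends to inflate AUC because event-level features such as prize pool are shared across those rows.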
The published pkl, trained on the binary `is_winner` target, reaches
**train AUC 0.978 / train accuracy 0.940** on the full corpus when
reproduced via `hackalpha train-model2 --feature-set legacy`. These are
in-sample figures and do not estimate out-of-sample performance.
## Stability sanity check (different artifact!)
The `honest_backtest.json` shipped alongside this pkl is a **different**
artifact from the headline metrics — it's a 7-year single-target binary
walk-forward (`is_winner`) instead of a GroupKFold multi-target run.
It's included as a stability sanity check, not as the headline report:
| Year | Accuracy | AUC | Top-1 | Top-3 | n_test |
|---|---|---|---|---|---|
| 2020 | 0.739 | 0.662 | 0.547 | 0.848 | 6,238 |
| 2021 | 0.720 | 0.690 | 0.581 | 0.874 | 9,016 |
| 2022 | 0.710 | 0.699 | 0.626 | 0.901 | 9,403 |
| 2023 | 0.730 | 0.682 | 0.538 | 0.878 | 8,944 |
| 2024 | 0.724 | 0.670 | 0.536 | 0.860 | 10,132 |
| 2025 | 0.772 | 0.655 | 0.519 | 0.823 | 19,976 |
| 2026 | 0.860 | 0.741 | 0.428 | 0.754 | 11,069 |
| **Mean** | **0.751** | **0.685** | **0.539** | **0.848** | — |
This walk-forward AUC of 0.685 is the **most conservative** estimate
of performance on the binary `is_winner` target, and per-year AUC stays
between 0.655 and 0.741 across the 7 test years. The headline 0.908
sponsor-prize AUC is the strongest signal the model carries; the
walk-forward 0.685 is the binary-winner all-comers signal.
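For readers unfamiliar with the setup, here is a minimal sketch of a walk-forward split and a per-event Top-k hit rate. Both definitions are assumptions: the sketch uses an expanding training window (all strictly earlier years) and counts an event as a Top-k hit when at least one winner is among its k highest-scored projects. `honest_backtest.json` may define either differently, and all data below is made up.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    years: Sequence[int], first_test_year: int
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (test_year, train_idx, test_idx), training only on earlier years."""
    for test_year in sorted({y for y in years if y >= first_test_year}):
        train = [i for i, y in enumerate(years) if y < test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        if train:  # skip years with no history to train on
            yield test_year, train, test


def top_k_hit_rate(event_ids, scores, labels, k: int) -> float:
    """Share of events whose k highest-scored projects include a winner."""
    by_event = defaultdict(list)
    for e, s, y in zip(event_ids, scores, labels):
        by_event[e].append((s, y))
    hits = 0
    for rows in by_event.values():
        top = sorted(rows, key=lambda r: r[0], reverse=True)[:k]
        hits += any(y == 1 for _, y in top)
    return hits / len(by_event)


years = [2019, 2019, 2020, 2020, 2021]
for test_year, train, test in walk_forward_splits(years, first_test_year=2020):
    print(test_year, "train rows:", train, "test rows:", test)

events = ["a", "a", "a", "b", "b"]
scores = [0.9, 0.2, 0.5, 0.1, 0.8]
labels = [0, 1, 0, 1, 0]
print(top_k_hit_rate(events, scores, labels, k=2))  # 0.5
```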
## Loading the model
```python
import joblib
import numpy as np
import pyarrow.parquet as pq
bundle = joblib.load("winner_predictor_final.pkl")
model = bundle["model"] # GradientBoostingClassifier
scaler = bundle["scaler"] # StandardScaler
features = bundle["features"] # the 23-feature list above
# Score projects from the released parquet
table = pq.read_table(
    "alphahack_features_v7.parquet",
    columns=features + ["project_id", "is_winner"],
)
X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float)
X_raw = np.where(np.isnan(X_raw), 0.0, X_raw)
X = scaler.transform(X_raw)
proba = model.predict_proba(X)[:, 1] # P(winner)
```
## Reproducing this artifact
```bash
pip install hackalpha
hackalpha train-model2 \
  --features data/merged/alphahack_features_v7.parquet \
  --feature-set legacy \
  --model-out data/models/winner_predictor_final.pkl
```
The CLI's `--feature-set legacy` mode writes a pkl with the exact
23-feature schema and hyperparameters listed above.
## Why `SIGNAL_FEATURES` differs from this pkl
The companion repo's `hackalpha/research/multi_target_model.py` defines
`SIGNAL_FEATURES` as a 39-feature ablation-validated set (post-IQ_/EP_/
SP_/PQ_/JI_ factor families). **17 of those features are not present
in the released `alphahack_features_v7.parquet`** — that parquet was
frozen before the new factor families were merged into the pipeline.
For byte-format reproduction of this released pkl, use
`WINNER_PREDICTOR_FINAL_FEATURES` (the 23 above), not `SIGNAL_FEATURES`.
To train against `SIGNAL_FEATURES`, run `hackalpha merge` fresh on the
released projects/events/annotations to produce a v7+ parquet that
includes the newer columns.
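Before training against a feature set, a quick column check shows whether a given parquet can support it. The sketch below is illustrative only: `missing_columns` is a hypothetical helper, and the `IQ_`/`EP_` names are placeholders, not real column names from the repo.

```python
from typing import Iterable, List


def missing_columns(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required names absent from `available`, preserving order."""
    present = set(available)
    return [name for name in required if name not in present]


# With pyarrow installed, the available names could be read without loading data:
#   import pyarrow.parquet as pq
#   available = pq.read_schema("alphahack_features_v7.parquet").names
available = ["B09_novelty_score", "G01_team_size"]  # stand-in schema
required = ["B09_novelty_score", "IQ_placeholder", "EP_placeholder"]  # placeholders
print(missing_columns(required, available))  # ['IQ_placeholder', 'EP_placeholder']
```

An empty list means every requested feature is available; anything else signals that the parquet predates the requested feature set.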
## Honest known-failure mode
A single prospective trial was conducted in April 2026 at a sponsor-API
hackathon. The model recommended a strategy with a "STRONG GO" verdict;
the implementation built around that strategy did not place. This is one
data point, but a salient negative one: the model's retrospective
strength (sponsor-prize AUC 0.908) **did not translate** to a
prospective win in this single test.
The most plausible failure mode (documented in the post-trial analysis):
the model rewards sponsor-stack depth + presentation polish + perceived
completeness, but does not adequately penalize **shallow user
workflows** — projects can score high on "sponsor integration" + "demo
quality" while still feeling like a showcase rather than a real
mini-product. The companion repo's `v2-depth-aware` upgrade adds
explicit workflow-depth + repeated-value + state-change checks to the
strategy generator to mitigate this; the released pkl predates that
upgrade.
## Limitations
- Built on Devpost public hackathons only (English).
- Only one prospective trial has been run, and the recommended strategy
did not place.
- Within-event z-score normalization in `hackalpha merge` means the
model expects features to be normalized **within an event's project
pool**, not globally. Online single-project scoring is not directly
supported without re-injecting the event context.
- The 95% CI for `label_any_prize` AUC is [0.491, 0.713], which includes
0.5 (chance level); use that target with caution.
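The within-event normalization mentioned above can be sketched as follows. This is an assumption-laden illustration, not the `hackalpha merge` implementation: it uses the population standard deviation and maps constant columns to 0.0, and the real pipeline may handle both differently.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Hashable, List, Sequence


def zscore_within_event(
    event_ids: Sequence[Hashable], values: Sequence[float]
) -> List[float]:
    """Standardize each value against the other projects in the same event."""
    by_event = defaultdict(list)
    for e, v in zip(event_ids, values):
        by_event[e].append(v)
    # Assumption: population std; the actual pipeline may use the sample std.
    stats = {e: (mean(vs), pstdev(vs)) for e, vs in by_event.items()}
    out = []
    for e, v in zip(event_ids, values):
        mu, sigma = stats[e]
        # Assumption: a feature that is constant within an event maps to 0.0.
        out.append(0.0 if sigma == 0 else (v - mu) / sigma)
    return out


events = ["a", "a", "b", "b"]
team_size = [1.0, 3.0, 10.0, 10.0]
print(zscore_within_event(events, team_size))  # [-1.0, 1.0, 0.0, 0.0]
```

The example also shows why a lone project cannot be scored directly: without the rest of its event pool there is no mean or spread to normalize against.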
## License
CC BY 4.0.