xenosaac's picture
Upload folder using huggingface_hub
60c3695 verified
metadata
license: cc-by-4.0
library_name: scikit-learn
tags:
  - hackathon
  - tabular-classification
  - alphahack
  - factor-model
pipeline_tag: tabular-classification

AlphaHack Model 2 — Winner Predictor (ex-ante)

A GradientBoostingClassifier + StandardScaler bundle that produces a prize-probability score for a hackathon project from 23 ex-ante structural features (i.e. features the competitor controls before submission, not post-submission engagement signals).

Companion: Model 1 — regime classifier

The frozen feature list

These are the exact 23 features the released pkl was fit on. Reproduction requires this exact list — the model will silently produce garbage on any other feature subset.

WINNER_PREDICTOR_FINAL_FEATURES = [
    "B02_problem_clarity_score", "B04_pain_point_universality_score",
    "B08_theme_alignment_score", "B09_novelty_score",
    "D05_wow_factor_score", "F03_memorability_score",
    "I01_criteria_alignment",
    "A05_prize_pool_usd", "A06_total_submissions",
    "A09_num_prize_categories", "A11_theme_keywords",
    "C01_tech_stack_count", "C06_has_github", "C07_has_live_demo",
    "D01_has_demo_video", "D07_num_images", "G01_team_size",
    "B10_hype_alignment",
    "C02_tech_categories", "C03_has_ml_ai", "C04_has_hardware",
    "I03_submission_completeness", "J02_competition_density",
]

Hyperparameters (frozen, verified from pkl bytes)

GradientBoostingClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    subsample=1.0,
    random_state=42,
)
# preceded by sklearn.preprocessing.StandardScaler

Headline metrics

These are the cross-validated metrics from the original training run, reported across multi-target setups. GroupKFold by event_id, so no event appears in both train and test:

Target AUC 95% CI
label_sponsor_prize 0.908 [0.859, 0.947]
label_top_winner 0.743 [0.644, 0.869]
label_any_prize 0.602 [0.491, 0.713]

Per-year sponsor_prize stability: 2024 = 0.914 / 2025 = 0.898 / 2026 = 0.913.

The published pkl trained on is_winner (binary) hits train AUC 0.978 / train accuracy 0.940 on the full corpus when reproduced via hackalpha train-model2 --feature-set legacy.

Stability sanity check (different artifact!)

The honest_backtest.json shipped alongside this pkl is a different artifact from the headline metrics — it's a 7-year single-target binary walk-forward (is_winner) instead of a GroupKFold multi-target run. It's included as a stability sanity check, not as the headline report:

Year Accuracy AUC Top-1 Top-3 n_test
2020 0.739 0.662 0.547 0.848 6,238
2021 0.720 0.690 0.581 0.874 9,016
2022 0.710 0.699 0.626 0.901 9,403
2023 0.730 0.682 0.538 0.878 8,944
2024 0.724 0.670 0.536 0.860 10,132
2025 0.772 0.655 0.519 0.823 19,976
2026 0.860 0.741 0.428 0.754 11,069
Mean 0.751 0.685 0.539 0.848 —

This walk-forward AUC of 0.685 is the most conservative estimate of model performance and shows reasonably stable behavior across 7 years. The headline 0.908 sponsor-prize AUC is the strongest signal the model carries; the walk-forward 0.685 is the binary-winner all-comers signal.

Loading the model

import joblib
import numpy as np
import pyarrow.parquet as pq

bundle = joblib.load("winner_predictor_final.pkl")
model = bundle["model"]      # GradientBoostingClassifier
scaler = bundle["scaler"]    # StandardScaler
features = bundle["features"]  # the 23-feature list above

# Score projects from the released parquet
table = pq.read_table(
    "alphahack_features_v7.parquet",
    columns=features + ["project_id", "is_winner"],
)
X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float)
X_raw = np.where(np.isnan(X_raw), 0.0, X_raw)
X = scaler.transform(X_raw)
proba = model.predict_proba(X)[:, 1]   # P(winner)

Reproducing this artifact

pip install hackalpha
hackalpha train-model2 \
  --features data/merged/alphahack_features_v7.parquet \
  --feature-set legacy \
  --model-out data/models/winner_predictor_final.pkl

The CLI's --feature-set legacy mode writes a pkl with the exact 23-feature schema and hyperparameters listed above.

Why SIGNAL_FEATURES differs from this pkl

The companion repo's hackalpha/research/multi_target_model.py defines SIGNAL_FEATURES as a 39-feature ablation-validated set (post-IQ_/EP_/ SP_/PQ_/JI_ factor families). 17 of those features are not present in the released alphahack_features_v7.parquet — that parquet was frozen before the new factor families were merged into the pipeline. For byte-format reproduction of this released pkl, use WINNER_PREDICTOR_FINAL_FEATURES (the 23 above), not SIGNAL_FEATURES.

To train against SIGNAL_FEATURES, run hackalpha merge fresh on the released projects/events/annotations to produce a v7+ parquet that includes the newer columns.

Honest known-failure mode

A single prospective trial was conducted in April 2026 at a sponsor-API hackathon. The model recommended a strategy with a "STRONG GO" verdict; the implementation built around that strategy did not place. This is one data point, but a salient negative one: the model's retrospective strength (sponsor-prize AUC 0.908) did not translate to a prospective win in this single test.

The most plausible failure mode (documented in the post-trial analysis): the model rewards sponsor-stack depth + presentation polish + perceived completeness, but does not adequately penalize shallow user workflows — projects can score high on "sponsor integration" + "demo quality" while still feeling like a showcase rather than a real mini-product. The companion repo's v2-depth-aware upgrade adds explicit workflow-depth + repeated-value + state-change checks to the strategy generator to mitigate this; the released pkl predates that upgrade.

Limitations

  • Built on Devpost public hackathons only (English).
  • Single prospective trial; small sample, but it lost.
  • Within-event z-score normalization in hackalpha merge means the model expects features to be normalized within an event's project pool, not globally. Online single-project scoring is not directly supported without re-injecting the event context.
  • label_any_prize AUC's 95% CI lower bound (0.491) crosses 0.5 — use that target with caution.

License

CC BY 4.0.