	---
	license: cc-by-4.0
	library_name: scikit-learn
	tags:
	- hackathon
	- tabular-classification
	- alphahack
	- factor-model
	pipeline_tag: tabular-classification
	---

	# AlphaHack Model 2 — Winner Predictor (ex-ante)

	A `GradientBoostingClassifier` + `StandardScaler` bundle that produces
	a prize-probability score for a hackathon project from 23 ex-ante
	structural features (i.e. features the competitor controls before
	submission, not post-submission engagement signals).

	Companion: [Model 1 — regime classifier](https://huggingface.co/xenosaac/alphahack-models/tree/main/model1-regime-classifier)

	## The frozen feature list

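	These are the exact 23 features the released pkl was fit on. Reproduction
	requires this exact list in this exact order: the scaler and model read
	columns by position, so a different selection or ordering of 23 columns
	produces meaningless scores without raising an error.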

	```python
	WINNER_PREDICTOR_FINAL_FEATURES = [
	    "B02_problem_clarity_score", "B04_pain_point_universality_score",
	    "B08_theme_alignment_score", "B09_novelty_score",
	    "D05_wow_factor_score", "F03_memorability_score",
	    "I01_criteria_alignment",
	    "A05_prize_pool_usd", "A06_total_submissions",
	    "A09_num_prize_categories", "A11_theme_keywords",
	    "C01_tech_stack_count", "C06_has_github", "C07_has_live_demo",
	    "D01_has_demo_video", "D07_num_images", "G01_team_size",
	    "B10_hype_alignment",
	    "C02_tech_categories", "C03_has_ml_ai", "C04_has_hardware",
	    "I03_submission_completeness", "J02_competition_density",
	]
	```
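Since a wrong column layout fails silently, it can help to compare column names against the frozen list before scoring. The sketch below is illustrative only: `feature_problems` is a hypothetical helper (not part of `hackalpha`), and the three-entry list stands in for the full 23-entry list above.

```python
def feature_problems(columns, expected):
    """Return a list of human-readable problems; an empty list means the layout matches."""
    columns, expected = list(columns), list(expected)
    problems = []
    missing = [name for name in expected if name not in columns]
    extra = [name for name in columns if name not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not problems and columns != expected:
        # Same names in different positions: nothing downstream would raise,
        # but each column would be scaled with another column's statistics.
        problems.append("column sequence differs from the frozen order")
    return problems


# Stand-in for the frozen list (first three entries only, for brevity).
expected = [
    "B02_problem_clarity_score",
    "B04_pain_point_universality_score",
    "B08_theme_alignment_score",
]
print(feature_problems(expected, expected))
print(feature_problems(list(reversed(expected)), expected))
```

Calling the helper on the column names of the array you are about to pass to the scaler, and refusing to score when the result is non-empty, turns the silent failure into a loud one.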

	## Hyperparameters (frozen, verified from pkl bytes)

	```python
	from sklearn.ensemble import GradientBoostingClassifier

	GradientBoostingClassifier(
	    n_estimators=200,
	    max_depth=4,
	    learning_rate=0.1,
	    subsample=1.0,
	    random_state=42,
	)
	# preceded by sklearn.preprocessing.StandardScaler
	```

	## Headline metrics

	These are the cross-validated metrics from the original training run,
	reported for each target in the multi-target setup. Folds are split with
	GroupKFold on `event_id`, so no event appears in both train and test:

	| Target | AUC | 95% CI |
	|---|---|---|
	| `label_sponsor_prize` | 0.908 | [0.859, 0.947] |
	| `label_top_winner` | 0.743 | [0.644, 0.869] |
	| `label_any_prize` | 0.602 | [0.491, 0.713] |

	Per-year `label_sponsor_prize` AUC: 2024 = 0.914, 2025 = 0.898, 2026 = 0.913.
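To illustrate what grouping by `event_id` guarantees, here is a minimal pure-Python sketch of grouped folds. It is a simplified stand-in, not the original training code: `sklearn.model_selection.GroupKFold` is the real splitter and also balances fold sizes, whereas this version assigns groups to folds round-robin. The event IDs are made up.

```python
def grouped_folds(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which each group lands in exactly one test fold."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Round-robin assignment of whole groups to folds (a simplification).
    fold_of = {group: i % n_splits for i, group in enumerate(unique)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, test_idx


# Hypothetical event IDs, one entry per project.
event_ids = ["e1", "e1", "e2", "e3", "e3", "e3", "e4"]
for train_idx, test_idx in grouped_folds(event_ids, n_splits=2):
    train_events = {event_ids[i] for i in train_idx}
    test_events = {event_ids[i] for i in test_idx}
    # No event contributes projects to both sides of a split.
    assert train_events.isdisjoint(test_events)
```

Splitting by row instead would let projects from the same hackathon sit on both sides of a fold, which leaks event-level features such as prize pool and submission count into the test score.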

	The published pkl, trained on `is_winner` (binary), reaches **train AUC
	0.978 / train accuracy 0.940** on the full corpus when reproduced via
	`hackalpha train-model2 --feature-set legacy`. These are in-sample figures,
	not estimates of generalization.
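For readers who want to recompute the train-set figures from `predict_proba` output, the sketch below shows the two metrics in pure Python. It is illustrative only: the pairwise AUC is quadratic in sample size, so `sklearn.metrics.roc_auc_score` is the practical choice on the full corpus, and the 0.5 decision threshold is an assumption about how accuracy was computed. The labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of rows whose thresholded score equals the label (threshold is assumed)."""
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


# Toy data standing in for `is_winner` and P(winner).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores), accuracy(labels, scores))
```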

	## Stability sanity check (different artifact!)

	The `honest_backtest.json` shipped alongside this pkl is a different
	artifact from the headline metrics — it's a 7-year single-target binary
	walk-forward (`is_winner`) instead of a GroupKFold multi-target run.
	It's included as a stability sanity check, not as the headline report:

	| Year | Accuracy | AUC | Top-1 | Top-3 | n_test |
	|---|---|---|---|---|---|
	| 2020 | 0.739 | 0.662 | 0.547 | 0.848 | 6,238 |
	| 2021 | 0.720 | 0.690 | 0.581 | 0.874 | 9,016 |
	| 2022 | 0.710 | 0.699 | 0.626 | 0.901 | 9,403 |
	| 2023 | 0.730 | 0.682 | 0.538 | 0.878 | 8,944 |
	| 2024 | 0.724 | 0.670 | 0.536 | 0.860 | 10,132 |
	| 2025 | 0.772 | 0.655 | 0.519 | 0.823 | 19,976 |
	| 2026 | 0.860 | 0.741 | 0.428 | 0.754 | 11,069 |
	| Mean | 0.751 | 0.685 | 0.539 | 0.848 | n/a |
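The Mean row is an unweighted average over the seven years, which the following sketch recomputes from the table values. The `weighted_mean` helper is an addition for comparison only: weighting by `n_test` gives more influence to 2025, the largest test year, so it yields a somewhat different number.

```python
# (year, accuracy, auc, top1, top3, n_test), copied from the table above.
rows = [
    (2020, 0.739, 0.662, 0.547, 0.848, 6238),
    (2021, 0.720, 0.690, 0.581, 0.874, 9016),
    (2022, 0.710, 0.699, 0.626, 0.901, 9403),
    (2023, 0.730, 0.682, 0.538, 0.878, 8944),
    (2024, 0.724, 0.670, 0.536, 0.860, 10132),
    (2025, 0.772, 0.655, 0.519, 0.823, 19976),
    (2026, 0.860, 0.741, 0.428, 0.754, 11069),
]
ACCURACY, AUC, TOP1, TOP3, N_TEST = 1, 2, 3, 4, 5


def unweighted_mean(col):
    """Average a metric giving every year equal weight (matches the Mean row)."""
    return sum(row[col] for row in rows) / len(rows)


def weighted_mean(col):
    """Average a metric weighting each year by its number of test rows."""
    total = sum(row[N_TEST] for row in rows)
    return sum(row[col] * row[N_TEST] for row in rows) / total


for name, col in [("accuracy", ACCURACY), ("auc", AUC), ("top1", TOP1), ("top3", TOP3)]:
    print(f"{name}: unweighted={unweighted_mean(col):.4f} weighted={weighted_mean(col):.4f}")
```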

	The mean walk-forward AUC of 0.685 is the most conservative estimate
	of model performance; per-year AUC stays between 0.655 and 0.741 across
	the 7 years. The headline 0.908 sponsor-prize AUC is the strongest signal
	the model carries; the walk-forward 0.685 is the binary-winner
	all-comers signal.
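A walk-forward evaluation tests each year using only earlier data for training. The sketch below shows one common form, an expanding window, as an assumption: the card does not say whether the original backtest used an expanding or a rolling window, nor which years seeded the first training set. The `years` list is made up.

```python
def walk_forward_splits(years):
    """Yield (test_year, train_idx, test_idx), training only on years before the test year."""
    for test_year in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        if train_idx:  # the earliest year has no history to train on
            yield test_year, train_idx, test_idx


# Hypothetical event years, one entry per project.
years = [2019, 2019, 2020, 2020, 2021]
for test_year, train_idx, test_idx in walk_forward_splits(years):
    print(test_year, "train rows:", train_idx, "test rows:", test_idx)
```

Unlike GroupKFold, this scheme never lets the model see projects from the future, which is one reason its AUC is lower than the cross-validated headline.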

	## Loading the model

	```python
	import joblib
	import numpy as np
	import pyarrow.parquet as pq

	bundle = joblib.load("winner_predictor_final.pkl")
	model = bundle["model"]        # GradientBoostingClassifier
	scaler = bundle["scaler"]      # StandardScaler
	features = bundle["features"]  # the 23-feature list above

	# Score projects from the released parquet
	table = pq.read_table(
	    "alphahack_features_v7.parquet",
	    columns=list(features) + ["project_id", "is_winner"],
	)
	# Stack the feature columns in the frozen order, one row per project
	X_raw = np.column_stack([table[f].to_pylist() for f in features]).astype(float)
	# Replace missing values with 0.0 before scaling
	X_raw = np.where(np.isnan(X_raw), 0.0, X_raw)
	X = scaler.transform(X_raw)
	proba = model.predict_proba(X)[:, 1]  # P(winner)
	```
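The snippet above reads `project_id` but does not use it. One possible next step is to pair each score with its project and list the highest-scoring entries. `top_k` is a hypothetical helper written for this card, and the IDs and scores are toy values.

```python
def top_k(project_ids, scores, k=3):
    """Return the k (project_id, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(project_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values standing in for table["project_id"] and proba.
print(top_k(["p1", "p2", "p3", "p4"], [0.20, 0.90, 0.50, 0.10], k=2))
```

With the real data this would be called as `top_k(table["project_id"].to_pylist(), proba.tolist())`. Because the features are normalized within each event, rankings are most meaningful among projects from the same event.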

	## Reproducing this artifact

	```bash
	pip install hackalpha
	hackalpha train-model2 \
	    --features data/merged/alphahack_features_v7.parquet \
	    --feature-set legacy \
	    --model-out data/models/winner_predictor_final.pkl
	```

	The CLI's `--feature-set legacy` mode writes a pkl with the exact
	23-feature schema and hyperparameters listed above.
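After retraining, one way to confirm the output matches this card is to compare the estimator's parameters against the frozen values. The sketch below works on plain dictionaries; `param_mismatches` is a hypothetical helper, not part of the `hackalpha` CLI.

```python
# Frozen values from the "Hyperparameters" section of this card.
EXPECTED_PARAMS = {
    "n_estimators": 200,
    "max_depth": 4,
    "learning_rate": 0.1,
    "subsample": 1.0,
    "random_state": 42,
}


def param_mismatches(actual, expected=EXPECTED_PARAMS):
    """Return {name: (expected, actual)} for every frozen hyperparameter that differs."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# A deliberately wrong parameter set, to show what the report looks like.
print(param_mismatches({**EXPECTED_PARAMS, "max_depth": 3}))
```

With a bundle loaded as in "Loading the model", the call would be `param_mismatches(bundle["model"].get_params())`, since scikit-learn estimators expose their settings through `get_params()`. Checking `len(bundle["features"]) == 23` covers the schema side.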

	## Why `SIGNAL_FEATURES` differs from this pkl

	The companion repo's `hackalpha/research/multi_target_model.py` defines
	`SIGNAL_FEATURES` as a 39-feature ablation-validated set, built after the
	IQ_/EP_/SP_/PQ_/JI_ factor families were added. **17 of those features are
	not present in the released `alphahack_features_v7.parquet`**, because that
	parquet was frozen before the new factor families were merged into the
	pipeline. To reproduce this released pkl exactly, use
	`WINNER_PREDICTOR_FINAL_FEATURES` (the 23 above), not `SIGNAL_FEATURES`.

	To train against `SIGNAL_FEATURES`, run `hackalpha merge` fresh on the
	released projects/events/annotations to produce a v7+ parquet that
	includes the newer columns.
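To see which features a given parquet can actually supply, compare the required names with the file's column names. The sketch below is a minimal illustration; `split_by_availability` is a hypothetical helper, and the `IQ_example_factor` and `EP_example_factor` names are placeholders, not real `SIGNAL_FEATURES` entries.

```python
def split_by_availability(required, available):
    """Split `required` into (present, missing) relative to `available`, keeping order."""
    available = set(available)
    present = [name for name in required if name in available]
    missing = [name for name in required if name not in available]
    return present, missing


# Placeholder names: the two *_example_factor entries are invented for illustration.
required = ["G01_team_size", "IQ_example_factor", "EP_example_factor"]
parquet_columns = ["project_id", "is_winner", "G01_team_size"]
present, missing = split_by_availability(required, parquet_columns)
print("present:", present)
print("missing:", missing)
```

For a real file, the column names can be read without loading the data via `pyarrow.parquet.read_schema("alphahack_features_v7.parquet").names`.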

	## Honest known-failure mode

	A single prospective trial was conducted in April 2026 at a sponsor-API
	hackathon. The model recommended a strategy with a "STRONG GO" verdict;
	the implementation built around that strategy did not place. This is one
	data point, but a salient negative one: the model's retrospective
	strength (sponsor-prize AUC 0.908) did not translate to a
	prospective win in this single test.

	The most plausible failure mode (documented in the post-trial analysis):
	the model rewards sponsor-stack depth + presentation polish + perceived
	completeness, but does not adequately penalize **shallow user
	workflows** — projects can score high on "sponsor integration" + "demo
	quality" while still feeling like a showcase rather than a real
	mini-product. The companion repo's `v2-depth-aware` upgrade adds
	explicit workflow-depth + repeated-value + state-change checks to the
	strategy generator to mitigate this; the released pkl predates that
	upgrade.

	## Limitations

	- Built on Devpost public hackathons only (English).
	- Single prospective trial; small sample, but it lost.
	- Within-event z-score normalization in `hackalpha merge` means the
	model expects features to be normalized **within an event's project
	pool**, not globally. Online single-project scoring is not directly
	supported without re-injecting the event context.
	- The 95% CI for `label_any_prize` AUC, [0.491, 0.713], includes 0.5
	(chance level), so use that target with caution.
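The within-event normalization mentioned above can be sketched as follows. This shows the general technique, not the `hackalpha merge` implementation: the population standard deviation and the zero-variance fallback to 0.0 are assumptions, and the event IDs and values are made up.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def zscore_within_event(event_ids, values):
    """Standardize each value against the mean and std of its own event's pool."""
    by_event = defaultdict(list)
    for event, value in zip(event_ids, values):
        by_event[event].append(value)
    stats = {event: (fmean(vals), pstdev(vals)) for event, vals in by_event.items()}
    normalized = []
    for event, value in zip(event_ids, values):
        mean, std = stats[event]
        # Assumed fallback: a pool with no spread maps every project to 0.0.
        normalized.append((value - mean) / std if std > 0 else 0.0)
    return normalized


# Hypothetical pools: event "a" has two projects, event "b" has one.
print(zscore_within_event(["a", "a", "b"], [1.0, 3.0, 10.0]))
```

A project scored alone forms a pool of one, so every normalized feature collapses to the fallback value. This is why single-project scoring needs the rest of the event's pool supplied alongside it.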

	## License

	CC BY 4.0.