NSE Nifty 50 Swing Trading Predictor — v10 Reformed

A CatBoost classifier that predicts, for each of the 50 NSE Nifty 50 stocks, whether it will hit a +3% upside target within 10 trading days (BUY vs NOT_BUY).

Repository: mohan170802/nse-nifty50-swing-predictor-v10

🎯 What Changed in v10

v10 is a reformed model that directly addresses the problems identified in v1-v9:

Issue	v1-v9	v10 Fix
Calendar leakage	`month`, `dayofweek` were top features (21%+ importance)	❌ Stripped entirely — only `ticker`/`sector` remain as categoricals
No purge gap	Train/validation/test had overlapping 10-day return windows	✅ 10-day purge gap between all splits (López de Prado method)
Weak interactions	Only raw features	✅ 10 interaction features: momentum×VIX, RSI×volume, MACD×market, etc.
Honest evaluation	Headline 60.3% but inflated by leakage	✅ 64.0% test accuracy with zero calendar features

📊 Results

Test Set Performance (held-out, no leakage)

Metric	Value
Test Accuracy	64.03%
Test AUC	58.05%
Test F1 (threshold=0.50)	0.280
Optimized threshold	0.481
Optimized precision	49.1%
Optimized recall	25.4%

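The 64% accuracy is about 14pp above a coin flip. Assuming the 43% base rate is the share of BUY labels, always predicting NOT_BUY would already score about 57%, so the lift over that baseline is roughly 7pp: a genuine but weak signal.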
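To put the headline accuracy in context, the arithmetic below compares it with two naive baselines and derives the F1 implied by the optimized precision and recall. It is a sketch that assumes the 43% base rate is the share of BUY labels in the test set, which the card does not state explicitly.

```python
# Figures taken from the results table above.
accuracy = 0.6403
buy_base_rate = 0.43            # assumption: share of BUY labels in the test set
precision, recall = 0.491, 0.254

# Always predicting the majority class is right max(p, 1 - p) of the time.
majority_baseline = max(buy_base_rate, 1 - buy_base_rate)
lift_vs_coin_flip = accuracy - 0.50
lift_vs_majority = accuracy - majority_baseline

# F1 is the harmonic mean of precision and recall.
f1_optimized = 2 * precision * recall / (precision + recall)

print(f"majority baseline: {majority_baseline:.2%}")
print(f"lift vs coin flip: {lift_vs_coin_flip * 100:.1f}pp")
print(f"lift vs majority:  {lift_vs_majority * 100:.1f}pp")
print(f"F1 at optimized threshold: {f1_optimized:.3f}")
```

The lift over the majority-class baseline is the more conservative of the two comparisons, because a model that never issues a BUY already beats a coin flip on an imbalanced target.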

🔬 Ablation Study — What Actually Works?

Experiment	Features	Test Acc	Test AUC	Insight
Full v10	112	64.03%	58.05%	Baseline
No VIX	109	63.52%	56.07%	VIX adds ~0.5pp
No target encoding	110	63.20%	57.12%	Encoding adds ~0.8pp
No interactions	98	63.27%	57.64%	Interactions add ~0.8pp
Raw technicals only	36	62.79%	54.87%	Cross-sectional adds ~1.2pp
Cross-sectional + macro only	76	64.19%	59.13%	⭐ Best subset! Pure relative features work
Sector + Ticker only	2	60.36%	53.60%	Random baseline
VIX only	3	63.79%	54.35%	VIX alone is strong!

Key findings:

VIX is the single most important feature (37.5% importance) — fear/greed regime dominates
Cross-sectional relative features are the real edge — "how does RELIANCE look vs all 50 today?"
Interactions add modest value — worth keeping but not transformative
Target encoding is clean — 1-day shift prevents leakage, adds real signal

🏗️ Architecture

Target

Binary: BUY (hits +3% within 10 days) vs NOT_BUY
10-day horizon — captures more signal than 5-day
+3% threshold — filters noise better than ±2%
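The target definition above could be built along these lines. This is a minimal sketch, not the repository's labeling code: the function name `make_buy_label` is hypothetical, and it assumes the +3% target is checked on closing prices, since the card does not say whether closes or intraday highs were used.

```python
import pandas as pd

def make_buy_label(close: pd.Series, horizon: int = 10, target: float = 0.03) -> pd.Series:
    """Label day t as BUY (1.0) if any close on days t+1..t+horizon is at
    least `target` above the close on day t, else NOT_BUY (0.0).

    Assumption: closing prices are used to test the target.
    """
    # Forward-looking max: reverse, take a trailing rolling max, reverse back.
    # That gives max(close[t .. t+horizon-1]); shifting by -1 excludes day t itself.
    forward_max = (
        close.iloc[::-1]
        .rolling(horizon, min_periods=horizon)
        .max()
        .iloc[::-1]
        .shift(-1)
    )
    hit = (forward_max >= close * (1 + target)).astype(float)
    # The last `horizon` rows have no complete forward window: leave them unlabeled.
    return hit.where(forward_max.notna())

# Toy example with a 2-day horizon so the windows are easy to check by hand.
prices = pd.Series([100.0, 101.0, 104.0, 100.0, 99.0, 98.0])
labels = make_buy_label(prices, horizon=2)
```

In the toy series only day 0 is labeled BUY (104 is at least 3% above 100), and the final two rows stay unlabeled because their forward window is incomplete.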

Features (112 total)

Technical indicators — RSI(14,28), MACD, Bollinger Bands, ATR, Stochastic, ADX, OBV, VWAP
Returns & trend — lagged log-returns, rolling vol/mean, SMA distances, momentum
Cross-sectional z-scores — z-score within each date across all 50 stocks
Cross-sectional percentiles — percentile rank within each date
Sector-relative z-scores — z-score within each sector per date
Macro — Nifty50 return, India VIX level & change, relative performance
Temporal target encoding — expanding mean BUY rate per ticker/sector (shifted 1 day)
Interaction features — momentum×VIX, RSI×volume, distance×trend, MACD×market, etc.
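Two of the feature families listed above, the per-date cross-sectional statistics and the shifted expanding target encoding, can be sketched with pandas as follows. The column names `date`, `ticker`, and `label` and both function names are hypothetical, chosen for illustration; the actual pipeline may differ.

```python
import pandas as pd

def add_cross_sectional(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Add the z-score and percentile rank of `col` within each date."""
    by_date = df.groupby("date")[col]
    out = df.copy()
    out[f"{col}_cs_z"] = (df[col] - by_date.transform("mean")) / by_date.transform("std")
    out[f"{col}_cs_pct"] = by_date.rank(pct=True)
    return out

def add_ticker_target_encoding(df: pd.DataFrame, shift: int = 1) -> pd.DataFrame:
    """Add the expanding mean of earlier `label` values per ticker,
    shifted by `shift` rows so a row never sees its own label."""
    out = df.sort_values(["ticker", "date"]).copy()
    out["ticker_te"] = out.groupby("ticker")["label"].transform(
        lambda s: s.shift(shift).expanding().mean()
    )
    return out

# Toy data: three tickers on one date, and one ticker over four dates.
snapshot = pd.DataFrame({"date": ["d1"] * 3, "ticker": ["A", "B", "C"],
                         "rsi": [30.0, 50.0, 70.0]})
history = pd.DataFrame({"date": [1, 2, 3, 4], "ticker": ["A"] * 4,
                        "label": [1, 0, 1, 1]})
cs = add_cross_sectional(snapshot, "rsi")
te = add_ticker_target_encoding(history)
```

The `shift` argument is left as a parameter on purpose. The card states a 1-day shift; because a 10-day label is only fully known 10 trading days later, a pipeline that must use only resolved outcomes might need a larger value.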

Validation

Strict temporal split with 10-day purge gaps
Time-decay sample weights — recent data weighted exponentially higher (6-month half-life)
No random shuffling ever
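The validation scheme above can be illustrated with a short sketch. The split fractions (70/15/15) are placeholders, since the card does not give the actual split dates, and the half-life is taken as 182 calendar days on the assumption that "6-month" means calendar time. Both function names are hypothetical.

```python
import numpy as np

def purged_split(n_dates: int, train_frac: float = 0.70, val_frac: float = 0.15,
                 purge: int = 10):
    """Chronological train/val/test split over date positions 0..n_dates-1,
    dropping `purge` trading days after the train block and after the val block."""
    train_end = round(n_dates * train_frac)
    val_start = train_end + purge
    val_end = val_start + round(n_dates * val_frac)
    test_start = val_end + purge
    train = np.arange(0, train_end)
    val = np.arange(val_start, val_end)
    test = np.arange(test_start, n_dates)
    return train, val, test

def time_decay_weights(age_in_days, half_life_days: float = 182.0) -> np.ndarray:
    """Exponential sample weights: a row `half_life_days` old gets half
    the weight of a row from the most recent day (age 0)."""
    return 0.5 ** (np.asarray(age_in_days, dtype=float) / half_life_days)

train_idx, val_idx, test_idx = purged_split(1000)
weights = time_decay_weights([0, 182, 364])
```

With a 10-day label horizon, the last training date's label looks at most 10 trading days ahead, so dropping 10 dates between blocks keeps any training label window from reaching into validation or test dates.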

Model

CatBoost — native categorical handling prevents target leakage from encoding
Ordered Target Statistics for ticker (50 levels) and sector (7 levels)

⚠️ Honest Assessment

What This Model Is

A weak but genuine directional edge (+14pp above a coin flip, roughly +7pp above always predicting NOT_BUY at a 43% BUY base rate)
A market regime filter — VIX drives most predictive power
A relative strength screener — finds stocks outperforming peers in favorable regimes

What This Model Is NOT

A standalone trading system — 49% precision means ~half of BUY signals are false
Guaranteed profit — backtest only, no transaction costs, slippage, or market impact
Regime-agnostic — VIX-heavy models fail when macro dynamics shift

Production Risks

VIX regime shift — model overweighted on fear/greed; retrain quarterly
Non-stationarity — 64% on 2025-2026 test may not hold in future regimes
Transaction costs — 0.5% brokerage + STT + slippage erodes 3% target quickly
Crowding — if many use similar signals, alpha decays
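To make the transaction-cost risk concrete, the sketch below estimates the expected net return per BUY signal under a deliberately simple win/lose model. The assumptions are illustrative rather than measured: winners realize exactly +3%, losers exit at the -2% stop-loss suggested in the closing note, and the total round-trip cost is 0.5%.

```python
def expected_return_per_trade(precision: float, gain: float = 0.03,
                              loss: float = 0.02, round_trip_cost: float = 0.005) -> float:
    """Expected net return of one BUY signal: win `gain` with probability
    `precision`, lose `loss` otherwise, and pay the cost either way."""
    return precision * gain - (1 - precision) * loss - round_trip_cost

def break_even_precision(gain: float = 0.03, loss: float = 0.02,
                         round_trip_cost: float = 0.005) -> float:
    """Precision at which the expected net return is exactly zero."""
    return (loss + round_trip_cost) / (gain + loss)

net = expected_return_per_trade(0.491)   # optimized precision from the results table
needed = break_even_precision()
print(f"expected net return per trade: {net:.3%}")
print(f"break-even precision: {needed:.1%}")
```

Under these assumptions the break-even precision is 50%, so the reported 49.1% sits marginally below it. Real outcomes depend on how far losing trades actually fall and on actual costs.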

🚀 Usage

import catboost as cb
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download("mohan170802/nse-nifty50-swing-predictor-v10", "unified_model_v10.cbm")
model = cb.CatBoostClassifier()
model.load_model(model_path)

# Features list: unified_features_v10.txt (112 features)
# Categorical: ticker, sector

Threshold Recommendations

Strategy	Threshold	Use Case
Conservative	0.65	Fewer trades, higher precision
Moderate	0.55	Balanced — recommended
Aggressive	0.50	Default, more trades
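The thresholds in the table can be applied to predicted BUY probabilities as sketched below. With a loaded `CatBoostClassifier`, those probabilities would typically come from `model.predict_proba(X)[:, 1]`, assuming the second class is BUY. The helper name `to_signals` and the use of `>=` at the boundary are illustrative choices, not part of the repository.

```python
# Thresholds from the recommendation table above.
THRESHOLDS = {"conservative": 0.65, "moderate": 0.55, "aggressive": 0.50}

def to_signals(buy_probabilities, strategy: str = "moderate"):
    """Map P(BUY) values to BUY / NOT_BUY using the chosen strategy's threshold."""
    threshold = THRESHOLDS[strategy]
    return ["BUY" if p >= threshold else "NOT_BUY" for p in buy_probabilities]

# Hypothetical probabilities for four stocks on one day.
probs = [0.70, 0.60, 0.52, 0.40]
moderate = to_signals(probs)
conservative = to_signals(probs, "conservative")
aggressive = to_signals(probs, "aggressive")
```

Raising the threshold reduces the number of BUY signals, which matches the "fewer trades, higher precision" trade-off described in the table.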

📁 Files

File	Description
`unified_model_v10.cbm`	Trained CatBoost model
`unified_features_v10.txt`	112 feature names
`summary_v10.json`	Full metrics, top features, hyperparameters
`ablation_results.json`	8 ablation experiments with feature importance

🔮 Recommended Next Steps

Priority	Improvement	Expected Lift
P0	Add quarterly fundamental data (P/E, ROE, EPS growth)	+3-7pp
P1	Rolling walk-forward retraining (not fixed split)	Regime adaptation
P1	Ensemble: technical + cross-sectional + macro sub-models	+1-3pp
P2	Intraday features (volume profile, order flow)	+2-4pp
P2	Sector rotation momentum (relative sector strength)	+1-2pp

⚠️ Not financial advice. This is a research model with a weak signal. Use with stop-losses (-2%), position sizing (≤2% risk per trade), and paper trade for 3+ months before real capital. Markets are non-stationary — past performance does not guarantee future results.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

