NSE Nifty 50 Swing Trading Predictor: v10 Reformed
A CatBoost classifier that predicts, for each of the 50 NSE Nifty 50 stocks, whether it will hit a +3% upside target within 10 trading days (BUY vs NOT_BUY).
Repository: mohan170802/nse-nifty50-swing-predictor-v10
What Changed in v10
v10 is a reformed model that directly addresses the problems identified in v1-v9:
| Issue | v1-v9 | v10 Fix |
|---|---|---|
| Calendar leakage | month, dayofweek were top features (21%+ importance) | Stripped entirely; only ticker/sector remain as categoricals |
| No purge gap | Train/validation/test had overlapping 10-day return windows | 10-day purge gap between all splits (López de Prado method) |
| Weak interactions | Only raw features | 10 interaction features: momentum×VIX, RSI×volume, MACD×market, etc. |
| Honest evaluation | Headline 60.3% but inflated by leakage | 64.0% test accuracy with zero calendar features |
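The purge gap in the table can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `purged_time_split`, its arguments, and the boundary dates are assumptions, and the gap is counted in trading days present in the data.

```python
import pandas as pd


def purged_time_split(dates, train_end, val_end, purge_days=10):
    """Return boolean row masks (train, val, test) separated by purge gaps.

    Hypothetical helper: `dates` is a pandas Series of trading dates (one per
    row), `train_end` and `val_end` are illustrative boundary dates.
    """
    dates = pd.to_datetime(dates)
    days = pd.DatetimeIndex(dates.unique()).sort_values()
    train_end, val_end = pd.Timestamp(train_end), pd.Timestamp(val_end)

    train_days = days[days <= train_end]
    # Skip the first `purge_days` trading days after each boundary so that no
    # 10-day label window from the earlier split reaches into the later one.
    val_days = days[days > train_end][purge_days:]
    val_days = val_days[val_days <= val_end]
    test_days = days[days > val_end][purge_days:]

    return dates.isin(train_days), dates.isin(val_days), dates.isin(test_days)
```

With a 10-day label horizon, the last training row's label looks at the 10 trading days after it, so dropping those 10 days from the start of the next split keeps the windows disjoint.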
Results
Test Set Performance (held-out, no leakage)
| Metric | Value |
|---|---|
| Test Accuracy | 64.03% |
| Test AUC | 58.05% |
| Test F1 (threshold=0.50) | 0.280 |
| Optimized threshold | 0.481 |
| Optimized precision | 49.1% |
| Optimized recall | 25.4% |
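The 64% accuracy is about 14pp above a 50% coin flip. If the 43% BUY base rate also holds on the test set, always predicting NOT_BUY would already score 57%, so the margin over that majority-class baseline is about 7pp: a genuine but weak signal.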
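The arithmetic behind these comparisons can be checked directly. The sketch below uses only the figures reported in the table above and assumes the 43% base rate applies to the test set, which the README does not state explicitly.

```python
# Figures copied from the results table; the base rate is assumed to hold
# on the test set.
base_rate = 0.43          # share of BUY labels
test_accuracy = 0.6403
precision, recall = 0.491, 0.254   # at the optimized threshold

# Accuracy of always predicting the more common class (NOT_BUY here).
majority_baseline = max(base_rate, 1 - base_rate)

lift_vs_coin_flip = test_accuracy - 0.5
lift_vs_majority = test_accuracy - majority_baseline

# F1 implied by the reported precision and recall.
f1_optimized = 2 * precision * recall / (precision + recall)

print(f"majority baseline:  {majority_baseline:.2%}")
print(f"lift vs coin flip:  {lift_vs_coin_flip:.2%}")
print(f"lift vs majority:   {lift_vs_majority:.2%}")
print(f"F1 at opt. thresh.: {f1_optimized:.3f}")
```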
Ablation Study: What Actually Works?
| Experiment | Features | Test Acc | Test AUC | Insight |
|---|---|---|---|---|
| Full v10 | 112 | 64.03% | 58.05% | Baseline |
| No VIX | 109 | 63.52% | 56.07% | VIX adds ~0.5pp |
| No target encoding | 110 | 63.20% | 57.12% | Encoding adds ~0.8pp |
| No interactions | 98 | 63.27% | 57.64% | Interactions add ~0.8pp |
| Raw technicals only | 36 | 62.79% | 54.87% | Cross-sectional adds ~1.2pp |
| Cross-sectional + macro only | 76 | 64.19% | 59.13% | Best subset. Pure relative features work |
| Sector + Ticker only | 2 | 60.36% | 53.60% | Categorical-only baseline |
| VIX only | 3 | 63.79% | 54.35% | VIX alone is strong! |
Key findings:
- VIX is the single most important feature (37.5% importance): the fear/greed regime dominates
- Cross-sectional relative features are the real edge: "how does RELIANCE look vs all 50 today?"
- Interactions add modest value: worth keeping but not transformative
- Target encoding is clean: the 1-day shift prevents leakage and adds real signal
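The shifted expanding-mean encoding mentioned in the last finding can be sketched as below. This is an assumption-laden illustration: the function name, the `date` and `buy` column names, and the `lag` parameter are hypothetical, and it covers only the per-ticker case (one row per key per date).

```python
import pandas as pd


def expanding_target_rate(df, key, label="buy", lag=1):
    """Expanding mean of past labels per `key`, shifted by `lag` rows.

    Hypothetical helper: assumes one row per (key, date). The README
    describes a 1-day shift, which is the default here.
    """
    ordered = df.sort_values([key, "date"])
    return ordered.groupby(key)[label].transform(
        lambda s: s.shift(lag).expanding().mean()
    )
```

The result keeps the original row index, so it can be assigned back with `df["ticker_te"] = expanding_target_rate(df, "ticker")`. One caveat worth checking: a 10-day label is only fully known 10 trading days after its date, so a stricter variant would pass `lag=10` rather than `lag=1`.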
Architecture
Target
- Binary: BUY (hits +3% within 10 days) vs NOT_BUY
- 10-day horizon: captures more signal than 5-day
- +3% threshold: filters noise better than ±2%
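A minimal sketch of how such a label could be built for a single ticker follows. It assumes the target is measured on closing prices relative to today's close; the README does not say whether closes or intraday highs are used, and the function name is hypothetical.

```python
import pandas as pd


def make_buy_label(close, horizon=10, target=0.03):
    """1.0 if a future close within `horizon` rows is >= close * (1 + target).

    Hypothetical helper for one ticker's closing prices, ordered by date.
    Rows without a full forward window are left as NaN.
    """
    # Max close over the NEXT `horizon` rows: reverse, roll, reverse back,
    # then shift by one so today's own close is excluded.
    future_max = (
        close[::-1].rolling(horizon, min_periods=horizon).max()[::-1].shift(-1)
    )
    label = (future_max >= close * (1 + target)).astype(float)
    return label.where(future_max.notna())
```

For a multi-ticker frame, the same function would be applied per ticker, for example with `df.groupby("ticker")["close"].transform(make_buy_label)`.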
Features (112 total)
- Technical indicators: RSI(14,28), MACD, Bollinger Bands, ATR, Stochastic, ADX, OBV, VWAP
- Returns & trend: lagged log-returns, rolling vol/mean, SMA distances, momentum
- Cross-sectional z-scores: z-score within each date across all 50 stocks
- Cross-sectional percentiles: percentile rank within each date
- Sector-relative z-scores: z-score within each sector per date
- Macro: Nifty50 return, India VIX level & change, relative performance
- Temporal target encoding: expanding mean BUY rate per ticker/sector (shifted 1 day)
- Interaction features: momentum×VIX, RSI×volume, distance×trend, MACD×market, etc.
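The cross-sectional and sector-relative features in the list above can be sketched with grouped transforms. The helper name, the `date` and `sector` column names, and the output suffixes are assumptions for illustration, not the repository's naming.

```python
import pandas as pd


def add_cross_sectional_features(df, col):
    """Add per-date z-score, per-date percentile, and per-sector z-score.

    Hypothetical helper: expects `date` and `sector` columns plus the raw
    feature `col`. Groups with a single row get NaN (sample std undefined).
    """
    out = df.copy()

    by_date = out.groupby("date")[col]
    out[f"{col}_cs_z"] = (out[col] - by_date.transform("mean")) / by_date.transform("std")
    out[f"{col}_cs_pct"] = by_date.rank(pct=True)

    by_sector = out.groupby(["date", "sector"])[col]
    out[f"{col}_sector_z"] = (
        out[col] - by_sector.transform("mean")
    ) / by_sector.transform("std")
    return out
```

Because every statistic is computed within a single date, these features use no information from other days, which is what makes them safe under a temporal split.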
Validation
- Strict temporal split with 10-day purge gaps
- Time-decay sample weights: recent data weighted exponentially higher (6-month half-life)
- No random shuffling ever
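The time-decay weighting can be written as a half-life formula. The sketch below assumes age is measured in calendar days from the most recent row and that six months is approximated as 182.5 days; both choices, and the function name, are illustrative.

```python
import numpy as np
import pandas as pd


def time_decay_weights(dates, half_life_days=182.5):
    """Weight = 0.5 ** (age / half_life), with age measured from the latest date.

    Hypothetical helper: `dates` is a pandas Series of row dates.
    """
    dates = pd.to_datetime(dates)
    age_days = (dates.max() - dates).dt.days
    return np.power(0.5, age_days / half_life_days)
```

The newest rows get weight 1.0 and a row one half-life old gets 0.5. Weights like these are typically passed to the model's `fit` call as per-row sample weights.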
Model
- CatBoost: native categorical handling prevents target leakage from encoding
- Ordered Target Statistics for `ticker` (50 levels) and `sector` (7 levels)
Honest Assessment
What This Model Is
- A weak but genuine directional edge (+14pp above a 50% coin flip)
- A market regime filter: VIX drives most predictive power
- A relative strength screener: finds stocks outperforming peers in favorable regimes
What This Model Is NOT
- A standalone trading system: 49% precision means ~half of BUY signals are false
- Guaranteed profit: backtest only, no transaction costs, slippage, or market impact
- Regime-agnostic: VIX-heavy models fail when macro dynamics shift
Production Risks
- VIX regime shift: the model is overweighted on fear/greed; retrain quarterly
- Non-stationarity: the 64% accuracy on the 2025-2026 test period may not hold in future regimes
- Transaction costs: 0.5% brokerage + STT + slippage erode the 3% target quickly
- Crowding: if many traders use similar signals, alpha decays
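To see how quickly costs eat into the edge, a rough per-trade expectancy can be computed from the reported precision. This is a deliberately simple win/lose model under stated assumptions: every correct BUY realizes exactly +3%, every false BUY exits at a -2% stop, and 0.5% is treated as the total round-trip cost (STT and slippage would add to it). Real trades will not behave this cleanly.

```python
def expectancy_per_trade(precision, target=0.03, stop=0.02, cost=0.005):
    """Expected net return per BUY signal under a two-outcome model.

    Assumptions (illustrative): winners gain `target`, losers lose `stop`,
    and `cost` is the total round-trip cost as a fraction of capital.
    """
    gross = precision * target - (1 - precision) * stop
    return gross - cost


def breakeven_precision(target=0.03, stop=0.02, cost=0.005):
    """Precision at which the expectancy above is exactly zero."""
    return (stop + cost) / (target + stop)


net = expectancy_per_trade(precision=0.491)
needed = breakeven_precision()
print(f"net expectancy per trade: {net:.4%}")
print(f"break-even precision:     {needed:.1%}")
```

Under these assumptions the break-even precision sits right next to the reported 49.1%, which is consistent with the warning that the model is not a standalone trading system.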
Usage

    import catboost as cb
    from huggingface_hub import hf_hub_download

    # Download the trained model file from the Hugging Face Hub
    model_path = hf_hub_download(
        repo_id="mohan170802/nse-nifty50-swing-predictor-v10",
        filename="unified_model_v10.cbm",
    )

    # Load the saved model into an empty classifier
    model = cb.CatBoostClassifier()
    model.load_model(model_path)

    # Feature list: unified_features_v10.txt (112 features)
    # Categorical features: ticker, sector

Threshold Recommendations
| Strategy | Threshold | Use Case |
|---|---|---|
| Conservative | 0.65 | Fewer trades, higher precision |
| Moderate | 0.55 | Balanced (recommended) |
| Aggressive | 0.50 | Default, more trades |
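Applying one of these thresholds to predicted probabilities might look like the sketch below. The `THRESHOLDS` mapping and `buy_signals` helper are hypothetical names, and treating a probability equal to the threshold as a BUY is an assumption.

```python
import numpy as np

# Thresholds copied from the table above; the dict itself is illustrative.
THRESHOLDS = {"conservative": 0.65, "moderate": 0.55, "aggressive": 0.50}


def buy_signals(proba_buy, strategy="moderate"):
    """Return a boolean array: True where P(BUY) meets the chosen threshold."""
    proba_buy = np.asarray(proba_buy, dtype=float)
    return proba_buy >= THRESHOLDS[strategy]


# Usage with the loaded CatBoost model (not executed here). This assumes the
# BUY class is the second column; check `model.classes_` to confirm.
#   proba_buy = model.predict_proba(X)[:, 1]
#   signals = buy_signals(proba_buy, strategy="conservative")
```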
Files
| File | Description |
|---|---|
| `unified_model_v10.cbm` | Trained CatBoost model |
| `unified_features_v10.txt` | 112 feature names |
| `summary_v10.json` | Full metrics, top features, hyperparameters |
| `ablation_results.json` | 8 ablation experiments with feature importance |
Recommended Next Steps
| Priority | Improvement | Expected Lift |
|---|---|---|
| P0 | Add quarterly fundamental data (P/E, ROE, EPS growth) | +3-7pp |
| P1 | Rolling walk-forward retraining (not fixed split) | Regime adaptation |
| P1 | Ensemble: technical + cross-sectional + macro sub-models | +1-3pp |
| P2 | Intraday features (volume profile, order flow) | +2-4pp |
| P2 | Sector rotation momentum (relative sector strength) | +1-2pp |
Not financial advice. This is a research model with a weak signal. Use stop-losses (-2%) and position sizing (≤2% risk per trade), and paper trade for 3+ months before committing real capital. Markets are non-stationary; past performance does not guarantee future results.
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern