NSE Nifty 50 Swing Trading Predictor: v10 Reformed

A CatBoost classifier that predicts, for each of the 50 NSE Nifty 50 stocks, whether it will hit a +3% upside target within 10 trading days (BUY vs NOT_BUY).

Repository: mohan170802/nse-nifty50-swing-predictor-v10


🎯 What Changed in v10

v10 is a reformed model that directly addresses the problems identified in v1-v9:

| Issue | v1-v9 | v10 Fix |
| --- | --- | --- |
| Calendar leakage | month, dayofweek were top features (21%+ importance) | ❌ Stripped entirely: only ticker/sector remain as categoricals |
| No purge gap | Train/validation/test had overlapping 10-day return windows | ✅ 10-day purge gap between all splits (López de Prado method) |
| Weak interactions | Only raw features | ✅ 10 interaction features: momentum×VIX, RSI×volume, MACD×market, etc. |
| Honest evaluation | Headline 60.3% but inflated by leakage | ✅ 64.0% test accuracy with zero calendar features |

📊 Results

Test Set Performance (held-out, no leakage)

| Metric | Value |
| --- | --- |
| Test Accuracy | 64.03% |
| Test AUC | 58.05% |
| Test F1 (threshold=0.50) | 0.280 |
| Optimized threshold | 0.481 |
| Optimized precision | 49.1% |
| Optimized recall | 25.4% |

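The 64% accuracy is 14pp above a 50% coin flip. Against a 43% BUY base rate, however, always predicting NOT_BUY would already score about 57%, so the lift over the majority-class baseline is closer to 7pp (assuming the 43% rate describes the test set): a genuine but weak signal.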
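
How large the edge looks depends on which baseline "random" refers to. The sketch below is plain arithmetic on the two figures reported above; it assumes the 43% base rate is the share of BUY labels in the test set, which the card does not state explicitly.

```python
# Baselines for a binary BUY / NOT_BUY problem.
# base_rate and accuracy are taken from the Results section; the rest is arithmetic.
base_rate = 0.43      # share of BUY labels (assumed to describe the test set)
accuracy = 0.6403     # reported test accuracy

coin_flip = 0.50                              # unbiased random guessing
majority = max(base_rate, 1.0 - base_rate)    # always predict the larger class

lift_vs_coin_flip = accuracy - coin_flip
lift_vs_majority = accuracy - majority

print(f"majority-class baseline: {majority:.2%}")
print(f"lift vs coin flip:       {lift_vs_coin_flip * 100:.1f} pp")
print(f"lift vs majority class:  {lift_vs_majority * 100:.1f} pp")
```

The majority-class figure is the stricter of the two comparisons, since it is what a model with no features at all could achieve.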


🔬 Ablation Study: What Actually Works?

| Experiment | Features | Test Acc | Test AUC | Insight |
| --- | --- | --- | --- | --- |
| Full v10 | 112 | 64.03% | 58.05% | Baseline |
| No VIX | 109 | 63.52% | 56.07% | VIX adds ~0.5pp |
| No target encoding | 110 | 63.20% | 57.12% | Encoding adds ~0.8pp |
| No interactions | 98 | 63.27% | 57.64% | Interactions add ~0.8pp |
| Raw technicals only | 36 | 62.79% | 54.87% | Cross-sectional adds ~1.2pp |
| Cross-sectional + macro only | 76 | 64.19% | 59.13% | ⭐ Best subset! Pure relative features work |
| Sector + Ticker only | 2 | 60.36% | 53.60% | Random baseline |
| VIX only | 3 | 63.79% | 54.35% | VIX alone is strong! |

Key findings:

  1. VIX is the single most important feature (37.5% importance): the fear/greed regime dominates
  2. Cross-sectional relative features are the real edge: "how does RELIANCE look vs all 50 today?"
  3. Interactions add modest value: worth keeping but not transformative
  4. Target encoding adds real signal: the 1-day shift is meant to prevent leakage

🏗️ Architecture

Target

  • Binary: BUY (hits +3% within 10 trading days) vs NOT_BUY
  • 10-day horizon: captures more signal than 5-day
  • +3% threshold: filters noise better than ±2%
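
A minimal sketch of how such a label could be built for one ticker. It assumes the target is checked against daily closing prices; the card does not say whether closes or intraday highs were used, and `make_buy_label` is a hypothetical helper, not a function from this repository.

```python
import numpy as np
import pandas as pd

def make_buy_label(close: pd.Series, horizon: int = 10, target: float = 0.03) -> pd.Series:
    """1.0 if any close in the next `horizon` rows is >= close * (1 + target), else 0.0.

    Rows without a full forward window stay NaN so they can be dropped before training.
    """
    values = close.to_numpy(dtype=float)
    n = len(values)
    label = np.full(n, np.nan)
    for i in range(n - horizon):
        future_max = values[i + 1 : i + 1 + horizon].max()
        label[i] = float(future_max >= values[i] * (1.0 + target))
    return pd.Series(label, index=close.index, name="buy")

# Toy example with a 3-day horizon so the windows are easy to follow by hand.
prices = pd.Series([100.0, 101.0, 99.0, 104.0, 100.0, 100.0])
labels = make_buy_label(prices, horizon=3, target=0.03)
print(labels.tolist())
```

Because each label looks `horizon` rows ahead, the last `horizon` rows of every ticker have no label, which is also why the splits need a purge gap of the same length.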

Features (112 total)

  1. Technical indicators: RSI(14,28), MACD, Bollinger Bands, ATR, Stochastic, ADX, OBV, VWAP
  2. Returns & trend: lagged log-returns, rolling vol/mean, SMA distances, momentum
  3. Cross-sectional z-scores: z-score within each date across all 50 stocks
  4. Cross-sectional percentiles: percentile rank within each date
  5. Sector-relative z-scores: z-score within each sector per date
  6. Macro: Nifty50 return, India VIX level & change, relative performance
  7. Temporal target encoding: expanding mean BUY rate per ticker/sector (shifted 1 day)
  8. Interaction features: momentum×VIX, RSI×volume, distance×trend, MACD×market, etc.
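
The cross-sectional and target-encoding groups can be sketched with pandas group-bys. This is an illustration under assumptions, not the repository's pipeline: the column names (`date`, `ticker`, `momentum`, `buy`) and both helper functions are hypothetical. The card states a 1-day shift for the encoding; because the label itself looks 10 days ahead, a shift equal to the horizon may be the safer choice, so the shift is left as a parameter here.

```python
import pandas as pd

def add_cross_sectional(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Add the per-date z-score and percentile rank of `col` across tickers."""
    out = df.copy()
    by_date = df.groupby("date")[col]
    std = by_date.transform("std").replace(0.0, float("nan"))  # avoid division by zero
    out[f"{col}_cs_z"] = (df[col] - by_date.transform("mean")) / std
    out[f"{col}_cs_pct"] = by_date.rank(pct=True)
    return out

def add_target_encoding(df: pd.DataFrame, shift: int = 1) -> pd.DataFrame:
    """Expanding mean of past `buy` labels per ticker, lagged by `shift` rows."""
    out = df.sort_values(["ticker", "date"]).copy()
    out["ticker_buy_rate"] = out.groupby("ticker")["buy"].transform(
        lambda s: s.shift(shift).expanding().mean()
    )
    return out

demo = pd.DataFrame({
    "date": ["2025-01-01"] * 3 + ["2025-01-02"] * 3,
    "ticker": ["A", "B", "C"] * 2,
    "momentum": [1.0, 2.0, 3.0, 2.0, 2.0, 5.0],
    "buy": [1, 0, 0, 0, 1, 1],
})
demo = add_cross_sectional(demo, "momentum")
demo = add_target_encoding(demo)
print(demo)
```

Sector-relative z-scores follow the same pattern with `groupby(["date", "sector"])`, and an interaction feature is a plain column product such as `momentum * vix`.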

Validation

  • Strict temporal split with 10-day purge gaps
  • Time-decay sample weights: recent data weighted exponentially higher (6-month half-life)
  • No random shuffling ever
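
One way to implement a purged temporal split and half-life weights is sketched below. The function names, the split dates, and the choice of 182 calendar days for "6 months" are assumptions made for illustration; the gap is counted in observed trading days.

```python
import numpy as np
import pandas as pd

def purged_split(dates: pd.Series, train_end, valid_end, purge: int = 10):
    """Boolean masks (train, valid, test); `purge` trading days are dropped after each boundary."""
    dates = pd.to_datetime(dates)
    calendar = pd.DatetimeIndex(dates).unique().sort_values()  # observed trading days
    pos = calendar.get_indexer(dates)                           # row -> trading-day position
    train_stop = calendar.searchsorted(pd.Timestamp(train_end), side="right")
    valid_stop = calendar.searchsorted(pd.Timestamp(valid_end), side="right")
    train = pos < train_stop
    valid = (pos >= train_stop + purge) & (pos < valid_stop)
    test = pos >= valid_stop + purge
    return train, valid, test

def time_decay_weights(dates: pd.Series, half_life_days: float = 182.0) -> np.ndarray:
    """Weight 1.0 for the newest row, halving for every `half_life_days` of age."""
    dates = pd.to_datetime(dates)
    age_days = (dates.max() - dates).dt.days.to_numpy()
    return np.power(0.5, age_days / half_life_days)

# Toy calendar: 60 business days, one row per day.
days = pd.Series(pd.bdate_range("2025-01-01", periods=60))
train, valid, test = purged_split(days, days[29], days[44], purge=10)
weights = time_decay_weights(days[train])
print(train.sum(), valid.sum(), test.sum())
```

With a 10-day label horizon, the last training row's forward window ends exactly where the purge gap ends, so no training label is computed from prices that fall inside the validation period.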

Model

  • CatBoost: native categorical handling prevents target leakage from encoding
  • Ordered Target Statistics for ticker (50 levels) and sector (7 levels)

⚠️ Honest Assessment

What This Model Is

  • A weak but genuine directional edge (+14pp above a coin flip, roughly +7pp above always predicting NOT_BUY at a 43% base rate)
  • A market regime filter: VIX drives most predictive power
  • A relative strength screener: finds stocks outperforming peers in favorable regimes

What This Model Is NOT

  • A standalone trading system: 49% precision means ~half of BUY signals are false
  • Guaranteed profit: backtest only, no transaction costs, slippage, or market impact
  • Regime-agnostic: VIX-heavy models fail when macro dynamics shift

Production Risks

  1. VIX regime shift: the model is overweighted on fear/greed; retrain quarterly
  2. Non-stationarity: 64% on the 2025-2026 test set may not hold in future regimes
  3. Transaction costs: 0.5% brokerage + STT + slippage erode the 3% target quickly
  4. Crowding: if many use similar signals, alpha decays
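
To see how quickly costs eat into the target, here is a back-of-the-envelope expectancy calculation. It rests on strong simplifying assumptions that are not results from this model: every correct BUY exits at exactly +3%, every false BUY exits at exactly the -2% stop-loss, and 0.5% is the total round-trip cost.

```python
def expected_net_return(precision: float, target: float, stop_loss: float, cost: float) -> float:
    """Expected return per trade when winners earn `target` and losers lose `stop_loss`."""
    gross = precision * target - (1.0 - precision) * stop_loss
    return gross - cost

def breakeven_precision(target: float, stop_loss: float, cost: float) -> float:
    """Precision at which the expected net return per trade is zero."""
    return (stop_loss + cost) / (target + stop_loss)

precision = 0.491        # reported optimized precision
target = 0.03            # +3% upside target
stop_loss = 0.02         # assumed loss on a false signal (the -2% stop-loss)
round_trip_cost = 0.005  # assumed total cost per trade (0.5%)

ev = expected_net_return(precision, target, stop_loss, round_trip_cost)
needed = breakeven_precision(target, stop_loss, round_trip_cost)
print(f"expected net return per trade: {ev:.4%}")
print(f"break-even precision:          {needed:.1%}")
```

Under these assumptions the break-even precision is 50%, so the reported 49.1% sits marginally below it; real exits will not be this clean, which is why paper trading is the more reliable check.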

🚀 Usage

import catboost as cb
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download("mohan170802/nse-nifty50-swing-predictor-v10", "unified_model_v10.cbm")
model = cb.CatBoostClassifier()
model.load_model(model_path)

# Features list: unified_features_v10.txt (112 features)
# Categorical: ticker, sector

Threshold Recommendations

| Strategy | Threshold | Use Case |
| --- | --- | --- |
| Conservative | 0.65 | Fewer trades, higher precision |
| Moderate | 0.55 | Balanced (recommended) |
| Aggressive | 0.50 | Default, more trades |
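
A short sketch of turning BUY probabilities into signals at these thresholds. The probabilities below are made-up toy values; in practice they would come from something like `model.predict_proba(X)[:, 1]`, assuming BUY is encoded as class 1. Treating a probability equal to the threshold as a signal is also an assumption.

```python
import numpy as np

THRESHOLDS = {"conservative": 0.65, "moderate": 0.55, "aggressive": 0.50}

def signals_at(proba_buy, strategy: str = "moderate") -> np.ndarray:
    """Boolean BUY signals: probability at or above the strategy's threshold."""
    return np.asarray(proba_buy, dtype=float) >= THRESHOLDS[strategy]

def precision_recall(y_true, signal):
    """Precision and recall of the BUY signals against true labels (1 = BUY)."""
    y_true = np.asarray(y_true).astype(bool)
    tp = np.sum(signal & y_true)
    precision = tp / signal.sum() if signal.sum() else float("nan")
    recall = tp / y_true.sum() if y_true.sum() else float("nan")
    return float(precision), float(recall)

proba = [0.70, 0.60, 0.52, 0.40]   # toy probabilities, not model output
truth = [1, 0, 1, 0]               # toy labels

for name in THRESHOLDS:
    sig = signals_at(proba, name)
    p, r = precision_recall(truth, sig)
    print(f"{name:>12}: trades={int(sig.sum())} precision={p:.2f} recall={r:.2f}")
```

Running the same loop on a held-out validation set is a simple way to check whether a higher threshold actually buys more precision for this model before committing to one.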

📁 Files

| File | Description |
| --- | --- |
| unified_model_v10.cbm | Trained CatBoost model |
| unified_features_v10.txt | 112 feature names |
| summary_v10.json | Full metrics, top features, hyperparameters |
| ablation_results.json | 8 ablation experiments with feature importance |

🔮 Recommended Next Steps

| Priority | Improvement | Expected Lift |
| --- | --- | --- |
| P0 | Add quarterly fundamental data (P/E, ROE, EPS growth) | +3-7pp |
| P1 | Rolling walk-forward retraining (not fixed split) | Regime adaptation |
| P1 | Ensemble: technical + cross-sectional + macro sub-models | +1-3pp |
| P2 | Intraday features (volume profile, order flow) | +2-4pp |
| P2 | Sector rotation momentum (relative sector strength) | +1-2pp |

⚠️ Not financial advice. This is a research model with a weak signal. Use it with stop-losses (-2%) and position sizing (≤2% risk per trade), and paper trade for 3+ months before committing real capital. Markets are non-stationary: past performance does not guarantee future results.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

