---
tags:
- ml-intern
---
# Crypto 15-Minute Direction Classifier
|
|
A time-series classification model that predicts whether the Bitcoin (BTC/USDT) price will move **up** or **down** over the next 15-minute interval, using multivariate historical market data.
|
|
## Model Overview

| Attribute | Value |
|-----------|-------|
| **Task** | Binary time-series classification |
| **Target** | BTC price direction in next 15 minutes (up=1, down=0) |
| **Input** | 60 minutes of multivariate OHLCV + technical indicators |
| **Assets** | BTC/USDT + ETH/USDT (cross-asset features) |
| **Best Model** | Logistic Regression on flattened windows |
| **Dataset** | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |
|
|
## Performance

| Metric | Value |
|--------|-------|
| Test Accuracy | 53.1% |
| Test F1 | 0.574 |
| Test AUC | 0.540 |

**Note:** Predicting 15-minute crypto price direction is extremely hard because markets are close to efficient at short timeframes. On the test split the model edges above random chance (50% accuracy, 0.5 AUC), which points to a small but non-trivial signal. The pipeline is mainly valuable as a complete data engineering and feature extraction system for further research.
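
For readers who want to recompute the three metrics above on their own predictions, here is a minimal NumPy sketch. It is not the evaluation script used for this model; the function names and the toy arrays are assumptions made for illustration, and F1 is taken with respect to the "up" class.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples where the predicted class matches the label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def f1_positive(y_true, y_pred):
    """F1 score for the positive ("up" = 1) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a random "up" sample outscores a random
    "down" sample (ties count as 0.5). Builds a pairwise matrix, so it is
    only meant for small arrays."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one sample of each class")
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Toy labels and probabilities (made up, not model output)
y_true = np.array([1, 0, 1, 1, 0, 0])
probs = np.array([0.9, 0.4, 0.6, 0.3, 0.2, 0.7])
preds = (probs >= 0.5).astype(int)

acc = accuracy(y_true, preds)
f1 = f1_positive(y_true, preds)
auc = roc_auc(y_true, probs)
```

An AUC of 0.5 corresponds to random ranking, which is the reference point for the 0.540 reported above.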
|
|
## Data Sources

- [WinkingFace/CryptoLM-Bitcoin-BTC-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Bitcoin-BTC-USDT) - BTC 1-min OHLCV + 15 technical indicators
- [WinkingFace/CryptoLM-Ethereum-ETH-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Ethereum-ETH-USDT) - ETH 1-min OHLCV + 15 technical indicators
|
|
## Features (49 per timestep)

### BTC & ETH (separately)
- Price: `open`, `high`, `low`, `close`
- Volume: `volume`
- Moving Averages: `MA_20`, `MA_50`, `MA_200`
- Momentum: `RSI`, `%K`, `%D`, `ADX`, `ATR`
- Trend: `MACD`, `Signal`, `Histogram`, `Trendline`
- Volatility: `BL_Upper`, `BL_Lower`, `MN_Upper`, `MN_Lower`

### Cross-Asset Engineered
- `eth_btc_ratio` - ETH/BTC price ratio
- `btc_ret_1m`, `eth_ret_1m` - 1-minute returns
- `btc_vol_ma20`, `eth_vol_ma20` - 20-period volume MA
- `btc_range`, `eth_range` - Normalized price range
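
The seven engineered features listed above could be computed along the following lines. This is a sketch under assumptions, not the original feature code: the card does not give the formulas, so simple returns, a trailing volume mean, and `(high - low) / close` as the "normalized price range" are guesses, and the helper names are hypothetical.

```python
import numpy as np

def pct_return(close):
    """One-step simple return; the first element is NaN (no previous candle)."""
    close = np.asarray(close, dtype=float)
    ret = np.full(close.shape, np.nan)
    ret[1:] = close[1:] / close[:-1] - 1.0
    return ret

def rolling_mean(x, window):
    """Trailing mean over `window` candles; the first window-1 values are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def cross_asset_features(btc, eth, vol_window=20):
    """btc / eth: dicts of equal-length, timestamp-aligned 1-D sequences
    keyed by 'high', 'low', 'close', 'volume'."""
    btc = {k: np.asarray(v, dtype=float) for k, v in btc.items()}
    eth = {k: np.asarray(v, dtype=float) for k, v in eth.items()}
    return {
        "eth_btc_ratio": eth["close"] / btc["close"],
        "btc_ret_1m": pct_return(btc["close"]),
        "eth_ret_1m": pct_return(eth["close"]),
        "btc_vol_ma20": rolling_mean(btc["volume"], vol_window),
        "eth_vol_ma20": rolling_mean(eth["volume"], vol_window),
        # Assumed definition of the normalized range: (high - low) / close
        "btc_range": (btc["high"] - btc["low"]) / btc["close"],
        "eth_range": (eth["high"] - eth["low"]) / eth["close"],
    }

# Toy example with three made-up candles per asset; the window is shortened
# to 2 so the volume MA is defined (the key names keep the "ma20" suffix).
btc = {"high": [101, 102, 103], "low": [99, 100, 101],
       "close": [100, 101, 102], "volume": [5, 7, 9]}
eth = {"high": [10.2, 10.3, 10.4], "low": [9.8, 9.9, 10.0],
       "close": [10.0, 10.1, 10.2], "volume": [50, 70, 90]}
feats = cross_asset_features(btc, eth, vol_window=2)
```

The leading NaN values produced by the return and rolling-mean helpers are the kind of entries the cleaning step would have to drop.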
|
|
## Pipeline

1. **Load & Merge** - Join the BTC and ETH 1-minute datasets on timestamp
2. **Engineer Features** - Add returns, ratios, ranges, and volume MAs
3. **Create Windows** - 60-minute lookback → predict the next 15-minute direction
4. **Clean** - Drop NaN/Inf values and standardize each feature
5. **Split** - 70/15/15 temporal train/val/test split (no data leakage)
6. **Train** - Logistic Regression and Random Forest baselines
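
The windowing and split steps above can be sketched as follows. This is an illustration under assumptions, not the original pipeline code: the label is taken to be 1 when the BTC close 15 minutes after the end of the window is strictly higher than the last close in the window (ties count as down), and `make_windows` and `temporal_split` are hypothetical helper names.

```python
import numpy as np

def make_windows(features, close, lookback=60, horizon=15):
    """Build (X, y) from a (T, F) feature matrix and a (T,) close series.

    Each sample is the `lookback` rows ending at index t; its label compares
    close[t + horizon] with close[t] (assumed labelling rule)."""
    features = np.asarray(features, dtype=float)
    close = np.asarray(close, dtype=float)
    X, y = [], []
    for t in range(lookback - 1, len(close) - horizon):
        X.append(features[t - lookback + 1 : t + 1])
        y.append(int(close[t + horizon] > close[t]))
    if not X:
        return np.empty((0, lookback, features.shape[1])), np.empty((0,), dtype=int)
    return np.stack(X), np.array(y)

def temporal_split(X, y, train=0.70, val=0.15):
    """Chronological split: earliest samples train, latest samples test."""
    n = len(X)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))

# Toy data: 100 minutes, 3 features, steadily rising price (made up)
T, F = 100, 3
features = np.arange(T * F, dtype=float).reshape(T, F)
close = np.arange(T, dtype=float) + 100.0

X, y = make_windows(features, close)
train, val, test = temporal_split(X, y)
```

Because consecutive windows overlap and each label looks 15 minutes ahead, samples on either side of a split boundary share candles; leaving a gap of at least `lookback + horizon` samples between splits is one way to keep them fully separate.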
|
|
## Usage

```python
import pickle
import numpy as np

# Load model
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load preprocessing artifacts
mean = np.load("feature_mean.npy")
std = np.load("feature_std.npy")
valid = np.load("valid_cols.npy")

# X must be supplied by the caller: shape (samples, 60 minutes, 49 features)
X_flat = X.reshape(X.shape[0], -1)         # flatten to 2940 features
X_flat = X_flat[:, valid]                  # keep valid columns
X_norm = (X_flat - mean) / std             # standardize

# Predict
preds = model.predict(X_norm)              # 0=down, 1=up
probs = model.predict_proba(X_norm)[:, 1]  # probability of up
```
|
|
## Files

| File | Description |
|------|-------------|
| `model.pkl` | Trained LogisticRegression classifier |
| `feature_mean.npy` | Per-feature means for standardization |
| `feature_std.npy` | Per-feature standard deviations |
| `valid_cols.npy` | Boolean mask of valid (finite) feature columns |
| `metrics.json` | Evaluation results |
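
To make the role of the three `.npy` artifacts concrete, the sketch below shows one way they could be produced from flattened training windows. It is an assumption, not the original training code: a column is treated as valid only if it is finite in every training sample, statistics are fitted on the training split only, and the zero-variance guard and the name `fit_preprocessing` are additions made for illustration.

```python
import numpy as np

def fit_preprocessing(X_train):
    """X_train: (samples, timesteps, features) training windows.

    Returns per-column mean and std over the valid columns, plus the
    boolean mask that selects those columns from the flattened input."""
    X_flat = X_train.reshape(X_train.shape[0], -1)
    valid = np.isfinite(X_flat).all(axis=0)   # assumed rule: finite everywhere
    X_valid = X_flat[:, valid]
    mean = X_valid.mean(axis=0)
    std = X_valid.std(axis=0)
    std[std == 0] = 1.0                       # assumed guard for constant columns
    return mean, std, valid

# Toy training set: 8 samples, 3 timesteps, 2 features, one NaN entry
rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 3, 2))
X_train[0, 1, 1] = np.nan

mean, std, valid = fit_preprocessing(X_train)
X_norm = (X_train.reshape(X_train.shape[0], -1)[:, valid] - mean) / std

# The arrays could then be written with np.save, for example:
# np.save("feature_mean.npy", mean)
# np.save("feature_std.npy", std)
# np.save("valid_cols.npy", valid)
```

With this ordering, `mean` and `std` have one entry per valid column, which matches the usage code where the mask is applied before standardization.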
|
|
## Limitations

- **Market Efficiency**: 15-minute price direction is close to a random walk, so any edge is small
- **No Costs**: Evaluation ignores fees, slippage, and spread
- **Historical Data**: Trained on 2017-2020 data; may not generalize to current market regimes
- **Simple Models**: Only simple baselines were trained; deep learning (Conv-LSTM, TCN, Transformer) may improve results
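
To illustrate why ignoring costs matters, the sketch below subtracts a fixed cost from every trade of a naive strategy. Everything in it is hypothetical: the strategy (long when the model predicts up, short when it predicts down), the 0.1% round-trip cost, and the toy returns are assumptions, not results from this model.

```python
import numpy as np

def net_returns(preds, future_returns, cost_per_trade=0.001):
    """Per-trade return after costs for a naive long/short strategy.

    preds: 1 = go long, 0 = go short (assumed strategy).
    future_returns: realized return over the next 15 minutes.
    cost_per_trade: assumed round-trip cost as a fraction (fees + slippage)."""
    preds = np.asarray(preds)
    future_returns = np.asarray(future_returns, dtype=float)
    position = np.where(preds == 1, 1.0, -1.0)
    return position * future_returns - cost_per_trade

# Made-up example: two correct calls and one wrong call
preds = np.array([1, 0, 1])
future_returns = np.array([0.002, -0.001, -0.003])

gross = net_returns(preds, future_returns, cost_per_trade=0.0)
net = net_returns(preds, future_returns, cost_per_trade=0.001)
```

In this toy case the gross returns sum to zero while the net returns sum to -0.003, so a strategy that breaks even before costs loses money once each trade is charged.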
|
|
## Future Work

1. **Deep Learning**: Conv-LSTM, Temporal CNN, or Transformer architectures
2. **More Data**: Order book, funding rates, on-chain metrics, sentiment
3. **Multi-Scale**: Combine 1-min, 5-min, 15-min, and 1-hour features
4. **Regime Detection**: Train separate models for bull, bear, and sideways markets
5. **Cost-Aware Evaluation**: Incorporate transaction costs into the evaluation metrics
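
As a starting point for the multi-scale idea, coarser candles can be derived from the existing 1-minute data by aggregating fixed-size blocks. The sketch below assumes the candles are contiguous with no gaps and that the series starts on a block boundary; `resample_ohlcv` is a hypothetical helper, not part of this repository.

```python
import numpy as np

def resample_ohlcv(open_, high, low, close, volume, factor=5):
    """Aggregate 1-minute candles into `factor`-minute candles.

    A trailing incomplete block is dropped."""
    n = (len(close) // factor) * factor

    def blocks(x):
        return np.asarray(x, dtype=float)[:n].reshape(-1, factor)

    return {
        "open": blocks(open_)[:, 0],            # first open in the block
        "high": blocks(high).max(axis=1),       # highest high
        "low": blocks(low).min(axis=1),         # lowest low
        "close": blocks(close)[:, -1],          # last close in the block
        "volume": blocks(volume).sum(axis=1),   # total volume
    }

# Ten made-up 1-minute candles -> two 5-minute candles
open_1m = np.arange(1.0, 11.0)
candles_5m = resample_ohlcv(
    open_=open_1m,
    high=open_1m + 1.0,
    low=open_1m - 1.0,
    close=open_1m + 0.5,
    volume=np.ones(10),
    factor=5,
)
```

The same function with `factor=15` or `factor=60` would give the 15-minute and 1-hour views; indicators would then need to be recomputed on each resampled series.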
|
|
## License

MIT License
|
|
<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
|
|