---
tags:
- ml-intern
---

# Crypto 15-Minute Direction Classifier

A time-series classification model that predicts whether Bitcoin (BTC/USDT) price will move **up** or **down** over the next 15-minute interval using multivariate historical market data.

## Model Overview

| Attribute | Value |
|-----------|-------|
| **Task** | Binary time-series classification |
| **Target** | BTC price direction in next 15 minutes (up=1, down=0) |
| **Input** | 60 minutes of multivariate OHLCV + technical indicators |
| **Assets** | BTC/USDT + ETH/USDT (cross-asset features) |
| **Best Model** | Logistic Regression on flattened windows |
| **Dataset** | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |

## Performance

| Metric | Value |
|--------|-------|
| Test Accuracy | 53.1% |
| Test F1 | 0.574 |
| Test AUC | 0.540 |

**Note:** 15-minute crypto price direction prediction is an extremely hard problem due to market efficiency at short timeframes. The model consistently edges above random chance (50%), demonstrating a non-trivial but small signal. This pipeline is valuable as a complete data engineering and feature extraction system for further research.

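The Input and Target rows of the overview table can be made concrete with a small windowing sketch. The function name `make_windows` and the exact labeling rule (compare the close 15 minutes after the window's last candle against that candle's close) are assumptions for illustration; the preprocessing code actually used for this model is not shown in this card and may differ.

```python
import numpy as np

LOOKBACK = 60  # minutes of history per sample (from the overview table)
HORIZON = 15   # minutes ahead for the direction label (from the overview table)


def make_windows(features, close, lookback=LOOKBACK, horizon=HORIZON):
    """Slice a (T, F) feature matrix into lookback windows with up/down labels.

    Assumption: the label is 1 when the close `horizon` minutes after the
    window's last candle is strictly higher than that candle's close, else 0.
    """
    features = np.asarray(features, dtype=float)
    close = np.asarray(close, dtype=float)
    n_samples = len(close) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lookback/horizon")

    # Index of the last candle in each window.
    last = np.arange(lookback - 1, lookback - 1 + n_samples)
    # Each sample holds the `lookback` rows ending at (and including) `last`.
    X = np.stack([features[i - lookback + 1 : i + 1] for i in last])
    y = (close[last + horizon] > close[last]).astype(np.int64)
    return X, y


# Synthetic demo data, not real market data.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(200, 49))
demo_close = 100 + np.cumsum(rng.normal(size=200))
X_demo, y_demo = make_windows(demo_features, demo_close)
print(X_demo.shape, y_demo.shape)  # (126, 60, 49) (126,)
```

Consecutive windows built this way overlap by 59 minutes, so neighboring samples are strongly correlated; this is one reason a temporal split is preferable to a random one for this kind of data.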
## Data Sources

- [WinkingFace/CryptoLM-Bitcoin-BTC-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Bitcoin-BTC-USDT) - BTC 1-min OHLCV + 15 technical indicators
- [WinkingFace/CryptoLM-Ethereum-ETH-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Ethereum-ETH-USDT) - ETH 1-min OHLCV + 15 technical indicators

## Features (49 per timestep)

### BTC & ETH (separately)

- Price: `open`, `high`, `low`, `close`
- Volume: `volume`
- Moving Averages: `MA_20`, `MA_50`, `MA_200`
- Momentum: `RSI`, `%K`, `%D`, `ADX`, `ATR`
- Trend: `MACD`, `Signal`, `Histogram`, `Trendline`
- Volatility: `BL_Upper`, `BL_Lower`, `MN_Upper`, `MN_Lower`

### Cross-Asset Engineered

- `eth_btc_ratio` - ETH/BTC price ratio
- `btc_ret_1m`, `eth_ret_1m` - 1-minute returns
- `btc_vol_ma20`, `eth_vol_ma20` - 20-period volume MA
- `btc_range`, `eth_range` - Normalized price range

## Pipeline

1. **Load & Merge** BTC and ETH 1-minute datasets on timestamp
2. **Engineer Features** - Add returns, ratios, ranges, volume MAs
3. **Create Windows** - 60-minute lookback → predict next 15-minute direction
4. **Clean** - Drop NaN/Inf, standardize per-feature
5. **Split** - 70/15/15 temporal train/val/test (no data leakage)
6. **Train** - Logistic Regression + Random Forest baselines

## Usage

```python
import pickle
import numpy as np

# Load model
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load preprocessing artifacts
mean = np.load("feature_mean.npy")
std = np.load("feature_std.npy")
valid = np.load("valid_cols.npy")

# X shape: (samples, 60 minutes, 49 features)
X_flat = X.reshape(X.shape[0], -1)  # flatten to 2940 features
X_flat = X_flat[:, valid]           # keep valid columns
X_norm = (X_flat - mean) / std      # standardize

# Predict
preds = model.predict(X_norm)              # 0=down, 1=up
probs = model.predict_proba(X_norm)[:, 1]  # probability of up
```

## Files

| File | Description |
|------|-------------|
| `model.pkl` | Trained LogisticRegression classifier |
| `feature_mean.npy` | Per-feature means for standardization |
| `feature_std.npy` | Per-feature standard deviations |
| `valid_cols.npy` | Boolean mask of valid (finite) feature columns |
| `metrics.json` | Evaluation results |

## Limitations

- **Market Efficiency**: 15-min prediction is near-random walk; edge is small
- **No Costs**: Evaluation ignores fees, slippage, spread
- **Historical Data**: Trained on 2017-2020 data; may not generalize to current regimes
- **Simple Models**: Deep learning (Conv-LSTM, TCN, Transformer) may improve results

## Future Work

1. **Deep Learning**: Conv-LSTM, Temporal CNN, or Transformer architectures
2. **More Data**: Order book, funding rates, on-chain metrics, sentiment
3. **Multi-Scale**: Combine 1-min, 5-min, 15-min, 1-hour features
4. **Regime Detection**: Train separate models for bull/bear/sideways markets
5. **Cost-Aware Evaluation**: Incorporate transaction costs in metric

## License

MIT License

## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern