---
tags:
- ml-intern
---
# Crypto 15-Minute Direction Classifier
A time-series classification model that predicts whether Bitcoin (BTC/USDT) price will move **up** or **down** over the next 15-minute interval using multivariate historical market data.
## Model Overview
| Attribute | Value |
|-----------|-------|
| **Task** | Binary time-series classification |
| **Target** | BTC price direction in next 15 minutes (up=1, down=0) |
| **Input** | 60 one-minute timesteps × 49 features (OHLCV, technical indicators, cross-asset features) |
| **Assets** | BTC/USDT + ETH/USDT (cross-asset features) |
| **Best Model** | Logistic Regression on flattened windows |
| **Dataset** | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |
## Performance
| Metric | Value |
|--------|-------|
| Test Accuracy | 53.1% |
| Test F1 | 0.574 |
| Test AUC | 0.540 |

**Note:** Predicting 15-minute crypto price direction is extremely hard because markets are highly efficient at short timeframes. On the held-out test split the model scores only modestly above the 50% chance level (accuracy 53.1%, AUC 0.540), which indicates a small but non-trivial signal. The pipeline's main value is as a complete data engineering and feature-extraction system for further research.
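
A sketch of how the three reported metrics can be computed with scikit-learn, assuming `y_prob` holds the model's probability of "up" (as returned by `predict_proba(...)[:, 1]`). It also reports a majority-class baseline. When the classes are imbalanced, always predicting the more frequent class already beats 50%, so that rate is a stricter reference point than chance. The toy labels and probabilities are invented.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, F1 and AUC, plus the accuracy of always predicting the majority class."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Strict ">" matches LogisticRegression.predict at exactly p = 0.5
    y_pred = (y_prob > threshold).astype(int)
    majority = np.full_like(y_true, np.bincount(y_true).argmax())
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),  # threshold-free ranking quality
        "majority_baseline_acc": accuracy_score(y_true, majority),
    }


# Made-up labels (1 = up, 0 = down) and predicted probabilities of "up"
y_true = [0, 1, 1, 0, 1, 1]
y_prob = [0.2, 0.7, 0.6, 0.55, 0.4, 0.9]
m = evaluate(y_true, y_prob)
print(m)
```

In this toy case the accuracy equals the majority-class baseline even though the AUC is high, which is why accuracy alone is a weak signal when the classes are unbalanced.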
## Data Sources
- [WinkingFace/CryptoLM-Bitcoin-BTC-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Bitcoin-BTC-USDT) - BTC 1-min OHLCV + 15 technical indicators
- [WinkingFace/CryptoLM-Ethereum-ETH-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Ethereum-ETH-USDT) - ETH 1-min OHLCV + 15 technical indicators
## Features (49 per timestep)
### BTC & ETH (21 features per asset)
- Price: `open`, `high`, `low`, `close`
- Volume: `volume`
- Moving Averages: `MA_20`, `MA_50`, `MA_200`
- Momentum: `RSI`, `%K`, `%D`, `ADX`
- Trend: `MACD`, `Signal`, `Histogram`, `Trendline`
- Volatility: `ATR`, `BL_Upper`, `BL_Lower`, `MN_Upper`, `MN_Lower`
### Cross-Asset Engineered
- `eth_btc_ratio` - ETH/BTC price ratio
- `btc_ret_1m`, `eth_ret_1m` - 1-minute returns
- `btc_vol_ma20`, `eth_vol_ma20` - 20-period volume MA
- `btc_range`, `eth_range` - Normalized price range
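
A minimal pandas sketch of the seven cross-asset features listed above. The `btc_`/`eth_` column prefixes, the use of simple (not log) returns, and normalizing the high-low range by the close price are assumptions; the card does not specify them.

```python
import numpy as np
import pandas as pd


def add_cross_asset_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append the seven cross-asset features to a merged BTC/ETH 1-minute frame.

    Hypothetical column layout: `btc_`/`eth_` prefixes on the raw columns,
    e.g. `btc_close`, `eth_high`, `eth_volume`.
    """
    out = df.copy()
    # ETH priced in BTC
    out["eth_btc_ratio"] = out["eth_close"] / out["btc_close"]
    for asset in ("btc", "eth"):
        close = out[f"{asset}_close"]
        # Simple 1-minute return (assumed definition); first row is NaN
        out[f"{asset}_ret_1m"] = close / close.shift(1) - 1.0
        # 20-period rolling mean of volume; first 19 rows are NaN
        out[f"{asset}_vol_ma20"] = out[f"{asset}_volume"].rolling(20).mean()
        # High-low range divided by close (assumed normalization)
        out[f"{asset}_range"] = (out[f"{asset}_high"] - out[f"{asset}_low"]) / close
    # Turn any inf from division into NaN so the cleaning step can drop it
    return out.replace([np.inf, -np.inf], np.nan)


# Tiny synthetic example: 30 minutes of made-up prices
n = 30
steps = np.arange(n, dtype=float)
demo = pd.DataFrame(
    {
        "btc_close": 100.0 + steps,
        "btc_high": 101.0 + steps,
        "btc_low": 99.0 + steps,
        "btc_volume": np.full(n, 10.0),
        "eth_close": 10.0 + 0.1 * steps,
        "eth_high": 10.5 + 0.1 * steps,
        "eth_low": 9.5 + 0.1 * steps,
        "eth_volume": np.full(n, 5.0),
    },
    index=pd.date_range("2020-01-01", periods=n, freq="min"),
)
feats = add_cross_asset_features(demo)
print(feats[["eth_btc_ratio", "btc_ret_1m", "btc_vol_ma20", "btc_range"]].tail(3))
```

The leading NaNs from the shift and the rolling window are what the "Drop NaN/Inf" cleaning step removes.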
## Pipeline
1. **Load & Merge** - Join the BTC and ETH 1-minute datasets on timestamp
2. **Engineer Features** - Add returns, ratios, ranges, volume MAs
3. **Create Windows** - 60-minute lookback → predict next 15-minute direction
4. **Clean** - Drop NaN/Inf, standardize per-feature
5. **Split** - Chronological 70/15/15 train/val/test split (no shuffling, to avoid look-ahead leakage)
6. **Train** - Logistic Regression + Random Forest baselines
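
Steps 3 to 6 can be sketched with NumPy and scikit-learn as follows. This is an illustration under assumptions, not the repository's code. The label rule (1 if the close 15 minutes after a window's last minute is strictly higher, ties counted as down) is assumed. Fitting the validity mask, means, and standard deviations on the training windows only is also assumed. The random data stands in for the real 49 features.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import LogisticRegression


def make_windows(features, close, lookback=60, horizon=15):
    """Return X of shape (N, lookback, F) and binary labels y of shape (N,).

    y = 1 if close `horizon` minutes after a window's last minute is higher
    than the close at that last minute, else 0 (ties count as 0: an assumption).
    """
    n = features.shape[0] - lookback - horizon + 1
    # sliding_window_view gives (T - lookback + 1, F, lookback); move time to axis 1
    X = sliding_window_view(features, lookback, axis=0)[:n].transpose(0, 2, 1)
    last = np.arange(n) + lookback - 1  # index of each window's last minute
    y = (close[last + horizon] > close[last]).astype(np.int64)
    return X, y


def temporal_split(X, y, train=0.70, val=0.15):
    """Chronological split, no shuffling: oldest windows train, newest test."""
    a = int(len(X) * train)
    b = a + int(len(X) * val)
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])


def fit_preprocessing(X_train):
    """Fit the column mask, means and stds on flattened training windows only."""
    flat = X_train.reshape(len(X_train), -1)
    valid = np.isfinite(flat).all(axis=0)  # columns finite in every training row
    mean = flat[:, valid].mean(axis=0)
    std = flat[:, valid].std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return valid, mean, std


def transform(X, valid, mean, std):
    """Flatten, keep valid columns, standardize (same order as the Usage snippet)."""
    flat = X.reshape(len(X), -1)[:, valid]
    return (flat - mean) / std


rng = np.random.default_rng(0)
T, F = 500, 49
close = 100.0 + np.cumsum(rng.normal(size=T))  # synthetic random-walk price
features = rng.normal(size=(T, F))              # stand-in for the 49 features

X, y = make_windows(features, close)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = temporal_split(X, y)
valid, mean, std = fit_preprocessing(X_tr)
clf = LogisticRegression(max_iter=1000).fit(transform(X_tr, valid, mean, std), y_tr)
val_acc = clf.score(transform(X_va, valid, mean, std), y_va)
print(X.shape, len(X_tr), len(X_va), len(X_te), round(val_acc, 3))
```

Windows on either side of a split boundary share raw minutes. Dropping the `lookback + horizon - 1` windows nearest each boundary (a "purge" gap) is a common way to make the split stricter.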
## Usage
```python
import pickle

import numpy as np

# Load model (only unpickle files from a source you trust)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load preprocessing artifacts
mean = np.load("feature_mean.npy")
std = np.load("feature_std.npy")
valid = np.load("valid_cols.npy")  # boolean mask over the flattened columns

# X: array of shape (samples, 60 minutes, 49 features), built with the
# same feature pipeline that was used for training
X_flat = X.reshape(X.shape[0], -1)  # flatten to 60 * 49 = 2940 features
X_flat = X_flat[:, valid]           # keep valid columns
X_norm = (X_flat - mean) / std      # standardize with training statistics

# Predict
preds = model.predict(X_norm)               # 0=down, 1=up
probs = model.predict_proba(X_norm)[:, 1]   # probability of up
```
## Files
| File | Description |
|------|-------------|
| `model.pkl` | Trained LogisticRegression classifier |
| `feature_mean.npy` | Per-feature means for standardization |
| `feature_std.npy` | Per-feature standard deviations |
| `valid_cols.npy` | Boolean mask of valid (finite) feature columns |
| `metrics.json` | Evaluation results |
## Limitations
- **Market Efficiency**: 15-minute price moves are close to a random walk, so any predictive edge is small
- **No Costs**: Evaluation ignores fees, slippage, spread
- **Historical Data**: Trained on 2017-2020 data; may not generalize to current regimes
- **Simple Models**: Only Logistic Regression and Random Forest baselines were trained; deep learning models (Conv-LSTM, TCN, Transformer) may improve results
## Future Work
1. **Deep Learning**: Conv-LSTM, Temporal CNN, or Transformer architectures
2. **More Data**: Order book, funding rates, on-chain metrics, sentiment
3. **Multi-Scale**: Combine 1-min, 5-min, 15-min, 1-hour features
4. **Regime Detection**: Train separate models for bull/bear/sideways markets
5. **Cost-Aware Evaluation**: Incorporate transaction costs into the evaluation metrics
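
One way to make the evaluation cost-aware, sketched under simplifying assumptions. Each prediction opens a 15-minute long (up) or short (down) position. `fwd_returns` holds the realized 15-minute return. `cost_bps` is a hypothetical flat round-trip cost covering fees, slippage, and spread. The function name and all inputs below are illustrative and made up.

```python
import numpy as np


def cost_adjusted_returns(preds, fwd_returns, cost_bps=10.0):
    """Net return per trade: long (+1) on an 'up' prediction, short (-1) on
    'down', minus a flat round-trip cost in basis points (hypothetical value)."""
    preds = np.asarray(preds)
    fwd_returns = np.asarray(fwd_returns, dtype=float)
    position = np.where(preds == 1, 1.0, -1.0)
    return position * fwd_returns - cost_bps / 10_000.0


preds = np.array([1, 0, 1, 0])
fwd = np.array([0.002, -0.001, -0.0005, 0.0003])  # made-up 15-minute returns
net = cost_adjusted_returns(preds, fwd, cost_bps=5.0)
print(net, net.sum())
```

If windows are built at a 1-minute stride, consecutive 15-minute horizons overlap, so summing net returns over every window counts the same price move many times. Evaluating on every 15th window (non-overlapping trades) avoids that.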
## License
MIT License
<!-- ml-intern-provenance -->
## Generated by ML Intern
This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern