huntergemmer committed on
Commit
2b5e46a
·
verified ·
1 Parent(s): 20c53c6

Upload README.md

Files changed (1)
  1. README.md +97 -15
README.md CHANGED
@@ -1,26 +1,108 @@
- ---
- tags:
- - ml-intern
- ---

- # huntergemmer/crypto-15min-direction-classifier

- <!-- ml-intern-provenance -->
- ## Generated by ML Intern

- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern

  ## Usage

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model_id = "huntergemmer/crypto-15min-direction-classifier"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
  ```

- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
+ # Crypto 15-Minute Direction Classifier

+ A time-series classification model that predicts whether Bitcoin (BTC/USDT) price will move **up** or **down** over the next 15-minute interval using multivariate historical market data.

+ ## Model Overview

+ | Attribute | Value |
+ |-----------|-------|
+ | **Task** | Binary time-series classification |
+ | **Target** | BTC price direction in next 15 minutes (up=1, down=0) |
+ | **Input** | 60 minutes of multivariate OHLCV + technical indicators |
+ | **Assets** | BTC/USDT + ETH/USDT (cross-asset features) |
+ | **Best Model** | Logistic Regression on flattened windows |
+ | **Dataset** | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |

+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Test Accuracy | 53.1% |
+ | Test F1 | 0.574 |
+ | Test AUC | 0.540 |
+
+ **Note:** Predicting 15-minute crypto price direction is extremely hard because markets are close to efficient at short timeframes. At 53.1% test accuracy and 0.540 AUC, the model sits only slightly above random chance (50%), which suggests a small signal. The main value of this repository is the complete data engineering and feature extraction pipeline, which can serve as a starting point for further research.
+
+ ## Data Sources
+
+ - [WinkingFace/CryptoLM-Bitcoin-BTC-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Bitcoin-BTC-USDT) - BTC 1-min OHLCV + 15 technical indicators
+ - [WinkingFace/CryptoLM-Ethereum-ETH-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Ethereum-ETH-USDT) - ETH 1-min OHLCV + 15 technical indicators
+
+ ## Features (49 per timestep)
+
+ ### BTC & ETH (separately)
+ - Price: `open`, `high`, `low`, `close`
+ - Volume: `volume`
+ - Moving Averages: `MA_20`, `MA_50`, `MA_200`
+ - Momentum: `RSI`, `%K`, `%D`, `ADX`, `ATR`
+ - Trend: `MACD`, `Signal`, `Histogram`, `Trendline`
+ - Volatility: `BL_Upper`, `BL_Lower`, `MN_Upper`, `MN_Lower`
+
+ ### Cross-Asset Engineered
+ - `eth_btc_ratio` - ETH/BTC price ratio
+ - `btc_ret_1m`, `eth_ret_1m` - 1-minute returns
+ - `btc_vol_ma20`, `eth_vol_ma20` - 20-period volume MA
+ - `btc_range`, `eth_range` - Normalized price range
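
The seven engineered columns listed above could be computed along the following lines. This is a minimal pandas sketch, not the repository's actual code: the input column names (`btc_close`, `eth_volume`, and so on) are hypothetical, and normalizing the range as `(high - low) / close` is an assumption, since the README does not define it.

```python
import numpy as np
import pandas as pd


def add_cross_asset_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the seven engineered columns to a merged BTC/ETH 1-minute frame.

    Assumes hypothetical input columns `<asset>_high`, `<asset>_low`,
    `<asset>_close` and `<asset>_volume` for asset in {"btc", "eth"}.
    """
    out = df.copy()
    out["eth_btc_ratio"] = out["eth_close"] / out["btc_close"]
    for asset in ("btc", "eth"):
        close = out[f"{asset}_close"]
        # 1-minute simple return; the first row has no predecessor and is NaN.
        out[f"{asset}_ret_1m"] = close / close.shift(1) - 1.0
        # 20-period rolling mean of volume; the first 19 rows are NaN.
        out[f"{asset}_vol_ma20"] = out[f"{asset}_volume"].rolling(20).mean()
        # Assumed normalization: candle range divided by the close.
        out[f"{asset}_range"] = (out[f"{asset}_high"] - out[f"{asset}_low"]) / close
    return out


# Tiny synthetic example (random numbers, not real market data).
rng = np.random.default_rng(0)
n = 30
frame = pd.DataFrame({
    "btc_close": 100 + rng.normal(size=n).cumsum(),
    "eth_close": 10 + 0.1 * rng.normal(size=n).cumsum(),
    "btc_volume": rng.uniform(1, 5, size=n),
    "eth_volume": rng.uniform(1, 5, size=n),
})
for asset in ("btc", "eth"):
    frame[f"{asset}_high"] = frame[f"{asset}_close"] + 0.5
    frame[f"{asset}_low"] = frame[f"{asset}_close"] - 0.5

features = add_cross_asset_features(frame)
```

The rolling and shifted columns start with NaN rows, which is consistent with the pipeline's later step of dropping NaN/Inf values.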
+
+ ## Pipeline
+
+ 1. **Load & Merge** BTC and ETH 1-minute datasets on timestamp
+ 2. **Engineer Features** - Add returns, ratios, ranges, volume MAs
+ 3. **Create Windows** - 60-minute lookback → predict next 15-minute direction
+ 4. **Clean** - Drop NaN/Inf, standardize per-feature
+ 5. **Split** - 70/15/15 temporal train/val/test (no data leakage)
+ 6. **Train** - Logistic Regression + Random Forest baselines
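
Steps 3 and 5 of the pipeline above can be sketched with NumPy as follows. The function names `make_windows` and `temporal_split` are hypothetical, and the label rule (close 15 minutes after the window's last bar compared with the close at that last bar) is an assumption; the README only states that the target is the direction over the next 15 minutes.

```python
import numpy as np


def make_windows(features, close, lookback=60, horizon=15):
    """Build (samples, lookback, n_features) windows and binary labels.

    Assumed label: 1 if the close `horizon` minutes after the window's
    last bar is higher than the close at that last bar, else 0.
    """
    n = len(close)
    # Index of the last bar of each window; the bar at last + horizon must exist.
    last = np.arange(lookback - 1, n - horizon)
    X = np.stack([features[i - lookback + 1 : i + 1] for i in last])
    y = (close[last + horizon] > close[last]).astype(np.int64)
    return X, y


def temporal_split(X, y, train=0.70, val=0.15):
    """Split in chronological order, without shuffling."""
    n = len(X)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return (
        (X[:i_train], y[:i_train]),
        (X[i_train:i_val], y[i_train:i_val]),
        (X[i_val:], y[i_val:]),
    )


# Synthetic example: 1,000 minutes with 49 features per minute.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 49))
close = 100 + rng.normal(size=1000).cumsum()

X, y = make_windows(feats, close)
train_set, val_set, test_set = temporal_split(X, y)
```

Slicing by position rather than shuffling keeps every test window later in time than every training window, which is what a temporal split requires.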

  ## Usage

  ```python
+ import pickle
+ import numpy as np
+
+ # Load model
+ with open("model.pkl", "rb") as f:
+     model = pickle.load(f)

+ # Load preprocessing artifacts
+ mean = np.load("feature_mean.npy")
+ std = np.load("feature_std.npy")
+ valid = np.load("valid_cols.npy")
+
+ # X is your input array with shape (samples, 60 minutes, 49 features)
+ X_flat = X.reshape(X.shape[0], -1) # flatten to 2940 features
+ X_flat = X_flat[:, valid] # keep valid columns
+ X_norm = (X_flat - mean) / std # standardize
+
+ # Predict
+ preds = model.predict(X_norm) # 0=down, 1=up
+ probs = model.predict_proba(X_norm)[:, 1] # probability of up
  ```
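
For the snippet above to broadcast correctly, the mask must cover all 2940 flattened columns and the mean and standard deviation arrays must have one entry per kept column. The following sketch shows one way such artifacts could be produced, using synthetic data; it is an assumption about their layout, not a description of how the files in this repository were generated.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 60, 49))  # synthetic stand-in for training windows
X_train[:, :, 5] = np.nan                 # pretend one feature is never finite

flat = X_train.reshape(len(X_train), -1)  # shape (200, 2940)
valid = np.isfinite(flat).all(axis=0)     # boolean mask, one entry per flattened column
kept = flat[:, valid]                     # drop columns containing NaN/Inf

mean = kept.mean(axis=0)                  # one mean per kept column
std = kept.std(axis=0)
std[std == 0] = 1.0                       # guard against constant columns

X_norm = (kept - mean) / std              # same arithmetic as the usage snippet
```

Here one of the 49 features is invalid at every timestep, so the mask removes 60 of the 2940 flattened columns.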

+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.pkl` | Trained LogisticRegression classifier |
+ | `feature_mean.npy` | Per-feature means for standardization |
+ | `feature_std.npy` | Per-feature standard deviations |
+ | `valid_cols.npy` | Boolean mask of valid (finite) feature columns |
+ | `metrics.json` | Evaluation results |
+
+ ## Limitations
+
+ - **Market Efficiency**: 15-minute price moves are close to a random walk, so the edge is small
+ - **No Costs**: Evaluation ignores fees, slippage, and spread
+ - **Historical Data**: Trained on 2017-2020 data; may not generalize to current regimes
+ - **Simple Models**: Deep learning (Conv-LSTM, TCN, Transformer) may improve results
+
+ ## Future Work
+
+ 1. **Deep Learning**: Conv-LSTM, Temporal CNN, or Transformer architectures
+ 2. **More Data**: Order book, funding rates, on-chain metrics, sentiment
+ 3. **Multi-Scale**: Combine 1-min, 5-min, 15-min, 1-hour features
+ 4. **Regime Detection**: Train separate models for bull/bear/sideways markets
+ 5. **Cost-Aware Evaluation**: Incorporate transaction costs in the evaluation metric
+
+ ## License
+
+ MIT License