huntergemmer/crypto-15min-direction-classifier

ml-intern


huntergemmer committed 1 day ago

Commit 2b5e46a (verified) · 1 Parent(s): 20c53c6

Upload README.md

Files changed (1)

README.md +97 -15


@@ -1,26 +1,108 @@
----
-tags:
-- ml-intern
----
-# huntergemmer/crypto-15min-direction-classifier
-<!-- ml-intern-provenance -->
-## Generated by ML Intern
-This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
-- Try ML Intern: https://smolagents-ml-intern.hf.space
-- Source code: https://github.com/huggingface/ml-intern
 ## Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_id = "huntergemmer/crypto-15min-direction-classifier"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
 ```
-For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.

+# Crypto 15-Minute Direction Classifier
+A time-series classification model that predicts whether Bitcoin (BTC/USDT) price will move **up** or **down** over the next 15-minute interval using multivariate historical market data.
+## Model Overview
+| Attribute | Value |
+|-----------|-------|
+| **Task** | Binary time-series classification |
+| **Target** | BTC price direction in next 15 minutes (up=1, down=0) |
+| **Input** | 60 minutes of multivariate OHLCV + technical indicators |
+| **Assets** | BTC/USDT + ETH/USDT (cross-asset features) |
+| **Best Model** | Logistic Regression on flattened windows |
+| **Dataset** | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |
+## Performance
+| Metric | Value |
+|--------|-------|
+| Test Accuracy | 53.1% |
+| Test F1 | 0.574 |
+| Test AUC | 0.540 |
+**Note:** Predicting 15-minute crypto price direction is extremely hard because markets are close to efficient at short timeframes. On the held-out test split, the model scores slightly above random chance (50% accuracy, 0.5 AUC), which suggests a small signal rather than a tradable edge. The pipeline is mainly useful as a complete data engineering and feature extraction baseline for further research.
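For reference, the three metrics in the table can be computed from true labels and predicted probabilities roughly as follows. This is a minimal NumPy sketch, not the code that produced the reported numbers; the function name `binary_metrics`, the 0.5 decision threshold, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def binary_metrics(y_true, prob_up, threshold=0.5):
    """Accuracy, F1 (positive class = up), and ROC AUC for binary labels."""
    y_true = np.asarray(y_true).astype(int)
    prob_up = np.asarray(prob_up, dtype=float)
    y_pred = (prob_up >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    accuracy = float(np.mean(y_pred == y_true))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0

    # AUC = probability that a random "up" sample is scored above a random
    # "down" sample, counting ties as one half. This pairwise form uses
    # O(n_pos * n_neg) memory, so it only suits small samples.
    pos = prob_up[y_true == 1]
    neg = prob_up[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        auc = float("nan")
    else:
        diff = pos[:, None] - neg[None, :]
        auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"accuracy": accuracy, "f1": f1, "auc": auc}

# Toy example (made-up values, not model output)
y = [1, 0, 1, 0]
p = [0.9, 0.6, 0.4, 0.2]
metrics = binary_metrics(y, p)
```

Note that F1 depends on the share of "up" labels in the test set, so it can sit well above 0.5 even when accuracy and AUC are close to chance.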
+## Data Sources
+- [WinkingFace/CryptoLM-Bitcoin-BTC-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Bitcoin-BTC-USDT) - BTC 1-min OHLCV + 15 technical indicators
+- [WinkingFace/CryptoLM-Ethereum-ETH-USDT](https://huggingface.co/datasets/WinkingFace/CryptoLM-Ethereum-ETH-USDT) - ETH 1-min OHLCV + 15 technical indicators
+## Features (49 per timestep)
+### BTC & ETH (separately)
+- Price: `open`, `high`, `low`, `close`
+- Volume: `volume`
+- Moving Averages: `MA_20`, `MA_50`, `MA_200`
+- Momentum: `RSI`, `%K`, `%D`
+- Trend: `ADX`, `MACD`, `Signal`, `Histogram`, `Trendline`
+- Volatility: `ATR`, `BL_Upper`, `BL_Lower`, `MN_Upper`, `MN_Lower`
+### Cross-Asset Engineered
+- `eth_btc_ratio` - ETH/BTC price ratio
+- `btc_ret_1m`, `eth_ret_1m` - 1-minute returns
+- `btc_vol_ma20`, `eth_vol_ma20` - 20-period volume MA
+- `btc_range`, `eth_range` - Normalized price range
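The seven engineered features listed above could be derived along these lines. This is a sketch under stated assumptions: the README does not give the exact formulas, so the simple-return definition, the use of close prices for the ratio, and `(high - low) / close` as the range normalization are guesses, and the helper names `pct_return`, `rolling_mean`, and `cross_asset_features` are hypothetical.

```python
import numpy as np

def pct_return(close):
    """1-step simple return; the first element is NaN (no previous bar)."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[1:] = close[1:] / close[:-1] - 1.0
    return out

def rolling_mean(x, window):
    """Trailing mean over `window` bars; NaN until the window is full."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def cross_asset_features(btc, eth):
    """btc/eth: dicts of equal-length, timestamp-aligned arrays keyed by
    'high', 'low', 'close', 'volume'. Returns a dict of 7 feature arrays."""
    btc_close = np.asarray(btc["close"], dtype=float)
    eth_close = np.asarray(eth["close"], dtype=float)
    feats = {"eth_btc_ratio": eth_close / btc_close}
    for name, bars in (("btc", btc), ("eth", eth)):
        high = np.asarray(bars["high"], dtype=float)
        low = np.asarray(bars["low"], dtype=float)
        close = np.asarray(bars["close"], dtype=float)
        feats[f"{name}_ret_1m"] = pct_return(close)
        feats[f"{name}_vol_ma20"] = rolling_mean(bars["volume"], 20)
        feats[f"{name}_range"] = (high - low) / close  # assumed normalization
    return feats

# Toy bars (made-up values)
btc = {"high": [101, 103], "low": [99, 100], "close": [100, 102], "volume": [5, 7]}
eth = {"high": [10.5, 11], "low": [9.5, 10], "close": [10, 10.2], "volume": [50, 70]}
feats = cross_asset_features(btc, eth)
```

The leading NaN values from the return and the 20-bar moving average are what the later cleaning step would need to drop.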
+## Pipeline
+1. **Load & Merge** BTC and ETH 1-minute datasets on timestamp
+2. **Engineer Features** - Add returns, ratios, ranges, volume MAs
+3. **Create Windows** - 60-minute lookback → predict next 15-minute direction
+4. **Clean** - Drop NaN/Inf, standardize per-feature
+5. **Split** - 70/15/15 temporal train/val/test (no data leakage)
+6. **Train** - Logistic Regression + Random Forest baselines
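Steps 3 to 5 of the pipeline above can be sketched as follows. This is an illustration, not the repository's code: the function names are hypothetical, the label rule (future close strictly above the last close in the window counts as up) is an assumption, and fitting the standardization statistics on the training rows only is one way to keep later data out of preprocessing. Model training (step 6) is omitted.

```python
import numpy as np

def make_windows(features, close, lookback=60, horizon=15):
    """Build (samples, lookback, n_features) windows and up/down labels.

    features: (T, n_features) array; close: (T,) BTC close prices.
    Window i covers rows [i, i + lookback); its label compares the close
    `horizon` bars after the window's last row with that last close.
    """
    features = np.asarray(features, dtype=float)
    close = np.asarray(close, dtype=float)
    n = len(close) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback/horizon")
    X = np.stack([features[i:i + lookback] for i in range(n)])
    last = close[lookback - 1: lookback - 1 + n]
    future = close[lookback - 1 + horizon: lookback - 1 + horizon + n]
    y = (future > last).astype(int)  # 1 = up, 0 = down or flat (assumed rule)
    return X, y

def temporal_split(X, y, train=0.70, val=0.15):
    """Chronological split: earliest samples train, latest samples test."""
    n = len(X)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))

def fit_standardizer(X_train_flat):
    """Mean/std from training rows only, so later data does not leak in."""
    mean = X_train_flat.mean(axis=0)
    std = X_train_flat.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero on constant columns
    return mean, std

# Toy data (random, for shape checking only)
rng = np.random.default_rng(0)
toy_feats = rng.normal(size=(200, 3))
toy_close = 100 + np.cumsum(rng.normal(size=200))
X, y = make_windows(toy_feats, toy_close)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = temporal_split(X, y)
mean, std = fit_standardizer(train_X.reshape(len(train_X), -1))
train_norm = (train_X.reshape(len(train_X), -1) - mean) / std
```

Consecutive windows overlap, so samples on either side of a split boundary share rows; leaving a gap of `lookback + horizon` samples between splits is one way to remove that overlap.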
 ## Usage
 ```python
+import pickle
+import numpy as np
+# Load model
+with open("model.pkl", "rb") as f:
+    model = pickle.load(f)
+# Load preprocessing artifacts
+mean = np.load("feature_mean.npy")
+std = np.load("feature_std.npy")
+valid = np.load("valid_cols.npy")
+# X shape: (samples, 60 minutes, 49 features)
+X_flat = X.reshape(X.shape[0], -1)      # flatten to 2940 features
+X_flat = X_flat[:, valid]               # keep valid columns
+X_norm = (X_flat - mean) / std            # standardize
+# Predict
+preds = model.predict(X_norm)            # 0=down, 1=up
+probs = model.predict_proba(X_norm)[:, 1]  # probability of up
 ```
+## Files
+| File | Description |
+|------|-------------|
+| `model.pkl` | Trained LogisticRegression classifier |
+| `feature_mean.npy` | Per-feature means for standardization |
+| `feature_std.npy` | Per-feature standard deviations |
+| `valid_cols.npy` | Boolean mask of valid (finite) feature columns |
+| `metrics.json` | Evaluation results |
+## Limitations
+- **Market Efficiency**: At a 15-minute horizon, price changes are close to a random walk, so any predictive edge is small
+- **No Costs**: Evaluation ignores fees, slippage, and bid-ask spread
+- **Historical Data**: Trained on 2017-2020 data; may not generalize to current regimes
+- **Simple Models**: Deep learning (Conv-LSTM, TCN, Transformer) may improve results
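To make the "No Costs" limitation concrete, a back-of-the-envelope break-even calculation is sketched below. It rests on a strong simplifying assumption that is not from the README: every trade wins or loses the same absolute move `move`, and each trade pays a fixed round-trip cost `cost`. Under that assumption the expected net return per trade at accuracy `p` is `(2p - 1) * move - cost`, so the break-even accuracy is `0.5 + cost / (2 * move)`. The numbers used are hypothetical.

```python
def expected_return_per_trade(accuracy, move, cost):
    """Expected net return per trade when wins and losses are both `move` in size."""
    return (2.0 * accuracy - 1.0) * move - cost

def break_even_accuracy(move, cost):
    """Accuracy at which the expected net return per trade is zero."""
    return 0.5 + cost / (2.0 * move)

# Hypothetical numbers: 0.20% typical 15-minute move, 0.10% round-trip cost.
move, cost = 0.002, 0.001
p_star = break_even_accuracy(move, cost)                   # about 0.75
edge_at_53 = expected_return_per_trade(0.531, move, cost)  # negative here
```

With these assumed values, an accuracy near the reported 53.1% would not cover costs; real move sizes and fees would have to be measured before drawing any conclusion.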
+## Future Work
+1. **Deep Learning**: Conv-LSTM, Temporal CNN, or Transformer architectures
+2. **More Data**: Order book, funding rates, on-chain metrics, sentiment
+3. **Multi-Scale**: Combine 1-min, 5-min, 15-min, 1-hour features
+4. **Regime Detection**: Train separate models for bull/bear/sideways markets
+5. **Cost-Aware Evaluation**: Incorporate transaction costs into the evaluation metrics
+## License
+MIT License