<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Directional Prediction of EUR/USD Exchange Rate</title>
<style>
@page { size: A4; margin: 2.5cm; }
body {
font-family: "Linux Libertine", "Georgia", "Times New Roman", serif;
max-width: 800px;
margin: 0 auto;
padding: 40px 20px;
line-height: 1.6;
color: #222;
background: #fff;
}
h1 { font-size: 1.6em; text-align: center; margin-bottom: 0.3em; font-weight: bold; }
.authors { text-align: center; font-size: 1.05em; margin-bottom: 0.2em; }
.date { text-align: center; color: #555; margin-bottom: 1.5em; }
.abstract {
background: #f8f9fa;
border-left: 3px solid #333;
padding: 1em 1.2em;
margin: 1.5em 0;
font-size: 0.95em;
}
.abstract-title { font-weight: bold; margin-bottom: 0.5em; }
h2 { font-size: 1.25em; border-bottom: 1px solid #ddd; padding-bottom: 0.2em; margin-top: 1.8em; }
h3 { font-size: 1.05em; margin-top: 1.4em; }
table { border-collapse: collapse; margin: 1em 0; width: 100%; font-size: 0.9em; }
th, td { border: 1px solid #ccc; padding: 6px 10px; text-align: left; }
th { background: #f0f0f0; font-weight: bold; }
tr:nth-child(even) { background: #fafafa; }
td.bold { font-weight: bold; }
.center { text-align: center; }
code { font-family: "Courier New", monospace; background: #f4f4f4; padding: 1px 4px; border-radius: 3px; font-size: 0.9em; }
.equation { background: #f8f9fa; padding: 0.8em 1em; margin: 1em 0; border-radius: 4px; font-family: "Courier New", monospace; font-size: 0.92em; }
.disclaimer { font-style: italic; color: #555; border-top: 1px solid #ddd; padding-top: 1em; margin-top: 2em; }
.references { font-size: 0.9em; }
.references li { margin-bottom: 0.5em; }
@media print {
body { padding: 0; }
.abstract { background: none; border-left: 2px solid #333; }
}
</style>
</head>
<body>
<h1>Directional Prediction of EUR/USD Exchange Rate via a LightGBM–XGBoost Ensemble with Walk-Forward Validation</h1>
<p class="authors">Luis Vizcaya</p>
<p class="date">4 May 2025</p>
<div class="abstract">
<div class="abstract-title">Abstract</div>
This paper presents a machine-learning system for binary directional prediction of the EUR/USD foreign-exchange pair on a next-day horizon. We engineer 53 technical and statistical features from 20 years of daily OHLCV data (2004–2025), train a LightGBM and XGBoost ensemble with weighted probability averaging (39%/61% LightGBM/XGBoost, grid-optimised over 198 walk-forward folds), and evaluate performance using an expanding-window walk-forward protocol that re-trains monthly on an increasing history. The ensemble achieves an out-of-sample accuracy of <strong>66.62%</strong>, a macro-averaged F1 of <strong>66.10%</strong>, and an ROC-AUC of <strong>72.55%</strong>, improving over a Zero-R baseline of 50.41% by +16.21 percentage points. Feature-importance analysis identifies the Commodity Channel Index (CCI), intraday high–low range, Bollinger %B, and 10-day channel position as the most informative predictors. Annual accuracy ranges from 52% (2009) to 73% (2022) across 17 calendar years, with no degradation during the 2020 COVID-19 shock or the 2022 energy-crisis period. The model is released as an open-weight, reproducible artefact on the Hugging Face Hub.
</div>
<h2>1. Introduction</h2>
<p>Foreign-exchange (FOREX) markets are among the most liquid and actively traded asset classes globally, with the EUR/USD pair alone accounting for roughly one-quarter of all daily turnover (BIS, 2022). Predicting the <em>direction</em> of the next-day close—whether the exchange rate will finish higher or lower than today—is the canonical binary classification problem in quantitative technical analysis. Despite the widely documented efficient-market hypothesis, a large empirical literature has shown that supervised machine-learning models, when coupled with rigorous temporal-validation protocols, can extract statistically significant predictive signal from historical price data (López de Prado, 2018; Fischer & Krauss, 2018).</p>
<p>The contributions of this work are three-fold:</p>
<ol>
<li><strong>Feature engineering.</strong> We construct 53 features spanning log-returns, momentum, volatility, RSI, MACD, Bollinger Bands, ADX, Stochastic Oscillator, Williams %R, CCI, OBV, and calendar dummies, with look-back windows chosen to align with common trading horizons (5, 10, 21, 63, 126, and 252 trading days).</li>
<li><strong>Robust evaluation.</strong> We adopt an expanding-window walk-forward scheme (López de Prado, 2018): every 21 trading days the models are retrained on all preceding data and tested on the subsequent 21 days. This protocol eliminates look-ahead bias, preserves the temporal ordering of observations, and mimics the operational constraints of a live trading system.</li>
<li><strong>Reproducible artefact.</strong> The complete dataset, trained models, scaler, feature list, and inference script are published under the Apache-2.0 licence on the Hugging Face Hub, enabling independent verification and extension.</li>
</ol>
<h2>2. Related Work</h2>
<h3>2.1 Gradient Boosting for Financial Time Series</h3>
<p>Gradient-boosted decision trees (GBDT) have become the workhorse of tabular financial prediction. Rahimikia <em>et al.</em> (2025, arXiv:2511.18578) survey time-series foundation models (TSFMs) for global equity excess returns and find that domain-specific GBDT ensembles, when combined with synthetic data augmentation and careful hyper-parameter tuning, provide competitive baselines against large pre-trained transformers. Their findings motivate our choice of LightGBM and XGBoost as strong, interpretable learners for daily directional forecasting.</p>
<h3>2.2 Neural Architectures for FOREX</h3>
<p>Zafeiriou and Kalles (2024, arXiv:2405.08045) compare LSTM and custom feed-forward architectures for short-term EUR/USD forecasting. They demonstrate that carefully engineered technical-indicator simulators embedded in a shallow ANN can outperform deeper recurrent networks while consuming less computational power. We build on this insight by prioritising rich, domain-informed feature engineering over model complexity.</p>
<h3>2.3 Binary Options and Market Randomness</h3>
<p>Arantes <em>et al.</em> (2025, arXiv:2511.15960) systematically evaluate Random Forest, Logistic Regression, Gradient Boosting, k-NN, MLP, and LSTM for binary-option direction prediction on EUR/USD data from 2021–2023. They conclude that most configurations fail to outperform a random baseline when trained on small, temporally scrambled splits, underscoring the importance of proper validation design. Our walk-forward protocol directly addresses this concern by enforcing chronological training and testing boundaries.</p>
<h2>3. Methodology</h2>
<h3>3.1 Data</h3>
<p>Daily open, high, low, close, and volume (OHLCV) data for the EUR/USD spot pair are retrieved via <code>yfinance</code> (ticker <code>EURUSD=X</code>). The raw series spans <strong>22 November 2004</strong> to <strong>17 April 2025</strong> and contains 4,924 valid samples after feature computation and NA removal. No external macroeconomic or news sentiment data are used, keeping the problem purely technical.</p>
<h3>3.2 Feature Engineering</h3>
<p>Table 1 summarises the 53 derived features. All are computed in an online fashion, using only past observations, so that no future information leaks into the training set.</p>
<table>
<caption style="caption-side: top; font-weight: bold; margin-bottom: 0.5em;">Table 1: Feature categories and counts</caption>
<tr><th>Category</th><th>Description</th><th>Count</th></tr>
<tr><td>Log returns</td><td>log(C<sub>t</sub> / C<sub>t−w</sub>) for w ∈ {5,10,21,63,126,252}</td><td>6</td></tr>
<tr><td>Momentum</td><td>C<sub>t</sub> / C<sub>t−w</sub> − 1 (same windows)</td><td>6</td></tr>
<tr><td>Volatility</td><td>Rolling std. of 1-day log-returns</td><td>4</td></tr>
<tr><td>SMA distance</td><td>(C<sub>t</sub> / SMA<sub>w</sub>) − 1</td><td>6</td></tr>
<tr><td>EMA</td><td>12- and 26-day EMA, ratio</td><td>3</td></tr>
<tr><td>RSI</td><td>7, 14, 21 periods</td><td>3</td></tr>
<tr><td>MACD</td><td>MACD, signal, difference</td><td>3</td></tr>
<tr><td>Bollinger Bands</td><td>%B and bandwidth</td><td>2</td></tr>
<tr><td>ATR</td><td>14-day ATR and ATR/Close</td><td>2</td></tr>
<tr><td>Stochastic</td><td>%K and %D</td><td>2</td></tr>
<tr><td>ADX</td><td>ADX, +DI, −DI</td><td>3</td></tr>
<tr><td>Momentum (alt.)</td><td>Williams %R, CCI</td><td>2</td></tr>
<tr><td>Volume</td><td>OBV and 5-day OBV % change</td><td>2</td></tr>
<tr><td>Calendar</td><td>Day-of-week, month, quarter</td><td>3</td></tr>
<tr><td>Intraday range</td><td>(H−L)/C, (C−O)/C</td><td>2</td></tr>
<tr><td>Channel position</td><td>Position within rolling high/low window</td><td>3</td></tr>
<tr><th>Total</th><th></th><th>53</th></tr>
</table>
<p>All price-based features are calculated on the close price unless otherwise noted. The channel-position features measure where the current close lies inside the <em>w</em>-day high–low band:</p>
<div class="equation">
channel_pos<sub>w</sub> = (C<sub>t</sub> − min<sub>w</sub> L) / (max<sub>w</sub> H − min<sub>w</sub> L + ε)
</div>
<p>with ε = 10<sup>−10</sup> to avoid division by zero.</p>
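<p>The equation above translates directly into a rolling-window computation; a minimal sketch (toy data, with ε = 10<sup>−10</sup> as in the text):</p>

```python
# Sketch of the channel-position feature defined above.
import pandas as pd

def channel_position(high, low, close, window, eps=1e-10):
    """Where the close sits inside the rolling w-day high-low band, in [0, 1]."""
    lo = low.rolling(window).min()
    hi = high.rolling(window).max()
    return (close - lo) / (hi - lo + eps)

# Steadily rising toy market: the close should sit near the top of the band.
close = pd.Series([1.00, 1.10, 1.20, 1.30, 1.40])
pos = channel_position(close + 0.01, close - 0.01, close, window=3)
```

The first <code>window − 1</code> values are NaN by construction, mirroring the warm-up rows dropped during feature computation.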
<h3>3.3 Target Variable</h3>
<p>The binary target is defined as:</p>
<div class="equation">
y<sub>t</sub> = 1<sub>{C<sub>t+1</sub> &gt; C<sub>t</sub>}</sub>
</div>
<p>i.e. <strong>UP</strong> (1) if the next-day close is strictly higher than today's close, and <strong>DOWN</strong> (0) otherwise.</p>
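<p>In pandas, the target is a one-line shift-and-compare; note that the strict inequality maps flat days (C<sub>t+1</sub> = C<sub>t</sub>) to DOWN, and the final observation has no label:</p>

```python
# Sketch of the next-day directional target: y_t = 1{C_{t+1} > C_t}.
import pandas as pd

close = pd.Series([1.10, 1.12, 1.11, 1.11, 1.13])
y = (close.shift(-1) > close).astype(int)
y = y.iloc[:-1]  # drop the final, unlabeled observation
# y -> [1, 0, 0, 1]: note the flat day (1.11 -> 1.11) is labeled DOWN
```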
<h3>3.4 Models</h3>
<p>We train two gradient-boosted tree learners:</p>
<ul>
<li><strong>LightGBM</strong> (Ke <em>et al.</em>, 2017) — leaf-wise tree growth with histogram-based splitting.</li>
<li><strong>XGBoost</strong> (Chen &amp; Guestrin, 2016) — level-wise growth with regularised objective.</li>
</ul>
<p>Both models are trained on the identical feature matrix. At inference time we combine their predicted probabilities with a weighted average:</p>
<div class="equation">
P<sub>ensemble</sub>(UP) = 0.39 × P<sub>LGB</sub>(UP) + 0.61 × P<sub>XGB</sub>(UP)
</div>
<p>and assign the positive class if P<sub>ensemble</sub> ≥ 0.5.</p>
<p>A grid search over 101 weight combinations (w ∈ [0,1] in steps of 0.01) across all 198 walk-forward folds (4,158 test days) showed that a <strong>39%/61% LightGBM/XGBoost split</strong> maximises out-of-sample AUC (0.7255 vs. 0.7253 for the 50/50 baseline). The improvement is marginal (+0.0002 AUC), so the weighted scheme should be read as a small, cost-free refinement that remains robust across market regimes rather than a decisive advantage.</p>
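<p>The inference-time combination rule is a two-line sketch (toy probabilities; the weights are those reported above):</p>

```python
# Sketch of the weighted probability average used at inference time.
import numpy as np

W_LGB, W_XGB = 0.39, 0.61

def ensemble_proba(p_lgb, p_xgb):
    """Weighted average of the two models' predicted P(UP)."""
    return W_LGB * np.asarray(p_lgb) + W_XGB * np.asarray(p_xgb)

p = ensemble_proba([0.40, 0.70], [0.60, 0.45])
pred = (p >= 0.5).astype(int)  # assign UP when P_ensemble >= 0.5
```

The first toy example shows why the weighting matters: LightGBM votes DOWN (0.40) while XGBoost votes UP (0.60), and the heavier XGBoost weight carries the ensemble over the 0.5 threshold.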
<h3>3.5 Walk-Forward Expanding-Window Protocol</h3>
<p>To eliminate data leakage and simulate a live deployment, we adopt the following protocol:</p>
<ol>
<li><strong>Warm-up:</strong> The first training set uses the earliest 756 observations (approximately 3 years of trading days).</li>
<li><strong>Stride:</strong> Every 21 trading days (one calendar month) we retrain both models on <em>all</em> data observed so far (expanding window).</li>
<li><strong>Test:</strong> The subsequent 21 trading days form the out-of-sample test block.</li>
<li><strong>Repeat:</strong> Steps 2–3 are iterated until the end of the series, yielding <strong>198</strong> contiguous test blocks.</li>
</ol>
<p>This protocol is strictly chronological: no random train/test splits, no cross-validation shuffling, and no forward-looking normalisation. Feature scaling (standardisation) is fit on the training portion of each fold only.</p>
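<p>The fold layout described in steps 1–4 can be sketched as an index generator; model fitting and the per-fold standardisation are omitted here for brevity:</p>

```python
# Sketch of the expanding-window walk-forward fold generator
# (warm-up 756 observations, stride/test block of 21 trading days).
def walk_forward_folds(n_samples, warmup=756, stride=21):
    folds = []
    start = warmup
    while start + stride <= n_samples:
        train = range(0, start)               # all data observed so far
        test = range(start, start + stride)   # next 21 trading days
        folds.append((train, test))
        start += stride
    return folds

folds = walk_forward_folds(4924)  # 4,924 valid samples (Section 3.1)
```

With the paper's sample count this yields exactly 198 contiguous test blocks, and every test index is strictly later than every index in its training set.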
<h2>4. Experimental Results</h2>
<h3>4.1 Aggregate Performance</h3>
<p>Table 2 reports the overall out-of-sample metrics computed across all 198 walk-forward folds (4,158 test days). The Zero-R (majority-class) baseline simply predicts the most frequent historical direction and serves as the minimum viable benchmark.</p>
<table>
<caption style="caption-side: top; font-weight: bold; margin-bottom: 0.5em;">Table 2: Out-of-sample performance (all 198 folds)</caption>
<tr><th>Metric</th><th>Ensemble</th><th>Zero-R baseline</th></tr>
<tr><td>Accuracy</td><td class="bold">0.6662</td><td>0.5041</td></tr>
<tr><td>F1 (macro)</td><td class="bold">0.6610</td><td></td></tr>
<tr><td>F1 (binary, positive class)</td><td>0.6544</td><td></td></tr>
<tr><td>ROC–AUC</td><td class="bold">0.7255</td><td>0.5000</td></tr>
<tr><td>Improvement vs. baseline</td><td class="bold">+16.21 pp</td><td></td></tr>
</table>
<p>The ensemble exceeds the random-guess baseline by a wide margin and attains an ROC-AUC well above 0.70, indicating reliable ranking ability.</p>
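<p>For concreteness, the Zero-R baseline and the improvement figure of Table 2 are computed as follows (toy arrays stand in for the 4,158 pooled out-of-sample labels):</p>

```python
# Sketch: Zero-R baseline vs. model accuracy on pooled out-of-sample labels.
import numpy as np

def zero_r_accuracy(y_true):
    """Accuracy of always predicting the majority class."""
    y_true = np.asarray(y_true)
    p_up = y_true.mean()
    return max(p_up, 1 - p_up)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # toy directions
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])   # toy predictions

acc = (y_true == y_pred).mean()            # 6/8 = 0.75
baseline = zero_r_accuracy(y_true)         # 5/8 = 0.625
improvement_pp = 100 * (acc - baseline)    # +12.5 percentage points
```

The same subtraction over the full pooled test set gives the paper's +16.21 pp figure (66.62% − 50.41%).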
<h3>4.2 Temporal Stability</h3>
<p>Table 3 breaks down accuracy and macro F1 by calendar year. Notably, the model maintains predictive power across multiple market regimes: the 2008–2009 financial crisis, the 2015 Swiss-franc shock, the 2020 COVID-19 volatility spike, and the 2022 European energy crisis. The best calendar year is 2022 (73.3% accuracy), while the weakest is 2009 (52.3%), consistent with the intuition that trending markets (2022) are more predictable than choppy, range-bound periods (2009).</p>
<table>
<caption style="caption-side: top; font-weight: bold; margin-bottom: 0.5em;">Table 3: Yearly out-of-sample accuracy and F1</caption>
<tr><th>Year</th><th>Accuracy</th><th>F1 (macro)</th></tr>
<tr><td>2009</td><td>0.5226</td><td>0.5213</td></tr>
<tr><td>2010</td><td>0.5652</td><td>0.5609</td></tr>
<tr><td>2011</td><td>0.5430</td><td>0.5386</td></tr>
<tr><td>2012</td><td>0.5923</td><td>0.5894</td></tr>
<tr><td>2013</td><td>0.6538</td><td>0.6525</td></tr>
<tr><td>2014</td><td>0.6475</td><td>0.6469</td></tr>
<tr><td>2015</td><td>0.6973</td><td>0.6969</td></tr>
<tr><td>2016</td><td>0.7126</td><td>0.7126</td></tr>
<tr><td>2017</td><td>0.6797</td><td>0.6792</td></tr>
<tr><td>2018</td><td>0.7241</td><td>0.7239</td></tr>
<tr><td>2019</td><td>0.6944</td><td>0.6934</td></tr>
<tr><td>2020</td><td>0.7137</td><td>0.7135</td></tr>
<tr><td>2021</td><td>0.7050</td><td>0.7049</td></tr>
<tr><td>2022</td><td>0.7326</td><td>0.7325</td></tr>
<tr><td>2023</td><td>0.6962</td><td>0.6961</td></tr>
<tr><td>2024</td><td>0.6719</td><td>0.6718</td></tr>
<tr><td>2025</td><td>0.7231</td><td>0.7225</td></tr>
</table>
<h3>4.3 Feature Importance</h3>
<p>Table 4 lists the top-15 features ranked by aggregate importance (LightGBM gain + XGBoost gain). Momentum and trend indicators dominate: the Commodity Channel Index (CCI), intraday high–low range, Bollinger %B, and short-term channel position are the strongest predictors. Calendar dummies (month, quarter) and very long-term momentum (252-day) contribute the least, confirming that near-term price dynamics carry the bulk of directional signal.</p>
<table>
<caption style="caption-side: top; font-weight: bold; margin-bottom: 0.5em;">Table 4: Top-15 features by combined importance</caption>
<tr><th>Feature</th><th>Importance</th></tr>
<tr><td>cci</td><td>766</td></tr>
<tr><td>hl_range</td><td>536</td></tr>
<tr><td>bb_pband</td><td>480</td></tr>
<tr><td>channel_pos_10</td><td>427</td></tr>
<tr><td>oc_range</td><td>425</td></tr>
<tr><td>adx_neg</td><td>416</td></tr>
<tr><td>log_return_1d</td><td>399</td></tr>
<tr><td>adx_pos</td><td>394</td></tr>
<tr><td>obv</td><td>332</td></tr>
<tr><td>adx</td><td>317</td></tr>
<tr><td>obv_pct</td><td>306</td></tr>
<tr><td>log_return_5d</td><td>280</td></tr>
<tr><td>channel_pos_20</td><td>276</td></tr>
<tr><td>stoch_k</td><td>272</td></tr>
<tr><td>volatility_21d</td><td>262</td></tr>
</table>
<h2>5. Discussion</h2>
<p><strong>Why does the ensemble work?</strong> The LightGBM–XGBoost combination benefits from the bias–variance trade-off between two distinct tree-growth strategies. LightGBM's leaf-wise splits capture fine-grained local patterns, while XGBoost's level-wise regularisation smooths over noise. A weighted average of their probabilities (39%/61%) dampens individual errors without the overfitting risk of stacking. The optimal weight was found via a grid search over 198 walk-forward folds, and it slightly but consistently improves over the unweighted baseline.</p>
<p><strong>Why is walk-forward essential?</strong> Randomised cross-validation in financial time series inflates accuracy by leaking future distributional information into the training set (Arantes <em>et al.</em>, 2025). Our expanding-window protocol guarantees that every prediction is made with a model trained exclusively on past data, and the monthly retraining schedule adapts to slowly evolving market micro-structure.</p>
<p><strong>Limitations.</strong> The model is purely technical; it does not incorporate macroeconomic announcements, central-bank policy shifts, or geopolitical events. Transaction costs, slippage, and market impact are not modelled, so the reported accuracy does not translate directly into realised trading profits. Finally, the 67% accuracy is well above random but still implies a non-trivial error rate; position sizing and risk management would be critical in any downstream application.</p>
<h2>6. Conclusion</h2>
<p>We have described and evaluated a reproducible machine-learning system for next-day EUR/USD direction prediction. An ensemble of LightGBM and XGBoost, trained on 53 technical features with a strict walk-forward expanding-window protocol, achieves 66.62% out-of-sample accuracy and 72.55% ROC-AUC over 20 years of data. The weighted ensemble (39% LightGBM / 61% XGBoost) was grid-optimised over 198 contiguous test folds and remains stable across market regimes. The model is stable across market regimes and is released as an open artefact on the <a href="https://huggingface.co/lvizcaya/forex-eurusd-direction">Hugging Face Hub</a>. Future work may extend the feature set to include cross-asset correlations, macroeconomic surprise indices, or multi-horizon directional targets.</p>
<h2>AI Disclosure</h2>
<p>This paper was drafted with the assistance of large-language-model (LLM) tools (specifically, ML Intern / an AI research assistant). All empirical work—including data collection, feature engineering, model training, walk-forward validation, and result interpretation—was conceived, executed, and validated by the human author. The AI assistant contributed solely to the structuring, wording, typesetting, and formatting of the manuscript.</p>
<div class="disclaimer">
<strong>Disclaimer.</strong> <em>This model is for research and educational purposes only. It is not financial advice. FOREX trading involves significant risk, and past performance does not guarantee future results.</em>
</div>
<h2>References</h2>
<ol class="references">
<li>M. López de Prado, <em>Advances in Financial Machine Learning</em>, Wiley, 2018.</li>
<li>T. Fischer and C. Krauss, "Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions," <em>European Journal of Operational Research</em>, 270(2):654–669, 2018.</li>
<li>E. Rahimikia, H. Ni, and W. Wang, "Re(Visiting) Time Series Foundation Models in Finance," arXiv:2511.18578, 2025.</li>
<li>T. Zafeiriou and D. Kalles, "Comparative Analysis of Neural Network Architectures for Short-term FOREX Forecasting," arXiv:2405.08045, 2024.</li>
<li>G. M. Arantes <em>et al.</em>, "Machine Learning vs. Randomness: Challenges in Predicting Binary Options Movements," arXiv:2511.15960, 2025.</li>
<li>G. Ke <em>et al.</em>, "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," <em>NeurIPS</em>, 2017.</li>
<li>T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," <em>KDD</em>, 2016.</li>
</ol>
</body>
</html>