kyLELEng committed · Commit 59859a9 · verified · 1 parent: bf2cae5

Train FinImpact Direction1D V4

README.md ADDED
@@ -0,0 +1,419 @@
+ ---
+ library_name: transformers
+ tags:
+ - finance
+ - sentiment-analysis
+ - text-classification
+ - market-impact
+ - gated-fusion
+ - multitask-learning
+ - event-study
+ datasets:
+ - NickyNicky/finance-financialmodelingprep-stock-news-sentiments-rss-feed
+ - siddharthmb/stocks-ohlcv
+ license: mit
+ ---
+
+ # FinImpact Direction1D V4
+
+ This repository contains a custom Hugging Face/PyTorch model for financial text impact modeling.
+ It combines a financial language encoder with market context features through a gated fusion head.
+ This v4 checkpoint is trained primarily as a 1-day market-impact signal model and adds return-aware auxiliary losses for directional-score alignment and confidence calibration.
+
+ ## Task
+
+ Inputs:
+
+ - Financial news title/body
+ - Symbol-level market context available before the event date
+
+ Outputs:
+
+ - `direction_1d`: bearish / neutral / bullish class for the future 1-day abnormal market reaction
+ - Auxiliary diagnostics: the `sentiment`, `direction_5d`, and `volatility` heads are still exported, but their default loss weights are zero in this v4 run, so their metrics should be read as diagnostics only.
+
+ ## Architecture
+
+ - Encoder: `microsoft/deberta-v3-large`
+ - Pooling: attention-mask-aware mean pooling over `last_hidden_state`
+ - Text branch: LayerNorm -> Linear -> GELU -> Dropout
+ - Numeric branch: LayerNorm -> Linear -> GELU -> Dropout
+ - Fusion: learned sigmoid gate between text and numeric branches
+ - Heads: 1D direction classification head plus auxiliary diagnostic heads
+ - Return-aware training: `P(bullish) - P(bearish)` is softly aligned to realized 1-day abnormal return sign/magnitude
+
+ This is intentionally not an embedding-only classifier. The model fine-tunes a contextual encoder and learns a supervised 1-day market-impact boundary.
+
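The pooling and fusion bullets above can be sketched in plain Python. This is a minimal illustration and not the repository's implementation: the function names, the scalar gate computed from the concatenated branch outputs, and the toy weights are all assumptions, and the real model would use PyTorch tensors with learned parameters.

```python
import math


def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors whose mask is 1; padding positions are ignored."""
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    count = max(count, 1)  # guard against an all-padding sequence
    return [total / count for total in totals]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, numeric_vec, gate_weights, gate_bias):
    """Blend two equal-length branch outputs with a scalar sigmoid gate.

    Assumption: the gate is a single scalar computed from the concatenated
    branches; it weights the text branch and (1 - gate) weights the numeric one.
    """
    concatenated = text_vec + numeric_vec
    logit = sum(w * x for w, x in zip(gate_weights, concatenated)) + gate_bias
    gate = sigmoid(logit)
    fused = [gate * t + (1.0 - gate) * n for t, n in zip(text_vec, numeric_vec)]
    return fused, gate


# Toy example: two real tokens and one padding token, hidden size 2.
hidden = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
pooled = masked_mean_pool(hidden, attention_mask=[1, 1, 0])
fused, gate = gated_fusion(pooled, [0.0, 0.0], gate_weights=[0.0] * 4, gate_bias=0.0)
```

With zero gate weights the gate is exactly 0.5, so both branches contribute equally. The reported `mean_text_gate_weight` of about 0.507 (standard deviation about 0.0016) suggests the trained gate stays close to that even blend.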
+ ## Training Data
+
+ News source:
+
+ - `NickyNicky/finance-financialmodelingprep-stock-news-sentiments-rss-feed`
+ - Fields used: `symbol`, `publishedDate`, `title`, `text`, `sentiment`, `sentimentScore`
+
+ Price source:
+
+ - `siddharthmb/stocks-ohlcv`
+ - Daily close data used to build lagged market features and future-return labels.
+
+ Training rows after join: `13930`
+
+ Time split:
+
+ - train: `10865`
+ - validation: `1532`
+ - test: `1533`
+
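A time split like the one above can be sketched as a sort followed by two cuts, with no shuffling. The 0.78 / 0.11 fractions and the `time_split` helper are assumptions chosen for illustration; the README does not state how the cut points were picked. Integers stand in for publication dates in the toy data.

```python
def time_split(rows, train_frac=0.78, val_frac=0.11, date_key="publishedDate"):
    """Sort rows chronologically, then cut into train / validation / test."""
    ordered = sorted(rows, key=lambda row: row[date_key])
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    train = ordered[:n_train]
    validation = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, validation, test


# Toy data: 13930 rows in reverse order, with integers as stand-in dates.
rows = [{"publishedDate": day} for day in reversed(range(13930))]
train, validation, test = time_split(rows)
```

Because the rows are sorted before cutting, every training event precedes every validation event, which in turn precedes every test event.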
+ Symbols:
+
+ `AAPL, ABNB, ADBE, ADI, ADP, AMAT, AMD, AMZN, ARKK, ASML, AVGO, BA, BABA, BAC, BIDU, BMY, CAT, CMCSA, COIN, COST, CRM, CSCO, CVX, DHR, DIS, ELF, GE, GM, GOOGL, HD, IBM, INTC, IOVA, JNJ, JPM, KO, LULU, MA, MCD, MDB, META, MRNA, MSFT, MU, NFLX, NKE, NVDA, ORCL, PEP, PFE, PLTR, PYPL, QCOM, RCL, SHOP, SNOW, SPOT, TGT, TSLA, TSM, TXN, UAL, UBER, UNH, UPS, WMT, XOM, ZM`
+
+ ## Numeric Features
+
+ - `source_sentiment_score`
+ - `sentiment_abs_score`
+ - `site_benzinga`
+ - `site_globenewswire`
+ - `site_yahoo_finance`
+ - `site_marketwatch`
+ - `text_char_len`
+ - `title_char_len`
+ - `is_after_close`
+ - `is_premarket`
+ - `is_market_hours`
+ - `published_hour_sin`
+ - `published_hour_cos`
+ - `ret_1d_lag`
+ - `ret_5d_lag`
+ - `ret_20d_lag`
+ - `vol_5d`
+ - `vol_20d`
+ - `vol_60d`
+ - `vol_ratio_5_20`
+ - `drawdown_20d`
+ - `volume_z_20d`
+ - `sector_code`
+ - `market_ret_1d_lag`
+ - `market_ret_5d_lag`
+ - `sector_ret_1d_lag`
+ - `sector_ret_5d_lag`
+ - `beta_market_60d`
+ - `beta_sector_60d`
+
+ All numeric features are scaled using statistics fit on the training split only.
+
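Fitting the scaler on the training split only can be sketched as follows. `feature_schema.json` stores a per-column `mean` and `std`, so a z-score transform is assumed here; the helper names and the use of the population standard deviation are illustrative assumptions.

```python
def fit_scaler(train_rows, columns):
    """Compute per-column (mean, std) from training rows only."""
    stats = {}
    for column in columns:
        values = [row[column] for row in train_rows]
        mean = sum(values) / len(values)
        variance = sum((value - mean) ** 2 for value in values) / len(values)
        std = variance ** 0.5
        stats[column] = (mean, std if std > 0 else 1.0)  # constant column: no scaling
    return stats


def transform(row, stats):
    """Apply training-split statistics to any row (train, validation, or test)."""
    return {column: (row[column] - mean) / std for column, (mean, std) in stats.items()}


train_rows = [
    {"ret_1d_lag": 0.0, "is_premarket": 1.0},
    {"ret_1d_lag": 2.0, "is_premarket": 1.0},
]
stats = fit_scaler(train_rows, ["ret_1d_lag", "is_premarket"])
scaled = transform({"ret_1d_lag": 3.0, "is_premarket": 1.0}, stats)
```

Validation and test rows pass through `transform` with the training statistics, so no information from later periods leaks into the scaling.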
+ ## Label Construction
+
+ - `sentiment` comes from the source dataset label.
+ - `direction_1d` is generated from future 1-day beta-adjusted abnormal log return after the event anchor date.
+ - `direction_5d` is generated from future 5-day beta-adjusted abnormal log return.
+ - A return is neutral when it falls inside a volatility-adjusted threshold.
+ - `volatility` is based on train-split tertiles of future absolute 5-day return.
+
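The direction labels described above can be illustrated with a small sketch. The README does not give the threshold multiplier or say which beta is used, so `k`, the single-benchmark beta adjustment, and both function names are hypothetical.

```python
import math


def abnormal_log_return(price_start, price_end, bench_start, bench_end, beta):
    """Stock log return minus beta times the benchmark log return (assumed form)."""
    stock = math.log(price_end / price_start)
    benchmark = math.log(bench_end / bench_start)
    return stock - beta * benchmark


def direction_label(abnormal_return, daily_vol, k=0.5):
    """Map an abnormal return to a class using a band of +/- k * daily_vol.

    k is a hypothetical multiplier, not a value taken from the training config.
    """
    threshold = k * daily_vol
    if abnormal_return > threshold:
        return "bullish"
    if abnormal_return < -threshold:
        return "bearish"
    return "neutral"


# A stock that moves exactly with the benchmark (beta 1.0) has zero abnormal return.
flat = abnormal_log_return(100.0, 110.0, 100.0, 110.0, beta=1.0)
labels = [direction_label(r, daily_vol=0.02) for r in (0.02, -0.02, 0.005, flat)]
```

Scaling the neutral band by volatility means a volatile stock needs a larger move than a quiet one to leave the neutral class.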
+ Event anchoring shifts news published after the close (in UTC) to the next calendar day, then uses the next available trading date.
+ Events with duplicate titles for the same symbol on the same day are removed before training.
+
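The anchoring and de-duplication rules can be sketched as below. The cutoff hour `close_hour_utc=21`, the event field names, and the helper names are assumptions for illustration; the README does not specify the exact cutoff.

```python
from datetime import date, datetime, timedelta, timezone


def anchor_date(published_utc, trading_days, close_hour_utc=21):
    """Return the first trading date on or after the (possibly shifted) event day.

    close_hour_utc is a hypothetical cutoff for "after close" news.
    """
    day = published_utc.date()
    if published_utc.hour >= close_hour_utc:
        day += timedelta(days=1)  # after-close news counts toward the next day
    for trading_day in sorted(trading_days):
        if trading_day >= day:
            return trading_day
    return None  # no trading date available after the event


def drop_duplicate_titles(events):
    """Keep the first event per (symbol, anchor date, title)."""
    seen = set()
    kept = []
    for event in events:
        key = (event["symbol"], event["anchor"], event["title"])
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


trading_days = [date(2024, 1, 5), date(2024, 1, 8)]  # a Friday and the next Monday
late = anchor_date(datetime(2024, 1, 5, 22, 0, tzinfo=timezone.utc), trading_days)
early = anchor_date(datetime(2024, 1, 5, 15, 0, tzinfo=timezone.utc), trading_days)
events = [
    {"symbol": "AAPL", "anchor": early, "title": "Earnings beat"},
    {"symbol": "AAPL", "anchor": early, "title": "Earnings beat"},
    {"symbol": "MSFT", "anchor": early, "title": "Earnings beat"},
]
unique_events = drop_duplicate_titles(events)
```

In this toy calendar, the late Friday story is anchored to Monday, while the intraday story stays on Friday.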
+ ## Test Metrics
+
+ ```json
+ {
+   "sentiment": {
+     "accuracy": 0.009784735812133072,
+     "macro_precision": 0.0032615786040443573,
+     "macro_recall": 0.3333333333333333,
+     "macro_f1": 0.006459948320413436,
+     "confusion_matrix": [
+       [
+         0,
+         110,
+         0
+       ],
+       [
+         0,
+         15,
+         0
+       ],
+       [
+         0,
+         1408,
+         0
+       ]
+     ],
+     "predicted_class_counts": {
+       "bearish": 0,
+       "neutral": 1533,
+       "bullish": 0
+     },
+     "true_class_counts": {
+       "bearish": 110,
+       "neutral": 15,
+       "bullish": 1408
+     },
+     "expected_calibration_error": 0.8247767172447622,
+     "brier_score": 1.5716914553389048
+   },
+   "direction_1d": {
+     "accuracy": 0.3333333333333333,
+     "macro_precision": 0.2797514006239241,
+     "macro_recall": 0.30693224852151285,
+     "macro_f1": 0.28541099148120913,
+     "confusion_matrix": [
+       [
+         31,
+         217,
+         212
+       ],
+       [
+         54,
+         138,
+         244
+       ],
+       [
+         124,
+         171,
+         342
+       ]
+     ],
+     "predicted_class_counts": {
+       "bearish": 209,
+       "neutral": 526,
+       "bullish": 798
+     },
+     "true_class_counts": {
+       "bearish": 460,
+       "neutral": 436,
+       "bullish": 637
+     },
+     "expected_calibration_error": 0.03496117605077468,
+     "brier_score": 0.6723385830557309
+   },
+   "direction_5d": {
+     "accuracy": 0.45857795172863663,
+     "macro_precision": 0.28120269843193446,
+     "macro_recall": 0.33741824434028733,
+     "macro_f1": 0.30491929241175314,
+     "confusion_matrix": [
+       [
+         0,
+         123,
+         143
+       ],
+       [
+         0,
+         553,
+         311
+       ],
+       [
+         0,
+         253,
+         150
+       ]
+     ],
+     "predicted_class_counts": {
+       "bearish": 0,
+       "neutral": 929,
+       "bullish": 604
+     },
+     "true_class_counts": {
+       "bearish": 266,
+       "neutral": 864,
+       "bullish": 403
+     },
+     "expected_calibration_error": 0.024249568105288305,
+     "brier_score": 0.633983250156182
+   },
+   "volatility": {
+     "accuracy": 0.39921722113502933,
+     "macro_precision": 0.13307240704500978,
+     "macro_recall": 0.3333333333333333,
+     "macro_f1": 0.1902097902097902,
+     "confusion_matrix": [
+       [
+         0,
+         534,
+         0
+       ],
+       [
+         0,
+         612,
+         0
+       ],
+       [
+         0,
+         387,
+         0
+       ]
+     ],
+     "predicted_class_counts": {
+       "lower_vol": 0,
+       "normal_vol": 1533,
+       "higher_vol": 0
+     },
+     "true_class_counts": {
+       "lower_vol": 534,
+       "normal_vol": 612,
+       "higher_vol": 387
+     },
+     "expected_calibration_error": 0.1760233100613036,
+     "brier_score": 0.712886643106662
+   },
+   "event_backtest": {
+     "split": "test",
+     "transaction_cost_bps": 10.0,
+     "active_1d_fraction": 0.6568819308545336,
+     "hit_ratio_1d_active": 0.5789473684210527,
+     "avg_signal_return_1d_all": 0.0023507131510749455,
+     "avg_signal_return_1d_all_net": 0.0016938312202204116,
+     "avg_signal_return_1d_active": 0.0035785931088360393,
+     "avg_signal_return_1d_active_net": 0.002578593108836038,
+     "avg_abnormal_signal_return_1d_all": 0.0007162017627715418,
+     "avg_abnormal_signal_return_1d_all_net": 5.93198319170076e-05,
+     "avg_abnormal_signal_return_1d_active": 0.0010903051661656142,
+     "avg_abnormal_signal_return_1d_active_net": 9.030516616561321e-05,
+     "soft_signal_backtest_1d": {
+       "mean_abs_signal": 0.049029190093278885,
+       "active_fraction_abs_score_ge_0.10": 0.11545988258317025,
+       "hit_ratio_all_gross": 0.532941943900848,
+       "hit_ratio_active_gross": 0.4463276836158192,
+       "avg_signal_return_all": -4.1903094825102016e-05,
+       "avg_signal_return_all_net": -9.093229164136574e-05,
+       "avg_abnormal_signal_return_all": 6.6492548285168596e-06,
+       "avg_abnormal_signal_return_all_net": -4.237994289724156e-05
+     },
+     "active_5d_fraction": 0.39399869536855836,
+     "hit_ratio_5d_active": 0.3708609271523179,
+     "avg_signal_return_5d_all": -0.004762698527096634,
+     "avg_signal_return_5d_active": -0.012088107354369436,
+     "confidence_backtests_1d": {
+       "top_10pct_directional_confidence": {
+         "num_events": 153,
+         "mean_confidence": 0.40472039580345154,
+         "hit_ratio": 0.48366013071895425,
+         "avg_signal_return": -0.0017291086948829782,
+         "avg_signal_return_net": -0.002729108694882979,
+         "median_signal_return": -0.0011537353275343776,
+         "avg_abnormal_signal_return": 0.0023775517351216227,
+         "avg_abnormal_signal_return_net": 0.001377551735121622
+       },
+       "top_20pct_directional_confidence": {
+         "num_events": 307,
+         "mean_confidence": 0.39327916502952576,
+         "hit_ratio": 0.46579804560260585,
+         "avg_signal_return": -0.0018346647964980064,
+         "avg_signal_return_net": -0.0028346647964980075,
+         "median_signal_return": -0.001494303229264915,
+         "avg_abnormal_signal_return": 0.00013626769358741198,
+         "avg_abnormal_signal_return_net": -0.0008637323064125889
+       },
+       "top_30pct_directional_confidence": {
+         "num_events": 460,
+         "mean_confidence": 0.3858344554901123,
+         "hit_ratio": 0.5021739130434782,
+         "avg_signal_return": 0.000168503345380684,
+         "avg_signal_return_net": -0.0008314966546193169,
+         "median_signal_return": 0.00047751690726727247,
+         "avg_abnormal_signal_return": -0.0006967808645482443,
+         "avg_abnormal_signal_return_net": -0.0016967808645482452
+       },
+       "threshold_0.34": {
+         "num_events": 1108,
+         "coverage": 0.7227658186562296,
+         "hit_ratio": 0.5523465703971119,
+         "avg_signal_return": 0.002400788538323842,
+         "avg_signal_return_net": 0.001400788538323841,
+         "median_signal_return": 0.0029742957558482885,
+         "avg_abnormal_signal_return": 0.00043566977963306825,
+         "avg_abnormal_signal_return_net": -0.0005643302203669326
+       },
+       "threshold_0.36": {
+         "num_events": 636,
+         "coverage": 0.41487279843444225,
+         "hit_ratio": 0.5471698113207547,
+         "avg_signal_return": 0.001725182487273244,
+         "avg_signal_return_net": 0.000725182487273243,
+         "median_signal_return": 0.0027859483379870653,
+         "avg_abnormal_signal_return": -0.0001395135721791607,
+         "avg_abnormal_signal_return_net": -0.0011395135721791617
+       },
+       "threshold_0.38": {
+         "num_events": 247,
+         "coverage": 0.16112198303979125,
+         "hit_ratio": 0.44129554655870445,
+         "avg_signal_return": -0.0028902063559107334,
+         "avg_signal_return_net": -0.0038902063559107343,
+         "median_signal_return": -0.0022296553943306208,
+         "avg_abnormal_signal_return": 0.0008437850360328761,
+         "avg_abnormal_signal_return_net": -0.00015621496396712474
+       },
+       "threshold_0.40": {
+         "num_events": 85,
+         "coverage": 0.055446836268754074,
+         "hit_ratio": 0.43529411764705883,
+         "avg_signal_return": -0.0034284473146887168,
+         "avg_signal_return_net": -0.004428447314688717,
+         "median_signal_return": -0.0031941309571266174,
+         "avg_abnormal_signal_return": 0.0008126156158087884,
+         "avg_abnormal_signal_return_net": -0.00018738438419121242
+       },
+       "threshold_0.42": {
+         "num_events": 20,
+         "coverage": 0.01304631441617743,
+         "hit_ratio": 0.25,
+         "avg_signal_return": -0.008969856356270612,
+         "avg_signal_return_net": -0.009969856356270613,
+         "median_signal_return": -0.011959618888795376,
+         "avg_abnormal_signal_return": -0.0006033775032847188,
+         "avg_abnormal_signal_return_net": -0.0016033775032847197
+       },
+       "threshold_0.45": {
+         "num_events": 2,
+         "coverage": 0.001304631441617743,
+         "hit_ratio": 0.5,
+         "avg_signal_return": 0.0010717622935771942,
+         "avg_signal_return_net": 7.176229357719333e-05,
+         "median_signal_return": 0.0010717622935771942,
+         "avg_abnormal_signal_return": 0.01302456425037235,
+         "avg_abnormal_signal_return_net": 0.01202456425037235
+       },
+       "threshold_0.50": {
+         "num_events": 0,
+         "coverage": 0.0,
+         "hit_ratio": null,
+         "avg_signal_return": null,
+         "avg_signal_return_net": null,
+         "median_signal_return": null,
+         "avg_abnormal_signal_return": null,
+         "avg_abnormal_signal_return_net": null
+       }
+     },
+     "mean_text_gate_weight": 0.506599485874176,
+     "std_text_gate_weight": 0.0016385371563956141
+   }
+ }
+ ```
+
+ The `confidence_backtests_1d` section ranks events by directional confidence, defined as `max(P(bearish), P(bullish))`.
+ This is the preferred way to inspect whether the model is useful as a signal filter.
+ V4 also reports cost-adjusted (`*_net`) backtest fields, charging `10.0` bps per active directional event.
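
A confidence-ranked, cost-adjusted backtest of this kind can be sketched as follows. Only the confidence definition and the 10 bps cost come from the text above; the event field names, the rule for picking the trade side, and the rounding used for the top fraction are assumptions.

```python
def confidence_backtest(events, top_frac=0.10, cost_bps=10.0):
    """Backtest the most confident directional events with a flat per-event cost.

    Each event is a dict with p_bear, p_bull, and ret (the realized 1-day return).
    """
    if not events:
        return None
    cost = cost_bps / 10_000.0  # 10 bps = 0.001 in return units
    ranked = sorted(events, key=lambda e: max(e["p_bear"], e["p_bull"]), reverse=True)
    k = max(1, round(len(ranked) * top_frac))
    signal_returns = []
    for event in ranked[:k]:
        side = 1.0 if event["p_bull"] >= event["p_bear"] else -1.0  # long or short
        signal_returns.append(side * event["ret"])
    gross = sum(signal_returns) / k
    return {
        "num_events": k,
        "hit_ratio": sum(r > 0 for r in signal_returns) / k,
        "avg_signal_return": gross,
        "avg_signal_return_net": gross - cost,
    }


events = [
    {"p_bear": 0.45, "p_bull": 0.30, "ret": -0.02},
    {"p_bear": 0.20, "p_bull": 0.42, "ret": 0.01},
    {"p_bear": 0.33, "p_bull": 0.34, "ret": -0.03},
    {"p_bear": 0.35, "p_bull": 0.31, "ret": 0.04},
]
result = confidence_backtest(events, top_frac=0.5)
```

In the reported test backtest, `avg_signal_return_1d_active_net` equals `avg_signal_return_1d_active` minus 0.001, which is consistent with a flat 10 bps charge per active event.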
+
+ ## Important Limitations
+
+ This is a research model, not trading advice.
+
+ Known limitations:
+
+ - Public news sentiment labels can be noisy.
+ - Daily OHLCV alignment is an approximation; intraday timestamp alignment would be better.
+ - Market-direction labels are derived from future returns and are sensitive to threshold choice.
+ - Results should be evaluated out-of-sample by date and by ticker before any practical use.
+
+ ## Files
+
+ - `pytorch_model.bin`: custom gated-fusion model weights.
+ - `training_config.json`: training and dataset configuration.
+ - `feature_schema.json`: numeric feature scaler and schema.
+ - `label_mapping.json`: task label names.
+ - `metrics.json`: train/validation/test metrics.
+ - `test_predictions.csv`: event-level test predictions and returns.
+ - `top_10pct_directional_confidence_1d` in `test_predictions.csv`: marks the highest-confidence directional event subset.
+ - `confusion_matrix_*.csv`: confusion matrices per split/task.
class_weights.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "sentiment": [
+     0.3636363744735718,
+     2.2727272510528564,
+     0.3636363744735718
+   ],
+   "direction_1d": [
+     0.9880715608596802,
+     0.9986989498138428,
+     1.013229489326477
+   ],
+   "direction_5d": [
+     1.2122101783752441,
+     0.5105218887329102,
+     1.2772679328918457
+   ],
+   "volatility": [
+     1.0096524953842163,
+     1.0141775608062744,
+     0.9761698842048645
+   ]
+ }
confusion_matrix_test_direction_1d.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,31,217,212
+ neutral,54,138,244
+ bullish,124,171,342
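
The summary metrics can be recomputed from this confusion matrix as a consistency check. The sketch below assumes rows are true classes and columns are predicted classes, and treats undefined precision (no predictions for a class) as 0; both assumptions reproduce the reported numbers. The function name is illustrative.

```python
def metrics_from_confusion(cm):
    """Accuracy and macro precision/recall/F1 from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        true_positive = cm[i][i]
        predicted = sum(cm[row][i] for row in range(n))  # column sum
        actual = sum(cm[i])                              # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return {
        "accuracy": sum(cm[i][i] for i in range(n)) / total,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


direction_1d_test = [[31, 217, 212], [54, 138, 244], [124, 171, 342]]
recomputed = metrics_from_confusion(direction_1d_test)
```

For the `direction_1d` test matrix this gives an accuracy of 511/1533, which is exactly one third and matches the reported value.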
confusion_matrix_test_direction_5d.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,0,123,143
+ neutral,0,553,311
+ bullish,0,253,150
confusion_matrix_test_sentiment.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,0,110,0
+ neutral,0,15,0
+ bullish,0,1408,0
confusion_matrix_test_volatility.csv ADDED
@@ -0,0 +1,4 @@
+ ,lower_vol,normal_vol,higher_vol
+ lower_vol,0,534,0
+ normal_vol,0,612,0
+ higher_vol,0,387,0
confusion_matrix_validation_direction_1d.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,104,200,164
+ neutral,116,228,186
+ bullish,154,189,191
confusion_matrix_validation_direction_5d.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,0,131,208
+ neutral,0,381,424
+ bullish,0,166,222
confusion_matrix_validation_sentiment.csv ADDED
@@ -0,0 +1,4 @@
+ ,bearish,neutral,bullish
+ bearish,0,72,0
+ neutral,0,13,0
+ bullish,0,1447,0
confusion_matrix_validation_volatility.csv ADDED
@@ -0,0 +1,4 @@
+ ,lower_vol,normal_vol,higher_vol
+ lower_vol,0,535,0
+ normal_vol,0,642,0
+ higher_vol,0,355,0
feature_schema.json ADDED
@@ -0,0 +1,95 @@
+ {
+   "numeric_columns": [
+     "source_sentiment_score",
+     "sentiment_abs_score",
+     "site_benzinga",
+     "site_globenewswire",
+     "site_yahoo_finance",
+     "site_marketwatch",
+     "text_char_len",
+     "title_char_len",
+     "is_after_close",
+     "is_premarket",
+     "is_market_hours",
+     "published_hour_sin",
+     "published_hour_cos",
+     "ret_1d_lag",
+     "ret_5d_lag",
+     "ret_20d_lag",
+     "vol_5d",
+     "vol_20d",
+     "vol_60d",
+     "vol_ratio_5_20",
+     "drawdown_20d",
+     "volume_z_20d",
+     "sector_code",
+     "market_ret_1d_lag",
+     "market_ret_5d_lag",
+     "sector_ret_1d_lag",
+     "sector_ret_5d_lag",
+     "beta_market_60d",
+     "beta_sector_60d"
+   ],
+   "mean": {
+     "source_sentiment_score": 0.746147335480902,
+     "sentiment_abs_score": 0.8758751153945923,
+     "site_benzinga": 0.3414634168148041,
+     "site_globenewswire": 0.027059365063905716,
+     "site_yahoo_finance": 0.19834330677986145,
+     "site_marketwatch": 0.0011965024750679731,
+     "text_char_len": 0.12647317349910736,
+     "title_char_len": 0.22998590767383575,
+     "is_after_close": 0.06129774451255798,
+     "is_premarket": 0.4531063139438629,
+     "is_market_hours": 0.4855959415435791,
+     "published_hour_sin": -0.2567417025566101,
+     "published_hour_cos": -0.35876917839050293,
+     "ret_1d_lag": 0.0008439657480443655,
+     "ret_5d_lag": 0.005167760459005587,
+     "ret_20d_lag": 0.02857849583185001,
+     "vol_5d": 0.02562241014758805,
+     "vol_20d": 0.027127421020100354,
+     "vol_60d": 0.030643427833552454,
+     "vol_ratio_5_20": 0.94897371917211,
+     "drawdown_20d": -0.06130785621108734,
+     "volume_z_20d": 0.9134664162837107,
+     "sector_code": 0.2863782788771284,
+     "market_ret_1d_lag": 0.0006297079118320088,
+     "market_ret_5d_lag": 0.0041547514374490275,
+     "sector_ret_1d_lag": 0.0008193551641604897,
+     "sector_ret_5d_lag": 0.0056375516939092185,
+     "beta_market_60d": 1.3920874921870132,
+     "beta_sector_60d": 1.1672672796791717
+   },
+   "std": {
+     "source_sentiment_score": 0.5201509712378972,
+     "sentiment_abs_score": 0.24518661201000214,
+     "site_benzinga": 0.4742223620414734,
+     "site_globenewswire": 0.1622639149427414,
+     "site_yahoo_finance": 0.39877045154571533,
+     "site_marketwatch": 0.034571390599012375,
+     "text_char_len": 0.02897312119603157,
+     "title_char_len": 0.08457622677087784,
+     "is_after_close": 0.23988670110702515,
+     "is_premarket": 0.49781903624534607,
+     "is_market_hours": 0.4998154640197754,
+     "published_hour_sin": 0.663528323173523,
+     "published_hour_cos": 0.6042951941490173,
+     "ret_1d_lag": 0.043107957015871196,
+     "ret_5d_lag": 0.10119434487495654,
+     "ret_20d_lag": 0.19046219546936188,
+     "vol_5d": 0.03433107463832683,
+     "vol_20d": 0.025843168460955513,
+     "vol_60d": 0.03346308394931953,
+     "vol_ratio_5_20": 0.368944284961192,
+     "drawdown_20d": 0.09463532321589767,
+     "volume_z_20d": 2.680127570980756,
+     "sector_code": 0.266243410025217,
+     "market_ret_1d_lag": 0.009415043142061333,
+     "market_ret_5d_lag": 0.020352265466718426,
+     "sector_ret_1d_lag": 0.016764466116674252,
+     "sector_ret_5d_lag": 0.04006510290929962,
+     "beta_market_60d": 0.6201234265286986,
+     "beta_sector_60d": 0.5781415112568552
+   }
+ }
label_mapping.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "sentiment": [
+     "bearish",
+     "neutral",
+     "bullish"
+   ],
+   "direction_1d": [
+     "bearish",
+     "neutral",
+     "bullish"
+   ],
+   "direction_5d": [
+     "bearish",
+     "neutral",
+     "bullish"
+   ],
+   "volatility": [
+     "lower_vol",
+     "normal_vol",
+     "higher_vol"
+   ]
+ }
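
A label mapping like this is typically used to turn a head's raw outputs into a named class. The sketch below assumes each head emits one logit per class in the same order as the mapping; that ordering, the inlined subset of the mapping, and the helper names are assumptions, not documented behaviour of this checkpoint.

```python
import json
import math

# Subset of label_mapping.json, inlined so the example is self-contained.
LABEL_MAPPING = json.loads("""
{"direction_1d": ["bearish", "neutral", "bullish"],
 "volatility": ["lower_vol", "normal_vol", "higher_vol"]}
""")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(task, logits, mapping=LABEL_MAPPING):
    """Return the argmax label and a name -> probability dict for one head."""
    names = mapping[task]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


def directional_score(probs_by_name):
    """P(bullish) - P(bearish), the soft signal used in return-aware training."""
    return probs_by_name["bullish"] - probs_by_name["bearish"]


label, probs = decode("direction_1d", [0.1, 0.2, 1.5])
score = directional_score(probs)
```

The directional score lies in [-1, 1] and is only meaningful for the direction heads, since the volatility head has no bullish or bearish class.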
metrics.json ADDED
@@ -0,0 +1,640 @@
1
+ {
2
+ "train": {
3
+ "train_runtime": 638.1554,
4
+ "train_samples_per_second": 51.077,
5
+ "train_steps_per_second": 3.197,
6
+ "total_flos": 0.0,
7
+ "train_loss": 0.6800105861589021,
8
+ "epoch": 3.0
9
+ },
10
+ "validation": {
11
+ "sentiment": {
12
+ "accuracy": 0.008485639686684074,
13
+ "macro_precision": 0.0028285465622280245,
14
+ "macro_recall": 0.3333333333333333,
15
+ "macro_f1": 0.0056094929881337656,
16
+ "confusion_matrix": [
17
+ [
18
+ 0,
19
+ 72,
20
+ 0
21
+ ],
22
+ [
23
+ 0,
24
+ 13,
25
+ 0
26
+ ],
27
+ [
28
+ 0,
29
+ 1447,
30
+ 0
31
+ ]
32
+ ],
33
+ "predicted_class_counts": {
34
+ "bearish": 0,
35
+ "neutral": 1532,
36
+ "bullish": 0
37
+ },
38
+ "true_class_counts": {
39
+ "bearish": 72,
40
+ "neutral": 13,
41
+ "bullish": 1447
42
+ },
43
+ "expected_calibration_error": 0.8264312032931156,
44
+ "brier_score": 1.573439388720477
45
+ },
46
+ "direction_1d": {
47
+ "accuracy": 0.3413838120104439,
48
+ "macro_precision": 0.3335515858937544,
49
+ "macro_recall": 0.33669626802974273,
50
+ "macro_f1": 0.33331285508030406,
51
+ "confusion_matrix": [
52
+ [
53
+ 104,
54
+ 200,
55
+ 164
56
+ ],
57
+ [
58
+ 116,
59
+ 228,
60
+ 186
61
+ ],
62
+ [
63
+ 154,
64
+ 189,
65
+ 191
66
+ ]
67
+ ],
68
+ "predicted_class_counts": {
69
+ "bearish": 374,
70
+ "neutral": 617,
71
+ "bullish": 541
72
+ },
73
+ "true_class_counts": {
74
+ "bearish": 468,
75
+ "neutral": 530,
76
+ "bullish": 534
77
+ },
78
+ "expected_calibration_error": 0.028834249269868946,
79
+ "brier_score": 0.6675477741251082
80
+ },
81
+ "direction_5d": {
82
+ "accuracy": 0.3936031331592689,
83
+ "macro_precision": 0.27396668808245767,
84
+ "macro_recall": 0.3484856246398156,
85
+ "macro_f1": 0.29043708459698375,
86
+ "confusion_matrix": [
87
+ [
88
+ 0,
89
+ 131,
90
+ 208
91
+ ],
92
+ [
93
+ 0,
94
+ 381,
95
+ 424
96
+ ],
97
+ [
98
+ 0,
99
+ 166,
100
+ 222
101
+ ]
102
+ ],
103
+ "predicted_class_counts": {
104
+ "bearish": 0,
105
+ "neutral": 678,
106
+ "bullish": 854
107
+ },
108
+ "true_class_counts": {
109
+ "bearish": 339,
110
+ "neutral": 805,
111
+ "bullish": 388
112
+ },
113
+ "expected_calibration_error": 0.060579302202783736,
114
+ "brier_score": 0.6606167988480304
115
+ },
116
+ "volatility": {
117
+ "accuracy": 0.41906005221932113,
118
+ "macro_precision": 0.13968668407310705,
119
+ "macro_recall": 0.3333333333333333,
120
+ "macro_f1": 0.1968721251149954,
121
+ "confusion_matrix": [
122
+ [
123
+ 0,
124
+ 535,
125
+ 0
126
+ ],
127
+ [
128
+ 0,
129
+ 642,
130
+ 0
131
+ ],
132
+ [
133
+ 0,
134
+ 355,
135
+ 0
136
+ ]
137
+ ],
138
+ "predicted_class_counts": {
139
+ "lower_vol": 0,
140
+ "normal_vol": 1532,
141
+ "higher_vol": 0
142
+ },
143
+ "true_class_counts": {
144
+ "lower_vol": 535,
145
+ "normal_vol": 642,
146
+ "higher_vol": 355
147
+ },
148
+ "expected_calibration_error": 0.15814054444159914,
149
+ "brier_score": 0.6913779982386878
150
+ },
151
+ "event_backtest": {
152
+ "split": "validation",
153
+ "transaction_cost_bps": 10.0,
154
+ "active_1d_fraction": 0.5972584856396866,
155
+ "hit_ratio_1d_active": 0.5398907103825137,
156
+ "avg_signal_return_1d_all": -0.000668341834270596,
157
+ "avg_signal_return_1d_all_net": -0.0012656003199102831,
158
+ "avg_signal_return_1d_active": -0.0011190160547568885,
159
+ "avg_signal_return_1d_active_net": -0.0021190160547568896,
160
+ "avg_abnormal_signal_return_1d_all": -0.0009099995935003474,
161
+ "avg_abnormal_signal_return_1d_all_net": -0.0015072580791400346,
162
+ "avg_abnormal_signal_return_1d_active": -0.001523627734691292,
163
+ "avg_abnormal_signal_return_1d_active_net": -0.002523627734691293,
164
+ "soft_signal_backtest_1d": {
165
+ "mean_abs_signal": 0.04358483850955963,
166
+ "active_fraction_abs_score_ge_0.10": 0.0731070496083551,
167
+ "hit_ratio_all_gross": 0.5071801566579635,
168
+ "hit_ratio_active_gross": 0.5446428571428571,
169
+ "avg_signal_return_all": -2.6098849048139527e-05,
170
+ "avg_signal_return_all_net": -6.968369416426867e-05,
171
+ "avg_abnormal_signal_return_all": -3.516719152685255e-05,
172
+ "avg_abnormal_signal_return_all_net": -7.87520402809605e-05
173
+ },
174
+ "active_5d_fraction": 0.5574412532637075,
175
+ "hit_ratio_5d_active": 0.5644028103044496,
176
+ "avg_signal_return_5d_all": 0.00222230488512584,
177
+ "avg_signal_return_5d_active": 0.003986617194394364,
178
+ "confidence_backtests_1d": {
179
+ "top_10pct_directional_confidence": {
180
+ "num_events": 153,
181
+ "mean_confidence": 0.39838218688964844,
182
+ "hit_ratio": 0.5424836601307189,
183
+ "avg_signal_return": -0.0040096313650019805,
184
+ "avg_signal_return_net": -0.005009631365001981,
185
+ "median_signal_return": 0.001708349445834756,
186
+ "avg_abnormal_signal_return": -0.0016484424157800104,
187
+ "avg_abnormal_signal_return_net": -0.002648442415780011
188
+ },
189
+ "top_20pct_directional_confidence": {
190
+ "num_events": 306,
191
+ "mean_confidence": 0.38756096363067627,
192
+ "hit_ratio": 0.5588235294117647,
193
+ "avg_signal_return": -0.0005135064179175138,
194
+ "avg_signal_return_net": -0.0015135064179175146,
195
+ "median_signal_return": 0.003223672043532133,
196
+ "avg_abnormal_signal_return": -0.0006505049740809268,
197
+ "avg_abnormal_signal_return_net": -0.0016505049740809278
198
+ },
199
+ "top_30pct_directional_confidence": {
200
+ "num_events": 460,
201
+ "mean_confidence": 0.38047993183135986,
202
+ "hit_ratio": 0.5543478260869565,
203
+ "avg_signal_return": -0.0009027605532142131,
204
+ "avg_signal_return_net": -0.001902760553214214,
205
+ "median_signal_return": 0.002927212161011994,
206
+ "avg_abnormal_signal_return": -0.0014120966409751892,
207
+ "avg_abnormal_signal_return_net": -0.0024120966409751903
208
+ },
209
+ "threshold_0.34": {
210
+ "num_events": 1040,
211
+ "coverage": 0.6788511749347258,
212
+ "hit_ratio": 0.5394230769230769,
213
+ "avg_signal_return": -0.0007571742408449959,
214
+ "avg_signal_return_net": -0.0017571742408449967,
215
+ "median_signal_return": 0.0012542950571514666,
216
+ "avg_abnormal_signal_return": -0.0012730416878412143,
217
+ "avg_abnormal_signal_return_net": -0.0022730416878412154
218
+ },
219
+ "threshold_0.36": {
220
+ "num_events": 494,
221
+ "coverage": 0.3224543080939948,
222
+ "hit_ratio": 0.5566801619433198,
223
+ "avg_signal_return": -0.0008215918299352735,
224
+ "avg_signal_return_net": -0.0018215918299352744,
225
+ "median_signal_return": 0.0024513842072337866,
226
+ "avg_abnormal_signal_return": -0.0013882744387551298,
227
+ "avg_abnormal_signal_return_net": -0.002388274438755131
228
+ },
229
+ "threshold_0.38": {
230
+ "num_events": 188,
231
+ "coverage": 0.1227154046997389,
232
+ "hit_ratio": 0.5478723404255319,
233
+ "avg_signal_return": -0.0020706070304669917,
234
+ "avg_signal_return_net": -0.0030706070304669926,
235
+ "median_signal_return": 0.0030422902200371027,
236
+ "avg_abnormal_signal_return": -0.0007841120187801239,
237
+ "avg_abnormal_signal_return_net": -0.0017841120187801248
238
+ },
239
+ "threshold_0.40": {
240
+ "num_events": 49,
241
+ "coverage": 0.03198433420365535,
242
+ "hit_ratio": 0.5714285714285714,
243
+ "avg_signal_return": -0.0023946920445435966,
244
+ "avg_signal_return_net": -0.0033946920445435975,
245
+ "median_signal_return": 0.003524972125887871,
246
+ "avg_abnormal_signal_return": 0.0008117058985552997,
247
+ "avg_abnormal_signal_return_net": -0.00018829410144470112
248
+ },
249
+ "threshold_0.42": {
250
+ "num_events": 8,
251
+ "coverage": 0.005221932114882507,
252
+ "hit_ratio": 0.625,
253
+ "avg_signal_return": 0.007374357082881033,
254
+ "avg_signal_return_net": 0.0063743570828810325,
255
+ "median_signal_return": 0.010437145829200745,
256
+ "avg_abnormal_signal_return": 0.013620887020806549,
257
+ "avg_abnormal_signal_return_net": 0.012620887020806548
258
+ },
259
+ "threshold_0.45": {
260
+ "num_events": 0,
261
+ "coverage": 0.0,
262
+ "hit_ratio": null,
263
+ "avg_signal_return": null,
264
+ "avg_signal_return_net": null,
265
+ "median_signal_return": null,
266
+ "avg_abnormal_signal_return": null,
267
+ "avg_abnormal_signal_return_net": null
268
+ },
269
+ "threshold_0.50": {
270
+ "num_events": 0,
271
+ "coverage": 0.0,
272
+ "hit_ratio": null,
273
+ "avg_signal_return": null,
274
+ "avg_signal_return_net": null,
275
+ "median_signal_return": null,
276
+ "avg_abnormal_signal_return": null,
277
+ "avg_abnormal_signal_return_net": null
278
+ }
279
+ },
280
+ "mean_text_gate_weight": 0.5071590542793274,
281
+ "std_text_gate_weight": 0.0016348367789760232
282
+ }
283
+ },
284
+ "test": {
285
+ "sentiment": {
286
+ "accuracy": 0.009784735812133072,
287
+ "macro_precision": 0.0032615786040443573,
288
+ "macro_recall": 0.3333333333333333,
289
+ "macro_f1": 0.006459948320413436,
290
+ "confusion_matrix": [
291
+ [
292
+ 0,
293
+ 110,
294
+ 0
295
+ ],
296
+ [
297
+ 0,
298
+ 15,
299
+ 0
300
+ ],
301
+ [
302
+ 0,
303
+ 1408,
304
+ 0
305
+ ]
306
+ ],
307
+ "predicted_class_counts": {
308
+ "bearish": 0,
309
+ "neutral": 1533,
310
+ "bullish": 0
311
+ },
312
+ "true_class_counts": {
313
+ "bearish": 110,
314
+ "neutral": 15,
315
+ "bullish": 1408
316
+ },
317
+ "expected_calibration_error": 0.8247767172447622,
318
+ "brier_score": 1.5716914553389048
319
+ },
+ "direction_1d": {
+ "accuracy": 0.3333333333333333,
+ "macro_precision": 0.2797514006239241,
+ "macro_recall": 0.30693224852151285,
+ "macro_f1": 0.28541099148120913,
+ "confusion_matrix": [
+ [
+ 31,
+ 217,
+ 212
+ ],
+ [
+ 54,
+ 138,
+ 244
+ ],
+ [
+ 124,
+ 171,
+ 342
+ ]
+ ],
+ "predicted_class_counts": {
+ "bearish": 209,
+ "neutral": 526,
+ "bullish": 798
+ },
+ "true_class_counts": {
+ "bearish": 460,
+ "neutral": 436,
+ "bullish": 637
+ },
+ "expected_calibration_error": 0.03496117605077468,
+ "brier_score": 0.6723385830557309
+ },
+ "direction_5d": {
+ "accuracy": 0.45857795172863663,
+ "macro_precision": 0.28120269843193446,
+ "macro_recall": 0.33741824434028733,
+ "macro_f1": 0.30491929241175314,
+ "confusion_matrix": [
+ [
+ 0,
+ 123,
+ 143
+ ],
+ [
+ 0,
+ 553,
+ 311
+ ],
+ [
+ 0,
+ 253,
+ 150
+ ]
+ ],
+ "predicted_class_counts": {
+ "bearish": 0,
+ "neutral": 929,
+ "bullish": 604
+ },
+ "true_class_counts": {
+ "bearish": 266,
+ "neutral": 864,
+ "bullish": 403
+ },
+ "expected_calibration_error": 0.024249568105288305,
+ "brier_score": 0.633983250156182
+ },
+ "volatility": {
+ "accuracy": 0.39921722113502933,
+ "macro_precision": 0.13307240704500978,
+ "macro_recall": 0.3333333333333333,
+ "macro_f1": 0.1902097902097902,
+ "confusion_matrix": [
+ [
+ 0,
+ 534,
+ 0
+ ],
+ [
+ 0,
+ 612,
+ 0
+ ],
+ [
+ 0,
+ 387,
+ 0
+ ]
+ ],
+ "predicted_class_counts": {
+ "lower_vol": 0,
+ "normal_vol": 1533,
+ "higher_vol": 0
+ },
+ "true_class_counts": {
+ "lower_vol": 534,
+ "normal_vol": 612,
+ "higher_vol": 387
+ },
+ "expected_calibration_error": 0.1760233100613036,
+ "brier_score": 0.712886643106662
+ },
+ "event_backtest": {
+ "split": "test",
+ "transaction_cost_bps": 10.0,
+ "active_1d_fraction": 0.6568819308545336,
+ "hit_ratio_1d_active": 0.5789473684210527,
+ "avg_signal_return_1d_all": 0.0023507131510749455,
+ "avg_signal_return_1d_all_net": 0.0016938312202204116,
+ "avg_signal_return_1d_active": 0.0035785931088360393,
+ "avg_signal_return_1d_active_net": 0.002578593108836038,
+ "avg_abnormal_signal_return_1d_all": 0.0007162017627715418,
+ "avg_abnormal_signal_return_1d_all_net": 5.93198319170076e-05,
+ "avg_abnormal_signal_return_1d_active": 0.0010903051661656142,
+ "avg_abnormal_signal_return_1d_active_net": 9.030516616561321e-05,
+ "soft_signal_backtest_1d": {
+ "mean_abs_signal": 0.049029190093278885,
+ "active_fraction_abs_score_ge_0.10": 0.11545988258317025,
+ "hit_ratio_all_gross": 0.532941943900848,
+ "hit_ratio_active_gross": 0.4463276836158192,
+ "avg_signal_return_all": -4.1903094825102016e-05,
+ "avg_signal_return_all_net": -9.093229164136574e-05,
+ "avg_abnormal_signal_return_all": 6.6492548285168596e-06,
+ "avg_abnormal_signal_return_all_net": -4.237994289724156e-05
+ },
+ "active_5d_fraction": 0.39399869536855836,
+ "hit_ratio_5d_active": 0.3708609271523179,
+ "avg_signal_return_5d_all": -0.004762698527096634,
+ "avg_signal_return_5d_active": -0.012088107354369436,
+ "confidence_backtests_1d": {
+ "top_10pct_directional_confidence": {
+ "num_events": 153,
+ "mean_confidence": 0.40472039580345154,
+ "hit_ratio": 0.48366013071895425,
+ "avg_signal_return": -0.0017291086948829782,
+ "avg_signal_return_net": -0.002729108694882979,
+ "median_signal_return": -0.0011537353275343776,
+ "avg_abnormal_signal_return": 0.0023775517351216227,
+ "avg_abnormal_signal_return_net": 0.001377551735121622
+ },
+ "top_20pct_directional_confidence": {
+ "num_events": 307,
+ "mean_confidence": 0.39327916502952576,
+ "hit_ratio": 0.46579804560260585,
+ "avg_signal_return": -0.0018346647964980064,
+ "avg_signal_return_net": -0.0028346647964980075,
+ "median_signal_return": -0.001494303229264915,
+ "avg_abnormal_signal_return": 0.00013626769358741198,
+ "avg_abnormal_signal_return_net": -0.0008637323064125889
+ },
+ "top_30pct_directional_confidence": {
+ "num_events": 460,
+ "mean_confidence": 0.3858344554901123,
+ "hit_ratio": 0.5021739130434782,
+ "avg_signal_return": 0.000168503345380684,
+ "avg_signal_return_net": -0.0008314966546193169,
+ "median_signal_return": 0.00047751690726727247,
+ "avg_abnormal_signal_return": -0.0006967808645482443,
+ "avg_abnormal_signal_return_net": -0.0016967808645482452
+ },
+ "threshold_0.34": {
+ "num_events": 1108,
+ "coverage": 0.7227658186562296,
+ "hit_ratio": 0.5523465703971119,
+ "avg_signal_return": 0.002400788538323842,
+ "avg_signal_return_net": 0.001400788538323841,
+ "median_signal_return": 0.0029742957558482885,
+ "avg_abnormal_signal_return": 0.00043566977963306825,
+ "avg_abnormal_signal_return_net": -0.0005643302203669326
+ },
+ "threshold_0.36": {
+ "num_events": 636,
+ "coverage": 0.41487279843444225,
+ "hit_ratio": 0.5471698113207547,
+ "avg_signal_return": 0.001725182487273244,
+ "avg_signal_return_net": 0.000725182487273243,
+ "median_signal_return": 0.0027859483379870653,
+ "avg_abnormal_signal_return": -0.0001395135721791607,
+ "avg_abnormal_signal_return_net": -0.0011395135721791617
+ },
+ "threshold_0.38": {
+ "num_events": 247,
+ "coverage": 0.16112198303979125,
+ "hit_ratio": 0.44129554655870445,
+ "avg_signal_return": -0.0028902063559107334,
+ "avg_signal_return_net": -0.0038902063559107343,
+ "median_signal_return": -0.0022296553943306208,
+ "avg_abnormal_signal_return": 0.0008437850360328761,
+ "avg_abnormal_signal_return_net": -0.00015621496396712474
+ },
+ "threshold_0.40": {
+ "num_events": 85,
+ "coverage": 0.055446836268754074,
+ "hit_ratio": 0.43529411764705883,
+ "avg_signal_return": -0.0034284473146887168,
+ "avg_signal_return_net": -0.004428447314688717,
+ "median_signal_return": -0.0031941309571266174,
+ "avg_abnormal_signal_return": 0.0008126156158087884,
+ "avg_abnormal_signal_return_net": -0.00018738438419121242
+ },
+ "threshold_0.42": {
+ "num_events": 20,
+ "coverage": 0.01304631441617743,
+ "hit_ratio": 0.25,
+ "avg_signal_return": -0.008969856356270612,
+ "avg_signal_return_net": -0.009969856356270613,
+ "median_signal_return": -0.011959618888795376,
+ "avg_abnormal_signal_return": -0.0006033775032847188,
+ "avg_abnormal_signal_return_net": -0.0016033775032847197
+ },
+ "threshold_0.45": {
+ "num_events": 2,
+ "coverage": 0.001304631441617743,
+ "hit_ratio": 0.5,
+ "avg_signal_return": 0.0010717622935771942,
+ "avg_signal_return_net": 7.176229357719333e-05,
+ "median_signal_return": 0.0010717622935771942,
+ "avg_abnormal_signal_return": 0.01302456425037235,
+ "avg_abnormal_signal_return_net": 0.01202456425037235
+ },
+ "threshold_0.50": {
+ "num_events": 0,
+ "coverage": 0.0,
+ "hit_ratio": null,
+ "avg_signal_return": null,
+ "avg_signal_return_net": null,
+ "median_signal_return": null,
+ "avg_abnormal_signal_return": null,
+ "avg_abnormal_signal_return_net": null
+ }
+ },
+ "mean_text_gate_weight": 0.506599485874176,
+ "std_text_gate_weight": 0.0016385371563956141
+ }
+ },
+ "data": {
+ "num_rows": 13930,
+ "train_rows": 10865,
+ "validation_rows": 1532,
+ "test_rows": 1533,
+ "symbols": [
+ "AAPL",
+ "ABNB",
+ "ADBE",
+ "ADI",
+ "ADP",
+ "AMAT",
+ "AMD",
+ "AMZN",
+ "ARKK",
+ "ASML",
+ "AVGO",
+ "BA",
+ "BABA",
+ "BAC",
+ "BIDU",
+ "BMY",
+ "CAT",
+ "CMCSA",
+ "COIN",
+ "COST",
+ "CRM",
+ "CSCO",
+ "CVX",
+ "DHR",
+ "DIS",
+ "ELF",
+ "GE",
+ "GM",
+ "GOOGL",
+ "HD",
+ "IBM",
+ "INTC",
+ "IOVA",
+ "JNJ",
+ "JPM",
+ "KO",
+ "LULU",
+ "MA",
+ "MCD",
+ "MDB",
+ "META",
+ "MRNA",
+ "MSFT",
+ "MU",
+ "NFLX",
+ "NKE",
+ "NVDA",
+ "ORCL",
+ "PEP",
+ "PFE",
+ "PLTR",
+ "PYPL",
+ "QCOM",
+ "RCL",
+ "SHOP",
+ "SNOW",
+ "SPOT",
+ "TGT",
+ "TSLA",
+ "TSM",
+ "TXN",
+ "UAL",
+ "UBER",
+ "UNH",
+ "UPS",
+ "WMT",
+ "XOM",
+ "ZM"
+ ],
+ "date_min": "2022-08-12 21:13:30+00:00",
+ "date_max": "2023-10-04 20:50:09+00:00",
+ "volatility_thresholds": {
+ "q33": 0.01998170108283479,
+ "q66": 0.049223492463942975
+ }
+ }
+ }
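Reviewer note: the reported classification metrics can be cross-checked from the confusion matrices alone. The sketch below is not part of the commit; it assumes rows are true classes and columns are predicted classes (consistent with `true_class_counts` and `predicted_class_counts`), and that an undefined precision counts as 0. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(cm):
    """Recompute accuracy and macro-averaged precision/recall/F1.

    Assumption: cm[i][j] counts examples of true class i predicted as class j.
    Undefined ratios (a class that is never predicted) are treated as 0.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for i in range(n):
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Confusion matrix copied from the "direction_1d" test block above.
DIRECTION_1D_TEST = [
    [31, 217, 212],
    [54, 138, 244],
    [124, 171, 342],
]

if __name__ == "__main__":
    for name, value in macro_metrics(DIRECTION_1D_TEST).items():
        print(f"{name}: {value:.6f}")
```

Applied to the `sentiment` and `volatility` matrices, the same function gives a macro recall of exactly 1/3, which is what a head that predicts a single class for every example produces.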
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3f6c77bf9ca6e4ab1c9db29f57173379f7e8daeedbdf439271456979730986b
+ size 869703256
modeling_finimpact.py ADDED
@@ -0,0 +1,6 @@
+ """
+ Minimal loading helper for the FinImpact gated fusion checkpoint.
+
+ Use this repository with the original training script for the full class definition.
+ The uploaded model is a custom torch.nn.Module, not a vanilla AutoModelForSequenceClassification.
+ """
processed_sample.csv ADDED
The diff for this file is too large to render. See raw diff
 
test_predictions.csv ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "add_prefix_space": true,
+ "backend": "tokenizers",
+ "bos_token": "[CLS]",
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "extra_special_tokens": [
+ "[PAD]",
+ "[CLS]",
+ "[SEP]"
+ ],
+ "is_local": false,
+ "local_files_only": false,
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "split_by_punct": false,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "unk_id": 3,
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3bf5789497aeb9dde5f757914e12ec101b234de047a1d0df789825a513676b4
+ size 5265
training_config.json ADDED
@@ -0,0 +1,127 @@
+ {
+ "model_repo_id": "kyLELEng/finimpact-direction1d-v4",
+ "encoder_name": "microsoft/deberta-v3-large",
+ "news_dataset_id": "NickyNicky/finance-financialmodelingprep-stock-news-sentiments-rss-feed",
+ "news_split": "train",
+ "price_dataset_id": "siddharthmb/stocks-ohlcv",
+ "price_dataset_file": "ohlcv.csv",
+ "tickers": [
+ "AAPL",
+ "MSFT",
+ "AMZN",
+ "GOOGL",
+ "NVDA",
+ "TSLA",
+ "AMD",
+ "INTC",
+ "JPM",
+ "BAC",
+ "V",
+ "MA",
+ "XOM",
+ "META",
+ "NFLX",
+ "UAL",
+ "HD",
+ "BA",
+ "PYPL",
+ "CRM",
+ "DIS",
+ "NKE",
+ "WMT",
+ "COST",
+ "PFE",
+ "MRNA",
+ "CVX",
+ "KO",
+ "PEP",
+ "CSCO",
+ "ABNB",
+ "ADBE",
+ "ADI",
+ "ADP",
+ "AMAT",
+ "ARKK",
+ "ASML",
+ "AVGO",
+ "BABA",
+ "BIDU",
+ "BMY",
+ "CAT",
+ "CMCSA",
+ "COIN",
+ "DHR",
+ "ELF",
+ "GE",
+ "GEV",
+ "GM",
+ "IBM",
+ "IOVA",
+ "JNJ",
+ "LULU",
+ "MCD",
+ "MDB",
+ "MU",
+ "ORCL",
+ "PLTR",
+ "QCOM",
+ "RCL",
+ "SHOP",
+ "SNOW",
+ "SPOT",
+ "TGT",
+ "TSM",
+ "TXN",
+ "UBER",
+ "UNH",
+ "UPS",
+ "ZM"
+ ],
+ "benchmark_symbols": [
+ "SPY",
+ "QQQ",
+ "DIA"
+ ],
+ "beta_window": 60,
+ "max_news_rows": 250000,
+ "max_examples": 100000,
+ "min_text_chars": 20,
+ "max_length": 256,
+ "train_fraction": 0.78,
+ "validation_fraction": 0.11,
+ "direction_threshold_vol_mult_1d": 0.25,
+ "direction_threshold_vol_mult_5d": 0.5,
+ "min_direction_threshold_1d": 0.0015,
+ "min_direction_threshold_5d": 0.003,
+ "fusion_dim": 256,
+ "dropout": 0.2,
+ "freeze_encoder": false,
+ "sentiment_loss_weight": 0.0,
+ "direction_1d_loss_weight": 1.0,
+ "direction_5d_loss_weight": 0.0,
+ "volatility_loss_weight": 0.0,
+ "class_weight_cap": 2.5,
+ "focal_loss_gamma": 1.5,
+ "return_alignment_loss_weight": 0.1,
+ "return_confidence_loss_weight": 0.03,
+ "return_scale": 0.02,
+ "transaction_cost_bps": 10.0,
+ "per_device_train_batch_size": 4,
+ "per_device_eval_batch_size": 32,
+ "gradient_accumulation_steps": 1,
+ "learning_rate": 2e-06,
+ "weight_decay": 0.01,
+ "warmup_ratio": 0.06,
+ "max_grad_norm": 0.5,
+ "num_train_epochs": 3.0,
+ "max_steps": -1,
+ "logging_steps": 50,
+ "eval_strategy": "epoch",
+ "save_strategy": "no",
+ "bf16": false,
+ "fp16": false,
+ "random_seed": 13,
+ "csv_chunksize": 500000,
+ "output_dir": "finimpact-direction1d-v4",
+ "push_to_hub": true
+ }
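Reviewer note: `transaction_cost_bps` is 10.0, i.e. 0.001 in return units. The sketch below shows how the `_net` figures in the test `event_backtest` block appear to follow from the gross ones. It is an inference from the reported numbers, not the training script's code: the helper `net_return` is hypothetical, and it assumes the cost is charged once per active event, so the all-events average is charged only on the active fraction.

```python
def net_return(gross, cost_bps, traded_fraction=1.0):
    """Subtract a per-trade cost (in basis points) from an average return.

    Assumption: only the traded fraction of events pays the cost, so an
    average over all events is charged traded_fraction * cost.
    """
    return gross - traded_fraction * cost_bps / 10_000.0


COST_BPS = 10.0  # "transaction_cost_bps" from training_config.json

# Gross values copied from the test "event_backtest" block in the metrics file.
active_net = net_return(0.0035785931088360393, COST_BPS)
all_net = net_return(0.0023507131510749455, COST_BPS,
                     traded_fraction=0.6568819308545336)

if __name__ == "__main__":
    print(f"avg_signal_return_1d_active_net ~ {active_net:.12f}")
    print(f"avg_signal_return_1d_all_net    ~ {all_net:.12f}")
```

Under these assumptions both results agree with the reported `avg_signal_return_1d_active_net` and `avg_signal_return_1d_all_net` to floating-point precision.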