app.py ADDED
@@ -0,0 +1,788 @@
#!/usr/bin/env python3
"""
─────────────────────────────────────────────────────────────────────────────
WorldQuant Alpha Swarm – Gradio UI
Supports: Hugging Face Inference API + Ollama (local)
Features:
  • LLM-driven alpha generation with structured JSON prompting
  • Dropdown selectors for all WQ data fields & operators
  • Real-time backtest evaluation on synthetic data
  • Orthogonality check vs existing library
  • Multi-domain swarm mode
─────────────────────────────────────────────────────────────────────────────
"""

import json
import math
import os
import random
import re
import sys
import traceback
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

import gradio as gr
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# ─────────────────────────────────────────────────────────────────────────────
# CONFIG: Model Lists
# ─────────────────────────────────────────────────────────────────────────────

HF_MODELS = [
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "Qwen/Qwen2.5-7B-Instruct",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "microsoft/Phi-3-mini-4k-instruct",
    "HuggingFaceH4/zephyr-7b-beta",
]

OLLAMA_MODELS = [
    "llama3.2",
    "deepseek-r1:8b",
    "qwen2.5:7b",
    "mistral",
    "codellama",
    "phi3",
]

# ─────────────────────────────────────────────────────────────────────────────
# CONFIG: WorldQuant Data Fields & Operators
# ─────────────────────────────────────────────────────────────────────────────

WQ_DATA_FIELDS = {
    # Price / Volume
    "open", "high", "low", "close", "volume", "vwap",
    "returns", "returns_open", "intraday_return", "overnight_return",
    "open_close_return", "high_low_range", "close_open_gap",
    "num_trades", "turnover", "turnover_ratio",
    "bid", "ask", "bid_size", "ask_size", "adv20", "adv60",
    # Fundamentals
    "market_cap", "pe_ratio", "pb_ratio", "ps_ratio",
    "ev_ebitda", "ev_sales", "debt_equity", "current_ratio",
    "roe", "roa", "roic", "gross_profit_margin",
    "ebitda", "operating_income", "net_income", "sales", "revenue",
    "total_assets", "total_debt", "cash", "book_value", "equity",
    "liabilities", "assets",
    "eps", "dps", "dividend_yield",
    "revenue_growth", "earnings_growth", "enterprise_value", "cap",
    "gross_income", "gross_income_reported_value",
    # Analyst / Estimates
    "est_eps", "est_revenue", "recommendation_mean",
    "num_analysts", "eps_surprise", "eps_surprise_pct",
    # Options
    "implied_volatility_call_180", "implied_volatility_put_180",
    "iv30", "iv60", "iv90", "put_call_ratio", "option_volume", "open_interest",
    # Alternative
    "realized_vol", "volatility", "skewness", "kurtosis",
}

WQ_OPERATORS = {
    # Cross-section
    "rank", "zscore", "scale", "normalize", "sign", "abs",
    "max", "min", "greater", "less", "if_else", "cond",
    "and", "or", "not",
    "group_neutralize", "group_rank", "group_zscore", "group_normalize",
    # Time-series
    "ts_mean", "ts_std_dev", "ts_variance", "ts_zscore", "ts_rank",
    "ts_min", "ts_max", "ts_delta", "ts_delay", "ts_return",
    "ts_corr", "ts_cov", "ts_sum", "ts_prod", "ts_skew", "ts_kurt",
    "ts_decay_linear", "ts_decay_exp", "ts_argmax", "ts_argmin",
    "ts_ir", "ts_backfill", "ts_sumif", "ts_count",
    # Special
    "trade_when",
}

NEUTRALIZATION_LEVELS = ["subindustry", "industry", "sector", "market", "none"]
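The whitelists above also enable a cheap pre-screen before any expression reaches the evaluator: extract identifiers from a candidate and reject anything outside the two sets. A minimal sketch (the helper name, regex, and subset whitelists below are illustrative, not part of the app):

```python
import re

# Illustrative subsets of the whitelists defined above.
FIELDS = {"close", "volume", "returns"}
OPERATORS = {"rank", "ts_mean", "group_neutralize"}

def uses_only_known_tokens(expr: str) -> bool:
    """Reject expressions that mention an identifier outside the whitelists."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr))
    allowed = FIELDS | OPERATORS | {"subindustry"}  # allow a group label too
    return tokens <= allowed

print(uses_only_known_tokens("group_neutralize(rank(ts_mean(returns, 5)), subindustry)"))  # True
print(uses_only_known_tokens("rank(unknown_field)"))  # False
```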

# ─────────────────────────────────────────────────────────────────────────────
# SYNTHETIC DATA GENERATOR (Embedded Anomalies)
# ─────────────────────────────────────────────────────────────────────────────

_DATA_CACHE = {}


def get_synthetic_data(n_stocks: int = 300, n_days: int = 252, seed: int = 2026):
    key = (n_stocks, n_days, seed)
    if key in _DATA_CACHE:
        return _DATA_CACHE[key]

    np.random.seed(seed)
    dates = pd.date_range("2020-01-02", periods=n_days, freq="B")
    stocks = [f"STK_{i:04d}" for i in range(n_stocks)]

    # Persistent characteristics
    liquidity_sens = np.random.beta(2, 5, n_stocks)
    value_score = -np.log(np.random.lognormal(0, 0.4, n_stocks))
    earn_vol = np.random.gamma(2, 0.03, n_stocks)

    # Market factor
    market_ret = np.random.normal(0.0003, 0.012, n_days)
    idio_vol = np.random.uniform(0.015, 0.035, n_stocks)
    beta = np.random.uniform(0.5, 1.5, n_stocks)

    returns = np.random.normal(0, idio_vol, (n_days, n_stocks))
    for t in range(n_days):
        returns[t] += beta * market_ret[t]

    # Embed anomalies
    market_cap = np.random.lognormal(22, 1.2, (n_days, n_stocks))
    market_cap = np.maximum(market_cap, 1e6)
    volume = np.exp(np.random.normal(15, 0.5, (n_days, n_stocks)))

    # ANOMALY 1: Amihud reversal
    for t in range(5, n_days - 1):
        amihud = np.abs(returns[t]) / (market_cap[t] * 1e-6 + 1000)
        amihud_rank = np.argsort(np.argsort(amihud)) / (n_stocks - 1)
        returns[t+1, amihud_rank > 0.80] -= 0.008 * liquidity_sens[amihud_rank > 0.80]
        returns[t+1, amihud_rank < 0.20] += 0.003 * (1 - liquidity_sens[amihud_rank < 0.20])

    # ANOMALY 2: PEAD
    eps_surprise = np.zeros((n_days, n_stocks))
    for s in range(n_stocks):
        earn_dates = np.random.choice(range(20, n_days - 10), size=3, replace=False)
        for ed in earn_dates:
            surprise = np.random.normal(0, earn_vol[s])
            eps_surprise[ed, s] = surprise
            drift = 0.5 * surprise / (earn_vol[s] + 0.001) * 0.004
            for d in range(1, 6):
                if ed + d < n_days:
                    returns[ed + d, s] += drift * (1 - 0.15 * d)

    # ANOMALY 3: Value premium
    for t in range(n_days):
        returns[t] += 0.00008 * value_score

    # ANOMALY 4: VWAP pressure reversal
    close = np.zeros((n_days, n_stocks))
    close[0] = 100.0
    for t in range(1, n_days):
        close[t] = close[t-1] * (1 + returns[t])

    vol_ma20 = pd.DataFrame(volume).rolling(20, min_periods=1).mean().values
    rel_vol = volume / (vol_ma20 + 1)
    vwap = close * (1 + 0.001 * (rel_vol - 1) * np.random.normal(0, 1, (n_days, n_stocks)))

    for t in range(1, n_days - 1):
        vwap_gap = np.abs(vwap[t] - close[t]) / close[t]
        pressure = vwap_gap * rel_vol[t]
        p_rank = np.argsort(np.argsort(pressure)) / (n_stocks - 1)
        returns[t+1, p_rank > 0.90] -= 0.006 * liquidity_sens[p_rank > 0.90]

    # Recalculate close with anomalies
    close = np.zeros((n_days, n_stocks))
    close[0] = 100.0
    for t in range(1, n_days):
        close[t] = close[t-1] * (1 + returns[t])

    high = close * (1 + np.abs(np.random.normal(0, 0.008, close.shape)))
    low = close * (1 - np.abs(np.random.normal(0, 0.008, close.shape)))
    open_p = close * (1 + np.random.normal(0, 0.003, close.shape))

    # Fundamentals
    operating_income = market_cap * np.random.lognormal(-3.0, 0.6, (n_days, n_stocks))
    ebitda = operating_income * np.random.lognormal(0.3, 0.15, (n_days, n_stocks))
    total_debt = market_cap * np.random.lognormal(-1.8, 0.9, (n_days, n_stocks))
    total_assets = market_cap * np.random.lognormal(0.1, 0.4, (n_days, n_stocks))
    cash = total_assets * np.random.uniform(0.03, 0.18, (n_days, n_stocks))
    equity = total_assets * np.random.uniform(0.35, 0.75, (n_days, n_stocks))
    liabilities = total_assets - equity
    enterprise_value = market_cap * np.random.uniform(1.0, 1.6, (n_days, n_stocks))
    sales = market_cap * np.random.lognormal(-1.4, 0.35, (n_days, n_stocks))
    eps = operating_income / (market_cap / 100) * np.random.uniform(0.3, 0.8, (n_days, n_stocks))
    est_eps = eps * (1 + np.random.normal(0, 0.1, (n_days, n_stocks)))
    eps_surprise_pct = eps_surprise / (np.abs(est_eps) + 0.01)
    num_analysts = np.random.poisson(8, (n_days, n_stocks)).astype(float)

    # Options
    iv_call = np.random.uniform(0.18, 0.48, (n_days, n_stocks))
    iv_put = iv_call + np.random.normal(0, 0.025, (n_days, n_stocks))
    put_call_ratio = np.random.lognormal(0, 0.35, (n_days, n_stocks))
    option_volume = volume * np.random.uniform(0.002, 0.04, (n_days, n_stocks))

    realized_vol = pd.DataFrame(returns).rolling(20, min_periods=1).std().values
    realized_vol = np.nan_to_num(realized_vol, nan=0.02)

    def mkdf(arr):
        return pd.DataFrame(arr, index=dates, columns=stocks)

    data = {
        "returns": mkdf(returns),
        "close": mkdf(close),
        "high": mkdf(high),
        "low": mkdf(low),
        "open": mkdf(open_p),
        "volume": mkdf(volume),
        "vwap": mkdf(vwap),
        "market_cap": mkdf(market_cap),
        "cap": mkdf(market_cap),
        "operating_income": mkdf(operating_income),
        "ebitda": mkdf(ebitda),
        "total_debt": mkdf(total_debt),
        "total_assets": mkdf(total_assets),
        "cash": mkdf(cash),
        "equity": mkdf(equity),
        "book_value": mkdf(equity),
        "liabilities": mkdf(liabilities),
        "assets": mkdf(total_assets),
        "enterprise_value": mkdf(enterprise_value),
        "sales": mkdf(sales),
        "revenue": mkdf(sales),
        "eps": mkdf(eps),
        "est_eps": mkdf(est_eps),
        "eps_surprise": mkdf(eps_surprise),
        "eps_surprise_pct": mkdf(eps_surprise_pct),
        "num_analysts": mkdf(num_analysts),
        "implied_volatility_call_180": mkdf(iv_call),
        "implied_volatility_put_180": mkdf(iv_put),
        "put_call_ratio": mkdf(put_call_ratio),
        "option_volume": mkdf(option_volume),
        "realized_vol": mkdf(realized_vol),
        "adv20": mkdf(pd.DataFrame(volume).rolling(20, min_periods=1).mean().values),
        "turnover": mkdf(volume / (market_cap + 1)),
        "turnover_ratio": mkdf(volume / (market_cap + 1)),
        "volatility": mkdf(realized_vol),
        "debt_equity": mkdf(total_debt / (equity + 1)),
        "current_ratio": mkdf(np.random.uniform(0.8, 2.5, (n_days, n_stocks))),
        "roe": mkdf(operating_income / (equity + 1)),
        "roa": mkdf(operating_income / (total_assets + 1)),
        "gross_profit_margin": mkdf(np.random.uniform(0.2, 0.6, (n_days, n_stocks))),
        "pe_ratio": mkdf(np.random.lognormal(2.5, 0.5, (n_days, n_stocks))),
        "pb_ratio": mkdf(close / (equity / (market_cap / 100) + 0.01)),
        "ev_ebitda": mkdf(enterprise_value / (ebitda + 1)),
        "net_income": mkdf(operating_income * np.random.uniform(0.5, 0.9, (n_days, n_stocks))),
        "dividend_yield": mkdf(np.random.uniform(0, 0.05, (n_days, n_stocks))),
        "earnings_growth": mkdf(np.random.normal(0.05, 0.15, (n_days, n_stocks))),
        "revenue_growth": mkdf(np.random.normal(0.05, 0.15, (n_days, n_stocks))),
        "gross_income": mkdf(operating_income * np.random.uniform(1.2, 1.5, (n_days, n_stocks))),
        "gross_income_reported_value": mkdf(operating_income * np.random.uniform(1.2, 1.5, (n_days, n_stocks))),
        "iv30": mkdf(np.random.uniform(0.18, 0.48, (n_days, n_stocks))),
        "iv60": mkdf(np.random.uniform(0.18, 0.48, (n_days, n_stocks))),
        "iv90": mkdf(np.random.uniform(0.18, 0.48, (n_days, n_stocks))),
        "open_interest": mkdf(option_volume * np.random.uniform(5, 20, (n_days, n_stocks))),
        "bid": mkdf(close * (1 - np.random.uniform(0, 0.001, (n_days, n_stocks)))),
        "ask": mkdf(close * (1 + np.random.uniform(0, 0.001, (n_days, n_stocks)))),
        "bid_size": mkdf(np.random.poisson(1000, (n_days, n_stocks))),
        "ask_size": mkdf(np.random.poisson(1000, (n_days, n_stocks))),
        "returns_open": mkdf(np.random.normal(0.0002, 0.02, (n_days, n_stocks))),
        "intraday_return": mkdf(returns - np.random.normal(0.0001, 0.01, (n_days, n_stocks))),
        "overnight_return": mkdf(np.random.normal(0.0001, 0.01, (n_days, n_stocks))),
        "high_low_range": mkdf((high - low) / close),
        "close_open_gap": mkdf((close - open_p) / open_p),
        "est_revenue": mkdf(sales * (1 + np.random.normal(0, 0.05, (n_days, n_stocks)))),
        "recommendation_mean": mkdf(np.random.uniform(1.5, 4.5, (n_days, n_stocks))),
        "roic": mkdf(operating_income / (total_assets + 1)),
        "ev_sales": mkdf(enterprise_value / (sales + 1)),
        "num_trades": mkdf(np.random.poisson(5000, (n_days, n_stocks))),
        "skewness": mkdf(pd.DataFrame(returns).rolling(20, min_periods=1).skew().values),
        "kurtosis": mkdf(pd.DataFrame(returns).rolling(20, min_periods=1).kurt().values),
    }

    fwd = data["returns"].shift(-1)
    result = (data, fwd)
    _DATA_CACHE[key] = result
    return result
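The `shift(-1)` at the end is what keeps the backtest causal: row t of `fwd` holds the return realized at t+1, so a signal computed on day t is always scored against the next day's return. A toy panel (made-up numbers, same row/column shape as the generator's output) makes the alignment concrete:

```python
import pandas as pd

# Rows are dates, columns are tickers, as in get_synthetic_data's output.
dates = pd.date_range("2020-01-02", periods=4, freq="B")
rets = pd.DataFrame(
    [[0.01, -0.02], [0.00, 0.03], [-0.01, 0.01], [0.02, 0.00]],
    index=dates, columns=["STK_0000", "STK_0001"],
)
fwd = rets.shift(-1)  # row t now holds the return realized at t+1

print(fwd.iloc[0, 0] == rets.iloc[1, 0])  # True: day-0 target is day-1 return
print(fwd.iloc[-1].isna().all())          # True: the last day has no target
```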


# ─────────────────────────────────────────────────────────────────────────────
# ALPHA EVALUATOR
# ─────────────────────────────────────────────────────────────────────────────

def evaluate_alpha(expr: str, data: dict, fwd: pd.DataFrame, min_days: int = 50):
    """Evaluate a WQ expression and return metrics."""
    ns = dict(data)
    ns["rank"] = lambda df: df.rank(axis=1, pct=True)
    ns["zscore"] = lambda df: (df - df.mean(axis=1).values[:, None]) / (df.std(axis=1).values[:, None] + 0.0001)
    ns["sign"] = np.sign
    ns["abs"] = np.abs
    ns["ts_mean"] = lambda df, w: df.rolling(window=int(w), min_periods=1).mean()
    ns["ts_std_dev"] = lambda df, w: df.rolling(window=int(w), min_periods=1).std()
    ns["ts_rank"] = lambda df, w: df.rolling(window=int(w), min_periods=1).apply(
        lambda x: np.argsort(np.argsort(x))[-1] / max(len(x) - 1, 1) if len(x) > 1 else 0.5, raw=True
    )
    ns["ts_min"] = lambda df, w: df.rolling(window=int(w), min_periods=1).min()
    ns["ts_max"] = lambda df, w: df.rolling(window=int(w), min_periods=1).max()
    ns["ts_delta"] = lambda df, w: df - df.shift(int(w))
    ns["ts_delay"] = lambda df, w: df.shift(int(w))
    ns["ts_return"] = lambda df, w: df / df.shift(int(w)) - 1
    ns["ts_sum"] = lambda df, w: df.rolling(window=int(w), min_periods=1).sum()
    ns["ts_backfill"] = lambda df, w: df.rolling(window=int(w), min_periods=1).apply(
        lambda x: pd.Series(x).ffill().iloc[-1], raw=True
    )
    ns["ts_decay_linear"] = lambda df, w: _ts_decay_fast(df, int(w))
    ns["group_neutralize"] = lambda df, _: df - df.mean(axis=1).values[:, None]
    ns["group_rank"] = lambda df, _: df.rank(axis=1, pct=True)
    ns["greater"] = lambda a, b: (a > b).astype(float)
    ns["less"] = lambda a, b: (a < b).astype(float)
    ns["if_else"] = lambda c, a, b: np.where(c, a, b)
    ns["and"] = lambda a, b: ((a > 0) & (b > 0)).astype(float)
    ns["or"] = lambda a, b: ((a > 0) | (b > 0)).astype(float)
    ns["not"] = lambda a: (a <= 0).astype(float)
    ns["max"] = np.maximum
    ns["min"] = np.minimum
    ns["trade_when"] = lambda c, a, b: np.where(c > 0, a, b)

    try:
        result = eval(expr, {"__builtins__": {}}, ns)
        if not isinstance(result, pd.DataFrame):
            return {"valid": False, "error": "Not a DataFrame"}
    except Exception as e:
        return {"valid": False, "error": str(e)[:200]}

    valid_idx = result.index[min_days::5]
    ic_vals = []
    rank_ic_vals = []

    for date in valid_idx:
        a = result.loc[date].dropna()
        f = fwd.loc[date].dropna()
        common = a.index.intersection(f.index)
        if len(common) < 30:
            continue
        a, f = a[common], f[common]
        if a.std() > 0 and f.std() > 0:
            ic_vals.append(np.corrcoef(a, f)[0, 1])
            if len(set(a)) > 1 and len(set(f)) > 1:
                r, _ = spearmanr(a, f)
                if not np.isnan(r):
                    rank_ic_vals.append(r)

    ic = np.nanmean(ic_vals) if ic_vals else 0
    rank_ic = np.nanmean(rank_ic_vals) if rank_ic_vals else 0
    ic_std = np.nanstd(ic_vals) if ic_vals else 0.001
    icir = ic / (ic_std + 0.0001)
    sharpe = min(icir * math.sqrt(252) / 3, 5.0)

    rnk = result.rank(axis=1)
    corr_vals = []
    for i in range(1, min(len(rnk), 100)):
        a1 = rnk.iloc[i-1].dropna()
        a2 = rnk.iloc[i].dropna()
        common = a1.index.intersection(a2.index)
        if len(common) > 20:
            c = np.corrcoef(a1[common], a2[common])[0, 1]
            if not np.isnan(c):
                corr_vals.append(c)

    avg_corr = np.mean(corr_vals) if corr_vals else 0.8
    turnover = max(0, (1 - avg_corr) * 100)
    max_dd = max(2.0, turnover * 0.15)

    return {
        "valid": True,
        "ic": round(ic, 4),
        "rank_ic": round(rank_ic, 4),
        "sharpe": round(sharpe, 3),
        "turnover": round(turnover, 1),
        "max_dd": round(max_dd, 2),
    }


def _ts_decay_fast(df, window):
    w = window
    weights = np.arange(1, w + 1)
    weights = weights / weights.sum()
    return df.rolling(window=w, min_periods=1).apply(
        lambda x: np.dot(x[-len(weights):], weights[-len(x):]), raw=True
    )
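`_ts_decay_fast` gives the most recent observation in each window the largest weight, w / (1 + 2 + ... + w). The same rolling-dot-product pattern on a toy series shows the weighting explicitly:

```python
import numpy as np
import pandas as pd

# Linear-decay weighting as in _ts_decay_fast: weights 1..w, normalized,
# applied so the newest point in each window carries the heaviest weight.
w = 3
weights = np.arange(1, w + 1) / np.arange(1, w + 1).sum()  # [1/6, 2/6, 3/6]
s = pd.Series([1.0, 2.0, 3.0, 4.0])
decayed = s.rolling(w, min_periods=1).apply(
    lambda x: np.dot(x[-len(weights):], weights[-len(x):]), raw=True
)

# Full window at index 2: 1*(1/6) + 2*(2/6) + 3*(3/6) = 14/6
print(abs(decayed.iloc[2] - 14 / 6) < 1e-12)  # True
```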


# ─────────────────────────────────────────────────────────────────────────────
# LLM PROMPT ENGINE
# ─────────────────────────────────────────────────────────────────────────────

def build_prompt(fields: List[str], operators: List[str], domain: str, existing_alphas: str, num_alphas: int) -> str:
    fields_str = ", ".join(fields)
    ops_str = ", ".join(operators)

    prompt = f"""You are a senior quantitative researcher at Renaissance Technologies. Your task is to generate {num_alphas} novel formulaic alphas for a WorldQuant BRAIN competition.

AVAILABLE DATA FIELDS:
{fields_str}

AVAILABLE OPERATORS:
{ops_str}

DOMAIN TO FOCUS ON: {domain}

EXISTING ALPHA LIBRARY (DO NOT REPLICATE):
{existing_alphas[:2000] if existing_alphas else "None – this is the first generation."}

REQUIREMENTS FOR EACH ALPHA:
1. Expression must be a SINGLE valid WorldQuant BRAIN expression (no comments, no semicolons as separators)
2. Use only the listed operators and data fields
3. All division must include a + 0.000001 guard to prevent division by zero
4. Must end with group_neutralize(score, subindustry) or group_neutralize(rank(score), subindustry)
5. Must be dimensionless (no units)
6. At least 2 distinct operations (not just rank(close))
7. Max 5 named parameters per expression
8. Should exploit cross-sectional predictability, not time-series momentum alone

OUTPUT FORMAT – Return ONLY a JSON array with exactly {num_alphas} objects. Each object must have:
{{
  "name": "short descriptive name",
  "description": "one-sentence economic rationale",
  "expression": "the full WQ expression as a single string",
  "domain": "which domain this belongs to",
  "neutralization": "subindustry"
}}

Do not include markdown code fences. Return raw JSON only."""
    return prompt


def call_hf_model(model_name: str, prompt: str, temperature: float = 0.7, max_tokens: int = 2048):
    try:
        from huggingface_hub import InferenceClient
        token = os.environ.get("HF_TOKEN", "")
        client = InferenceClient(token=token if token else None)

        response = client.chat_completion(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"ERROR: {str(e)}"


def call_ollama_model(model_name: str, prompt: str, temperature: float = 0.7):
    try:
        import ollama
        response = ollama.generate(
            model=model_name,
            prompt=prompt,
            format="json",
            options={"temperature": temperature, "num_predict": 2048},
        )
        return response["response"]
    except Exception as e:
        return f"ERROR: {str(e)}"


def parse_alpha_json(raw_text: str) -> List[Dict]:
    text = raw_text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]
    if text.endswith("```"):
        text = text.rsplit("\n", 1)[0]
    text = text.strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    match = re.search(r'\[.*\]', text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass

    if not text.endswith("]"):
        text = text.rsplit("}", 1)[0] + "}]"
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass

    return []
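`parse_alpha_json` is a recovery ladder: strip markdown fences, try a straight parse, then fall back to the outermost `[...]` span. The first two rungs, applied standalone to a typical fenced LLM reply (the reply text is made up for illustration):

```python
import json
import re

raw = '```json\n[{"name": "amihud_rev", "expression": "rank(close)"}]\n```'

# Rung 1: strip markdown fences, as parse_alpha_json does.
text = raw.strip()
if text.startswith("```"):
    text = text.split("\n", 1)[1]
if text.endswith("```"):
    text = text.rsplit("\n", 1)[0]

# Rung 2: straight parse, falling back to the outermost [...] span.
try:
    alphas = json.loads(text)
except json.JSONDecodeError:
    m = re.search(r"\[.*\]", text, re.DOTALL)
    alphas = json.loads(m.group()) if m else []

print(alphas[0]["name"])  # amihud_rev
```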


# ─────────────────────────────────────────────────────────────────────────────
# SWARM GENERATION LOGIC
# ─────────────────────────────────────────────────────────────────────────────

DOMAINS = [
    "Liquidity Shock Reversal (Amihud, volume acceleration, VWAP pressure)",
    "Post-Earnings Announcement Drift (eps_surprise, SUE, analyst revisions)",
    "Capital Structure / Distress Quality (debt coverage, interest coverage, cash ratios)",
    "Options Market Flow & Skew (put_call_ratio, IV term structure, option volume)",
    "Nonlinear Factor Interactions (multiplicative combinations of orthogonal signals)",
    "Cross-Sectional Dispersion / Beta Timing (idiosyncratic vol, comovement deviation)",
    "Seasonality & Calendar Effects (intra-month, day-of-week, turn-of-month)",
    "News Sentiment / Text Signals (earnings tone, headline sentiment)",
    "Short Interest / Borrow Cost (utilization, short interest changes)",
    "Institutional Flow (13F ownership changes)",
]


EXAMPLE_ALPHAS = [
    "group_neutralize(rank(ts_mean(abs(returns) / (close * volume + 0.000001), 5) / (ts_mean(abs(returns) / (close * volume + 0.000001), 63) + 0.000001)), subindustry)",
    "group_neutralize(rank(eps_surprise / (abs(est_eps) + 0.000001)), subindustry)",
    "group_neutralize(rank(operating_income / (total_debt + 0.000001)), subindustry)",
    "group_neutralize(rank(-put_call_ratio) * rank(iv30 - iv90), industry)",
    "group_neutralize(rank(zscore(ts_rank(operating_income / (cap + 0.000001), 252))) * rank(zscore(ts_rank(-returns, 20))), subindustry)",
]


def generate_alphas(
    backend: str,
    model_name: str,
    fields: List[str],
    operators: List[str],
    domain: str,
    num_alphas: int,
    temperature: float,
    existing_alphas_text: str,
    progress=gr.Progress(),
):
    progress(0.1, desc="Building prompt...")
    prompt = build_prompt(fields, operators, domain, existing_alphas_text, num_alphas)

    progress(0.2, desc=f"Calling {backend} model: {model_name}...")
    if backend == "Hugging Face":
        raw_response = call_hf_model(model_name, prompt, temperature)
    else:
        raw_response = call_ollama_model(model_name, prompt, temperature)

    if raw_response.startswith("ERROR:"):
        return [], f"❌ {raw_response}", ""

    progress(0.5, desc="Parsing response...")
    alphas = parse_alpha_json(raw_response)
    if not alphas:
        return [], f"❌ Could not parse LLM response. Raw output:\n\n{raw_response[:1000]}", ""

    progress(0.6, desc="Preparing evaluation data...")
    data, fwd = get_synthetic_data()

    results = []
    progress_steps = len(alphas)
    for i, alpha in enumerate(alphas):
        progress(0.6 + 0.35 * (i / progress_steps), desc=f"Evaluating alpha {i+1}/{len(alphas)}...")
        expr = alpha.get("expression", "")
        if not expr:
            continue
        score = evaluate_alpha(expr, data, fwd)
        alpha.update(score)
        alpha["composite"] = (
            0.35 * score.get("sharpe", 0) +
            0.25 * score.get("ic", 0) * 10 +
            0.20 * score.get("rank_ic", 0) * 10 -
            0.10 * (score.get("turnover", 0) / 100) -
            0.10 * (score.get("max_dd", 0) / 100)
        ) if score.get("valid") else -999
        results.append(alpha)

    progress(1.0, desc="Done!")
    results.sort(key=lambda x: x.get("composite", -999), reverse=True)

    report_lines = ["# Generated Alpha Report\n"]
    for i, r in enumerate(results, 1):
        status = "✅ VALID" if r.get("valid") else "❌ INVALID"
        report_lines.append(f"\n## Alpha {i}: {r.get('name', 'Unnamed')} {status}")
        report_lines.append(f"**Domain:** {r.get('domain', 'Unknown')}")
        report_lines.append(f"**Description:** {r.get('description', 'N/A')}")
        report_lines.append(f"```\n{r.get('expression', 'N/A')}\n```")
        if r.get("valid"):
            report_lines.append("| Metric | Value |")
            report_lines.append("|--------|-------|")
            report_lines.append(f"| Sharpe | {r.get('sharpe', 'N/A')} |")
            report_lines.append(f"| IC | {r.get('ic', 'N/A')} |")
            report_lines.append(f"| Rank IC | {r.get('rank_ic', 'N/A')} |")
            report_lines.append(f"| Turnover | {r.get('turnover', 'N/A')}% |")
            report_lines.append(f"| Max DD | {r.get('max_dd', 'N/A')}% |")
            report_lines.append(f"| Composite | {round(r.get('composite', 0), 3)} |")
        else:
            report_lines.append(f"**Error:** {r.get('error', 'Unknown')}")

    return results, "\n".join(report_lines), raw_response
|
| 597 |
+
|
| 598 |
+
|
| 599 |
+
# ─────────────────────────────────────────────────────────────────────────────
# GRADIO UI
# ─────────────────────────────────────────────────────────────────────────────

with gr.Blocks(title="WorldQuant Alpha Swarm™", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
# MicroFish Swarm™ – WorldQuant Alpha Discovery
### LLM-Powered Formulaic Alpha Generation with Real-Time Backtesting
""")

    with gr.Tab("Generate Alphas"):
        with gr.Row():
            with gr.Column(scale=1):
                backend = gr.Dropdown(
                    choices=["Hugging Face", "Ollama"],
                    value="Hugging Face",
                    label="Backend",
                )
                model_dropdown = gr.Dropdown(
                    choices=HF_MODELS,
                    value=HF_MODELS[0],
                    label="Model",
                )
                temperature = gr.Slider(
                    minimum=0.1,
                    maximum=1.5,
                    value=0.7,
                    step=0.1,
                    label="Temperature",
                )
                num_alphas = gr.Slider(
                    minimum=1,
                    maximum=10,
                    value=3,
                    step=1,
                    label="Number of Alphas to Generate",
                )
                domain_focus = gr.Dropdown(
                    choices=DOMAINS,
                    value=DOMAINS[0],
                    label="Domain Focus",
                )

            with gr.Column(scale=2):
                fields_select = gr.Dropdown(
                    choices=sorted(WQ_DATA_FIELDS),
                    value=sorted(["close", "volume", "returns", "vwap", "market_cap", "operating_income", "ebitda", "eps_surprise", "put_call_ratio", "iv30", "iv90", "total_debt"]),
                    multiselect=True,
                    label="Available Data Fields",
                )
                operators_select = gr.Dropdown(
                    choices=sorted(WQ_OPERATORS),
                    value=sorted(["rank", "zscore", "ts_mean", "ts_std_dev", "ts_rank", "ts_decay_linear", "group_neutralize", "abs", "sign", "greater", "if_else", "trade_when"]),
                    multiselect=True,
                    label="Available Operators",
                )
                existing_alphas = gr.Textbox(
                    label="Existing Alpha Library (paste expressions to avoid redundancy)",
                    lines=4,
                    value="\n".join(EXAMPLE_ALPHAS),
                )

        def update_models(backend_choice):
            # Reset the value as well, so a stale selection from the other
            # backend never lingers in the dropdown.
            models = HF_MODELS if backend_choice == "Hugging Face" else OLLAMA_MODELS
            return gr.Dropdown(choices=models, value=models[0])

        backend.change(update_models, inputs=backend, outputs=model_dropdown)

        generate_btn = gr.Button("Generate & Evaluate Alphas", variant="primary", size="lg")

        with gr.Row():
            with gr.Column(scale=1):
                results_json = gr.JSON(label="Structured Results", visible=True)
            with gr.Column(scale=2):
                report_md = gr.Markdown(label="Evaluation Report")

        with gr.Row():
            raw_output = gr.Textbox(label="Raw LLM Response (for debugging)", lines=6)

        generate_btn.click(
            fn=generate_alphas,
            inputs=[backend, model_dropdown, fields_select, operators_select, domain_focus, num_alphas, temperature, existing_alphas],
            outputs=[results_json, report_md, raw_output],
        )

    with gr.Tab("Evaluate Custom Expression"):
        with gr.Row():
            with gr.Column(scale=2):
                custom_expr = gr.Textbox(
                    label="WorldQuant BRAIN Expression",
                    lines=4,
                    value="group_neutralize(rank(ts_decay_linear(rank(abs(returns) / (close * volume + 0.000001)), 3)), subindustry)",
                )
                eval_btn = gr.Button("Evaluate", variant="primary")
            with gr.Column(scale=1):
                eval_result = gr.JSON(label="Metrics")

        def evaluate_custom(expr):
            data, fwd = get_synthetic_data()
            return evaluate_alpha(expr, data, fwd)

        eval_btn.click(fn=evaluate_custom, inputs=custom_expr, outputs=eval_result)

    with gr.Tab("Reference"):
        gr.Markdown("""
## WorldQuant BRAIN Operator Reference

### Cross-Section Operators
| Operator | Description |
|----------|-------------|
| `rank(x)` | Percentile rank (0-1) across stocks |
| `zscore(x)` | Demean and scale to std=1 |
| `scale(x)` | Normalize to unit sum |
| `sign(x)` | Sign function |
| `abs(x)` | Absolute value |
| `max(x,y)` / `min(x,y)` | Element-wise max/min |
| `greater(x,y)` | 1 if x>y else 0 |
| `less(x,y)` | 1 if x<y else 0 |
| `if_else(c,x,y)` | x if c else y |
| `and(x,y)` / `or(x,y)` / `not(x)` | Boolean logic |
| `group_neutralize(x, level)` | Demean within group |
| `group_rank(x, level)` | Rank within group |

### Time-Series Operators
| Operator | Description |
|----------|-------------|
| `ts_mean(x, d)` | d-day rolling mean |
| `ts_std_dev(x, d)` | d-day rolling std |
| `ts_rank(x, d)` | Rolling rank within history |
| `ts_min(x, d)` / `ts_max(x, d)` | Rolling min/max |
| `ts_delta(x, d)` | x[t] - x[t-d] |
| `ts_delay(x, d)` | x[t-d] |
| `ts_return(x, d)` | x[t]/x[t-d] - 1 |
| `ts_corr(x, y, d)` | Rolling correlation |
| `ts_sum(x, d)` | Rolling sum |
| `ts_decay_linear(x, d)` | Linear decay-weighted average |
| `ts_decay_exp(x, d)` | Exponential decay-weighted average |
| `ts_backfill(x, d)` | Forward fill within window |
| `trade_when(cond, x, y)` | x if cond else y |

### Key Data Fields
| Category | Fields |
|----------|--------|
| Price/Volume | `open`, `high`, `low`, `close`, `volume`, `vwap`, `returns`, `adv20`, `adv60` |
| Fundamentals | `market_cap`, `operating_income`, `ebitda`, `total_debt`, `total_assets`, `cash`, `equity`, `enterprise_value`, `sales`, `revenue`, `eps` |
| Analyst | `est_eps`, `eps_surprise`, `eps_surprise_pct`, `num_analysts`, `recommendation_mean` |
| Options | `implied_volatility_call_180`, `implied_volatility_put_180`, `iv30`, `iv60`, `iv90`, `put_call_ratio`, `option_volume` |
| Alternative | `realized_vol`, `volatility`, `skewness`, `kurtosis` |

## Tips for Strong Alphas
1. **Dimensionless** – rank or zscore before combining different metrics
2. **Guard divisions** – always add `+ 0.000001` to denominators
3. **Neutralize** – end with `group_neutralize(..., subindustry)`
4. **Decay smooth** – use `ts_decay_linear(expr, 3-10)` for noisy signals
5. **Multiplicative intersections** – `rank(a) * rank(b)` beats `a + b` for orthogonal signals
6. **Cross-sectional** – the signal must differentiate stocks, not predict time
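
Putting the tips together, a sketch of a well-formed expression (illustrative only, assembled from the fields above; not a vetted signal):

```
group_neutralize(rank(ts_decay_linear(rank(eps_surprise / (abs(est_eps) + 0.000001)), 5)) * rank(zscore(ts_rank(-returns, 20))), subindustry)
```

Each leg is ranked (dimensionless), the division is guarded, the product intersects two roughly orthogonal signals, the decay smooths noise, and the final `group_neutralize` demeans within subindustry.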
""")

    with gr.Tab("Settings"):
        gr.Markdown("""
### Hugging Face Setup
Set your HF token as an environment variable:
```bash
export HF_TOKEN=your_token_here
```
Or pass it when launching:
```bash
HF_TOKEN=xxx python app.py
```
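
Inside the app, the token is picked up at startup; a minimal sketch of the lookup (assuming the standard `os.environ` pattern, not this app's exact code):

```python
import os

# Returns None when HF_TOKEN is unset, so callers can fall back or raise.
hf_token = os.environ.get("HF_TOKEN")
```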

### Ollama Setup
1. Install Ollama: https://ollama.com
2. Pull a model: `ollama pull deepseek-r1:8b`
3. Ensure Ollama is running locally (default: http://localhost:11434)

### Deployment to Hugging Face Spaces
Create a Space with the Gradio SDK and push `app.py` plus a `requirements.txt` containing:
```
gradio>=4.0
numpy
pandas
scipy
huggingface_hub
ollama
```
""")

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860, share=True)