gaurv007 committed on
Commit
ba3c7ad
·
verified ·
1 Parent(s): 3dd4a9b

fix: honest README with proper metadata"

Files changed (1)
  1. README.md +26 -74
README.md CHANGED
@@ -1,8 +1,16 @@
  # Alpha Factory

  **LLM-assisted alpha expression generator for WorldQuant BRAIN.**

- > ⚠️ This is a **prototype tool**, not a production system. It generates candidate expressions for manual review and BRAIN submission.

  ## What It Actually Does

@@ -12,16 +20,12 @@
  4. **Lints the expression** (validates 71 operators, checks arity, parentheses, look-ahead, coverage)
  5. **Stores in DuckDB** for review

- That's it. Steps 1-5 work. Everything below is scaffolding for future development.
-
  ## What Does NOT Work Yet

- - ❌ BRAIN API submission (no client connected)
- - ❌ Crowd Scout / Performance Surgeon / Gatekeeper personas (imported but never called)
- - ❌ RAG over arXiv papers (stub only)
- - ❌ Local BRAIN simulator (exists but not wired into pipeline)
- - ❌ Feedback loop / evolutionary improvement
- - ❌ Automatic iteration on near-misses

  ## Quickstart

@@ -36,90 +40,38 @@ Create `.env`:
  HF_TOKEN=hf_your_token_here
  ```

- ### Option 1: Proven Templates (RECOMMENDED - no LLM, guaranteed valid)

  ```bash
- uv run python alpha_factory/generate_proven.py
  ```

- This uses your proven Alpha 15/6 structures with novel AC=0 fields. **Every expression is syntactically valid and ready to paste into BRAIN.**

- ### Option 2: LLM-Assisted Generation

  ```bash
  uv run python -m alpha_factory.run --dry-run --batch-size 5
  ```

- Uses HuggingFace Inference API (Qwen 7B) to generate novel hypotheses. Quality varies - always lint-check before submitting to BRAIN.
-
- ### Option 3: Gradio UI

  ```bash
  uv run python -m alpha_factory.ui
  ```

- View generated alphas with timestamps, copy expressions, generate new batches from browser.
-
- ## Architecture
-
- ```
- alpha_factory/
- ├── data/ # BRAIN field registry (3,447 candidates), operators, groups
- │   ├── brain_fields.py # 35 highest-EV fields with AC, coverage, sign metadata
- │   ├── brain_groups.py # 15 novel neutralization keys (AC 3-20)
- │   └── __init__.py
- ├── deterministic/ # No LLM required
- │   ├── lint.py # 71-operator validation + arity checks
- │   ├── theme_sampler.py # Gap analysis across 12 themes
- │   ├── proven_templates.py # Alpha 15/6 structure with field swaps
- │   ├── expression_mutator.py # 5 mutation operators for iteration
- │   └── fitness.py # Composite fitness scoring
- ├── personas/ # LLM-powered agents
- │   ├── hypothesis_hunter.py # Generates blueprints (ACTIVE)
- │   ├── expression_compiler.py # Blueprint → BRAIN expression (ACTIVE)
- │   ├── crowd_scout.py # Novelty check (NOT WIRED)
- │   ├── performance_surgeon.py # Failure diagnosis (NOT WIRED)
- │   └── gatekeeper.py # Final go/no-go (NOT WIRED)
- ├── infra/ # Infrastructure
- │   ├── llm_client.py # Unified Ollama/HF client
- │   ├── factor_store.py # DuckDB persistence
- │   ├── model_manager.py # Auto-discovers available models
- │   ├── winner_memory.py # Feedback loop storage (NOT WIRED)
- │   └── wq_client.py # BRAIN API wrapper (NOT CONNECTED)
- ├── orchestration/
- │   └── pipeline.py # Main pipeline (steps 1-5 only)
- ├── run.py # CLI entry point
- ├── ui.py # Gradio dashboard
- └── generate_proven.py # Standalone proven template generator
- ```
-
- ## Field Strategy
-
- The pipeline prioritizes fields by expected value:
-
- | Tier | Dataset | Density (α/field) | Strategy |
- |------|---------|-------------------|----------|
- | 1 | model77 | 24 | Primary target - 5 fields with AC=0 globally |
- | 2 | model16, news12 | 192-385 | Secondary - score derivatives |
- | 3 | analyst4, option9, pv13 | 656-822 | Tertiary - supply chain, PCR |
- | 4 | pv1, socialmedia | 2500-64350 | Avoid - over-mined |
-
  ## BRAIN Submission Settings

- When pasting expressions into BRAIN manually:
- - Region: USA
- - Universe: TOP3000
- - Delay: 1
- - Decay: 5
- - Truncation: 0.08
- - Pasteurization: ON
- - NaN Handling: OFF

- ## Requirements

- - Python 3.11+
- - HuggingFace token (free tier works for Qwen 7B)
- - Optional: Ollama for local inference

  ## License

+ ---
+ license: mit
+ tags:
+ - quantitative-finance
+ - alpha-generation
+ - worldquant-brain
+ ---
+
  # Alpha Factory

  **LLM-assisted alpha expression generator for WorldQuant BRAIN.**

+ > This is a Python application, not a model. It generates candidate BRAIN expressions for manual review and submission.

  ## What It Actually Does

  4. **Lints the expression** (validates 71 operators, checks arity, parentheses, look-ahead, coverage)
  5. **Stores in DuckDB** for review

  ## What Does NOT Work Yet

+ - BRAIN API submission (no client connected - manual paste required)
+ - Crowd Scout / Performance Surgeon / Gatekeeper personas (stub only)
+ - RAG over arXiv papers (stub only)
+ - Feedback loop / evolutionary improvement

  ## Quickstart

  HF_TOKEN=hf_your_token_here
  ```

+ ### Proven Templates (RECOMMENDED - no LLM, guaranteed valid)

  ```bash
+ uv run python -m alpha_factory.run --proven --batch-size 10
  ```

+ Uses proven Alpha 15/6 structures with novel AC=0 fields. Every expression is syntactically valid.

+ ### LLM-Assisted Generation

  ```bash
  uv run python -m alpha_factory.run --dry-run --batch-size 5
  ```

+ ### Gradio UI

  ```bash
  uv run python -m alpha_factory.ui
  ```

63
  ## BRAIN Submission Settings
64
 
65
+ When pasting expressions into BRAIN:
66
+ - Region: USA | Universe: TOP3000 | Delay: 1 | Decay: 5 | Truncation: 0.08
 
 
 
 
 
 
67
 
68
+ ## Field Strategy
69
 
70
+ | Tier | Dataset | Density | Notes |
71
+ |------|---------|---------|-------|
72
+ | 1 | model77 | 24 Ξ±/field | 5 fields with AC=0 globally |
73
+ | 2 | model16, news12 | 192-385 | Score derivatives |
74
+ | 3 | analyst4, option9, pv13 | 656-822 | Supply chain, PCR |
75
 
76
  ## License
77