Add README with setup and architecture docs
README.md
CHANGED
(Removed: the previous placeholder README, consisting of YAML frontmatter with an `ml-intern` tag and a "## Generated by ML Intern" heading.)
# Alpha Factory: Open-Source LLM-Driven Pipeline for WorldQuant BRAIN

Autonomous alpha-generation system using multi-LLM agents with 7-layer acceptance engineering.

## Quick Start

```bash
git clone https://huggingface.co/gaurv007/alpha-factory
cd alpha-factory
pip install -e .

# Start an LLM server (pick one)
ollama serve  # then: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
# OR: vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# Dry run (no BRAIN credits spent)
python -m alpha_factory.run --dry-run --batch-size 5

# Live run (requires BRAIN API credentials)
python -m alpha_factory.run --batch-size 10
```
## Architecture

```
Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
  → Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
  → Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
```
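The stage chain above can be sketched as a simple rejecting pipeline: a candidate flows stage to stage, and any stage may veto it. This is an illustrative sketch, not the repo's actual `orchestration/pipeline.py` API; `Candidate`, `run_pipeline`, and the toy stages are all invented names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A candidate alpha flowing through the pipeline; fields are illustrative.
@dataclass
class Candidate:
    theme: str
    expression: str = ""
    notes: list = field(default_factory=list)

# Each stage returns the (possibly modified) candidate, or None to reject it.
Stage = Callable[[Candidate], Optional[Candidate]]

def run_pipeline(candidate: Candidate,
                 stages: List[Tuple[str, Stage]]):
    """Pass a candidate through each stage; stop at the first rejection."""
    for name, stage in stages:
        candidate = stage(candidate)
        if candidate is None:
            return None, name  # rejected at this stage
    return candidate, None

# Toy stages standing in for the real personas.
def compile_expression(c: Candidate) -> Candidate:
    c.expression = f"rank(-{c.theme})"
    return c

def static_lint(c: Candidate) -> Optional[Candidate]:
    # Reject anything that smells of look-ahead bias.
    return c if "future" not in c.expression else None

result, rejected_at = run_pipeline(
    Candidate(theme="ts_delta(close, 5)"),
    [("compiler", compile_expression), ("lint", static_lint)],
)
```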
## 6 LLM Personas

| # | Persona | Model | Job |
|---|---------|-------|-----|
| 1 | Hypothesis Hunter | 1.5B (Microfish) | Generate novel factor blueprints |
| 2 | Expression Compiler | 3B (Tinyfish) / Jinja | Convert blueprints to BRAIN expressions |
| 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
| 4 | Crowd Scout | 7B (Mediumfish) | Novelty + correlation check |
| 5 | Performance Surgeon | 7B (Mediumfish) | Diagnose failures, suggest fixes |
| 6 | Production Gatekeeper | 72B (Bigfish) | Final go/no-go memo |
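Persona 3, the deterministic Look-Ahead Sniffer, amounts to static pattern checks for future leakage. A minimal sketch, assuming leakage shows up as negative lags or explicit `lead`/`future` tokens; the actual rules in `lint.py` are not shown in this README, so these patterns are assumptions:

```python
import re

# Patterns that typically indicate look-ahead bias in a time-series
# expression. Illustrative only; the real lint rules may differ.
LEAKY_PATTERNS = [
    r"\blead\s*\(",                   # lead() reads future values
    r"\bts_delay\s*\([^,]+,\s*-\d+",  # negative delay = future data
    r"future",                        # any operator/field named 'future'
]

def has_lookahead(expression: str) -> bool:
    """Return True if the expression matches any known leakage pattern."""
    return any(re.search(p, expression) for p in LEAKY_PATTERNS)
```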
## Key Features

- Zero recurring cost: all LLMs run locally
- Schema-constrained generation: no hallucinated operators
- 7-layer acceptance engineering: saves 60%+ in BRAIN credits
- Deterministic kill switches: circuit breakers for runaway pipelines
- Factor store: DuckDB persistence for all alpha history
- Dead-theme registry: avoids re-exploring failed themes
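Schema-constrained generation means the LLM's JSON output is decoded against a typed contract, so it cannot introduce operators or fields outside the schema. The repo uses Pydantic schemas with guided JSON decoding; the sketch below shows the same idea with only the standard library, and the `Blueprint` field names are assumptions, not the repo's actual contract:

```python
import json
from dataclasses import dataclass, fields

# Illustrative blueprint schema; the real Pydantic models live in
# schemas/ and their field names are not shown in this README.
@dataclass
class Blueprint:
    theme: str
    expression: str
    universe: str

def parse_blueprint(raw: str) -> Blueprint:
    """Parse LLM output into a Blueprint, rejecting missing/unknown keys.

    Guided JSON constrains generation server-side; this check is the
    client-side safety net for the same contract.
    """
    data = json.loads(raw)
    allowed = {f.name for f in fields(Blueprint)}
    if set(data) != allowed:
        raise ValueError(f"schema mismatch: got {sorted(data)}")
    return Blueprint(**data)

bp = parse_blueprint(
    '{"theme": "reversal", "expression": "rank(-returns)", "universe": "TOP3000"}'
)
```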
## File Structure

```
alpha_factory/
├── config.py                    # All settings (Pydantic)
├── run.py                       # Entry point
├── schemas/                     # Typed contracts (Blueprint, Expression, Verdict)
├── deterministic/
│   ├── lint.py                  # Static pre-flight checks (Layer 2)
│   ├── theme_sampler.py         # Gap analysis (Layer 1)
│   └── fitness.py               # Composite scoring function
├── infra/
│   ├── llm_client.py            # vLLM/Ollama client with guided JSON
│   ├── factor_store.py          # DuckDB persistence
│   └── wq_client.py             # BRAIN API wrapper
├── personas/
│   ├── hypothesis_hunter.py     # Persona 1 (generation)
│   ├── expression_compiler.py   # Persona 2 (compilation)
│   ├── crowd_scout.py           # Persona 4 (novelty)
│   ├── performance_surgeon.py   # Persona 5 (diagnosis)
│   └── gatekeeper.py            # Persona 6 (final gate)
└── orchestration/
    └── pipeline.py              # Full DAG
```
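The composite scoring in `deterministic/fitness.py` is not documented in this README. As one plausible shape, a score could reward Sharpe and BRAIN fitness while penalizing excess turnover; every weight and threshold below is an illustrative assumption, not the repo's actual function:

```python
def composite_fitness(sharpe: float, turnover: float, fitness: float,
                      max_turnover: float = 0.7) -> float:
    """Combine BRAIN-style simulation metrics into one ranking score.

    Weights and the turnover cap are illustrative assumptions only.
    """
    turnover_penalty = max(0.0, turnover - max_turnover)  # penalize only the excess
    return 0.6 * sharpe + 0.4 * fitness - 2.0 * turnover_penalty

# Rank a batch of simulated alphas by the composite score.
results = [
    {"id": "a1", "sharpe": 1.8, "turnover": 0.3, "fitness": 1.2},
    {"id": "a2", "sharpe": 2.5, "turnover": 0.9, "fitness": 1.0},
]
ranked = sorted(
    results,
    key=lambda r: composite_fitness(r["sharpe"], r["turnover"], r["fitness"]),
    reverse=True,
)
```

Here `a1` outranks `a2` despite the lower Sharpe, because `a2`'s 0.9 turnover overshoots the cap and eats the advantage.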
## Setup Your Environment

1. Install Ollama: https://ollama.ai
2. Pull the models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
3. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
4. Seed the factor store with your 18 existing alphas
5. Run: `python -m alpha_factory.run --dry-run`
## Cost

| Item | Cost |
|------|------|
| Local GPU (RTX 3090/4090) | $0 (already owned) |
| BRAIN account | $0 (existing) |
| Monthly running cost | $0 |