gaurv007 committed on
Commit 86157cd · verified · 1 Parent(s): cd6d52f

Update README: use uv instead of pip

Files changed (1)
  1. README.md +72 -56
README.md CHANGED
@@ -1,7 +1,3 @@
- ---
- tags:
- - ml-intern
- ---
  # Alpha Factory — Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
 
  Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
@@ -9,19 +5,41 @@ Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
  ## Quick Start
 
  ```bash
  git clone https://huggingface.co/gaurv007/alpha-factory
  cd alpha-factory
- pip install -e .
 
- # Start LLM server (pick one)
- ollama serve # then: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
- # OR: vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
 
  # Dry run (no BRAIN credits spent)
- python -m alpha_factory.run --dry-run --batch-size 5
 
- # Live run (requires BRAIN API credentials)
- python -m alpha_factory.run --batch-size 10
  ```
 
  ## Architecture
@@ -34,23 +52,33 @@ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/T
  ## 6 LLM Personas
 
- | # | Persona | Model | Job |
- |---|---------|-------|-----|
- | 1 | Hypothesis Hunter | 1.5B (Microfish) | Generate novel factor blueprints |
- | 2 | Expression Compiler | 3B (Tinyfish) / Jinja | Convert blueprint to BRAIN expression |
  | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
- | 4 | Crowd Scout | 7B (Mediumfish) | Novelty + correlation check |
- | 5 | Performance Surgeon | 7B (Mediumfish) | Diagnose failures, suggest fixes |
- | 6 | Production Gatekeeper | 72B (Bigfish) | Final go/no-go memo |
 
  ## Key Features
 
- - Zero recurring cost — all LLMs run locally
  - Schema-constrained generation — no hallucinated operators
  - 7-layer acceptance engineering — saves 60%+ BRAIN credits
  - Deterministic kill switches — circuit breakers for runaway pipelines
  - Factor store — DuckDB persistence for all alpha history
  - Dead theme registry — avoids re-exploring failed themes
 
  ## File Structure
 
@@ -58,32 +86,39 @@ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/T
  alpha_factory/
  ├── config.py # All settings (Pydantic)
  ├── run.py # Entry point
- ├── schemas/ # Typed contracts (Blueprint, Expression, Verdict)
  ├── deterministic/
- │ ├── lint.py # Static pre-flight checks (Layer 2)
  │ ├── theme_sampler.py # Gap analysis (Layer 1)
- │ └── fitness.py # Composite scoring function
  ├── infra/
- │ ├── llm_client.py # vLLM/Ollama with guided JSON
  │ ├── factor_store.py # DuckDB persistence
- │ └── wq_client.py # BRAIN API wrapper
  ├── personas/
- │ ├── hypothesis_hunter.py # Persona 1 (generation)
- │ ├── expression_compiler.py # Persona 2 (compilation)
- │ ├── crowd_scout.py # Persona 4 (novelty)
- │ ├── performance_surgeon.py # Persona 5 (diagnosis)
- │ └── gatekeeper.py # Persona 6 (final gate)
  └── orchestration/
      └── pipeline.py # Full DAG
  ```
 
- ## Setup Your Environment
 
- 1. Install Ollama: https://ollama.ai
- 2. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
- 3. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
- 4. Seed factor store with your 18 existing alphas
- 5. Run: `python -m alpha_factory.run --dry-run`
 
  ## Cost
 
@@ -91,24 +126,5 @@ alpha_factory/
  |------|------|
  | Local GPU (RTX 3090/4090) | $0 (already owned) |
  | BRAIN account | $0 (existing) |
- | Monthly running cost | $0 |
-
- <!-- ml-intern-provenance -->
- ## Generated by ML Intern
-
- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
-
- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern
-
- ## Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "gaurv007/alpha-factory"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
- ```
-
- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
 
 
 
 
 
  # Alpha Factory — Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
 
  Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
 
  ## Quick Start
 
  ```bash
+ # Install uv (if not already installed)
+ # Windows:
+ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+ # macOS/Linux:
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+
+ # Clone
  git clone https://huggingface.co/gaurv007/alpha-factory
  cd alpha-factory
 
+ # Install (uv handles everything — venv, deps, lockfile)
+ uv sync
+
+ # With optional RAG support
+ uv sync --extra rag
+
+ # With all optional deps
+ uv sync --extra all
+
+ # Start Ollama (local LLM server)
+ ollama pull qwen2.5:1.5b
+ ollama pull qwen2.5:7b
+ ollama serve
 
  # Dry run (no BRAIN credits spent)
+ uv run python -m alpha_factory.run --dry-run --batch-size 5
+
+ # Interactive model selection
+ uv run python -m alpha_factory.run --interactive --dry-run
 
+ # With HuggingFace cloud models
+ uv run python -m alpha_factory.run --hf-token hf_your_token --batch-size 10
+
+ # Run tests
+ uv run pytest tests/ -v
  ```
 
  ## Architecture
 
  ## 6 LLM Personas
 
+ | # | Persona | Model Tier | Job |
+ |---|---------|------------|-----|
+ | 1 | Hypothesis Hunter | Microfish (1.5B) | Generate novel factor blueprints |
+ | 2 | Expression Compiler | Tinyfish (3B) / Jinja | Convert blueprint to BRAIN expression |
  | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
+ | 4 | Crowd Scout | Mediumfish (7B) | Novelty + correlation check |
+ | 5 | Performance Surgeon | Mediumfish (7B) | Diagnose failures, suggest fixes |
+ | 6 | Production Gatekeeper | Bigfish (14-72B) | Final go/no-go memo |
+
+ ## Model Support
+
+ Automatically detects and uses:
+ - **Ollama (local)** — auto-detected at localhost:11434
+ - **HuggingFace Inference API (cloud)** — set HF_TOKEN env var
+ - **vLLM (local/remote)** — any OpenAI-compatible endpoint
+
+ Use the `--interactive` flag to manually pick models for each tier from a dropdown.
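The backend-selection behavior described above can be sketched as a small pure function. This is an illustration only: the function name, argument names, and the priority order (local Ollama first, then an explicit vLLM endpoint, then the HF Inference API) are assumptions, not the repo's actual `model_manager` logic.

```python
import os

# Hypothetical sketch of backend auto-detection; names and priority
# order are assumptions for illustration, not the repo's real API.
def choose_backend(ollama_reachable: bool,
                   hf_token: "str | None",
                   vllm_url: "str | None") -> str:
    """Pick the first available LLM backend."""
    if ollama_reachable:
        return "ollama"       # auto-detected at localhost:11434
    if vllm_url:
        return "vllm"         # any OpenAI-compatible endpoint
    if hf_token:
        return "huggingface"  # cloud fallback via HF_TOKEN
    raise RuntimeError("No LLM backend available")

print(choose_backend(True, None, None))   # local server wins when reachable
print(choose_backend(False, os.environ.get("HF_TOKEN", "hf_demo"), None))
```

A real implementation would probe `localhost:11434` with a short-timeout HTTP request rather than take a boolean.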
 
  ## Key Features
 
+ - Zero recurring cost — all LLMs run locally via Ollama
  - Schema-constrained generation — no hallucinated operators
  - 7-layer acceptance engineering — saves 60%+ BRAIN credits
  - Deterministic kill switches — circuit breakers for runaway pipelines
  - Factor store — DuckDB persistence for all alpha history
  - Dead theme registry — avoids re-exploring failed themes
+ - Local BRAIN simulator — triage alphas before spending credits
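The layered-acceptance idea behind several of these features can be sketched as a chain of cheap deterministic gates run before any BRAIN credit is spent. The layer names loosely mirror the README; the predicates and thresholds here are invented for the example and are not the repo's actual checks.

```python
from typing import Callable

# Illustrative sketch of layered acceptance gating: each layer is a cheap
# predicate, and a candidate is rejected at the first failing layer.
# Thresholds are invented for this example.
Layer = "tuple[str, Callable[[dict], bool]]"

LAYERS = [
    ("theme_gap",   lambda a: a["theme"] not in a.get("dead_themes", set())),
    ("static_lint", lambda a: "future" not in a["expression"]),  # leak check
    ("novelty",     lambda a: a["max_corr_vs_store"] < 0.7),
    ("local_sim",   lambda a: a["sim_sharpe"] > 1.0),
]

def accept(alpha: dict) -> "tuple[bool, str]":
    """Return (accepted, name of first failing layer or 'all_passed')."""
    for name, check in LAYERS:
        if not check(alpha):
            return False, name
    return True, "all_passed"

candidate = {"theme": "momentum", "expression": "ts_rank(close, 20)",
             "max_corr_vs_store": 0.3, "sim_sharpe": 1.4}
print(accept(candidate))  # passes every gate: (True, 'all_passed')
```

Ordering cheap gates first is what saves credits: most candidates die before the expensive simulation layers run.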
 
  ## File Structure
 
  alpha_factory/
  ├── config.py # All settings (Pydantic)
  ├── run.py # Entry point
+ ├── schemas/ # Typed contracts
  ├── deterministic/
+ │ ├── lint.py # Static pre-flight (Layer 2)
  │ ├── theme_sampler.py # Gap analysis (Layer 1)
+ │ ├── fitness.py # Composite scoring
+ │ ├── regime_tagger.py # Vol/trend/rate/style regimes
+ │ └── acceptance_checklist.py # 14-point checklist
  ├── infra/
+ │ ├── model_manager.py # Ollama + HF auto-detection
+ │ ├── llm_client.py # Unified LLM interface
  │ ├── factor_store.py # DuckDB persistence
+ │ ├── wq_client.py # BRAIN API wrapper
+ │ └── rag.py # ChromaDB + arXiv
+ ├── local/
+ │ └── brain_sim.py # Local BRAIN simulator (Layer 4)
  ├── personas/
+ │ ├── hypothesis_hunter.py # Persona 1
+ │ ├── expression_compiler.py # Persona 2
+ │ ├── crowd_scout.py # Persona 4
+ │ ├── performance_surgeon.py # Persona 5
+ │ └── gatekeeper.py # Persona 6
  └── orchestration/
      └── pipeline.py # Full DAG
  ```
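To show what a typed contract in `schemas/` might look like, here is a minimal sketch. The README says the repo uses Pydantic; a stdlib dataclass is used here so the example is self-contained, and the field names are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a Blueprint contract. Field names are invented;
# the repo's actual schemas/ uses Pydantic models, not dataclasses.
@dataclass(frozen=True)
class Blueprint:
    theme: str
    hypothesis: str
    operators: "list[str]" = field(default_factory=list)

    def __post_init__(self):
        # Reject empty themes up front, mimicking schema validation.
        if not self.theme:
            raise ValueError("theme is required")

bp = Blueprint(theme="reversal",
               hypothesis="short-term mean reversion in liquid names",
               operators=["ts_rank", "rank"])
print(bp.theme, len(bp.operators))  # reversal 2
```

Freezing the contract keeps downstream personas from mutating a blueprint after it has passed validation.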
 
+ ## Setup
 
+ 1. Install uv: https://docs.astral.sh/uv/getting-started/installation/
+ 2. `uv sync`
+ 3. Install Ollama: https://ollama.ai
+ 4. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
+ 5. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
+ 6. Run: `uv run python -m alpha_factory.run --dry-run --interactive`
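Step 5 can be verified with a tiny pre-flight check before the first run. The filenames come from the README; the function name and directory default are illustrative.

```python
from pathlib import Path

# Pre-flight check mirroring setup step 5: confirm the operator and
# field metadata files are present in data/ before the first run.
REQUIRED = ["operators.csv", "fields_USA_TOP3000_D1.csv"]

def missing_data_files(data_dir: str = "data") -> "list[str]":
    """Return the names of required data files that are not on disk."""
    root = Path(data_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

print(missing_data_files("nonexistent_dir"))  # reports both files missing
```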
122
 
123
  ## Cost
124
 
 
126
  |------|------|
127
  | Local GPU (RTX 3090/4090) | $0 (already owned) |
128
  | BRAIN account | $0 (existing) |
129
+ | uv + Ollama + all deps | $0 |
130
+ | Monthly running cost | **$0** |