gaurv007 committed · verified
Commit af04321 · 1 Parent(s): 97d11e0

Add README with setup and architecture docs

Files changed (1): README.md +81 -17
README.md CHANGED
@@ -1,26 +1,90 @@
- ---
- tags:
- - ml-intern
- ---

- # gaurv007/alpha-factory

- <!-- ml-intern-provenance -->
- ## Generated by ML Intern

- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern

- ## Usage

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model_id = "gaurv007/alpha-factory"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
- ```

- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
+ # Alpha Factory: Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
+
+ Autonomous alpha-generation system using multi-LLM agents with 7-layer acceptance engineering.
+
+ ## Quick Start
+
+ ```bash
+ git clone https://huggingface.co/gaurv007/alpha-factory
+ cd alpha-factory
+ pip install -e .
+
+ # Start an LLM server (pick one)
+ ollama serve   # then: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
+ # OR: vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
+
+ # Dry run (no BRAIN credits spent)
+ python -m alpha_factory.run --dry-run --batch-size 5
+
+ # Live run (requires BRAIN API credentials)
+ python -m alpha_factory.run --batch-size 10
+ ```
+
+ ## Architecture
+
+ ```
+ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
+   → Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
+   → Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
+ ```
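A minimal sketch of how stages like the ones in the diagram above can be chained, with cheap checks rejecting a candidate before any BRAIN submission. All function names, the candidate dict shape, and the lint rule are illustrative stand-ins, not the project's actual code:

```python
# Illustrative stand-ins for pipeline stages. Each stage returns the
# (possibly enriched) candidate dict, or None to reject it early so no
# BRAIN credits are spent on a bad candidate.
def theme_sampler():
    return {"theme": "volume-price divergence"}

def hypothesis_hunter(candidate):
    candidate["blueprint"] = "rank correlation of " + candidate["theme"]
    return candidate

def expression_compiler(candidate):
    candidate["expression"] = "rank(correlation(close, volume, 20))"
    return candidate

def static_lint(candidate):
    # Toy balanced-parens check; a real Layer-2 lint would also screen
    # for unknown operators and look-ahead leakage.
    expr = candidate["expression"]
    return candidate if expr.count("(") == expr.count(")") else None

STAGES = [hypothesis_hunter, expression_compiler, static_lint]

def run_once():
    candidate = theme_sampler()
    for stage in STAGES:
        candidate = stage(candidate)
        if candidate is None:
            return None  # rejected before the BRAIN submit step
    return candidate

print(run_once()["expression"])  # → rank(correlation(close, volume, 20))
```

The early-return-on-`None` convention is what makes the acceptance funnel cheap: deterministic stages run first, LLM-backed stages later.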
+
+ ## 6 LLM Personas
+
+ | # | Persona | Model | Job |
+ |---|---------|-------|-----|
+ | 1 | Hypothesis Hunter | 1.5B (Microfish) | Generate novel factor blueprints |
+ | 2 | Expression Compiler | 3B (Tinyfish) / Jinja | Convert blueprint to BRAIN expression |
+ | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
+ | 4 | Crowd Scout | 7B (Mediumfish) | Novelty + correlation check |
+ | 5 | Performance Surgeon | 7B (Mediumfish) | Diagnose failures, suggest fixes |
+ | 6 | Production Gatekeeper | 72B (Bigfish) | Final go/no-go memo |
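One way to express the persona-to-model routing from the table in code. The persona keys and model sizes come from the table; the dict itself, and the Ollama tags for the 3B and 72B models, are illustrative guesses:

```python
# Persona -> local model routing, mirroring the table above.
# "deterministic" marks a stage that runs as plain code, not an LLM call.
# The 3b/72b Ollama tags are assumptions; only 1.5b and 7b are pulled in Quick Start.
PERSONA_MODELS = {
    "hypothesis_hunter":   "qwen2.5:1.5b",   # Microfish
    "expression_compiler": "qwen2.5:3b",     # Tinyfish (Jinja fallback)
    "lookahead_sniffer":   "deterministic",  # static analysis, no LLM
    "crowd_scout":         "qwen2.5:7b",     # Mediumfish
    "performance_surgeon": "qwen2.5:7b",     # Mediumfish
    "gatekeeper":          "qwen2.5:72b",    # Bigfish
}

def needs_llm(persona: str) -> bool:
    return PERSONA_MODELS[persona] != "deterministic"

print(needs_llm("crowd_scout"), needs_llm("lookahead_sniffer"))  # → True False
```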
+
+ ## Key Features
+
+ - Zero recurring cost: all LLMs run locally
+ - Schema-constrained generation: no hallucinated operators
+ - 7-layer acceptance engineering: saves 60%+ of BRAIN credits
+ - Deterministic kill switches: circuit breakers for runaway pipelines
+ - Factor store: DuckDB persistence for all alpha history
+ - Dead theme registry: avoids re-exploring failed themes
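A sketch of what one of the deterministic kill switches could look like: a circuit breaker that trips after too many consecutive rejections. The class name, threshold, and API are invented for illustration, not the project's actual implementation:

```python
# Illustrative circuit breaker: halts the pipeline after a run of
# consecutive rejections, so a misbehaving LLM cannot burn BRAIN credits.
class CircuitBreaker:
    def __init__(self, max_consecutive_failures: int = 10):
        self.max_failures = max_consecutive_failures
        self.failures = 0
        self.tripped = False

    def record(self, accepted: bool) -> None:
        # Any acceptance resets the streak; a long enough streak trips the breaker.
        self.failures = 0 if accepted else self.failures + 1
        if self.failures >= self.max_failures:
            self.tripped = True

breaker = CircuitBreaker(max_consecutive_failures=3)
for accepted in [False, False, False]:
    breaker.record(accepted)
print(breaker.tripped)  # → True
```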
+
+ ## File Structure
+
+ ```
+ alpha_factory/
+ ├── config.py                  # All settings (Pydantic)
+ ├── run.py                     # Entry point
+ ├── schemas/                   # Typed contracts (Blueprint, Expression, Verdict)
+ ├── deterministic/
+ │   ├── lint.py                # Static pre-flight checks (Layer 2)
+ │   ├── theme_sampler.py       # Gap analysis (Layer 1)
+ │   └── fitness.py             # Composite scoring function
+ ├── infra/
+ │   ├── llm_client.py          # vLLM/Ollama with guided JSON
+ │   ├── factor_store.py        # DuckDB persistence
+ │   └── wq_client.py           # BRAIN API wrapper
+ ├── personas/
+ │   ├── hypothesis_hunter.py   # Persona 1 (generation)
+ │   ├── expression_compiler.py # Persona 2 (compilation)
+ │   ├── crowd_scout.py         # Persona 4 (novelty)
+ │   ├── performance_surgeon.py # Persona 5 (diagnosis)
+ │   └── gatekeeper.py          # Persona 6 (final gate)
+ └── orchestration/
+     └── pipeline.py            # Full DAG
+ ```
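The kind of settings object a file like `config.py` might define. The README says the real one uses Pydantic; this dependency-free sketch uses a plain dataclass as a stand-in, and every field name and default here is an illustrative assumption:

```python
from dataclasses import dataclass

# Dataclass stand-in for the Pydantic settings described above.
# Field names and defaults are assumptions, not the real config.
@dataclass
class Settings:
    llm_base_url: str = "http://localhost:11434"  # Ollama's default port
    batch_size: int = 10
    dry_run: bool = False
    max_consecutive_failures: int = 10  # kill-switch threshold

settings = Settings(dry_run=True, batch_size=5)
print(settings.dry_run, settings.batch_size)  # → True 5
```

Centralizing settings this way lets CLI flags like `--dry-run` and `--batch-size` map one-to-one onto typed, validated fields.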
+
+ ## Setup Your Environment
+
+ 1. Install Ollama: https://ollama.ai
+ 2. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
+ 3. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
+ 4. Seed the factor store with your 18 existing alphas
+ 5. Run: `python -m alpha_factory.run --dry-run`
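Seeding the factor store (step 4) could look roughly like the following. The function name and table schema are invented for illustration, and the standard-library `sqlite3` stands in here for DuckDB, whose Python API accepts the same shape of SQL:

```python
import sqlite3  # stand-in for duckdb; duckdb.connect(path) works similarly

def seed_factor_store(db_path, alphas):
    # Hypothetical schema: one row per existing alpha (id, expression).
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS alphas (alpha_id TEXT PRIMARY KEY, expression TEXT)"
    )
    con.executemany("INSERT OR REPLACE INTO alphas VALUES (?, ?)", alphas)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM alphas").fetchone()[0]
    con.close()
    return count

# Seed with two toy alphas; in step 4 this would be your 18 existing ones.
print(seed_factor_store(":memory:", [("a1", "rank(close)"), ("a2", "rank(volume)")]))  # → 2
```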
+
+ ## Cost
+
+ | Item | Cost |
+ |------|------|
+ | Local GPU (RTX 3090/4090) | $0 (already owned) |
+ | BRAIN account | $0 (existing) |
+ | Monthly running cost | $0 |