gaurv007 committed · verified
Commit af04321 · 1 Parent(s): 97d11e0

Add README with setup and architecture docs

Files changed (1): README.md +81 -17
README.md CHANGED
@@ -1,26 +1,90 @@
- ---
- tags:
- - ml-intern
- ---

- # gaurv007/alpha-factory

- <!-- ml-intern-provenance -->
- ## Generated by ML Intern

- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern

- ## Usage

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model_id = "gaurv007/alpha-factory"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
- ```

- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
+ # Alpha Factory: Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
+
+ Autonomous alpha-generation system using multi-LLM agents with 7-layer acceptance engineering.
+
+ ## Quick Start
+
+ ```bash
+ git clone https://huggingface.co/gaurv007/alpha-factory
+ cd alpha-factory
+ pip install -e .
+
+ # Start an LLM server (pick one)
+ ollama serve   # then: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
+ # OR: vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
+
+ # Dry run (no BRAIN credits spent)
+ python -m alpha_factory.run --dry-run --batch-size 5
+
+ # Live run (requires BRAIN API credentials)
+ python -m alpha_factory.run --batch-size 10
+ ```
+
+ ## Architecture
+
+ ```
+ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
+   → Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
+   → Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
+ ```
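A minimal sketch of how stages like the ones in the diagram above can be chained, with cheap checks rejecting a candidate before any BRAIN submission. All function names, the candidate dict shape, and the lint rule are illustrative stand-ins, not the project's actual code:

```python
# Illustrative stand-ins for pipeline stages. Each stage returns the
# (possibly enriched) candidate dict, or None to reject it early so no
# BRAIN credits are spent on a bad candidate.
def theme_sampler():
    return {"theme": "volume-price divergence"}

def hypothesis_hunter(candidate):
    candidate["blueprint"] = "rank correlation of " + candidate["theme"]
    return candidate

def expression_compiler(candidate):
    candidate["expression"] = "rank(correlation(close, volume, 20))"
    return candidate

def static_lint(candidate):
    # Toy balanced-parens check; a real Layer-2 lint would also screen
    # for unknown operators and look-ahead leakage.
    expr = candidate["expression"]
    return candidate if expr.count("(") == expr.count(")") else None

STAGES = [hypothesis_hunter, expression_compiler, static_lint]

def run_once():
    candidate = theme_sampler()
    for stage in STAGES:
        candidate = stage(candidate)
        if candidate is None:
            return None  # rejected before the BRAIN submit step
    return candidate

print(run_once()["expression"])  # → rank(correlation(close, volume, 20))
```

The early-return-on-`None` convention is what makes the acceptance funnel cheap: deterministic stages run first, LLM-backed stages later.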
+
+ ## 6 LLM Personas
+
+ | # | Persona | Model | Job |
+ |---|---------|-------|-----|
+ | 1 | Hypothesis Hunter | 1.5B (Microfish) | Generate novel factor blueprints |
+ | 2 | Expression Compiler | 3B (Tinyfish) / Jinja | Convert blueprint to BRAIN expression |
+ | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
+ | 4 | Crowd Scout | 7B (Mediumfish) | Novelty + correlation check |
+ | 5 | Performance Surgeon | 7B (Mediumfish) | Diagnose failures, suggest fixes |
+ | 6 | Production Gatekeeper | 72B (Bigfish) | Final go/no-go memo |
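One way to express the persona-to-model routing from the table in code. The persona keys and model sizes come from the table; the dict itself, and the Ollama tags for the 3B and 72B models, are illustrative guesses:

```python
# Persona -> local model routing, mirroring the table above.
# "deterministic" marks a stage that runs as plain code, not an LLM call.
# The 3b/72b Ollama tags are assumptions; only 1.5b and 7b are pulled in Quick Start.
PERSONA_MODELS = {
    "hypothesis_hunter":   "qwen2.5:1.5b",   # Microfish
    "expression_compiler": "qwen2.5:3b",     # Tinyfish (Jinja fallback)
    "lookahead_sniffer":   "deterministic",  # static analysis, no LLM
    "crowd_scout":         "qwen2.5:7b",     # Mediumfish
    "performance_surgeon": "qwen2.5:7b",     # Mediumfish
    "gatekeeper":          "qwen2.5:72b",    # Bigfish
}

def needs_llm(persona: str) -> bool:
    return PERSONA_MODELS[persona] != "deterministic"

print(needs_llm("crowd_scout"), needs_llm("lookahead_sniffer"))  # → True False
```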
+
+ ## Key Features
+
+ - Zero recurring cost: all LLMs run locally
+ - Schema-constrained generation: no hallucinated operators
+ - 7-layer acceptance engineering: saves 60%+ of BRAIN credits
+ - Deterministic kill switches: circuit breakers for runaway pipelines
+ - Factor store: DuckDB persistence for all alpha history
+ - Dead theme registry: avoids re-exploring failed themes
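A sketch of what one of the deterministic kill switches could look like: a circuit breaker that trips after too many consecutive rejections. The class name, threshold, and API are invented for illustration, not the project's actual implementation:

```python
# Illustrative circuit breaker: halts the pipeline after a run of
# consecutive rejections, so a misbehaving LLM cannot burn BRAIN credits.
class CircuitBreaker:
    def __init__(self, max_consecutive_failures: int = 10):
        self.max_failures = max_consecutive_failures
        self.failures = 0
        self.tripped = False

    def record(self, accepted: bool) -> None:
        # Any acceptance resets the streak; a long enough streak trips the breaker.
        self.failures = 0 if accepted else self.failures + 1
        if self.failures >= self.max_failures:
            self.tripped = True

breaker = CircuitBreaker(max_consecutive_failures=3)
for accepted in [False, False, False]:
    breaker.record(accepted)
print(breaker.tripped)  # → True
```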
+
+ ## File Structure
+
+ ```
+ alpha_factory/
+ ├── config.py                  # All settings (Pydantic)
+ ├── run.py                     # Entry point
+ ├── schemas/                   # Typed contracts (Blueprint, Expression, Verdict)
+ ├── deterministic/
+ │   ├── lint.py                # Static pre-flight checks (Layer 2)
+ │   ├── theme_sampler.py       # Gap analysis (Layer 1)
+ │   └── fitness.py             # Composite scoring function
+ ├── infra/
+ │   ├── llm_client.py          # vLLM/Ollama with guided JSON
+ │   ├── factor_store.py        # DuckDB persistence
+ │   └── wq_client.py           # BRAIN API wrapper
+ ├── personas/
+ │   ├── hypothesis_hunter.py   # Persona 1 (generation)
+ │   ├── expression_compiler.py # Persona 2 (compilation)
+ │   ├── crowd_scout.py         # Persona 4 (novelty)
+ │   ├── performance_surgeon.py # Persona 5 (diagnosis)
+ │   └── gatekeeper.py          # Persona 6 (final gate)
+ └── orchestration/
+     └── pipeline.py            # Full DAG
+ ```
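The kind of settings object a file like `config.py` might define. The README says the real one uses Pydantic; this dependency-free sketch uses a plain dataclass as a stand-in, and every field name and default here is an illustrative assumption:

```python
from dataclasses import dataclass

# Dataclass stand-in for the Pydantic settings described above.
# Field names and defaults are assumptions, not the real config.
@dataclass
class Settings:
    llm_base_url: str = "http://localhost:11434"  # Ollama's default port
    batch_size: int = 10
    dry_run: bool = False
    max_consecutive_failures: int = 10  # kill-switch threshold

settings = Settings(dry_run=True, batch_size=5)
print(settings.dry_run, settings.batch_size)  # → True 5
```

Centralizing settings this way lets CLI flags like `--dry-run` and `--batch-size` map one-to-one onto typed, validated fields.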
+
+ ## Setup Your Environment
+
+ 1. Install Ollama: https://ollama.ai
+ 2. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
+ 3. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
+ 4. Seed the factor store with your 18 existing alphas
+ 5. Run: `python -m alpha_factory.run --dry-run`
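Seeding the factor store (step 4) could look roughly like the following. The function name and table schema are invented for illustration, and the standard-library `sqlite3` stands in here for DuckDB, whose Python API accepts the same shape of SQL:

```python
import sqlite3  # stand-in for duckdb; duckdb.connect(path) works similarly

def seed_factor_store(db_path, alphas):
    # Hypothetical schema: one row per existing alpha (id, expression).
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS alphas (alpha_id TEXT PRIMARY KEY, expression TEXT)"
    )
    con.executemany("INSERT OR REPLACE INTO alphas VALUES (?, ?)", alphas)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM alphas").fetchone()[0]
    con.close()
    return count

# Seed with two toy alphas; in step 4 this would be your 18 existing ones.
print(seed_factor_store(":memory:", [("a1", "rank(close)"), ("a2", "rank(volume)")]))  # → 2
```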
+
+ ## Cost
+
+ | Item | Cost |
+ |------|------|
+ | Local GPU (RTX 3090/4090) | $0 (already owned) |
+ | BRAIN account | $0 (existing) |
+ | Monthly running cost | $0 |