gaurv007 committed on
Commit 86157cd · verified · 1 Parent(s): cd6d52f

Update README: use uv instead of pip

Files changed (1)
  1. README.md +72 -56
README.md CHANGED
@@ -1,7 +1,3 @@
- ---
- tags:
- - ml-intern
- ---
  # Alpha Factory — Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
 
  Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
@@ -9,19 +5,41 @@ Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
  ## Quick Start
 
  ```bash
  git clone https://huggingface.co/gaurv007/alpha-factory
  cd alpha-factory
- pip install -e .
 
- # Start LLM server (pick one)
- ollama serve # then: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
- # OR: vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
 
  # Dry run (no BRAIN credits spent)
- python -m alpha_factory.run --dry-run --batch-size 5
 
- # Live run (requires BRAIN API credentials)
- python -m alpha_factory.run --batch-size 10
  ```
 
  ## Architecture
@@ -34,23 +52,33 @@ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/T
  ## 6 LLM Personas
 
- | # | Persona | Model | Job |
- |---|---------|-------|-----|
- | 1 | Hypothesis Hunter | 1.5B (Microfish) | Generate novel factor blueprints |
- | 2 | Expression Compiler | 3B (Tinyfish) / Jinja | Convert blueprint to BRAIN expression |
  | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
- | 4 | Crowd Scout | 7B (Mediumfish) | Novelty + correlation check |
- | 5 | Performance Surgeon | 7B (Mediumfish) | Diagnose failures, suggest fixes |
- | 6 | Production Gatekeeper | 72B (Bigfish) | Final go/no-go memo |
 
  ## Key Features
 
- - Zero recurring cost — all LLMs run locally
  - Schema-constrained generation — no hallucinated operators
  - 7-layer acceptance engineering — saves 60%+ BRAIN credits
  - Deterministic kill switches — circuit breakers for runaway pipelines
  - Factor store — DuckDB persistence for all alpha history
  - Dead theme registry — avoids re-exploring failed themes
 
  ## File Structure
 
@@ -58,32 +86,39 @@ Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/T
  alpha_factory/
  ├── config.py # All settings (Pydantic)
  ├── run.py # Entry point
- ├── schemas/ # Typed contracts (Blueprint, Expression, Verdict)
  ├── deterministic/
- │ ├── lint.py # Static pre-flight checks (Layer 2)
  │ ├── theme_sampler.py # Gap analysis (Layer 1)
- │ └── fitness.py # Composite scoring function
  ├── infra/
- │ ├── llm_client.py # vLLM/Ollama with guided JSON
  │ ├── factor_store.py # DuckDB persistence
- │ └── wq_client.py # BRAIN API wrapper
  ├── personas/
- │ ├── hypothesis_hunter.py # Persona 1 (generation)
- │ ├── expression_compiler.py # Persona 2 (compilation)
- │ ├── crowd_scout.py # Persona 4 (novelty)
- │ ├── performance_surgeon.py # Persona 5 (diagnosis)
- │ └── gatekeeper.py # Persona 6 (final gate)
  └── orchestration/
      └── pipeline.py # Full DAG
  ```
 
- ## Setup Your Environment
 
- 1. Install Ollama: https://ollama.ai
- 2. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
- 3. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
- 4. Seed factor store with your 18 existing alphas
- 5. Run: `python -m alpha_factory.run --dry-run`
 
  ## Cost
 
@@ -91,24 +126,5 @@ alpha_factory/
  |------|------|
  | Local GPU (RTX 3090/4090) | $0 (already owned) |
  | BRAIN account | $0 (existing) |
- | Monthly running cost | $0 |
-
- <!-- ml-intern-provenance -->
- ## Generated by ML Intern
-
- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
-
- - Try ML Intern: https://smolagents-ml-intern.hf.space
- - Source code: https://github.com/huggingface/ml-intern
-
- ## Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "gaurv007/alpha-factory"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id)
- ```
-
- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
 
 
 
 
 
  # Alpha Factory — Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
 
  Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
 
  ## Quick Start
 
  ```bash
+ # Install uv (if not already installed)
+ # Windows:
+ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+ # macOS/Linux:
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+
+ # Clone
  git clone https://huggingface.co/gaurv007/alpha-factory
  cd alpha-factory
 
+ # Install (uv handles everything — venv, deps, lockfile)
+ uv sync
+
+ # With optional RAG support
+ uv sync --extra rag
+
+ # With all optional deps
+ uv sync --extra all
+
+ # Start Ollama (local LLM server)
+ ollama pull qwen2.5:1.5b
+ ollama pull qwen2.5:7b
+ ollama serve
 
  # Dry run (no BRAIN credits spent)
+ uv run python -m alpha_factory.run --dry-run --batch-size 5
+
+ # Interactive model selection
+ uv run python -m alpha_factory.run --interactive --dry-run
 
+ # With HuggingFace cloud models
+ uv run python -m alpha_factory.run --hf-token hf_your_token --batch-size 10
+
+ # Run tests
+ uv run pytest tests/ -v
  ```
 
  ## Architecture
 
  ## 6 LLM Personas
 
+ | # | Persona | Model Tier | Job |
+ |---|---------|------------|-----|
+ | 1 | Hypothesis Hunter | Microfish (1.5B) | Generate novel factor blueprints |
+ | 2 | Expression Compiler | Tinyfish (3B) / Jinja | Convert blueprint to BRAIN expression |
  | 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
+ | 4 | Crowd Scout | Mediumfish (7B) | Novelty + correlation check |
+ | 5 | Performance Surgeon | Mediumfish (7B) | Diagnose failures, suggest fixes |
+ | 6 | Production Gatekeeper | Bigfish (14-72B) | Final go/no-go memo |
+
+ ## Model Support
+
+ Automatically detects and uses:
+ - **Ollama (local)** — auto-detected at localhost:11434
+ - **HuggingFace Inference API (cloud)** — set HF_TOKEN env var
+ - **vLLM (local/remote)** — any OpenAI-compatible endpoint
+
+ Use the `--interactive` flag to manually pick models for each tier from a dropdown.
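The backend-selection behavior described above can be sketched as a small pure function. This is an illustration only: the function name, argument names, and the priority order (local Ollama first, then an explicit vLLM endpoint, then the HF Inference API) are assumptions, not the repo's actual `model_manager` logic.

```python
import os

# Hypothetical sketch of backend auto-detection; names and priority
# order are assumptions for illustration, not the repo's real API.
def choose_backend(ollama_reachable: bool,
                   hf_token: "str | None",
                   vllm_url: "str | None") -> str:
    """Pick the first available LLM backend."""
    if ollama_reachable:
        return "ollama"       # auto-detected at localhost:11434
    if vllm_url:
        return "vllm"         # any OpenAI-compatible endpoint
    if hf_token:
        return "huggingface"  # cloud fallback via HF_TOKEN
    raise RuntimeError("No LLM backend available")

print(choose_backend(True, None, None))   # local server wins when reachable
print(choose_backend(False, os.environ.get("HF_TOKEN", "hf_demo"), None))
```

A real implementation would probe `localhost:11434` with a short-timeout HTTP request rather than take a boolean.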
 
  ## Key Features
 
+ - Zero recurring cost — all LLMs run locally via Ollama
  - Schema-constrained generation — no hallucinated operators
  - 7-layer acceptance engineering — saves 60%+ BRAIN credits
  - Deterministic kill switches — circuit breakers for runaway pipelines
  - Factor store — DuckDB persistence for all alpha history
  - Dead theme registry — avoids re-exploring failed themes
+ - Local BRAIN simulator — triage alphas before spending credits
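The layered-acceptance idea behind several of these features can be sketched as a chain of cheap deterministic gates run before any BRAIN credit is spent. The layer names loosely mirror the README; the predicates and thresholds here are invented for the example and are not the repo's actual checks.

```python
from typing import Callable

# Illustrative sketch of layered acceptance gating: each layer is a cheap
# predicate, and a candidate is rejected at the first failing layer.
# Thresholds are invented for this example.
Layer = "tuple[str, Callable[[dict], bool]]"

LAYERS = [
    ("theme_gap",   lambda a: a["theme"] not in a.get("dead_themes", set())),
    ("static_lint", lambda a: "future" not in a["expression"]),  # leak check
    ("novelty",     lambda a: a["max_corr_vs_store"] < 0.7),
    ("local_sim",   lambda a: a["sim_sharpe"] > 1.0),
]

def accept(alpha: dict) -> "tuple[bool, str]":
    """Return (accepted, name of first failing layer or 'all_passed')."""
    for name, check in LAYERS:
        if not check(alpha):
            return False, name
    return True, "all_passed"

candidate = {"theme": "momentum", "expression": "ts_rank(close, 20)",
             "max_corr_vs_store": 0.3, "sim_sharpe": 1.4}
print(accept(candidate))  # passes every gate: (True, 'all_passed')
```

Ordering cheap gates first is what saves credits: most candidates die before the expensive simulation layers run.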
 
  ## File Structure
 
  alpha_factory/
  ├── config.py # All settings (Pydantic)
  ├── run.py # Entry point
+ ├── schemas/ # Typed contracts
  ├── deterministic/
+ │ ├── lint.py # Static pre-flight (Layer 2)
  │ ├── theme_sampler.py # Gap analysis (Layer 1)
+ │ ├── fitness.py # Composite scoring
+ │ ├── regime_tagger.py # Vol/trend/rate/style regimes
+ │ └── acceptance_checklist.py # 14-point checklist
  ├── infra/
+ │ ├── model_manager.py # Ollama + HF auto-detection
+ │ ├── llm_client.py # Unified LLM interface
  │ ├── factor_store.py # DuckDB persistence
+ │ ├── wq_client.py # BRAIN API wrapper
+ │ └── rag.py # ChromaDB + arXiv
+ ├── local/
+ │ └── brain_sim.py # Local BRAIN simulator (Layer 4)
  ├── personas/
+ │ ├── hypothesis_hunter.py # Persona 1
+ │ ├── expression_compiler.py # Persona 2
+ │ ├── crowd_scout.py # Persona 4
+ │ ├── performance_surgeon.py # Persona 5
+ │ └── gatekeeper.py # Persona 6
  └── orchestration/
      └── pipeline.py # Full DAG
  ```
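To show what a typed contract in `schemas/` might look like, here is a minimal sketch. The README says the repo uses Pydantic; a stdlib dataclass is used here so the example is self-contained, and the field names are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a Blueprint contract. Field names are invented;
# the repo's actual schemas/ uses Pydantic models, not dataclasses.
@dataclass(frozen=True)
class Blueprint:
    theme: str
    hypothesis: str
    operators: "list[str]" = field(default_factory=list)

    def __post_init__(self):
        # Reject empty themes up front, mimicking schema validation.
        if not self.theme:
            raise ValueError("theme is required")

bp = Blueprint(theme="reversal",
               hypothesis="short-term mean reversion in liquid names",
               operators=["ts_rank", "rank"])
print(bp.theme, len(bp.operators))  # reversal 2
```

Freezing the contract keeps downstream personas from mutating a blueprint after it has passed validation.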
 
+ ## Setup
 
+ 1. Install uv: https://docs.astral.sh/uv/getting-started/installation/
+ 2. `uv sync`
+ 3. Install Ollama: https://ollama.ai
+ 4. Pull models: `ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b`
+ 5. Place your `operators.csv` and `fields_USA_TOP3000_D1.csv` in `data/`
+ 6. Run: `uv run python -m alpha_factory.run --dry-run --interactive`
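Step 5 can be verified with a tiny pre-flight check before the first run. The filenames come from the README; the function name and directory default are illustrative.

```python
from pathlib import Path

# Pre-flight check mirroring setup step 5: confirm the operator and
# field metadata files are present in data/ before the first run.
REQUIRED = ["operators.csv", "fields_USA_TOP3000_D1.csv"]

def missing_data_files(data_dir: str = "data") -> "list[str]":
    """Return the names of required data files that are not on disk."""
    root = Path(data_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

print(missing_data_files("nonexistent_dir"))  # reports both files missing
```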
122
 
123
  ## Cost
124
 
 
126
  |------|------|
127
  | Local GPU (RTX 3090/4090) | $0 (already owned) |
128
  | BRAIN account | $0 (existing) |
129
+ | uv + Ollama + all deps | $0 |
130
+ | Monthly running cost | **$0** |