gaurv007/alpha-factory

@@ -1,8 +1,16 @@
 # Alpha Factory
 **LLM-assisted alpha expression generator for WorldQuant BRAIN.**
-> ⚠️ This is a **prototype tool**, not a production system. It generates candidate expressions for manual review and BRAIN submission.
 ## What It Actually Does
@@ -12,16 +20,12 @@
 4. **Lints the expression** (validates 71 operators, checks arity, parentheses, look-ahead, coverage)
 5. **Stores in DuckDB** for review
-That's it. Steps 1-5 work. Everything below is scaffolding for future development.
 ## What Does NOT Work Yet
-- ❌ BRAIN API submission (no client connected)
-- ❌ Crowd Scout / Performance Surgeon / Gatekeeper personas (imported but never called)
-- ❌ RAG over arXiv papers (stub only)
-- ❌ Local BRAIN simulator (exists but not wired into pipeline)
-- ❌ Feedback loop / evolutionary improvement
-- ❌ Automatic iteration on near-misses
 ## Quickstart
@@ -36,90 +40,38 @@ Create `.env`:
 HF_TOKEN=hf_your_token_here
 ```
-### Option 1: Proven Templates (RECOMMENDED — no LLM, guaranteed valid)
 ```bash
-uv run python alpha_factory/generate_proven.py
 ```
-This uses your proven Alpha 15/6 structures with novel AC=0 fields. **Every expression is syntactically valid and ready to paste into BRAIN.**
-### Option 2: LLM-Assisted Generation
 ```bash
 uv run python -m alpha_factory.run --dry-run --batch-size 5
 ```
-Uses HuggingFace Inference API (Qwen 7B) to generate novel hypotheses. Quality varies — always lint-check before submitting to BRAIN.
-### Option 3: Gradio UI
 ```bash
 uv run python -m alpha_factory.ui
 ```
-View generated alphas with timestamps, copy expressions, generate new batches from browser.
-## Architecture
-```
-alpha_factory/
-├── data/                  # BRAIN field registry (3,447 candidates), operators, groups
-│   ├── brain_fields.py    # 35 highest-EV fields with AC, coverage, sign metadata
-│   ├── brain_groups.py    # 15 novel neutralization keys (AC 3-20)
-│   └── __init__.py
-├── deterministic/         # No LLM required
-│   ├── lint.py            # 71-operator validation + arity checks
-│   ├── theme_sampler.py   # Gap analysis across 12 themes
-│   ├── proven_templates.py # Alpha 15/6 structure with field swaps
-│   ├── expression_mutator.py # 5 mutation operators for iteration
-│   └── fitness.py         # Composite fitness scoring
-├── personas/              # LLM-powered agents
-│   ├── hypothesis_hunter.py  # Generates blueprints (ACTIVE)
-│   ├── expression_compiler.py # Blueprint → BRAIN expression (ACTIVE)
-│   ├── crowd_scout.py        # Novelty check (NOT WIRED)
-│   ├── performance_surgeon.py # Failure diagnosis (NOT WIRED)
-│   └── gatekeeper.py         # Final go/no-go (NOT WIRED)
-├── infra/                 # Infrastructure
-│   ├── llm_client.py      # Unified Ollama/HF client
-│   ├── factor_store.py    # DuckDB persistence
-│   ├── model_manager.py   # Auto-discovers available models
-│   ├── winner_memory.py   # Feedback loop storage (NOT WIRED)
-│   └── wq_client.py       # BRAIN API wrapper (NOT CONNECTED)
-├── orchestration/
-│   └── pipeline.py        # Main pipeline (steps 1-5 only)
-├── run.py                 # CLI entry point
-├── ui.py                  # Gradio dashboard
-└── generate_proven.py     # Standalone proven template generator
-```
-## Field Strategy
-The pipeline prioritizes fields by expected value:
-| Tier | Dataset | Density (α/field) | Strategy |
-|------|---------|-------------------|----------|
-| 1 | model77 | 24 | Primary target — 5 fields with AC=0 globally |
-| 2 | model16, news12 | 192-385 | Secondary — score derivatives |
-| 3 | analyst4, option9, pv13 | 656-822 | Tertiary — supply chain, PCR |
-| 4 | pv1, socialmedia | 2500-64350 | Avoid — over-mined |
 ## BRAIN Submission Settings
-When pasting expressions into BRAIN manually:
-- Region: USA
-- Universe: TOP3000
-- Delay: 1
-- Decay: 5
-- Truncation: 0.08
-- Pasteurization: ON
-- NaN Handling: OFF
-## Requirements
-- Python 3.11+
-- HuggingFace token (free tier works for Qwen 7B)
-- Optional: Ollama for local inference
 ## License

+---
+license: mit
+tags:
+- quantitative-finance
+- alpha-generation
+- worldquant-brain
+---
 # Alpha Factory
 **LLM-assisted alpha expression generator for WorldQuant BRAIN.**
+> This is a Python application, not a model. It generates candidate BRAIN expressions for manual review and submission.
 ## What It Actually Does
 4. **Lints the expression** (validates 71 operators, checks arity, parentheses, look-ahead, coverage)
 5. **Stores in DuckDB** for review
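
For readers of this diff, the lint step (step 4) can be pictured roughly as follows. This is a minimal sketch, not the project's `lint.py`: the `OPERATOR_ARITY` table is a hypothetical four-entry stand-in for the 71-operator registry, and only the parenthesis and arity checks are shown (look-ahead and coverage checks are omitted).

```python
import re

# Hypothetical arity table: an illustrative subset, NOT the project's 71-operator registry.
OPERATOR_ARITY = {
    "rank": 1,
    "ts_mean": 2,
    "ts_delta": 2,
    "group_neutralize": 2,
}


def balanced(expr: str) -> bool:
    """Return True if every '(' has a matching ')' and none closes early."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def split_args(body: str) -> list[str]:
    """Split an argument string on commas that sit outside nested parentheses."""
    args, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail or args:
        args.append(tail)
    return args


def lint(expr: str) -> list[str]:
    """Return a list of error messages; an empty list means the checks passed."""
    if not balanced(expr):
        return ["unbalanced parentheses"]
    errors = []
    for match in re.finditer(r"([A-Za-z_]\w*)\s*\(", expr):
        name = match.group(1)
        if name not in OPERATOR_ARITY:
            errors.append(f"unknown operator: {name}")
            continue
        # Walk forward to the parenthesis that closes this call.
        start = match.end()
        depth, i = 1, start
        while depth:
            if expr[i] == "(":
                depth += 1
            elif expr[i] == ")":
                depth -= 1
            i += 1
        got = len(split_args(expr[start:i - 1]))
        expected = OPERATOR_ARITY[name]
        if got != expected:
            errors.append(f"{name} expects {expected} argument(s), got {got}")
    return errors
```

Under these assumptions, `lint("rank(ts_delta(close, 5))")` returns an empty list, while `lint("rank(close, 5)")` reports an arity error. Returning a list of messages rather than raising on the first problem lets a reviewer see every issue with an expression at once.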
 ## What Does NOT Work Yet
+- BRAIN API submission (no client connected — manual paste required)
+- Crowd Scout / Performance Surgeon / Gatekeeper personas (stub only)
+- RAG over arXiv papers (stub only)
+- Feedback loop / evolutionary improvement
 ## Quickstart
 HF_TOKEN=hf_your_token_here
 ```
+### Proven Templates (RECOMMENDED — no LLM, guaranteed valid)
 ```bash
+uv run python -m alpha_factory.run --proven --batch-size 10
 ```
+Uses proven Alpha 15/6 structures with novel AC=0 fields. Every expression is syntactically valid.
+### LLM-Assisted Generation
 ```bash
 uv run python -m alpha_factory.run --dry-run --batch-size 5
 ```
+### Gradio UI
 ```bash
 uv run python -m alpha_factory.ui
 ```
 ## BRAIN Submission Settings
+When pasting expressions into BRAIN:
+- Region: USA | Universe: TOP3000 | Delay: 1 | Decay: 5 | Truncation: 0.08
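
Because these settings are pasted by hand, one way to keep them consistent is to hold them in a single object. The sketch below is an assumption for illustration only: `BrainSettings` and `format_settings` are hypothetical names that do not appear in the repository, and nothing here talks to the BRAIN API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BrainSettings:
    """Hypothetical container for the manual-submission settings listed in the README."""
    region: str = "USA"
    universe: str = "TOP3000"
    delay: int = 1
    decay: int = 5
    truncation: float = 0.08


def format_settings(settings: BrainSettings) -> str:
    """Render the settings as a single pipe-separated line for copy and paste."""
    return " | ".join(
        f"{name.capitalize()}: {value}" for name, value in asdict(settings).items()
    )


print(format_settings(BrainSettings()))
```

A frozen dataclass is used so that the defaults cannot be changed by accident between batches; a different configuration would be created explicitly, for example `BrainSettings(decay=10)`.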
+## Field Strategy
+| Tier | Dataset | Density | Notes |
+|------|---------|---------|-------|
+| 1 | model77 | 24 α/field | 5 fields with AC=0 globally |
+| 2 | model16, news12 | 192-385 | Score derivatives |
+| 3 | analyst4, option9, pv13 | 656-822 | Supply chain, PCR |
 ## License