Alpha Factory – Open-Source LLM-Driven Pipeline for WorldQuant BRAIN

Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.

Quick Start

# Install uv (if not already installed)
# Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone
git clone https://huggingface.co/gaurv007/alpha-factory
cd alpha-factory

# Install (uv handles everything: venv, deps, lockfile)
uv sync

# With optional RAG support
uv sync --extra rag

# With all optional deps
uv sync --extra all

# Start Ollama (local LLM server) in a separate terminal, if it is not already running
ollama serve

# Pull the local models
ollama pull qwen2.5:1.5b
ollama pull qwen2.5:7b

# Dry run (no BRAIN credits spent)
uv run python -m alpha_factory.run --dry-run --batch-size 5

# Interactive model selection
uv run python -m alpha_factory.run --interactive --dry-run

# With HuggingFace cloud models
uv run python -m alpha_factory.run --hf-token hf_your_token --batch-size 10

# Run tests
uv run pytest tests/ -v

Architecture

Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
     → Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
     → Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
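
A minimal sketch of that stage sequence, assuming a simple pass/kill contract between stages (the Candidate fields, the stage signature, and run_pipeline are illustrative, not the actual orchestration/pipeline.py API):

# Illustrative only: stage names mirror the diagram above.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    theme: str
    blueprint: str = ""
    expression: str = ""
    notes: list[str] = field(default_factory=list)

def run_pipeline(themes, stages):
    """Push each sampled theme through the stages; a stage returning None drops the candidate."""
    survivors = []
    for theme in themes:
        cand = Candidate(theme=theme)
        for stage in stages:        # e.g. hunt, compile, lint, dedup, submit, scout, diagnose, gatekeep
            cand = stage(cand)
            if cand is None:        # any layer can reject, stopping further spend on this candidate
                break
        else:
            survivors.append(cand)  # survived every layer, including the Gatekeeper
    return survivors

Ordering the cheap deterministic layers before the BRAIN submit step is what keeps credit spend down.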

6 LLM Personas

#  Persona                Model Tier             Job
1  Hypothesis Hunter      Microfish (1.5B)       Generate novel factor blueprints
2  Expression Compiler    Tinyfish (3B) / Jinja  Convert blueprint to BRAIN expression
3  Look-Ahead Sniffer     Deterministic          Static analysis for future leakage
4  Crowd Scout            Mediumfish (7B)        Novelty + correlation check
5  Performance Surgeon    Mediumfish (7B)        Diagnose failures, suggest fixes
6  Production Gatekeeper  Bigfish (14-72B)       Final go/no-go memo
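
The fish tiers name model size classes rather than specific checkpoints. A hedged sketch of what a tier-to-model mapping could look like, using the Qwen2.5 tags pulled in Quick Start for the tiers they match and assumed tags for the rest (the dictionary itself is illustrative, not the project's real configuration):

# Assumed defaults; the actual mapping lives in config.py / infra/model_manager.py.
DEFAULT_TIER_MODELS = {
    "microfish":  "qwen2.5:1.5b",  # Hypothesis Hunter
    "tinyfish":   "qwen2.5:3b",    # Expression Compiler (LLM path; the Jinja path needs no model)
    "mediumfish": "qwen2.5:7b",    # Crowd Scout, Performance Surgeon
    "bigfish":    "qwen2.5:14b",   # Production Gatekeeper (anything in the 14-72B class)
}

The --interactive flag described below lets you override any of these choices per run.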

Model Support

Automatically detects and uses:

  • Ollama (local) – auto-detected at localhost:11434 (see the detection sketch below)
  • HuggingFace Inference API (cloud) – set the HF_TOKEN env var
  • vLLM (local/remote) – any OpenAI-compatible endpoint

Use the --interactive flag to pick models for each tier manually from a dropdown.
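
A minimal sketch of the backend auto-detection described above, using only the endpoint and env var named in this section (the helper name and fallback order are assumptions, not the real infra/model_manager.py logic):

import os
import urllib.request

def detect_backend(timeout: float = 1.0) -> str:
    """Prefer a local Ollama server, then the HF Inference API, then a user-configured vLLM endpoint."""
    try:
        # Ollama lists its local models at /api/tags; a response means the server is reachable.
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=timeout):
            return "ollama"
    except OSError:
        pass
    if os.environ.get("HF_TOKEN"):
        return "huggingface"
    return "vllm"  # fall back to whatever OpenAI-compatible endpoint is configured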

Key Features

  • Zero recurring cost – all LLMs run locally via Ollama
  • Schema-constrained generation – no hallucinated operators
  • 7-layer acceptance engineering – saves 60%+ BRAIN credits
  • Deterministic kill switches – circuit breakers for runaway pipelines
  • Factor store – DuckDB persistence for all alpha history (see the sketch after this list)
  • Dead theme registry – avoids re-exploring failed themes
  • Local BRAIN simulator – triage alphas before spending credits
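
A hedged sketch of the DuckDB-backed factor store mentioned above (the database file name, table, and columns are assumptions, not the real infra/factor_store.py schema):

import duckdb

# Assumed file and schema; shown only to illustrate the persistence idea.
con = duckdb.connect("factors.duckdb")
con.execute("""
    CREATE TABLE IF NOT EXISTS alphas (
        alpha_id   TEXT PRIMARY KEY,
        theme      TEXT,
        expression TEXT,
        sharpe     DOUBLE,
        status     TEXT,        -- e.g. 'rejected_lint', 'submitted', 'portfolio'
        created_at TIMESTAMP DEFAULT current_timestamp
    )
""")
con.execute(
    "INSERT OR REPLACE INTO alphas (alpha_id, theme, expression, sharpe, status) VALUES (?, ?, ?, ?, ?)",
    ["demo_001", "earnings_revision", "rank(ts_delta(close, 5))", 1.2, "submitted"],  # illustrative row
)

Recording every attempt, including rejects, is what lets the dedup step and the dead theme registry skip ground the pipeline has already covered.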

File Structure

alpha_factory/
├── config.py                  # All settings (Pydantic)
├── run.py                     # Entry point
├── schemas/                   # Typed contracts
├── deterministic/
│   ├── lint.py                # Static pre-flight (Layer 2)
│   ├── theme_sampler.py       # Gap analysis (Layer 1)
│   ├── fitness.py             # Composite scoring
│   ├── regime_tagger.py       # Vol/trend/rate/style regimes
│   └── acceptance_checklist.py # 14-point checklist
├── infra/
│   ├── model_manager.py       # Ollama + HF auto-detection
│   ├── llm_client.py          # Unified LLM interface
│   ├── factor_store.py        # DuckDB persistence
│   ├── wq_client.py           # BRAIN API wrapper
│   └── rag.py                 # ChromaDB + arXiv
├── local/
│   └── brain_sim.py           # Local BRAIN simulator (Layer 4)
├── personas/
│   ├── hypothesis_hunter.py   # Persona 1
│   ├── expression_compiler.py # Persona 2
│   ├── crowd_scout.py         # Persona 4
│   ├── performance_surgeon.py # Persona 5
│   └── gatekeeper.py          # Persona 6
└── orchestration/
    └── pipeline.py            # Full DAG
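
config.py above is described as a single Pydantic settings module; a minimal sketch of what that might contain, with field names and defaults inferred from the CLI flags and endpoints in this README (they are assumptions, not the project's actual schema):

from pydantic import BaseModel

class Settings(BaseModel):
    """Illustrative settings object; the real config.py defines the project's own fields."""
    ollama_url: str = "http://localhost:11434"
    hf_token: str | None = None       # mirrors the --hf-token flag / HF_TOKEN env var
    batch_size: int = 5               # mirrors --batch-size
    dry_run: bool = True              # mirrors --dry-run
    data_dir: str = "data"            # where operators.csv and the fields CSV live
    factor_store_path: str = "factors.duckdb"

settings = Settings()  # values can equally come from env vars or a file before validation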

Setup

  1. Install uv: https://docs.astral.sh/uv/getting-started/installation/
  2. uv sync
  3. Install Ollama: https://ollama.ai
  4. Pull models: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
  5. Place your operators.csv and fields_USA_TOP3000_D1.csv in data/ (see the whitelist sketch after these steps)
  6. Run: uv run python -m alpha_factory.run --dry-run --interactive
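
Step 5 matters because the static lint can only validate expressions against operators it knows about. A hedged sketch of that whitelist check, assuming operators.csv has a name column (the functions below are illustrative, not the real deterministic/lint.py API):

import csv
import re
from pathlib import Path

def load_operator_whitelist(path: str = "data/operators.csv") -> set[str]:
    """Read the BRAIN operator list from data/ (assumes a 'name' column)."""
    with Path(path).open(newline="") as f:
        return {row["name"].strip() for row in csv.DictReader(f)}

def unknown_operators(expression: str, whitelist: set[str]) -> set[str]:
    """Return function-like tokens in a candidate expression that are not whitelisted."""
    called = set(re.findall(r"([A-Za-z_][A-Za-z_0-9]*)\s*\(", expression))
    return called - whitelist

Any non-empty result means the expression is rejected locally, before a single BRAIN credit is spent.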

Cost

Item                       Cost
Local GPU (RTX 3090/4090)  $0 (already owned)
BRAIN account              $0 (existing)
uv + Ollama + all deps     $0
Monthly running cost       $0

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gaurv007/alpha-factory"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
