
Deployment Guide: Snapdragon X ARM Laptop (16GB RAM)

Your Hardware

  • Snapdragon X 10-core X1P64100 @ 3.40 GHz
  • 16 GB RAM (ARM-based, no discrete GPU)
  • Windows 11 ARM64

What Works on Your Machine

The Full Application (No GPU Needed)

git clone https://huggingface.co/nkshirsa/phd-research-os-brain
cd phd-research-os-brain
pip install gradio pymupdf
python -m phd_research_os_v2.app

This runs the entire UI on CPU: paper ingestion, section-aware parsing, claim extraction (heuristic mode), knowledge graph, conflict detection, scoring, and Obsidian export.

Local LLM Inference via Ollama

Ollama runs natively on Windows ARM64 (v0.1.29 and later).

# Install Ollama (Windows ARM64 supported)
winget install Ollama.Ollama

# Pull a quantized model that fits in 16GB RAM
# Option 1: Qwen2.5 3B (Q4_K_M ≈ 2.5GB RAM)
ollama pull qwen2.5:3b

# Option 2: Phi-3 Mini (Q4_K_M ≈ 2.5GB RAM) - good at structured output
ollama pull phi3:mini

# Option 3: If you want to push it, Qwen2.5 7B (Q4_K_M ≈ 5GB RAM)
ollama pull qwen2.5:7b

# Verify it works
ollama run qwen2.5:3b "Extract claims from: The LOD was 0.8 fM."

RAM budget with Ollama running:

Windows + apps:     ~6 GB
Ollama + 3B model:  ~3 GB
Research OS app:    ~1 GB
ChromaDB:           ~0.5 GB
───────────────────────────
Total:              ~10.5 GB of 16 GB (comfortable)

With a 7B model the budget is tighter (~13 GB total) but still workable if you close other apps.
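
Before pulling the 7B model, it is worth checking how much RAM is actually free. This PowerShell one-liner is not part of the repo; it works because FreePhysicalMemory is reported in kilobytes, so dividing by PowerShell's 1MB constant gives gigabytes:

# Free physical memory in GB
(Get-CimInstance Win32_OperatingSystem).FreePhysicalMemory / 1MB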

Connect Ollama to Research OS

Set this environment variable before launching:

$env:OLLAMA_BASE_URL = "http://localhost:11434"
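
To confirm the app will actually be able to reach Ollama at that address, you can query the API directly. This uses Ollama's standard /api/tags endpoint, which lists the models you have pulled:

# Should return JSON listing qwen2.5:3b (or whichever model you pulled)
Invoke-RestMethod "http://localhost:11434/api/tags"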

Or use the API fallback (cloud, no local model needed):

$env:ANTHROPIC_API_KEY = "sk-ant-..."
# OR
$env:OPENAI_API_KEY = "sk-..."
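
Note that $env: variables only last for the current PowerShell session. If you want the key available in every new terminal, write it to the user environment with setx (it takes effect in terminals opened afterwards):

setx ANTHROPIC_API_KEY "sk-ant-..."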

What Requires Cloud

Model Training

Your machine cannot run training. Use one of:

  1. ZeroGPU Space: nkshirsa/phd-research-os-train (requires HF PRO, $9/month, for ZeroGPU access)
  2. Google Colab: Free T4 GPU. Upload train.py and run.
  3. Any cloud GPU: Lambda, Vast.ai, RunPod; run python train.py (see the sketch below)
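
For option 3, the sequence is just a clone plus a training run. This is a sketch assuming train.py sits at the repo root and the machine already has a CUDA build of PyTorch and the training dependencies installed:

# On a Lambda / Vast.ai / RunPod instance
git clone https://huggingface.co/nkshirsa/phd-research-os-brain
cd phd-research-os-brain
python train.py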

After training, the LoRA adapter gets pushed to the Hub. Then pull it into Ollama:

# Convert trained adapter to GGUF (on a machine with GPU)
# Then: ollama create research-os-brain -f Modelfile
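
A minimal Modelfile sketch for that create step. The GGUF filename and the system prompt below are placeholders, not files shipped by the repo:

# Modelfile
FROM ./research-os-brain-q4_k_m.gguf
PARAMETER temperature 0.2
SYSTEM "Extract structured claims from scientific text."

After ollama create research-os-brain -f Modelfile, the fine-tuned model is available via ollama run research-os-brain and through the same OLLAMA_BASE_URL the app already uses.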

Marker PDF Parser (Optional Upgrade)

Marker requires PyTorch, which is heavy on ARM. The built-in PyMuPDF parser works fine for most papers. Install Marker only if you need better layout detection:

pip install marker-pdf  # May be slow on ARM without GPU

Recommended Daily Workflow

  1. Start Ollama (runs in background): ollama serve
  2. Start Research OS: python -m phd_research_os_v2.app
  3. Open browser: http://localhost:7860
  4. Phase 1: Upload PDFs → system parses into sections
  5. Phase 2: Extract claims (Ollama-powered or heuristic)
  6. Phase 3: Build knowledge graph
  7. Phase 4: Detect conflicts
  8. Phase 5: Rescore with code-computed confidence
  9. Export: Obsidian vault or CSV/JSON download
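
Steps 1 and 2 (plus the OLLAMA_BASE_URL setting from earlier) can be wrapped in one small PowerShell helper. This is a convenience sketch (start-research-os.ps1 is a hypothetical name, not a script in the repo); it assumes ollama is on PATH, which the winget install takes care of:

# start-research-os.ps1
Start-Process ollama -ArgumentList "serve" -WindowStyle Hidden   # Ollama in the background
Start-Sleep -Seconds 3                                           # give it a moment to bind port 11434
$env:OLLAMA_BASE_URL = "http://localhost:11434"
python -m phd_research_os_v2.app                                 # UI at http://localhost:7860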