---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---

# Cancer@Home v2

A distributed computing platform for cancer genomics research that combines BOINC volunteer computing, analysis of data from the NCI Genomic Data Commons (GDC), sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

## Quick Start (5 minutes)

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8 GB RAM minimum

### Installation

1. **Clone the repository, then set up a virtual environment**
   ```bash
   cd CancerAtHome2
   python -m venv venv
   venv\Scripts\activate        # Windows
   source venv/bin/activate     # macOS/Linux
   pip install -r requirements.txt
   ```

2. **Start the Neo4j database**
   ```bash
   docker-compose up -d
   ```

3. **Run the application**
   ```bash
   python run.py
   ```

4. **Open your browser**
   - Application: http://localhost:5000
   - Neo4j Browser: http://localhost:7474 (username: `neo4j`, password: `cancer123`)
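
Before opening the browser, it can help to confirm that both services are accepting connections. The following is a minimal standard-library sketch; it assumes the default ports listed above (5000 and 7474) and only checks that a TCP connection succeeds, not that the application or Neo4j has finished initializing.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all raise OSError subclasses.
        return False


if __name__ == "__main__":
    # Ports from the Quick Start; adjust if your setup uses different ones.
    services = {"Application": 5000, "Neo4j Browser": 7474}
    for name, port in services.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```

Saved as a script (the file name, for example `check_services.py`, is up to you), it prints one line per service.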

## Features

### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
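
Real-time status tracking implies that every submitted task moves through a small, well-defined lifecycle. The sketch below models that lifecycle in memory. The `TaskTracker` class, the `TaskState` values, and the transition rules are hypothetical illustrations, not the project's `backend.boinc` API and not BOINC's own internal workunit states.

```python
import itertools
from enum import Enum
from typing import Dict


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Hypothetical transition rules; COMPLETED and FAILED are terminal.
ALLOWED_TRANSITIONS = {
    TaskState.QUEUED: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


class TaskTracker:
    """Hypothetical in-memory registry mapping task IDs to their current state."""

    def __init__(self) -> None:
        self._states: Dict[str, TaskState] = {}
        self._counter = itertools.count(1)

    def submit(self, workunit_type: str) -> str:
        task_id = f"{workunit_type}-{next(self._counter)}"
        self._states[task_id] = TaskState.QUEUED
        return task_id

    def update(self, task_id: str, new_state: TaskState) -> None:
        current = self._states[task_id]  # KeyError for unknown task IDs
        if new_state not in ALLOWED_TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current.value} -> {new_state.value}")
        self._states[task_id] = new_state

    def status(self, task_id: str) -> TaskState:
        return self._states[task_id]


tracker = TaskTracker()
task_id = tracker.submit("variant_calling")
tracker.update(task_id, TaskState.RUNNING)
tracker.update(task_id, TaskState.COMPLETED)
print(task_id, tracker.status(task_id).value)  # prints: variant_calling-1 completed
```

Rejecting illegal transitions (for example, `COMPLETED` back to `RUNNING`) surfaces inconsistent status updates early instead of silently overwriting a finished task.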

### 2. **GDC Data Integration**
- Download cancer genomics data from the GDC Data Portal
- Support for multiple research programs (e.g., TCGA, TARGET) spanning many cancer types
- Automatic data parsing and normalization
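
GDC data can be located through the public GDC REST API at `https://api.gdc.cancer.gov`, whose endpoints accept a JSON-encoded `filters` parameter. The sketch below only builds a query URL for the `files` endpoint using the standard library. The filter fields (`cases.project.project_id`, `data_category`), the requested `fields`, and the defaults are assumptions for illustration; check them against the GDC API documentation, and note that the project's own client in `backend/gdc/` may build requests differently.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def gdc_files_params(project_id: str, data_category: Optional[str] = None,
                     size: int = 10) -> Dict[str, str]:
    """Query parameters for the GDC /files endpoint, filtered by project and optional category."""
    conditions = [
        {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
    ]
    if data_category is not None:
        conditions.append(
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}}
        )
    # A single condition is used as-is; several are combined with an "and" operator.
    filters: Dict[str, Any] = (
        conditions[0] if len(conditions) == 1 else {"op": "and", "content": conditions}
    )
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_category,data_type",
        "format": "JSON",
        "size": str(size),
    }


def gdc_files_url(project_id: str, data_category: Optional[str] = None, size: int = 10) -> str:
    return f"{GDC_FILES_ENDPOINT}?{urlencode(gdc_files_params(project_id, data_category, size))}"


# Fetching requires network access, for example:
#   import urllib.request
#   with urllib.request.urlopen(gdc_files_url("TCGA-BRCA", "Simple Nucleotide Variation", 5)) as resp:
#       hits = json.load(resp)["data"]["hits"]
print(gdc_files_url("TCGA-BRCA", "Simple Nucleotide Variation", size=5))
```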

### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
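
A FASTQ file stores each read as four lines: an `@` header, the sequence, a `+` separator, and one quality character per base. A minimal parser, assuming uncompressed, unwrapped four-line records and Phred+33 quality encoding (the Sanger / Illumina 1.8+ convention), could look like this. It is an illustration rather than the project's pipeline code.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines, assuming 4 lines per record (no line wrapping)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # skip blank lines between records
        try:
            sequence, plus, quality = next(stripped), next(stripped), next(stripped)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


sample = ["@read1\n", "ACGTN\n", "+\n", "IIII#\n"]
for rec in parse_fastq(sample):
    print(rec.read_id, rec.sequence, round(mean_phred(rec.quality), 1))  # read1 ACGTN 32.4
```

For gzip-compressed input, the same generator works on `gzip.open(path, "rt")`, since it only needs an iterable of text lines.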

### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
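
One way to load the Gene → Mutation → Patient → Cancer Type chain is a parameterized Cypher `MERGE` statement, which lets an import be re-run without creating duplicate nodes. The labels, property keys, and relationship types below (`HAS_MUTATION`, `OBSERVED_IN`, `DIAGNOSED_WITH`) are a hypothetical schema, not necessarily what `backend/neo4j/` uses. The small `missing_parameters` helper catches query parameters that were not supplied before the query reaches the database.

```python
import re
from typing import Any, Dict, Mapping, Set

# Hypothetical schema: labels, property keys, and relationship types are illustrative.
MERGE_MUTATION_QUERY = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
  ON CREATE SET m.position = $position, m.consequence = $consequence
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:OBSERVED_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""

_PARAM_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_parameters(query: str, params: Mapping[str, Any]) -> Set[str]:
    """Return the $parameters referenced in `query` that `params` does not supply."""
    return set(_PARAM_PATTERN.findall(query)) - set(params)


params: Dict[str, Any] = {
    "gene": "TP53",
    "mutation_id": "mutation-001",  # placeholder values for illustration
    "position": 12345,
    "consequence": "missense_variant",
    "patient_id": "patient-001",
    "cancer_type": "Breast Invasive Carcinoma",
}
assert not missing_parameters(MERGE_MUTATION_QUERY, params)

# Executing it with the official neo4j Python driver (requires the running Neo4j
# container; assumes Bolt is exposed on its default port 7687):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "cancer123"))
#   with driver.session() as session:
#       session.run(MERGE_MUTATION_QUERY, params)
#   driver.close()
```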

### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, or patient cohort
- Aggregate statistics

### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
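
A mutation frequency chart essentially counts how many patients carry at least one mutation in each gene. The sketch below assumes the input is a list of `(patient_id, gene)` pairs (a hypothetical shape, such as rows returned by a graph query) and takes the cohort size separately, because patients with no recorded mutations do not appear in the pairs but still belong in the denominator.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def gene_mutation_frequency(
    pairs: Iterable[Tuple[str, str]], cohort_size: int
) -> List[Tuple[str, int, float]]:
    """Return (gene, patients_mutated, fraction_of_cohort), most frequent first.

    `pairs` holds (patient_id, gene) tuples; duplicates are collapsed so a
    patient with several mutations in one gene is counted once for that gene.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    counts = Counter(gene for _, gene in set(pairs))
    # Sort by count (descending), then gene name, so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n, n / cohort_size) for gene, n in ranked]


# Illustrative rows only, not real cohort data.
rows = [("p1", "TP53"), ("p1", "TP53"), ("p2", "TP53"), ("p2", "KRAS"), ("p3", "PIK3CA")]
for gene, n, frac in gene_mutation_frequency(rows, cohort_size=4):
    print(f"{gene}: {n} ({frac:.0%} of cohort)")
```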

## Architecture

```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```

## Project Structure

```
CancerAtHome2/
├── backend/
│   ├── api/                # FastAPI routes
│   ├── boinc/              # BOINC integration
│   ├── gdc/                # GDC data fetcher
│   ├── neo4j/              # Neo4j database layer
│   ├── pipeline/           # Bioinformatics pipeline
│   └── graphql/            # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/     # React components
│       ├── views/          # Page views
│       └── api/            # API client
├── data/                   # Downloaded datasets
├── docker-compose.yml      # Neo4j container
├── requirements.txt        # Python dependencies
└── run.py                  # Main entry point
```

## Data Flow

1. **Data Ingestion**: Download cancer genomics data from the GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on the distributed BOINC network
3. **Storage**: Store results in the Neo4j graph database
4. **Visualization**: Query and visualize results via the web dashboard
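
For the processing stage, BLAST+ programs such as `blastn` read FASTA rather than FASTQ, so reads are usually converted before alignment. The sketch below renders FASTA text and assembles a `blastn` argument list from standard BLAST+ options (`-query`, `-db`, `-evalue`, `-outfmt 6`, `-num_threads`, `-out`). The database name and file paths are placeholders, and how the project packages this step into BOINC workunits is not shown.

```python
import shlex
from typing import Iterable, List, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render (read_id, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines: List[str] = []
    for read_id, sequence in records:
        lines.append(f">{read_id}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def blastn_command(query_fasta: str, db: str, out_path: str,
                   evalue: float = 1e-5, threads: int = 1) -> List[str]:
    """Argument list for a BLAST+ `blastn` search with tabular (-outfmt 6) output."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", out_path,
    ]


fasta_text = to_fasta([("read1", "ACGTACGTAC")], width=4)
cmd = blastn_command("reads.fasta", "reference_db", "hits.tsv")  # placeholder paths
print(fasta_text, end="")
print(shlex.join(cmd))
# With BLAST+ installed and the database built, the search could be run with
#   subprocess.run(cmd, check=True)
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids shell-quoting problems with unusual file names.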

## Configuration

Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
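
The keys inside `config.yml` are not documented here, so the structure below is a hypothetical example covering the four areas listed above. A common pattern is to keep built-in defaults and deep-merge the user's settings over them, so a config file only needs to contain the values it changes. Parsing the YAML file itself would typically use PyYAML's `yaml.safe_load`; the sketch keeps to plain dictionaries so it runs with the standard library alone.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults: the section and key names are illustrative, not the
# project's actual config.yml schema.
DEFAULT_CONFIG: Dict[str, Any] = {
    "neo4j": {"uri": "bolt://localhost:7687", "user": "neo4j", "password": "cancer123"},
    "gdc": {"api_url": "https://api.gdc.cancer.gov", "page_size": 100},
    "boinc": {"project_url": None},
    "pipeline": {"blast_evalue": 1e-5, "min_mean_quality": 20},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# In practice the overrides would be parsed from config.yml, e.g. with PyYAML:
#     with open("config.yml") as fh:
#         user_overrides = yaml.safe_load(fh) or {}
user_overrides = {"neo4j": {"password": "change-me"}, "pipeline": {"blast_evalue": 1e-10}}
config = deep_merge(DEFAULT_CONFIG, user_overrides)
print(config["neo4j"])
```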

## Usage Examples

### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
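
The same query can be sent from Python. This sketch uses only `urllib` and rewrites the query to take the gene as a variable rather than a hard-coded string. It assumes the GraphQL endpoint is served at `http://localhost:5000/graphql` and follows the common GraphQL-over-HTTP convention of a JSON POST body with `query` and `variables` keys; the `String!` type for `$gene` is also an assumption about the schema in `backend/graphql/`.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GRAPHQL_URL = "http://localhost:5000/graphql"  # assumed endpoint path

MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(query: str, variables: Optional[Dict[str, Any]] = None,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying a GraphQL query and its variables."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_graphql(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and return the `data` field, raising if the server reports errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"]


request = build_graphql_request(MUTATIONS_BY_GENE, {"gene": "TP53"})
# With the application running:
#   data = run_graphql(request)
#   for mutation in data["mutations"]:
#       print(mutation["id"], mutation["consequence"])
```

Passing the gene through `variables` instead of string formatting keeps user input out of the query text itself.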

### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Create a client for the BOINC project
client = BOINCClient()

# Submit a variant-calling workunit for a local FASTQ file
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
print(f"Submitted task: {task_id}")
```

## Inspired By

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling

## License

MIT License

## Support

For issues or questions, please open an issue on Hugging Face or GitHub.