RAG Knowledge Base

Drop reference documents here (.md, .txt, or .pdf). They are ingested by python -m src.rag.ingest at Docker build time and surfaced to the orchestrator agent via the retrieve_context tool. The container entrypoint also rebuilds the index at startup when a mounted data/ volume does not already contain data/processed/faiss_index/.
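
The startup check described above could look roughly like the sketch below. The index path and the ingest command come from this README; the function names are hypothetical, and treating an empty directory as "missing" is an assumption, not something the actual entrypoint is known to do.

```python
import subprocess
import sys
from pathlib import Path

# Path taken from this README; everything else here is illustrative.
INDEX_DIR = Path("data/processed/faiss_index")


def index_missing(index_dir: Path = INDEX_DIR) -> bool:
    """Return True when the index directory is absent or empty (assumed rule)."""
    return not index_dir.is_dir() or not any(index_dir.iterdir())


def ensure_index(index_dir: Path = INDEX_DIR) -> bool:
    """Run the ingest module only if no index is found.

    Returns True if a rebuild was triggered, False if an index already existed.
    """
    if not index_missing(index_dir):
        return False
    # Same command this README gives for manual re-indexing.
    subprocess.run([sys.executable, "-m", "src.rag.ingest"], check=True)
    return True
```

Calling ensure_index() from the repository root before the app starts would skip the rebuild whenever a mounted volume already carries an index.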

Recommended seed set

For a clinical-ML / NeuroBridge demo:

BBB / molecules: Lipinski's Rule of Five (1997, 2001), Pajouhesh & Lenz on medicinal chemical properties of successful CNS drugs (2005)
MRI / harmonization: Fortin et al. ComBat for diffusion (2017), Fortin et al. ComBat for cortical thickness (2018), Johnson et al. original ComBat (2007, gene expression)
EEG / artifacts: Hyvärinen ICA primer (1999), MNE-Python overview (Gramfort 2013)

Format notes

PDFs work via pypdf. Image-only PDFs (scans with no text layer) won't yield any text; run OCR on them first.
Markdown is preferred: full text with headers chunks cleanly.
Files in this directory are gitignored by default. In production, mount them via a Docker volume, or COPY them in via a sub-path before the RUN ingest line.
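
To spot PDFs that need OCR before they reach the ingest step, a quick pre-check along these lines may help. It is a minimal sketch: it uses pypdf's PdfReader and extract_text, while the helper names, the 20-character threshold, and the data/knowledge_base path in the usage loop are assumptions for illustration, not part of the project's ingest code.

```python
from pathlib import Path
from typing import Iterable, List


def page_texts(pdf_path: Path) -> List[str]:
    """Extract text page by page with pypdf (imported lazily)."""
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() gives an empty string for pages without a text layer.
    return [page.extract_text() or "" for page in reader.pages]


def has_text_layer(texts: Iterable[str], min_chars: int = 20) -> bool:
    """Heuristic: treat a PDF as extractable if it yields at least min_chars
    non-whitespace-trimmed characters in total. The threshold is arbitrary."""
    total = sum(len(t.strip()) for t in texts)
    return total >= min_chars


if __name__ == "__main__":
    # Hypothetical location of the knowledge base; adjust to your layout.
    for pdf in sorted(Path("data/knowledge_base").glob("*.pdf")):
        if not has_text_layer(page_texts(pdf)):
            print(f"{pdf.name}: no extractable text, run OCR first")
```

Running the script from the repository root lists the PDFs that would contribute nothing to the index as they stand.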

Re-indexing

After adding/removing files, re-run:

python -m src.rag.ingest

This rewrites data/processed/faiss_index/ from scratch. There is no incremental update; the index is small enough to rebuild in seconds.