# Model Card: DocuMind Enterprise RAG System

## Model Details

- **Architecture**: Retrieval-Augmented Generation (RAG)
- **Embedding Model**: `sentence-transformers/all-MiniLM-L6-v2` (local HuggingFace model)
- **Reranker Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2` (local HuggingFace model)
- **Generation Model**: `llama-3.1-8b-instant` (served remotely via Groq)
- **Vector Database**: ChromaDB (SQLite-backed local instance)

## Intended Use

This system is intended as an internal enterprise assistant. Its primary function is to answer employee, legal, and operational inquiries by surfacing facts *strictly* from the documents provided.

## Document Parsing Capabilities

- **Supported Formats**: `.pdf`, `.docx`, `.txt`
- **Chunking Profile**: 512 characters with a 64-character overlap, splitting on paragraph boundaries where possible to preserve semantic context.

## Ethical Considerations & Limitations

- **Hallucination Mitigation**: The generation model is strictly prompted to answer "I don't know" if the provided context does not contain the answer. Every response is returned together with its explicit sources.
- **Data Privacy**: Ingested documents are stored on-device/in-network in the ChromaDB instance. However, each query and the context chunks retrieved for it are sent to the Groq API. For strictly confidential environments, Groq must be replaced with a locally hosted Llama/Mistral node.
- **Top-K Limit**: The system retrieves the 5 chunks most similar to the query, reranks them with a CrossEncoder, and passes the top 3 to the LLM. Questions whose answers are spread across many documents (e.g. "summarize all 50 documents") will receive partial or missing answers.
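
The Top-K limit described above can be illustrated with a minimal sketch of the two-stage flow (retrieve 5, rerank, keep 3). This is not the system's actual code: the function `retrieve_then_rerank` and the two toy scorers are hypothetical stand-ins. In the real pipeline the first stage would presumably be a ChromaDB similarity search over `all-MiniLM-L6-v2` embeddings and the second stage the `cross-encoder/ms-marco-MiniLM-L-6-v2` reranker.

```python
from typing import Callable, List, Tuple

# A scorer takes (query, chunk) and returns a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    retrieval_score: Scorer,
    rerank_score: Scorer,
    k_retrieve: int = 5,  # candidates pulled from the vector store
    k_final: int = 3,     # chunks finally passed to the LLM
) -> List[Tuple[str, float]]:
    """Return the k_final best chunks as (chunk, rerank_score) pairs."""
    # Stage 1: cheap similarity over every chunk; keep only the top k_retrieve.
    candidates = sorted(
        chunks, key=lambda c: retrieval_score(query, c), reverse=True
    )[:k_retrieve]
    # Stage 2: costlier pairwise scoring, applied only to the shortlist.
    rescored = [(c, rerank_score(query, c)) for c in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k_final]


# Toy scorers (assumptions for illustration only, not the real models).
def word_overlap(query: str, chunk: str) -> float:
    """Stand-in for embedding similarity: number of shared words."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


def overlap_ratio(query: str, chunk: str) -> float:
    """Stand-in for the cross-encoder: shared words / chunk length."""
    words = set(chunk.lower().split())
    shared = set(query.lower().split()) & words
    return len(shared) / len(words) if words else 0.0


chunks = [
    "vacation policy allows 20 days",
    "vacation requests need manager approval",
    "the cafeteria opens at nine",
    "vacation policy",
    "parking permits renew yearly",
    "vacation days carry over to next year policy details listed here",
    "security badges must be visible",
    "policy on remote work",
]

top = retrieve_then_rerank("vacation policy", chunks, word_overlap, overlap_ratio)
for chunk, score in top:
    print(f"{score:.2f}  {chunk}")
```

Whatever scorers are plugged in, at most `k_final` chunks reach the generation model. That is why a question whose answer is spread over dozens of documents can only be answered partially.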