| # Model Card: DocuMind Enterprise RAG System | |
| ## Model Details | |
| - **Architecture**: Retrieval-Augmented Generation (RAG) | |
| - **Embedding Model**: `sentence-transformers/all-MiniLM-L6-v2` (local Hugging Face model) | |
| - **Reranker Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2` (local Hugging Face model) | |
| - **Generation Model**: `llama-3.1-8b-instant` (Provided remotely via Groq) | |
| - **Vector Database**: ChromaDB (SQLite-backed local instance) | |
| ## Intended Use | |
| This system is intended as an internal enterprise assistant. Its primary function is to answer employee, legal, and operational inquiries by surfacing facts *strictly* from the documents provided. | |
| ## Document Parsing Capabilities | |
| - **Supported Formats**: `.pdf`, `.docx`, `.txt` | |
| - **Chunking Profile**: 512-character chunks with a 64-character overlap. Paragraph boundaries are preserved where possible to avoid losing semantic context. | |
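The chunking profile above can be illustrated with a minimal, dependency-free sketch. It is not the DocuMind implementation: the function name `chunk_text` and the packing strategy are assumptions made for illustration. Here whole paragraphs are packed into chunks of at most 512 characters, and the 64-character overlap is applied only when a single paragraph is too long and has to be split with a sliding window.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Hypothetical paragraph-first chunker (illustrative only)."""
    # Treat blank lines as paragraph boundaries.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""

    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size:
            # The paragraph still fits: keep it together with its neighbours.
            current = candidate
            continue

        if current:
            chunks.append(current)

        if len(para) > chunk_size:
            # Oversized paragraph: slide a window so that consecutive
            # pieces share `overlap` characters.
            step = chunk_size - overlap
            stop = max(len(para) - overlap, 1)
            windows = [para[i:i + chunk_size] for i in range(0, stop, step)]
            chunks.extend(windows[:-1])
            current = windows[-1]
        else:
            current = para

    if current:
        chunks.append(current)
    return chunks
```

Short paragraphs stay intact in this sketch, so a retrieved chunk usually begins and ends at a natural boundary. Only long paragraphs pay the cost of mid-sentence cuts, and the overlap softens those.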
| ## Ethical Considerations & Limitations | |
| - **Hallucination Mitigation**: The generation model is instructed to answer "I don't know" if the provided context does not contain the answer. Every response is returned together with its sources. | |
| - **Data Privacy**: Ingested documents remain on-device/in-network within the ChromaDB instance. However, user queries and the retrieved context are sent to the Groq API. For strictly confidential environments, replacing Groq with a locally hosted Llama/Mistral node is required. | |
| - **Top-K Limit**: The system retrieves the 5 chunks most similar to the query, reranks them with a CrossEncoder, and passes the top 3 to the LLM. Questions whose answers are spread across many documents (e.g. "summarize all 50 documents") will yield partial or missing answers. | |
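The retrieve, rerank, and prompt steps described in the list above can be sketched as follows. This is a hedged illustration, not the system's actual code: `rerank_top_n`, `build_prompt`, `toy_overlap_scorer`, and the prompt wording are all hypothetical. The scorer is injected as a function so the example runs without downloading a model; a word-overlap scorer stands in for the cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank_top_n(query: str, candidates: List[str],
                 score_pairs: Scorer, top_n: int = 3) -> List[str]:
    """Score each (query, chunk) pair and keep the best `top_n` chunks."""
    pairs = [(query, chunk) for chunk in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]


def build_prompt(query: str, chunks: List[str]) -> str:
    """Hypothetical prompt: numbered sources plus an explicit refusal rule."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered context below and cite the source "
        "numbers you used. If the context does not contain the answer, "
        "reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in for a cross-encoder: counts words shared by query and chunk."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


# Pretend these are the 5 chunks returned by the vector search.
retrieved = [
    "Employees get 20 vacation days per year.",
    "The office is closed on public holidays.",
    "Expense reports are due monthly.",
    "Vacation days must be approved by a manager.",
    "Parking is available in lot B.",
]
question = "how many vacation days do employees get"
top_chunks = rerank_top_n(question, retrieved, toy_overlap_scorer)
prompt = build_prompt(question, top_chunks)
```

With the sentence-transformers library, the `predict` method of a `CrossEncoder` loaded with `cross-encoder/ms-marco-MiniLM-L-6-v2` accepts the same list of (query, passage) pairs and could be passed as `score_pairs`. Because only three chunks reach the prompt, any fact outside them is invisible to the LLM, which is why broad "summarize everything" questions degrade.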