# Model Card: DocuMind Enterprise RAG System

## Model Details
- Architecture: Retrieval-Augmented Generation (RAG)
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2` (local Hugging Face model)
- Reranker Model: `cross-encoder/ms-marco-MiniLM-L-6-v2` (local Hugging Face model)
- Generation Model: `llama-3.1-8b-instant` (served remotely via Groq)
- Vector Database: ChromaDB (SQLite-backed local instance)
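
As a rough illustration, the components listed here, together with the chunking and top-k values stated elsewhere in this card, could be gathered into a single settings object. This is only a sketch: the class name `PipelineConfig` and its field names are hypothetical and are not taken from the DocuMind codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings object; field names are illustrative only."""

    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    generation_model: str = "llama-3.1-8b-instant"
    chunk_size: int = 512      # characters per chunk
    chunk_overlap: int = 64    # characters shared by neighbouring chunks
    retrieve_k: int = 5        # chunks pulled from the vector store
    rerank_k: int = 3          # chunks passed on to the LLM

    def __post_init__(self) -> None:
        # Reject combinations that cannot describe a working pipeline.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
        if not 0 < self.rerank_k <= self.retrieve_k:
            raise ValueError("rerank_k must be between 1 and retrieve_k")
```

Keeping these values in one validated place makes it harder for the reranker to be asked for more chunks than the retriever returns.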

## Intended Use
This system is intended as an internal enterprise assistant. Its primary function is to answer employee, legal, and operational questions using only facts found in the ingested documents.

## Document Parsing Capabilities
- Supported Formats: `.pdf`, `.docx`, `.txt`
- Chunking Profile: 512-character chunks with a 64-character overlap, preferring paragraph boundaries to preserve semantic context.
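
The chunking profile can be sketched as follows. This is a minimal standard-library illustration, not the splitter DocuMind actually uses. The function names are hypothetical, and it assumes that paragraphs are separated by blank lines and that the 64-character overlap is applied only where an oversized paragraph has to be cut.

```python
import re

CHUNK_SIZE = 512     # maximum characters per chunk (from this card)
CHUNK_OVERLAP = 64   # characters repeated between neighbouring windows


def split_long(text: str, size: int, overlap: int) -> list[str]:
    """Cut one oversized paragraph into overlapping fixed-size windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Pack whole paragraphs into chunks of at most `size` characters."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= size:
            current = candidate          # paragraph still fits, keep packing
            continue
        if current:
            chunks.append(current)       # close the chunk built so far
        if len(para) <= size:
            current = para
        else:
            # Assumption: only paragraphs longer than `size` are cut mid-text.
            pieces = split_long(para, size, overlap)
            chunks.extend(pieces[:-1])
            current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks
```

Short paragraphs are never split, so a chunk boundary falls mid-paragraph only when a single paragraph exceeds 512 characters.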

## Ethical Considerations & Limitations
- Hallucination Mitigation: The generation model is prompted to answer "I don't know" when the provided context does not contain the answer. Every response is returned together with the sources it was drawn from.
- Data Privacy: Ingested documents remain on-device/in-network within the ChromaDB instance. However, each request, including the retrieved context, is sent to the Groq API. For strictly confidential environments, Groq must be replaced with a locally hosted Llama/Mistral model.
- Top-K Limit: The system retrieves the 5 most similar chunks, reranks them with a CrossEncoder, and passes the top 3 to the LLM. Questions whose answers are spread across many documents (e.g. "summarize all 50 documents") will yield partial or missing answers.
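
The retrieve, rerank, and prompt steps described in this list can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `Chunk` type, the function names, the word-overlap scorer standing in for the CrossEncoder, and the prompt wording are hypothetical, and cosine similarity over in-memory vectors stands in for the ChromaDB query.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, Sequence

RETRIEVE_K = 5   # chunks taken from the vector search
RERANK_K = 3     # chunks kept after reranking


@dataclass(frozen=True)
class Chunk:
    source: str                  # file the chunk came from
    text: str
    vector: tuple[float, ...]    # toy embedding, not a real MiniLM vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: Sequence[Chunk],
             k: int = RETRIEVE_K) -> list[Chunk]:
    """Stage 1: keep the k chunks closest to the query embedding."""
    return sorted(index, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:k]


def word_overlap(query: str, text: str) -> float:
    """Stand-in scorer (NOT a cross-encoder): counts shared lowercase words."""
    def tokens(s: str) -> set[str]:
        return set(re.findall(r"\w+", s.lower()))
    return float(len(tokens(query) & tokens(text)))


def rerank(query: str, candidates: Sequence[Chunk],
           score_pair: Callable[[str, str], float],
           k: int = RERANK_K) -> list[Chunk]:
    """Stage 2: rescore each (query, chunk) pair and keep the best k."""
    return sorted(candidates, key=lambda c: score_pair(query, c.text), reverse=True)[:k]


def build_prompt(question: str, chunks: Sequence[Chunk]) -> str:
    """Assemble a grounded prompt; the wording is illustrative only."""
    context = "\n\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the context below. "
        'If the context does not contain the answer, reply "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because only three chunks reach the prompt, a question that depends on more than three passages cannot be fully answered no matter how well the generation model performs, which is the limitation the last bullet describes.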
