pkgprateek commited on
Commit
e054490
Β·
1 Parent(s): a864c4e

Fix Openrouter model settings

Browse files
Files changed (1) hide show
  1. README.md +188 -37
README.md CHANGED
@@ -11,67 +11,218 @@ pinned: false
11
 
12
  # AI Document Intelligence System
13
 
14
- Upload documents and ask questions using advanced RAG (Retrieval-Augmented Generation) technology. Built with:
15
- - **LangChain** for RAG orchestration
16
- - **ChromaDB** for vector storage
17
- - **BAAI/bge-small-en-v1.5** embeddings for superior retrieval quality
18
- - **Meta Llama 3.2** via HuggingFace Inference API
19
- - **Gradio** for interactive UI
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
 
21
  ## Features
22
- - Interactive document processing (PDF, DOCX, TXT)
23
- - Context-aware question answering with improved embeddings
24
- - ⚑ Real-time processing and analysis
25
- - Source citation for transparency
26
- - Cloud-ready deployment on HuggingFace Spaces
27
 
28
- ## Setup
 
 
 
 
 
 
 
 
29
 
30
- ### 1. Get HuggingFace Token
31
- 1. Create a free account at [HuggingFace](https://huggingface.co/join)
32
- 2. Go to [Settings β†’ Access Tokens](https://huggingface.co/settings/tokens)
33
- 3. Create a new token with **READ** access
34
- 4. Copy the token
35
 
36
- ### 2. Local Installation
37
 
38
  ```bash
39
- # Clone the repository
40
  git clone https://github.com/pkgprateek/ai-rag-document.git
41
  cd ai-rag-document
42
 
43
  # Create virtual environment
44
  python -m venv venv
45
- source venv/bin/activate # On Windows: venv\Scripts\activate
46
 
47
  # Install dependencies
48
  pip install -r requirements.txt
49
 
50
- # Set up environment variables
51
  cp .env.example .env
52
- # Edit .env and add your HF_TOKEN
 
 
 
 
 
 
 
 
53
 
54
- # Run the application
 
 
55
  python app/main.py
56
  ```
57
 
58
- ### 3. Deploy to HuggingFace Spaces
 
 
 
 
 
 
 
 
 
 
 
 
 
 
59
 
60
- 1. **Fork or upload this repo to HuggingFace Spaces**
61
- 2. **Add your HF_TOKEN as a Space Secret:**
62
- - Go to your Space Settings β†’ Repository secrets
63
- - Add a new secret: `HF_TOKEN` = your token
64
- 3. **Your app will automatically deploy!**
 
 
 
 
65
 
66
  ## Usage
67
 
68
- 1. Upload a PDF/DOCX/TXT file
69
- 2. Click "Process Document"
70
- 3. Get accurate answers with markdown formatting
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
 
72
- ## Technical Details
73
 
74
- - **Embeddings**: BAAI/bge-small-en-v1.5 (significantly better than all-MiniLM-L6-v2)
75
- - **LLM**: Meta Llama-3.2-3B-Instruct via HuggingFace Inference API
76
- - **Vector Store**: ChromaDB with persistent storage
77
- - **Chunking**: Smart text splitting with overlap for context preservation
 
11
 
12
  # AI Document Intelligence System
13
 
14
+ A production-ready document question-answering system built with Retrieval-Augmented Generation (RAG). Upload documents and query them using natural language with citation-backed responses.
15
+
16
+ ## Architecture
17
+
18
+ This system implements a complete RAG pipeline with the following components:
19
+
20
+ **Document Processing**
21
+ - Multi-format support (PDF, DOCX, TXT)
22
+ - Intelligent text chunking with configurable overlap (1000 chars, 200 overlap)
23
+ - Preserves document structure with metadata tracking
24
+
25
+ **Retrieval System**
26
+ - Vector embeddings using BAAI/bge-small-en-v1.5 (384 dimensions)
27
+ - ChromaDB persistent vector store
28
+ - Top-k retrieval (k=4) with semantic similarity search
29
+ - Cosine similarity with L2 normalization
30
+
31
+ **Generation**
32
+ - Google Gemma 3-4B-IT via OpenRouter free tier
33
+ - Temperature: 0.1 for consistent, factual responses
34
+ - Max tokens: 512 for concise answers
35
+ - Hallucination prevention through strict context grounding
36
+
37
+ **Rate Limiting**
38
+ - 10 queries per hour tracked via filesystem-based state
39
+ - Prevents API abuse while maintaining usability
40
+
41
+ ## Technology Stack
42
+
43
+ | Component | Technology | Purpose |
44
+ |-----------|-----------|---------|
45
+ | Framework | LangChain 1.0.7 | RAG orchestration and chaining |
46
+ | Vector DB | ChromaDB 1.3.4 | Persistent vector storage |
47
+ | Embeddings | BAAI/bge-small-en-v1.5 | Semantic text representation |
48
+ | LLM | Google Gemma 3-4B-IT | Answer generation |
49
+ | UI | Gradio 5.49.1 | Interactive web interface |
50
+ | API | OpenRouter | Cost-free LLM access |
51
 
52
  ## Features
 
 
 
 
 
53
 
54
+ - Multi-format document ingestion with automatic format detection
55
+ - Context-aware question answering with source attribution
56
+ - Persistent vector storage (survives restarts)
57
+ - Rate limiting to prevent API abuse
58
+ - Markdown-formatted responses for readability
59
+ - Comprehensive error handling and validation
60
+ - Modular architecture for easy extension
61
+
62
+ ## Local Development
63
 
64
+ ### Prerequisites
65
+ - Python 3.10+
66
+ - pip or conda package manager
67
+ - OpenRouter API key (free tier available)
 
68
 
69
+ ### Installation
70
 
71
  ```bash
72
+ # Clone repository
73
  git clone https://github.com/pkgprateek/ai-rag-document.git
74
  cd ai-rag-document
75
 
76
  # Create virtual environment
77
  python -m venv venv
78
+ source venv/bin/activate # Windows: venv\Scripts\activate
79
 
80
  # Install dependencies
81
  pip install -r requirements.txt
82
 
83
+ # Configure environment
84
  cp .env.example .env
85
+ # Edit .env and add your OPENROUTER_API_KEY
86
+ ```
87
+
88
+ ### Get OpenRouter API Key
89
+
90
+ 1. Visit [OpenRouter](https://openrouter.ai/keys)
91
+ 2. Sign up for a free account
92
+ 3. Generate an API key
93
+ 4. Add to `.env` file: `OPENROUTER_API_KEY=your_key_here`
94
 
95
+ ### Run Application
96
+
97
+ ```bash
98
  python app/main.py
99
  ```
100
 
101
+ The application will start on `http://localhost:7860`
102
+
103
+ ## Deployment to Hugging Face Spaces
104
+
105
+ ### Method 1: Direct Upload
106
+
107
+ 1. Create a new Space on [Hugging Face](https://huggingface.co/new-space)
108
+ 2. Select "Gradio" as SDK
109
+ 3. Upload repository files
110
+ 4. Add repository secret:
111
+ - Navigate to Settings β†’ Repository secrets
112
+ - Create `OPENROUTER_API_KEY` with your API key
113
+ 5. Space will auto-deploy
114
+
115
+ ### Method 2: Git Push
116
 
117
+ ```bash
118
+ # Add Hugging Face remote
119
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
120
+
121
+ # Push to Hugging Face
122
+ git push hf main
123
+ ```
124
+
125
+ **Important**: Ensure the YAML frontmatter (lines 1-9) remains at the top of README.md for proper Space configuration.
126
 
127
  ## Usage
128
 
129
+ 1. **Upload Document**: Select PDF, DOCX, or TXT file (max recommended: 50MB)
130
+ 2. **Process**: Click "Process Document" to chunk and index
131
+ 3. **Query**: Ask natural language questions about the content
132
+ 4. **Review**: Receive markdown-formatted answers with context
133
+
134
+ ### Example Queries
135
+
136
+ - "What are the main conclusions of this research paper?"
137
+ - "Summarize the key points from section 3"
138
+ - "What methodology was used in this study?"
139
+ - "Extract all mentioned dates and events"
140
+
141
+ ## Project Structure
142
+
143
+ ```
144
+ ai-rag-document/
145
+ β”œβ”€β”€ app/
146
+ β”‚ β”œβ”€β”€ main.py # Gradio UI and application entry
147
+ β”‚ β”œβ”€β”€ rag_pipeline.py # RAG chain implementation
148
+ β”‚ └── document_processor.py # Document parsing and chunking
149
+ β”œβ”€β”€ tests/
150
+ β”‚ β”œβ”€β”€ test_rag_pipeline.py # RAG pipeline tests
151
+ β”‚ β”œβ”€β”€ test_document_processor.py
152
+ β”‚ └── experiments.py # Dev experiments
153
+ β”œβ”€β”€ data/
154
+ β”‚ β”œβ”€β”€ chroma_db/ # Vector DB persistence
155
+ β”‚ └── rate_limit.json # Query rate tracking
156
+ β”œβ”€β”€ requirements.txt
157
+ β”œβ”€β”€ .env.example
158
+ └── README.md
159
+ ```
160
+
161
+ ## Technical Implementation Details
162
+
163
+ ### Text Chunking Strategy
164
+
165
+ Uses `RecursiveCharacterTextSplitter` with:
166
+ - **Chunk size**: 1000 characters (balances context vs. precision)
167
+ - **Overlap**: 200 characters (prevents context loss at boundaries)
168
+ - **Metadata preservation**: Tracks source file and document type
169
+
170
+ ### Embedding Model Selection
171
+
172
+ BAAI/bge-small-en-v1.5 chosen for:
173
+ - Superior performance on MTEB benchmark vs. all-MiniLM-L6-v2
174
+ - 384-dimension vectors (compact yet effective)
175
+ - Instruction-tuned for retrieval tasks
176
+ - L2 normalization for cosine similarity
177
+
178
+ ### LLM Configuration
179
+
180
+ Google Gemma 3-4B-IT via OpenRouter:
181
+ - **Free tier**: No cost, suitable for demos and light production
182
+ - **Temperature 0.1**: Reduces hallucination, increases factuality
183
+ - **Max tokens 512**: Concise answers, faster responses
184
+ - **OpenRouter benefits**: Unified API, no vendor lock-in
185
+
186
+ ### Prompt Engineering
187
+
188
+ The system uses a carefully designed prompt:
189
+ - Explicit instruction against hallucination
190
+ - Context grounding requirement
191
+ - Markdown formatting for readability
192
+ - Fallback response for insufficient context
193
+
194
+ ## Testing
195
+
196
+ ```bash
197
+ # Run tests
198
+ python -m pytest tests/
199
+
200
+ # Run specific test
201
+ python -m pytest tests/test_rag_pipeline.py -v
202
+ ```
203
+
204
+ ## Limitations and Considerations
205
+
206
+ - **Rate limit**: 10 queries/hour (configurable in `rag_pipeline.py`)
207
+ - **Document size**: Large files (>100MB) may cause memory issues
208
+ - **Context window**: Limited to 4 retrieved chunks per query
209
+ - **Free tier**: OpenRouter free tier has usage limits
210
+
211
+ ## Future Enhancements
212
+
213
+ - Multi-document cross-referencing
214
+ - Conversation history for follow-up questions
215
+ - Hybrid search (semantic + keyword)
216
+ - Advanced chunking strategies (semantic chunking)
217
+ - Support for images and tables (multimodal RAG)
218
+ - User authentication and document management
219
+
220
+ ## License
221
+
222
+ This project is open source and available for portfolio and educational purposes.
223
 
224
+ ## Contact
225
 
226
+ **Prateek Kumar Goel**
227
+ - GitHub: [@pkgprateek](https://github.com/pkgprateek)
228
+ - Project deployed on [Hugging Face Spaces](https://huggingface.co/spaces)