---
title: RAG Document Q&A System
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: mit
---

# 📚 RAG Document Q&A System

A Retrieval-Augmented Generation (RAG) system that answers questions about uploaded PDF documents.

## 🎯 What This Does

1. **Upload** a PDF document
2. **Process** the document (chunks it and creates embeddings)
3. **Ask** questions about the document
4. **Get** accurate answers with source citations

## 🏗️ Architecture
```
User Question → Embedding → Vector Search → Retrieved Chunks → LLM → Answer
```

| Component | Technology |
|-----------|------------|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (384 dimensions) |
| Vector Store | FAISS (Facebook AI Similarity Search) |
| Text Splitter | RecursiveCharacterTextSplitter (1000 chars, 200 overlap) |
| LLM | HuggingFaceH4/zephyr-7b-beta via Inference API |
| Framework | LangChain + Gradio |

## 🛠️ Development Challenges

This project encountered several technical challenges during development:

### Challenge 1: LangChain API Changes
**Problem:** Import errors due to LangChain's package restructuring.
```python
# Old (broken)
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA

# New (working)
from langchain_community.document_loaders import PyPDFLoader
# RetrievalQA deprecated → use LCEL chains instead
```
**Lesson:** Fast-evolving libraries require checking current documentation.

### Challenge 2: PDF Download Issues
**Problem:** `PdfStreamError: Stream has ended unexpectedly`
**Cause:** Incomplete download due to missing User-Agent header.
**Solution:** Added proper headers to HTTP request.

### Challenge 3: LLM Response Quality
**Problem:** FLAN-T5-Large produced fragment-like responses instead of complete answers.
**Attempted Solutions:**
1. Adjusted generation parameters — minimal improvement
2. Modified prompt format — slight improvement
3. Switched to FLAN-T5-XL — OOM error

**Solution:** Switched to Zephyr-7B-beta, which produces comprehensive answers.

### Challenge 4: Hugging Face Spaces Python 3.13 Migration
**Problem:** Space failed on startup with `ModuleNotFoundError: No module named 'audioop'`
**Cause:** Hugging Face Spaces updated to Python 3.13, which removed the deprecated `audioop` module from the standard library. Gradio 4.x depended on `pydub`, which required `audioop`.
**Solution:** Upgraded to Gradio 6.3.0, which includes Python 3.13 compatibility fixes.

### Challenge 5: Inference API Changes
**Problem:** `InferenceClient.text_generation()` failed with "task not supported" error.
**Cause:** The Hugging Face Inference API routing changed, requiring conversational models to use the `chat_completion` endpoint.
**Solution:** Refactored from raw prompt templates to the structured `chat_completion()` API with message arrays.

## 📝 Limitations

- Only processes PDF documents
- English language only
- Free Inference API has rate limits

## 👤 Author

[Nav772](https://huggingface.co/Nav772) - Built as part of AI Engineering portfolio

## 📚 Related Projects

- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)
- [Amazon Review Rating Predictor](https://huggingface.co/spaces/Nav772/amazon-review-rating-predictor)
- [Food Image Classifier](https://huggingface.co/spaces/Nav772/food-image-classifier)
- [Sentiment Model Comparison](https://huggingface.co/spaces/Nav772/sentiment-model-comparison)