---
title: DocsQA Smart Research Assistant
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# DocsQA Smart Research Assistant
This is my take-home submission for the ABSTRABIT AI/ML Engineer assignment: a RAG-powered assistant where users upload PDFs, ask questions, and get grounded answers with citations.
## Live Project

- Live app (Railway): https://docsbot-web-production.up.railway.app
- GitHub: https://github.com/KBaba7/DocsBot
- Loom walkthrough: add your link here
## What I Built
The app supports authentication, PDF upload (up to 5 files and 10 pages per file), document chunking + vector indexing, and a chat experience that answers from uploaded documents first.
If the uploaded documents are not enough, the agent falls back to web search and cites those sources too.
## Stack

- FastAPI + SQLAlchemy
- LangGraph agent
- Groq chat model
- Jina embeddings + Jina reranker
- Supabase Postgres + pgvector
- Railway deployment
## How Retrieval Works
Uploaded PDFs are parsed page by page and split into chunks.
Each chunk is stored with metadata (document, page number, chunk index) and embedded into pgvector.
At question time:
- LLM-based document filtering selects relevant documents from the user's library
- Vector search retrieves relevant chunks from selected documents
- Jina reranking reorders the retrieved chunks for better final relevance
- The agent answers from those chunks when possible
- If evidence is weak, the agent uses web search and cites external URLs
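To make the vector-search step concrete, here is a minimal pure-Python sketch of similarity ranking over stored chunks. It stands in for the actual pgvector query; the function names, the toy two-dimensional embeddings, and the chunk dicts are illustrative, not the app's real schema.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_embedding, chunks, k=4):
    # Rank stored chunks by similarity to the query embedding, keeping the
    # (document, page, chunk_index) metadata attached for source citations.
    scored = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c["embedding"]),
        reverse=True,
    )
    return scored[:k]

chunks = [
    {"document": "spec.pdf", "page": 1, "chunk_index": 0, "embedding": [1.0, 0.0]},
    {"document": "spec.pdf", "page": 2, "chunk_index": 1, "embedding": [0.0, 1.0]},
]
top = vector_search([0.9, 0.1], chunks, k=1)
print(top[0]["page"])  # → 1
```

In the deployed app this ranking happens inside Postgres via pgvector, so only the top candidates cross the wire before reranking.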
## Chunking Strategy

- Splitter: LangChain `RecursiveCharacterTextSplitter`
- Chunk size: 1000
- Overlap: 150
Why this setup:
- It prefers breaking on paragraphs and sentence boundaries before falling back to smaller separators.
- It preserves more coherent chunks for contracts, specs, and structured PDFs.
- A smaller overlap keeps recall while reducing duplicated context in retrieval.
## Retrieval Approach
I use cosine similarity search in pgvector, then apply Jina reranking for better final ordering.
The system uses an LLM-based retrieval planner to choose:
- the final number of chunks to keep
- the candidate pool to rerank
Those values are clamped to safe bounds before retrieval runs.
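A sketch of the clamping step, under assumed bounds (the real limits live in the app config, and the `final_k`/`pool_k` key names are illustrative):

```python
# Hypothetical safe bounds for the LLM-proposed retrieval plan.
MIN_FINAL_K, MAX_FINAL_K = 4, 10
MIN_POOL_K, MAX_POOL_K = 12, 50

def clamp_plan(plan: dict) -> tuple[int, int]:
    """Clamp LLM-proposed retrieval sizes to safe bounds before retrieval runs."""
    final_k = min(max(int(plan.get("final_k", MIN_FINAL_K)), MIN_FINAL_K), MAX_FINAL_K)
    pool_k = min(max(int(plan.get("pool_k", MIN_POOL_K)), MIN_POOL_K), MAX_POOL_K)
    # The rerank candidate pool must be at least as large as the final cut.
    return final_k, max(pool_k, final_k)

print(clamp_plan({"final_k": 100, "pool_k": 3}))  # → (10, 12)
```

This keeps a badly behaved planner response from exploding the context window or starving the reranker.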
For each retrieved document source, the UI shows:

- document name
- page number
- chunk excerpt
## Agent Routing Logic
The agent is prompted to prefer document context first.
- If retrieved document context is sufficient: answer from documents with citations.
- If not sufficient: clearly state that the documents are insufficient and use the web search tool.
This is implemented as tool-based behavior in LangGraph rather than a static fallback message.
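In the actual app the LLM itself judges sufficiency and decides whether to call the web search tool; the sketch below substitutes a simple rerank-score threshold for that judgment, just to make the routing shape explicit. The threshold value and function names are hypothetical.

```python
def route(doc_chunks: list[str], rerank_scores: list[float], threshold: float = 0.35) -> str:
    # Stand-in for the LLM's sufficiency judgment: strong reranked evidence
    # means answer from documents; weak evidence means fall back to web search.
    if doc_chunks and rerank_scores and max(rerank_scores) >= threshold:
        return "answer_from_documents"
    return "web_search"

print(route(["chunk about termination clauses"], [0.81]))  # → answer_from_documents
print(route([], []))                                       # → web_search
```

Keeping the decision tool-based rather than hard-coded lets the agent explain *why* the documents were insufficient before searching the web.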
## Source Citations
Each turn stores/returns source metadata separately from the answer body.
- Vector source cards include:
  - document name
  - page number
  - snippet (a short excerpt from the retrieved chunk)
- Web source cards include:
  - title
  - URL
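The two card shapes can be sketched as small dataclasses; the field and class names here are illustrative, not the app's real models.

```python
from dataclasses import dataclass

@dataclass
class VectorSource:
    # Fields shown on document source cards.
    document: str
    page: int
    snippet: str

@dataclass
class WebSource:
    # Fields shown on web source cards.
    title: str
    url: str

def to_card(source) -> dict:
    # Sources travel in their own payload, separate from the answer body.
    return {"type": type(source).__name__, **source.__dict__}

print(to_card(VectorSource("contract.pdf", 3, "Termination requires 30 days notice.")))
```

Keeping sources out of the answer text is what lets each chat turn render its own citation cards cleanly.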
## Conversation Memory
Conversation history is maintained within session scope, so follow-ups like “tell me more about that” work as expected. The frontend also preserves the visible chat thread per session, so upload-triggered page refreshes do not wipe the current conversation view.
## Streaming UX
Answers are streamed into the chat UI progressively.
- the visible response is rendered chunk by chunk
- source cards are attached after the answer completes
- a slight pacing delay is added so the stream feels live to the user
The streaming route is separate from the standard JSON `/ask` response path.
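The streaming behavior reduces to an async generator like the one below; the real route wraps something like this in a FastAPI streaming response, and the chunk size, delay, and sources marker here are placeholders.

```python
import asyncio

async def stream_answer(answer: str, delay: float = 0.02, size: int = 8):
    # Yield the answer in small pieces with a slight pacing delay so the
    # stream feels live; sources are attached only after the text completes.
    for i in range(0, len(answer), size):
        yield answer[i : i + size]
        await asyncio.sleep(delay)
    yield "\n[sources attached here]"

async def main():
    parts = [part async for part in stream_answer("Grounded answer text.", delay=0)]
    print("".join(parts))

asyncio.run(main())
```

Emitting the source payload as a final event keeps the visible text stream clean while still delivering citations in the same response.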
## Bonus Features
I added hash-based deduplicated ingestion:
- If the same PDF is uploaded again, processing/indexing is reused.
- Access control is still user-scoped via ownership mapping.
Why I chose this:

- saves compute and time
- avoids duplicate indexing
- keeps retrieval secure per user
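A minimal sketch of the dedup idea, with in-memory dicts standing in for the database tables (names and index-id format are illustrative):

```python
import hashlib

INDEXED: dict[str, str] = {}          # content hash -> index id (illustrative)
OWNERSHIP: dict[str, set[str]] = {}   # content hash -> user ids with access

def ingest(pdf_bytes: bytes, user_id: str) -> tuple[str, bool]:
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    reused = digest in INDEXED
    if not reused:
        # The expensive parse/embed/index work happens only once per content hash.
        INDEXED[digest] = f"index-{digest[:8]}"
    # Access stays user-scoped: each uploader is mapped to the shared index.
    OWNERSHIP.setdefault(digest, set()).add(user_id)
    return digest, reused

h1, reused1 = ingest(b"%PDF-1.7 ...", "alice")
h2, reused2 = ingest(b"%PDF-1.7 ...", "bob")
print(reused1, reused2, h1 == h2)  # → False True True
```

The ownership mapping is what keeps retrieval secure: a shared index is only queryable by users who actually uploaded that content.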
I also implemented LLM-based document filtering:
- The system sends all user documents (filename, summary, preview) to the LLM
- LLM semantically analyzes and selects only truly relevant documents for the query
- Returns a JSON array of relevant file hashes
- It is not forced to return a capped number of documents
- Fallback returns all candidate document hashes if the LLM call fails
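The parse-with-fallback step can be sketched as follows; the function name is hypothetical, but the behavior matches the description above: validate the LLM's JSON array against known hashes, and return all candidates on any failure.

```python
import json

def parse_filter_response(raw: str, candidates: list[str]) -> list[str]:
    # Parse the LLM's JSON array of file hashes; on any failure or an empty
    # selection, fall back to all candidates so retrieval is never blocked.
    try:
        hashes = json.loads(raw)
        if isinstance(hashes, list):
            selected = [h for h in hashes if h in candidates]
            if selected:
                return selected
    except (json.JSONDecodeError, TypeError):
        pass
    return candidates

print(parse_filter_response('["abc123"]', ["abc123", "def456"]))  # → ['abc123']
print(parse_filter_response("not json", ["abc123", "def456"]))    # → ['abc123', 'def456']
```

Validating against the candidate list also guards against the LLM hallucinating hashes the user does not own.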
## Challenges I Ran Into
- Heavy embedding dependencies made deployment images too large.
  - I standardized on Jina API embeddings/reranking to keep the runtime lighter while preserving retrieval quality.
- Source rendering got messy across multiple chat turns.
  - I separated answer text from source payloads and extracted sources per turn.
- Intermittent DB DNS/pooler issues during deployment.
  - I improved connection handling and standardized the Supabase transaction-pooler config.
- UI state was getting lost after document uploads.
  - I persisted the active chat thread in session storage so the current conversation remains visible after refresh.
## If I Had More Time
- Add conversation history UI to display past chat sessions
- Add automated citation-faithfulness checks
- Add Alembic migrations for cleaner schema evolution
- Add stronger eval/observability for routing and retrieval quality
## Local Setup

```bash
cp .env.example .env
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
uvicorn app.main:app --reload
```

Open: http://127.0.0.1:8000
## Important Environment Variables
Required:

- `GROQ_API_KEY`
- `SECRET_KEY`
- `DATABASE_URL`
- `JINA_API_KEY`

Embeddings:

- `JINA_API_BASE` (default: `https://api.jina.ai/v1/embeddings`)
- `JINA_EMBEDDING_MODEL` (default: `jina-embeddings-v3`)
- `JINA_RERANKER_API_BASE` (default: `https://api.jina.ai/v1/rerank`)
- `JINA_RERANKER_MODEL` (default: `jina-reranker-v3`)
- `EMBEDDING_DIMENSIONS` (default: `1024`)
- `RETRIEVAL_K` (default minimum final context size: `4`)
- `RERANK_CANDIDATE_K` (default minimum rerank candidate pool: `12`)

Storage:

- `STORAGE_BACKEND=local|supabase`
- `SUPABASE_URL`
- `SUPABASE_SERVICE_ROLE_KEY`
- `SUPABASE_STORAGE_BUCKET`
- `SUPABASE_STORAGE_PREFIX`

Web search:

- `WEB_SEARCH_PROVIDER=duckduckgo|tavily`
- `TAVILY_API_KEY` (if using Tavily)

Auth:

- `ACCESS_TOKEN_EXPIRE_MINUTES` (default: `720`); for local development, lowering this can make login/logout testing easier
## API Endpoints

- `POST /register`
- `POST /login`
- `POST /logout`
- `POST /upload`
- `GET /documents`
- `DELETE /documents/{document_id}`
- `GET /documents/{document_id}/pdf`
- `POST /ask`
- `POST /ask/stream`
## Sample Documents

As requested in the assignment, sample PDFs are included in `test_documents/`.
## Railway Deployment

```bash
railway login
railway link
railway up
```
Set the same env vars in Railway service settings before deploying.