---
title: RAG Document Assistant
emoji: 🔒
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
short_description: Privacy-first document search with zero storage
---

# RAG Document Assistant

**Privacy-first document search. Your document text is never stored on the server.**

[![Privacy](https://img.shields.io/badge/Privacy-Zero%20Storage-green)](#privacy-first-architecture)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

| Resource | Link |
|----------|------|
| Live Demo | [rag-document-assistant.vercel.app](https://rag-document-assistant.vercel.app/) |
| Product Demo Video | [Pre-recorded Demo](https://github.com/vn6295337/RAG-document-assistant/issues/2) |
| Business Guide | [BUSINESS_README.md](BUSINESS_README.md) |

---

## Privacy-First Architecture

```
INDEXING (one-time)
───────────────────────────────────────────────────────────
Your Device                           Server
───────────────────────────────────────────────────────────
  Dropbox ──→ Files loaded
              in browser
                 │
                 ▼
           Text chunked ─────────────→ Embeddings +
           locally                     file positions only
                 │                     (no text stored)
                 ▼
           Original text
           PURGED ✓
───────────────────────────────────────────────────────────

QUERY TIME (every search)
───────────────────────────────────────────────────────────
Your Question ──→ Find matching ──→ Re-fetch text
                  embeddings        from YOUR Dropbox
                       │                  │
                       ▼                  ▼
                  File paths ───→ Extract chunks ──→ Answer
                  + positions     using positions    generated
───────────────────────────────────────────────────────────
```

### True Zero-Storage Privacy

1. **Client-Side Chunking**: Documents are read and chunked entirely in your browser
2. **Embeddings Only**: The server stores vector embeddings, not document text
3. **No Text Stored**: Alongside each embedding, only the file path and character positions are kept
4. **Query-Time Re-fetch**: Text is retrieved fresh from YOUR Dropbox for each query
5. **You Control Access**: Disconnect Dropbox = queries stop working = your data stays yours
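
The chunking step in the list above can be sketched as follows. This is a minimal illustration in Python, not the project's browser code (which would be JavaScript). The names `Chunk`, `chunk_with_positions`, and `to_stored_metadata`, the metadata keys, and the chunk size and overlap are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # inclusive character offset into the source file
    end: int    # exclusive character offset
    text: str   # needed only until the embedding is computed


def chunk_with_positions(text: str, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split text into fixed-size, overlapping chunks and record their offsets."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(start, end, text[start:end]))
        if end == len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def to_stored_metadata(path: str, chunk: Chunk) -> dict:
    """Hypothetical per-chunk record: path and positions only, never the text."""
    return {"path": path, "start": chunk.start, "end": chunk.end}
```

Because each record carries `start` and `end`, the chunk text can be dropped once its embedding exists and recovered later by slicing the re-fetched file.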

## How It Works

1. **Connect** - Link your Dropbox account (OAuth - we never see your password)
2. **Select** - Choose files to index (.txt, .md, .pdf up to 5 MB)
3. **Process** - Text is chunked and embedded in your browser
4. **Search** - Query your documents with natural language
5. **Answer** - Get cited responses from your indexed content

## What Gets Stored

| Data | Stored? | Where |
|------|---------|-------|
| Your files | No | Stay in YOUR Dropbox |
| Document text | No | Re-fetched at query time |
| Embeddings | Yes | Pinecone (encrypted) |
| File paths | Yes | Pinecone metadata |
| Chunk positions | Yes | Pinecone metadata |
| Queries | No | Not logged |

Embeddings are numeric vectors, not text, so the server never holds a readable copy of your documents. File paths and character positions are used to re-fetch the exact text from your Dropbox when you search.
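
As a rough sketch of that query-time step, the Python below slices matched chunks out of freshly fetched files. The function `extract_chunks`, the `fetch_file_text` callback, and the metadata keys `path`, `start`, and `end` are hypothetical; the real backend may name and structure these differently.

```python
from typing import Callable


def extract_chunks(
    matches: list[dict],
    fetch_file_text: Callable[[str], str],
) -> list[str]:
    """Re-fetch each matched file once and slice out the matched chunks."""
    cache: dict[str, str] = {}
    chunks = []
    for match in matches:
        path = match["path"]
        if path not in cache:
            # In the real app this would be a download from the user's Dropbox.
            cache[path] = fetch_file_text(path)
        chunks.append(cache[path][match["start"]:match["end"]])
    return chunks


# Usage with an in-memory stand-in for Dropbox:
fake_dropbox = {"/notes.txt": "hello world"}
matches = [
    {"path": "/notes.txt", "start": 0, "end": 5},
    {"path": "/notes.txt", "start": 6, "end": 11},
]
print(extract_chunks(matches, fake_dropbox.__getitem__))
```

Stored offsets are only valid while the file is unchanged, so a design like this would likely need to re-index a file after it is edited.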

## Quick Start

```bash
git clone https://github.com/vn6295337/RAG-document-assistant.git
cd RAG-document-assistant

# Backend
pip install -r requirements.txt
uvicorn src.api.main:app --reload

# Frontend
cd frontend && npm install && npm run dev
```

## Tech Stack

- **Frontend**: React + Vite + Tailwind CSS
- **Backend**: FastAPI on Hugging Face Spaces
- **Vector DB**: Pinecone (embeddings only)
- **File Source**: Dropbox OAuth
- **LLM**: Multi-provider fallback (Gemini, Groq, OpenRouter)
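
The multi-provider fallback listed above might look roughly like this. It is a sketch under assumptions: `generate_with_fallback` is a hypothetical name, and each provider is modeled as a plain callable rather than a real Gemini, Groq, or OpenRouter client.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]  # (name, function that answers a prompt)


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name along with the answer makes it easy to show or log which backend produced a given response.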

## Documentation

- [Architecture](docs/architecture.md) - Technical design
- [API Reference](docs/api_reference.md) - Backend endpoints
- [Business Overview](BUSINESS_README.md) - Use cases and value

## License

MIT License - see [LICENSE](LICENSE)