---
title: Semantic Book Recommender
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
---
# Smart Book Recommender 📚

An intelligent book recommendation system with dual search modes: semantic understanding and flexible literal matching. Features emotional tone analysis, category filtering, and a responsive web interface built with LangChain, ChromaDB, and Gradio.

## 🚀 [Try the Live Demo](https://huggingface.co/spaces/nonsodev/semantic-book-recommender)

![Book Recommender Interface](demo.png)

## ✨ Key Features

### 🔍 **Dual Search Modes**
- **Semantic Search**: AI-powered understanding of natural language queries (e.g., "fantasy adventure with magic")
- **Literal Search**: Flexible keyword matching with partial word support (e.g., "harry" → Harry Potter books)

### 🎯 **Smart Filtering**
- **Category Filtering**: Browse by specific book genres
- **Emotional Tone Matching**: Find books by emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
- **Intelligent Sorting**: Results ranked by relevance and emotional scores

### 🎨 **Modern Interface**
- Responsive card-based design with book covers
- Star ratings and reader statistics
- Direct download links when available
- Dark theme optimized for reading

### ⚡ **Performance Optimized**
- Cached embedding models for fast startup
- Efficient ChromaDB vector database
- Fallback image handling for missing covers
- Robust error handling and regex search

## Installation

### Prerequisites
- Python 3.8+ (3.10+ if you install Gradio 5.x, the version pinned in the Space metadata)
- pip package manager

### Quick Setup

1. **Clone the repository**
   ```bash
   git clone https://github.com/nonsodev/semantic-book-recommender.git
   cd semantic-book-recommender
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Ensure required data files**
   ```
   ├── final_book_df.csv          # Main book dataset
   ├── tagged_description.txt     # Book descriptions for embedding
   └── chroma_books/              # Vector database (auto-created)
   ```

4. **Run the application**
   ```bash
   python app.py
   ```

## Usage Guide

### Search Modes

#### 🧠 **Semantic Search**
Perfect for describing what you want in natural language:
- "Dark fantasy with dragons and magic"
- "Romantic comedy set in Paris"
- "Thrilling mystery in Victorian London"
- "Science fiction about artificial intelligence"

#### 🔤 **Literal Search**
Best for finding specific titles or authors:
- "harry" → finds Harry Potter books
- "tolkien" → finds J.R.R. Tolkien works
- "game thrones" → finds Game of Thrones
- "stephen king" → finds Stephen King novels

### Advanced Features

#### **Category Filtering**
Narrow results by genre:
- Fiction, Non-fiction, Fantasy, Romance, Mystery, etc.

#### **Emotional Tone Matching**
Find books by mood:
- **Happy**: High joy scores
- **Surprising**: High surprise scores  
- **Angry**: High anger scores
- **Suspenseful**: High fear scores
- **Sad**: High sadness scores
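
The tone options above each map to one score column. A minimal sketch of that mapping and the resulting sort, assuming rows are plain dictionaries; the names `TONE_TO_COLUMN` and `sort_by_tone` are hypothetical, and the real app may apply the scores differently:

```python
# Hypothetical mapping from the UI tone labels to the score columns
# named in this README; the actual app may differ.
TONE_TO_COLUMN = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def sort_by_tone(books, tone=None):
    """Return books ordered by the score column for `tone`, highest first.

    Unknown or missing tones leave the original order untouched.
    """
    column = TONE_TO_COLUMN.get(tone)
    if column is None:
        return list(books)
    return sorted(books, key=lambda b: b.get(column, 0.0), reverse=True)


books = [
    {"title": "A", "joy": 0.2, "fear": 0.9},
    {"title": "B", "joy": 0.8, "fear": 0.1},
]
print([b["title"] for b in sort_by_tone(books, "Happy")])
```

With a pandas DataFrame the same idea would likely be a single `sort_values` call on the mapped column.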

## How It Works

### 🔬 **Semantic Search Engine**
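```python
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

# Uses sentence-transformers for embedding generation
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

# ChromaDB for efficient similarity search.
# `documents` is a list of LangChain Document objects holding the
# book descriptions (loading code not shown here).
db_books = Chroma.from_documents(
    documents,
    embedding=embeddings,
    collection_name="books",
    persist_directory="chroma_books",
)
```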
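
To illustrate why `normalize_embeddings` matters: once vectors have unit length, a plain dot product equals cosine similarity, so ranking by dot product ranks by angle. The sketch below uses tiny hand-written vectors and the hypothetical helpers `normalize`, `dot`, and `top_k`; it is not how ChromaDB is implemented internally.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents most similar to the query."""
    q = normalize(query_vec)
    scored = [(dot(q, normalize(d)), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings" standing in for real model output.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([2.0, 0.1], docs, k=2))
```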

### 🔍 **Flexible Literal Search**
```python
# Intelligent regex pattern matching
def retrieve_literal_recommendations(query, category=None, tone=None):
    """Outline only; see app.py for the full implementation.

    - Creates flexible patterns for partial word matching
    - Handles special characters and multiple word combinations
    - Falls back to simple string matching if regex fails
    """
    ...
```
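
One way the outline above could work is sketched below. This is an assumption about the approach, not the code in `app.py`: the helper names `build_flexible_pattern` and `literal_match` are hypothetical, and the sketch matches against a single text string such as a title.

```python
import re


def build_flexible_pattern(query):
    """Build a case-insensitive regex that matches every query word in order.

    Words are escaped so special characters are treated literally, and
    ".*" between words lets "game thrones" match "A Game of Thrones".
    """
    words = [re.escape(w) for w in query.split()]
    return re.compile(".*".join(words), re.IGNORECASE)


def literal_match(query, text):
    """Return True if `text` matches `query` under the flexible pattern."""
    if not query.strip():
        return False
    try:
        return build_flexible_pattern(query).search(text) is not None
    except re.error:
        # Fall back to simple substring matching if the regex fails.
        return query.lower() in text.lower()


print(literal_match("game thrones", "A Game of Thrones"))
print(literal_match("harry", "Harry Potter and the Chamber of Secrets"))
```

Because no word boundaries are used, partial words such as "harr" also match, which is the flexible behavior described above.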

### 🎭 **Emotional Intelligence**
Books are analyzed and scored across five emotional dimensions:
- **Joy**: Happiness, humor, uplifting content
- **Surprise**: Plot twists, unexpected elements
- **Anger**: Conflict, tension, dramatic intensity  
- **Fear**: Suspense, thriller elements, mystery
- **Sadness**: Emotional depth, tragic elements

### 🎨 **Smart UI Components**
```python
def create_book_card_html(row):
    """Outline only; see app.py for the full implementation.

    Responsive card design with:
    - Book cover with fallback handling
    - Star ratings visualization
    - Author formatting (handles multiple authors)
    - Truncated descriptions with full content
    - Download links when available
    """
    ...
```
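
A rough sketch of two of the pieces listed above, cover fallback and author formatting. It assumes authors are stored as a `;`-separated string and uses a hypothetical placeholder image path; `format_authors` and `card_html` are illustrative names rather than functions from `app.py`.

```python
import html

FALLBACK_COVER = "cover-not-found.jpg"  # hypothetical placeholder image


def format_authors(authors):
    """Turn a ';'-separated author string into readable text (assumed format)."""
    names = [a.strip() for a in (authors or "").split(";") if a.strip()]
    if not names:
        return "Unknown author"
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return f"{', '.join(names[:-1])}, and {names[-1]}"


def card_html(row):
    """Render a minimal card; all values are HTML-escaped before insertion."""
    cover = html.escape(row.get("thumbnail") or FALLBACK_COVER)
    title = html.escape(row.get("title_and_subtitle") or "Untitled")
    authors = html.escape(format_authors(row.get("authors")))
    return (
        '<div class="book-card">'
        f'<img src="{cover}" alt="{title}" style="width: 80px; height: 120px">'
        f"<h3>{title}</h3><p>{authors}</p></div>"
    )


print(card_html({"title_and_subtitle": "Dune", "authors": "Frank Herbert"}))
```

Escaping matters here because titles and descriptions come from a dataset and may contain characters such as `<` or `&`.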

## Project Structure

```
semantic-book-recommender/
├── app.py                     # Main application
├── requirements.txt           # Python dependencies
├── final_book_df.csv          # Book dataset with metadata
├── tagged_description.txt     # Book descriptions for embedding
├── chroma_books/              # ChromaDB vector database
├── demo.png                   # Interface screenshot
└── README.md                  # This file
```

## Configuration

### **Embedding Models**
Switch between models for different performance profiles:

```python
# Fast and efficient (default)
"sentence-transformers/all-MiniLM-L6-v2"

# Higher quality, slower
"sentence-transformers/all-mpnet-base-v2"  

# Multilingual support
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
```

### **Search Parameters**
Customize recommendation behavior:

```python
from typing import Optional


def retrieve_semantic_recommendations(
    query: str,
    initial_top_k: int = 50,         # Initial retrieval size
    final_top_k: int = 8,            # Final recommendations shown
    category: Optional[str] = None,  # Category filter
    tone: Optional[str] = None,      # Emotional tone filter
):
    ...  # body omitted; see app.py
```
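
The gap between `initial_top_k` and `final_top_k` leaves room for filtering: a wide candidate set is retrieved first, then narrowed. The sketch below shows that second stage under stated assumptions. The function `rerank` is hypothetical, candidates are plain dictionaries already ordered by similarity, and the category is matched exactly.

```python
def rerank(candidates, category=None, tone_column=None, final_top_k=8):
    """Filter and trim candidates that are already ordered by similarity.

    candidates:  rows from the initial retrieval (up to initial_top_k)
    category:    keep only rows whose "categories" value equals this
    tone_column: optional score column (e.g. "joy") to re-sort by
    """
    rows = [
        r for r in candidates
        if category is None or r.get("categories") == category
    ]
    if tone_column:
        rows = sorted(rows, key=lambda r: r.get(tone_column, 0.0), reverse=True)
    return rows[:final_top_k]


candidates = [
    {"title": "A", "categories": "Fiction", "joy": 0.1},
    {"title": "B", "categories": "Nonfiction", "joy": 0.9},
    {"title": "C", "categories": "Fiction", "joy": 0.7},
    {"title": "D", "categories": "Fiction", "joy": 0.4},
]
top = rerank(candidates, category="Fiction", tone_column="joy", final_top_k=2)
print([r["title"] for r in top])
```

A larger `initial_top_k` makes it less likely that a strict category filter leaves fewer than `final_top_k` results.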

### **UI Customization**
Modify card display and styling:

```python
# Book card dimensions
style="width: 80px; height: 120px"

# Description truncation
-webkit-line-clamp: 4

# Rating display
create_star_rating(rating)  # ★★★★☆ format
```
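
For reference, a star helper along these lines could look as follows. This is a sketch, not the function from `app.py`: rounding to the nearest whole star and treating invalid input as zero are both assumptions.

```python
def create_star_rating(rating, max_stars=5):
    """Render a numeric rating as filled and empty stars.

    Assumption: ratings round to the nearest whole star, and missing or
    invalid values render as all-empty stars.
    """
    try:
        value = float(rating)
    except (TypeError, ValueError):
        value = 0.0
    if value != value:  # NaN check without importing math
        value = 0.0
    filled = max(0, min(max_stars, int(value + 0.5)))
    return "★" * filled + "☆" * (max_stars - filled)


print(create_star_rating(4.2))
```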

## Data Schema

### Book Dataset Columns
```python
# Core metadata
'isbn13', 'title_and_subtitle', 'authors', 'categories'

# Visual elements  
'thumbnail', 'large_thumbnail'

# Ratings and metrics
'average_rating', 'ratings_count'

# Content
'description'

# Emotional scores
'joy', 'surprise', 'anger', 'fear', 'sadness'

# Access
'url'  # Download/purchase links
```
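
Before launching the app, it can help to confirm that a CSV carries every column listed above. A small standard-library sketch follows; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column set is copied from this section.

```python
import csv
import io

# Column names taken from the schema listed in this README.
REQUIRED_COLUMNS = {
    "isbn13", "title_and_subtitle", "authors", "categories",
    "thumbnail", "large_thumbnail",
    "average_rating", "ratings_count",
    "description",
    "joy", "surprise", "anger", "fear", "sadness",
    "url",
}


def missing_columns(csv_file):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    return sorted(REQUIRED_COLUMNS - set(reader.fieldnames or []))


# In-memory example with an incomplete header.
sample = io.StringIO("isbn13,title_and_subtitle,authors\n")
print(missing_columns(sample))
```

For the real dataset, open `final_book_df.csv` with `newline=""` and `encoding="utf-8"` and pass the file object to `missing_columns`.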

## API Reference

### **Main Functions**

```python
# Semantic search with AI understanding
retrieve_semantic_recommendations(query, category, tone, initial_top_k, final_top_k)

# Literal search with flexible matching  
retrieve_literal_recommendations(query, category, tone, final_top_k)

# HTML card generation
create_book_card_html(row)

# Main Gradio interface function
recommend_books(query, category, tone, search_type)
```

## Dependencies

```text
# Core ML and Vector Database
langchain-chroma>=0.1.0
langchain-huggingface>=0.0.3  
langchain-community>=0.2.0
sentence-transformers>=2.2.0

# Data Processing
pandas>=1.5.0
numpy>=1.21.0

# Web Interface
gradio>=4.0.0

# Text Processing  
regex>=2022.0.0
```

## Performance Tips

### **Startup Optimization**
```python
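import os

# Model caching for faster restarts.
# Set these before importing transformers or sentence-transformers.
os.environ["HF_HOME"] = "/tmp/hf_cache"
# Older variable; newer transformers releases prefer HF_HOME.
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache"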
```

### **Search Optimization**
- Use semantic search for exploratory queries
- Use literal search for known titles/authors
- Combine category and tone filters for precision
- Try variations if initial results aren't satisfactory

### **Memory Management**
- ChromaDB persists to disk automatically when `persist_directory` is set (here, `chroma_books/`)
- Embeddings cached after first load
- Efficient pandas operations for filtering

## Contributing

1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Commit** changes (`git commit -m 'Add amazing feature'`)
4. **Push** to branch (`git push origin feature/amazing-feature`)
5. **Open** a Pull Request

### Development Areas
- [ ] Additional emotional dimensions
- [ ] Multi-language support
- [ ] User preference learning
- [ ] Social features (reviews, ratings)
- [ ] Advanced filtering (publication year, page count)

## Troubleshooting

### **Common Issues**

**ChromaDB not found:**
```bash
# The app will auto-create from tagged_description.txt
# Ensure this file exists in the project root
```

**Model download slow:**
```bash
# Models cache automatically after first download
# Subsequent starts will be much faster
```

**No search results:**
```bash
# Try switching between search modes
# Reduce filter constraints (category/tone)
# Use broader search terms
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- **Sentence Transformers** for powerful embedding models
- **ChromaDB** for efficient vector storage and retrieval
- **Gradio** for creating accessible ML interfaces
- **LangChain** for seamless AI integration
- **HuggingFace** for model hosting and ecosystem

---

## 🎯 Example Queries to Try

### Semantic Search
- "Epic fantasy with complex magic systems"
- "Cozy mystery in a small town setting"  
- "Hard science fiction about space exploration"
- "Historical romance during the Regency era"

### Literal Search
- "agatha christie" (find Agatha Christie novels)
- "dune" (find Dune series books)
- "pride prejudice" (find Pride and Prejudice)
- "lord rings" (find Lord of the Rings)

**Happy Reading! 📖✨**