---
title: Semantic Book Recommender
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
---
# Smart Book Recommender
A book recommendation system with two search modes: semantic search that interprets natural-language queries, and flexible literal matching on keywords. It also supports emotional tone filtering and category filtering, and serves results through a responsive web interface built with LangChain, ChromaDB, and Gradio.
## [Try the Live Demo](https://huggingface.co/spaces/nonsodev/semantic-book-recommender)

## ✨ Key Features
### **Dual Search Modes**
- **Semantic Search**: AI-powered understanding of natural language queries (e.g., "fantasy adventure with magic")
- **Literal Search**: Flexible keyword matching with partial word support (e.g., "harry" → Harry Potter books)
### 🎯 **Smart Filtering**
- **Category Filtering**: Browse by specific book genres
- **Emotional Tone Matching**: Find books by emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
- **Intelligent Sorting**: Results ranked by relevance and emotional scores
### 🎨 **Modern Interface**
- Responsive card-based design with book covers
- Star ratings and reader statistics
- Direct download links when available
- Dark theme optimized for reading
### ⚡ **Performance Optimized**
- Cached embedding models for fast startup
- Efficient ChromaDB vector database
- Fallback image handling for missing covers
- Robust error handling and regex search
## Installation
### Prerequisites
- Python 3.8+
- pip package manager
### Quick Setup
1. **Clone the repository**
```bash
git clone https://github.com/nonsodev/semantic-book-recommender.git
cd semantic-book-recommender
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Ensure required data files**
```
├── final_book_df.csv          # Main book dataset
├── tagged_description.txt     # Book descriptions for embedding
└── chroma_books/              # Vector database (auto-created)
```
4. **Run the application**
```bash
python app.py
```
## Usage Guide
### Search Modes
#### 🧠 **Semantic Search**
Perfect for describing what you want in natural language:
- "Dark fantasy with dragons and magic"
- "Romantic comedy set in Paris"
- "Thrilling mystery in Victorian London"
- "Science fiction about artificial intelligence"
#### 🔤 **Literal Search**
Best for finding specific titles or authors:
- "harry" → finds Harry Potter books
- "tolkien" → finds J.R.R. Tolkien works
- "game thrones" → finds Game of Thrones
- "stephen king" → finds Stephen King novels
### Advanced Features
#### **Category Filtering**
Narrow results by genre:
- Fiction, Non-fiction, Fantasy, Romance, Mystery, etc.
#### **Emotional Tone Matching**
Find books by mood:
- **Happy**: High joy scores
- **Surprising**: High surprise scores
- **Angry**: High anger scores
- **Suspenseful**: High fear scores
- **Sad**: High sadness scores
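The tone-to-score mapping above can be sketched as a small lookup plus a sort. This is a minimal illustration rather than the app's actual code: `TONE_TO_COLUMN` and `sort_by_tone` are hypothetical names, and plain dictionaries stand in for rows of the book dataset.

```python
# Hypothetical mapping from the UI tone labels to the emotion score columns.
TONE_TO_COLUMN = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def sort_by_tone(books, tone=None):
    """Return books ordered by the score column for `tone`, highest first.

    Unknown or missing tones leave the original (relevance) order untouched.
    """
    column = TONE_TO_COLUMN.get(tone)
    if column is None:
        return list(books)
    # sorted() is stable, so ties keep their original relevance order.
    return sorted(books, key=lambda book: book.get(column, 0.0), reverse=True)


books = [
    {"title": "A", "joy": 0.2, "fear": 0.9},
    {"title": "B", "joy": 0.8, "fear": 0.1},
]
print([b["title"] for b in sort_by_tone(books, "Happy")])
```

Keeping the mapping in one dictionary means a new tone only needs a new entry, provided the matching score column exists in the dataset.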
## How It Works
### 🔬 **Semantic Search Engine**
```python
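from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

# Uses sentence-transformers for embedding generation
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

# ChromaDB for efficient similarity search.
# `documents` is the list of LangChain Document objects built from
# tagged_description.txt (loading code not shown here).
db_books = Chroma.from_documents(
    documents,
    embedding=embeddings,
    collection_name="books",
    persist_directory="chroma_books",
)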
```
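With `normalize_embeddings` set to `True`, every vector has unit length, so ranking by dot product, cosine similarity, or Euclidean distance gives the same order. The toy sketch below illustrates that ranking step with hand-written three-dimensional vectors. Real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, and ChromaDB performs the search internally. The helper names and vectors are invented for illustration, and the code assumes non-zero vectors.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k):
    """Return indices of the k documents most similar to the query."""
    q = normalize(query_vec)
    scores = []
    for index, vec in enumerate(doc_vecs):
        d = normalize(vec)
        # For unit vectors the dot product is the cosine similarity.
        scores.append((sum(a * b for a, b in zip(q, d)), index))
    scores.sort(reverse=True)
    return [index for _, index in scores[:k]]


docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(top_k([0.8, 0.2, 0.0], docs, k=2))
```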
### **Flexible Literal Search**
```python
# Intelligent regex pattern matching
def retrieve_literal_recommendations(query, category=None, tone=None):
    # Creates flexible patterns for partial word matching
    # Handles special characters and multiple word combinations
    # Falls back to simple string matching if regex fails
    ...  # body omitted; see app.py
```
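One way to get the partial-word behaviour described in the comments above is to escape each query word and join the words with a lazy wildcard, so that "game thrones" still matches "A Game of Thrones". The sketch below is an assumption about how such matching could work, not the app's actual implementation. `build_pattern` and `literal_match` are hypothetical helpers.

```python
import re


def build_pattern(query):
    """Build a case-insensitive regex that matches the query words in order."""
    # re.escape neutralises special characters such as '+' or '('.
    words = [re.escape(word) for word in query.split()]
    # '.*?' allows arbitrary text (for example "of ") between the words.
    return re.compile(".*?".join(words), re.IGNORECASE)


def literal_match(query, text):
    """Return True if text matches the query, falling back to substring search."""
    try:
        return build_pattern(query).search(text) is not None
    except re.error:
        # Fallback: plain case-insensitive substring matching.
        return query.lower() in text.lower()


titles = ["A Game of Thrones", "Harry Potter and the Chamber of Secrets", "Dune"]
print([t for t in titles if literal_match("game thrones", t)])
print([t for t in titles if literal_match("harry", t)])
```

Because the words must appear in the order typed, "thrones game" would not match in this sketch. Matching words in any order would need a separate check per word.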
### **Emotional Intelligence**
Books are analyzed and scored across five emotional dimensions:
- **Joy**: Happiness, humor, uplifting content
- **Surprise**: Plot twists, unexpected elements
- **Anger**: Conflict, tension, dramatic intensity
- **Fear**: Suspense, thriller elements, mystery
- **Sadness**: Emotional depth, tragic elements
### 🎨 **Smart UI Components**
```python
def create_book_card_html(row):
    # Responsive card design with:
    # - Book cover with fallback handling
    # - Star ratings visualization
    # - Author formatting (handles multiple authors)
    # - Truncated descriptions with full content
    # - Download links when available
    ...  # body omitted; see app.py
```
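Two of the pieces listed above, the star rating and the author formatting, could look roughly like the sketch below. `create_star_rating` is named in this README, but its body here is a guess. `format_authors`, and the assumption that multiple authors are separated by semicolons, are hypothetical.

```python
def create_star_rating(rating, max_stars=5):
    """Render a numeric rating as filled and empty stars."""
    # Round half up and clamp to the valid range.
    filled = max(0, min(max_stars, int(rating + 0.5)))
    return "★" * filled + "☆" * (max_stars - filled)


def format_authors(authors):
    """Turn 'A;B;C' into 'A, B and C' (the ';' separator is an assumption)."""
    names = [name.strip() for name in authors.split(";") if name.strip()]
    if not names:
        return "Unknown author"
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


print(create_star_rating(4.2))
print(format_authors("Neil Gaiman;Terry Pratchett"))
```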
## Project Structure
```
semantic-book-recommender/
├── app.py                     # Main application
├── requirements.txt           # Python dependencies
├── final_book_df.csv          # Book dataset with metadata
├── tagged_description.txt     # Book descriptions for embedding
├── chroma_books/              # ChromaDB vector database
├── demo.png                   # Interface screenshot
└── README.md                  # This file
```
## Configuration
### **Embedding Models**
Switch between models for different performance profiles:
```python
# Fast and efficient (default)
"sentence-transformers/all-MiniLM-L6-v2"
# Higher quality, slower
"sentence-transformers/all-mpnet-base-v2"
# Multilingual support
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
```
### **Search Parameters**
Customize recommendation behavior:
```python
def retrieve_semantic_recommendations(
    query: str,
    initial_top_k: int = 50,  # Initial retrieval size
    final_top_k: int = 8,     # Final recommendations shown
    category: str = None,     # Category filter
    tone: str = None,         # Emotional tone filter
):
    ...  # body omitted; see app.py
```
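The gap between `initial_top_k` and `final_top_k` leaves room for filtering: a larger candidate set is retrieved first, then narrowed by category and cut to the display size. A minimal sketch of that second stage follows. It assumes the candidates arrive as relevance-ordered dictionaries and that "All" means no category filter; both are assumptions, as is the `narrow_candidates` name.

```python
def narrow_candidates(candidates, category=None, final_top_k=8):
    """Filter relevance-ordered candidates by category, then truncate."""
    if category and category != "All":
        candidates = [c for c in candidates if c["categories"] == category]
    # Slicing keeps the original relevance order of the survivors.
    return candidates[:final_top_k]


# 50 fake candidates alternating between two categories.
candidates = [
    {"title": f"Book {i}", "categories": "Fiction" if i % 2 == 0 else "Nonfiction"}
    for i in range(50)
]
fiction = narrow_candidates(candidates, category="Fiction", final_top_k=8)
print(len(fiction), fiction[0]["title"], fiction[-1]["title"])
```

If `initial_top_k` were equal to `final_top_k`, a category filter could leave fewer results than the interface is meant to show, which is the case the larger initial retrieval guards against.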
### **UI Customization**
Modify card display and styling:
```python
# Book card dimensions
style="width: 80px; height: 120px"
# Description truncation
-webkit-line-clamp: 4
# Rating display
create_star_rating(rating)  # five-star format, e.g. ★★★★☆
```
## Data Schema
### Book Dataset Columns
```python
# Core metadata
'isbn13', 'title_and_subtitle', 'authors', 'categories'
# Visual elements
'thumbnail', 'large_thumbnail'
# Ratings and metrics
'average_rating', 'ratings_count'
# Content
'description'
# Emotional scores
'joy', 'surprise', 'anger', 'fear', 'sadness'
# Access
'url' # Download/purchase links
```
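A quick way to confirm that a CSV matches this schema before starting the app is to compare its header row with the expected column names. The sketch below uses only the standard library. `missing_columns` is a hypothetical helper, and the in-memory sample stands in for `final_book_df.csv`.

```python
import csv
import io

EXPECTED_COLUMNS = {
    "isbn13", "title_and_subtitle", "authors", "categories",
    "thumbnail", "large_thumbnail", "average_rating", "ratings_count",
    "description", "joy", "surprise", "anger", "fear", "sadness", "url",
}


def missing_columns(csv_file):
    """Return the expected column names absent from the CSV header, sorted."""
    header = next(csv.reader(csv_file), [])
    return sorted(EXPECTED_COLUMNS - set(header))


# For the real file, open it with:
#   open("final_book_df.csv", newline="", encoding="utf-8")
sample = io.StringIO("isbn13,title_and_subtitle,authors,categories,description\n")
print(missing_columns(sample))
```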
## API Reference
### **Main Functions**
```python
# Semantic search with AI understanding
retrieve_semantic_recommendations(query, category, tone, initial_top_k, final_top_k)
# Literal search with flexible matching
retrieve_literal_recommendations(query, category, tone, final_top_k)
# HTML card generation
create_book_card_html(row)
# Main Gradio interface function
recommend_books(query, category, tone, search_type)
```
## Dependencies
```
# Core ML and Vector Database
langchain-chroma>=0.1.0
langchain-huggingface>=0.0.3
langchain-community>=0.2.0
sentence-transformers>=2.2.0
# Data Processing
pandas>=1.5.0
numpy>=1.21.0
# Web Interface
gradio>=4.0.0
# Text Processing
regex>=2022.0.0
```
## Performance Tips
### **Startup Optimization**
```python
import os

# Model caching for faster restarts; set these before importing
# transformers or sentence-transformers so they take effect
os.environ["HF_HOME"] = "/tmp/hf_cache"
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache"
```
### **Search Optimization**
- Use semantic search for exploratory queries
- Use literal search for known titles/authors
- Combine category and tone filters for precision
- Try variations if initial results aren't satisfactory
### **Memory Management**
- ChromaDB persists to disk automatically
- Embeddings cached after first load
- Efficient pandas operations for filtering
## Contributing
1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Commit** changes (`git commit -m 'Add amazing feature'`)
4. **Push** to branch (`git push origin feature/amazing-feature`)
5. **Open** a Pull Request
### Development Areas
- [ ] Additional emotional dimensions
- [ ] Multi-language support
- [ ] User preference learning
- [ ] Social features (reviews, ratings)
- [ ] Advanced filtering (publication year, page count)
## Troubleshooting
### **Common Issues**
**ChromaDB not found:**
```bash
# The app will auto-create from tagged_description.txt
# Ensure this file exists in the project root
```
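The auto-create behaviour amounts to a check along these lines: reuse `chroma_books/` if it already holds data, otherwise rebuild from `tagged_description.txt`. This is a sketch of the decision only, under the assumption that an empty or missing directory means no index. `index_action` is a hypothetical helper, and the actual ChromaDB calls are omitted.

```python
from pathlib import Path


def index_action(root="."):
    """Decide whether to load the vector index, build it, or fail."""
    root = Path(root)
    persist_dir = root / "chroma_books"
    source_file = root / "tagged_description.txt"
    # A non-empty directory is treated as an existing index.
    if persist_dir.is_dir() and any(persist_dir.iterdir()):
        return "load"
    if source_file.is_file():
        return "build"
    raise FileNotFoundError(f"{source_file} is required to build the index")


try:
    print(index_action())
except FileNotFoundError as error:
    print(error)
```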
**Model download slow:**
```bash
# Models cache automatically after first download
# Subsequent starts will be much faster
```
**No search results:**
```bash
# Try switching between search modes
# Reduce filter constraints (category/tone)
# Use broader search terms
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- **Sentence Transformers** for powerful embedding models
- **ChromaDB** for efficient vector storage and retrieval
- **Gradio** for creating accessible ML interfaces
- **LangChain** for seamless AI integration
- **HuggingFace** for model hosting and ecosystem
---
## 🎯 Example Queries to Try
### Semantic Search
- "Epic fantasy with complex magic systems"
- "Cozy mystery in a small town setting"
- "Hard science fiction about space exploration"
- "Historical romance during the Regency era"
### Literal Search
- "agatha christie" (find Agatha Christie novels)
- "dune" (find Dune series books)
- "pride prejudice" (find Pride and Prejudice)
- "lord rings" (find Lord of the Rings)
**Happy Reading! ✨**