nonsodev committed on
Commit 2b2cc6c · 1 Parent(s): d2ab819

redeone readme

Files changed (2)
  1. README.md +268 -117
  2. demo.png +2 -2
README.md CHANGED
@@ -9,39 +9,48 @@ app_file: app.py
  pinned: false
  license: mit
  ---

  ## 🚀 [Try the Live Demo](https://huggingface.co/spaces/nonsodev/semantic-book-recommender)

- # Semantic Book Recommender 📚

- A smart book recommendation system that uses semantic search and emotional tone analysis to help users discover their next favorite read. Built with LangChain, ChromaDB, and Gradio for an intuitive web interface.

- ## Features

- - **Semantic Search**: Uses advanced sentence transformers to understand the meaning behind your book preferences
- - **Category Filtering**: Browse recommendations by specific book categories
- - **Emotional Tone Matching**: Find books that match your desired emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
- - **Visual Gallery**: Browse recommendations with book covers and detailed descriptions
- - **Fast Performance**: Optimized vector database for quick retrieval

- ## Demo

- ![Book Recommender Interface](demo.png)
-
- Simply describe what you're looking for, select your preferred category and emotional tone, and get personalized book recommendations!

  ## Installation

  ### Prerequisites
-
- - Python 3.10+
  - pip package manager

- ### Setup

  1. **Clone the repository**
  ```bash
- git clone https://github.com/nonsodev/semantic-book-recommender.git)
  cd semantic-book-recommender
  ```
@@ -50,143 +59,270 @@ Simply describe what you're looking for, select your preferred category and emot
  pip install -r requirements.txt
  ```

- 3. **Ensure data files are present**
- - `final_book_df.csv`: Main book dataset with metadata
- - `chroma_books/`: ChromaDB vector database directory
- - `cover-not-found.jpg`: Placeholder image for missing book covers
-
- ## Usage

- ### Running the Application

- ```bash
- python gradio_dashboard.py
- ```

- The application will launch a web interface (typically at `http://localhost:7860`) where you can:

- 1. Enter a description of your ideal book
- 2. Select a category (optional)
- 3. Choose an emotional tone (optional)
- 4. Click "Submit" to get recommendations

- ### Example Queries

- - "A thrilling mystery set in Victorian London"
- - "Romantic comedy with strong female protagonist"
- - "Science fiction about artificial intelligence"
- - "Historical fiction during World War II"

- ## Project Structure

- ```
- semantic-book-recommender/
- ├── gradio_dashboard.py  # Main application file
- ├── requirements.txt  # Python dependencies
- ├── final_book_df.csv  # Book dataset
- ├── cover-not-found.jpg  # Placeholder image
- ├── chroma_books/  # Vector database
- ├── notebooks/
- │   ├── data-exploration.ipynb  # Data analysis
- │   ├── download_url.ipynb  # Data download utilities
- │   ├── final_df.ipynb  # Data processing
- │   ├── sentiment_analysis.ipynb  # Emotion analysis
- │   ├── supervised_clean.py  # Data cleaning
- │   └── test_classification.ipynb  # Model testing
- └── data/
-     ├── books_cleaned.csv  # Processed book data
-     ├── books_with_categories.csv
-     ├── books_with_sentiment.csv
-     ├── books_with_urls.csv
-     ├── search_progress.csv  # Processing logs
-     ├── tagged_description.txt  # Tagged descriptions
-     └── to_drop.txt  # Items to exclude
- ```

  ## How It Works

- ### 1. Semantic Search
- - Uses `sentence-transformers/all-MiniLM-L6-v2` for fast, high-quality embeddings
- - ChromaDB stores and retrieves similar books based on vector similarity
- - Initial retrieval of top 50 matches, refined to top 16 recommendations

- ### 2. Filtering & Ranking
- - **Category Filter**: Narrows results to specific genres
- - **Emotional Tone**: Ranks books by emotion scores (joy, surprise, anger, fear, sadness)
- - **Relevance**: Maintains semantic relevance while applying filters

- ### 3. User Interface
- - Clean, modern design using Gradio's Glass theme
- - Gallery view with book covers and descriptions
- - Responsive layout for different screen sizes

- ## Data Sources

- The book dataset includes:
- - **Metadata**: Title, authors, ISBN, categories, publication info
- - **Content**: Descriptions, summaries
- - **Visual**: Thumbnail images, large cover images
- - **Emotional Scores**: Joy, surprise, anger, fear, sadness ratings

  ## Configuration

- ### Embedding Models
- You can switch between different embedding models in `gradio_dashboard.py`:

  ```python
- # Fast and good quality (default)
- embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

  # Higher quality, slower
- embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
  ```

- ### Search Parameters
- Adjust recommendation parameters:

  ```python
  def retrieve_semantic_recommendations(
      query: str,
-     category: str = None,
-     tone: str = None,
-     initial_top_k: int = 50,  # Initial retrieval count
-     final_top_k: int = 16,  # Final recommendation count
  )
  ```

- ## Development

- ### Adding New Features

- 1. **New Emotional Tones**: Add emotion columns to your dataset and update the `tones` list
- 2. **Additional Filters**: Extend the filtering logic in `retrieve_semantic_recommendations()`
- 3. **UI Improvements**: Modify the Gradio interface in the `dashboard` block

- ### Data Processing Pipeline

- The project includes several notebooks for data processing:
- - Data exploration and cleaning
- - Sentiment analysis for emotional scoring
- - URL processing for book covers
- - Model testing and validation

  ## Dependencies

- Key libraries used:
- - **LangChain**: Vector database integration
- - **ChromaDB**: Vector storage and similarity search
- - **Gradio**: Web interface
- - **HuggingFace Transformers**: Sentence embeddings
- - **Pandas**: Data manipulation
- - **NumPy**: Numerical operations

  ## Contributing

- 1. Fork the repository
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
- 3. Commit your changes (`git commit -m 'Add amazing feature'`)
- 4. Push to the branch (`git push origin feature/amazing-feature`)
- 5. Open a Pull Request

  ## License
@@ -194,11 +330,26 @@ This project is licensed under the MIT License - see the LICENSE file for detail

  ## Acknowledgments

- - Sentence Transformers for powerful embedding models
- - ChromaDB for efficient vector storage
- - Gradio for making ML interfaces accessible
- - The open-source community for book metadata

  ---

- **Happy Reading! 📖✨**
@@ -9,39 +9,48 @@ app_file: app.py
  pinned: false
  license: mit
  ---
+ # Smart Book Recommender 📚
+
+ An intelligent book recommendation system with dual search modes: semantic understanding and flexible literal matching. Features emotional tone analysis, category filtering, and a responsive web interface built with LangChain, ChromaDB, and Gradio.

  ## 🚀 [Try the Live Demo](https://huggingface.co/spaces/nonsodev/semantic-book-recommender)

+ ![Book Recommender Interface](demo.png)

+ ## ✨ Key Features

+ ### 🔍 **Dual Search Modes**
+ - **Semantic Search**: AI-powered understanding of natural language queries (e.g., "fantasy adventure with magic")
+ - **Literal Search**: Flexible keyword matching with partial word support (e.g., "harry" → Harry Potter books)

+ ### 🎯 **Smart Filtering**
+ - **Category Filtering**: Browse by specific book genres
+ - **Emotional Tone Matching**: Find books by emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
+ - **Intelligent Sorting**: Results ranked by relevance and emotional scores

+ ### 🎨 **Modern Interface**
+ - Responsive card-based design with book covers
+ - Star ratings and reader statistics
+ - Direct download links when available
+ - Dark theme optimized for reading

+ ### ⚡ **Performance Optimized**
+ - Cached embedding models for fast startup
+ - Efficient ChromaDB vector database
+ - Fallback image handling for missing covers
+ - Robust error handling and regex search

  ## Installation

  ### Prerequisites
+ - Python 3.8+
  - pip package manager

+ ### Quick Setup

  1. **Clone the repository**
  ```bash
+ git clone https://github.com/nonsodev/semantic-book-recommender.git
  cd semantic-book-recommender
  ```

@@ -50,143 +59,270 @@ Simply describe what you're looking for, select your preferred category and emot
  pip install -r requirements.txt
  ```

+ 3. **Ensure required data files**
+ ```
+ ├── final_book_df.csv  # Main book dataset
+ ├── tagged_description.txt  # Book descriptions for embedding
+ └── chroma_books/  # Vector database (auto-created)
+ ```

+ 4. **Run the application**
+ ```bash
+ python app.py
+ ```

+ ## Usage Guide

+ ### Search Modes

+ #### 🧠 **Semantic Search**
+ Perfect for describing what you want in natural language:
+ - "Dark fantasy with dragons and magic"
+ - "Romantic comedy set in Paris"
+ - "Thrilling mystery in Victorian London"
+ - "Science fiction about artificial intelligence"

+ #### 🔤 **Literal Search**
+ Best for finding specific titles or authors:
+ - "harry" → finds Harry Potter books
+ - "tolkien" → finds J.R.R. Tolkien works
+ - "game thrones" → finds Game of Thrones
+ - "stephen king" → finds Stephen King novels

+ ### Advanced Features

+ #### **Category Filtering**
+ Narrow results by genre:
+ - Fiction, Non-fiction, Fantasy, Romance, Mystery, etc.

+ #### **Emotional Tone Matching**
+ Find books by mood:
+ - **Happy**: High joy scores
+ - **Surprising**: High surprise scores
+ - **Angry**: High anger scores
+ - **Suspenseful**: High fear scores
+ - **Sad**: High sadness scores

  ## How It Works

+ ### 🔬 **Semantic Search Engine**
+ ```python
+ # Uses sentence-transformers for embedding generation
+ embeddings = HuggingFaceEmbeddings(
+     model_name="sentence-transformers/all-MiniLM-L6-v2",
+     model_kwargs={'device': 'cpu'},
+     encode_kwargs={'normalize_embeddings': True}
+ )

+ # ChromaDB for efficient similarity search
+ db_books = Chroma.from_documents(
+     documents, embedding=embeddings,
+     collection_name="books", persist_directory="chroma_books"
+ )
+ ```
+
+ ### 🔍 **Flexible Literal Search**
+ ```python
+ # Intelligent regex pattern matching
+ def retrieve_literal_recommendations(query, category=None, tone=None):
+     # Creates flexible patterns for partial word matching
+     # Handles special characters and multiple word combinations
+     # Falls back to simple string matching if regex fails
+ ```
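
The literal-search stub above only lists its steps in comments. As a rough illustration of what "flexible patterns for partial word matching" with a string-matching fallback could look like, here is a minimal standard-library sketch. The helper names `build_flexible_pattern` and `literal_match` are hypothetical and are not taken from `app.py`; the actual implementation may differ.

```python
import re


def build_flexible_pattern(query: str) -> re.Pattern:
    """Compile a pattern matching text that contains every query word as a word prefix."""
    # Escape each word so characters such as "+" or "(" are treated literally.
    words = [re.escape(word) for word in query.split()]
    # One lookahead per word: word order is ignored and "harry" matches "Harry".
    lookaheads = "".join(rf"(?=.*\b{word})" for word in words)
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)


def literal_match(query: str, text: str) -> bool:
    """Return True if `text` matches the query, falling back to substring search."""
    if not query.strip():
        return False
    try:
        return build_flexible_pattern(query).search(text) is not None
    except re.error:
        # Fallback: plain case-insensitive substring test.
        return query.lower() in text.lower()


titles = ["Harry Potter and the Chamber of Secrets", "A Game of Thrones", "Dune"]
print([t for t in titles if literal_match("game thrones", t)])  # ['A Game of Thrones']
```

Using one lookahead per word means "game thrones" still finds "A Game of Thrones" even though "of" sits between the two words, which matches the behaviour the README examples describe.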

+ ### 🎭 **Emotional Intelligence**
+ Books are analyzed and scored across five emotional dimensions:
+ - **Joy**: Happiness, humor, uplifting content
+ - **Surprise**: Plot twists, unexpected elements
+ - **Anger**: Conflict, tension, dramatic intensity
+ - **Fear**: Suspense, thriller elements, mystery
+ - **Sadness**: Emotional depth, tragic elements
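
To illustrate how the tone labels from the Usage Guide (Happy, Surprising, Angry, Suspenseful, Sad) could select one of these five score columns, here is a minimal sketch. It uses plain dictionaries rather than the app's DataFrame, and `TONE_TO_COLUMN` and `rank_by_tone` are hypothetical names, not code from `app.py`.

```python
from typing import Optional

# Mapping from UI tone label to score column, as listed in the Usage Guide.
TONE_TO_COLUMN = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def rank_by_tone(books: list, tone: Optional[str] = None) -> list:
    """Sort books by the score column for `tone`; keep the order if no tone applies."""
    column = TONE_TO_COLUMN.get(tone)
    if column is None:
        return list(books)
    # Highest score first; a missing score counts as 0.0.
    return sorted(books, key=lambda book: book.get(column, 0.0), reverse=True)


books = [
    {"title": "A", "joy": 0.1, "fear": 0.9},
    {"title": "B", "joy": 0.8, "fear": 0.2},
]
print([b["title"] for b in rank_by_tone(books, "Suspenseful")])  # ['A', 'B']
```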

+ ### 🎨 **Smart UI Components**
+ ```python
+ def create_book_card_html(row):
+     # Responsive card design with:
+     # - Book cover with fallback handling
+     # - Star ratings visualization
+     # - Author formatting (handles multiple authors)
+     # - Truncated descriptions with full content
+     # - Download links when available
+ ```
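
The card stub above names its responsibilities without showing them. The sketch below illustrates two of them, cover fallback and multi-author formatting, under stated assumptions: authors are assumed to be separated by `;`, the fallback file name `cover-not-found.jpg` is borrowed from the previous README, and `format_authors` and `book_card_html` are hypothetical helpers rather than the functions in `app.py`.

```python
from html import escape

FALLBACK_COVER = "cover-not-found.jpg"  # assumed placeholder, named in the old README


def format_authors(raw: str) -> str:
    """Turn 'A;B;C' into 'A, B, and C' (the ';' separator is an assumption)."""
    names = [name.strip() for name in raw.split(";") if name.strip()]
    if not names:
        return "Unknown author"
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return f"{', '.join(names[:-1])}, and {names[-1]}"


def book_card_html(row: dict) -> str:
    """Build a small HTML card, escaping all text taken from the dataset."""
    cover = row.get("thumbnail") or FALLBACK_COVER
    title = escape(row.get("title_and_subtitle") or "Untitled")
    authors = escape(format_authors(row.get("authors") or ""))
    return (
        '<div class="book-card">'
        f'<img src="{escape(cover)}" alt="{title}" style="width: 80px; height: 120px">'
        f"<h4>{title}</h4><p>{authors}</p>"
        "</div>"
    )


print(book_card_html({"title_and_subtitle": "Good Omens",
                      "authors": "Neil Gaiman;Terry Pratchett"}))
```

Escaping matters here because titles and author names come from a dataset and are interpolated into raw HTML.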

+ ## Project Structure
+
+ ```
+ semantic-book-recommender/
+ ├── app.py  # Main application (your updated file)
+ ├── requirements.txt  # Python dependencies
+ ├── final_book_df.csv  # Book dataset with metadata
+ ├── tagged_description.txt  # Book descriptions for embedding
+ ├── chroma_books/  # ChromaDB vector database
+ ├── demo.png  # Interface screenshot
+ └── README.md  # This file
+ ```

  ## Configuration

+ ### **Embedding Models**
+ Switch between models for different performance profiles:

  ```python
+ # Fast and efficient (default)
+ "sentence-transformers/all-MiniLM-L6-v2"

  # Higher quality, slower
+ "sentence-transformers/all-mpnet-base-v2"
+
+ # Multilingual support
+ "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
  ```

+ ### **Search Parameters**
+ Customize recommendation behavior:

  ```python
  def retrieve_semantic_recommendations(
      query: str,
+     initial_top_k: int = 50,  # Initial retrieval size
+     final_top_k: int = 8,  # Final recommendations shown
+     category: str = None,  # Category filter
+     tone: str = None  # Emotional tone filter
  )
  ```
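
The two `top_k` parameters suggest a two-stage flow: retrieve a larger pool, filter it, then cut to the number shown. The sketch below illustrates why the initial pool is larger than the final list. It assumes the candidates are already ranked by similarity and that each book carries a single `categories` value; `narrow_candidates` is a hypothetical helper, not the function in `app.py`.

```python
from typing import Optional


def narrow_candidates(ranked: list, category: Optional[str] = None,
                      initial_top_k: int = 50, final_top_k: int = 8) -> list:
    """Take the best `initial_top_k` books, filter by category, keep `final_top_k`."""
    pool = ranked[:initial_top_k]
    if category is not None:
        # Filtering shrinks the pool, which is why it starts out larger.
        pool = [book for book in pool if book["categories"] == category]
    return pool[:final_top_k]


# 100 fake results, alternating between two categories.
ranked = [
    {"title": f"book{i}", "categories": "Fiction" if i % 2 == 0 else "Nonfiction"}
    for i in range(100)
]
print(len(narrow_candidates(ranked, category="Fiction")))                   # 8
print(len(narrow_candidates(ranked, category="Fiction", initial_top_k=8)))  # 4
```

With an initial pool of 8 the category filter leaves only 4 books, while a pool of 50 still fills all 8 slots.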

+ ### **UI Customization**
+ Modify card display and styling:
+
+ ```python
+ # Book card dimensions
+ style="width: 80px; height: 120px"

+ # Description truncation
+ -webkit-line-clamp: 4

+ # Rating display
+ create_star_rating(rating)  # ★★★★☆ format
+ ```
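
The README names `create_star_rating(rating)` and its `★★★★☆` output format but does not show the body. One possible implementation, offered only as a sketch (the rounding and clamping rules are assumptions, and `max_stars` is a hypothetical parameter):

```python
def create_star_rating(rating: float, max_stars: int = 5) -> str:
    """Render a numeric rating as filled and empty stars."""
    # Round half up (built-in round() rounds halves to even), then clamp to 0..max_stars.
    filled = max(0, min(max_stars, int(rating + 0.5)))
    return "★" * filled + "☆" * (max_stars - filled)


print(create_star_rating(4.2))  # ★★★★☆
```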
+
+ ## Data Schema
+
+ ### Book Dataset Columns
+ ```python
+ # Core metadata
+ 'isbn13', 'title_and_subtitle', 'authors', 'categories'

+ # Visual elements
+ 'thumbnail', 'large_thumbnail'

+ # Ratings and metrics
+ 'average_rating', 'ratings_count'
+
+ # Content
+ 'description'
+
+ # Emotional scores
+ 'joy', 'surprise', 'anger', 'fear', 'sadness'
+
+ # Access
+ 'url'  # Download/purchase links
+ ```
+
+ ## API Reference
+
+ ### **Main Functions**
+
+ ```python
+ # Semantic search with AI understanding
+ retrieve_semantic_recommendations(query, category, tone, initial_top_k, final_top_k)
+
+ # Literal search with flexible matching
+ retrieve_literal_recommendations(query, category, tone, final_top_k)
+
+ # HTML card generation
+ create_book_card_html(row)
+
+ # Main Gradio interface function
+ recommend_books(query, category, tone, search_type)
+ ```

  ## Dependencies

+ ```python
+ # Core ML and Vector Database
+ langchain-chroma>=0.1.0
+ langchain-huggingface>=0.0.3
+ langchain-community>=0.2.0
+ sentence-transformers>=2.2.0
+
+ # Data Processing
+ pandas>=1.5.0
+ numpy>=1.21.0
+
+ # Web Interface
+ gradio>=4.0.0
+
+ # Text Processing
+ regex>=2022.0.0
+ ```
+
+ ## Performance Tips
+
+ ### **Startup Optimization**
+ ```python
+ # Model caching for faster restarts
+ os.environ["HF_HOME"] = "/tmp/hf_cache"
+ os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache"
+ ```
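
As a hedged illustration of the caching tip above: to my understanding the Hugging Face libraries read their cache location when they are imported, so the variable needs to be set before those imports run. The sketch below creates the directory and sets only `HF_HOME`, without overriding a value that is already present; `configure_hf_cache` is a hypothetical helper, not part of `app.py`.

```python
import os
from pathlib import Path


def configure_hf_cache(cache_dir: str) -> Path:
    """Create the cache directory and point HF_HOME at it unless it is already set."""
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    # setdefault keeps any HF_HOME value the environment already provides.
    os.environ.setdefault("HF_HOME", str(path))
    return Path(os.environ["HF_HOME"])


cache_root = configure_hf_cache("/tmp/hf_cache")
# Import the embedding libraries only after this point.
```

Note that `/tmp` is often cleared on reboot or container restart, so a cache there may only speed up restarts within the same session.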
+
+ ### **Search Optimization**
+ - Use semantic search for exploratory queries
+ - Use literal search for known titles/authors
+ - Combine category and tone filters for precision
+ - Try variations if initial results aren't satisfactory
+
+ ### **Memory Management**
+ - ChromaDB persists to disk automatically
+ - Embeddings cached after first load
+ - Efficient pandas operations for filtering

  ## Contributing

+ 1. **Fork** the repository
+ 2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. **Commit** changes (`git commit -m 'Add amazing feature'`)
+ 4. **Push** to branch (`git push origin feature/amazing-feature`)
+ 5. **Open** a Pull Request
+
+ ### Development Areas
+ - [ ] Additional emotional dimensions
+ - [ ] Multi-language support
+ - [ ] User preference learning
+ - [ ] Social features (reviews, ratings)
+ - [ ] Advanced filtering (publication year, page count)
+
+ ## Troubleshooting
+
+ ### **Common Issues**
+
+ **ChromaDB not found:**
+ ```bash
+ # The app will auto-create from tagged_description.txt
+ # Ensure this file exists in the project root
+ ```
+
+ **Model download slow:**
+ ```bash
+ # Models cache automatically after first download
+ # Subsequent starts will be much faster
+ ```
+
+ **No search results:**
+ ```bash
+ # Try switching between search modes
+ # Reduce filter constraints (category/tone)
+ # Use broader search terms
+ ```

  ## License

@@ -194,11 +330,26 @@ This project is licensed under the MIT License - see the LICENSE file for detail

  ## Acknowledgments

+ - **Sentence Transformers** for powerful embedding models
+ - **ChromaDB** for efficient vector storage and retrieval
+ - **Gradio** for creating accessible ML interfaces
+ - **LangChain** for seamless AI integration
+ - **HuggingFace** for model hosting and ecosystem

  ---

+ ## 🎯 Example Queries to Try
+
+ ### Semantic Search
+ - "Epic fantasy with complex magic systems"
+ - "Cozy mystery in a small town setting"
+ - "Hard science fiction about space exploration"
+ - "Historical romance during the Regency era"
+
+ ### Literal Search
+ - "agatha christie" (find Agatha Christie novels)
+ - "dune" (find Dune series books)
+ - "pride prejudice" (find Pride and Prejudice)
+ - "lord rings" (find Lord of the Rings)
+
+ **Happy Reading! 📖✨**
demo.png CHANGED

Git LFS Details

  • SHA256: 54cf64741f61469733e54ae40ac597c3cfc4992ce6b55888d940d1269249a771
  • Pointer size: 131 Bytes
  • Size of remote file: 196 kB

Git LFS Details

  • SHA256: e2faea4c7b88695e4d6c7395a306af61f3102f0588534c6be2d4747226e70572
  • Pointer size: 131 Bytes
  • Size of remote file: 177 kB