nonsodev committed on
Commit 8efe0a8 · 1 Parent(s): d38101e

Create README.md

Files changed (1)
  1. README.md +190 -0
README.md ADDED
@@ -0,0 +1,190 @@
+ # Semantic Book Recommender 📚
+
+ A book recommendation system that combines semantic search with emotional tone analysis to help users discover their next read. Built with LangChain and ChromaDB, with a Gradio web interface.
+
+ ## Features
+
+ - **Semantic Search**: Uses sentence-transformer embeddings to match on the meaning of your description rather than exact keywords
+ - **Category Filtering**: Browse recommendations by specific book categories
+ - **Emotional Tone Matching**: Find books that match your desired emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
+ - **Visual Gallery**: Browse recommendations with book covers and detailed descriptions
+ - **Fast Performance**: Queries run against a prebuilt vector database for quick retrieval
+
+ ## Demo
+
+ ![Book Recommender Interface](demo_img.png)
+
+ Describe what you're looking for, optionally select a category and emotional tone, and get personalized book recommendations.
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - pip package manager
+
+ ### Setup
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <your-repo-url>
+    cd book-recommender-llm
+    ```
+
+ 2. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. **Ensure data files are present**
+    - `final_book_df.csv`: Main book dataset with metadata
+    - `chroma_books/`: ChromaDB vector database directory
+    - `cover-not-found.jpg`: Placeholder image for missing book covers
+
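The data-file check in step 3 can be automated before launching the app. The sketch below is illustrative only: the three paths come from the list above, while the helper name `missing_data_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# Paths taken from the setup list above; the helper name is hypothetical.
REQUIRED_PATHS = ("final_book_df.csv", "chroma_books", "cover-not-found.jpg")


def missing_data_files(root="."):
    """Return the required paths that do not exist under `root`."""
    base = Path(root)
    return [name for name in REQUIRED_PATHS if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files found.")
```

Run it from the repository root; an empty result means the app should find everything it expects.
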
+ ## Usage
+
+ ### Running the Application
+
+ ```bash
+ python gradio_dashboard.py
+ ```
+
+ The application will launch a web interface (typically at `http://localhost:7860`) where you can:
+
+ 1. Enter a description of your ideal book
+ 2. Select a category (optional)
+ 3. Choose an emotional tone (optional)
+ 4. Click "Submit" to get recommendations
+
+ ### Example Queries
+
+ - "A thrilling mystery set in Victorian London"
+ - "Romantic comedy with strong female protagonist"
+ - "Science fiction about artificial intelligence"
+ - "Historical fiction during World War II"
+
+ ## Project Structure
+
+ ```
+ book-recommender-llm/
+ ├── gradio_dashboard.py           # Main application file
+ ├── requirements.txt              # Python dependencies
+ ├── final_book_df.csv             # Book dataset
+ ├── cover-not-found.jpg           # Placeholder image
+ ├── chroma_books/                 # Vector database
+ ├── notebooks/
+ │   ├── data-exploration.ipynb    # Data analysis
+ │   ├── download_url.ipynb        # Data download utilities
+ │   ├── final_df.ipynb            # Data processing
+ │   ├── sentiment_analysis.ipynb  # Emotion analysis
+ │   ├── supervised_clean.py       # Data cleaning
+ │   └── test_classification.ipynb # Model testing
+ └── data/
+     ├── books_cleaned.csv         # Processed book data
+     ├── books_with_categories.csv
+     ├── books_with_sentiment.csv
+     ├── books_with_urls.csv
+     ├── search_progress.csv       # Processing logs
+     ├── tagged_description.txt    # Tagged descriptions
+     └── to_drop.txt               # Items to exclude
+ ```
+
+ ## How It Works
+
+ ### 1. Semantic Search
+ - Uses `sentence-transformers/all-MiniLM-L6-v2` for fast, high-quality embeddings
+ - ChromaDB stores the book embeddings and retrieves the books most similar to the query vector
+ - The top 50 matches are retrieved first, then narrowed to the top 16 recommendations
+
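The two-stage retrieval described above can be illustrated without any external library. This is a minimal sketch, not the project's code: it uses cosine similarity for illustration (the metric actually configured in ChromaDB may differ), toy 3-dimensional vectors in place of real embeddings, and the hypothetical helper names `cosine_similarity` and `top_k_similar`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, book_vecs, k):
    """Return the ids of the k books most similar to query_vec, best first."""
    ranked = sorted(
        book_vecs,
        key=lambda book_id: cosine_similarity(query_vec, book_vecs[book_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
book_vecs = {
    "mystery": [0.9, 0.1, 0.0],
    "romance": [0.1, 0.9, 0.0],
    "sci-fi": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]

candidates = top_k_similar(query_vec, book_vecs, k=2)  # stands in for the top-50 stage
final = candidates[:1]                                  # stands in for the top-16 stage
print(candidates, final)  # ['mystery', 'romance'] ['mystery']
```

Retrieving a wider candidate set first leaves room for the category and tone steps to discard or reorder books while still returning a full list.
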
+ ### 2. Filtering & Ranking
+ - **Category Filter**: Narrows results to specific genres
+ - **Emotional Tone**: Ranks books by emotion scores (joy, surprise, anger, fear, sadness)
+ - **Relevance**: Filters and tone ranking apply only to the semantically retrieved candidates, so results stay relevant to the query
+
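The filtering and ranking step might look like the sketch below. Several things here are assumptions rather than the project's actual code: the tone-to-emotion mapping is inferred from the order in which this README lists tones and emotions, the record keys (`category`, `joy`, `fear`) are illustrative, and the order of operations (filter, re-rank, then truncate) is not specified by the README.

```python
# Assumed mapping from UI tone labels to emotion score fields, inferred from
# the order in which the README lists them; verify against the dataset.
TONE_TO_EMOTION = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def filter_and_rank(candidates, category=None, tone=None, final_top_k=16):
    """Filter candidates by category, optionally re-rank by tone, then truncate.

    `candidates` is a list of dicts already ordered by semantic relevance.
    """
    if category is not None:
        candidates = [b for b in candidates if b["category"] == category]
    if tone is not None:
        emotion = TONE_TO_EMOTION[tone]
        # sorted() is stable, so ties keep their semantic-relevance order.
        candidates = sorted(candidates, key=lambda b: b[emotion], reverse=True)
    return candidates[:final_top_k]


# Hypothetical records for illustration only.
books = [
    {"title": "A", "category": "Fiction", "joy": 0.2, "fear": 0.9},
    {"title": "B", "category": "Nonfiction", "joy": 0.8, "fear": 0.1},
    {"title": "C", "category": "Fiction", "joy": 0.7, "fear": 0.3},
]
picks = filter_and_rank(books, category="Fiction", tone="Happy", final_top_k=2)
print([b["title"] for b in picks])  # ['C', 'A']
```

With neither a category nor a tone selected, the function returns the candidates in their original semantic order, which matches both inputs being optional in the interface.
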
+ ### 3. User Interface
+ - Clean, modern design using Gradio's Glass theme
+ - Gallery view with book covers and descriptions
+ - Responsive layout for different screen sizes
+
+ ## Data Sources
+
+ The book dataset includes:
+ - **Metadata**: Title, authors, ISBN, categories, publication info
+ - **Content**: Descriptions, summaries
+ - **Visual**: Thumbnail images, large cover images
+ - **Emotional Scores**: Joy, surprise, anger, fear, sadness ratings
+
+ ## Configuration
+
+ ### Embedding Models
+ You can switch between different embedding models in `gradio_dashboard.py`:
+
+ ```python
+ # Import path is an assumption; it varies by LangChain version.
+ from langchain_huggingface import HuggingFaceEmbeddings
+
+ # Fast and good quality (default)
+ embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
+
+ # Higher quality, slower (uncomment to use instead of the default)
+ # embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
+ ```
+
+ The `chroma_books/` index must be built with the same model used at query time. The two models above produce vectors of different dimensions (384 and 768), so switching models means rebuilding the index.
+
+ ### Search Parameters
+ Adjust recommendation parameters:
+
+ ```python
+ from typing import Optional
+
+ def retrieve_semantic_recommendations(
+     query: str,
+     category: Optional[str] = None,
+     tone: Optional[str] = None,
+     initial_top_k: int = 50,  # Initial retrieval count
+     final_top_k: int = 16,    # Final recommendation count
+ ):
+     ...  # function body omitted
+ ```
+
+ ## Development
+
+ ### Adding New Features
+
+ 1. **New Emotional Tones**: Add emotion columns to your dataset and update the `tones` list
+ 2. **Additional Filters**: Extend the filtering logic in `retrieve_semantic_recommendations()`
+ 3. **UI Improvements**: Modify the Gradio interface in the `dashboard` block
+
+ ### Data Processing Pipeline
+
+ The project includes several notebooks for data processing:
+ - Data exploration and cleaning
+ - Sentiment analysis for emotional scoring
+ - URL processing for book covers
+ - Model testing and validation
+
158
+
159
+ ## Dependencies
160
+
161
+ Key libraries used:
162
+ - **LangChain**: Vector database integration
163
+ - **ChromaDB**: Vector storage and similarity search
164
+ - **Gradio**: Web interface
165
+ - **HuggingFace Transformers**: Sentence embeddings
166
+ - **Pandas**: Data manipulation
167
+ - **NumPy**: Numerical operations
168
+
169
+ ## Contributing
170
+
171
+ 1. Fork the repository
172
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
173
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
174
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
175
+ 5. Open a Pull Request
176
+
177
+ ## License
178
+
179
+ This project is licensed under the MIT License - see the LICENSE file for details.
180
+
181
+ ## Acknowledgments
182
+
183
+ - Sentence Transformers for powerful embedding models
184
+ - ChromaDB for efficient vector storage
185
+ - Gradio for making ML interfaces accessible
186
+ - The open-source community for book metadata
187
+
188
+ ---
189
+
190
+ **Happy Reading! πŸ“–βœ¨**