# Semantic Book Recommender 📚

A smart book recommendation system that uses semantic search and emotional tone analysis to help users discover their next favorite read. It is built with LangChain and ChromaDB for retrieval and Gradio for the web interface.

## Features

- **Semantic Search**: Embeds your description with a sentence-transformer model and matches it against book descriptions by meaning, not just keywords
- **Category Filtering**: Narrow recommendations to a specific book category
- **Emotional Tone Matching**: Find books that match your desired emotional experience (Happy, Surprising, Angry, Suspenseful, Sad)
- **Visual Gallery**: Browse recommendations with book covers and descriptions
- **Fast Retrieval**: Book descriptions are embedded ahead of time and stored in a local ChromaDB index (`chroma_books/`), so a query needs only one embedding plus a vector search

## Demo

Describe what you're looking for, optionally pick a category and an emotional tone, and get personalized book recommendations!

## Installation

### Prerequisites

- Python 3.8+
- pip package manager

### Setup

1. **Clone the repository**
```bash
git clone <your-repo-url>
cd book-recommender-llm
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
```

3. **Ensure data files are present**
- `final_book_df.csv`: Main book dataset with metadata
- `chroma_books/`: ChromaDB vector database directory
- `cover-not-found.jpg`: Placeholder image for missing book covers
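
Before the first launch, a small check can confirm these paths exist. This is a minimal sketch, not part of the project: the helper name `missing_data_files` is hypothetical, and it assumes the files sit directly in the directory you pass in (the project root by default).

```python
from pathlib import Path
from typing import List

# Names taken from the list above; adjust if your layout differs.
REQUIRED_FILES = ["final_book_df.csv", "cover-not-found.jpg"]
REQUIRED_DIRS = ["chroma_books"]


def missing_data_files(root: str = ".") -> List[str]:
    """Hypothetical helper: return the required data paths not found under `root`."""
    base = Path(root)
    missing = [name for name in REQUIRED_FILES if not (base / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (base / name).is_dir()]
    return missing
```

Calling `missing_data_files()` from the project root should return an empty list; any names it returns are still missing.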

## Usage

### Running the Application

```bash
python gradio_dashboard.py
```

The application launches a Gradio web interface (by default at `http://localhost:7860`) where you can:

1. Enter a description of your ideal book
2. Select a category (optional)
3. Choose an emotional tone (optional)
4. Click "Submit" to get recommendations

### Example Queries

- "A thrilling mystery set in Victorian London"
- "Romantic comedy with strong female protagonist"
- "Science fiction about artificial intelligence"
- "Historical fiction during World War II"

## Project Structure

```
book-recommender-llm/
├── gradio_dashboard.py              # Main application file
├── requirements.txt                 # Python dependencies
├── final_book_df.csv                # Book dataset
├── cover-not-found.jpg              # Placeholder image
├── chroma_books/                    # Vector database
├── notebooks/
│   ├── data-exploration.ipynb       # Data analysis
│   ├── download_url.ipynb           # Data download utilities
│   ├── final_df.ipynb               # Data processing
│   ├── sentiment_analysis.ipynb     # Emotion analysis
│   ├── supervised_clean.py          # Data cleaning
│   └── test_classification.ipynb    # Model testing
└── data/
    ├── books_cleaned.csv            # Processed book data
    ├── books_with_categories.csv
    ├── books_with_sentiment.csv
    ├── books_with_urls.csv
    ├── search_progress.csv          # Processing logs
    ├── tagged_description.txt       # Tagged descriptions
    └── to_drop.txt                  # Items to exclude
```

## How It Works

### 1. Semantic Search
- Uses `sentence-transformers/all-MiniLM-L6-v2` to embed the query and the book descriptions (fast, good-quality embeddings)
- ChromaDB stores the description embeddings and returns the books whose vectors are closest to the query
- Initial retrieval of top 50 matches, refined to top 16 recommendations
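
Conceptually, the retrieval step ranks stored books by how close their embeddings are to the query embedding. The sketch below illustrates that idea with cosine similarity over hand-made 3-dimensional vectors; in the app the lookup is delegated to ChromaDB, whose distance metric and indexing may differ, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    book_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k (book_id, similarity) pairs closest to the query, best first."""
    scored = [(book_id, cosine_similarity(query_vec, vec)) for book_id, vec in book_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; all-MiniLM-L6-v2 actually produces 384-dimensional vectors.
book_vectors = {
    "isbn-a": [0.9, 0.1, 0.0],
    "isbn-b": [0.1, 0.9, 0.0],
    "isbn-c": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]

candidates = top_k(query_vector, book_vectors, k=2)  # the app retrieves 50 at this stage
print([book_id for book_id, _ in candidates])  # ['isbn-a', 'isbn-c']
```

Retrieving a generous candidate pool first (50) and trimming later (16) presumably leaves room for the category and tone filters to discard books without running out of results.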

### 2. Filtering & Ranking
- **Category Filter**: Keeps only books in the selected category
- **Emotional Tone**: Re-ranks the remaining books by the emotion score that matches the selected tone (joy, surprise, anger, fear, sadness)
- **Relevance**: Maintains semantic relevance while applying filters
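
A minimal sketch of the filter-then-rank step, assuming each candidate is a dict with a `category` field and one numeric field per emotion. The tone-to-emotion pairing follows the order in which this README lists them (Happy → joy, Surprising → surprise, Angry → anger, Suspenseful → fear, Sad → sadness); the field names, the `"All"` sentinel, the helper name `filter_and_rank`, and the choice to sort before truncating to `final_top_k` are assumptions, not necessarily what `retrieve_semantic_recommendations()` does.

```python
from typing import Dict, List, Optional

# Assumed field names for the emotion scores; pairing follows this README's order.
TONE_TO_EMOTION = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def filter_and_rank(
    candidates: List[Dict],
    category: Optional[str] = None,
    tone: Optional[str] = None,
    final_top_k: int = 16,
) -> List[Dict]:
    """Hypothetical helper: filter similarity-sorted candidates, then re-rank by tone."""
    results = candidates
    if category and category != "All":  # "All" meaning "no filter" is an assumption
        results = [book for book in results if book["category"] == category]
    if tone in TONE_TO_EMOTION:
        emotion = TONE_TO_EMOTION[tone]
        # sorted() is stable, so equal scores keep their similarity order.
        results = sorted(results, key=lambda book: book[emotion], reverse=True)
    return results[:final_top_k]


# Toy candidates, already sorted by semantic similarity (A is the closest match).
sample_books = [
    {"title": "A", "category": "Fiction", "joy": 0.2, "fear": 0.9},
    {"title": "B", "category": "Nonfiction", "joy": 0.8, "fear": 0.1},
    {"title": "C", "category": "Fiction", "joy": 0.7, "fear": 0.3},
]
print([b["title"] for b in filter_and_rank(sample_books, category="Fiction", tone="Happy")])  # ['C', 'A']
```

Because Python's sort is stable, books with equal emotion scores keep their semantic-similarity order, which is one simple way to preserve relevance while applying the tone filter.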

### 3. User Interface
- Clean, modern design using Gradio's Glass theme
- Gallery view with book covers and descriptions
- Responsive layout for different screen sizes
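
The gallery needs an image and a caption for each book. The sketch below shows one way to build those pairs, falling back to `cover-not-found.jpg` when a cover URL is missing; the field names (`title`, `authors`, `description`, `thumbnail`), the 30-word caption limit, and the helper names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

PLACEHOLDER_COVER = "cover-not-found.jpg"


def truncate_words(text: str, max_words: int = 30) -> str:
    """Shorten text to at most `max_words` words, adding "..." if anything was cut."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."


def cover_or_placeholder(url) -> str:
    """Use the cover URL if it is a non-empty string, otherwise the placeholder.

    Missing values read with pandas arrive as NaN (a float), so the check is on
    the type rather than on truthiness alone.
    """
    if isinstance(url, str) and url.strip():
        return url
    return PLACEHOLDER_COVER


def to_gallery_items(books: List[Dict], max_words: int = 30) -> List[Tuple[str, str]]:
    """Hypothetical helper: build (image, caption) pairs for a gallery component."""
    items = []
    for book in books:
        caption = f"{book['title']} by {book['authors']}: {truncate_words(book['description'], max_words)}"
        items.append((cover_or_placeholder(book.get("thumbnail")), caption))
    return items
```

A list of `(image, caption)` tuples like this is one of the value formats Gradio's `gr.Gallery` component accepts.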

## Data Sources

The book dataset includes:
- **Metadata**: Title, authors, ISBN, categories, publication info
- **Content**: Descriptions, summaries
- **Visual**: Thumbnail images, large cover images
- **Emotional Scores**: Per-book scores for joy, surprise, anger, fear, and sadness

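## Configuration

### Embedding Models
You can switch between embedding models in `gradio_dashboard.py`:

```python
# Import path depends on your LangChain version; older releases expose it
# as langchain_community.embeddings.HuggingFaceEmbeddings.
from langchain_huggingface import HuggingFaceEmbeddings

# Fast and good quality (default)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Higher quality, slower: use this line instead of the one above
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
```

The vectors in `chroma_books/` must come from the same model used at query time, so rebuild the index after switching (all-MiniLM-L6-v2 produces 384-dimensional embeddings, all-mpnet-base-v2 768-dimensional ones).
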
### Search Parameters
Two keyword arguments of `retrieve_semantic_recommendations()` control how many books are retrieved and returned (signature shown, body omitted):

```python
from typing import Optional


def retrieve_semantic_recommendations(
    query: str,
    category: Optional[str] = None,
    tone: Optional[str] = None,
    initial_top_k: int = 50,  # Candidates retrieved from the vector store
    final_top_k: int = 16,  # Recommendations returned after filtering and ranking
):
    ...  # Body omitted
```

## Development

### Adding New Features

1. **New Emotional Tones**: Add emotion columns to your dataset and update the `tones` list
2. **Additional Filters**: Extend the filtering logic in `retrieve_semantic_recommendations()`
3. **UI Improvements**: Modify the Gradio interface in the `dashboard` block
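
When adding a tone, the UI label and its emotion column have to stay in sync. A minimal sketch, assuming tones are kept in a label-to-column mapping rather than the plain `tones` list the project uses; the helper `tone_choices` and the `Disgusted`/`disgust` pair are hypothetical examples.

```python
from typing import Dict, Iterable, List

# Assumed mapping from UI tone labels to emotion-score columns in the dataset.
TONE_TO_EMOTION: Dict[str, str] = {
    "Happy": "joy",
    "Surprising": "surprise",
    "Angry": "anger",
    "Suspenseful": "fear",
    "Sad": "sadness",
}


def tone_choices(columns: Iterable[str], mapping: Dict[str, str] = TONE_TO_EMOTION) -> List[str]:
    """Hypothetical helper: dropdown options ("All" first), failing fast on missing columns."""
    available = set(columns)
    missing = [column for column in mapping.values() if column not in available]
    if missing:
        raise ValueError(f"Dataset is missing emotion columns: {missing}")
    return ["All"] + list(mapping)


# Hypothetical new tone: add its column to the dataset first, then extend the mapping.
extended = {**TONE_TO_EMOTION, "Disgusted": "disgust"}
dataset_columns = ["title", "joy", "surprise", "anger", "fear", "sadness", "disgust"]
print(tone_choices(dataset_columns, extended))
# ['All', 'Happy', 'Surprising', 'Angry', 'Suspenseful', 'Sad', 'Disgusted']
```

Validating at startup surfaces a missing column immediately instead of as an error on the first query that selects the new tone.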

### Data Processing Pipeline

The project includes several notebooks for data processing:
- Data exploration and cleaning
- Sentiment analysis for emotional scoring
- URL processing for book covers
- Model testing and validation
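
One common way to turn a sentence-level emotion classifier into per-book scores is to score each sentence of the description and keep the maximum per emotion. The sketch below assumes that approach (the sentiment notebook may aggregate differently) and uses a toy keyword classifier in place of a real model; in practice `classify` could wrap a Hugging Face `transformers` text-classification pipeline that returns a score for every label. All function names here are hypothetical.

```python
import re
from typing import Callable, Dict, List

EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace; enough for a sketch."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def book_emotion_scores(
    description: str, classify: Callable[[str], Dict[str, float]]
) -> Dict[str, float]:
    """Hypothetical helper: score each sentence, keep the maximum score per emotion."""
    scores = {emotion: 0.0 for emotion in EMOTIONS}
    for sentence in split_sentences(description):
        sentence_scores = classify(sentence)
        for emotion in EMOTIONS:
            scores[emotion] = max(scores[emotion], sentence_scores.get(emotion, 0.0))
    return scores


def toy_classifier(sentence: str) -> Dict[str, float]:
    """Stand-in for a real emotion model; the keyword rules are purely illustrative."""
    lowered = sentence.lower()
    return {
        "joy": 0.9 if "happy" in lowered else 0.1,
        "fear": 0.8 if "dark" in lowered else 0.05,
    }


print(book_emotion_scores("A happy village. A dark secret emerges!", toy_classifier))
# {'anger': 0.0, 'fear': 0.8, 'joy': 0.9, 'sadness': 0.0, 'surprise': 0.0}
```

Taking the maximum rather than the mean keeps a single strongly emotional sentence from being diluted by neutral plot summary.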

## Dependencies

Key libraries used:
- **LangChain**: Vector database integration
- **ChromaDB**: Vector storage and similarity search
- **Gradio**: Web interface
- **HuggingFace Transformers**: Sentence embeddings
- **Pandas**: Data manipulation
- **NumPy**: Numerical operations

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Sentence Transformers for powerful embedding models
- ChromaDB for efficient vector storage
- Gradio for making ML interfaces accessible
- The open-source community for book metadata

---

**Happy Reading! 📚✨**