---
title: Classroom AI Assistant
emoji: 🎓
colorFrom: indigo
colorTo: red
sdk: docker
sdk_version: "1.0"
app_file: Dockerfile
app_port: 7860
pinned: false
---

# AI Teaching Assistant System

This project integrates emotion detection, voice-to-text, AI processing, and text-to-voice capabilities into a web-based teaching assistant. Powered by FastAPI and a fine-tuned LLaMA 3.1 3B model, it adapts its responses to the user's detected emotional state, creating a real-time emotion-driven interaction experience.

## Problem Statement
Modern classrooms lack real-time, interactive tools to address diverse student needs and keep them engaged. The objective is to create a multimodal AI assistant that:

- Accepts and processes text and voice queries from students in real time.
- Provides contextual responses, including textual explanations, charts, and visual aids.
- Detects disengagement or confusion using facial expression analysis.

## Features

- **Emotion Detection**: Detects the user's facial emotions in real time using DeepFace and OpenCV
- **Voice-to-Text**: Converts the user's speech to text for natural language input
- **AI Processing**: Processes user queries with emotion-aware AI responses
- **Image Search**: Finds relevant images based on contextual prompts
- **Text-to-Voice**: Converts AI responses to speech with emotion-appropriate voice synthesis
- **Web Interface**: Modern UI built with HTML, TailwindCSS, and JavaScript

## System Architecture

### Backend
- FastAPI server with asynchronous WebSocket implementation
- Connection management system handling multiple concurrent users (see the sketch after this list)
- Integration of multiple AI components
- Efficient state tracking of user emotions, interactions, and responses
- Asynchronous processing for better performance
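
The following is a minimal sketch of that pattern under stated assumptions: the `ConnectionManager` class and `/ws/{client_id}` route are illustrative names, not the project's actual `main.py`.

```python
# Minimal sketch of the connection-manager pattern (illustrative names,
# not the project's actual main.py).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

class ConnectionManager:
    """Tracks active WebSocket connections, one per client."""
    def __init__(self):
        self.active = {}  # client_id -> WebSocket

    async def connect(self, client_id, websocket):
        await websocket.accept()
        self.active[client_id] = websocket

    def disconnect(self, client_id):
        self.active.pop(client_id, None)

    async def send_json(self, client_id, payload):
        await self.active[client_id].send_json(payload)

manager = ConnectionManager()

@app.websocket("/ws/{client_id}")
async def ws_endpoint(websocket: WebSocket, client_id: str):
    await manager.connect(client_id, websocket)
    try:
        while True:
            message = await websocket.receive_json()
            # Dispatch here: image frames to emotion detection, transcribed
            # text to the LLM pipeline (both elided in this sketch).
            await manager.send_json(client_id, {"echo": message})
    except WebSocketDisconnect:
        manager.disconnect(client_id)
```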

### Frontend
- Responsive design with TailwindCSS
- Real-time WebSocket communication for seamless interactions
- Camera integration for emotion detection
- Dynamic UI updates based on emotion detection
- Speech recognition using Web Speech API

### System Flow Diagram

```mermaid
graph TD
    User[User] -->|Speaks & Shows Emotion| UI[Frontend UI]
    
    subgraph "Frontend"
        UI -->|Captures Video| EmotionCapture[Emotion Capture]
        UI -->|Records Audio| SpeechCapture[Speech Capture]
        EmotionCapture -->|Base64 Image| WebSocket
        SpeechCapture -->|Text| WebSocket[WebSocket Connection]
        WebSocket -->|Responses| UIUpdate[UI Updates]
        UIUpdate -->|Display| UI
    end
    
    WebSocket <-->|Bidirectional Communication| Server[FastAPI Server]
    
    subgraph "Backend"
        Server -->|Manages| ConnectionMgr[Connection Manager]
        Server -->|Processes Image| EmotionDetection[Emotion Detection]
        Server -->|Processes Text| AIProcessing[LLaMA 3.1 Processing]
        EmotionDetection -->|Emotion State| AIProcessing
        AIProcessing -->|Response Text| TextToSpeech[Text-to-Speech]
        AIProcessing -->|Image Prompt| ImageSearch[Image Search]
        TextToSpeech -->|Audio File| Server
        ImageSearch -->|Image URL| Server
    end
    
    Server -->|Audio & Images & Text| WebSocket
```

## Demo

### Screenshots

![AI Teaching Assistant Interface](asset/screenshot.jpeg)
*Screenshot: Main interface showing the emotion-aware Classroom AI assistant with real-time camera feed and chat interface*

### Screen Recording
See the AI Classroom Assistant in action:

[Watch Demo](asset/screenrecord.mp4)

*Video: Complete demonstration of the AI Teaching Assistant system showing emotion detection, voice interaction, and AI responses*

## Usage

1. Click the "Start Assistant" button to begin
2. Allow camera and microphone permissions when prompted
3. Speak clearly to interact with the assistant
4. The system will:
   - Detect your emotion
   - Convert your speech to text
   - Process your query with AI
   - Display relevant images
   - Speak the AI response

## Components

- **main.py**: FastAPI backend server
- **emotion_processor.py**: Handles facial emotion detection
- **voice_processor.py**: Manages speech-to-text conversion
- **img_and_ai.py**: Handles image search and AI processing
- **TextToVoice.py**: Manages text-to-speech conversion
- **index.html**: Main frontend interface
- **styles.css**: Custom styling
- **app.js**: Frontend JavaScript logic

## Project Structure

```
version1/
├── README.md                     # Project documentation
├── requirements.txt              # Python dependencies
├── server.py                     # Main server entry point
├── .env                          # Environment variables
├── asset/
│   ├── screenrecord.mp4          # Demo video showing system functionality
│   └── screenshot.jpeg           # Interface screenshot
├── backend/
│   ├── emotion_processor.py      # Emotion detection processing
│   ├── haarcascade_frontalface_default.xml  # Face detection model
│   ├── img_and_ai.py             # Image search and AI processing
│   ├── main.py                   # FastAPI application logic
│   ├── TextToVoice.py            # Text-to-speech functionality
│   └── voice_processor.py        # Speech recognition functionality
└── frontend/
    ├── static/
    │   ├── app.js                # Frontend JavaScript
    │   ├── styles.css            # CSS styling
    │   └── final_audio_*.mp3     # Generated audio responses
    └── templates/
        └── index.html            # Main web interface
```

## AI Core & Technical Details

### AI Model
- LLaMA 3.1 3B (8-bit) fine-tuned on 9,000+ emotion-labeled Q&A pairs
- Custom emotion-aware prompt engineering with context preservation (see the sketch below)
- Dedicated processing pipeline for each detected emotional state
- Local model deployment with optimized inference for responsive interactions
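
As a hedged illustration of the emotion-aware prompting, the snippet below folds a detected emotion into the system prompt and queries the local OpenAI-compatible endpoint configured via `AI_MODEL_URL`; the `ask_model` helper and prompt wording are assumptions, not the project's actual code.

```python
# Hypothetical helper: only the AI_MODEL_URL variable comes from the
# project's .env example; the prompt wording is illustrative.
import os
import requests

def ask_model(question, emotion):
    url = os.getenv("AI_MODEL_URL", "http://localhost:12345/v1/chat/completions")
    payload = {
        "messages": [
            {"role": "system",
             "content": f"You are a teaching assistant. The student appears "
                        f"{emotion}; adapt your tone and pacing accordingly."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
    }
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```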

### Emotion Detection
- DeepFace & OpenCV with real-time webcam processing
- Base64 image encoding for efficient WebSocket transmission, decoded server-side as sketched below
- Continuous emotional state tracking with state management
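
For illustration only (the real logic lives in `emotion_processor.py`), server-side decoding and classification could look like this sketch; the `detect_emotion` helper name is assumed.

```python
# Illustrative sketch, not the project's emotion_processor.py: decode a
# base64 frame received over the WebSocket and classify it with DeepFace.
import base64

import cv2
import numpy as np
from deepface import DeepFace

def detect_emotion(b64_frame):
    img_bytes = base64.b64decode(b64_frame)
    frame = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
    # DeepFace returns one result dict per detected face; use the first.
    results = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
    return results[0]["dominant_emotion"]
```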

### Voice Interaction
- Speech-to-Text for natural language input
- Edge Text-to-Speech with emotion-appropriate voice synthesis
- Unique audio file generation with UUID-based identification (see the sketch below)
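
A short sketch of UUID-named synthesis with the `edge-tts` package; the voice name is an example, and only the `final_audio_*.mp3` output pattern comes from the project structure above.

```python
# Sketch using the edge-tts package; the voice name is an example and may
# differ from what the project actually selects per emotion.
import uuid

import edge_tts

async def synthesize(text, voice="en-US-AriaNeural"):
    # Write to the frontend's static folder using the project's naming pattern.
    out_path = f"frontend/static/final_audio_{uuid.uuid4().hex}.mp3"
    await edge_tts.Communicate(text, voice).save(out_path)
    return out_path
```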

### Image Search
- Contextual image sourcing driven by AI-generated prompts (see the hedged sketch below)
- Dynamic content generation that adapts to both query and detected emotion
- Integrated image processing with AI-generated responses
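
A hedged sketch of the RapidAPI lookup: the `/search` path, `query` parameter, and response keys are assumptions to be checked against the API's RapidAPI page; only the environment variable names come from the `.env` example in the Setup section.

```python
# Hedged sketch: the /search path, "query" parameter, and response keys are
# assumptions; only the env var names come from the project's .env example.
import os
import requests

def search_image(prompt):
    host = os.getenv("RAPIDAPI_HOST", "real-time-image-search.p.rapidapi.com")
    response = requests.get(
        f"https://{host}/search",
        params={"query": prompt},
        headers={
            "X-RapidAPI-Key": os.getenv("RAPIDAPI_KEY", ""),
            "X-RapidAPI-Host": host,
        },
        timeout=30,
    )
    response.raise_for_status()
    results = response.json().get("data", [])
    return results[0].get("url") if results else None
```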

## Implementation Details

This isn't just an API wrapper; it's a complete system with:
- Custom WebSocket architecture handling real-time bidirectional communication
- End-to-end emotion processing pipeline from detection to response generation
- Local model deployment with optimized inference for responsive interactions
- Comprehensive error handling and logging system
- No external LLM APIs were used due to project restrictions; everything runs locally

### Emotional Intelligence Architecture
- Real-time emotion detection feeding continuously into the AI decision matrix
- Calibrated to different emotional states
- Stateful conversation tracking that maintains emotional context (see the sketch below)
- Adaptive voice characteristics matching the detected emotional state
- Connection Manager tracking user state across multiple sessions
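
As a sketch, the per-session record that such a Connection Manager might keep could look like this; all field names are hypothetical.

```python
# Hypothetical per-user session state; field names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserState:
    emotion: str = "neutral"  # latest detected emotion
    history: List[Dict] = field(default_factory=list)  # past chat turns

    def record_turn(self, question, answer):
        # Keep each turn together with the emotion it was answered under.
        self.history.append({"q": question, "a": answer, "emotion": self.emotion})
```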

## Resources

### Dataset and Model
- Dataset: [Ques-Ans-with-Emotion](https://huggingface.co/datasets/0xarchit/Ques-Ans-with-Emotion) - 9,000+ emotion-labeled Q&A pairs
- Model: [AI Teaching Assistant](https://huggingface.co/0xarchit/ai_teaching_assistant) - Fine-tuned LLaMA 3.1 3B

## Setup Instructions

### Prerequisites

- Python 3.8 or higher
- Webcam for emotion detection
- Microphone for voice input
- Internet connection for image search

### Installation

1. Clone the repository or navigate to the project directory

2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```

3. Set up environment variables:
   - Rename `.env.example` to `.env` (or create a new `.env` file)
   - Add your RapidAPI key for image search
   - Configure any other environment variables as needed
   
   Example `.env` file:
   ```
   RAPIDAPI_KEY="your_rapidapi_key_here"
   RAPIDAPI_HOST="real-time-image-search.p.rapidapi.com"
   AI_MODEL_URL="http://localhost:12345/v1/chat/completions"
   ```

4. Run the server:
   ```
   python server.py
   ```

5. Open your browser and navigate to:
   ```
   http://localhost:8000
   ```


## Troubleshooting

- If the camera doesn't work, check browser permissions
- If speech recognition fails, ensure your microphone is working
- If the server fails to start, check if port 8000 is available

## Future Goals

As we continue to develop this AI Classroom Teaching Assistant System, we plan to implement several enhancements to make the experience even more immersive and effective:

### Near-Term Enhancements
- **Emotion-Adaptive Voice Generation**: Integration of a specialized ML model for human-like voice generation that dynamically adapts tone, pitch, and speaking style based on detected student emotions
- **AI Image Generation**: Implementation of diffusion models to create custom educational illustrations and diagrams in real-time based on the educational content being discussed
- **Multi-Student Emotion Tracking**: Ability to simultaneously track and respond to multiple students' emotional states in classroom settings
- **Personalized Learning Paths**: Development of student profiles that track engagement patterns and learning preferences to customize future interactions
- **Extended Language Support**: Integration of multilingual capabilities for global classroom deployment

### Long-Term Vision
- **AR/VR Integration**: Creation of immersive educational experiences with 3D visualizations of complex concepts
- **Collaborative Learning Features**: Facilitation of group activities and peer-to-peer learning with AI moderation
- **Advanced Analytics Dashboard**: Comprehensive insights for educators about student engagement, emotional patterns, and learning progress

## License

This project is part of the Intel Unnati program, completed by team Bitbots.

🌟 Give This Repo A Star If You Like It 🌟