ContextFlow Architecture: Complete System Overview
Table of Contents
- System Vision
- High-Level Architecture
- Frontend Layer
- Backend Layer
- Agent Network
- Reinforcement Learning Pipeline
- Data Flow
- API Design
- Multi-Modal Detection
- Privacy & Security
- Deployment Architecture
1. System Vision
ContextFlow is an AI-powered learning intelligence engine that predicts when learners will get confused BEFORE it happens, enabling proactive intervention in educational settings.
Core Problem Solved
- Traditional learning systems are reactive: they respond after confusion has already occurred
- ContextFlow is proactive: it predicts confusion and intervenes before disengagement
Key Innovations
- Predictive AI - RL-based doubt prediction
- Gesture Control - Hands-free learning assistance
- Multi-Agent Orchestration - 9 specialized agents working in concert
- Privacy-First - Face blur for classroom deployment
2. High-Level Architecture
USERS: Students · Teachers · Researchers
        │
        ▼
PRESENTATION LAYER: React Frontend (Vite)
  Tabs: Learn · LLM Flow · Gestures · Predict · ...
  MediaPipe Camera Feed: Hand Detection + Face Blur
        │  REST API (JSON)
        │  WebSocket (optional)
        ▼
BACKEND LAYER (Flask)
  API Gateway (Flask Blueprints)
    /api/session/*  /api/predict/*  /api/gesture/*  /api/*
        │
        ▼
  STUDY ORCHESTRATOR (Central Coordinator)
    Agent Registry: DoubtPredictor · Behavioral · Gesture · Recall ·
                    KnowledgeGraph · PeerLearn · LLMOrch · Prompt
        │
        ▼
  Workers: Q-Network · Behavioral Agent · Gesture Agent · Recall Agent · LLM Orch
        │
        ▼
DATA LAYER
  Checkpoint (RL model, .pkl) · Session State (JSON) ·
  Knowledge Graph (NetworkX) · Real Data Collection
3. Frontend Layer
3.1 Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | React 18 | UI Components |
| Build Tool | Vite | Fast development |
| Styling | Tailwind CSS | Responsive design |
| Icons | Lucide React | Consistent icons |
| Camera | MediaPipe | Hand/Face detection |
3.2 Application Structure
frontend/src/
├── App.jsx                  # Main application (9 tabs)
├── main.jsx                 # Entry point
├── index.css                # Global styles
├── BrowserLLMLauncher.js    # AI chat launcher
└── MediaPipeProcessor.js    # Camera + gesture processing
3.3 Tab Interface
| Tab | Purpose |
|---|---|
| Learn | Dashboard with predictions, reviews, gamification |
| LLM Flow | Browser-based AI launcher (no API keys) |
| Gestures | Train custom hand gestures |
| Predict | RL doubt prediction visualization |
| Behavior | Behavioral signal tracking |
| Peer | Social learning insights |
| Stats | Learning statistics |
| Gamify | Fish/XP rewards system |
| Settings | AI provider configuration |
3.4 BrowserLLMLauncher.js
Opens AI chats directly in browser without API keys:
// Opens chat.openai.com with pre-filled context
openAIChat(context, model = 'gpt-4') {
  const url = `https://chat.openai.com/?q=${encodeURIComponent(context)}`;
  window.open(url, '_blank');
}
3.5 MediaPipeProcessor.js
Handles real-time camera processing:
Camera Feed
    │
    ├─▶ Hand Landmark Detection (21 points) ──▶ Gesture Recognition ──▶ Backend API (/api/gesture/)
    │                                                   │
    │                                                   ▼
    └─▶ Face Mesh Detection (468 points) ──────▶ Face Blur (Privacy)
4. Backend Layer
4.1 Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | Flask | REST API |
| Async | asyncio | Non-blocking I/O |
| ML | PyTorch | RL model |
| Data | NumPy | Feature extraction |
| Graphs | NetworkX | Knowledge graphs |
| Storage | JSON/SQLite | Session persistence |
4.2 Flask Application Structure
backend/
├── run.py                           # Application entry point
└── app/
    ├── __init__.py                  # Flask app factory
    ├── config.py                    # Configuration
    ├── api/
    │   ├── __init__.py
    │   └── main.py                  # All API routes (889 lines)
    └── agents/
        ├── __init__.py
        ├── study_orchestrator.py    # Central coordinator
        ├── doubt_predictor.py       # RL prediction
        ├── behavioral_agent.py      # Signal processing
        ├── hand_gesture_agent.py    # MediaPipe integration
        ├── recall_agent.py          # Spaced repetition
        ├── knowledge_graph_agent.py # Concept mapping
        ├── peer_learning_agent.py   # Social learning
        ├── llm_orchestrator_agent.py # Multi-AI
        ├── gesture_action_agent.py  # Gesture→Action
        └── prompt_agent.py          # Prompt templates
4.3 Flask App Factory
def create_app():
    app = Flask(__name__)

    # Load config
    app.config.from_object('app.config.Config')

    # Register blueprints
    from app.api.main import api
    app.register_blueprint(api, url_prefix='/api')

    # Initialize agents
    init_agents()

    return app
5. Agent Network
5.1 Agent Overview
STUDY ORCHESTRATOR (Central Coordinator)
│
├─ Doubt Predictor Agent ◀── Behavioral Agent ──▶ Hand Gesture Agent
│         │                                              │
│         ▼                                              ▼
├─ Knowledge Graph Agent ◀── Recall Agent ──────▶ LLM Orchestrator
│
├─ Peer Learning Agent
└─ Gesture Action Mapper
5.2 StudyOrchestrator (Central Coordinator)
The orchestrator manages the learning lifecycle:
class StudyOrchestrator:
    def __init__(self, user_id: str):
        self.user_id = user_id
        # Initialize all agents
        self.doubt_predictor = DoubtPredictorAgent(user_id)
        self.behavioral_agent = BehavioralAgent(user_id)
        self.gesture_agent = HandGestureAgent(user_id)
        self.recall_agent = RecallAgent(user_id)
        self.knowledge_graph = KnowledgeGraphAgent(user_id)
        self.peer_agent = PeerLearningAgent(user_id)
        # State management
        self.state = OrchestratorState()
Session Lifecycle:
- PRE_LEARNING - Load predictions, check recalls, get peer insights
- ACTIVE_LEARNING - Monitor signals, update predictions, capture doubts
- REVIEW - Trigger spaced repetition, update knowledge graph
- POST_LEARNING - Sync data, update gamification, generate summary
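The four phases above can be sketched as a minimal state machine; the class and function names here are illustrative, not taken from the codebase:

```python
from enum import Enum, auto

class Phase(Enum):
    PRE_LEARNING = auto()
    ACTIVE_LEARNING = auto()
    REVIEW = auto()
    POST_LEARNING = auto()

# Allowed forward transitions between session phases
TRANSITIONS = {
    Phase.PRE_LEARNING: Phase.ACTIVE_LEARNING,
    Phase.ACTIVE_LEARNING: Phase.REVIEW,
    Phase.REVIEW: Phase.POST_LEARNING,
}

def advance(phase: Phase) -> Phase:
    """Move the session to its next lifecycle phase."""
    if phase not in TRANSITIONS:
        raise ValueError(f"{phase.name} is a terminal phase")
    return TRANSITIONS[phase]
```

POST_LEARNING is terminal: advancing from it raises instead of wrapping around.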
5.3 DoubtPredictorAgent (RL Core)
Predicts confusion before it happens:
class DoubtPredictorAgent:
    def __init__(self, user_id: str, config: dict = None):
        self.user_id = user_id
        self.model = self._load_checkpoint()
        self.feature_extractor = FeatureExtractor()

    def predict_doubts(self, context: dict, top_k: int = 5):
        # 1. Extract 64-dim state vector
        state = self.feature_extractor.extract_state(context)
        # 2. Get Q-values from RL model
        q_values = self.model.predict(state)
        # 3. Return top-k predictions
        return self._format_predictions(q_values, top_k)
5.4 BehavioralAgent
Processes raw behavioral signals:
class BehavioralSignal:
    mouse_hesitation: float            # Pause frequency
    scroll_reversals: int              # Back-and-forth
    time_on_page: float                # Seconds
    eye_tracking: Tuple[float, float]
    click_frequency: int

    def calculate_confusion_score(self) -> float:
        # Weighted average of signals
        weights = {
            'hesitation': 0.3,
            'reversals': 0.25,
            'time_on_page': 0.2,
            'tab_switches': 0.15,
            'back_button': 0.1,
        }
        return weighted_sum(signals, weights)
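The weighting scheme can be made concrete; a runnable sketch assuming each signal is pre-normalized to 0.0-1.0 upstream (the dict-based interface is illustrative):

```python
# Weights mirror the ones listed above; each signal value is assumed
# to be pre-normalized to the 0.0-1.0 range before scoring.
WEIGHTS = {
    'hesitation': 0.30,
    'reversals': 0.25,
    'time_on_page': 0.20,
    'tab_switches': 0.15,
    'back_button': 0.10,
}

def calculate_confusion_score(signals: dict) -> float:
    """Weighted average over whichever signals are present, clamped to [0, 1]."""
    score = sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)
    return min(1.0, max(0.0, score))
```

Because the weights sum to 1.0, a learner maxing out every signal scores exactly 1.0, and missing signals simply contribute nothing.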
5.5 HandGestureAgent
MediaPipe integration for gesture recognition:
Camera Frame
    │
    ▼
MediaPipe Hands (21 landmarks)
    │
    ▼
Gesture Template Matching
    │
    ▼
Confidence Score (0-1) ──▶ Recognized Gesture
Pre-built Gestures:
| Gesture | Description |
|---|---|
| pinch | Thumb + Index |
| swipe_up | 2-finger up |
| swipe_down | 2-finger down |
| swipe_right | 2-finger right |
| swipe_left | 2-finger left |
| point | Index extended |
| wave | Open palm wave |
| thumbs_up | 👍 confirmation |
| thumbs_down | 👎 rejection |
| fist | Closed hand |
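Template matching can be approximated as nearest-neighbor distance over landmark coordinates; a minimal 2-D sketch (the real agent works on 21 MediaPipe landmarks in 3-D, and the threshold value is an assumption):

```python
import math

def match_gesture(landmarks, templates, threshold=0.15):
    """Return (name, confidence) of the closest template, or (None, 0.0).

    landmarks: list of (x, y) points for the current frame.
    templates: dict of gesture name -> list of (x, y) reference points.
    """
    best_name, best_dist = None, float('inf')
    for name, template in templates.items():
        # Mean Euclidean distance between corresponding landmarks
        dist = sum(math.dist(a, b) for a, b in zip(landmarks, template)) / len(template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, 0.0          # no template is close enough
    # Map distance 0..threshold onto confidence 1..0
    return best_name, max(0.0, 1.0 - best_dist / threshold)
```

An exact landmark match yields confidence 1.0; anything farther than the threshold is rejected.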
5.6 RecallAgent
SM-2 based spaced repetition:
class RecallCard:
    front: str           # Question
    back: str            # Answer
    interval: int        # Days until review
    ease_factor: float   # Difficulty (default 2.5)
    repetitions: int     # Successful reviews

def schedule_review(card: RecallCard, quality: int):
    if quality >= 3:  # Correct
        if card.repetitions == 0:
            card.interval = 1
        elif card.repetitions == 1:
            card.interval = 6
        else:
            card.interval = round(card.interval * card.ease_factor)
        card.repetitions += 1
    else:  # Incorrect
        card.repetitions = 0
        card.interval = 1
    # Update ease factor (SM-2), floored at 1.3
    card.ease_factor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease_factor = max(1.3, card.ease_factor)
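Tracing the scheduler for a card answered correctly (quality 4) three times in a row shows the classic SM-2 progression of 1, 6, then roughly 15 days; a self-contained version of the logic above, with the interval rounded to whole days:

```python
from dataclasses import dataclass

@dataclass
class RecallCard:
    interval: int = 0
    ease_factor: float = 2.5
    repetitions: int = 0

def schedule_review(card: RecallCard, quality: int) -> None:
    if quality >= 3:                      # correct answer
        if card.repetitions == 0:
            card.interval = 1
        elif card.repetitions == 1:
            card.interval = 6
        else:
            card.interval = round(card.interval * card.ease_factor)
        card.repetitions += 1
    else:                                 # incorrect: restart the card
        card.repetitions = 0
        card.interval = 1
    # Standard SM-2 ease-factor update, floored at 1.3
    card.ease_factor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease_factor = max(1.3, card.ease_factor)

card = RecallCard()
intervals = []
for _ in range(3):
    schedule_review(card, quality=4)
    intervals.append(card.interval)
# intervals -> [1, 6, 15]
```

Note that at quality 4 the ease-factor delta is exactly zero, so the factor stays at 2.5 and the third interval is round(6 × 2.5) = 15.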
5.7 KnowledgeGraphAgent
Concept mapping with NetworkX:
class KnowledgeGraphAgent:
    def __init__(self, user_id: str):
        self.graph = nx.MultiDiGraph()

    def add_doubt_to_graph(self, doubt: dict):
        # Create node
        self.graph.add_node(
            doubt['concept'],
            type='concept',
            topic=doubt['topic'],
            timestamp=datetime.now()
        )
        # Connect to prerequisites
        for prereq in doubt.get('prerequisites', []):
            self.graph.add_edge(prereq, doubt['concept'], type='prerequisite')
        # Connect to related concepts
        for related in doubt.get('related', []):
            self.graph.add_edge(doubt['concept'], related, type='related')

    def find_learning_path(self, from_topic: str, to_topic: str):
        try:
            return nx.shortest_path(self.graph, from_topic, to_topic)
        except nx.NetworkXNoPath:
            return []
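On an unweighted graph, `nx.shortest_path` reduces to breadth-first search; a dependency-free sketch of what `find_learning_path` computes, with the graph as a plain adjacency dict (illustrative, not the agent's actual storage):

```python
from collections import deque

def find_learning_path(edges: dict, start: str, goal: str) -> list:
    """BFS shortest path over a prerequisite graph; [] if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return []
```

Walking a toy prerequisite chain such as algebra → calculus → backpropagation returns the full path, and an unreachable goal returns the empty list, matching the `NetworkXNoPath` branch above.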
5.8 LLMOrchestrator
Multi-provider AI integration:
class LLMOrchestrator:
    SUPPORTED_PROVIDERS = {
        'chatgpt': LLMProvider.CHATGPT,
        'gemini': LLMProvider.GEMINI,
        'claude': LLMProvider.CLAUDE,
        'deepseek': LLMProvider.DEEPSEEK,
        'ollama': LLMProvider.OLLAMA,
        'groq': LLMProvider.GROQ,
    }

    async def query_parallel(self, request: LLMRequest):
        tasks = []
        for provider in request.providers:
            task = self._query_provider(provider, request)
            tasks.append(task)
        # Execute all queries concurrently
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        return [r for r in responses if not isinstance(r, Exception)]
5.9 GestureActionMapper
Maps gestures to system actions:
class GestureAction(Enum):
    QUERY_MULTI_LLM = "query_multi_llm"
    QUERY_CHATGPT = "query_chatgpt"
    QUERY_GEMINI = "query_gemini"
    TRIGGER_RL_LOOP = "trigger_rl_loop"
    CAPTURE_CONTENT = "capture_content"
    PAUSE_SESSION = "pause_session"
    RESUME_SESSION = "resume_session"

class GestureActionMapper:
    def __init__(self):
        self.action_rules = {
            GestureAction.QUERY_MULTI_LLM: {
                "trigger": {"finger_count": 2, "swipe": "right"}
            },
            GestureAction.PAUSE_SESSION: {
                "trigger": {"gesture": "open_palm"}
            },
            GestureAction.RESUME_SESSION: {
                "trigger": {"gesture": "thumbs_up"}
            },
        }
5.10 PeerLearningAgent
Social learning insights:
class PeerLearningAgent:
    def get_peer_insights(self, topic: str):
        # Aggregate insights from "similar" students
        insights = []
        # Find students who learned this topic
        similar_students = self._find_similar_students(topic)
        for student in similar_students:
            # What confused them?
            insights.extend(student.difficult_concepts)
        # Return aggregated insights
        return self._aggregate_insights(insights)
6. Reinforcement Learning Pipeline
6.1 Problem Formulation
State Space (64 dimensions):
| Block | Dims | Source |
|---|---|---|
| Topic embedding | 32 | TF-IDF of topic |
| Progress | 1 | 0.0-1.0 |
| Confusion | 16 | Behavioral signals |
| Gesture | 14 | Hand signals |
| Time | 1 | 0-1 (normalized) |
Action Space (10 doubt types):
- what_is_backpropagation
- why_gradient_descent
- how_overfitting_works
- explain_regularization
- what_loss_function
- how_optimization_works
- explain_learning_rate
- what_regularization
- how_batch_norm_works
- explain_softmax
Reward Function:
| Event | Reward |
|---|---|
| Correct prediction | +1.0 |
| Helpful explanation | +0.5 |
| Engagement maintained | +0.3 |
| False positive | -0.5 |
| Missed confusion | -1.0 |
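The reward table above can be read as a simple lookup; a minimal sketch (the event-name keys are illustrative, not fixed identifiers from the codebase):

```python
# Reward table from above, as a plain lookup
REWARDS = {
    'correct_prediction': 1.0,
    'helpful_explanation': 0.5,
    'engagement_maintained': 0.3,
    'false_positive': -0.5,
    'missed_confusion': -1.0,
}

def episode_reward(events) -> float:
    """Sum the reward for every event observed during an episode."""
    return sum(REWARDS[e] for e in events)
```

A correct prediction followed by one false positive nets +0.5, so the agent is still rewarded for erring on the side of intervening.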
6.2 Q-Network Architecture
class QNetwork(nn.Module):
    def __init__(self, state_dim=64, action_dim=10, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)   # 64 → 128
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)  # 128 → 128
        self.fc3 = nn.Linear(hidden_dim, action_dim)  # 128 → 10

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU activation
        x = F.relu(self.fc2(x))
        return self.fc3(x)       # Q-values for each action
6.3 Training Algorithm (Q-learning with target network)
class DoubtPredictionRL:
    def train(self, epochs=10, batch_size=32):
        for epoch in range(epochs):
            for batch in self.dataloader:
                # 1. Get current Q-values for the actions taken
                q_values = self.q_network(batch.states).gather(1, batch.actions)

                # 2. Compute TD targets from the frozen target network
                with torch.no_grad():
                    next_q = self.target_network(batch.next_states).max(1)[0]
                    targets = batch.rewards + self.gamma * next_q * (~batch.dones)

                # 3. Compute loss and update
                loss = self.loss_fn(q_values.squeeze(1), targets)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

            # 4. Periodically sync the target network
            self.update_target_network()
            # 5. Decay epsilon (exploration)
            self.epsilon *= self.epsilon_decay
6.4 Feature Extraction
class FeatureExtractor:
    STATE_DIM = 64

    def extract_state(self, context: dict) -> np.ndarray:
        # Topic embedding (32 dims)
        topic_emb = self._extract_topic_embedding(context['topic'])
        # Progress (1 dim)
        progress = np.array([context['progress']])
        # Confusion signals (16 dims)
        confusion = self._extract_confusion_signals(context['confusion_signals'])
        # Gesture signals (14 dims)
        gestures = self._extract_gesture_signals(context['gesture_signals'])
        # Time spent (1 dim, normalized to a 30-minute session)
        time_spent = np.array([context['time_spent'] / 1800])
        # Concatenate
        return np.concatenate([topic_emb, progress, confusion, gestures, time_spent])
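The 32 + 1 + 16 + 14 + 1 = 64 layout can be sanity-checked without NumPy; a pure-Python sketch of the concatenation step (block names mirror the extractor above):

```python
# Dimension layout of the 64-dim state vector
DIMS = {'topic': 32, 'progress': 1, 'confusion': 16, 'gesture': 14, 'time': 1}

def assemble_state(parts: dict) -> list:
    """Concatenate feature blocks in a fixed order, checking each block's width."""
    state = []
    for name, width in DIMS.items():
        block = parts[name]
        assert len(block) == width, f"{name}: expected {width} dims, got {len(block)}"
        state.extend(block)
    return state

state = assemble_state({name: [0.0] * w for name, w in DIMS.items()})
# len(state) == 64
```

Checking each block's width at assembly time catches a mis-sized embedding before it silently corrupts the Q-network input.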
7. Data Flow
7.1 Learning Session Flow
USER STARTS SESSION
    │
    ▼
ORCHESTRATOR.START_SESSION()
  1. Create new LearningSession
  2. Load RL model checkpoint
  3. Build learning context
    │
    ├─▶ Doubt Predictor     — predict doubts
    ├─▶ Behavioral Agent    — analyze signals
    └─▶ Peer Learning Agent — get insights
    │
    ▼
RETURN INITIAL PREDICTIONS
  - Top 5 predicted doubts
  - Pending reviews
  - Peer insights
7.2 Behavioral Signal Flow
REAL-TIME SIGNALS
  Mouse movement · Scroll pattern · Gesture camera · Time on page
    │
    ▼
BEHAVIORAL AGENT
  calculate_confusion_score(signals) → 0.0 - 1.0
    │
    ▼
DOUBT PREDICTOR
  If score > 0.5: re-predict doubts, trigger intervention
7.3 Gesture-to-Action Flow
CAMERA FRAME
    │
    ▼
MEDIAPIPE PROCESSING
  ├─▶ Hand Landmark Detection (21 points) ──▶ Gesture Template Matching
  │                                            (compare landmarks to known gestures)
  └─▶ Face Mesh (468 points) ────────────────▶ Face Blur (Privacy)
                                               (blur regions around facial keypoints)
    │
    ▼
GESTURE RECOGNIZED ──▶ Backend /api/gesture/recognize
  { "gesture": "pinch", "confidence": 0.92 }
    │
    ▼
GESTURE ACTION MAPPER
  pinch       ──▶ TRIGGER_AI_HELP
  swipe_right ──▶ LAUNCH_BROWSER_CHAT
  open_palm   ──▶ PAUSE_SESSION
  thumbs_up   ──▶ MARK_UNDERSTOOD
8. API Design
8.1 API Structure
| Category | Endpoints |
|---|---|
| Session | /session/start, /session/update, /session/end, /session/insights |
| Prediction | /predict/doubts, /recommendations |
| Behavior | /behavior/track, /behavior/heatmap |
| Graph | /graph/add, /graph/query, /graph/path |
| Review | /review/due, /review/complete, /review/stats |
| Peer | /peer/insights, /peer/doubts, /peer/trending |
| Gesture | /gesture/list, /gesture/recognize, /gesture/training/* |
| LLM | /llm/query, /llm/gesture-action, /llm/rl/* |
8.2 Session API
# POST /api/session/start
{
    "user_id": "student123",
    "topic": "Machine Learning",
    "subtopic": "Neural Networks"
}

# Response
{
    "session_id": "session_1699999999.123",
    "topic": "Machine Learning",
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "Student showing signs of confusion...",
            "priority": 1
        }
    ],
    "pending_reviews": 5,
    "peer_insights_count": 3
}
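The request body above can be built and sanity-checked on the client side; a minimal stdlib-only sketch (the helper name and all-fields-required rule are illustrative assumptions, not part of the API contract):

```python
import json

def make_start_payload(user_id: str, topic: str, subtopic: str) -> str:
    """Serialize the /api/session/start request body, rejecting empty fields."""
    required = {'user_id': user_id, 'topic': topic, 'subtopic': subtopic}
    if not all(required.values()):
        raise ValueError('all session fields are required')
    return json.dumps(required)

body = make_start_payload('student123', 'Machine Learning', 'Neural Networks')
```

The resulting string can be POSTed with any HTTP client; validating before serializing keeps malformed sessions from ever reaching the backend.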
8.3 Doubt Prediction API
# POST /api/predict/doubts
{
    "context": {
        "topic": "Neural Networks",
        "progress": 0.5,
        "confusion_signals": 0.7
    }
}

# Response
{
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "...",
            "priority": 1,
            "estimated_time": "10 min",
            "prerequisites": ["regularization", "bias-variance"]
        }
    ]
}
9. Multi-Modal Detection
9.1 Supported Modalities
MULTI-MODAL FUSION

  Audio             Biometric           Behavioral
  Speech rate       Heart rate          Mouse moves
  Hesitations       GSR                 Scroll
  Pauses            Eye tracking        Key presses
      │                 │                   │
      └─────────────────┼───────────────────┘
                        ▼
          WEIGHTED FUSION
            audio_weight:      0.2
            biometric_weight:  0.3
            behavioral_weight: 0.5
                        │
                        ▼
          UNIFIED CONFUSION SCORE (0.0 - 1.0)
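The fusion step can be sketched directly from those weights; this version renormalizes when a modality is missing, which is an assumption about the real implementation rather than documented behavior:

```python
# Modality weights from the fusion diagram; each per-modality score is
# assumed normalized to 0.0-1.0 upstream.
FUSION_WEIGHTS = {'audio': 0.2, 'biometric': 0.3, 'behavioral': 0.5}

def fuse_confusion(scores: dict) -> float:
    """Weighted fusion over whichever modalities are present."""
    present = {m: w for m, w in FUSION_WEIGHTS.items() if m in scores}
    if not present:
        return 0.0
    total = sum(present.values())
    # Renormalize so a missing sensor does not artificially lower the score
    return sum(scores[m] * w for m, w in present.items()) / total
```

With only the behavioral modality available (e.g. no microphone or wearable), the behavioral score passes through unchanged instead of being scaled down by the missing 0.5 of weight.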
9.2 Feature Extraction by Modality
Audio (7 features):
- Speech rate (WPM)
- Pause frequency
- Pause duration
- Pitch variation
- Volume level
- Hesitation count
- Question markers
Biometric (6 features):
- Heart rate (BPM)
- Heart rate variability
- Skin conductance (GSR)
- Skin temperature
- Eye blink rate
- Eye open duration
Behavioral (8 features):
- Mouse hesitation
- Scroll reversals
- Time on page
- Click frequency
- Back button usage
- Tab switches
- Copy attempts
- Search usage
10. Privacy & Security
10.1 Face Blur Implementation
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

class FaceBlurProcessor:
    def __init__(self):
        self.face_mesh = mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=1,
            refine_landmarks=True,
        )

    def blur_face(self, frame):
        # Detect face landmarks (MediaPipe expects RGB input)
        results = self.face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            # Get face region
            face_region = self._get_face_region(frame, results)
            # Apply Gaussian blur
            blurred = cv2.GaussianBlur(face_region, (51, 51), 0)
            # Replace face region
            frame = self._replace_region(frame, blurred, results)
        return frame
10.2 Data Privacy
| Data Type | Storage | Privacy |
|---|---|---|
| Video frames | None | Processed in-memory only |
| Face images | None | Auto-blurred |
| Hand landmarks | Optional | Anonymized |
| Session data | Local JSON | User-owned |
| Model weights | HuggingFace | Open |
11. Deployment Architecture
11.1 Development Setup
DEVELOPMENT

  Terminal 1:              Terminal 2:
    cd backend               cd frontend
    python run.py            npm run dev
    Flask :5001              Vite :5173
        │                        │
        └──────────┬─────────────┘
                   ▼
  BROWSER (localhost)
    Frontend (:5173) ◀── Proxy ──▶ Backend (:5001)
11.2 Production Setup
                Load Balancer
                     │
     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
Flask Worker    Flask Worker    Flask Worker
  (:5001)         (:5001)         (:5001)
     └───────────────┼───────────────┘
                     ▼
                Redis Cache
                     │
                     ▼
                PostgreSQL
11.3 HuggingFace Model Hosting
HuggingFace Hub: namish10/contextflow-rl

  checkpoint.pkl            → Trained RL model
  train_rl.py               → Training script
  feature_extractor.py      → State extraction
  online_learning.py        → Continuous learning
  data_collector.py         → Real data collection
  multimodal_detection.py   → Audio/biometric fusion
  demo.ipynb                → Interactive demo
  RESEARCH_PAPER.md         → Full documentation

  app/      (9 agents + API)
  frontend/ (React UI)
Summary
ContextFlow is a comprehensive system combining:
- Predictive AI - RL-based doubt prediction before confusion occurs
- Multi-Agent Architecture - 9 specialized agents coordinated by orchestrator
- Gesture Recognition - Privacy-first MediaPipe hand detection
- Multi-Modal Sensing - Audio + Biometric + Behavioral fusion
- Browser-Based AI - Direct AI chat launching without API keys
- Continuous Learning - Online learning from user feedback
The system is production-ready, with all API endpoint groups functional, the complete nine-agent network in place, and the trained RL model available on HuggingFace.