ContextFlow RL Training Guide

This guide explains how to train the RL model and upload it to Hugging Face.

Quick Start

1. Install Dependencies

cd research-app/backend
pip install torch numpy  # pickle is part of Python's standard library; no install needed
pip install huggingface_hub  # For uploading

2. Generate Training Data & Train

python train_rl.py --mode train --epochs 10 --samples 1000

3. Upload to Hugging Face

python train_rl.py --mode upload --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl

4. Or Do Both at Once

python train_rl.py --mode full --epochs 10 --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl

Training Options

Parameter	Description	Default
`--epochs`	Number of training epochs	10
`--samples`	Number of training samples to generate	1000
`--batch_size`	Training batch size	32
`--checkpoint_path`	Path to save/load checkpoint	checkpoint.pkl
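
The flags above, together with the `--mode`, `--hf_token`, and `--repo_name` flags used in the Quick Start, could be parsed roughly as follows. This is a minimal sketch, not the actual contents of train_rl.py: the `build_parser` name, the help strings, and the choice to make `--mode` required are assumptions. Only the flag names and the defaults in the table come from this guide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this guide."""
    parser = argparse.ArgumentParser(description="Train and/or upload the RL model")
    # Modes taken from the Quick Start commands; making it required is an assumption.
    parser.add_argument("--mode", choices=["train", "upload", "full"], required=True)
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of training epochs")
    parser.add_argument("--samples", type=int, default=1000,
                        help="Number of training samples to generate")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Training batch size")
    parser.add_argument("--checkpoint_path", default="checkpoint.pkl",
                        help="Path to save/load checkpoint")
    # Upload-related flags; the guide lists no defaults, so None is assumed here.
    parser.add_argument("--hf_token", default=None)
    parser.add_argument("--repo_name", default=None)
    return parser


if __name__ == "__main__":
    # Override one option and leave the rest at the documented defaults.
    args = build_parser().parse_args(["--mode", "train", "--epochs", "20"])
    print(args.epochs, args.samples, args.batch_size, args.checkpoint_path)
```

Any option omitted on the command line falls back to the default listed in the table.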

Model Architecture

The RL model uses:

Q-Network: 3-layer neural network (64 → 128 → 128 → 10)
State Dimension: 64 features
Action Dimension: 10 doubt prediction actions
Training Algorithm: GRPO (Group Relative Policy Optimization)
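
To make the numbers above concrete, the sketch below counts the weights implied by the 64 → 128 → 128 → 10 layout and illustrates the group-relative normalisation that gives GRPO its name. It rests on two assumptions: the layers are fully connected with bias terms, and rewards are normalised by the group mean and standard deviation. The guide does not say how GRPO is combined with the Q-network, so the function names and the example rewards here are purely illustrative.

```python
import statistics

# Layer widths from the guide: 64 state features -> 128 -> 128 -> 10 actions.
LAYER_SIZES = [64, 128, 128, 10]


def count_parameters(layer_sizes, bias=True):
    """Count weights (and optional biases) of a stack of dense layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        if bias:
            total += fan_out       # one bias per output unit
    return total


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    print(count_parameters(LAYER_SIZES))
    # Four hypothetical rewards for actions sampled from the same state.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Under the dense-with-bias assumption the network is small (about 26 thousand parameters), which is consistent with storing it in a single pickle file.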

Hugging Face Upload

The upload step publishes the trained model to a Hugging Face repository with the following contents:

Repository: your-username/contextflow-rl
Files:
- checkpoint.pkl - Model weights
- README.md - Model documentation
- training_stats.json - Training history
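
An upload of these three files can be sketched with the `huggingface_hub` client as shown below. This is an illustration under assumptions, not the code in train_rl.py: the helper names `existing_artifacts` and `upload_artifacts` are hypothetical, and the files are assumed to sit together in one local folder. `HfApi`, `create_repo`, and `upload_file` are part of the `huggingface_hub` package.

```python
from pathlib import Path

# File names listed in this guide.
ARTIFACTS = ("checkpoint.pkl", "README.md", "training_stats.json")


def existing_artifacts(folder, names=ARTIFACTS):
    """Return the artifact paths that are actually present in `folder`."""
    root = Path(folder)
    return [root / name for name in names if (root / name).is_file()]


def upload_artifacts(repo_id, token, folder="."):
    """Create the model repo if needed and upload each artifact found."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    for path in existing_artifacts(folder):
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,
            repo_id=repo_id,
            repo_type="model",
        )


if __name__ == "__main__":
    # Dry run: list what would be uploaded from the current directory.
    print([p.name for p in existing_artifacts(".")])
```

Reading the token from an environment variable instead of typing it on the command line keeps it out of shell history.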

Using the Model

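import pickle

# Load the checkpoint. Only unpickle files you trust:
# pickle.load can execute arbitrary code embedded in the file.
with open("checkpoint.pkl", "rb") as f:
    checkpoint = pickle.load(f)

# The checkpoint is an object, so the class it was saved from must be
# importable in this environment (for example, run from research-app/backend).
print(f"Policy version: {checkpoint.policy_version}")
print(f"Training samples: {checkpoint.training_stats['total_samples']}")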
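
For scripts that need these fields without failing on a missing attribute, a defensive variant might look like the sketch below. The `summarize_checkpoint` helper is hypothetical. Because this guide does not show the real checkpoint class, the demo uses a `types.SimpleNamespace` stand-in with placeholder values; only the attribute names `policy_version` and `training_stats` come from the snippet above.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def summarize_checkpoint(path):
    """Load a trusted checkpoint and return the fields the guide prints."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    # Fall back to None instead of raising if a field is absent.
    stats = getattr(checkpoint, "training_stats", None) or {}
    return {
        "policy_version": getattr(checkpoint, "policy_version", None),
        "total_samples": stats.get("total_samples"),
    }


if __name__ == "__main__":
    # Stand-in object with placeholder values, not a real training result.
    fake = SimpleNamespace(policy_version=1, training_stats={"total_samples": 1000})
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "checkpoint.pkl"
        path.write_bytes(pickle.dumps(fake))
        print(summarize_checkpoint(path))
```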

Citation

@software{contextflow_rl,
  title={ContextFlow RL Doubt Predictor},
  author={ContextFlow Team},
  year={2026},
  url={https://github.com/contextflow/research-app}
}