ContextFlow RL Training Guide
This guide explains how to train the RL model and upload it to Hugging Face.
Quick Start
1. Install Dependencies
cd research-app/backend
pip install torch numpy # pickle ships with Python, no install needed
pip install huggingface_hub # For uploading
2. Generate Training Data & Train
python train_rl.py --mode train --epochs 10 --samples 1000
3. Upload to Hugging Face
python train_rl.py --mode upload --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
4. Or Do Both at Once
python train_rl.py --mode full --epochs 10 --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl
Training Options
| Parameter | Description | Default |
|---|---|---|
| --epochs | Number of training epochs | 10 |
| --samples | Number of training samples to generate | 1000 |
| --batch_size | Training batch size | 32 |
| --checkpoint_path | Path to save/load checkpoint | checkpoint.pkl |
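
To make the flags and defaults above concrete, here is a minimal argparse sketch that mirrors them. It is an illustration only, not the actual contents of train_rl.py: the `build_parser` name, making `--mode` required, and defaulting `--hf_token` and `--repo_name` to None are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented in this guide (hypothetical)."""
    parser = argparse.ArgumentParser(description="Train and upload the RL model")
    # Modes shown in Quick Start; making the flag required is an assumption.
    parser.add_argument("--mode", choices=["train", "upload", "full"], required=True)
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of training epochs")
    parser.add_argument("--samples", type=int, default=1000,
                        help="Number of training samples to generate")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Training batch size")
    parser.add_argument("--checkpoint_path", default="checkpoint.pkl",
                        help="Path to save/load checkpoint")
    # Upload flags from Quick Start; defaulting them to None is an assumption.
    parser.add_argument("--hf_token", default=None)
    parser.add_argument("--repo_name", default=None)
    return parser


# Flags that are not passed fall back to the defaults from the table.
args = build_parser().parse_args(["--mode", "train", "--epochs", "20"])
print(args.epochs, args.samples, args.batch_size, args.checkpoint_path)
```

Any flag left off the command line keeps its table default, so only the values you want to change need to be passed.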
Model Architecture
The RL model uses:
- Q-Network: 3-layer neural network (64 → 128 → 128 → 10)
- State Dimension: 64 features
- Action Dimension: 10 doubt prediction actions
- Training Algorithm: GRPO (Group Relative Policy Optimization)
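
The two numeric ideas in this list can be sketched with the standard library alone. The first function counts trainable parameters for the quoted layer sizes, assuming plain fully connected layers with biases. The second shows the group-relative normalization that gives GRPO its name: each reward is scored against the mean and spread of its own group. How train_rl.py actually applies GRPO is not described in this guide, so the function names, the population standard deviation, and the `eps` term are assumptions.

```python
from statistics import mean, pstdev

LAYER_SIZES = [64, 128, 128, 10]  # sizes quoted in this guide


def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers (assumed layout)."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not taken from the source
    # eps keeps the division finite when every reward in the group is identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


print(count_parameters(LAYER_SIZES))  # 26122
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Under these assumptions the network has 8,320 + 16,512 + 1,290 = 26,122 parameters, and samples that beat their group average receive positive advantages while the rest receive negative ones.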
Hugging Face Upload
After training, the model is uploaded as:
- Repository: your-username/contextflow-rl
- Files:
  - checkpoint.pkl - Model weights
  - README.md - Model documentation
  - training_stats.json - Training history
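
For readers who want to push these files by hand, the sketch below uses the `huggingface_hub` client. It is not the upload code in train_rl.py: `upload_artifacts` and `main` are hypothetical helpers, and skipping files that are missing is an assumed behavior. The `HfApi`, `create_repo`, and `upload_file` calls come from `huggingface_hub`.

```python
from pathlib import Path

# Files this guide lists as the contents of the uploaded repository.
ARTIFACTS = ["checkpoint.pkl", "README.md", "training_stats.json"]


def upload_artifacts(api, repo_id, directory="."):
    """Upload each listed artifact found in `directory`; return the names uploaded."""
    uploaded = []
    for name in ARTIFACTS:
        path = Path(directory) / name
        if not path.is_file():
            continue  # assumed behavior: skip artifacts that were not produced
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=name,
            repo_id=repo_id,
        )
        uploaded.append(name)
    return uploaded


def main(token, repo_id):
    """Create the repository if needed, then upload the artifacts (hypothetical entry point)."""
    # Imported here so upload_artifacts can be exercised without the package installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, exist_ok=True)
    return upload_artifacts(api, repo_id)
```

Calling `main("YOUR_TOKEN", "your-username/contextflow-rl")` from the directory that holds the files would upload whichever of the three exist.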
Using the Model
import pickle
# Load the checkpoint. Only unpickle files from sources you trust:
# pickle.load can execute arbitrary code.
with open("checkpoint.pkl", "rb") as f:
    checkpoint = pickle.load(f)
print(f"Policy version: {checkpoint.policy_version}")
print(f"Training samples: {checkpoint.training_stats['total_samples']}")
Citation
@software{contextflow_rl,
  title={ContextFlow RL Doubt Predictor},
  author={ContextFlow Team},
  year={2026},
  url={https://github.com/contextflow/research-app}
}