ContextFlow RL Training Guide

This guide explains how to train the RL model and upload it to Hugging Face.

Quick Start

1. Install Dependencies

cd research-app/backend
pip install torch numpy  # pickle is part of the Python standard library
pip install huggingface_hub  # For uploading

2. Generate Training Data & Train

python train_rl.py --mode train --epochs 10 --samples 1000

3. Upload to Hugging Face

python train_rl.py --mode upload --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl

4. Or Do Both at Once

python train_rl.py --mode full --epochs 10 --hf_token YOUR_TOKEN --repo_name your-username/contextflow-rl

Training Options

Parameter           Description                              Default
--epochs            Number of training epochs                10
--samples           Number of training samples to generate   1000
--batch_size        Training batch size                      32
--checkpoint_path   Path to save/load checkpoint             checkpoint.pkl
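
The options above, together with the --mode, --hf_token, and --repo_name flags used in the Quick Start, could be parsed roughly as follows. This is a minimal sketch and not the actual contents of train_rl.py: the function name build_parser, the help text, and the choice to make --mode required are assumptions. Only the flag names and defaults come from this guide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this guide."""
    parser = argparse.ArgumentParser(description="Train and/or upload the RL model")
    # Modes taken from the Quick Start commands; requiring the flag is an assumption.
    parser.add_argument("--mode", choices=["train", "upload", "full"], required=True)
    # Defaults below match the Training Options table.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--samples", type=int, default=1000)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--checkpoint_path", default="checkpoint.pkl")
    # Only needed for the upload and full modes.
    parser.add_argument("--hf_token", default=None)
    parser.add_argument("--repo_name", default=None)
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--mode", "train", "--epochs", "5"])
print(args.epochs, args.samples, args.batch_size, args.checkpoint_path)
# prints: 5 1000 32 checkpoint.pkl
```

Any flag that is not passed falls back to its documented default, which is why only --epochs differs in the printed values.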

Model Architecture

The RL model uses:

  • Q-Network: 3-layer neural network (64 → 128 → 128 → 10)
  • State Dimension: 64 features
  • Action Dimension: 10 doubt prediction actions
  • Training Algorithm: GRPO (Group Relative Policy Optimization)
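
The layer sizes listed above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the model defined in train_rl.py: the class name QNetwork, the use of fully connected layers, and the ReLU activations are guesses. Only the dimensions (64 → 128 → 128 → 10) come from this guide.

```python
import torch
from torch import nn

STATE_DIM = 64    # state features, from this guide
HIDDEN_DIM = 128  # hidden width, from this guide
ACTION_DIM = 10   # doubt prediction actions, from this guide


class QNetwork(nn.Module):
    """Hypothetical three-layer network mapping a state to one value per action."""

    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN_DIM),
            nn.ReLU(),  # activation choice is an assumption
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
            nn.ReLU(),
            nn.Linear(HIDDEN_DIM, ACTION_DIM),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)


q_net = QNetwork()
states = torch.randn(4, STATE_DIM)  # a batch of 4 random states
with torch.no_grad():
    q_values = q_net(states)        # shape (4, 10)
greedy_actions = q_values.argmax(dim=1)  # highest-valued action per state
print(q_values.shape, greedy_actions.shape)
```

Each row of q_values holds one score per action, so taking the argmax along dimension 1 yields one action index per input state.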

Hugging Face Upload

After training, the upload step publishes the model to Hugging Face with the following layout:

  • Repository: your-username/contextflow-rl
  • Files:
    • checkpoint.pkl - Model weights
    • README.md - Model documentation
    • training_stats.json - Training history
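
An upload of the three files listed above could look like the sketch below. It is not the upload code from train_rl.py: the helper name upload_artifacts and its folder parameter are hypothetical, and skipping missing files is an assumed behaviour. The helper expects a client object with the create_repo and upload_file methods of huggingface_hub.HfApi.

```python
from pathlib import Path

# File names taken from the list in this guide.
ARTIFACTS = ("checkpoint.pkl", "README.md", "training_stats.json")


def upload_artifacts(api, repo_id: str, folder: str = ".") -> list:
    """Upload each documented file found in `folder`; return the names uploaded.

    `api` is expected to behave like huggingface_hub.HfApi.
    """
    # Create the repository if it does not exist yet.
    api.create_repo(repo_id=repo_id, exist_ok=True)
    uploaded = []
    for name in ARTIFACTS:
        path = Path(folder) / name
        if not path.is_file():
            continue  # assumption: silently skip files that were not produced
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=name,
            repo_id=repo_id,
        )
        uploaded.append(name)
    return uploaded
```

With huggingface_hub installed, a call might look like `upload_artifacts(HfApi(token="YOUR_TOKEN"), "your-username/contextflow-rl")`. Passing the client in as an argument keeps the helper easy to exercise without network access.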

Using the Model

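import pickle

# Load the checkpoint. Unpickling can execute arbitrary code, so only load
# files you trust. The class that defines the checkpoint object must also be
# importable in the environment where this runs.
with open("checkpoint.pkl", "rb") as f:
    checkpoint = pickle.load(f)

# Inspect the stored metadata.
print(f"Policy version: {checkpoint.policy_version}")
print(f"Training samples: {checkpoint.training_stats['total_samples']}")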
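
A more defensive variant of the loading snippet above is sketched here. It assumes only what that snippet shows, namely a policy_version attribute and a training_stats mapping with a total_samples key. The helper name summarize_checkpoint is hypothetical, and absent fields are reported as None instead of raising an error.

```python
import pickle
from pathlib import Path


def summarize_checkpoint(path: str = "checkpoint.pkl") -> dict:
    """Return the documented metadata fields, using None for any that are absent."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No checkpoint found at {file_path}")
    # Only unpickle files from a source you trust.
    with file_path.open("rb") as f:
        checkpoint = pickle.load(f)
    # Fall back to an empty mapping if training_stats is missing or None.
    stats = getattr(checkpoint, "training_stats", None) or {}
    return {
        "policy_version": getattr(checkpoint, "policy_version", None),
        "total_samples": stats.get("total_samples"),
    }
```

Calling `summarize_checkpoint("checkpoint.pkl")` returns a plain dict, which is convenient for logging or for comparing several checkpoints.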

Citation

@software{contextflow_rl,
  title={ContextFlow RL Doubt Predictor},
  author={ContextFlow Team},
  year={2026},
  url={https://github.com/contextflow/research-app}
}