# BioRLHF Training on Cayuga HPC

**Cluster:** Cornell Cayuga HPC
**Target:** GPU training with Mistral-7B + LoRA (SFT, DPO, GRPO)
## Quick Start

```bash
# 1. SSH to Cayuga
ssh jak4013@cayuga-login1

# 2. Submit a GRPO training job
bash -l -c 'sbatch scripts/run_grpo_full.sh'

# 3. Monitor
squeue -u $USER
tail -f logs/grpo_full_*.log
```
## Step 1: Transfer Files to HPC

From your local Mac:

```bash
rsync -avz --progress \
    /Users/jak4013/Dropbox/Bioinformatics/Claude/BioRLHF/biorlhf/ \
    jak4013@cayuga-login1:/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF/
```
## Step 2: Set Up Conda Environment (First Time Only)

```bash
# SSH to Cayuga
ssh jak4013@cayuga-login1

# Source conda (non-interactive shells require explicit sourcing)
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh

# Create environment
conda create -n biorlhf python=3.10 -y
conda activate biorlhf

# Install PyTorch with CUDA support
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Install training dependencies (quote the version specs so the shell
# does not treat ">=" as output redirection)
pip install "transformers>=4.36.0" "peft>=0.6.0" "trl>=0.14.0"
pip install "bitsandbytes>=0.41.0" "accelerate>=0.24.0" "datasets>=2.14.0"
pip install wandb scipy scikit-learn sentencepiece jsonlines

# Verify GPU access (on a GPU node)
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
```
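For a fuller sanity check on a GPU node, plain PyTorch calls are enough (nothing project-specific; the expected outputs in the comments assume the versions listed under Environment Details below):

```python
import torch

# Confirm the build, the GPU, and bf16 support before launching a job
print(torch.__version__)                   # expect 2.5.1+cu121
print(torch.cuda.is_available())           # True on a GPU node
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA A40"
    print(torch.cuda.is_bf16_supported())  # True on A40/A100
```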
## Step 3: Training Options

### Option A: GRPO Training (Recommended)

GRPO with verifier-based multi-reward training, starting from an SFT checkpoint:

```bash
# Submit via SLURM (use a login shell to get the correct sbatch version)
bash -l -c 'sbatch scripts/run_grpo_full.sh'
```
Key config (`configs/grpo_full_v2.json`):

- `G=16` generations per prompt
- V1-V4 verifiers with weights [0.35, 0.30, 0.15, 0.20] (combination sketched after this list)
- `beta=0.02`, 2 iterations per batch
- ~48 h on an A40
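Those weights presumably enter as a weighted sum over per-verifier scores; the sketch below illustrates that reading and is not the project's actual reward code (the function name and score inputs are hypothetical):

```python
# Hypothetical weighted-sum combination of V1-V4 verifier scores.
# The weights mirror configs/grpo_full_v2.json; everything else is illustrative.
WEIGHTS = (0.35, 0.30, 0.15, 0.20)

def combined_reward(verifier_scores: list[float]) -> float:
    """Collapse V1-V4 scores (assumed to share a common scale) into one scalar."""
    assert len(verifier_scores) == len(WEIGHTS)
    return sum(w * s for w, s in zip(WEIGHTS, verifier_scores))

print(combined_reward([1.0, 0.5, 0.0, 1.0]))  # 0.35 + 0.15 + 0.00 + 0.20 = 0.70
```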
### Option B: SFT Training

```bash
# Interactive session
srun -p scu-gpu --gres=gpu:1 --mem=48G -c 8 --time=4:00:00 --account=cayuga_0003 --pty bash

# Activate environment
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf

# Run SFT
cd /athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF
biorlhf-train --model mistralai/Mistral-7B-v0.3 --dataset data/kmp_sft_final.json --output ./my_sft_model
```
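`biorlhf-train` is the project's own CLI; as a rough mental model, it corresponds to a standard TRL SFT-plus-LoRA recipe like the sketch below (an assumption about its internals with illustrative hyperparameters, not the actual implementation):

```python
# Hypothetical sketch of the TRL SFT + LoRA recipe that biorlhf-train wraps;
# an assumption about its internals, not the project's actual code.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("json", data_files="data/kmp_sft_final.json", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.3",            # base model, loaded by name
    train_dataset=train_ds,
    args=SFTConfig(output_dir="./my_sft_model"),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # illustrative LoRA settings
)
trainer.train()
```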
### Option C: Interactive GPU Session

```bash
# Request a GPU
srun -p scu-gpu --gres=gpu:1 --mem=48G -c 8 --time=4:00:00 --account=cayuga_0003 --pty bash

# Activate environment
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf

# Navigate and run
cd /athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF
biorlhf-grpo --config configs/grpo_full_v2.json
```
## Step 4: Monitor Training

```bash
# Check job status
squeue -u $USER

# Tail logs
tail -f logs/grpo_full_*.log

# GPU usage (on the compute node)
nvidia-smi
```

WandB dashboard: https://wandb.ai/jangkeun-weill-cornell-medicine/biogrpo
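Runs can also be polled programmatically through the WandB public API; a small sketch (the `train/reward` key is an assumption about what the trainer logs):

```python
import wandb

# List recent runs in the biogrpo project with one summary metric each
api = wandb.Api()
for run in api.runs("jangkeun-weill-cornell-medicine/biogrpo"):
    # "train/reward" is an assumed metric name; use whatever the run actually logs
    print(run.name, run.state, run.summary.get("train/reward"))
```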
## Environment Details
| Component | Version |
|---|---|
| Python | 3.10 |
| PyTorch | 2.5.1+cu121 |
| Transformers | 4.57.3 |
| TRL | 0.26.2 |
| PEFT | 0.18.0 |
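To confirm the active environment matches this table, a one-off check (assumes the `biorlhf` env is activated):

```python
# Print the versions of the core training stack
import torch, transformers, trl, peft

for mod in (torch, transformers, trl, peft):
    print(f"{mod.__name__}: {mod.__version__}")
```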
## GPU Options on Cayuga
| GPU | VRAM | Best For | SLURM Flag |
|---|---|---|---|
| A40 | 48GB | Standard GRPO/SFT with QLoRA | --gres=gpu:1 |
| A100 | 80GB | Larger batches, faster training | --gres=gpu:a100:1 |
## Important Notes

### SLURM Version

The default `sbatch` at `/usr/bin/sbatch` is outdated (v22.05.2). Use `bash -l -c 'sbatch ...'` so the login shell loads the correct module version (slurm/25.05.0).
### Conda in Non-Interactive Shells

`source ~/.bashrc` does not take effect in non-interactive SSH sessions. Always source conda directly:

```bash
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf
```
### SFT Checkpoint Symlink

The SFT model adapter is stored at:

```
/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/biorlhf/kmp_sft_model_final
```

The GRPO scripts automatically symlink it into the working directory.
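That step is roughly equivalent to the following (a sketch, not the scripts' actual code):

```python
# Recreate the symlink by hand, e.g. after cleaning the working directory
from pathlib import Path

src = Path("/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/biorlhf/kmp_sft_model_final")
dst = Path("kmp_sft_model_final")  # inside the BioRLHF working directory
if not dst.exists() and not dst.is_symlink():
    dst.symlink_to(src)
```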
### Batch Size with G=16

Both `per_device_eval_batch_size` and `generation_batch_size` must be divisible by `num_generations`. The TRL parameter is `generation_batch_size`, NOT `per_device_generation_batch_size`.
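A quick pre-flight check of that constraint (values illustrative; only the divisibility matters):

```python
# Mirror the divisibility rule TRL enforces at trainer init
num_generations = 16                    # G in configs/grpo_full_v2.json
batch_sizes = {
    "generation_batch_size": 32,        # OK: multiple of 16
    "per_device_eval_batch_size": 16,   # OK: multiple of 16
}

for name, value in batch_sizes.items():
    if value % num_generations != 0:
        raise ValueError(f"{name}={value} is not divisible by num_generations={num_generations}")
```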
### Eval Performance

GRPOTrainer's eval loop generates completions sequentially (~3 min/sample), so a pass over the 107 eval samples takes ~5.3 h. Set `eval_steps=9999` to skip in-training eval and run evaluation post hoc instead.
## Troubleshooting

### "CUDA out of memory"

Reduce the per-device batch size in the config JSON and compensate with gradient accumulation (the effective batch size stays at `batch_size` × `gradient_accumulation_steps`):

```json
{
  "batch_size": 1,
  "gradient_accumulation_steps": 16
}
```
"No GPU available"
nvidia-smi # Check GPU allocation
squeue -u $USER # Verify you're on a GPU node
### LoRA adapter loading fails

The SFT checkpoint is a LoRA adapter, not a full model. Load the base model first, then attach the adapter:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the SFT LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3")
model = PeftModel.from_pretrained(base, "path/to/kmp_sft_model_final")

# Fold the adapter weights into the base model for GRPO training
model = model.merge_and_unload()
```
## Key Paths

| Path | Description |
|---|---|
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF/` | Working directory |
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/biorlhf/kmp_sft_model_final` | SFT checkpoint |
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/data/` | Data directory |
| `/home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh` | Conda init script |