# HPC training pipeline
Turnkey 4-stage pipeline for running real GRPO training (with optional SFT warm-start) on an HPC cluster, then auto-pushing every artifact (model, dataset, evidence PNGs, raw logs) to the Hugging Face Hub.
## Layout

```
scripts/hpc/
├── HANDOFF.md      ← copy-paste-ready instructions for the cluster operator
├── README.md       ← this file (reference for repo viewers)
├── setup_env.sh    ← login-node env setup (uv, no conda, no sudo)
├── run_all.sh      ← submit all 4 stages with SLURM dependencies
├── pipeline/
│   ├── 00_generate_datasets.py   ← stage 0: datasets (CPU)
│   ├── 01_sft_warmstart.py       ← stage 1: SFT warm-start (1 GPU)
│   ├── 02_grpo_train.py          ← stage 2: GRPO training (1+ GPUs)
│   ├── 03_evaluate.py            ← stage 3: eval baseline vs trained (1 GPU)
│   └── 04_push_artifacts.py      ← stage 4: push to HF Hub (CPU)
└── slurm/
    ├── 00_datasets.sbatch
    ├── 01_sft.sbatch
    ├── 02_grpo.sbatch
    ├── 03_eval.sbatch
    └── 04_push.sbatch
```
## Quick reference

```bash
# One-time on login node
bash scripts/hpc/setup_env.sh

# Submit everything (returns immediately with 5 job IDs)
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
bash scripts/hpc/run_all.sh

# Watch
squeue -u $USER
tail -f logs/stage*-*.out

# Common overrides
bash scripts/hpc/run_all.sh --multigpu 4          # data-parallel GRPO
bash scripts/hpc/run_all.sh --skip-sft            # GRPO only
bash scripts/hpc/run_all.sh --partition gpu-a100  # custom partition
export IMMUNOORG_MODEL=Qwen/Qwen2.5-14B-Instruct  # bigger model
export HF_PUSH_REPO=your-org/your-defender-repo   # custom destination
```
## What the pipeline guarantees

- Stages 0 → 1 → 2 → 3 → 4 run in order via `--dependency=afterok:<jobid>`. If any stage fails, downstream stages are cancelled automatically.
- Idempotent: re-running stages 1-4 reuses cached datasets / adapters.
- Falls back gracefully: no Unsloth → plain HF; no flash-attn → SDPA; no GPU → CPU mode (slow but works).
- Auto-detects multi-GPU: if SLURM gives stage 2 more than one GPU, it launches via `accelerate launch` for data-parallel GRPO.
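The dependency chaining works by feeding each submitted job's ID into the next `sbatch` call. A minimal illustrative sketch (not the actual `run_all.sh`; the `submit` helper, the `DRY_RUN` switch, and the fake job-id scheme are assumptions for demonstration):

```bash
#!/usr/bin/env bash
# Illustrative sketch of SLURM dependency chaining (NOT the real run_all.sh).
# DRY_RUN=1 (the default here) prints the sbatch commands instead of
# submitting them, so the chaining logic can be exercised off-cluster.
set -euo pipefail

submit() {  # usage: submit <sbatch-script> [jobid-to-wait-for]
  local script=$1 dep=${2:-} depflag=""
  [ -n "$dep" ] && depflag="--dependency=afterok:$dep"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: sbatch $depflag $script" >&2
    # Fake, deterministic job id derived from the stage-number prefix.
    echo $(( 1000 + 10#$(basename "$script" | cut -c1-2) ))
  else
    sbatch --parsable $depflag "$script"  # --parsable prints only the job id
  fi
}

j0=$(submit slurm/00_datasets.sbatch)
j1=$(submit slurm/01_sft.sbatch  "$j0")
j2=$(submit slurm/02_grpo.sbatch "$j1")
j3=$(submit slurm/03_eval.sbatch "$j2")
j4=$(submit slurm/04_push.sbatch "$j3")
echo "submitted: $j0 $j1 $j2 $j3 $j4"
```

`afterok` is what gives the failure behaviour: a dependent job only becomes eligible once its predecessor exits successfully.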
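The launcher choice for stage 2 can be sketched as below (a hedged sketch: `pick_launcher` is a hypothetical name, and the exact SLURM variable consulted is an assumption; SLURM exports the granted GPU count in variables such as `SLURM_GPUS_ON_NODE`):

```bash
#!/usr/bin/env bash
# Hedged sketch of stage 2's launcher selection (names are illustrative).
set -euo pipefail

pick_launcher() {  # usage: pick_launcher <num-gpus>
  if [ "$1" -gt 1 ]; then
    # Data-parallel GRPO: one process per GPU via accelerate.
    echo "accelerate launch --num_processes $1"
  else
    echo "python"
  fi
}

one=$(pick_launcher 1)   # single-GPU case
four=$(pick_launcher 4)  # multi-GPU case
echo "$one pipeline/02_grpo_train.py"
echo "$four pipeline/02_grpo_train.py"
```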
## Reading the artifacts after a run

When stage 4 finishes:
| Artifact | Where to find it |
|---|---|
| Trained LoRA adapter | https://huggingface.co/$HF_PUSH_REPO |
| `evidence_grpo_training.png` | committed to the model repo, also in repo root on the cluster |
| `evidence_eval_*.png` | same |
| `evaluation_results.json` | model repo, full per-policy / per-family numbers |
| `grpo_log.json` | model repo, raw `trainer.state.log_history` |
| Training datasets | https://huggingface.co/datasets/$HF_DATASET_REPO |
To pull just the PNGs back to your dev machine:

```bash
huggingface-cli download $HF_PUSH_REPO evidence_grpo_training.png \
    evidence_eval_per_family.png evidence_eval_summary.png \
    --local-dir . --local-dir-use-symlinks False

git add evidence_*.png
git commit -m "evidence: add real GRPO + eval curves from HPC run"
git push origin master
```
That commit is what closes the "Improvement in Rewards" judging gap.
## Default training config (overridable per-stage)
| Stage | Model | Dataset | Time on 1× A100 |
|---|---|---|---|
| 0 datasets | (CPU only) | builds 1700+ scenarios + 200 trajectories | ~25 min |
| 1 SFT | Qwen2.5-7B + LoRA r=16 | `sft_warmstart.jsonl.gz` (~2000 samples) | ~25 min |
| 2 GRPO | Qwen2.5-7B + LoRA (warm-started) | 500 elite-mix prompts × 4 generations | ~90-120 min |
| 3 eval | trained adapter | 100 episodes/family × 3 policies | ~30 min |
| 4 push | (CPU only) | adapter + 6+ PNGs + raw logs | ~10 min |
Total: ~3-4 hours on a single A100; ~1-1.5 hours with `--multigpu 4`.