Commit 7d32451 · Parent: 632c145

feat: update README with GPU-utilization tuning instructions, enhance modal training script with run name parameter, and modify GRPO configuration for trace logging and vLLM settings

Files changed:
- README.md +40 -0
- scripts/modal_train_grpo.py +2 -1
- training/configs/grpo_small.yaml +3 -0
README.md CHANGED:

````diff
@@ -267,6 +267,46 @@ uv run --extra modal modal run scripts/modal_train_grpo.py \
   --difficulty 0
 ```
 
+For GPU-utilization tuning on the same single L4, start with a larger but still
+bounded no-code trial:
+
+```bash
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 \
+  --dataset-size 64 \
+  --num-generations 8 \
+  --max-completion-length 256 \
+  --difficulty 0
+```
+
+The launcher exposes GRPO throughput knobs for follow-up trials:
+
+```bash
+# larger generation group, no vLLM
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --max-completion-length 256 --trace-log-every 5
+
+# vLLM colocate on the same L4
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --max-completion-length 256 --use-vllm \
+  --vllm-gpu-memory-utilization 0.35 --trace-log-every 5
+
+# larger microbatch if the vLLM trial does not OOM
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --per-device-train-batch-size 2 --gradient-accumulation-steps 4 \
+  --max-completion-length 256 --use-vllm \
+  --vllm-gpu-memory-utilization 0.45 --trace-log-every 5
+```
+
+`per_device_train_batch_size * gradient_accumulation_steps * world_size` must
+be divisible by `num_generations`; the launcher validates this before the GPU
+container starts. Scalar Trackio metrics still log every reward callback, while
+sample trace tables and Trace objects are throttled by `--trace-log-every`
+(`1` restores every-callback logging, `0` disables trace artifacts).
+
 If running from a public repository and you do not want Modal to package the
 local workspace, use public source mode:
````
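The divisibility rule above can be sketched as a small standalone check. This is a minimal illustration of the pre-launch validation the README describes, not the launcher's actual code; the function name `check_grpo_batch` is hypothetical.

```python
def check_grpo_batch(per_device_train_batch_size: int,
                     gradient_accumulation_steps: int,
                     world_size: int,
                     num_generations: int) -> None:
    # Effective gradient batch must split evenly into per-prompt
    # generation groups, so GRPO advantages are computed over whole groups.
    effective = (per_device_train_batch_size
                 * gradient_accumulation_steps
                 * world_size)
    if effective % num_generations != 0:
        raise ValueError(
            f"effective batch {effective} is not divisible by "
            f"num_generations={num_generations}"
        )

# The third trial above uses 2 * 4 * 1 = 8 with num_generations=8: valid.
check_grpo_batch(2, 4, 1, 8)
```

Failing this check locally is much cheaper than failing it after the GPU container has started, which is presumably why the launcher runs it up front.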
scripts/modal_train_grpo.py CHANGED:

````diff
@@ -1477,6 +1477,7 @@ def main(
     trace_log_every: int = 5,
     seed_start: int = 0,
     git_sha: str = "nogit",
+    run_name: str = "",
     source_mode: str = "local",
     repo_url: str = PUBLIC_REPO_URL,
     repo_branch: str = PUBLIC_REPO_BRANCH,
@@ -1565,7 +1566,7 @@ def main(
 
     model_slug = model_name.replace("/", "-")
     local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
-    run_name = (
+    run_name = run_name or (
         f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
         f"{local_stamp}-{git_sha[:8]}"
     )
````
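The `run_name` change above uses Python's `or` fallback: an explicit name wins, while the empty-string default falls through to the generated timestamped name. A self-contained sketch of that behavior, with a hypothetical helper name:

```python
from datetime import datetime, timezone

def build_run_name(run_name: str, model_name: str, difficulty: int,
                   git_sha: str = "nogit") -> str:
    # Mirrors the diff: an explicit run_name short-circuits the
    # generated "<project>-<model>-grpo-level<N>-<stamp>-<sha8>" name.
    model_slug = model_name.replace("/", "-")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return run_name or (
        f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
        f"{stamp}-{git_sha[:8]}"
    )

# An explicit name is returned unchanged:
assert build_run_name("my-run", "org/model", 0) == "my-run"
```

Using `""` rather than `None` as the default keeps the CLI parameter a plain string while still being falsy, so both cases route through the same `or` expression.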
training/configs/grpo_small.yaml CHANGED:

````diff
@@ -6,6 +6,9 @@ episodes: 10
 num_generations: 6
 per_device_train_batch_size: 1
 gradient_accumulation_steps: 32
+use_vllm: false
+vllm_gpu_memory_utilization: 0.2
+trace_log_every: 5
 learning_rate: 0.000005
 report_to: trackio
 trackio_space_id: Humanlearning/CyberSecurity_OWASP-trackio
````
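The `trace_log_every` setting added here follows the throttling semantics the README diff spells out: `1` logs on every reward callback, `0` disables trace artifacts, and larger values log every Nth callback. A minimal sketch of that predicate, assuming a zero-based callback counter (the helper name is illustrative, not from the repo):

```python
def should_log_trace(callback_index: int, trace_log_every: int) -> bool:
    # 0 (or negative) disables sample trace tables and Trace objects;
    # scalar metrics are unaffected and still log on every callback.
    if trace_log_every <= 0:
        return False
    # 1 restores every-callback logging; N logs every Nth callback.
    return callback_index % trace_log_every == 0

# With trace_log_every=5, the first ten callbacks emit traces at 0 and 5.
logged = [i for i in range(10) if should_log_trace(i, 5)]
assert logged == [0, 5]
```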