Humanlearning/Cyber_analyst-round1
Humanlearning committed 12 days ago

Commit 7d32451 (1 parent: 632c145)

feat: update README with GPU-utilization tuning instructions, enhance modal training script with run name parameter, and modify GRPO configuration for trace logging and vLLM settings

Files changed (3)

README.md +40 -0
scripts/modal_train_grpo.py +2 -1
training/configs/grpo_small.yaml +3 -0

README.md CHANGED

@@ -267,6 +267,46 @@ uv run --extra modal modal run scripts/modal_train_grpo.py \
   --difficulty 0
 ```
+For GPU-utilization tuning on the same single L4, start with a larger but still
+bounded no-code trial:
+```bash
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 \
+  --dataset-size 64 \
+  --num-generations 8 \
+  --max-completion-length 256 \
+  --difficulty 0
+```
+The launcher exposes GRPO throughput knobs for follow-up trials:
+```bash
+# larger generation group, no vLLM
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --max-completion-length 256 --trace-log-every 5
+# vLLM colocate on the same L4
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --max-completion-length 256 --use-vllm \
+  --vllm-gpu-memory-utilization 0.35 --trace-log-every 5
+# larger microbatch if the vLLM trial does not OOM
+uv run --extra modal modal run scripts/modal_train_grpo.py \
+  --max-steps 30 --dataset-size 64 --num-generations 8 \
+  --per-device-train-batch-size 2 --gradient-accumulation-steps 4 \
+  --max-completion-length 256 --use-vllm \
+  --vllm-gpu-memory-utilization 0.45 --trace-log-every 5
+```
+`per_device_train_batch_size * gradient_accumulation_steps * world_size` must
+be divisible by `num_generations`; the launcher validates this before the GPU
+container starts. Scalar Trackio metrics still log every reward callback, while
+sample trace tables and Trace objects are throttled by `--trace-log-every`
+(`1` restores every-callback logging, `0` disables trace artifacts).
 If running from a public repository and you do not want Modal to package the
 local workspace, use public source mode:

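The divisibility rule stated in the README addition can be checked before any GPU time is spent. The following is a minimal sketch of such a pre-flight check, not the launcher's actual code: the function name `check_generation_batch` and its error message are hypothetical, and `world_size` is assumed to default to 1 for the single-L4 setup.

```python
def check_generation_batch(
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
    num_generations: int,
    world_size: int = 1,  # assumption: one GPU, as in the single-L4 trials
) -> int:
    """Return the effective batch size, or raise if it cannot be split
    evenly into groups of `num_generations` completions."""
    if num_generations <= 0:
        raise ValueError("num_generations must be a positive integer")
    effective = (
        per_device_train_batch_size * gradient_accumulation_steps * world_size
    )
    if effective % num_generations != 0:
        raise ValueError(
            f"effective batch size {effective} is not divisible by "
            f"num_generations={num_generations}"
        )
    return effective
```

With the values from the third trial (`--per-device-train-batch-size 2 --gradient-accumulation-steps 4 --num-generations 8`), the product is 8 and the check passes. A combination such as batch size 1, accumulation 4, and 6 generations would be rejected, since 4 is not a multiple of 6.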
scripts/modal_train_grpo.py CHANGED

@@ -1477,6 +1477,7 @@ def main(
     trace_log_every: int = 5,
     seed_start: int = 0,
     git_sha: str = "nogit",
+    run_name: str = "",
     source_mode: str = "local",
     repo_url: str = PUBLIC_REPO_URL,
     repo_branch: str = PUBLIC_REPO_BRANCH,
@@ -1565,7 +1566,7 @@ def main(
     model_slug = model_name.replace("/", "-")
     local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
-    run_name = (
+    run_name = run_name or (
         f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
         f"{local_stamp}-{git_sha[:8]}"
     )

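The `run_name or (...)` change means an explicit name wins and an empty string falls back to the generated one. A standalone sketch of that behavior follows; the helper `resolve_run_name` and its `now` parameter are hypothetical, introduced here only so the fallback can be exercised outside `main` with a fixed timestamp.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_run_name(
    run_name: str,
    model_name: str,
    difficulty: int,
    git_sha: str = "nogit",
    now: Optional[datetime] = None,  # hypothetical hook for a fixed clock
) -> str:
    """Return `run_name` if it is non-empty, else build the default name."""
    model_slug = model_name.replace("/", "-")
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    # An empty string is falsy, so `or` selects the generated name.
    return run_name or (
        f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
        f"{stamp}-{git_sha[:8]}"
    )
```

Because the fallback relies on truthiness, any non-empty string counts as an explicit name, including one that consists only of whitespace.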
training/configs/grpo_small.yaml CHANGED

@@ -6,6 +6,9 @@ episodes: 10
 num_generations: 6
 per_device_train_batch_size: 1
 gradient_accumulation_steps: 32
+use_vllm: false
+vllm_gpu_memory_utilization: 0.2
+trace_log_every: 5
 learning_rate: 0.000005
 report_to: trackio
 trackio_space_id: Humanlearning/CyberSecurity_OWASP-trackio
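The README describes `trace_log_every` as a throttle: `1` logs trace artifacts on every reward callback and `0` disables them. One way such a gate could be written is sketched below. This is an illustration under assumptions, not the repository's implementation: the function name `should_log_trace` is hypothetical, and counting callbacks from 0 is a guess.

```python
def should_log_trace(callback_index: int, trace_log_every: int) -> bool:
    """Decide whether trace artifacts are logged for this reward callback.

    Assumes `callback_index` counts callbacks starting from 0.
    """
    if trace_log_every <= 0:
        # 0 disables trace artifacts entirely.
        return False
    # 1 matches every callback; N matches every N-th callback.
    return callback_index % trace_log_every == 0
```

Scalar metrics would sit outside this gate, since the README states they are still logged on every reward callback.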