Space: Siddeshwar1625/OSINT

ritishshrirao committed 12 days ago

Commit

e44cdee

1 Parent(s): d822755

Update training config, add checkpointing on HF

Files changed (7)

README.md +15 -11
config/self_play_training_hf_l40s_full.json +97 -0
docs/adversarial_self_play.md +3 -1
scripts/space_start.sh +5 -2
src/osint_env/training/hf_jobs.py +2 -2
src/osint_env/training/rewards.py +1 -1
src/osint_env/training/self_play.py +150 -0

README.md CHANGED

@@ -180,7 +180,7 @@ For a standalone Linux server or SSH box, there is also a wrapper script that ac
 VENV_PATH="$HOME/arl" \
 INSTALL_TRAIN_DEPS=1 \
 TRAIN_ENV_CONFIG_PATH="config/shared_config.json" \
-TRAIN_SELF_PLAY_CONFIG_PATH="config/self_play_training_hf_a10g_smoke.json" \
 TRAIN_SELF_PLAY_OUTPUT_DIR="artifacts/self_play_server" \
 bash scripts/train_self_play_standalone.sh
 ```
@@ -194,12 +194,12 @@ Useful overrides for the standalone script:
 The training config also supports `"model_topology": "dual"|"shared"`, `"phase_schedule": "generator_answerer"|"answerer_generator_answerer"`, `"tuning_mode": "full"|"lora"`, and `"canonical_graph_mode": "generate"|"fixed"` so you can switch between two-model vs single-model self-play, full fine-tuning vs LoRA adapters, and whether canonical graph structure is generated each round or kept fixed while training question/answer behavior.
-### Hugging Face Job A10G Run (Separate From The Space)
-For a short verification run (enough to confirm W&B logging before scaling up), use:
 ```bash
-osint-env train-self-play --config config/shared_config.json --train-config config/self_play_training_hf_a10g_smoke.json
 ```
 This config:
@@ -207,8 +207,8 @@ This config:
 - uses `Qwen/Qwen2.5-0.5B-Instruct`
 - enables W&B reporting (`wandb_enabled: true`)
 - uses `pipeline_mode: "swarm_v2"` with `canonical_graph_mode: "fixed"` to keep canonical graph candidates stable while training question/answer behavior
-- keeps training intentionally short (`rounds=2`, `max_steps=50` per phase)
-- uses full fine-tuning plus fused AdamW, bf16/tf32, larger generation batches, and extra dataloader workers to make better use of an A10G
 To enable canonical graph generation during swarm_v2 training, switch `"canonical_graph_mode"` to `"generate"` in the training config.
@@ -220,25 +220,29 @@ osint-env-launch-hf-job \
   --job-image "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel" \
   --repo-url "https://github.com/your-org/meta-knowledge-graph.git" \
   --repo-ref "main" \
-  --flavor "a10g-small" \
   --env-config "config/shared_config.json" \
-  --train-config "config/self_play_training_hf_a10g_smoke.json" \
   --output-bucket "your-hf-bucket" \
   --wait
 ```
-The launcher talks to the Hugging Face Jobs API through `huggingface_hub`, so the Space can remain on CPU while the training job runs on separate A10G compute.
 Optional Space startup wiring still exists if you want it:
 1. Keep the Space on CPU if it is serving inference/UI only.
-2. Set `RUN_SELF_PLAY_TRAINING=1` only if you intentionally want startup-time training inside the Space container.
 3. Optional overrides:
-   - `TRAIN_SELF_PLAY_CONFIG_PATH` (default: `config/self_play_training_hf_a10g_smoke.json`)
    - `TRAIN_ENV_CONFIG_PATH` (default: `config/shared_config.json`)
    - `TRAIN_SELF_PLAY_OUTPUT_DIR` to override where artifacts land
    - `RUN_SELF_PLAY_DRY_RUN=1` to test startup wiring without GRPO updates
    - `RUN_SELF_PLAY_BACKGROUND=1` to keep the API up while startup-time training runs
    - `OSINT_TRAIN_STRICT_ASSERTS=1` to fail fast when reward variance, KL, loss, grad norms, or parameter updates stay zero
 W&B run naming is controlled by `wandb_run_name_prefix` and will emit phase-specific runs like `...-r001-generator` and `...-r001-answerer`.

 VENV_PATH="$HOME/arl" \
 INSTALL_TRAIN_DEPS=1 \
 TRAIN_ENV_CONFIG_PATH="config/shared_config.json" \
+TRAIN_SELF_PLAY_CONFIG_PATH="config/self_play_training_hf_l40s_full.json" \
 TRAIN_SELF_PLAY_OUTPUT_DIR="artifacts/self_play_server" \
 bash scripts/train_self_play_standalone.sh
 ```
 The training config also supports `"model_topology": "dual"|"shared"`, `"phase_schedule": "generator_answerer"|"answerer_generator_answerer"`, `"tuning_mode": "full"|"lora"`, and `"canonical_graph_mode": "generate"|"fixed"` so you can switch between two-model vs single-model self-play, full fine-tuning vs LoRA adapters, and whether canonical graph structure is generated each round or kept fixed while training question/answer behavior.
+### Hugging Face Job L40S Run (Separate From The Space)
+For a budgeted full fine-tuning run on `l40s` hardware, use:
 ```bash
+osint-env train-self-play --config config/shared_config.json --train-config config/self_play_training_hf_l40s_full.json
 ```
 This config:
 - uses `Qwen/Qwen2.5-0.5B-Instruct`
 - enables W&B reporting (`wandb_enabled: true`)
 - uses `pipeline_mode: "swarm_v2"` with `canonical_graph_mode: "fixed"` to keep canonical graph candidates stable while training question/answer behavior
+- keeps the VRAM-heavy settings aligned with the smoke config while extending runtime (`rounds=6`, `max_steps=120` per phase)
+- uses full fine-tuning plus fused AdamW, bf16/tf32, larger generation batches, and extra dataloader workers to make better use of an L40S
 To enable canonical graph generation during swarm_v2 training, switch `"canonical_graph_mode"` to `"generate"` in the training config.
   --job-image "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel" \
   --repo-url "https://github.com/your-org/meta-knowledge-graph.git" \
   --repo-ref "main" \
+  --flavor "l40s" \
   --env-config "config/shared_config.json" \
+  --train-config "config/self_play_training_hf_l40s_full.json" \
   --output-bucket "your-hf-bucket" \
   --wait
 ```
+The launcher talks to the Hugging Face Jobs API through `huggingface_hub`, so the Space can remain on CPU while the training job runs on separate L40S compute.
 Optional Space startup wiring still exists if you want it:
 1. Keep the Space on CPU if it is serving inference/UI only.
+2. The startup script now defaults to running self-play on boot with the full L40S config.
 3. Optional overrides:
+   - `RUN_SELF_PLAY_TRAINING=0` to disable startup-time training
+   - `TRAIN_SELF_PLAY_CONFIG_PATH` (default: `config/self_play_training_hf_l40s_full.json`)
    - `TRAIN_ENV_CONFIG_PATH` (default: `config/shared_config.json`)
    - `TRAIN_SELF_PLAY_OUTPUT_DIR` to override where artifacts land
    - `RUN_SELF_PLAY_DRY_RUN=1` to test startup wiring without GRPO updates
    - `RUN_SELF_PLAY_BACKGROUND=1` to keep the API up while startup-time training runs
+   - `OSINT_HF_CHECKPOINT_REPO_ID` to force uploads into a specific HF model repo
+   - `OSINT_HF_CHECKPOINT_REPO_TYPE` to switch repo type (`model` by default)
+   - `OSINT_HF_CHECKPOINT_REPO_PRIVATE=0` to create/update a public checkpoint repo
    - `OSINT_TRAIN_STRICT_ASSERTS=1` to fail fast when reward variance, KL, loss, grad norms, or parameter updates stay zero
 W&B run naming is controlled by `wandb_run_name_prefix` and will emit phase-specific runs like `...-r001-generator` and `...-r001-answerer`.

config/self_play_training_hf_l40s_full.json ADDED

	@@ -0,0 +1,97 @@

+{
+  "rounds": 6,
+  "output_dir": "artifacts/self_play_hf_l40s_full",
+  "dry_run": false,
+  "wandb_enabled": true,
+  "wandb_project": "osint-self-play-train",
+  "wandb_entity": "",
+  "wandb_run_name_prefix": "qwen25-05b-instruct-l40s-full",
+  "pipeline_mode": "swarm_v2",
+  "canonical_graph_mode": "fixed",
+  "model_topology": "shared",
+  "phase_schedule": "generator_answerer",
+  "tuning_mode": "full",
+  "shared_model_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
+  "seed_tasks_per_round": 16,
+  "generated_tasks_per_round": 24,
+  "generator_prompts_per_round": 24,
+  "max_graph_context_nodes": 24,
+  "max_graph_context_edges": 24,
+  "max_support_edges": 6,
+  "answerer_judge_max_new_tokens": 32,
+  "generated_task_max_new_tokens": 640,
+  "post_training_eval_questions": 24,
+  "post_training_eval_answer_max_new_tokens": 128,
+  "generator_phase": {
+    "model_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
+    "learning_rate": 5e-06,
+    "max_steps": 120,
+    "per_device_train_batch_size": 4,
+    "gradient_accumulation_steps": 1,
+    "num_generations": 4,
+    "max_completion_length": 384,
+    "max_prompt_length": 768,
+    "generation_batch_size": 16,
+    "temperature": 0.9,
+    "top_p": 0.95,
+    "repetition_penalty": 1.1,
+    "beta": 0.01,
+    "epsilon": 0.2,
+    "num_iterations": 1,
+    "loss_type": "dapo",
+    "scale_rewards": "group",
+    "logging_steps": 5,
+    "save_steps": 30,
+    "save_total_limit": 4,
+    "optim": "adamw_torch_fused",
+    "bf16": true,
+    "tf32": true,
+    "gradient_checkpointing": false,
+    "dataloader_num_workers": 4,
+    "dataloader_persistent_workers": true,
+    "dataloader_prefetch_factor": 4,
+    "output_subdir": "generator_train",
+    "use_vllm": false,
+    "vllm_mode": "colocate"
+  },
+  "answerer_phase": {
+    "model_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
+    "learning_rate": 3e-06,
+    "max_steps": 120,
+    "per_device_train_batch_size": 4,
+    "gradient_accumulation_steps": 1,
+    "num_generations": 4,
+    "max_completion_length": 256,
+    "max_prompt_length": 768,
+    "generation_batch_size": 16,
+    "temperature": 0.7,
+    "top_p": 0.95,
+    "repetition_penalty": 1.1,
+    "beta": 0.01,
+    "epsilon": 0.2,
+    "num_iterations": 1,
+    "loss_type": "dapo",
+    "scale_rewards": "group",
+    "logging_steps": 5,
+    "save_steps": 30,
+    "save_total_limit": 4,
+    "optim": "adamw_torch_fused",
+    "bf16": true,
+    "tf32": true,
+    "gradient_checkpointing": false,
+    "dataloader_num_workers": 4,
+    "dataloader_persistent_workers": true,
+    "dataloader_prefetch_factor": 4,
+    "output_subdir": "answerer_train",
+    "use_vllm": false,
+    "vllm_mode": "colocate"
+  },
+  "lora": {
+    "r": 8,
+    "alpha": 16,
+    "dropout": 0.05,
+    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
+    "bias": "none",
+    "task_type": "CAUSAL_LM"
+  }
+}
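
For a rough sense of scale, the budget implied by this config can be tallied as below. This is only a sketch: it assumes every phase runs to its full `max_steps`, that `generator_answerer` means one generator phase followed by one answerer phase per round, and that `save_steps` and `save_total_limit` act as a checkpoint interval and a retention cap. The dictionary is a hand-copied subset of the JSON above, and `phase_budget` is a hypothetical helper that does not exist in the repo.

```python
# Hand-copied subset of config/self_play_training_hf_l40s_full.json (illustrative only).
config = {
    "rounds": 6,
    "phase_schedule": "generator_answerer",
    "generator_phase": {"max_steps": 120, "save_steps": 30, "save_total_limit": 4},
    "answerer_phase": {"max_steps": 120, "save_steps": 30, "save_total_limit": 4},
}


def phase_budget(phase: dict) -> dict:
    """Steps and checkpoint counts for one phase run, assuming it reaches max_steps."""
    saves = phase["max_steps"] // phase["save_steps"]
    return {
        "steps": phase["max_steps"],
        "checkpoints_written": saves,
        # Assumption: save_total_limit caps how many checkpoints stay on disk.
        "checkpoints_kept": min(saves, phase["save_total_limit"]),
    }


generator = phase_budget(config["generator_phase"])
answerer = phase_budget(config["answerer_phase"])

# Assumption: one generator phase plus one answerer phase per round.
steps_per_round = generator["steps"] + answerer["steps"]
total_steps = config["rounds"] * steps_per_round

print(steps_per_round)  # 240
print(total_steps)      # 1440
print(generator["checkpoints_written"], generator["checkpoints_kept"])  # 4 4
```

Under these assumptions the save interval and retention cap line up, so each phase folder would hold all four of its checkpoints when it is uploaded.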

docs/adversarial_self_play.md CHANGED

@@ -122,6 +122,8 @@ Per round and phase you will now find:
 - `self_play_summary.json`: top-level run summary.
 - `post_training_evaluation.json`: generated-question evaluation written after training.
 ## Compute Mode
 When compute is available:
@@ -142,7 +144,7 @@ Example:
 VENV_PATH="$HOME/arl" \
 INSTALL_TRAIN_DEPS=1 \
 TRAIN_ENV_CONFIG_PATH="config/shared_config.json" \
-TRAIN_SELF_PLAY_CONFIG_PATH="config/self_play_training_hf_a10g_smoke.json" \
 TRAIN_SELF_PLAY_OUTPUT_DIR="artifacts/self_play_server" \
 bash scripts/train_self_play_standalone.sh
 ```

 - `self_play_summary.json`: top-level run summary.
 - `post_training_evaluation.json`: generated-question evaluation written after training.
+If `HF_TOKEN` is available, the trainer can also mirror phase folders and summary artifacts to a Hugging Face repo. By default it derives a repo on the same account as the Space using `SPACE_ID`/`HF_SPACE_ID` and a `-checkpoints` suffix. You can override that with `OSINT_HF_CHECKPOINT_REPO_ID`.
 ## Compute Mode
 When compute is available:
 VENV_PATH="$HOME/arl" \
 INSTALL_TRAIN_DEPS=1 \
 TRAIN_ENV_CONFIG_PATH="config/shared_config.json" \
+TRAIN_SELF_PLAY_CONFIG_PATH="config/self_play_training_hf_l40s_full.json" \
 TRAIN_SELF_PLAY_OUTPUT_DIR="artifacts/self_play_server" \
 bash scripts/train_self_play_standalone.sh
 ```
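
The default repo derivation described in the added paragraph can be sketched as follows. The logic mirrors `_default_hf_checkpoint_repo_id` and `_slugify_hf_repo_name` from this commit, but it is a standalone re-statement rather than the repo's code: it takes the environment as a plain dictionary (an adaptation for easy experimentation), and the Space id `some-user/My Space` is a made-up example.

```python
import re


def slugify(value: str) -> str:
    """Lowercase, replace runs of disallowed characters with '-', trim '-' and '.'."""
    token = re.sub(r"[^a-zA-Z0-9._-]+", "-", str(value).strip().lower())
    return re.sub(r"-{2,}", "-", token).strip("-.")


def default_checkpoint_repo_id(env: dict) -> str:
    """Resolve the checkpoint repo id from an environment-like mapping."""
    explicit = env.get("OSINT_HF_CHECKPOINT_REPO_ID", "").strip()
    if explicit:
        return explicit  # an explicit override always wins
    space_id = (env.get("SPACE_ID") or env.get("HF_SPACE_ID") or "").strip()
    if "/" not in space_id:
        return ""  # no Space id available, so uploads stay disabled
    owner, _, space_name = space_id.partition("/")
    suffix = env.get("OSINT_HF_CHECKPOINT_REPO_SUFFIX", "-checkpoints").strip() or "-checkpoints"
    repo_name = slugify(f"{space_name}{suffix}") or "osint-self-play-checkpoints"
    return f"{owner}/{repo_name}"


# Hypothetical Space id; only the repo name is slugified, the owner is kept as-is.
print(default_checkpoint_repo_id({"SPACE_ID": "some-user/My Space"}))
# some-user/my-space-checkpoints

print(default_checkpoint_repo_id({
    "SPACE_ID": "some-user/My Space",
    "OSINT_HF_CHECKPOINT_REPO_ID": "some-org/custom-repo",
}))
# some-org/custom-repo

print(repr(default_checkpoint_repo_id({})))
# ''
```

An empty result corresponds to the "checkpoint uploads disabled" path: the upload helpers return early when no repo id or token can be resolved.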

scripts/space_start.sh CHANGED

@@ -9,9 +9,9 @@ _is_true() {
 }
 ENV_CONFIG_PATH="${TRAIN_ENV_CONFIG_PATH:-config/shared_config.json}"
-TRAIN_CONFIG_PATH="${TRAIN_SELF_PLAY_CONFIG_PATH:-config/self_play_training_hf_a10g_smoke.json}"
 TRAIN_OUTPUT_DIR="${TRAIN_SELF_PLAY_OUTPUT_DIR:-}"
-RUN_FLAG="${RUN_SELF_PLAY_TRAINING:-0}"
 DRY_RUN_FLAG="${RUN_SELF_PLAY_DRY_RUN:-0}"
 BACKGROUND_FLAG="${RUN_SELF_PLAY_BACKGROUND:-1}"
@@ -44,6 +44,9 @@ if _is_true "$RUN_FLAG"; then
   if [ -n "${TRAIN_OUTPUT_DIR}" ]; then
     echo "[space_start] Train output dir: ${TRAIN_OUTPUT_DIR}"
   fi
   if _is_true "$BACKGROUND_FLAG"; then
     echo "[space_start] Launching self-play in background so the Space API can stay online."
     _train_self_play &

 }
 ENV_CONFIG_PATH="${TRAIN_ENV_CONFIG_PATH:-config/shared_config.json}"
+TRAIN_CONFIG_PATH="${TRAIN_SELF_PLAY_CONFIG_PATH:-config/self_play_training_hf_l40s_full.json}"
 TRAIN_OUTPUT_DIR="${TRAIN_SELF_PLAY_OUTPUT_DIR:-}"
+RUN_FLAG="${RUN_SELF_PLAY_TRAINING:-1}"
 DRY_RUN_FLAG="${RUN_SELF_PLAY_DRY_RUN:-0}"
 BACKGROUND_FLAG="${RUN_SELF_PLAY_BACKGROUND:-1}"
   if [ -n "${TRAIN_OUTPUT_DIR}" ]; then
     echo "[space_start] Train output dir: ${TRAIN_OUTPUT_DIR}"
   fi
+  if [ -n "${OSINT_HF_CHECKPOINT_REPO_ID:-}" ]; then
+    echo "[space_start] HF checkpoint repo: ${OSINT_HF_CHECKPOINT_REPO_ID}"
+  fi
   if _is_true "$BACKGROUND_FLAG"; then
     echo "[space_start] Launching self-play in background so the Space API can stay online."
     _train_self_play &

src/osint_env/training/hf_jobs.py CHANGED

@@ -265,10 +265,10 @@ def build_parser() -> argparse.ArgumentParser:
     )
     parser.add_argument(
         "--train-config",
-        default=os.getenv("TRAIN_SELF_PLAY_CONFIG_PATH", "config/self_play_training_hf_a10g_smoke.json"),
         help="Training config path inside the training image or checked-out repo.",
     )
-    parser.add_argument("--flavor", default=os.getenv("HF_JOB_FLAVOR", "a10g-small"))
     parser.add_argument("--timeout", default=os.getenv("HF_JOB_TIMEOUT", "8h"))
     parser.add_argument("--namespace", default=os.getenv("HF_JOB_NAMESPACE", ""))
     parser.add_argument("--run-name", default=os.getenv("HF_JOB_RUN_NAME", "osint-self-play-job"))

     )
     parser.add_argument(
         "--train-config",
+        default=os.getenv("TRAIN_SELF_PLAY_CONFIG_PATH", "config/self_play_training_hf_l40s_full.json"),
         help="Training config path inside the training image or checked-out repo.",
     )
+    parser.add_argument("--flavor", default=os.getenv("HF_JOB_FLAVOR", "l40s"))
     parser.add_argument("--timeout", default=os.getenv("HF_JOB_TIMEOUT", "8h"))
     parser.add_argument("--namespace", default=os.getenv("HF_JOB_NAMESPACE", ""))
     parser.add_argument("--run-name", default=os.getenv("HF_JOB_RUN_NAME", "osint-self-play-job"))
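
Because the defaults above are read from the environment with `os.getenv` while the parser is being built, the effective precedence is CLI flag, then environment variable, then the hard-coded fallback. The snippet below is a minimal sketch of that pattern with only the two arguments this commit touches; the reduced `build_parser` here is not the repo's full parser, and `t4-small` is just an example value.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Reduced stand-in for the launcher parser: only the two changed defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--train-config",
        default=os.getenv("TRAIN_SELF_PLAY_CONFIG_PATH", "config/self_play_training_hf_l40s_full.json"),
    )
    parser.add_argument("--flavor", default=os.getenv("HF_JOB_FLAVOR", "l40s"))
    return parser


# 1) Nothing set: the hard-coded fallback applies.
os.environ.pop("HF_JOB_FLAVOR", None)
fallback = build_parser().parse_args([]).flavor
print(fallback)  # l40s

# 2) Environment variable set: it replaces the fallback when the parser is built.
os.environ["HF_JOB_FLAVOR"] = "a10g-small"
from_env = build_parser().parse_args([]).flavor
print(from_env)  # a10g-small

# 3) Explicit flag: it overrides both the environment and the fallback.
from_cli = build_parser().parse_args(["--flavor", "t4-small"]).flavor
print(from_cli)  # t4-small

os.environ.pop("HF_JOB_FLAVOR", None)  # leave the environment as we found it
```

One consequence of this pattern is that the environment is sampled at parser construction time, so changing `HF_JOB_FLAVOR` after `build_parser()` has run has no effect on that parser.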

src/osint_env/training/rewards.py CHANGED

@@ -953,7 +953,7 @@ class GeneratorRewardFunction:
         context_pressure = self._context_pressure_score(validation_result)
         parl_parallel, parl_finish = self._parl_scores(candidate)
         hardness_component = max(0.0, min(1.0, (hardness + 0.4) / 1.4))
-        consistency_component = max(
             0.0,
             min(
                 1.0,

         context_pressure = self._context_pressure_score(validation_result)
         parl_parallel, parl_finish = self._parl_scores(candidate)
         hardness_component = max(0.0, min(1.0, (hardness + 0.4) / 1.4))
+        consistency_component = max(
             0.0,
             min(
                 1.0,

src/osint_env/training/self_play.py CHANGED

@@ -3,6 +3,7 @@ from __future__ import annotations
 import inspect
 import json
 import os
 from dataclasses import dataclass
 from pathlib import Path
 import random
@@ -46,6 +47,111 @@ class _RoundArtifacts:
     generated_tasks_path: str
 def _require_training_stack() -> tuple[Any, Any, Any]:
     try:
@@ -1489,6 +1595,11 @@ def _run_adversarial_self_play_swarm_v2(
     run_dir = Path(training_config.output_dir)
     run_dir.mkdir(parents=True, exist_ok=True)
     env = OSINTEnvironment(env_config, llm=build_llm_client(env_config.llm))
     seed_tasks = list(env.tasks)
@@ -1566,6 +1677,11 @@ def _run_adversarial_self_play_swarm_v2(
                     report_to=answerer_pre_report_to,
                     run_name=answerer_pre_run_name,
                 )
                 answerer_model = str(answerer_pre_train_result["model_path"])
                 if topology == "shared":
                     generator_model = answerer_model
@@ -1614,6 +1730,11 @@ def _run_adversarial_self_play_swarm_v2(
                 report_to=generator_report_to,
                 run_name=generator_run_name,
             )
             generator_model = str(generator_train_result["model_path"])
             if topology == "shared":
                 answerer_model = generator_model
@@ -1719,6 +1840,11 @@ def _run_adversarial_self_play_swarm_v2(
                 report_to=answerer_report_to,
                 run_name=answerer_run_name,
             )
             answerer_model = str(answerer_train_result["model_path"])
             if topology == "shared":
                 generator_model = answerer_model
@@ -1790,6 +1916,8 @@ def _run_adversarial_self_play_swarm_v2(
     summary_path = run_dir / "self_play_summary.json"
     summary_path.write_text(json.dumps(final_payload, indent=2, sort_keys=True), encoding="utf-8")
     final_payload["summary_path"] = str(summary_path)
     return final_payload
@@ -1813,6 +1941,11 @@ def run_adversarial_self_play(
     run_dir = Path(training_config.output_dir)
     run_dir.mkdir(parents=True, exist_ok=True)
     env = OSINTEnvironment(env_config, llm=build_llm_client(env_config.llm))
     seed_tasks = list(env.tasks)
@@ -1878,6 +2011,11 @@ def run_adversarial_self_play(
                     report_to=answerer_pre_report_to,
                     run_name=answerer_pre_run_name,
                 )
                 answerer_model = str(answerer_pre_train_result["model_path"])
                 if topology == "shared":
                     generator_model = answerer_model
@@ -1921,6 +2059,11 @@ def run_adversarial_self_play(
                 report_to=generator_report_to,
                 run_name=generator_run_name,
             )
             generator_model = str(generator_train_result["model_path"])
             if topology == "shared":
                 answerer_model = generator_model
@@ -1993,6 +2136,11 @@ def run_adversarial_self_play(
                 report_to=answerer_report_to,
                 run_name=answerer_run_name,
             )
             answerer_model = str(answerer_train_result["model_path"])
             if topology == "shared":
                 generator_model = answerer_model
@@ -2067,5 +2215,7 @@ def run_adversarial_self_play(
     summary_path = run_dir / "self_play_summary.json"
     summary_path.write_text(json.dumps(final_payload, indent=2, sort_keys=True), encoding="utf-8")
     final_payload["summary_path"] = str(summary_path)
     return final_payload

 import inspect
 import json
 import os
+import re
 from dataclasses import dataclass
 from pathlib import Path
 import random
     generated_tasks_path: str
+def _is_true_env(value: str | None) -> bool:
+    token = str(value or "").strip().lower()
+    return token in {"1", "true", "yes", "y", "on"}
+def _resolve_hf_upload_token() -> str:
+    for env_name in ("HF_TOKEN", "HUGGINGFACE_HUB_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
+        token = str(os.getenv(env_name, "")).strip()
+        if token:
+            return token
+    return ""
+def _slugify_hf_repo_name(value: str) -> str:
+    token = re.sub(r"[^a-zA-Z0-9._-]+", "-", str(value).strip().lower())
+    token = re.sub(r"-{2,}", "-", token).strip("-.")
+    return token
+def _default_hf_checkpoint_repo_id(run_dir: Path) -> str:
+    explicit = str(os.getenv("OSINT_HF_CHECKPOINT_REPO_ID", "")).strip()
+    if explicit:
+        return explicit
+    space_id = str(os.getenv("SPACE_ID") or os.getenv("HF_SPACE_ID") or "").strip()
+    if "/" not in space_id:
+        return ""
+    owner, _, space_name = space_id.partition("/")
+    suffix = str(os.getenv("OSINT_HF_CHECKPOINT_REPO_SUFFIX", "-checkpoints")).strip() or "-checkpoints"
+    repo_name = _slugify_hf_repo_name(f"{space_name}{suffix}") or "osint-self-play-checkpoints"
+    return f"{owner}/{repo_name}"
+def _hf_checkpoint_repo_prefix(run_dir: Path) -> str:
+    explicit = str(os.getenv("OSINT_HF_CHECKPOINT_PATH_PREFIX", "")).strip().strip("/")
+    if explicit:
+        return explicit
+    return _slugify_hf_repo_name(run_dir.name) or "self-play"
+def _hf_relative_repo_path(local_path: Path, run_dir: Path) -> str:
+    prefix = _hf_checkpoint_repo_prefix(run_dir)
+    try:
+        relative = local_path.relative_to(run_dir).as_posix()
+    except ValueError:
+        relative = local_path.name
+    return f"{prefix}/{relative}".strip("/")
+def _maybe_upload_folder_to_hf(local_dir: Path, run_dir: Path, commit_message: str) -> None:
+    repo_id = _default_hf_checkpoint_repo_id(run_dir)
+    token = _resolve_hf_upload_token()
+    if not repo_id or not token or not local_dir.exists():
+        return
+    try:
+        from huggingface_hub import HfApi
+    except ImportError:
+        print("[self_play][hf_upload] huggingface_hub missing; skipping checkpoint upload.")
+        return
+    repo_type = str(os.getenv("OSINT_HF_CHECKPOINT_REPO_TYPE", "model")).strip() or "model"
+    private = _is_true_env(os.getenv("OSINT_HF_CHECKPOINT_REPO_PRIVATE", "1"))
+    path_in_repo = _hf_relative_repo_path(local_dir, run_dir)
+    api = HfApi(token=token)
+    api.create_repo(repo_id=repo_id, repo_type=repo_type, private=private, exist_ok=True)
+    api.upload_folder(
+        folder_path=str(local_dir),
+        repo_id=repo_id,
+        repo_type=repo_type,
+        path_in_repo=path_in_repo,
+        commit_message=commit_message,
+        ignore_patterns=["*.pyc", "__pycache__", ".DS_Store"],
+    )
+    print(f"[self_play][hf_upload] uploaded {local_dir} -> {repo_type}:{repo_id}/{path_in_repo}")
+def _maybe_upload_file_to_hf(local_file: Path, run_dir: Path, commit_message: str) -> None:
+    repo_id = _default_hf_checkpoint_repo_id(run_dir)
+    token = _resolve_hf_upload_token()
+    if not repo_id or not token or not local_file.exists():
+        return
+    try:
+        from huggingface_hub import HfApi
+    except ImportError:
+        print("[self_play][hf_upload] huggingface_hub missing; skipping artifact upload.")
+        return
+    repo_type = str(os.getenv("OSINT_HF_CHECKPOINT_REPO_TYPE", "model")).strip() or "model"
+    private = _is_true_env(os.getenv("OSINT_HF_CHECKPOINT_REPO_PRIVATE", "1"))
+    path_in_repo = _hf_relative_repo_path(local_file, run_dir)
+    api = HfApi(token=token)
+    api.create_repo(repo_id=repo_id, repo_type=repo_type, private=private, exist_ok=True)
+    api.upload_file(
+        path_or_fileobj=str(local_file),
+        repo_id=repo_id,
+        repo_type=repo_type,
+        path_in_repo=path_in_repo,
+        commit_message=commit_message,
+    )
+    print(f"[self_play][hf_upload] uploaded {local_file} -> {repo_type}:{repo_id}/{path_in_repo}")
 def _require_training_stack() -> tuple[Any, Any, Any]:
     try:
     run_dir = Path(training_config.output_dir)
     run_dir.mkdir(parents=True, exist_ok=True)
+    checkpoint_repo_id = _default_hf_checkpoint_repo_id(run_dir)
+    if checkpoint_repo_id and _resolve_hf_upload_token():
+        print(f"[self_play][hf_upload] checkpoint uploads enabled -> {checkpoint_repo_id}")
+    else:
+        print("[self_play][hf_upload] checkpoint uploads disabled; set HF token and/or OSINT_HF_CHECKPOINT_REPO_ID.")
     env = OSINTEnvironment(env_config, llm=build_llm_client(env_config.llm))
     seed_tasks = list(env.tasks)
                     report_to=answerer_pre_report_to,
                     run_name=answerer_pre_run_name,
                 )
+                _maybe_upload_folder_to_hf(
+                    round_dir / f"{training_config.answerer_phase.output_subdir}_pre",
+                    run_dir,
+                    f"Upload answerer-pre checkpoints for round {round_index:03d}",
+                )
                 answerer_model = str(answerer_pre_train_result["model_path"])
                 if topology == "shared":
                     generator_model = answerer_model
                 report_to=generator_report_to,
                 run_name=generator_run_name,
             )
+            _maybe_upload_folder_to_hf(
+                round_dir / training_config.generator_phase.output_subdir,
+                run_dir,
+                f"Upload generator checkpoints for round {round_index:03d}",
+            )
             generator_model = str(generator_train_result["model_path"])
             if topology == "shared":
                 answerer_model = generator_model
                 report_to=answerer_report_to,
                 run_name=answerer_run_name,
             )
+            _maybe_upload_folder_to_hf(
+                round_dir / training_config.answerer_phase.output_subdir,
+                run_dir,
+                f"Upload answerer checkpoints for round {round_index:03d}",
+            )
             answerer_model = str(answerer_train_result["model_path"])
             if topology == "shared":
                 generator_model = answerer_model
     summary_path = run_dir / "self_play_summary.json"
     summary_path.write_text(json.dumps(final_payload, indent=2, sort_keys=True), encoding="utf-8")
     final_payload["summary_path"] = str(summary_path)
+    _maybe_upload_file_to_hf(summary_path, run_dir, "Upload self-play summary")
+    _maybe_upload_file_to_hf(run_dir / "post_training_evaluation.json", run_dir, "Upload post-training evaluation")
     return final_payload
     run_dir = Path(training_config.output_dir)
     run_dir.mkdir(parents=True, exist_ok=True)
+    checkpoint_repo_id = _default_hf_checkpoint_repo_id(run_dir)
+    if checkpoint_repo_id and _resolve_hf_upload_token():
+        print(f"[self_play][hf_upload] checkpoint uploads enabled -> {checkpoint_repo_id}")
+    else:
+        print("[self_play][hf_upload] checkpoint uploads disabled; set HF token and/or OSINT_HF_CHECKPOINT_REPO_ID.")
     env = OSINTEnvironment(env_config, llm=build_llm_client(env_config.llm))
     seed_tasks = list(env.tasks)
                     report_to=answerer_pre_report_to,
                     run_name=answerer_pre_run_name,
                 )
+                _maybe_upload_folder_to_hf(
+                    round_dir / f"{training_config.answerer_phase.output_subdir}_pre",
+                    run_dir,
+                    f"Upload answerer-pre checkpoints for round {round_index:03d}",
+                )
                 answerer_model = str(answerer_pre_train_result["model_path"])
                 if topology == "shared":
                     generator_model = answerer_model
                 report_to=generator_report_to,
                 run_name=generator_run_name,
             )
+            _maybe_upload_folder_to_hf(
+                round_dir / training_config.generator_phase.output_subdir,
+                run_dir,
+                f"Upload generator checkpoints for round {round_index:03d}",
+            )
             generator_model = str(generator_train_result["model_path"])
             if topology == "shared":
                 answerer_model = generator_model
                 report_to=answerer_report_to,
                 run_name=answerer_run_name,
             )
+            _maybe_upload_folder_to_hf(
+                round_dir / training_config.answerer_phase.output_subdir,
+                run_dir,
+                f"Upload answerer checkpoints for round {round_index:03d}",
+            )
             answerer_model = str(answerer_train_result["model_path"])
             if topology == "shared":
                 generator_model = answerer_model
     summary_path = run_dir / "self_play_summary.json"
     summary_path.write_text(json.dumps(final_payload, indent=2, sort_keys=True), encoding="utf-8")
     final_payload["summary_path"] = str(summary_path)
+    _maybe_upload_file_to_hf(summary_path, run_dir, "Upload self-play summary")
+    _maybe_upload_file_to_hf(run_dir / "post_training_evaluation.json", run_dir, "Upload post-training evaluation")
     return final_payload
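
To see where uploaded artifacts would land inside the checkpoint repo, the path mapping from `_hf_checkpoint_repo_prefix` and `_hf_relative_repo_path` can be re-stated in isolation. This is a sketch, not the repo's code: the prefix override is passed as an argument instead of being read from `OSINT_HF_CHECKPOINT_PATH_PREFIX`, and the `round_001` folder name is an assumption, since the diff does not show how `round_dir` is named.

```python
import re
from pathlib import Path


def slugify(value: str) -> str:
    """Same normalisation as _slugify_hf_repo_name in the diff above."""
    token = re.sub(r"[^a-zA-Z0-9._-]+", "-", str(value).strip().lower())
    return re.sub(r"-{2,}", "-", token).strip("-.")


def relative_repo_path(local_path: Path, run_dir: Path, explicit_prefix: str = "") -> str:
    """Map a local artifact path to its path inside the checkpoint repo."""
    # An explicit prefix wins; otherwise the run directory name is slugified.
    prefix = explicit_prefix.strip().strip("/") or slugify(run_dir.name) or "self-play"
    try:
        relative = local_path.relative_to(run_dir).as_posix()
    except ValueError:
        # Paths outside run_dir fall back to the bare file or folder name.
        relative = local_path.name
    return f"{prefix}/{relative}".strip("/")


run_dir = Path("artifacts/self_play_hf_l40s_full")

# Hypothetical round folder name; the phase folder comes from "output_subdir".
print(relative_repo_path(run_dir / "round_001" / "generator_train", run_dir))
# self_play_hf_l40s_full/round_001/generator_train

print(relative_repo_path(run_dir / "self_play_summary.json", run_dir))
# self_play_hf_l40s_full/self_play_summary.json

print(relative_repo_path(run_dir / "self_play_summary.json", run_dir, explicit_prefix="/runs/exp1/"))
# runs/exp1/self_play_summary.json
```

Since the prefix defaults to the run directory name, two runs that share an `output_dir` name would write to the same repo paths; setting a distinct prefix per run is one way to keep them apart.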