Imsachin010 committed · Commit 4ef2798 · 1 Parent(s): 29acf31

Fix trl/pytorch version incompatibility + indentation bugs

Root cause: trl==1.3.0 requires FSDPModule (PyTorch>=2.5) but
Dockerfile installed PyTorch 2.4.0. Pinned to trl==0.11.0.

Changes:
- Pin trl==0.11.0, transformers==4.44.2, peft==0.11.1, torch==2.4.0
- Fix GRPOTrainer param: processing_class -> tokenizer (trl 0.11 API)
- Fix indentation bug at GRPOTrainer call site
- Fix preflight_check.py: total_mem -> total_memory attribute

guidess.txt ADDED
@@ -0,0 +1,247 @@
+ to run the entire SalesPath training pipeline on Hugging Face Spaces (paid GPU) without running into the same roadblocks you faced.
+
+ ---
+
+ ## 1. Repository structure
+
+ Create a clean project folder with this layout:
+
+ ```
+ salespath-training/
+ ├── Dockerfile
+ ├── scripts/
+ │   └── run_training.sh
+ ├── training/
+ │   ├── train_sft.py
+ │   ├── train_grpo.py
+ │   ├── eval_baseline_vs_trained.py
+ │   ├── plot_rewards.py
+ │   └── preflight_check.py
+ ├── salespath_env/          # your environment code
+ ├── pyproject.toml
+ ├── requirements.txt
+ ├── .dockerignore
+ └── README.md
+ ```
+
+ ---
+
+ ## 2. The Dockerfile (GPU‑ready, health‑check safe)
+
+ Create a `Dockerfile` that:
+
+ - Starts from a CUDA base image.
+ - Pins all dependencies (`torch`, `transformers`, `trl`, `peft`, etc.).
+ - **Disables the health check** or uses a background server to keep the Space alive.
+ - Runs a robust entrypoint script.
+
+ ```dockerfile
+ FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV PYTHONUNBUFFERED=1
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PORT=7860
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     python3 python3-pip python3-dev git curl \
+     && ln -sf /usr/bin/python3 /usr/bin/python \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Pin NumPy to avoid breakage
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install "numpy<2"
+
+ # Install PyTorch (adjust CUDA version if needed)
+ RUN pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu121
+
+ # Install core ML libraries (compatible versions)
+ RUN pip install transformers==4.44.2 trl==0.11.0 peft==0.11.1 datasets==2.20.0 \
+     accelerate bitsandbytes huggingface_hub[cli] hf_transfer uvicorn
+
+ # Copy project files
+ WORKDIR /app
+ COPY pyproject.toml requirements.txt ./
+ COPY salespath_env ./salespath_env
+ COPY training ./training
+ COPY scripts/run_training.sh /app/run_training.sh
+
+ # Install the package (no unsloth by default)
+ RUN pip install -e . --no-deps || true
+
+ RUN chmod +x /app/run_training.sh
+
+ # 🔥 NO HEALTHCHECK – we'll use a background HTTP server in the entrypoint
+ # The Space will not kill the container if no /health endpoint exists.
+ # (Alternatively, you can keep HEALTHCHECK with a very long start‑period)
+
+ CMD ["/app/run_training.sh"]
+ ```
+
+ ---
+
+ ## 3. The entrypoint script (`scripts/run_training.sh`)
+
+ This script:
+
+ - Starts a minimal background HTTP server (answers `/health` immediately).
+ - Logs in to HF Hub (if token is provided).
+ - Runs SFT, GRPO, evaluation, and plotting.
+ - Uploads all artifacts to the model repo.
+ - Finally, kills the background server and starts the main keepalive server.
+
+ ```bash
+ #!/usr/bin/env bash
+ set -euo pipefail
+ cd /app
+
+ export PORT="${PORT:-7860}"
+ export SFT_CHECKPOINT="${SFT_CHECKPOINT:-./sft_checkpoint}"
+ export OUTPUT_DIR="${OUTPUT_DIR:-./grpo_checkpoint}"
+
+ # ----------------------------------------------------------------------
+ # 1. Background HTTP server (answers /health, keeps HF happy)
+ # ----------------------------------------------------------------------
+ python3 - <<EOF &
+ import http.server
+ import socketserver
+ import os
+
+ PORT = int(os.environ.get("PORT", 7860))
+ class HealthHandler(http.server.SimpleHTTPRequestHandler):
+     def do_GET(self):
+         if self.path == '/health':
+             self.send_response(200)
+             self.end_headers()
+             self.wfile.write(b'OK')
+         else:
+             self.send_response(404)
+             self.end_headers()
+ with socketserver.TCPServer(("", PORT), HealthHandler) as httpd:
+     httpd.serve_forever()
+ EOF
+ sleep 2
+
+ # ----------------------------------------------------------------------
+ # 2. HF login (if token is set as secret)
+ # ----------------------------------------------------------------------
+ if [[ -n "${HF_TOKEN:-}" ]]; then
+     huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential
+ fi
+
+ # ----------------------------------------------------------------------
+ # 3. Run training steps
+ # ----------------------------------------------------------------------
+ echo "=== 1/4 SFT ==="
+ python training/train_sft.py
+
+ echo "=== 2/4 GRPO ==="
+ python training/train_grpo.py
+
+ echo "=== 3/4 Eval ==="
+ python training/eval_baseline_vs_trained.py \
+     --base "$SFT_CHECKPOINT" \
+     --trained "$OUTPUT_DIR" \
+     --episodes-per-level "${EVAL_EPISODES_PER_LEVEL:-4}"
+
+ echo "=== 4/4 Plots ==="
+ python training/plot_rewards.py --log ./reward_log.jsonl --out ./plots || echo "Plotting skipped"
+
+ # ----------------------------------------------------------------------
+ # 4. Upload model + artifacts to Hugging Face Hub
+ # ----------------------------------------------------------------------
+ if [[ -n "${HF_MODEL_REPO:-}" && -n "${HF_TOKEN:-}" ]]; then
+     echo "=== Upload GRPO adapters to $HF_MODEL_REPO ==="
+     huggingface-cli upload "$HF_MODEL_REPO" "$OUTPUT_DIR" . --repo-type model || true
+
+     # Also upload logs and plots
+     for f in reward_log.jsonl eval_results.md eval_results.json; do
+         if [[ -f "./$f" ]]; then
+             huggingface-cli upload "$HF_MODEL_REPO" "./$f" "$f" --repo-type model || true
+         fi
+     done
+     if [[ -d "./plots" ]]; then
+         huggingface-cli upload "$HF_MODEL_REPO" "./plots" "plots" --repo-type model || true
+     fi
+ fi
+
+ # ----------------------------------------------------------------------
+ # 5. Kill background health server and start real keepalive server
+ # ----------------------------------------------------------------------
+ kill %1 || true
+ exec uvicorn training.hf_keepalive_app:app --host 0.0.0.0 --port "$PORT"
+ ```
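The background health server in the entrypoint can also be written as a small standalone module, which makes it easier to test outside the container. The following is a minimal sketch, not the file from this commit: `start_health_server` is a hypothetical helper name, and it assumes only that the Space probes `GET /health` on `PORT` as the guide describes.

```python
import http.server
import threading


class HealthHandler(http.server.BaseHTTPRequestHandler):
    """Answer GET /health with 200 OK and every other path with 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b"OK"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        # Keep probe requests out of the training logs.
        pass


def start_health_server(port):
    """Serve on a daemon thread; passing port 0 lets the OS pick a free port."""
    server = http.server.ThreadingHTTPServer(("", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `server.shutdown()` followed by `server.server_close()` releases the port before the keepalive server binds it.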
+
+ ---
+
+ ## 4. Environment variables and secrets (Space settings)
+
+ Your friend must set these in the Space **Settings** under **Variables and secrets**:
+
+ ### Secrets (hidden)
+ | Name | Value |
+ |------|-------|
+ | `HF_TOKEN` | Hugging Face write token (from settings/tokens) |
+
+ ### Variables (plain text)
+ | Name | Recommended value | Purpose |
+ |------|------------------|---------|
+ | `HF_MODEL_REPO` | `YourUsername/salespath-grpo` | Target model repo |
+ | `ROLLOUTS_PER_DIFFICULTY` | `16` | Collect more rollout data |
+ | `NUM_GENERATIONS` | `4` | GRPO group size |
+ | `PER_DEVICE_BATCH` | `2` | Batch size (adjust for GPU memory) |
+ | `LR` | `8e-7` | Learning rate (stable) |
+ | `GAMMA` | `0.98` | Discount factor |
+ | `WARMUP_RATIO` | `0.1` | Warmup steps |
+ | `NUM_REWARD_WORKERS` | `4` | Parallel reward workers (keep low) |
+ | `MAX_SEQ_LEN` | `1024` | Reduce if OOM |
+ | `EVAL_EPISODES_PER_LEVEL` | `4` | Number of eval episodes per difficulty |
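One way the training scripts might consume these variables is to parse them once into a typed config object, falling back to the recommended values from the table. This is a sketch under assumptions: `TrainConfig` and `load_config` are hypothetical names, and how `train_grpo.py` actually reads its settings is not shown in this commit.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    rollouts_per_difficulty: int
    num_generations: int
    per_device_batch: int
    lr: float
    gamma: float
    warmup_ratio: float
    num_reward_workers: int
    max_seq_len: int


def load_config(env=None):
    """Build a TrainConfig from a mapping (defaults to os.environ).

    Defaults mirror the "Recommended value" column of the table.
    """
    env = os.environ if env is None else env
    return TrainConfig(
        rollouts_per_difficulty=int(env.get("ROLLOUTS_PER_DIFFICULTY", "16")),
        num_generations=int(env.get("NUM_GENERATIONS", "4")),
        per_device_batch=int(env.get("PER_DEVICE_BATCH", "2")),
        lr=float(env.get("LR", "8e-7")),
        gamma=float(env.get("GAMMA", "0.98")),
        warmup_ratio=float(env.get("WARMUP_RATIO", "0.1")),
        num_reward_workers=int(env.get("NUM_REWARD_WORKERS", "4")),
        max_seq_len=int(env.get("MAX_SEQ_LEN", "1024")),
    )
```

Parsing in one place means a malformed value such as `LR=abc` fails at startup with a `ValueError` rather than partway through a paid GPU run.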
+
+ ---
+
+ ## 5. Critical fixes to avoid your issues
+
+ | Your issue | The fix |
+ |------------|---------|
+ | Launch timeout (30 min) | Background HTTP server answers `/health` immediately. |
+ | Logs lost after restart | Use `fetch_space_logs` client‑side **during** training. |
+ | OOM / crash during rollouts | Reduce `ROLLOUTS_PER_DIFFICULTY` (16), `NUM_REWARD_WORKERS` (4), `MAX_SEQ_LEN` (1024). |
+ | Unsloth version conflicts | **Do not install unsloth** – use plain `transformers` + `peft`. |
+ | TRL import error (`FSDPModule`) | Pin `trl==0.11.0` and `transformers==4.44.2`. |
+ | PEFT adapter not loaded in eval | Use the `load_model` function that detects `adapter_config.json` and merges. |
+ | Dense rewards overshadow closing | Increase terminal reward (e.g., +5.0) and add epsilon‑greedy exploration. |
+ | Health check kills container | Either remove `HEALTHCHECK` or set `start-period` to 4+ hours. |
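The `load_model` function mentioned in the table is not part of this commit, so the detection step can only be sketched. The helpers below are hypothetical names; they rely on PEFT writing an `adapter_config.json` with a `base_model_name_or_path` field into the checkpoint directory.

```python
import json
from pathlib import Path


def is_peft_adapter_dir(path):
    """True if the directory looks like a saved PEFT adapter."""
    return (Path(path) / "adapter_config.json").is_file()


def read_base_model_name(path):
    """Return the base model recorded in adapter_config.json, or None."""
    config_path = Path(path) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    return config.get("base_model_name_or_path")
```

When the check succeeds, an eval script would typically load the base model first, wrap it with `peft.PeftModel.from_pretrained(base_model, path)`, and call `merge_and_unload()`. When it fails, the directory can be treated as a full model checkpoint.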
+
+ ---
+
+ ## 6. Step‑by‑step instructions for your friend
+
+ 1. **Create a new Space** on HF → Docker → GPU (T4 or L4).
+ 2. **Clone** the empty Space locally.
+ 3. **Copy** the project files (the structure above) into the clone.
+ 4. **Set secrets and variables** in the Space settings (as listed).
+ 5. **Push** the code to the Space (`git push origin main`).
+ 6. **Monitor** the logs via CLI:
+    ```bash
+    hf spaces logs YourUsername/space-name -f
+    ```
+ 7. **When finished**, the model and all artifacts are automatically uploaded to `HF_MODEL_REPO`.
+ 8. **Stop the Space** manually to avoid further billing.
+
+ ---
+
+ ## 7. Bonus: pre‑flight dependency check
+
+ Create a small `training/preflight_check.py` that runs at the very beginning of `run_training.sh`:
+
+ ```python
+ import torch, transformers, trl, peft
+ print(f"torch: {torch.__version__}")
+ print(f"transformers: {transformers.__version__}")
+ print(f"trl: {trl.__version__}")
+ print(f"peft: {peft.__version__}")
+ assert trl.__version__ == "0.11.0", "trl version mismatch"
+ # etc.
+ ```
+
+ This catches version mismatches early.
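A pre-flight check can compare every pin at once and report all problems together instead of stopping at the first failed `assert`. The sketch below is an illustration only: `find_mismatches` and `missing_symbols` are hypothetical helpers, and `PINS` copies the versions pinned in this commit. It strips local build tags because CUDA wheels of PyTorch report versions such as `2.4.0+cu121`, which would fail a plain equality check against `2.4.0`.

```python
import importlib
from importlib import metadata

# Versions pinned by this commit.
PINS = {
    "torch": "2.4.0",
    "transformers": "4.44.2",
    "trl": "0.11.0",
    "peft": "0.11.1",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return one message per package that is missing or at the wrong version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        # Drop a local build tag such as "+cu121" before comparing.
        if found.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {found}, wanted {wanted}")
    return problems


def missing_symbols(module_name, names):
    """Return the names that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```

Running `missing_symbols("trl", ["GRPOTrainer"])` inside the container is a cheap way to confirm that the pinned `trl` release actually exposes the trainer the GRPO script imports. Whether a given release does is worth verifying rather than assuming.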
requirements.txt CHANGED
@@ -2,10 +2,10 @@ fastapi>=0.110.0
  uvicorn[standard]>=0.29.0
  pydantic>=2.0
  openenv-core>=0.2.3
- transformers>=4.44.0
+ transformers==4.44.2
  datasets>=2.20.0
- trl>=0.11.0
- peft>=0.11.0
+ trl==0.11.0
+ peft==0.11.1
  httpx
  matplotlib
  accelerate>=0.33.0
@@ -13,4 +13,4 @@ bitsandbytes>=0.43.0
  huggingface_hub[cli]>=0.24.0
  hf_transfer>=0.1.8
  numpy<2
- torch>=2.0.0
+ torch==2.4.0
training/__pycache__/grpo_train.cpython-313.pyc CHANGED
Binary files a/training/__pycache__/grpo_train.cpython-313.pyc and b/training/__pycache__/grpo_train.cpython-313.pyc differ
 
training/__pycache__/preflight_check.cpython-313.pyc ADDED
Binary file (3.12 kB). View file
 
training/grpo_train.py CHANGED
@@ -320,7 +320,7 @@ def run_grpo(args):
          reward_funcs=salespath_reward_func,
          args=config,
          train_dataset=train_dataset,
-         processing_class=tokenizer,
+         tokenizer=tokenizer,
      )
 
      trainer.train()
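Because the keyword for passing the tokenizer differs between TRL releases, the call site could pick it by inspecting the trainer's constructor instead of hard-coding one name. This is a hedged sketch, not the change made in this commit: `tokenizer_kwarg` is a hypothetical helper, and which releases accept which keyword should be checked against the installed version.

```python
import inspect


def tokenizer_kwarg(trainer_cls, tokenizer):
    """Return {"processing_class": tok} or {"tokenizer": tok}, whichever
    keyword the trainer's __init__ declares (newer name checked first)."""
    params = inspect.signature(trainer_cls.__init__).parameters
    for name in ("processing_class", "tokenizer"):
        if name in params:
            return {name: tokenizer}
    raise TypeError(
        f"{trainer_cls.__name__} accepts neither 'processing_class' nor 'tokenizer'"
    )
```

The call site would then read `GRPOTrainer(..., **tokenizer_kwarg(GRPOTrainer, tokenizer))`, so the same script runs against either API.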
training/preflight_check.py CHANGED
@@ -37,7 +37,9 @@ try:
      if torch.cuda.is_available():
          print(f"CUDA version: {torch.version.cuda}")
          print(f"GPU: {torch.cuda.get_device_name(0)}")
-         print(f"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB")
+         props = torch.cuda.get_device_properties(0)
+         vram_gb = getattr(props, 'total_memory', getattr(props, 'total_mem', 0)) / 1e9
+         print(f"VRAM: {vram_gb:.1f} GB")
  except Exception as e:
      print(f"PyTorch: ERROR — {e}")
      all_ok = False
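The attribute fallback in this hunk can be factored into a small function that is testable without a GPU. The sketch below is illustrative: `vram_gb` as a function is a hypothetical refactor, not code from the commit. It prefers `total_memory`, the attribute PyTorch's device properties expose, and falls back to `total_mem` only if that is absent.

```python
def vram_gb(props):
    """Return total device memory in GB (10**9 bytes) from a properties object.

    Falls back to 'total_mem' and finally to 0 so the pre-flight check
    prints a value instead of raising AttributeError.
    """
    total_bytes = getattr(props, "total_memory", None)
    if total_bytes is None:
        total_bytes = getattr(props, "total_mem", 0)
    return total_bytes / 1e9
```

In the pre-flight script this would be called as `vram_gb(torch.cuda.get_device_properties(0))`.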