train_grpo: prebuilt flash-attn wheel + verbose training rollouts 1d82571 anuragredbus committed 12 days ago
training/train_grpo.ipynb: add Kaggle (/kaggle/working) fresh-clone branch 9536a33 anuragredbus committed 12 days ago
training: use -- and bash -c to bypass hf CLI typer flag stealing bcc27a5 anuragredbus committed 12 days ago
Merge branch 'main' of github.com:VaibhavKhandare/viral-posts-env 360c721 vaibhav12332112312 committed 12 days ago
Merge branch 'main' of https://github.com/VaibhavKhandare/viral-posts-env b7ef274 anuragredbus committed 12 days ago
training: default flavor a10g-largex4 (4xA10G, 96GB VRAM) ef79012 vaibhav12332112312 committed 12 days ago
training: smoke-mode + hardcoded peak hint + valid tool IDs 1f72457 vaibhav12332112312 committed 12 days ago
Merge branch 'main' of github.com:VaibhavKhandare/viral-posts-env 037fe15 vaibhav12332112312 committed 12 days ago
Strip heatmap leak from prompt; let model discover peak hours via tools e82b235 vaibhav12332112312 committed 13 days ago
Inject peak hours + history + post-mandate, run SFT every round 30614d3 vaibhav12332112312 committed 13 days ago
ReAct two-pass per day so model sees current-day tool results b1c1732 vaibhav12332112312 committed 13 days ago
Strip leaked peak-hour info from observation, force tool discovery afbf541 vaibhav12332112312 committed 13 days ago
Mandate tool calls in system prompt to debug zero-tool collapse 4299c91 vaibhav12332112312 committed 13 days ago
Match eval sampling to training, log all I/O, single round 271bf42 vaibhav12332112312 committed 13 days ago
train(grpo): unified hint prompt, no-history chat, positive-advantage filter 3326716 vaibhav12332112312 committed 13 days ago
fix: align notebook with 15-day horizon, drop unused replies field f7b5241 vaibhav12332112312 committed 13 days ago
Merge branch 'main' of https://huggingface.co/spaces/vaibhavkhandare/train-bhai-train 21edd7d vaibhav12332112312 committed 13 days ago
train: batched parallel rollouts on Qwen2.5-3B + parser hardening a6b8df0 vaibhav12332112312 committed 13 days ago
Stop tracking plots/*.png with Git LFS; use small inline PNGs for HF Hub. 81cdb34 anuragredbus committed 13 days ago
Default repo clone branch to main for training notebooks and HF script. ad48770 anuragredbus committed 13 days ago
Set TASK_HORIZON to 15 days and align graders, UI, and training prompts. 99717c2 anuragredbus committed 13 days ago
fix(env): tolerate malformed predict_engagement scheduled_actions 4bfe286 vaibhav12332112312 committed 13 days ago
train: default HF Job flavor l4x1 -> l40sx1 (48GB VRAM) 76b19bd vaibhav12332112312 committed 13 days ago
train: per-step credit + drop replies + larger batches 9ee7a09 vaibhav12332112312 committed 13 days ago
fix(notebook): py3.11 f-string backslash error in format_obs 56f70b1 vaibhav12332112312 committed 13 days ago
Merge branch 'main' of https://huggingface.co/spaces/vaibhavkhandare/train-bhai-train 383294c vaibhav12332112312 committed 13 days ago