Strip heatmap leak from prompt; let model discover peak hours via tools | e82b235 | vaibhav12332112312 committed 12 days ago
Inject peak hours + history + post-mandate, run SFT every round | 30614d3 | vaibhav12332112312 committed 12 days ago
ReAct two-pass per day so model sees current-day tool results | b1c1732 | vaibhav12332112312 committed 12 days ago
Strip leaked peak-hour info from observation, force tool discovery | afbf541 | vaibhav12332112312 committed 12 days ago
Mandate tool calls in system prompt to debug zero-tool collapse | 4299c91 | vaibhav12332112312 committed 12 days ago
Match eval sampling to training, log all I/O, single round | 271bf42 | vaibhav12332112312 committed 12 days ago
train(grpo): unified hint prompt, no-history chat, positive-advantage filter | 3326716 | vaibhav12332112312 committed 12 days ago
fix: align notebook with 15-day horizon, drop unused replies field | f7b5241 | vaibhav12332112312 committed 12 days ago
Merge branch 'main' of https://huggingface.co/spaces/vaibhavkhandare/train-bhai-train | 21edd7d | vaibhav12332112312 committed 12 days ago
train: batched parallel rollouts on Qwen2.5-3B + parser hardening | a6b8df0 | vaibhav12332112312 committed 12 days ago
Stop tracking plots/*.png with Git LFS; use small inline PNGs for HF Hub. | 81cdb34 | anuragredbus committed 12 days ago
Default repo clone branch to main for training notebooks and HF script. | ad48770 | anuragredbus committed 12 days ago
Set TASK_HORIZON to 15 days and align graders, UI, and training prompts. | 99717c2 | anuragredbus committed 12 days ago
fix(env): tolerate malformed predict_engagement scheduled_actions | 4bfe286 | vaibhav12332112312 committed 12 days ago
train: default HF Job flavor l4x1 -> l40sx1 (48GB VRAM) | 76b19bd | vaibhav12332112312 committed 12 days ago
train: per-step credit + drop replies + larger batches | 9ee7a09 | vaibhav12332112312 committed 12 days ago
fix(notebook): py3.11 f-string backslash error in format_obs | 56f70b1 | vaibhav12332112312 committed 12 days ago
Merge branch 'main' of https://huggingface.co/spaces/vaibhavkhandare/train-bhai-train | 383294c | vaibhav12332112312 committed 12 days ago
fix(notebook): pin typing_extensions>=4.13.0 to fix pydantic Sentinel ImportError | b1bd9cc | vaibhav12332112312 committed 12 days ago
fix: restore parse_model_output exception parity with original bare except | aeedd8d | anuragredbus committed 13 days ago
chore: align train_grpo.ipynb with smoke/syntax patterns for Colab | 0587f05 | anuragredbus committed 13 days ago
add training/syntax_only.ipynb — kernel + Python syntax only (no project logic) | 0e50d91 | anuragredbus committed 13 days ago
add train_grpo_smoke notebook; quote pip versions in train_grpo | b55c1ff | anuragredbus committed 13 days ago
fix: notebook loads Qwen without bitsandbytes on Mac; optional training deps | eb1d764 | anuragredbus committed 13 days ago
fix: robust notebook setup (no magic shell) + local CWD auto-detect | 8d09986 | anuragredbus committed 13 days ago
Merge branch 'hack1' of github.com:VaibhavKhandare/viral-posts-env into hack1 | 6c01076 | vaibhav12332112312 committed 13 days ago
fix: rewrite training notebook for real LoRA fine-tuning on Colab | 4a29e22 | anuragredbus committed 13 days ago