update: results table, 0.5B model links, citation year 2026 293f2e4 Jayant-Kernel committed on 12 days ago
docs: detailed README with curriculum, reward table, results, usage a7c6973 Jayant-Kernel committed on 12 days ago
rollback: revert to last working Dockerfile and train.py e30d685 unverified Jayant-Kernel committed on 12 days ago
fix: proper GRPO with trl 0.12.2 no-deps + force hub downgrade 0efac4a unverified Jayant-Kernel committed on 12 days ago
fix: custom training loop without TRL dependency 5232a98 unverified Jayant-Kernel committed on 12 days ago
fix: force reinstall huggingface_hub 0.24.7 after deceit_env 54fc539 unverified Jayant-Kernel committed on 12 days ago
fix: pin huggingface_hub 0.24.7, install trl with --no-deps a0058bb unverified Jayant-Kernel committed on 12 days ago
fix: trl 0.12.2 has GRPOTrainer, pin all deps before trl install 430098b unverified Jayant-Kernel committed on 12 days ago
fix: try multiple import paths for GRPOConfig 2cdce1f unverified Jayant-Kernel committed on 12 days ago
fix: install transformers 4.46.0 BEFORE trl so trl doesn't upgrade it 9264b56 unverified Jayant-Kernel committed on 12 days ago
fix: bust docker cache force reinstall trl 0.11.4 e9971fb unverified Jayant-Kernel committed on 12 days ago
fix: trl 0.11.4 + transformers 4.46.0 + processing_class e8f541c unverified Jayant-Kernel committed on 12 days ago
fix: trl 0.9.4 + transformers 4.41.2 compatible versions e48f580 unverified Jayant-Kernel committed on 12 days ago
fix: tokenizer not processing_class, torch cu121 for GPU 56567fd unverified Jayant-Kernel committed on 12 days ago
fix: cu124 not cu118 for A100 CUDA 12.9 driver 74138e3 unverified Jayant-Kernel committed on 12 days ago
fix: trl 0.8.6 has GRPOConfig, compatible with torch 2.1.2 4f33e83 Jayant-Kernel committed on 12 days ago
fix: find deceit_env package location and copy data correctly 11baf5d Jayant-Kernel committed on 12 days ago
fix: back to python:3.10-slim for GPU, fix deceit_env path 1058c6b Jayant-Kernel committed on 12 days ago
fix: use huggingface transformers-pytorch-gpu base image 73c82af Jayant-Kernel committed on 12 days ago
fix: revert to torch 2.1.0 cu121 with trl 0.7.4 - versions that worked before 10648d1 Jayant-Kernel committed on 12 days ago
fix: trl 0.12.0 has GRPOTrainer, compatible with torch 2.4.0 84d05af Jayant-Kernel committed on 12 days ago
improve: abstention penalty, better prompt, mixed curriculum, more steps 253d1ff Jayant-Kernel committed on 12 days ago
evaluate: switch to 0.5B model comparison, 200 episodes 6b64fd2 Jayant-Kernel committed on 12 days ago
fix: parse_action confidence bug, numeric answers bug, missing reasoning field bug 66bdd16 Jayant-Kernel committed on 13 days ago
add: evaluate 1.5B base vs trained, upload chart to HF Hub 77e0352 Jayant-Kernel committed on 13 days ago
update: 500 steps L1 + 300 steps L2, higher lr for 1.5B f788873 Jayant-Kernel committed on 13 days ago