Commit History

update: results table, 0.5B model links, citation year 2026
293f2e4

Jayant-Kernel committed on

docs: detailed README with curriculum, reward table, results, usage
a7c6973

Jayant-Kernel committed on

rollback: revert to last working Dockerfile and train.py
e30d685
unverified

Jayant-Kernel committed on

fix: proper GRPO with trl 0.12.2 no-deps + force hub downgrade
0efac4a
unverified

Jayant-Kernel committed on

fix: custom training loop without TRL dependency
5232a98
unverified

Jayant-Kernel committed on

fix: force reinstall huggingface_hub 0.24.7 after deceit_env
54fc539
unverified

Jayant-Kernel committed on

fix: pin huggingface_hub 0.24.7, install trl with --no-deps
a0058bb
unverified

Jayant-Kernel committed on

fix: trl 0.12.2 has GRPOTrainer, pin all deps before trl install
430098b
unverified

Jayant-Kernel committed on

fix: try multiple import paths for GRPOConfig
2cdce1f
unverified

Jayant-Kernel committed on

fix: install transformers 4.46.0 BEFORE trl so trl doesn't upgrade it
9264b56
unverified

Jayant-Kernel committed on

fix: bust docker cache force reinstall trl 0.11.4
e9971fb
unverified

Jayant-Kernel committed on

fix: trl 0.11.4 + transformers 4.46.0 + processing_class
e8f541c
unverified

Jayant-Kernel committed on

fix: trl 0.9.4 + transformers 4.41.2 compatible versions
e48f580
unverified

Jayant-Kernel committed on

fix: add torch version check in Dockerfile
391a47a
unverified

Jayant-Kernel committed on

fix: remove tokenizer arg from GRPOTrainer
f3d865a
unverified

Jayant-Kernel committed on

fix: tokenizer not processing_class, torch cu121 for GPU
56567fd
unverified

Jayant-Kernel committed on

fix: correct trl version with GRPOConfig
83f6afa
unverified

Jayant-Kernel committed on

fix: trl 0.13.0, remove verify steps
29f2767
unverified

Jayant-Kernel committed on

fix: cu124 not cu118 for A100 CUDA 12.9 driver
74138e3
unverified

Jayant-Kernel committed on

fix: trl 0.12.2 + torch 2.4.0
bc4c6b4
unverified

Jayant-Kernel committed on

fix: CPU fallback when no GPU detected
4c4c68a
unverified

Jayant-Kernel committed on

fix: trl 0.15.0 definitely has GRPOConfig
1c058a2
unverified

Jayant-Kernel committed on

fix: trl 0.9.6 + bitsandbytes 0.43.1 cu118
787e377
unverified

Jayant-Kernel committed on

fix: trl 0.8.6 has GRPOConfig, compatible with torch 2.1.2
4f33e83

Jayant-Kernel committed on

fix: remove multiline python heredoc from Dockerfile
6452e7e

Jayant-Kernel committed on

fix: find deceit_env package location and copy data correctly
11baf5d

Jayant-Kernel committed on

fix: back to python:3.10-slim for GPU, fix deceit_env path
1058c6b

Jayant-Kernel committed on

fix: deceit_env module path and PYTHONPATH
845f95d

Jayant-Kernel committed on

fix: nvidia cuda base with python3.10 installed
cbaf9f7

Jayant-Kernel committed on

fix: use huggingface transformers-pytorch-gpu base image
73c82af

Jayant-Kernel committed on

fix: revert to torch 2.1.0 cu121 with trl 0.7.4 - versions that worked before
10648d1

Jayant-Kernel committed on

fix: simplify dockerfile no version pinning
bcc84d6

Jayant-Kernel committed on

fix: accelerate 0.34.2 exists, 0.35.0 does not
09ab990

Jayant-Kernel committed on

fix: trl 0.12.0 has GRPOTrainer, compatible with torch 2.4.0
84d05af

Jayant-Kernel committed on

fix: pin trl 0.11.0 compatible with torch 2.4.0
3bced27

Jayant-Kernel committed on

fix: upgrade torch to 2.4.0 with CUDA 12.4 support
0862a5f

Jayant-Kernel committed on

fix: run train.py instead of evaluate.py
32e8cc3

Jayant-Kernel committed on

improve: abstention penalty, better prompt, mixed curriculum, more steps
253d1ff

Jayant-Kernel committed on

evaluate: switch to 0.5B model comparison, 200 episodes
6b64fd2

Jayant-Kernel committed on

fix: set N_EPISODES=200 constant (was still 30)
e662a77

Jayant-Kernel committed on

update: increase evaluation to 200 episodes per model
a5be204

Jayant-Kernel committed on

fix: parse_action confidence bug, numeric answers bug, missing reasoning field bug
66bdd16

Jayant-Kernel committed on

fix: debug model output parsing in evaluation
3d9195a

Jayant-Kernel committed on

add: evaluate 1.5B base vs trained, upload chart to HF Hub
77e0352

Jayant-Kernel committed on

update: 500 steps L1 + 300 steps L2, higher lr for 1.5B
f788873

Jayant-Kernel committed on

fix: add matplotlib, split COPY into separate lines
354d3fd

Jayant-Kernel committed on

fix: copy evaluate.py into Docker image
88fb03e

Jayant-Kernel committed on

add: evaluate 1.5B base vs trained, upload charts
68e5af2

Jayant-Kernel committed on

fix: remove misplaced import inside GRPOConfig args
e4aea5d

Jayant-Kernel committed on

fix: auto-detect bf16 support
d34e286

Jayant-Kernel committed on