Spaces: Ajsaxena/deceit1

Commit History

update: results table, 0.5B model links, citation year 2026

293f2e4

Jayant-Kernel committed 12 days ago

docs: detailed README with curriculum, reward table, results, usage

a7c6973

Jayant-Kernel committed 12 days ago

rollback: revert to last working Dockerfile and train.py

e30d685
unverified

Jayant-Kernel committed 12 days ago

fix: proper GRPO with trl 0.12.2 no-deps + force hub downgrade

0efac4a
unverified

Jayant-Kernel committed 12 days ago

fix: custom training loop without TRL dependency

5232a98
unverified

Jayant-Kernel committed 12 days ago

fix: force reinstall huggingface_hub 0.24.7 after deceit_env

54fc539
unverified

Jayant-Kernel committed 12 days ago

fix: pin huggingface_hub 0.24.7, install trl with --no-deps

a0058bb
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.12.2 has GRPOTrainer, pin all deps before trl install

430098b
unverified

Jayant-Kernel committed 12 days ago

fix: try multiple import paths for GRPOConfig

2cdce1f
unverified

Jayant-Kernel committed 12 days ago

fix: install transformers 4.46.0 BEFORE trl so trl doesnt upgrade it

9264b56
unverified

Jayant-Kernel committed 12 days ago

fix: bust docker cache force reinstall trl 0.11.4

e9971fb
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.11.4 + transformers 4.46.0 + processing_class

e8f541c
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.9.4 + transformers 4.41.2 compatible versions

e48f580
unverified

Jayant-Kernel committed 12 days ago

fix: add torch version check in Dockerfile

391a47a
unverified

Jayant-Kernel committed 12 days ago

fix: remove tokenizer arg from GRPOTrainer

f3d865a
unverified

Jayant-Kernel committed 12 days ago

fix: tokenizer not processing_class, torch cu121 for GPU

56567fd
unverified

Jayant-Kernel committed 12 days ago

fix: correct trl version with GRPOConfig

83f6afa
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.13.0, remove verify steps

29f2767
unverified

Jayant-Kernel committed 12 days ago

fix: cu124 not cu118 for A100 CUDA 12.9 driver

74138e3
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.12.2 + torch 2.4.0

bc4c6b4
unverified

Jayant-Kernel committed 12 days ago

fix: CPU fallback when no GPU detected

4c4c68a
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.15.0 definitely has GRPOConfig

1c058a2
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.9.6 + bitsandbytes 0.43.1 cu118

787e377
unverified

Jayant-Kernel committed 12 days ago

fix: trl 0.8.6 has GRPOConfig, compatible with torch 2.1.2

4f33e83

Jayant-Kernel committed 12 days ago

fix: remove multiline python heredoc from Dockerfile

6452e7e

Jayant-Kernel committed 12 days ago

fix: find deceit_env package location and copy data correctly

11baf5d

Jayant-Kernel committed 12 days ago

fix: back to python:3.10-slim for GPU, fix deceit_env path

1058c6b

Jayant-Kernel committed 12 days ago

fix: deceit_env module path and PYTHONPATH

845f95d

Jayant-Kernel committed 12 days ago

fix: nvidia cuda base with python3.10 installed

cbaf9f7

Jayant-Kernel committed 12 days ago

fix: use huggingface transformers-pytorch-gpu base image

73c82af

Jayant-Kernel committed 12 days ago

fix: revert to torch 2.1.0 cu121 with trl 0.7.4 - versions that worked before

10648d1

Jayant-Kernel committed 12 days ago

fix: simplify dockerfile no version pinning

bcc84d6

Jayant-Kernel committed 12 days ago

fix: accelerate 0.34.2 exists, 0.35.0 does not

09ab990

Jayant-Kernel committed 12 days ago

fix: trl 0.12.0 has GRPOTrainer, compatible with torch 2.4.0

84d05af

Jayant-Kernel committed 12 days ago

fix: pin trl 0.11.0 compatible with torch 2.4.0

3bced27

Jayant-Kernel committed 12 days ago

fix: upgrade torch to 2.4.0 with CUDA 12.4 support

0862a5f

Jayant-Kernel committed 12 days ago

fix: run train.py instead of evaluate.py

32e8cc3

Jayant-Kernel committed 12 days ago

improve: abstention penalty, better prompt, mixed curriculum, more steps

253d1ff

Jayant-Kernel committed 12 days ago

evaluate: switch to 0.5B model comparison, 200 episodes

6b64fd2

Jayant-Kernel committed 12 days ago

fix: set N_EPISODES=200 constant (was still 30)

e662a77

Jayant-Kernel committed 13 days ago

update: increase evaluation to 200 episodes per model

a5be204

Jayant-Kernel committed 13 days ago

fix: parse_action confidence bug, numeric answers bug, missing reasoning field bug

66bdd16

Jayant-Kernel committed 13 days ago

fix: debug model output parsing in evaluation

3d9195a

Jayant-Kernel committed 13 days ago

add: evaluate 1.5B base vs trained, upload chart to HF Hub

77e0352

Jayant-Kernel committed 13 days ago

update: 500 steps L1 + 300 steps L2, higher lr for 1.5B

f788873

Jayant-Kernel committed 13 days ago

fix: add matplotlib, split COPY into separate lines

354d3fd

Jayant-Kernel committed 13 days ago

fix: copy evaluate.py into Docker image

88fb03e

Jayant-Kernel committed 13 days ago

add: evaluate 1.5B base vs trained, upload charts

68e5af2

Jayant-Kernel committed 13 days ago

fix: remove misplaced import inside GRPOConfig args

e4aea5d

Jayant-Kernel committed 13 days ago

fix: auto-detect bf16 support

d34e286

Jayant-Kernel committed 13 days ago
