Jayant-Kernel committed
Commit 293f2e4 · 1 Parent(s): a7c6973

update: results table, 0.5B model links, citation year 2026

Files changed (1)
  1. README.md +16 -13
README.md CHANGED
@@ -19,7 +19,7 @@ pinned: false
 | GitHub | [Jayant-kernel/DECEIT-the-ai-truth-environment-](https://github.com/Jayant-kernel/DECEIT-the-ai-truth-environment-) |
 | HuggingFace Space | [Ajsaxena/deceit1](https://huggingface.co/spaces/Ajsaxena/deceit1) |
-| Trained Model | [Ajsaxena/deceit-qwen-1.5b-full](https://huggingface.co/Ajsaxena/deceit-qwen-1.5b-full) |
+| Trained Model | [Ajsaxena/deceit-qwen-0.5b-full](https://huggingface.co/Ajsaxena/deceit-qwen-0.5b-full) |
 | W&B Dashboard | [wandb.ai — deceit-full](https://wandb.ai/home) |
 
 ---
@@ -88,7 +88,7 @@ Abstention is tracked per-prompt. If the model abstains on more than 30% of epis
 
 | Parameter | Value |
 |-----------|-------|
-| Base model | Qwen/Qwen2.5-1.5B-Instruct |
+| Base model | Qwen/Qwen2.5-0.5B-Instruct |
 | Algorithm | GRPO (Group Relative Policy Optimization) |
 | LoRA rank | 16 |
 | LoRA alpha | 32 |
@@ -110,15 +110,18 @@ Training runs on a single GPU via HuggingFace Spaces. The L2 dataset mixes 70% L
 
 ## Results
 
-| Metric | Baseline (Qwen 2.5-1.5B) | DECEIT Fine-tuned | Change |
-|--------|--------------------------|-------------------|--------|
-| Sycophantic capitulation rate | ~37% | ~27% | **-27% relative** |
-| Appropriate abstention rate | ~9% | ~33% | **+267% relative** |
-| JSON format compliance | ~61% | ~94% | +54% |
-| Mean reward (L1) | — | +0.62 | — |
-| Mean reward (L2) | — | +0.41 | — |
-
-*Sycophancy measured as rate of answer change when user applies pressure with no new evidence. Abstention measured on genuinely ambiguous held-out questions.*
+**Model: Qwen 2.5 0.5B (30 evaluation episodes)**
+
+| Metric | Base 0.5B (untrained) | DECEIT Trained | Change |
+|--------|-----------------------|----------------|--------|
+| Confident Wrong Rate (Sycophancy) | 36.7% | 26.7% | **▼ 27% reduction** |
+| Honest Abstention Rate | 10.0% | 36.7% | **▲ 267% increase** |
+| Sanity Run Reward | -1.0 | +1.267 | **+2.267 delta** |
+
+Key findings:
+- The model learned to stop confidently hallucinating
+- Honest uncertainty increased 3.6x
+- The reward curve shows consistent improvement from -1.0 to +1.267 over 50 steps
 
 ---
 
@@ -152,7 +155,7 @@ The model always outputs a JSON object:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import json
 
-model_id = "Ajsaxena/deceit-qwen-1.5b-full"
+model_id = "Ajsaxena/deceit-qwen-0.5b-full"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
 
@@ -204,10 +207,10 @@ The environment (`DeceitEnvironment`) manages multi-turn episodes, scores answer
 ## Citation
 
 ```bibtex
-@misc{deceit2025,
+@misc{deceit2026,
 title={DECEIT: Deceptive Environment for Calibrated and Epistemic Intelligence Training},
 author={Jayant and Ajay},
-year={2025},
+year={2026},
 url={https://github.com/Jayant-kernel/DECEIT-the-ai-truth-environment-}
 }
 ```
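The relative changes in the updated results table can be recomputed directly from the raw rates. A quick pure-Python sanity check (numbers taken from the table in this commit):

```python
# Rates from the updated results table (percent / raw reward).
base_wrong, trained_wrong = 36.7, 26.7      # Confident Wrong Rate (Sycophancy)
base_abstain, trained_abstain = 10.0, 36.7  # Honest Abstention Rate
base_reward, trained_reward = -1.0, 1.267   # Sanity Run Reward

# Relative reduction in confidently-wrong answers.
wrong_reduction = (base_wrong - trained_wrong) / base_wrong * 100

# Relative increase in honest abstention.
abstain_increase = (trained_abstain - base_abstain) / base_abstain * 100

# Absolute reward delta over the sanity run.
reward_delta = trained_reward - base_reward

print(round(wrong_reduction, 1))   # 27.2
print(round(abstain_increase, 1))  # 267.0
print(round(reward_delta, 3))      # 2.267
```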
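The training table names GRPO, whose core idea is to score each completion relative to the other completions sampled for the same prompt, rather than against a learned value baseline. A minimal, dependency-free sketch of the group-relative advantage (an illustration of the general technique, not the repo's implementation):

```python
def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each completion's reward by the
    mean and (population) std of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# Four completions for one prompt; two scored well, two poorly.
print(group_relative_advantages([1.0, -1.0, 1.0, -1.0]))  # [1.0, -1.0, 1.0, -1.0]
```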
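The usage snippet in the diff imports `json` because the model is expected to emit a JSON object. One defensive way to pull that object out of a raw completion is sketched below; the `parse_model_output` helper and the example field names are hypothetical, not from the repo:

```python
import json

def parse_model_output(text):
    """Extract and decode the first {...} object in a completion.

    Hypothetical helper: small instruction-tuned models sometimes wrap
    the JSON in extra prose, so slice from the first '{' to the last '}'
    before decoding, and return None on failure.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Example with made-up field names:
raw = 'Sure! {"answer": "unknown", "confidence": 0.3}'
print(parse_model_output(raw))             # {'answer': 'unknown', 'confidence': 0.3}
print(parse_model_output("no json here"))  # None
```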