Jayant-Kernel committed
Commit · 293f2e4 · Parent(s): a7c6973
update: results table, 0.5B model links, citation year 2026

README.md CHANGED
```diff
@@ -19,7 +19,7 @@ pinned: false
 |----------|-----|
 | GitHub | [Jayant-kernel/DECEIT-the-ai-truth-environment-](https://github.com/Jayant-kernel/DECEIT-the-ai-truth-environment-) |
 | HuggingFace Space | [Ajsaxena/deceit1](https://huggingface.co/spaces/Ajsaxena/deceit1) |
-| Trained Model | [Ajsaxena/deceit-qwen-
+| Trained Model | [Ajsaxena/deceit-qwen-0.5b-full](https://huggingface.co/Ajsaxena/deceit-qwen-0.5b-full) |
 | W&B Dashboard | [wandb.ai — deceit-full](https://wandb.ai/home) |
 
 ---
```
```diff
@@ -88,7 +88,7 @@ Abstention is tracked per-prompt. If the model abstains on more than 30% of epis
 
 | Parameter | Value |
 |-----------|-------|
-| Base model | Qwen/Qwen2.5-
+| Base model | Qwen/Qwen2.5-0.5B-Instruct |
 | Algorithm | GRPO (Group Relative Policy Optimization) |
 | LoRA rank | 16 |
 | LoRA alpha | 32 |
```
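The parameter table names GRPO as the training algorithm. As a rough illustration only (not the repo's actual implementation), the core of GRPO is scoring each sampled completion relative to the other completions in its own group; a minimal sketch of that normalization:

```python
def group_relative_advantages(rewards):
    """Group-relative advantages, the idea behind GRPO: each completion's
    reward is normalized against the mean and std of its sampled group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # a constant-reward group carries no learning signal
    return [(r - mean) / std for r in rewards]


# four sampled completions for one prompt, scored by the reward function
print(group_relative_advantages([1.0, -1.0, 1.0, -1.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Advantages always average to zero within a group, so the policy update pushes toward above-group-average completions rather than toward any absolute reward scale.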
```diff
@@ -110,15 +110,18 @@ Training runs on a single GPU via HuggingFace Spaces. The L2 dataset mixes 70% L
 
 ## Results
 
-
-|--------|--------------------------|-------------------|--------|
-| Sycophantic capitulation rate | ~37% | ~27% | **-27% relative** |
-| Appropriate abstention rate | ~9% | ~33% | **+267% relative** |
-| JSON format compliance | ~61% | ~94% | +54% |
-| Mean reward (L1) | — | +0.62 | — |
-| Mean reward (L2) | — | +0.41 | — |
+**Model: Qwen 2.5 0.5B — 30 evaluation episodes**
+
+| Metric | Base 0.5B (untrained) | DECEIT Trained | Change |
+|--------|----------------------|----------------|--------|
+| Confident Wrong Rate (Sycophancy) | 36.7% | 26.7% | **▼ 27% reduction** |
+| Honest Abstention Rate | 10.0% | 36.7% | **▲ 267% increase** |
+| Sanity Run Reward | -1.0 | +1.267 | **+2.567 delta** |
+
+Key findings:
+- The model learned to stop confidently hallucinating
+- Honest uncertainty increased 3.6x
+- Reward curve shows consistent improvement from -1.0 to +1.267 over 50 steps
 
 ---
```
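The "Change" column in the new results table reports relative differences against the untrained baseline, not absolute percentage-point drops. A quick sanity check of that arithmetic (function name is illustrative):

```python
def relative_change_pct(base, trained):
    # percent change relative to the untrained baseline
    return (trained - base) / base * 100

# Sycophancy: 36.7% -> 26.7% is about a 27% relative reduction
print(round(relative_change_pct(36.7, 26.7)))  # → -27
# Abstention: 10.0% -> 36.7% is about a 267% relative increase
print(round(relative_change_pct(10.0, 36.7)))  # → 267
```

This matches the table's "▼ 27% reduction" and "▲ 267% increase" entries, and the 36.7/10.0 ratio is the "3.6x" figure in the key findings.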
```diff
@@ -152,7 +155,7 @@ The model always outputs a JSON object:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import json
 
-model_id = "Ajsaxena/deceit-qwen-
+model_id = "Ajsaxena/deceit-qwen-0.5b-full"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
```
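The hunk's context line states that the model always outputs a JSON object, and the usage snippet imports `json`. In practice small models sometimes wrap the object in extra chatter, so a defensive parse can help; a sketch (the helper name and the example keys are illustrative, not from the repo):

```python
import json

def extract_json_object(text):
    """Find and parse the first balanced {...} span in generated text.
    Returns the parsed dict, or None if no valid JSON object is found."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None

# hypothetical generation with chatter around the JSON payload
print(extract_json_object('Sure! {"answer": "I do not know", "confidence": 0.2}'))
```

Scanning for a balanced brace span keeps the parse robust to leading or trailing text without assuming the whole generation is valid JSON.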
````diff
@@ -204,10 +207,10 @@ The environment (`DeceitEnvironment`) manages multi-turn episodes, scores answer
 ## Citation
 
 ```bibtex
-@misc{
+@misc{deceit2026,
 title={DECEIT: Deceptive Environment for Calibrated and Epistemic Intelligence Training},
 author={Jayant and Ajay},
-year={
+year={2026},
 url={https://github.com/Jayant-kernel/DECEIT-the-ai-truth-environment-}
 }
 ```
````