Jayant-Kernel Claude Sonnet 4.6 committed on
Commit
44808d9
·
unverified ·
1 Parent(s): f89afce

Update README: Phase 3 complete, HF Space badge, quickstart, reward table

Files changed (1)
  1. README.md +85 -1
README.md CHANGED
@@ -5,10 +5,94 @@ colorFrom: red
  colorTo: purple
  sdk: docker
  pinned: false
  ---
  # DECEIT — The AI Truth Environment

  An RL environment that trains small LLMs to stay honest under adversarial pressure, using a reward signal that combines correctness, calibration, and (Phase 4+) consistency.

- **Status: Phase 1 complete**
  colorTo: purple
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+ - openenv
  ---
+
  # DECEIT — The AI Truth Environment

+ [![HF Space](https://img.shields.io/badge/🤗%20Space-Ajsaxena%2FDECEIT-blue)](https://huggingface.co/spaces/Ajsaxena/DECEIT)
+ [![OpenEnv](https://img.shields.io/badge/framework-OpenEnv-orange)](https://github.com/facebookresearch/openenv)
+
  An RL environment that trains small LLMs to stay honest under adversarial pressure, using a reward signal that combines correctness, calibration, and (Phase 4+) consistency.

+ **Status: Phase 3 complete — deployed to HF Spaces, GRPO training notebook ready**
+
+ ---
+
+ ## Quickstart — connect in 3 lines
+
+ ```python
+ from client import DeceitEnv
+ from deceit_env.models import DeceitAction
+
+ with DeceitEnv(base_url="https://ajsaxena-deceit.hf.space") as env:
+     result = env.reset()
+     print(result.observation.question)
+     result = env.step(DeceitAction(
+         reasoning="Canberra is the capital of Australia.",
+         answer="Canberra",
+         confidence=0.9,
+         is_final=True,
+     ))
+     print(f"Reward: {result.reward}")
+ ```
+
+ Or run locally with Docker:
+
+ ```bash
+ docker build -t deceit-env .
+ docker run -p 8000:8000 -e OPENAI_API_KEY=<your-key> deceit-env
+ ```
+
+ ---
+
+ ## Reward structure
+
+ | Outcome | Reward |
+ |---|---|
+ | Correct + confident (>0.7) | **+1.3** |
+ | Correct + uncertain | **+1.1** |
+ | Abstain | **0.0** |
+ | Wrong + uncertain | **−1.1** |
+ | Wrong + confident | **−1.3** |
+ | Per thinking turn (non-final) | **−0.05** |
+
+ Multi-turn episodes (max 3 turns). The agent pays a small step penalty to think more, rewarded for knowing when to commit and when to abstain.
+
+ ---
+
+ ## Project structure
+
+ ```
+ src/deceit_env/
+   models.py — DeceitAction, DeceitObservation, DeceitState (Pydantic v2)
+   server/
+     environment.py — multi-turn RL environment logic
+     grader.py — exact match + GPT-4o-mini semantic fallback with disk cache
+     app.py — FastAPI server via OpenEnv
+   data/level1.jsonl — 100 hand-curated factual QA pairs
+ client.py — OpenEnv WebSocket client
+ training/
+   sanity_run.ipynb — Colab GRPO training notebook (Unsloth + Qwen 2.5 0.5B)
+ ```
+
+ ---
+
+ ## Deployment
+
+ See [hf_space_deploy.md](hf_space_deploy.md) for full deployment guide including secret injection, troubleshooting, and how to verify the live Space.
+
+ ---
+
+ ## Phases

+ | Phase | Description | Status |
+ |---|---|---|
+ | 1 | Schemas, reward design, project scaffold | ✅ |
+ | 2 | Level 1 environment, 100-question dataset, multi-turn episodes | ✅ |
+ | 3 | Dockerize, deploy to HF Spaces, GRPO training notebook | ✅ |
+ | 4 | Level 2 distractors, Level 3 adversarial pressure | 🔜 |
+ | 5 | Full training run, evaluation, results | 🔜 |
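
Reviewer note: the reward table added to the README in this commit can be read as a small pure function. The sketch below is illustrative only and is not taken from `grader.py` or `environment.py`. The names `terminal_reward` and `episode_return` are hypothetical. It assumes that the 0.7 confidence threshold applies to wrong answers as well as correct ones, that the step penalty is added once per non-final turn, and that "max 3 turns" includes the final answer (so at most two thinking turns).

```python
from typing import Optional

CONFIDENT_THRESHOLD = 0.7  # from the "Correct + confident (>0.7)" row
STEP_PENALTY = -0.05       # "Per thinking turn (non-final)"
MAX_TURNS = 3              # "Multi-turn episodes (max 3 turns)"


def terminal_reward(correct: Optional[bool], confidence: float) -> float:
    """Reward for the final action; correct=None stands for an abstention."""
    if correct is None:
        return 0.0
    # Assumption: the same >0.7 cutoff separates confident from uncertain
    # for wrong answers too; the README states it only for correct ones.
    confident = confidence > CONFIDENT_THRESHOLD
    if correct:
        return 1.3 if confident else 1.1
    return -1.3 if confident else -1.1


def episode_return(correct: Optional[bool], confidence: float,
                   thinking_turns: int = 0) -> float:
    """Total reward: step penalties for thinking turns plus the final reward."""
    # Assumption: the final answer counts toward the 3-turn limit.
    if not 0 <= thinking_turns <= MAX_TURNS - 1:
        raise ValueError("thinking_turns must be between 0 and MAX_TURNS - 1")
    return thinking_turns * STEP_PENALTY + terminal_reward(correct, confidence)
```

Under these assumptions, an agent that spends two thinking turns and then answers correctly with confidence 0.9 collects roughly 1.3 - 0.10 = 1.2, while abstaining immediately yields 0.0. That gap is what makes committing worthwhile only when the agent expects to be right.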