Vikaspandey582003 committed on
Commit ea4745b · verified · 1 Parent(s): ce66956

update README with live Space URL, adapter link, real training status

Files changed (1):
  1. README.md +21 -41

README.md CHANGED
@@ -34,21 +34,20 @@ This is not a minor quality issue. It is the root cause of hallucination. A mode

 ## 🏆 Results

- | Task | Name | Score | Threshold | Status |
- |------|------|-------|-----------|--------|
- | task_easy | Calibration Fundamentals | 0.91 | 0.70 | ✅ PASS |
- | task_medium | Domain-Aware Calibration | 0.79 | 0.60 | ✅ PASS |
- | task_hard | Anti-Hallucination Robustness | 0.87 | 0.50 | ✅ PASS |
-
- **Before vs After ECHO training:**
-
- | Metric | Untrained | ECHO Trained | Δ |
- |--------|-----------|--------------|---|
- | ECE (↓) | 0.34 | **0.08** | −76% |
- | Accuracy | 55% | **74%** | +34% |
- | Overconfidence Rate (↓) | 42% | **5%** | −88% |
- | Hallucination Rate (↓) | 28% | **2%** | −93% |
- | Mean Confidence | 83% | **62%** | Calibrated |

 ---

@@ -77,35 +76,16 @@ This creates a direct incentive gradient toward accurate self-knowledge.

 ---

- ## 📊 Reliability Diagram
-
- ![Reliability Diagram](results/plots/reliability_diagram.png)
-
- *Before training (red): systematically overconfident, a flat line far above the diagonal, ECE=0.34.*
- *After ECHO (green): near-perfect calibration, following the diagonal closely, ECE=0.08.*
-
- The reliability diagram is the definitive visualization of calibration. A perfectly calibrated model's line lies exactly on the diagonal: when it says 70%, it's right 70% of the time. ECHO achieves this.
-
- ---
-
- ## 🧬 Epistemic Fingerprint
-
- ![Epistemic Fingerprint](results/plots/epistemic_fingerprint.png)
-
- *Larger green area = better calibration. ECHO improves across all 7 domains simultaneously.*
-
- The Epistemic Fingerprint is ECHO's signature visualization. Each axis represents one domain. The red shape shows the untrained model: small and uneven. The green shape shows the ECHO-trained model: large and balanced. A model that knows its own knowledge is a model you can trust.
-
- ---
-
- ## 📈 Training Curves
-
- ![Training Curves](results/plots/training_curves.png)
-
- Three curriculum phases are visible:
- - **Phase 1 (steps 0–800):** Easy tasks. ECE drops rapidly as the model learns the format.
- - **Phase 2 (steps 800–2300):** Easy + Medium. Generalization across domains.
- - **Phase 3 (steps 2300–5800):** All difficulties. Adversarial hardening. Overconfidence collapses.

 ---

 

 ## 🏆 Results

+ **Live Environment:** ✅ [vikaspandey582003-echo-ultimate.hf.space](https://vikaspandey582003-echo-ultimate.hf.space)
+ **Trained Adapter:** ✅ [Vikaspandey582003/echo-calibration-adapter](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter)
+ **Training Run:** 700+ GRPO steps on A10G GPU | Checkpoints saved every 50 steps
+
+ **Before vs After ECHO GRPO Training (Qwen2.5-7B-Instruct):**
+
+ | Metric | Base Model | GRPO Trained (700 steps) | Δ |
+ |--------|-----------|--------------------------|---|
+ | ECE ↓ | ~0.34 | improving | measured after training |
+ | Accuracy | ~55% | improving | measured after training |
+ | Overconfidence Rate ↓ | ~42% | dropping | measured after training |
+ | Mean Confidence | ~83% | calibrating | measured after training |
+
+ > 📊 Final comparison plots and numbers will be added upon training completion.
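ECE in the table above measures the gap between stated confidence and realized accuracy, averaged over confidence bins. A minimal sketch of the standard computation (10 equal-width bins; the function name and binning scheme are illustrative, not the repo's actual evaluation code):

```python
def expected_calibration_error(confidences, corrects, n_bins=10):
    """ECE: accuracy-vs-confidence gap per bin, weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # equal-width bins [lo, hi); the last bin also includes 1.0
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(corrects[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece
```

For example, a model that answers with 90% confidence but is right only half the time lands in the top bin with a 0.4 gap, so its ECE is 0.4.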
 

 ---


 ---

+ ## 📈 Training Progress
+
+ GRPO training ran **700+ steps** on a Hugging Face A10G GPU; checkpoints were saved to the Hub every 50 steps.
+
+ **Reward signal over training:**
+ - Step 0: the model responds with arbitrary high confidence → negative reward
+ - Steps 50–200: the model learns the `<confidence><answer>` format → reward rises
+ - Steps 200–700: the model adjusts confidence to match actual accuracy → calibration improves
+
+ > 📊 Reward curve plot and reliability diagram will be added here after training completes.
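The trajectory above suggests a reward with a format term plus a calibration term. A minimal sketch, assuming the `<confidence>`/`<answer>` tag format and a Brier-style penalty; the function names and exact weights are illustrative, not the repo's actual implementation:

```python
import re

def parse_response(text):
    """Extract stated confidence (0-100) and answer from the tagged format."""
    conf = re.search(r"<confidence>(\d+)</confidence>", text)
    ans = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if conf is None or ans is None:
        return None  # malformed output
    return int(conf.group(1)) / 100.0, ans.group(1).strip()

def calibration_reward(response, gold):
    """Small format bonus minus Brier loss: being confidently wrong
    is penalized far more than being cautiously wrong."""
    parsed = parse_response(response)
    if parsed is None:
        return -1.0  # step-0 behavior: no tags, negative reward
    confidence, answer = parsed
    correct = 1.0 if answer.lower() == gold.lower() else 0.0
    return 0.2 - (confidence - correct) ** 2
```

Under this shaping, a correct answer at 90% confidence scores 0.2 − 0.01 = 0.19, while the same confidence on a wrong answer scores 0.2 − 0.81 = −0.61, which is the gradient toward calibration that GRPO exploits.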
 
 
 

 ---