update README with live Space URL, adapter link, real training status

README.md CHANGED
@@ -34,21 +34,20 @@ This is not a minor quality issue. It is the root cause of hallucination. A mode

## Results

-(removed results table; all rows except the last were lost in extraction)
-| Mean Confidence | 83% | **62%** | Calibrated |

---
@@ -77,35 +76,16 @@ This creates a direct incentive gradient toward accurate self-knowledge.

---

-## Reliability Diagram
-
-(reliability-diagram image; markdown lost in extraction)
-
-*Before training (red): systematically overconfident – flat line far above the diagonal, ECE=0.34.*
-*After ECHO (green): near-perfect calibration – follows the diagonal closely, ECE=0.08.*
-
-The reliability diagram is the definitive visualization of calibration. A perfectly calibrated model's line lies exactly on the diagonal: when it says 70%, it's right 70% of the time. ECHO achieves this.
-
----
-
-## 🧬 Epistemic Fingerprint
-
-(fingerprint radar-chart image; markdown lost in extraction)
-
-*Larger green area = better calibration. ECHO improves across all 7 domains simultaneously.*
-
-The Epistemic Fingerprint is ECHO's signature visualization. Each axis represents one domain. The red shape shows the untrained model: small and uneven. The green shape shows ECHO trained: large and balanced. A model that knows its own knowledge is a model you can trust.
-
----
-
-(training-curriculum intro lost in extraction)
-- **Phase 1 (steps 0–800):** Easy tasks. ECE drops rapidly as the model learns the format.
-- **Phase 2 (steps 800–2300):** Easy + Medium. Generalization across domains.
-- **Phase 3 (steps 2300–5800):** All difficulties. Adversarial hardening. Overconfidence collapses.

---
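The removed section above explains what a reliability diagram shows: for each confidence bin, empirical accuracy plotted against mean stated confidence, with a calibrated model sitting on the diagonal. A minimal sketch of that computation (the function name and the 10-bin choice are my assumptions, not from the repo):

```python
import numpy as np

def reliability_curve(confidences, correct, n_bins=10):
    """Per-bin (mean confidence, empirical accuracy) points for a reliability diagram."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # a calibrated model keeps these two values equal (the diagonal)
            points.append((conf[mask].mean(), hit[mask].mean()))
    return points
```

Plotting these points against the y = x diagonal reproduces the figure the caption describes: a well-calibrated model that says 70% lands at accuracy 0.7 in the (0.6, 0.7] bin.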

## Results

+**Live Environment:** → [vikaspandey582003-echo-ultimate.hf.space](https://vikaspandey582003-echo-ultimate.hf.space)
+**Trained Adapter:** → [Vikaspandey582003/echo-calibration-adapter](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter)
+**Training Run:** 700+ GRPO steps on A10G GPU | Checkpoints saved every 50 steps
+
+**Before vs After ECHO GRPO Training (Qwen2.5-7B-Instruct):**
+
+| Metric | Base Model | GRPO Trained (700 steps) | Δ |
+|--------|------------|--------------------------|---|
+| ECE ↓ | ~0.34 | improving | measured after training |
+| Accuracy | ~55% | improving | measured after training |
+| Overconfidence Rate ↓ | ~42% | dropping | measured after training |
+| Mean Confidence | ~83% | calibrating | measured after training |
+
+> Final comparison plots and numbers will be added upon training completion.

---
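The four metrics in the table above can all be computed from per-question (stated confidence, correctness) pairs. A sketch of one plausible implementation (the function name, the 10-bin ECE estimator, and the 0.8 "high confidence" threshold for the overconfidence rate are my assumptions, not the repo's definitions):

```python
import numpy as np

def calibration_summary(confidences, correct, n_bins=10, high_conf=0.8):
    """Compute ECE, accuracy, overconfidence rate, and mean confidence.

    Illustrative sketch: the overconfidence rate here is the fraction of all
    answers that were wrong yet stated with confidence >= high_conf.
    """
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    # ECE: bin-weighted mean |bin accuracy - bin confidence|
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            ece += m.mean() * abs(hit[m].mean() - conf[m].mean())
    return {
        "ece": float(ece),
        "accuracy": float(hit.mean()),
        "overconfidence_rate": float(((conf >= high_conf) & (hit == 0)).mean()),
        "mean_confidence": float(conf.mean()),
    }
```

A base model matching the first column would show high mean confidence with much lower accuracy, which is exactly what drives the large ECE.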

---

+## Training Progress
+
+GRPO training ran **700+ steps** on a Hugging Face A10G GPU. Checkpoints were saved to the Hub every 50 steps.
+
+**Reward signal over training:**
+- Step 0: the model responds with arbitrary high confidence → negative reward
+- Steps 50–200: the model learns the `<confidence><answer>` format → reward rises
+- Steps 200–700: the model adjusts confidence to match actual accuracy → calibration improves
+
+> Reward curve plot and reliability diagram will be added here after training completes.

---
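The three-stage reward trajectory described above can be sketched as a shaped reward: a format gate (negative reward when the tags are missing, explaining step 0) plus a Brier-style calibration term (rewarding confidence that matches correctness, explaining steps 200–700). Everything here is a hypothetical illustration; the regex, tag handling, and scoring are my assumptions, not ECHO's actual reward function:

```python
import re

def calibration_reward(response: str, is_correct: bool) -> float:
    """Hypothetical GRPO reward: format gate + Brier-style calibration score."""
    m = re.search(r"<confidence>\s*(\d+(?:\.\d+)?)\s*%?\s*</confidence>", response)
    if m is None or "<answer>" not in response:
        return -1.0  # step-0 behavior: unparseable format -> negative reward
    conf = float(m.group(1))
    conf = conf / 100.0 if conf > 1.0 else conf  # accept "90%" or "0.9"
    # Brier-style term: maximized when stated confidence matches correctness,
    # so over many samples the best policy states its true accuracy
    target = 1.0 if is_correct else 0.0
    return 1.0 - (conf - target) ** 2
```

Under this shaping, a wrong answer stated at 90% scores far worse than the same wrong answer stated at 20%, which is the incentive gradient toward accurate self-knowledge that the README describes.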