update README with live Space URL, adapter link, real training status

README.md CHANGED
@@ -34,21 +34,20 @@ This is not a minor quality issue. It is the root cause of hallucination. A mode

## Results

-(removed results table; all rows except the last were lost in extraction)
-| Mean Confidence | 83% | **62%** | Calibrated |

---
@@ -77,35 +76,16 @@ This creates a direct incentive gradient toward accurate self-knowledge.

---

-## Reliability Diagram
-
-(reliability-diagram image; markdown lost in extraction)
-
-*Before training (red): systematically overconfident – flat line far above the diagonal, ECE=0.34.*
-*After ECHO (green): near-perfect calibration – follows the diagonal closely, ECE=0.08.*
-
-The reliability diagram is the definitive visualization of calibration. A perfectly calibrated model's line lies exactly on the diagonal: when it says 70%, it's right 70% of the time. ECHO achieves this.
-
----
-
-## 🧬 Epistemic Fingerprint
-
-(fingerprint radar-chart image; markdown lost in extraction)
-
-*Larger green area = better calibration. ECHO improves across all 7 domains simultaneously.*
-
-The Epistemic Fingerprint is ECHO's signature visualization. Each axis represents one domain. The red shape shows the untrained model: small and uneven. The green shape shows ECHO trained: large and balanced. A model that knows its own knowledge is a model you can trust.
-
----
-
-(training-curriculum intro lost in extraction)
-- **Phase 1 (steps 0–800):** Easy tasks. ECE drops rapidly as the model learns the format.
-- **Phase 2 (steps 800–2300):** Easy + Medium. Generalization across domains.
-- **Phase 3 (steps 2300–5800):** All difficulties. Adversarial hardening. Overconfidence collapses.

---
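The removed section above explains what a reliability diagram shows: for each confidence bin, empirical accuracy plotted against mean stated confidence, with a calibrated model sitting on the diagonal. A minimal sketch of that computation (the function name and the 10-bin choice are my assumptions, not from the repo):

```python
import numpy as np

def reliability_curve(confidences, correct, n_bins=10):
    """Per-bin (mean confidence, empirical accuracy) points for a reliability diagram."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # a calibrated model keeps these two values equal (the diagonal)
            points.append((conf[mask].mean(), hit[mask].mean()))
    return points
```

Plotting these points against the y = x diagonal reproduces the figure the caption describes: a well-calibrated model that says 70% lands at accuracy 0.7 in the (0.6, 0.7] bin.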

## Results

+**Live Environment:** → [vikaspandey582003-echo-ultimate.hf.space](https://vikaspandey582003-echo-ultimate.hf.space)
+**Trained Adapter:** → [Vikaspandey582003/echo-calibration-adapter](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter)
+**Training Run:** 700+ GRPO steps on A10G GPU | Checkpoints saved every 50 steps
+
+**Before vs After ECHO GRPO Training (Qwen2.5-7B-Instruct):**
+
+| Metric | Base Model | GRPO Trained (700 steps) | Δ |
+|--------|------------|--------------------------|---|
+| ECE ↓ | ~0.34 | improving | measured after training |
+| Accuracy | ~55% | improving | measured after training |
+| Overconfidence Rate ↓ | ~42% | dropping | measured after training |
+| Mean Confidence | ~83% | calibrating | measured after training |
+
+> Final comparison plots and numbers will be added upon training completion.

---
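The four metrics in the table above can all be computed from per-question (stated confidence, correctness) pairs. A sketch of one plausible implementation (the function name, the 10-bin ECE estimator, and the 0.8 "high confidence" threshold for the overconfidence rate are my assumptions, not the repo's definitions):

```python
import numpy as np

def calibration_summary(confidences, correct, n_bins=10, high_conf=0.8):
    """Compute ECE, accuracy, overconfidence rate, and mean confidence.

    Illustrative sketch: the overconfidence rate here is the fraction of all
    answers that were wrong yet stated with confidence >= high_conf.
    """
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    # ECE: bin-weighted mean |bin accuracy - bin confidence|
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            ece += m.mean() * abs(hit[m].mean() - conf[m].mean())
    return {
        "ece": float(ece),
        "accuracy": float(hit.mean()),
        "overconfidence_rate": float(((conf >= high_conf) & (hit == 0)).mean()),
        "mean_confidence": float(conf.mean()),
    }
```

A base model matching the first column would show high mean confidence with much lower accuracy, which is exactly what drives the large ECE.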

---

+## Training Progress
+
+GRPO training ran **700+ steps** on a Hugging Face A10G GPU. Checkpoints were saved to the Hub every 50 steps.
+
+**Reward signal over training:**
+- Step 0: the model responds with arbitrary high confidence → negative reward
+- Steps 50–200: the model learns the `<confidence><answer>` format → reward rises
+- Steps 200–700: the model adjusts confidence to match actual accuracy → calibration improves
+
+> Reward curve plot and reliability diagram will be added here after training completes.

---
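The three-stage reward trajectory described above can be sketched as a shaped reward: a format gate (negative reward when the tags are missing, explaining step 0) plus a Brier-style calibration term (rewarding confidence that matches correctness, explaining steps 200–700). Everything here is a hypothetical illustration; the regex, tag handling, and scoring are my assumptions, not ECHO's actual reward function:

```python
import re

def calibration_reward(response: str, is_correct: bool) -> float:
    """Hypothetical GRPO reward: format gate + Brier-style calibration score."""
    m = re.search(r"<confidence>\s*(\d+(?:\.\d+)?)\s*%?\s*</confidence>", response)
    if m is None or "<answer>" not in response:
        return -1.0  # step-0 behavior: unparseable format -> negative reward
    conf = float(m.group(1))
    conf = conf / 100.0 if conf > 1.0 else conf  # accept "90%" or "0.9"
    # Brier-style term: maximized when stated confidence matches correctness,
    # so over many samples the best policy states its true accuracy
    target = 1.0 if is_correct else 0.0
    return 1.0 - (conf - target) ** 2
```

Under this shaping, a wrong answer stated at 90% scores far worse than the same wrong answer stated at 20%, which is the incentive gradient toward accurate self-knowledge that the README describes.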