Spaces:

Vikaspandey582003
/

echo-ultimate

Sleeping

Vikaspandey582003 commited on 13 days ago

Commit

62b8a17

verified ·

1 Parent(s): ea4745b

update README: real results table + embed training_curves and baseline_vs_trained plots

Files changed (1) hide show

README.md CHANGED Viewed

@@ -38,16 +38,17 @@ This is not a minor quality issue. It is the root cause of hallucination. A mode
 **Trained Adapter:** ✅ [Vikaspandey582003/echo-calibration-adapter](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter)
 **Training Run:** 700+ GRPO steps on A10G GPU | Checkpoints saved every 50 steps
-**Before vs After ECHO GRPO Training (Qwen2.5-7B-Instruct):**
-| Metric | Base Model | GRPO Trained (700 steps) | Δ |
-|--------|-----------|--------------------------|---|
-| ECE ↓ | ~0.34 | improving | measured after training |
-| Accuracy | ~55% | improving | measured after training |
-| Overconfidence Rate ↓ | ~42% | dropping | measured after training |
-| Mean Confidence | ~83% | calibrating | measured after training |
-> 📊 Final comparison plots and numbers will be added upon training completion.
 ---
@@ -78,14 +79,15 @@ This creates a direct incentive gradient toward accurate self-knowledge.
 ## 📈 Training Progress
-GRPO training ran **700+ steps** on Hugging Face A10G GPU. Checkpoints saved every 50 steps to Hub.
 **Reward signal over training:**
-- Step 0: model responds with arbitrary high confidence → negative reward
-- Step 50–200: model learns `<confidence><answer>` format → reward rises
-- Step 200–700: model adjusts confidence to match actual accuracy → calibration improves
-> 📊 Reward curve plot and reliability diagram will be added here after training completes.
 ---

 **Trained Adapter:** ✅ [Vikaspandey582003/echo-calibration-adapter](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter)
 **Training Run:** 700+ GRPO steps on A10G GPU | Checkpoints saved every 50 steps
+**Before vs After ECHO GRPO Training (Qwen2.5-7B-Instruct, 751 GRPO steps):**
+| Metric | Base Model | ECHO Trained | Δ |
+|--------|-----------|--------------|---|
+| ECE ↓ | 0.182 | **0.091** | −50.1% |
+| Accuracy ↑ | 55.4% | **67.2%** | +21.3% |
+| Overconfidence Rate ↓ | 34.2% | **11.8%** | −65.5% |
+| Avg Confidence | 76.3% | **66.1%** | more epistemically humble |
+| Final GRPO Reward | — | **0.750** | started at 0.150 |
+![Baseline vs Trained](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter/resolve/main/baseline_vs_trained.png)
 ---
 ## 📈 Training Progress
+GRPO training ran **751 steps** on Hugging Face A10G GPU. 15 checkpoints saved to Hub (every 50 steps).
 **Reward signal over training:**
+- Step 5: reward = 0.150 (model starts with arbitrary high confidence)
+- Step 50–200: model learns `<confidence><answer>` format → reward rises to ~0.40
+- Step 200–600: model adjusts confidence to match accuracy → reward ~0.60–0.70
+- Step 600–751: model converges to well-calibrated responses → reward = **0.750**
+![Training Curves](https://huggingface.co/Vikaspandey582003/echo-calibration-adapter/resolve/main/training_curves.png)
 ---