The project tracked performance gains and losses across multiple iterations:

* **SFT v3 (released)**: **0.671 (+6.8%)** — achieved through precise loss calculation and data cleaning.
* **DPO Merged**: < 0.628 — highlighting the extreme sensitivity of code models to preference data quality.
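
The scores above are Pass@1, presumably computed with the standard unbiased pass@k estimator; a minimal sketch, where the function name and the sample counts in the usage note are illustrative, not taken from this project:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n generations, c of which are
    correct, passes. Reduces to c / n when k == 1."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with a hypothetical 200 generations per task of which 134 pass the tests, `pass_at_k(200, 134, 1)` gives 0.67, the same scale as the scores reported above.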

## ⚠️ Status & Roadmap

This project is actively under development. Currently, the DPO alignment exhibits performance regression (Pass@1 < 0.628) due to preference data sensitivity. We are investigating advanced filtering and reward modeling to resolve this. Optimized weights will be uploaded as soon as the alignment bottleneck is cleared.
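
The DPO stage above presumably uses the standard direct-preference-optimization objective (Rafailov et al., 2023); a minimal per-pair sketch, where the function name and the β value are assumptions:

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss:
    -log sigmoid(beta * ((log pi/ref)_chosen - (log pi/ref)_rejected)).
    Inputs are sequence log-likelihoods under the trained policy and a
    frozen reference model; beta = 0.1 is an assumed default."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably via log1p
    return math.log1p(math.exp(-beta * margin))
```

A mislabeled pair flips the sign of `margin`, so the gradient pushes the policy toward the worse completion, which is consistent with the preference-data sensitivity described above.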