Update README.md
README.md
CHANGED
@@ -53,7 +53,7 @@ Interested parties may reach out via the Hugging Face discussion board or review
 
 </details>
 
-## Progress
+## Progress Report for Phase 1
 
 <details>
 <summary><b>Summary:</b> Phase 1 is underway, but achieving a high-fidelity "Teacher" model for Philippine languages using Llama 3.1 and machine-translated Alpaca data is currently bottlenecked. Llama 3.1's inherent English-centric bias combined with syntactically flawed, machine-translated training data creates a compounding error loop. This results in grammatical corruption, dialect mixing, and severe hallucinations rather than true Neural Machine Translation (NMT) parity. There is still a long way to go to build a reliable teacher model; we must pivot away from machine-translated shortcuts and invest in human-curated, native-first datasets before progressing to knowledge distillation.</summary>