nraptisss/tmf921-intent-training

@@ -1,6 +1,6 @@
 # Zero-shot Qwen3-8B vs Fine-tuned Qwen3-8B QLoRA
-Zero-shot baseline was evaluated on 200 examples per split. Fine-tuned results are full split metrics.
 | Split | Zero-shot parse | Fine-tuned parse | Zero-shot norm field F1 | Fine-tuned norm field F1 | Zero-shot norm key F1 | Fine-tuned norm key F1 |
 |---|---:|---:|---:|---:|---:|---:|
@@ -10,4 +10,10 @@ Zero-shot baseline was evaluated on 200 examples per split. Fine-tuned results a
 | Sector OOD | 0.345 | 1.000 | 0.0008 | 0.7697 | 0.0171 | 0.9818 |
 | Adversarial | 0.000 | 1.000 | 0.0000 | 0.9697 | 0.0000 | 1.0000 |
-Conclusion: domain QLoRA fine-tuning is essential for structured telecom intent-to-config generation.

 # Zero-shot Qwen3-8B vs Fine-tuned Qwen3-8B QLoRA
+Zero-shot baseline was evaluated on 200 examples per split. Fine-tuned stage-1 results are full split metrics.
 | Split | Zero-shot parse | Fine-tuned parse | Zero-shot norm field F1 | Fine-tuned norm field F1 | Zero-shot norm key F1 | Fine-tuned norm key F1 |
 |---|---:|---:|---:|---:|---:|---:|
 | Sector OOD | 0.345 | 1.000 | 0.0008 | 0.7697 | 0.0171 | 0.9818 |
 | Adversarial | 0.000 | 1.000 | 0.0000 | 0.9697 | 0.0000 | 1.0000 |
+## Interpretation
+Zero-shot Qwen3-8B mostly fails at structured telecom intent-to-configuration generation. Domain QLoRA fine-tuning is essential: it raises the JSON parse rate from roughly one-third to near 100%, normalized key F1 from about 0.02 to about 0.98, and normalized field F1 from near zero to about 0.77-0.80 across the non-adversarial ID/OOD splits.
+## Caveat
+The zero-shot baseline is sampled at 200 examples per split for compute efficiency. Fine-tuned metrics are reported on the full evaluation splits. If a strict apples-to-apples comparison is required, rerun the fine-tuned model on the same sampled subset.
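
The rerun suggested in the caveat only gives a like-for-like comparison if both models are scored on exactly the same examples. The sketch below shows one way that could be arranged: draw a seeded, order-independent sample of 200 example ids per split, then compute the JSON parse rate for each model on those ids. The names `sample_ids` and `parse_rate`, the seed value, and the id-to-output dictionary layout are all assumptions made for illustration; they are not taken from the repository's evaluation code.

```python
import json
import random


def sample_ids(example_ids, k=200, seed=0):
    """Return a reproducible sample of up to k ids (hypothetical helper).

    Sorting first makes the result independent of the input order, and a
    local Random instance leaves the global random state untouched.
    """
    ids = sorted(example_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(k, len(ids))))


def parse_rate(outputs_by_id, ids):
    """Fraction of the selected raw model outputs that parse as JSON.

    outputs_by_id maps an example id to the model's raw output string
    (an assumed layout). An empty selection yields 0.0.
    """
    if not ids:
        return 0.0
    parsed = 0
    for example_id in ids:
        try:
            json.loads(outputs_by_id[example_id])
            parsed += 1
        except (json.JSONDecodeError, TypeError):
            # Malformed JSON or a non-string output counts as a parse failure.
            pass
    return parsed / len(ids)
```

Calling `sample_ids` once per split and passing the same id list to `parse_rate` for both the zero-shot and the fine-tuned outputs would keep the two columns comparable. The same id list could be reused for the field and key F1 metrics, whose normalization rules are not shown in this diff.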