Cetvel benchmark context & inference observations
#2
by O96a - opened
Solid work on the Turkish fine-tune. The 32.54 QA score on Cetvel is interesting; I've been benchmarking smaller models on Arabic dialect tasks and see similar patterns where generative QA struggles with domain-specific phrasing.
Curious: did you observe any degradation on translation tasks after fine-tuning on the QA/summarization mix? We found that multitask fine-tuning on low-resource languages often trades generative fluency for task-specific accuracy.
Also, have you tested inference latency on CPU-only setups? The 2.3B parameter count fits well within edge deployment constraints, and Mistral's KV-cache efficiency makes it practical for real-time applications if quantization holds.
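For reference, here's the kind of harness I've been using for the CPU latency question: a minimal sketch that times repeated generation calls and reports p50/p95 latency. The `measure_latency` helper and `dummy_generate` stand-in are illustrative; in practice you'd pass a closure around a real `generate()` call on the quantized checkpoint.

```python
import statistics
import time

def measure_latency(generate_fn, prompt, n_warmup=2, n_runs=10):
    """Time repeated calls to generate_fn(prompt); return p50/p95 in ms."""
    for _ in range(n_warmup):
        generate_fn(prompt)  # warm up caches / lazy initialization
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn(prompt)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
    }

if __name__ == "__main__":
    # Stand-in for a real model call (e.g. a transformers or llama.cpp
    # generate() on the quantized model) -- hypothetical placeholder.
    def dummy_generate(prompt):
        time.sleep(0.005)
        return prompt[::-1]

    print(measure_latency(dummy_generate, "Merhaba"))
```

Warmup runs matter on CPU since the first call often pays one-time allocation costs that would otherwise skew the tail percentiles.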