Turkish News Classification

A BERT-based model that automatically classifies Turkish news text into 8 categories (environment, education, economy, culture and arts, politics, health, sports, technology).

Model Summary

This model is fine-tuned from dbmdz/bert-base-turkish-cased on the interpress_news_category_tr_lite dataset. It can be used for automatic tagging, content routing, and topic analysis on news sites.

Categories

cevre, egitim, ekonomi, kultur-sanat, politika, saglik, spor, teknoloji

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="tugrulkaya/turkish-news-classification",
)

text = "Galatasaray bugün önemli bir galibiyet aldı"
print(classifier(text))
# [{'label': 'spor', 'score': 0.95...}]
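
For automatic tagging it may be safer to act only on confident predictions. The sketch below assumes the pipeline output shape shown above (a list of dicts with `label` and `score`). The `route` helper, the `manual-review` fallback, and the 0.80 threshold are hypothetical and would need tuning on your own data.

```python
# Hypothetical helper: not part of the model or of transformers.
def route(prediction, threshold=0.80):
    """Return the predicted label, or 'manual-review' when confidence is low."""
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "manual-review"

# Hand-written dicts in the pipeline's output shape, used only for illustration.
sample = [
    {"label": "spor", "score": 0.95},
    {"label": "ekonomi", "score": 0.55},
]
print([route(p) for p in sample])
```

With a real classifier, the same helper could be applied as `[route(p) for p in classifier(texts)]`.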

Multiple Texts

texts = [
    "Dolar kuru bugün 28 liraya yükseldi",
    "Yeni akıllı telefon modeli tanıtıldı",
    "Okullarda eğitim öğretim yılı başladı",
]
for t, r in zip(texts, classifier(texts)):
    print(f"{t} → {r['label']} ({r['score']:.2f})")

Manual Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("tugrulkaya/turkish-news-classification")
mdl = AutoModelForSequenceClassification.from_pretrained("tugrulkaya/turkish-news-classification")

def predict(text):
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = mdl(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    idx = probs.argmax().item()
    # id2label keys are integers once the config is loaded, so index with int, not str.
    return {
        "category": mdl.config.id2label[idx],
        "confidence": probs[idx].item(),
        "all_scores": {mdl.config.id2label[i]: probs[i].item() for i in range(len(probs))},
    }

print(predict("Ekonomide yeni gelişmeler yaşanıyor"))

Training Details

Parameter	Value
Base model	`dbmdz/bert-base-turkish-cased`
Dataset	`interpress_news_category_tr_lite`
Task	Multi-class text classification (8 classes)
Epoch	3
Batch size	16
Learning rate	2e-5
Max length	256
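
The hyperparameters in the table could be expressed as a training configuration along the following lines. This is only a sketch, not the author's training script: it assumes the Hugging Face `Trainer` API was used and that "Batch size" means the per-device batch size, neither of which the table confirms. The `build_training_args` helper and the `out` directory are hypothetical.

```python
# Values copied from the table above; the key names follow transformers.TrainingArguments.
MAX_LENGTH = 256  # tokenizer truncation length
HYPERPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,  # assumption: "Batch size" is per device
    "learning_rate": 2e-5,
}

def build_training_args(output_dir="out"):
    """Build TrainingArguments lazily so the config can be inspected without transformers."""
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)

print(HYPERPARAMS, MAX_LENGTH)
```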

Limitations

The model was trained only on Turkish news text; it is not expected to work well in other domains or languages.
Inputs longer than 256 tokens are truncated, which may reduce performance.
The category distribution is specific to the training set; coverage of niche categories or subcategories may be limited.
Test scores close to 100% suggest that the training data is homogeneous; real-world performance may be lower.
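
One possible workaround for the 256-token limit is to split a long article into overlapping chunks, classify each chunk, and average the per-category scores. The sketch below makes several assumptions: word windows are used as a rough proxy for token windows (BERT subword tokens usually outnumber words, so the window is kept well below 256), and `score_fn` is any callable returning a dict of category to probability. The names `chunk_words` and `classify_long` are hypothetical.

```python
def chunk_words(text, size=150, overlap=30):
    """Split text into overlapping word windows (a rough proxy for token windows)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

def classify_long(text, score_fn, size=150, overlap=30):
    """Average per-category scores over chunks; return (best_label, averaged_scores)."""
    chunks = chunk_words(text, size, overlap)
    if not chunks:
        raise ValueError("empty text")
    totals = {}
    for chunk in chunks:
        for label, score in score_fn(chunk).items():
            totals[label] = totals.get(label, 0.0) + score
    averaged = {label: total / len(chunks) for label, total in totals.items()}
    return max(averaged, key=averaged.get), averaged
```

With the `predict` function from Manual Usage, a call might look like `classify_long(article, lambda t: predict(t)["all_scores"])`. Averaging is only one choice; taking the maximum score per category is another, and which works better is not established here.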

Citation

@misc{kaya2025turkishnews,
  author = {Kaya, Tuğrul},
  title  = {Turkish News Classification with BERT},
  year   = {2025},
  url    = {https://huggingface.co/tugrulkaya/turkish-news-classification}
}

License

Apache 2.0

Evaluation results

Metric	Value (self-reported)
Accuracy on Turkish News	1.000
F1 (Weighted) on Turkish News	1.000
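
To check how these self-reported scores hold up on your own data, accuracy and weighted F1 can be computed on a small hand-labelled sample. The following is a minimal pure-Python sketch. The function name and the toy labels are illustrative, and "weighted" is taken to mean per-class F1 weighted by each class's count among the true labels.

```python
from collections import Counter

def accuracy_and_weighted_f1(y_true, y_pred):
    """Return (accuracy, support-weighted F1) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    support = Counter(y_true)      # how often each class truly occurs
    predicted = Counter(y_pred)    # how often each class was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    weighted_f1 = 0.0
    for label, n in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_f1 += f1 * n
    return correct / len(y_true), weighted_f1 / len(y_true)

# Toy labels for illustration only; replace with your own annotations and predictions.
y_true = ["spor", "spor", "ekonomi", "saglik"]
y_pred = ["spor", "ekonomi", "ekonomi", "saglik"]
print(accuracy_and_weighted_f1(y_true, y_pred))
```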