---
language:
- ar
metrics:
- bleu
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- t5
- Classification
- ArabicT5
- Text Classification
widget:
- text: >
    الحمد لله رب العالمين والصلاة والسلام على سيد المرسلين نبينا محمد وآله وصحبه أجمعين،وبعد:فإنه يجب على العبد أن يتجنب الذنوب كلها دقها وجلها صغيرها وكبيرها وأن يتعاهد نفسه بالتوبة الصادقة والإنابة إلى ربه. قال تعالى: (وَتُوبُوا إِلَى اللَّهِ جَمِيعًا أَيُّهَا الْمُؤْمِنُونَ لَعَلَّكُمْ تُفْلِحُونَ)النور 31.
  example_title: الديني
---

# Arabic text classification using deep learning (ArabicT5)

- SANAD: Single-Label Arabic News Articles Dataset for Automatic Text Categorization
  - Paper: https://www.researchgate.net/publication/333605992_SANAD_Single-Label_Arabic_News_Articles_Dataset_for_Automatic_Text_Categorization
  - Dataset: https://data.mendeley.com/datasets/57zpx667y9/2

## The category mapping

```python
category_mapping = {
    'Politics': 1,
    'Finance': 2,
    'Medical': 3,
    'Sports': 4,
    'Culture': 5,
    'Tech': 6,
    'Religion': 7
}
```
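
The model generates the numeric category id as text (see the example usage below), so a prediction can be mapped back to its label by inverting this dictionary. A minimal sketch; `id2label` is an illustrative name, not something shipped with the model:

```python
# Reverse lookup: from the id string the model generates to the label name.
category_mapping = {'Politics': 1, 'Finance': 2, 'Medical': 3, 'Sports': 4,
                    'Culture': 5, 'Tech': 6, 'Religion': 7}

id2label = {str(v): k for k, v in category_mapping.items()}

print(id2label['5'])  # -> Culture
```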

## Training parameters

| Parameter             | Value  |
| :-------------------: | :----: |
| Training batch size   | `8`    |
| Evaluation batch size | `8`    |
| Learning rate         | `1e-4` |
| Max input length      | `200`  |
| Max target length     | `3`    |
| Number of workers     | `4`    |
| Epochs                | `2`    |
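
The fine-tuning script itself is not included in this card; the sketch below only shows how these hyperparameters could be wired into a standard `transformers` `Seq2SeqTrainingArguments` object (the output directory and the overall setup are assumptions, not the author's actual code):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical configuration mirroring the table above; the original
# training code is not part of this model card.
training_args = Seq2SeqTrainingArguments(
    output_dir="arabict5-classification",  # assumed output path
    per_device_train_batch_size=8,         # training batch size
    per_device_eval_batch_size=8,          # evaluation batch size
    learning_rate=1e-4,
    num_train_epochs=2,
    dataloader_num_workers=4,
    predict_with_generate=True,            # targets are generated as text
    generation_max_length=3,               # max length of the target (category id)
)

# The max input length (200) would be applied when tokenizing the articles,
# e.g. tokenizer(article, max_length=200, truncation=True, padding="max_length").
```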

## Results

| Metric          | Value    |
| :-------------: | :------: |
| Validation loss | `0.0479` |
| Accuracy        | `96.49%` |
| BLEU            | `96.49%` |

## Example usage

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "Hezam/ArabicT5_Classification"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه متابعه تفاجا زوار موقع القناه الاولي المغربي"

# Tokenize with the same maximum input length used during training
tokens = tokenizer(text,
                   max_length=200,
                   truncation=True,
                   padding="max_length",
                   return_tensors="pt")

# The model generates the category id as a short text sequence
output = model.generate(tokens['input_ids'],
                        max_length=3,
                        length_penalty=10)

# Decode the generated tokens into the predicted category id
output = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True) for ids in output]
print(output)
```
```bash
['5']
```
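
Per the category mapping above, the predicted id `'5'` corresponds to the `Culture` class; the `id2label` lookup sketched earlier can be used to recover the label name.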