DESTA-1B: Dedicated Eritrean Semitic Text Autoregressor

DESTA-1B (Dedicated Eritrean Semitic Text Autoregressor - 1B) is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 specifically optimized for Tigrinya text generation. The model has been trained on a comprehensive Tigrinya dataset and demonstrates significant improvements in perplexity and text quality for Tigrinya language tasks.

DESTA-1B is designed to serve as a dedicated language model for Eritrean Semitic languages, with a primary focus on Tigrinya text generation, understanding, and completion tasks.

Model Details

Model Description

  • Model Name: DESTA-1B (Dedicated Eritrean Semitic Text Autoregressor - 1B)
  • Architecture: LlamaForCausalLM (TinyLlama)
  • Parameters: ~1.0B (1,007,771,648)
  • Model Size: ~4.2 GB
  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Language: Tigrinya (ti, tig)
  • Tokenizer: mewaeltsegay/tokenizer_tigrinya

Model Architecture

  • Hidden Size: 2,048
  • Number of Layers: 22
  • Intermediate Size: 5,632
  • Attention Heads: 32
  • Key-Value Heads: 4 (GQA)
  • Head Dimension: 64
  • Max Position Embeddings: 2,048
  • Vocabulary Size: 32,000
  • Activation Function: SiLU
  • RoPE Theta: 10,000
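
As a sanity check, the parameter count implied by these architecture numbers can be tallied with a few lines of arithmetic. This is a back-of-envelope sketch: it assumes tied input/output embeddings and the listed 32,000-token vocabulary, so it lands near, but not exactly on, the ~1.0B figure above (an untied lm_head would add another vocab × hidden ≈ 65.5M parameters, and the effective vocabulary of the Tigrinya tokenizer may differ).

```python
# Back-of-envelope parameter count from the architecture table above.
hidden, layers, inter = 2048, 22, 5632
heads, kv_heads, head_dim, vocab = 32, 4, 64, 32000

attn = hidden * (heads * head_dim)           # q_proj
attn += 2 * hidden * (kv_heads * head_dim)   # k_proj + v_proj (GQA: only 4 KV heads)
attn += (heads * head_dim) * hidden          # o_proj
mlp = 3 * hidden * inter                     # gate, up, and down projections (SiLU MLP)
per_layer = attn + mlp + 2 * hidden          # plus two RMSNorm weight vectors

total = layers * per_layer + vocab * hidden + hidden  # plus embeddings and final norm
print(f"{total:,}")  # 1,034,512,384 under these assumptions
```

The GQA term is where this differs from a vanilla transformer: k_proj and v_proj project to only 4 × 64 = 256 dimensions instead of the full 2,048, saving roughly 7.3M parameters per layer.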

Training Details

Training Data

Training Procedure

  • Training Epochs: 6
  • Batch Size: 16
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 32
  • Learning Rate: 2e-5
  • Warmup Steps: 200
  • Weight Decay: 0.01
  • Max Sequence Length: 4,096
  • Mixed Precision: Enabled (FP16/BF16)
  • Gradient Checkpointing: Enabled
  • Optimizer: AdamW with learning rate scheduling
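
The card says only "AdamW with learning rate scheduling", so the exact schedule is an assumption; a common choice (and the transformers Trainer default) is linear warmup to the peak rate over the first 200 steps, then linear decay to zero. A minimal sketch, with `total_steps` as a hypothetical value not stated in the card:

```python
def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=200):
    """Hypothetical schedule: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Halfway through warmup the learning rate is half the peak:
print(lr_at_step(100, total_steps=10_000))  # 1e-05
```

With batch size 16 and 2 gradient-accumulation steps, the optimizer sees one update per 32 examples, which is the effective batch size listed above.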

Training Infrastructure

  • Hardware: GPU with 67+ GB memory
  • Framework: PyTorch with Transformers
  • Logging: TensorBoard enabled
  • Checkpointing: Every 500 steps
  • Evaluation: Every 250 steps

Evaluation Results

Perplexity Comparison

The fine-tuned model shows significant improvements over the base model. The gains come chiefly from eliminating high-perplexity outliers: the maximum perplexity falls from 199.99 to 12.85 and the standard deviation from 81.11 to 2.20, which is why the mean improves sharply while the median changes little:

Metric              Base Model   DESTA   Improvement
Mean Perplexity     73.61        10.41   85.86%
Median Perplexity   10.05        11.33   -
Std Deviation       81.11        2.20    97.29%
Min Perplexity      8.65         6.36    -
Max Perplexity      199.99       12.85   93.57%

Loss Comparison

Metric        Base Model   DESTA   Improvement
Mean Loss     3.38         2.32    31.47%
Median Loss   2.31         2.43    -
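
Perplexity is the exponential of the cross-entropy loss, which ties the two tables together. Note that exp(mean loss) need not equal the reported mean perplexity, because averaging per-sample perplexities is not the same as exponentiating the average loss; the large gap on the base model (exp(3.38) ≈ 29.4 vs. a mean perplexity of 73.61) reflects its heavy-tailed per-sample losses, while the two figures nearly agree for the fine-tuned model:

```python
import math

mean_loss_base, mean_loss_desta = 3.38, 2.32  # from the loss table above

print(round(math.exp(mean_loss_base), 2))   # 29.37 (vs. reported mean PPL 73.61)
print(round(math.exp(mean_loss_desta), 2))  # 10.18 (vs. reported mean PPL 10.41)

# The headline improvement figure is the relative drop in mean perplexity:
print(round(100 * (73.61 - 10.41) / 73.61, 2))  # 85.86
```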

Key Improvements

  • 85.86% reduction in average perplexity
  • 97.29% reduction in perplexity standard deviation (more consistent performance)
  • 31.47% reduction in average loss
  • Significantly more stable and predictable outputs

Usage

Installation

pip install torch transformers accelerate

Recommended Generation Parameters

For best results with Tigrinya text generation:

generation_config = {
    "max_new_tokens": 150,
    "min_new_tokens": 50,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "length_penalty": 1.0,
    "do_sample": True,
    "no_repeat_ngram_size": 3,
    "early_stopping": False
}

Using with Pipeline

import os
from transformers import AutoTokenizer, pipeline
from huggingface_hub import snapshot_download
import torch

# Use GPU if available; -1 means CPU
device = 0 if torch.cuda.is_available() else -1

# Load tokenizer from model repo; use cached path + explicit vocab_file to fix path resolution
MODEL_ID = "mewaeltsegay/desta_1b"
model_path = snapshot_download(repo_id=MODEL_ID)
vocab_path = os.path.join(model_path, "sentencepiece.model")
tokenizer = AutoTokenizer.from_pretrained(model_path, vocab_file=vocab_path, trust_remote_code=True)

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device=device,
    trust_remote_code=True,
)

# Prompt in Tigrinya: "Tigrinya is a language"
result = generator(
    "ትግርኛ ቋንቋ እዩ",
    max_new_tokens=150,
    temperature=0.7,
    top_p=0.9,
)

print(result[0]['generated_text'])

Limitations and Bias

Known Limitations

  1. Context Length: The maximum context length is 2,048 tokens (it can be extended to 4,096 with proper configuration)
  2. Generation Speed: The fine-tuned model may be slower than the base model during inference
  3. Domain Specificity: The model is optimized for Tigrinya text; performance on other languages may vary
  4. Training Data: Model performance depends on the quality and coverage of the training dataset

Potential Biases

  • The model may reflect biases present in the training data
  • Cultural and regional variations in Tigrinya may not be fully represented
  • The model may generate text that reflects the style and content of the training corpus

Citation

If you use this model in your research, please cite:

@misc{desta-1b-2026,
  title={DESTA-1B: Dedicated Eritrean Semitic Text Autoregressor},
  author={Mewael Tsegay Desta},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/mewaeltsegay/desta_1b}}
}

Acknowledgments

Model Card Contact

For questions, issues, or contributions, please open an issue on the model repository.


Note: DESTA-1B is fine-tuned from checkpoint-620 of the training process. For best results, use the recommended generation parameters and ensure proper tokenizer configuration.
