DESTA-1B: Dedicated Eritrean Semitic Text Autoregressor

DESTA-1B (Dedicated Eritrean Semitic Text Autoregressor - 1B) is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 specifically optimized for Tigrinya text generation. The model has been trained on a comprehensive Tigrinya dataset and demonstrates significant improvements in perplexity and text quality for Tigrinya language tasks.

DESTA-1B is designed to serve as a dedicated language model for Eritrean Semitic languages, with a primary focus on Tigrinya text generation, understanding, and completion tasks.

Model Details

Model Description

  • Model Name: DESTA-1B (Dedicated Eritrean Semitic Text Autoregressor - 1B)
  • Architecture: LlamaForCausalLM (TinyLlama)
  • Parameters: ~1.0B (1,007,771,648)
  • Model Size: ~4.2 GB
  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Language: Tigrinya (ti, tig)
  • Tokenizer: mewaeltsegay/tokenizer_tigrinya

Model Architecture

  • Hidden Size: 2,048
  • Number of Layers: 22
  • Intermediate Size: 5,632
  • Attention Heads: 32
  • Key-Value Heads: 4 (GQA)
  • Head Dimension: 64
  • Max Position Embeddings: 2,048
  • Vocabulary Size: 32,000
  • Activation Function: SiLU
  • RoPE Theta: 10,000
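
As a sanity check, the parameter count implied by these architecture numbers can be tallied with a few lines of arithmetic. This is a back-of-envelope sketch: it assumes tied input/output embeddings and the listed 32,000-token vocabulary, so it lands near, but not exactly on, the ~1.0B figure above (an untied lm_head would add another vocab × hidden ≈ 65.5M parameters, and the effective vocabulary of the Tigrinya tokenizer may differ).

```python
# Back-of-envelope parameter count from the architecture table above.
hidden, layers, inter = 2048, 22, 5632
heads, kv_heads, head_dim, vocab = 32, 4, 64, 32000

attn = hidden * (heads * head_dim)           # q_proj
attn += 2 * hidden * (kv_heads * head_dim)   # k_proj + v_proj (GQA: only 4 KV heads)
attn += (heads * head_dim) * hidden          # o_proj
mlp = 3 * hidden * inter                     # gate, up, and down projections (SiLU MLP)
per_layer = attn + mlp + 2 * hidden          # plus two RMSNorm weight vectors

total = layers * per_layer + vocab * hidden + hidden  # plus embeddings and final norm
print(f"{total:,}")  # 1,034,512,384 under these assumptions
```

The GQA term is where this differs from a vanilla transformer: k_proj and v_proj project to only 4 × 64 = 256 dimensions instead of the full 2,048, saving roughly 7.3M parameters per layer.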

Training Details

Training Data

Training Procedure

  • Training Epochs: 6
  • Batch Size: 16
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 32
  • Learning Rate: 2e-5
  • Warmup Steps: 200
  • Weight Decay: 0.01
  • Max Sequence Length: 4,096
  • Mixed Precision: Enabled (FP16/BF16)
  • Gradient Checkpointing: Enabled
  • Optimizer: AdamW with learning rate scheduling
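
The card says only "AdamW with learning rate scheduling", so the exact schedule is an assumption; a common choice (and the transformers Trainer default) is linear warmup to the peak rate over the first 200 steps, then linear decay to zero. A minimal sketch, with `total_steps` as a hypothetical value not stated in the card:

```python
def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=200):
    """Hypothetical schedule: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Halfway through warmup the learning rate is half the peak:
print(lr_at_step(100, total_steps=10_000))  # 1e-05
```

With batch size 16 and 2 gradient-accumulation steps, the optimizer sees one update per 32 examples, which is the effective batch size listed above.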

Training Infrastructure

  • Hardware: GPU with 67+ GB memory
  • Framework: PyTorch with Transformers
  • Logging: TensorBoard enabled
  • Checkpointing: Every 500 steps
  • Evaluation: Every 250 steps

Evaluation Results

Perplexity Comparison

The fine-tuned model shows significant improvements over the base model. The gains come chiefly from eliminating high-perplexity outliers: the maximum perplexity falls from 199.99 to 12.85 and the standard deviation from 81.11 to 2.20, which is why the mean improves sharply while the median changes little:

Metric              Base Model   DESTA   Improvement
Mean Perplexity     73.61        10.41   85.86%
Median Perplexity   10.05        11.33   -
Std Deviation       81.11        2.20    97.29%
Min Perplexity      8.65         6.36    -
Max Perplexity      199.99       12.85   93.57%

Loss Comparison

Metric        Base Model   DESTA   Improvement
Mean Loss     3.38         2.32    31.47%
Median Loss   2.31         2.43    -
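
Perplexity is the exponential of the cross-entropy loss, which ties the two tables together. Note that exp(mean loss) need not equal the reported mean perplexity, because averaging per-sample perplexities is not the same as exponentiating the average loss; the large gap on the base model (exp(3.38) ≈ 29.4 vs. a mean perplexity of 73.61) reflects its heavy-tailed per-sample losses, while the two figures nearly agree for the fine-tuned model:

```python
import math

mean_loss_base, mean_loss_desta = 3.38, 2.32  # from the loss table above

print(round(math.exp(mean_loss_base), 2))   # 29.37 (vs. reported mean PPL 73.61)
print(round(math.exp(mean_loss_desta), 2))  # 10.18 (vs. reported mean PPL 10.41)

# The headline improvement figure is the relative drop in mean perplexity:
print(round(100 * (73.61 - 10.41) / 73.61, 2))  # 85.86
```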

Key Improvements

  • 85.86% reduction in average perplexity
  • 97.29% reduction in perplexity standard deviation (more consistent performance)
  • 31.47% reduction in average loss
  • Significantly more stable and predictable outputs

Usage

Installation

pip install torch transformers accelerate

Recommended Generation Parameters

For best results with Tigrinya text generation:

generation_config = {
    "max_new_tokens": 150,
    "min_new_tokens": 50,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "length_penalty": 1.0,
    "do_sample": True,
    "no_repeat_ngram_size": 3,
    "early_stopping": False
}

Using with Pipeline

import os
from transformers import AutoTokenizer, pipeline
from huggingface_hub import snapshot_download
import torch

# Use GPU if available; -1 means CPU
device = 0 if torch.cuda.is_available() else -1

# Load tokenizer from model repo; use cached path + explicit vocab_file to fix path resolution
MODEL_ID = "mewaeltsegay/desta_1b"
model_path = snapshot_download(repo_id=MODEL_ID)
vocab_path = os.path.join(model_path, "sentencepiece.model")
tokenizer = AutoTokenizer.from_pretrained(model_path, vocab_file=vocab_path, trust_remote_code=True)

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device=device,
    trust_remote_code=True,
)

# Prompt in Tigrinya: "Tigrinya is a language"
result = generator(
    "ትግርኛ ቋንቋ እዩ",
    max_new_tokens=150,
    temperature=0.7,
    top_p=0.9,
)

print(result[0]['generated_text'])

Limitations and Bias

Known Limitations

  1. Context Length: The maximum context length is 2,048 tokens (it can be extended to 4,096 with proper configuration)
  2. Generation Speed: The fine-tuned model may be slower than the base model during inference
  3. Domain Specificity: The model is optimized for Tigrinya text; performance on other languages may vary
  4. Training Data: Model performance depends on the quality and coverage of the training dataset

Potential Biases

  • The model may reflect biases present in the training data
  • Cultural and regional variations in Tigrinya may not be fully represented
  • The model may generate text that reflects the style and content of the training corpus

Citation

If you use this model in your research, please cite:

@misc{desta-1b-2026,
  title={DESTA-1B: Dedicated Eritrean Semitic Text Autoregressor},
  author={Mewael Tsegay Desta},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/mewaeltsegay/desta_1b}}
}

Acknowledgments

Model Card Contact

For questions, issues, or contributions, please open an issue on the model repository.


Note: DESTA-1B is fine-tuned from checkpoint-620 of the training process. For best results, use the recommended generation parameters and ensure proper tokenizer configuration.
