Qwen2.5-3B-Karakalpak-Base (Checkpoint 3000)

This repository contains a LoRA adapter for Qwen2.5-3B, specifically fine-tuned on a Karakalpak language corpus.

⚠️ Important: Base Model Status

This is a Continued Pre-training (CPT) adapter.

  • Purpose: It has been exposed to ~150MB of raw Karakalpak text to learn grammar, vocabulary, and linguistic nuances.
  • Behavior: This model is designed for text completion. It is NOT instruction-tuned and will not yet respond reliably to chat-style prompts (e.g., "Tell me a story").
  • Next Steps: This serves as the foundation for future instruction-tuning (SFT).

🛑 Support & Collaboration

Training on this project was paused at Checkpoint 3000 due to a lack of computational budget. As an independent developer working on a low-resource language, the costs of GPU power have outpaced my personal resources. To reach a full 1.0+ Epoch and proceed to the Instruction Tuning (Chat) phase, I am seeking support from the community.

How you can help:

  • Computational Power: Spare GPU credits (H100/A100/A10G) or access to a compute cluster to sponsor the next training run.
  • Data Contribution: Cleaned Karakalpak datasets, specifically instruction-response pairs, are vital for the next phase.
  • Open Source Support: Star this repository and share it with NLP researchers or within the Central Asian tech community to help find potential partners.
  • Development: Collaboration from practitioners experienced in SFT or RLHF who want to help make "Bawir AI" more capable.

Model Details

  • Developed by: kdrnyzv890
  • Project Name: Bawir AI
  • Training Data: 150MB of raw Karakalpak text (news, literature, web content).
  • Training Depth: ~0.3 Epochs (Checkpoint 3000).
  • Fine-tuning Technique: LoRA (Low-Rank Adaptation).
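The training depth above is simple to sanity-check: at ~0.3 epochs after 3000 optimizer steps, a full epoch would take roughly 10,000 steps. A quick back-of-the-envelope calculation:

```python
# Estimate remaining training from the figures stated in this card.
checkpoint_step = 3000
epochs_completed = 0.3  # approximate, as stated above

steps_per_epoch = checkpoint_step / epochs_completed  # ~10000
remaining = steps_per_epoch - checkpoint_step         # ~7000

print(f"Estimated steps per full epoch: {steps_per_epoch:.0f}")
print(f"Steps remaining to reach 1.0 epoch: {remaining:.0f}")
```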

Technical Specifications (LoRA)

  • Rank (r): 16
  • Alpha: 32
  • Target Modules: All Linear Layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
  • Dropout: 0.05
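For reference, the specification above corresponds to a PEFT `LoraConfig` roughly like the following. This is a hypothetical reconstruction from the listed hyperparameters, not the exact training script; other settings (learning rate, scheduler, etc.) are not covered by this card.

```python
from peft import LoraConfig

# Sketch of the LoRA configuration described in this card (assumed, not the
# original training code). task_type is inferred from the causal-LM base model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```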

How to Use

You must load this adapter on top of the base Qwen/Qwen2.5-3B model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen2.5-3B"
adapter_id = "kdrnyzv890/qwen2.5-3b-karakalpak-base"

# Load Tokenizer and Base Model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Load the Karakalpak Adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Example: Text Completion
text = "Nókis qalası - Qaraqalpaqstan Respublikasınıń"
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # works with device_map="auto"

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
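If you prefer to serve the model without a runtime `peft` dependency, the adapter can be folded into the base weights. A minimal sketch, continuing from the snippet above (the output directory name is just an example):

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint.
# Requires enough memory to hold the full merged model.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen2.5-3b-karakalpak-merged")
tokenizer.save_pretrained("qwen2.5-3b-karakalpak-merged")
```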