Qwen2.5-3B-Karakalpak-Base (Checkpoint 3000)
This repository contains a LoRA adapter for Qwen2.5-3B, specifically fine-tuned on a Karakalpak language corpus.
⚠️ Important: Base Model Status
This is a Continued Pre-training (CPT) adapter.
- Purpose: It has been exposed to ~150MB of raw Karakalpak text to learn grammar, vocabulary, and linguistic nuances.
- Behavior: This model is designed for text completion. It is NOT instruction-tuned and will not yet respond reliably to chat prompts (e.g., "Tell me a story").
- Next Steps: This serves as the foundation for future instruction-tuning (SFT).
🛑 Support & Collaboration
Training on this project was paused at Checkpoint 3000 due to a lack of computational budget. As an independent developer working on a low-resource language, the costs of GPU power have outpaced my personal resources. To reach a full 1.0+ Epoch and proceed to the Instruction Tuning (Chat) phase, I am seeking support from the community.
How you can help:
- Computational Power: If you have spare GPU credits (H100/A100/A10G) or access to a compute cluster, consider sponsoring the next training run.
- Data Contribution: Cleaned Karakalpak datasets, specifically instruction-response pairs, are vital for the next phase.
- Open Source Support: Star this repository and share it with NLP researchers or within the Central Asian tech community to help find potential partners.
- Development: If you are an expert in SFT or RLHF, consider collaborating on making "Bawir AI" more capable.
Model Details
- Developed by: kdrnyzv890
- Project Name: Bawir AI
- Training Data: 150MB of raw Karakalpak text (news, literature, web content).
- Training Depth: ~0.3 Epochs (Checkpoint 3000).
- Fine-tuning Technique: LoRA (Low-Rank Adaptation).
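The "~0.3 epochs at checkpoint 3000" figure follows from the ratio of tokens seen to corpus size. A minimal sketch of that arithmetic, where the per-step token count (4096) and the ~4 bytes-per-token estimate are illustrative assumptions not stated in this card:

```python
def epochs_completed(steps: int, tokens_per_step: int, corpus_tokens: float) -> float:
    """Fraction of the corpus seen after `steps` optimizer steps."""
    return steps * tokens_per_step / corpus_tokens

# Assumed values for illustration: ~150 MB of UTF-8 text at roughly
# 4 bytes per token gives ~37.5M tokens; 4096 tokens per optimizer step.
corpus_tokens = 150e6 / 4
print(round(epochs_completed(3000, 4096, corpus_tokens), 2))  # ~0.33
```

Under these assumptions the checkpoint lands close to the ~0.3-epoch figure; different batch sizes or sequence lengths would shift the exact number.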
Technical Specifications (LoRA)
- Rank (r): 16
- Alpha: 32
- Target Modules: All linear layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- Dropout: 0.05
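The hyperparameters above map directly onto `peft`'s LoRA configuration. A sketch of how they would be expressed (the `task_type` value is an assumption consistent with causal-LM fine-tuning; the exact training config is not published in this card):

```python
# Hypothetical reconstruction of the adapter's hyperparameters; in
# practice these kwargs would be passed to peft.LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 16,             # LoRA rank
    "lora_alpha": 32,    # scaling factor: alpha / r = 2.0
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: standard for CPT on a decoder LM
}
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```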
How to Use
You must load this adapter on top of the base Qwen/Qwen2.5-3B model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen2.5-3B"
adapter_id = "kdrnyzv890/qwen2.5-3b-karakalpak-base"

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Load the Karakalpak adapter on top of the base weights
model = PeftModel.from_pretrained(model, adapter_id)

# Example: text completion (the model continues the prompt)
text = "Nókis qalası - Qaraqalpaqstan Respublikasınıń"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```