🧠 OmniGPT-355M (Knowledge Distillation from Chatbot Arena)
OmniGPT-355M is a causal, decoder-only Transformer based on the gpt2-medium architecture. It is an end-to-end MLOps and model fine-tuning project by Onur Demircioğlu.
The primary objective of this model is teacher-student knowledge distillation: by fine-tuning the 355M-parameter architecture on the lmsys/chatbot_arena_conversations dataset, the model learns the reasoning style, alignment behavior, and dialogue structure of the state-of-the-art large language models (such as GPT-4 and Claude) whose outputs make up the dataset.
🎛️ Model Details
- Developed by: Onur Demircioğlu
- Model Type: Causal Language Model (Transformer Decoder)
- Parameters: 355 million (`n_layer=24`, `n_embd=1024`, `n_head=16`)
- Vocabulary Size: 50,257 (BPE tokenizer)
- Base Architecture: `gpt2-medium`
- Language(s): English (en). Turkish (tr) interactions are supported via a bilateral translation pipeline configured in the desktop interface.
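The figures above can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes GPT-2's standard weight layout (learned position embeddings, biases on all linear layers, tied input/output embeddings); the helper name is hypothetical:

```python
def gpt2_param_count(n_layer=24, n_embd=1024, vocab_size=50257, n_positions=1024):
    """Approximate parameter count for a GPT-2-style decoder."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # wte + wpe
    # Per block: attention (4*d^2 + 4*d) + MLP (8*d^2 + 5*d) + two LayerNorms (4*d)
    per_block = 12 * d * d + 13 * d
    final_ln = 2 * d  # ln_f weight + bias
    return embeddings + n_layer * per_block + final_ln

print(gpt2_param_count())  # 354,823,168 -- i.e. the advertised ~355M
```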
💾 Training Infrastructure & MLOps Optimization
This model was trained under strict GPU memory constraints (16 GB VRAM) on Kaggle Notebooks using two NVIDIA Tesla T4 GPUs.
To avoid Out-Of-Memory (OOM) failures:
- Gradient checkpointing was enabled, trading recomputation for activation memory.
- The memory-efficient Adafactor optimizer was used instead of standard AdamW.
- Training ran in native FP16 precision.
- Cloud-persistent MLOps: training checkpoints were pushed automatically to this Hugging Face Hub repository, with custom retry logic to recover from transient HTTP 503 errors.
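The checkpoint-push retry idea can be sketched as a small helper with exponential backoff. This is a hypothetical illustration, not the exact code used here; `push_fn` stands in for any zero-argument callable that raises on failure (e.g. a lambda wrapping a Hub upload):

```python
import time

def push_with_retry(push_fn, max_retries=5, base_delay=2.0):
    """Retry a Hub push when the server answers with a transient 503."""
    for attempt in range(max_retries):
        try:
            return push_fn()
        except Exception as err:
            # Only retry what looks like a transient 503; re-raise otherwise.
            if "503" not in str(err) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```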
📚 Dataset
The model was trained on 154,000 conversational rows from the LMSYS Chatbot Arena Conversations dataset. A recursive JSON parsing pipeline was developed to extract the nested "content" values and concatenate them into a linear token stream for causal language modeling.
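A minimal sketch of that recursive extraction step (the record structure shown is an assumption based on the public LMSYS schema, and `extract_contents` is a hypothetical name):

```python
def extract_contents(node, out=None):
    """Recursively collect every nested "content" string from a record."""
    if out is None:
        out = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, str):
                out.append(value)
            else:
                extract_contents(value, out)
    elif isinstance(node, list):
        for item in node:
            extract_contents(item, out)
    return out

row = {"conversation_a": [{"role": "user", "content": "Hi"},
                          {"role": "assistant", "content": "Hello!"}]}
print(extract_contents(row))  # ['Hi', 'Hello!']
```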
💻 How to use
You can load this model locally with the transformers library. Once the .safetensors weights are downloaded, it runs fully offline on your CPU or GPU.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OnurDemircioglu/OmniGPT-355M")
model = AutoModelForCausalLM.from_pretrained("OnurDemircioglu/OmniGPT-355M")

prompt = "How can humans reach Mars?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
⚠️ Limitations & Bias
As a 355M-parameter model fine-tuned on arena data, it can hallucinate confidently and may claim to be a "large language model trained by OpenAI", mimicking the larger models that dominate the Chatbot Arena dataset. Because it was fine-tuned for only a few epochs, it also lacks robust safety guardrails.
This repository documents an end-to-end exercise in PyTorch transformer training, VRAM optimization, huggingface-hub networking, and transformer internals.