# 🧠 OmniGPT-355M (Knowledge Distillation from Chatbot Arena)

OmniGPT-355M is a causal decoder-only Transformer based on the gpt2-medium architecture. It is an end-to-end MLOps and model fine-tuning project by Onur Demircioğlu.

The primary objective is teacher-student knowledge distillation: by fine-tuning the 355M-parameter architecture on the lmsys/chatbot_arena_conversations dataset, the model learns the reasoning patterns, alignment behavior, and dialogue structure of the much larger state-of-the-art models represented in that data (such as GPT-4 and Claude).

## 🎛️ Model Details

- **Developed by:** Onur Demircioğlu
- **Model Type:** Causal Language Model (Transformer Decoder)
- **Parameters:** 355 million (`n_layer=24`, `n_embd=1024`, `n_head=16`)
- **Vocabulary Size:** 50,257 (BPE tokenizer)
- **Base Architecture:** gpt2-medium
- **Language(s):** English (en). Turkish (tr) interactions are supported via a bilateral translation pipeline configured in the desktop interface.
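For reference, the architecture numbers above correspond to a `GPT2Config` along the following lines (a sketch only; these values also match the `gpt2-medium` defaults, so the actual project may simply have loaded the pretrained checkpoint):

```python
from transformers import GPT2Config

# gpt2-medium-sized configuration matching the numbers listed above
config = GPT2Config(
    n_layer=24,        # transformer blocks
    n_embd=1024,       # hidden size
    n_head=16,         # attention heads per layer
    vocab_size=50257,  # BPE vocabulary
)
```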

## 💾 Training Infrastructure & MLOps Optimization

This model was trained under tight GPU memory constraints on Kaggle Notebooks, using two NVIDIA Tesla T4 GPUs (16 GB VRAM each).

To avoid out-of-memory (OOM) failures:

1. Gradient checkpointing was enabled, trading extra compute for a large reduction in activation memory.
2. The memory-efficient Adafactor optimizer was used instead of standard AdamW.
3. Mixed-precision (FP16) training was enabled.
4. Cloud-persistent MLOps: training checkpoints were pushed automatically to the Hugging Face Hub, with a custom retry loop to recover from transient HTTP 503 errors.
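The retry logic in step 4 can be sketched as a small wrapper around the upload call. This is an illustrative helper, not the exact code used in training; `push_fn` stands in for whatever performs the actual upload (e.g. a `push_to_hub` call):

```python
import time

def push_with_retry(push_fn, max_retries=5, backoff=2.0):
    """Call an upload function, retrying on transient errors such as HTTP 503.

    Retries with exponential backoff and re-raises the last error if every
    attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return push_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff ** attempt)  # wait longer before each retry
```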

## 📚 Dataset

The model was trained on 154,000 conversational rows from the LMSYS Chatbot Arena Conversations dataset. A recursive JSON parsing pipeline extracts the nested "content" values and feeds them as a linear text stream to the causal language model.
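A minimal version of such a recursive extractor might look like this (the field name `"content"` follows the arena dataset's nested message format; the exact pipeline code is not shown in this card):

```python
def extract_contents(node):
    """Recursively collect every nested "content" string from a JSON record."""
    texts = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, str):
                texts.append(value)
            else:
                texts.extend(extract_contents(value))  # descend into nested values
    elif isinstance(node, list):
        for item in node:
            texts.extend(extract_contents(item))
    return texts
```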

## 💻 How to use

You can load this model locally with the `transformers` library. Once the `.safetensors` weights are downloaded, it runs entirely offline on your CPU or GPU.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OnurDemircioglu/OmniGPT-355M")
model = AutoModelForCausalLM.from_pretrained("OnurDemircioglu/OmniGPT-355M")

prompt = "How can humans reach Mars?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## ⚠️ Limitations & Bias

As a 355M-parameter model fine-tuned on arena data, it may confidently hallucinate, and it tends to mimic larger models, for example by claiming to be a "large language model trained by OpenAI", because such statements are common in the Chatbot Arena training distribution. It has also been trained for relatively few epochs and lacks robust guardrails.


This repository documents hands-on work with PyTorch model architecture, VRAM optimization, and huggingface-hub networking.
