# 🧠 OmniGPT-355M (Knowledge Distillation from Chatbot Arena)

OmniGPT-355M is a causal decoder-only Transformer based on the gpt2-medium architecture. It is an end-to-end MLOps and model fine-tuning project by Onur Demircioğlu.

The primary objective is teacher-student knowledge distillation: by fine-tuning the 355M-parameter architecture on the lmsys/chatbot_arena_conversations dataset, the model learns the reasoning patterns, alignment behavior, and dialogue structure of the much larger state-of-the-art models represented in that data (such as GPT-4 and Claude).

## 🎛️ Model Details

- **Developed by:** Onur Demircioğlu
- **Model Type:** Causal Language Model (Transformer Decoder)
- **Parameters:** 355 million (`n_layer=24`, `n_embd=1024`, `n_head=16`)
- **Vocabulary Size:** 50,257 (BPE tokenizer)
- **Base Architecture:** gpt2-medium
- **Language(s):** English (en). Turkish (tr) interactions are supported via a bilateral translation pipeline configured in the desktop interface.
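For reference, the architecture numbers above correspond to a `GPT2Config` along the following lines (a sketch only; these values also match the `gpt2-medium` defaults, so the actual project may simply have loaded the pretrained checkpoint):

```python
from transformers import GPT2Config

# gpt2-medium-sized configuration matching the numbers listed above
config = GPT2Config(
    n_layer=24,        # transformer blocks
    n_embd=1024,       # hidden size
    n_head=16,         # attention heads per layer
    vocab_size=50257,  # BPE vocabulary
)
```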

## 💾 Training Infrastructure & MLOps Optimization

This model was trained under tight GPU memory constraints on Kaggle Notebooks, using two NVIDIA Tesla T4 GPUs (16 GB VRAM each).

To avoid out-of-memory (OOM) failures:

1. Gradient checkpointing was enabled, trading extra compute for a large reduction in activation memory.
2. The memory-efficient Adafactor optimizer was used instead of standard AdamW.
3. Mixed-precision (FP16) training was enabled.
4. Cloud-persistent MLOps: training checkpoints were pushed automatically to the Hugging Face Hub, with a custom retry loop to recover from transient HTTP 503 errors.
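The retry logic in step 4 can be sketched as a small wrapper around the upload call. This is an illustrative helper, not the exact code used in training; `push_fn` stands in for whatever performs the actual upload (e.g. a `push_to_hub` call):

```python
import time

def push_with_retry(push_fn, max_retries=5, backoff=2.0):
    """Call an upload function, retrying on transient errors such as HTTP 503.

    Retries with exponential backoff and re-raises the last error if every
    attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return push_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff ** attempt)  # wait longer before each retry
```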

## 📚 Dataset

The model was trained on 154,000 conversational rows from the LMSYS Chatbot Arena Conversations dataset. A recursive JSON parsing pipeline extracts the nested "content" values and feeds them as a linear text stream to the causal language model.
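A minimal version of such a recursive extractor might look like this (the field name `"content"` follows the arena dataset's nested message format; the exact pipeline code is not shown in this card):

```python
def extract_contents(node):
    """Recursively collect every nested "content" string from a JSON record."""
    texts = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, str):
                texts.append(value)
            else:
                texts.extend(extract_contents(value))  # descend into nested values
    elif isinstance(node, list):
        for item in node:
            texts.extend(extract_contents(item))
    return texts
```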

## 💻 How to use

You can load this model locally with the `transformers` library. Once the `.safetensors` weights are downloaded, it runs entirely offline on your CPU or GPU.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OnurDemircioglu/OmniGPT-355M")
model = AutoModelForCausalLM.from_pretrained("OnurDemircioglu/OmniGPT-355M")

prompt = "How can humans reach Mars?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## ⚠️ Limitations & Bias

As a 355M-parameter model fine-tuned on arena data, it may confidently hallucinate, and it tends to mimic larger models, for example by claiming to be a "large language model trained by OpenAI", because such statements are common in the Chatbot Arena training distribution. It has also been trained for relatively few epochs and lacks robust guardrails.


This repository documents hands-on work with PyTorch model architecture, VRAM optimization, and huggingface-hub networking.
