GAD-2-177M-SFT-Refined
Overview
GAD-2-177M-SFT-Refined is a specialized, small-scale language model developed by TopAI-IL.
This model is a significant evolution of the original GAD-2-Base. It has undergone Supervised Fine-Tuning (SFT) using a high-quality, curated dataset focused on educational content, STEM, and logical reasoning (derived from FineWeb-Edu).
During training, the model exhibited an interesting outcome: rather than collapsing into a rigid chatbot, it converged into a highly refined base model. It demonstrates superior linguistic coherence and factual density compared to standard models of its size (177M parameters).
Model Details
- Developed by: TopAI-IL
- Model Type: Causal Language Model (Transformer-based)
- Parameters: 177M
- Language(s): English
- License: Apache 2.0
- Parent Model: GAD-2-Base
Key Features & Improvements
- Linguistic Generalization: Unlike many tiny models that suffer from repetitive loops or broken grammar, this model generates fluid, academic-level English prose.
- Knowledge-Dense: Internalized a vast array of facts across biology, physics, history, and computer science during the SFT phase.
- Logical Continuity: Shows an improved ability to maintain a "thread of thought" across multiple sentences, making it excellent for structured text generation.
- Optimized Latent Space: The SFT process acted as a "denoiser," shifting the model away from internet "junk" and toward high-quality, structured information.
Intended Use
- Speculative Decoding: An ideal candidate for serving as a "Draft Model" to accelerate much larger LLMs (e.g., Llama-3 70B).
- Edge Computing: Small enough to run on mobile devices, IoT, or browsers while delivering surprisingly high-quality text.
- Text Refinement: Excellent for expanding bullet points into full educational paragraphs or rewriting text with better flow.
- Educational Research: A transparent look at how SFT impacts "Reasoning" capabilities in sub-1B parameter models.
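The speculative-decoding use case maps onto 🤗 Transformers' assisted generation: the 177M model drafts candidate tokens and the large target model verifies them in one forward pass. Below is a minimal sketch, not a tested recipe; the target model name is an illustrative placeholder, and standard assisted generation assumes the draft and target share a compatible tokenizer/vocabulary, which depends on how GAD-2 was trained.

```python
from typing import Any

# Drafting settings: the draft/target pair decodes greedily here so the
# speedup is easy to measure; sampling also works with assisted generation.
DRAFT_GEN_KWARGS: dict[str, Any] = {
    "max_new_tokens": 256,
    "do_sample": False,
}


def generate_with_draft(
    prompt: str,
    target_name: str = "meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder target
    draft_name: str = "Raziel1234/GAD-2-177M-SFT-Preview",
) -> str:
    """Assisted (speculative) generation: the small model proposes tokens,
    the large model accepts or rejects them, preserving its own distribution."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(target_name)
    target = AutoModelForCausalLM.from_pretrained(
        target_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    draft = AutoModelForCausalLM.from_pretrained(
        draft_name, torch_dtype=torch.bfloat16, device_map="auto",
        trust_remote_code=True,
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(target.device)
    outputs = target.generate(
        **inputs,
        assistant_model=draft,  # enables speculative decoding
        **DRAFT_GEN_KWARGS,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

The output is identical in distribution to running the target model alone; the draft model only changes how many target forward passes are needed.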
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Raziel1234/GAD-2-177M-SFT-Preview"

# Load the tokenizer and model in bfloat16; trust_remote_code is needed
# because the repository ships a custom model implementation.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "User: Explain the significance of the Turing Test.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 128 new tokens; the repetition penalty curbs the loops
# that small models are prone to.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        repetition_penalty=1.2,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
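For the text-refinement use case, the same generation call can be driven by a prompt that asks the model to expand bullet points into prose. A small helper for building such prompts is sketched below; the template wording is purely illustrative, not a format the model was necessarily trained on.

```python
def refinement_prompt(bullets: list[str], topic: str) -> str:
    """Build a prompt asking the model to expand bullet points into a
    coherent educational paragraph. The template is an illustration."""
    points = "\n".join(f"- {b}" for b in bullets)
    return (
        f"User: Expand the following notes on {topic} into a clear, "
        f"well-structured educational paragraph:\n{points}\nAssistant:"
    )


prompt = refinement_prompt(
    ["discovered in 1928 by Fleming", "first mass-produced antibiotic"],
    "penicillin",
)
# Feed `prompt` to tokenizer/model.generate exactly as in the snippet above.
```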