GAD-2-177M-SFT-Refined

Overview

GAD-2-177M-SFT-Refined is a specialized, small-scale language model developed by TopAI-IL.

This model is a significant evolution of the original GAD-2-Base. It has undergone Supervised Fine-Tuning (SFT) using a high-quality, curated dataset focused on educational content, STEM, and logical reasoning (derived from FineWeb-Edu).

During training, the model exhibited a fascinating phenomenon: instead of becoming a rigid chatbot, it evolved into a Highly Refined Base Model. It demonstrates superior linguistic coherence and factual density compared to standard models of its size (177M parameters).

Model Details

  • Developed by: TopAI-IL
  • Model Type: Causal Language Model (Transformer-based)
  • Parameters: 177M
  • Language(s): English
  • License: Apache 2.0
  • Parent Model: GAD-2-Base

Key Features & Improvements

  • Linguistic Generalization: Unlike many tiny models that suffer from repetitive loops or broken grammar, this model generates fluid, academic-level English prose.
  • Knowledge-Dense: Internalized a vast array of facts across biology, physics, history, and computer science during the SFT phase.
  • Logical Continuity: Shows an improved ability to maintain a "thread of thought" across multiple sentences, making it excellent for structured text generation.
  • Optimized Latent Space: The SFT process acted as a "denoiser," shifting the model away from internet "junk" and toward high-quality, structured information.

Intended Use

  • Speculative Decoding: An ideal candidate for serving as a "Draft Model" to accelerate much larger LLMs (e.g., Llama-3 70B).
  • Edge Computing: Small enough to run on mobile devices, IoT, or browsers while delivering surprisingly high-quality text.
  • Text Refinement: Excellent for expanding bullet points into full educational paragraphs or rewriting text with better flow.
  • Educational Research: A transparent look at how SFT impacts "Reasoning" capabilities in sub-1B parameter models.
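Speculative decoding, mentioned above, lets a small draft model propose several tokens cheaply, which the large target model then verifies in a single pass. The following is a toy, self-contained sketch of the greedy draft-and-verify loop; the "models" here are made-up integer functions, not the real GAD-2 / Llama-3 pair:

```python
# Toy sketch of speculative decoding with greedy, exact-match acceptance.
# The draft and target "models" are hypothetical deterministic
# next-token functions over integer tokens, purely for illustration.

def draft_next(ctx):
    # Cheap draft model: guesses the next token as (last + 1) mod 7.
    return (ctx[-1] + 1) % 7

def target_next(ctx):
    # "Large" target model: agrees on odd tokens, diverges on even ones.
    return (ctx[-1] + 1) % 7 if ctx[-1] % 2 else (ctx[-1] + 2) % 7

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens greedily, verifying k-token draft bursts."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft model proposes k tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target verifies: keep the longest agreeing prefix...
        accepted, ctx = [], list(out)
        for t in draft:
            if target_next(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        # 3) ...then append one token from the target itself, so every
        # round makes progress even if the draft is rejected outright.
        accepted.append(target_next(ctx))
        out.extend(accepted)
    return out[len(prompt):][:n_tokens]

def greedy_decode(prompt, n_tokens):
    """Plain greedy decoding with the target model alone, for comparison."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prompt):]
```

With exact-match acceptance the output is identical to running the target model alone; the speedup comes from verifying accepted drafts in one batched pass instead of one sequential step per token. In the transformers library, the same idea is exposed through the `assistant_model` argument of `generate()`.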

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Raziel1234/GAD-2-177M-SFT-Preview"

# Load the tokenizer and model; bfloat16 halves memory relative to fp32.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Simple chat-style prompt; move the inputs to the model's device.
prompt = "User: Explain the significance of the Turing Test.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample with a mild repetition penalty to discourage loops,
# a common failure mode in small models.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        repetition_penalty=1.2,
        do_sample=True,
        temperature=0.7
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
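The prompt above follows a plain "User: … / Assistant:" convention. A tiny helper (hypothetical, not part of the model's API) keeps that format consistent across calls:

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the "User:/Assistant:" format shown above.

    Illustrative only; the model ships no built-in chat template.
    """
    return f"User: {question}\nAssistant:"

prompt = build_prompt("Explain the significance of the Turing Test.")
```

The resulting string can be passed to the tokenizer exactly as in the snippet above.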